comp.ai

amnon.meyers@textanalysis.com to amnon.mey...@textanalysis.com

Re: Computing with Confidence: Much Ado

14 Apr 08 12:32:33

   XPost: comp.ai.nat-lang   
   From: amnon@textanalysis.com   

   On Apr 12, 9:54 pm, "am...@textanalysis.com" wrote:
   > Subj: Computing with Confidence: Much Ado about Nothing   
   > [snip]   

   Hi,   

   OK, in a day of prototyping I've answered this question for myself,
   and in a surprising way (to me, at any rate).   

   Just as Bayesian Statistics looks at positive and negative evidence, a   
   self-scoring IE system can do the following:   

   1. Look for positive evidence for the datum and its correctness.  If   
   confident, give a score of 1/1.   

   2. Else, look for evidence that the datum is MISSING from the document   
   (or garbled or otherwise unfetchable).  If confident, give a score of   
   0/0.   

   3. Else, if neither confidence is above its threshold (and these two   
   thresholds may differ in general), then assign a score of 0/1.   
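
   The three steps above can be sketched in Python as follows.  This is
   only an illustration, not the prototype itself: the function names,
   the two confidence inputs, and the threshold values are hypothetical,
   and reading a score "a/b" as the pair (credited, expected) is an
   assumption on my part.

```python
from typing import Iterable, Tuple

Score = Tuple[int, int]  # assumed reading of "a/b": (credited, expected)


def self_score(present_conf: float,
               missing_conf: float,
               present_threshold: float = 0.9,   # hypothetical value
               missing_threshold: float = 0.8    # hypothetical value
               ) -> Score:
    """Score one slot from positive and negative evidence."""
    # Step 1: confident the datum was found and is correct.
    if present_conf > present_threshold:
        return (1, 1)
    # Step 2: confident the datum is missing, garbled, or unfetchable.
    if missing_conf > missing_threshold:
        return (0, 0)
    # Step 3: neither confidence clears its threshold.
    return (0, 1)


def aggregate(scores: Iterable[Score]) -> Score:
    """Sum per-slot scores into one (credited, expected) tally."""
    scores = list(scores)
    return (sum(n for n, _ in scores), sum(d for _, d in scores))
```

   For example, aggregate([self_score(0.95, 0.1), self_score(0.2, 0.9),
   self_score(0.5, 0.5)]) tallies one slot of each kind.  The two
   thresholds are separate parameters because, as noted in step 3, they
   may differ in general.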

   I'm applying this logic to a self-scoring information extraction
   system, testing it on a single, easy slot (a particular numeric
   field that is required in every form).  The results for that slot
   look great, and I'll apply it to tougher slots in the coming days.

   The realization that manually building information extraction
   systems through natural language engineering (NLE) can dovetail with
   the Bayesian paradigm may be obvious to some, but it is a revelation
   for me!  With this added tool, I believe that accurate self-scoring
   systems are within reach, at least with manual or mixed-initiative
   methods.

   It would have been nice to have this so clear in the old days of
   MUCK-I, MUCK-II, MUC-3, and MUC-4!  Together with the acceleration
   that a programming environment for NLP provides, this to my mind
   largely solves the natural language engineering problem.

   We can build accurate self-scoring systems without the need for answer   
   keys and with reduced manual labor.  Rather, they'll generate answer   
   keys that we can fix up with a minimal amount of manual editing.   

   I intend to provide a report once more of this system is prototyped.   
   Amnon   

   Amnon Meyers   
   CTO   
   Text Analysis International, Inc   
   http://www.textanalysis.com   
