   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

   Message 170 of 1,954   
   Markus to All   
   Re: Finding an HTML element   
   30 Nov 03 00:29:10   
   
   From: markus-1977@gmx.net   
      
   > I want to build a tool for data mining from an HTML page. I want the user to
   > select an element from a web page and train my application to recognize it
   > in later updates of that page. For example, suppose the user wants to extract
   > some data from a financial page: his total balance, plus the table
   > of the last transactions. He would highlight those elements
   > inside the HTML page. The application should then analyze the
   > HTML element structure and learn how to find it in similar pages (even
   > when they are not identical). What I really need is an algorithm to
   > "understand" a single element (by its structure, position in the page, or any
   > other method), so that I can look at a new page and choose the most
   > similar element (which should probably be the right one).
      
   Seems you are trying to "learn" a structure, for example a grammar for   
   a pattern language. There are a bunch of algorithms out there that can   
   learn text patterns nicely.   
      
   I've seen something like what you describe before; I think it was
   with the Lexikon Project at DFKI (www.dfki.de). I don't know of any
   publications off the top of my head, though.
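
   To make the "choose the most similar element" idea from the question
   concrete, here is a rough sketch in Python. It is not the Lexikon
   approach and not a learned grammar; it is just a hand-weighted similarity
   score. The signature used (tag path from the root, child tag sequence,
   attributes), the weights 0.4/0.3/0.3, and all function names are
   assumptions made up for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ElementCollector(HTMLParser):
    """Record tag path, attributes, child tags and direct text of every element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.elements = []   # every element seen, in document order

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": "",
                "path": [n["tag"] for n in self.stack] + [tag]}
        if self.stack:
            self.stack[-1]["children"].append(tag)
        self.elements.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]["text"] += data.strip()


def collect(html):
    """Parse an HTML string and return the list of element records."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def ratio(a, b):
    """Similarity of two tag sequences, between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(a, b):
    """Weighted mix of path, child-structure and attribute agreement."""
    keys = set(a["attrs"]) | set(b["attrs"])
    same = sum(1 for k in keys if a["attrs"].get(k) == b["attrs"].get(k))
    attrs = same / len(keys) if keys else 1.0
    return (0.4 * ratio(a["path"], b["path"])
            + 0.3 * ratio(a["children"], b["children"])
            + 0.3 * attrs)


def find_most_similar(example, html):
    """Return the element of the new page that best matches the example."""
    return max(collect(html), key=lambda el: similarity(example, el))


# Toy pages: the new one has an extra <div> and a different balance.
OLD = ('<html><body><div id="nav"><a href="/">Home</a></div>'
       '<table id="acct"><tr><td class="label">Balance</td>'
       '<td class="total">100.00</td></tr></table></body></html>')
NEW = ('<html><body><div id="nav"><a href="/">Home</a></div>'
       '<div class="ad">Buy!</div>'
       '<table id="acct"><tr><td class="label">Balance</td>'
       '<td class="total">250.75</td></tr></table></body></html>')

# The "highlighted" element: the balance cell on the old page.
example = next(e for e in collect(OLD) if e["attrs"].get("class") == "total")
best = find_most_similar(example, NEW)
print(best["tag"], best["text"])
```

   Fixed weights like these are brittle once the page layout really
   changes; a learned pattern or grammar, as suggested above, would
   replace the hand-written similarity() function.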
      
   Markus   
      
