

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   


   Message 1,335 of 1,954   
   DevNull to All   
   Emergent behavior (Can AI become addicte   
   22 Mar 07 11:46:29   
   
   From: smorrey@gmail.com   
      
   A few years ago, on a lark, I decided to try to create a better
   "clippy" (yes, Clippy, the agent from MS Office).
      
   The goal was to create an AI agent that would find information   
   relevant to whatever I was searching on in my browser.  It would do so   
   by mimicking the way that I naturally search for information on the   
   internet.   
      
   The design was very simple: the agent would watch whatever I typed
   into a search box.  It would then perform a meta-search using Yahoo,
   MSN and AltaVista, and follow any and all links within 15 lines of
   the target search word, going 3-4 links deep.
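
   A rough Python sketch of that link-following rule (not the agent's
   real code).  The names links_near_term and crawl, the href regex and
   the caller-supplied fetch function are all assumptions made for
   illustration; "3-4 links" is read here as a crawl depth.

```python
import re

# Assumption: links are plain absolute http(s) hrefs in the page source.
LINK_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def links_near_term(page_text, term, window=15):
    """Return links found within `window` lines of any line containing `term`."""
    lines = page_text.splitlines()
    needle = term.lower()
    hit_lines = [i for i, line in enumerate(lines) if needle in line.lower()]
    found = []
    for i, line in enumerate(lines):
        # Skip lines that are too far from every occurrence of the term.
        if not any(abs(i - h) <= window for h in hit_lines):
            continue
        for url in LINK_RE.findall(line):
            if url not in found:
                found.append(url)
    return found


def crawl(start_urls, term, fetch, max_depth=3):
    """Follow nearby links breadth-first, at most `max_depth` hops out.

    `fetch` is a hypothetical callable that maps a URL to its page text.
    """
    seen = set(start_urls)
    frontier = list(start_urls)
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            for link in links_near_term(fetch(url), term):
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

   Passing fetch in as a parameter keeps the sketch testable without
   network access; a real agent would plug in its own downloader.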
      
   Since I have noticed that image searches sometimes turn up more
   relevant results than standard searches, I added a module to search
   images.altavista.com.  This module would perform a search and pull in
   related pages, which would be handed to the spider core for keyword
   and relevancy indexing.
      
   After my own related keyword search was finished, it would query
   Dogpile, Zeitgeist, etc. with what it figured were related search
   terms to see what others were searching for.  Results with similar
   and/or exact matches were given more weight than searches that no
   one else was conducting.
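
   One way that weighting could look, as a minimal Python sketch.  The
   function name term_weight, the bonus values and the 0.8 similarity
   threshold are made-up assumptions, and difflib stands in for
   whatever "similar" meant in the real agent.

```python
from difflib import SequenceMatcher


def term_weight(term, popular_terms,
                exact_bonus=2.0, similar_bonus=1.5, threshold=0.8):
    """Weight a related search term by how closely it matches popular searches.

    Exact matches get `exact_bonus`, near matches get `similar_bonus`,
    and terms nobody else is searching for keep the neutral weight 1.0.
    All three numbers are illustrative, not tuned values.
    """
    wanted = term.lower()
    best = 0.0
    for popular in popular_terms:
        candidate = popular.lower()
        if candidate == wanted:
            return exact_bonus
        # ratio() is 1.0 for identical strings and near 0.0 for unrelated ones.
        best = max(best, SequenceMatcher(None, wanted, candidate).ratio())
    return similar_bonus if best >= threshold else 1.0
```

   Note that this rewards popularity rather than relevance, so anything
   that games popular search terms gets boosted too.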
      
   There is a little more to it than just that, but after 4 years of
   development I have noticed something.
      
   In short, my agent appears to have become addicted to porn.   
   Yes, that's right: after 4 years of testing, tuning and trying to get
   "clean" results, regardless of what I have my agent searching for, it
   always stumbles on porn and places it higher than what I would think
   are much more closely related results.
      
   Working backwards, I think the source of my problem is search engine
   optimized "porn" rings.  These pages are filled with completely random
   words and links to less than scrupulous sites.  Following these pages
   manually (hint: try prefetching the entire page and creating a graph
   of the back links) shows a round-robin "ring" of completely
   irrelevant pages.
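
   The back-link graph from the hint above can be checked for rings
   mechanically.  This is a sketch only: it assumes the prefetched
   links are already in a dict mapping each page to the pages it links
   to, and it uses Tarjan's strongly connected components algorithm,
   which is a substitute technique and not something the agent does.
   The name find_rings and the min_size cutoff are hypothetical.

```python
def find_rings(graph, min_size=3):
    """Return groups of pages that can all reach one another via links.

    `graph` maps a page URL to an iterable of URLs it links to.  Each
    returned group is a strongly connected component with at least
    `min_size` members, sorted for stable output.
    """
    index = {}        # discovery order of each visited page
    lowlink = {}      # lowest discovery index reachable from the page
    on_stack = set()
    stack = []
    rings = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        # `node` is the root of a component: pop its members off the stack.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) >= min_size:
                rings.append(sorted(component))

    for node in list(graph):
        if node not in index:
            visit(node)
    return rings
```

   Legitimate sites interlink too, so a ring is only a warning sign to
   combine with other evidence (such as pages full of random words).
   The recursion also limits this sketch to fairly small graphs.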
      
   I never anticipated this "cache poisoning", but because of the way it's
   set up I cannot for the life of me figure out how to algorithmically
   screen these types of results out.
      
   For a while I added an option similar to Google's PageRank that
   allowed me to manually remove either the irrelevant page or the
   entire result cache (depending on how severely screwed up the AI had
   gotten).  That works, but a few days later the agent has again
   stumbled upon one of these poison pills, and once again I am stuck
   manually going through results.
      
   At this point I'm giving up.  When I started this project most search   
   engines would return completely irrelevant results as a matter of   
   course.  The purpose of this project was to "enhance" the results via   
   a simple pre-fetch and rank algorithm.  But it has ballooned to a
   level of complexity I'd rather not deal with.
      
   Reputable search engines such as Google have also increased their own
   relevancy to the point where the agent is actually just getting in the   
   way and wasting my bandwidth.   
      
   But 4 years of work is hard to part with, and so before I completely
   remove the agent from existence, I'm hoping someone has dealt with
   something similar and has a potential solution I may not have
   considered.
      
   Thanks in advance!   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca