Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 1,335 of 1,954    |
|    DevNull to All    |
|    Emergent behavior (Can AI become addicte    |
|    22 Mar 07 11:46:29    |
From: smorrey@gmail.com

A few years ago, on a lark, I decided to try to create a better "Clippy" (yes, Clippy, the agent from MS Office).

The goal was to create an AI agent that would find information relevant to whatever I was searching for in my browser. It would do so by mimicking the way that I naturally search for information on the internet.

The design was very simple: the agent would watch whatever I typed into a search box. It would then perform a meta-search using Yahoo, MSN and AltaVista. It would then follow, to a depth of 3-4 links, any and all links within 15 lines of the target search word.

Since I have noticed that image searches sometimes turn up more relevant results than standard searches, I added a module to search images.altavista.com. This module would perform a search and pull in related pages, which would be handed to the spider core for keyword and relevancy indexing.

After my own related keyword search was finished, it would query Dogpile, Zeitgeist etc. with what it figured were related search terms to see what others were searching for. Results with similar and/or exact matches were given more weight than searches that no one else was conducting.

There is a little more to it than just that, but after 4 years of development I have noticed something.

In short, my agent appears to have become addicted to porn. Yes, that's right: after 4 years of testing, tuning and trying to get "clean" results, regardless of what I have my agent searching for, it always stumbles on porn and ranks it higher than what I would think are much more closely related results.

Working backwards, I think the source of my problem is search-engine-optimized "porn" rings. These pages are filled with completely random words and links to less than scrupulous sites. Following these pages manually (hint: try prefetching the entire page and creating a graph of the back links) shows a round-robin "ring" of completely irrelevant pages.

I never anticipated this "cache poisoning", and because of the way the agent is set up I cannot for the life of me figure out how to algorithmically screen these types of results out.

For a while I added an option similar to Google's PageRank that allowed me to manually remove either the irrelevant page or the entire result cache (depending on how severely screwed up the AI had gotten). That works, but a few days later the agent has again stumbled upon one of these poison pills, and once again I am stuck manually going through results.

At this point I'm giving up. When I started this project, most search engines would return completely irrelevant results as a matter of course. The purpose of this project was to "enhance" the results via a simple pre-fetch and rank algorithm. But it has ballooned to a level of complexity I'd rather not deal with.

Reputable search engines such as Google have also increased their own relevancy to the point where the agent is actually just getting in the way and wasting my bandwidth.

But 4 years of work is hard to part with, so before I completely remove the agent from existence, I'm hoping someone has dealt with something similar and has a potential solution I may not have considered.

Thanks in advance!

[ comp.ai is moderated ... your article may take a while to appear. ]

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
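
The back-link graph hint in the post suggests one possible way to screen for these rings algorithmically: treat the prefetched pages as a directed graph and look for groups of pages that can all reach one another. What follows is only a minimal sketch of that idea, not the agent's actual code. The function name `find_link_rings`, the `links` input format and the `min_size` threshold are all assumptions made for illustration, and the technique swapped in here is plain strongly connected component detection (Kosaraju's two-pass method).

```python
from collections import defaultdict


def find_link_rings(links, min_size=3):
    """Return groups of pages that can all reach one another via links.

    `links` is a hypothetical input: a dict mapping a page URL to the URLs
    it links to, e.g. as collected by a spider while prefetching pages.
    """
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # Build the reversed graph (the "back links").
    back = defaultdict(set)
    for src, targets in links.items():
        for dst in targets:
            back[dst].add(src)

    # Pass 1: iterative depth-first search, recording finish order.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(sorted(links.get(start, ()))))]
        while stack:
            node, targets = stack[-1]
            for nxt in targets:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(sorted(links.get(nxt, ())))))
                    break
            else:
                # All outgoing links explored: this page is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the back links in reverse finish order. Each walk
    # collects exactly one strongly connected component.
    rings, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in back[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        # Assumed threshold: only report groups of at least `min_size` pages.
        if len(component) >= min_size:
            rings.append(component)
    return rings


# Hypothetical page names: a -> b -> c -> a form a ring, d and e do not.
example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": []}
print([sorted(ring) for ring in find_link_rings(example)])
# prints [['a', 'b', 'c']]
```

Pages that land in a reported ring could then be down-weighted or dropped from the result cache before ranking. Legitimate sites also contain link cycles (navigation menus, for instance), so the threshold would need tuning, and it may help to group pages by hostname first so that only rings spanning several unrelated hosts are flagged.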
(c) 1994, bbs@darkrealms.ca