comp.ai

Ted Dunning to Sengly

Re: Seeking suggestion on results re-ran

15 Jun 08 00:13:18

   From: ted.dunning@gmail.com   

   On Jun 13, 7:43 pm, Sengly wrote:   
   > Dear all,   
   >   
   > I am implementing a search system. I would like to seek your   
   > suggestion on re-ranking methodology. My problem is that I have a set   
   > of resulting documents to a query and each one of them with a matching   
   > score and also a list of relatedness score between each two of them. I   
   > would like to re-rank my resulting documents by downgrading the score   
   > of those near duplicate results. This is in order to keep the results   
   > with precision but also to enable diversity. I would like to know   
   > whether there exists any standard approach to do so?   
   >   
   > Any suggestions or pointers would be highly appreciated.   
   >   
   > Best regards,   
   >   
   > Sengly   

   There are a number of ad hoc methods for doing this.  I think that a   
   better approach is to fold highly similar documents together.   
   Clustered results that have high uniformity can be reviewed quickly by   
   searchers since they can focus on or eliminate many documents in a single   
   step.  If the documents in the cluster are sufficiently similar, then   
   little error is introduced by these mass decisions.   
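
   A minimal sketch of one way to fold near duplicates, assuming the
   matching scores arrive as a dict and the pairwise relatedness scores
   as a dict keyed by unordered document pairs.  The function names, the
   greedy strategy and the 0.8 threshold are illustrative assumptions,
   not something prescribed in this thread.

```python
def pair_score(relatedness, a, b):
    """Look up the relatedness of an unordered pair; missing pairs count as 0."""
    return relatedness.get(frozenset((a, b)), 0.0)


def fold_near_duplicates(scores, relatedness, threshold=0.8):
    """Greedily fold highly similar documents into clusters.

    Documents are visited in descending order of matching score.  Each one
    joins the first cluster whose representative (its best-scoring member)
    is related to it at or above `threshold`; otherwise it starts a new
    cluster.  The returned clusters are ordered by representative score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    clusters = []  # each cluster is a list: [representative, member, ...]
    for doc in ranked:
        for cluster in clusters:
            if pair_score(relatedness, doc, cluster[0]) >= threshold:
                cluster.append(doc)
                break
        else:
            # No sufficiently similar representative was found.
            clusters.append([doc])
    return clusters


if __name__ == "__main__":
    scores = {"a": 0.90, "b": 0.85, "c": 0.60, "d": 0.50}
    relatedness = {
        frozenset(("a", "b")): 0.95,  # near duplicates
        frozenset(("c", "d")): 0.30,  # only loosely related
    }
    for cluster in fold_near_duplicates(scores, relatedness):
        print(cluster[0], "+", len(cluster) - 1, "similar")
```

   Showing only the representative of each cluster, with the rest
   collapsed beneath it, keeps the top of the list diverse without
   throwing any result away.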

   Another alternative is to not re-order your results list at all, at   
   least initially.  Then allow users to rate the documents that they see   
   and apply their ratings not only to the document rated, but also to   
   very similar documents.  This allows the result list to be re-ordered   
   on the fly based on user input and should provide much the same   
   efficiencies as clustered presentations.  It also preserves what speed   
   you have in presenting the original results so that the user remains   
   well disposed toward your search system.  Re-ranking as users mark   
   documents that they particularly like or dislike, or that they save   
   for later consideration, costs less than clustering, and incurring   
   some cost in response to a user action is not so bad anyway.   
   You can do the re-ranking using any sort of document classifier that   
   produces reasonably conservative results with small numbers of   
   training examples.   
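
   A rough sketch of the rating idea, assuming ratings in the range -1
   (dislike) to +1 (like) and the same pairwise relatedness dict as in
   the original question.  Note that this swaps in plain
   similarity-weighted propagation rather than a trained document
   classifier; the names, the 0.8 threshold and the 0.5 weight are
   hypothetical.

```python
def rerank_with_ratings(scores, relatedness, ratings, threshold=0.8, weight=0.5):
    """Re-order results after the user has rated some documents.

    Each rating is applied to the rated document itself (similarity 1.0)
    and to every document whose relatedness to it is at or above
    `threshold`, scaled by that relatedness.  Documents that are not very
    similar to any rated document keep their original matching score.
    """
    adjusted = dict(scores)  # leave the caller's scores untouched
    for rated_doc, rating in ratings.items():
        for doc in scores:
            if doc == rated_doc:
                sim = 1.0
            else:
                sim = relatedness.get(frozenset((doc, rated_doc)), 0.0)
            if sim >= threshold:
                adjusted[doc] += weight * rating * sim
    return sorted(adjusted, key=adjusted.get, reverse=True)


if __name__ == "__main__":
    scores = {"a": 0.90, "b": 0.85, "c": 0.60}
    relatedness = {frozenset(("a", "b")): 0.95}
    # The user dislikes "a"; its near duplicate "b" sinks along with it.
    print(rerank_with_ratings(scores, relatedness, {"a": -1.0}))
```

   With no ratings the original order comes back unchanged, so the
   initial result list costs nothing extra to present.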
