comp.ai
Message 683 of 1,954
Sebastian Stern to All
Re: Data Mining of Preference Orderings
03 Apr 05 02:48:27
   From: sebastianstern@wanadoo.nl   
      
   Dmitry A. Kazakov:   
   | Sebastian Stern:   
   | > For a project of mine, I have recently become interested in Data   
   | > Mining. One common technique in Data Mining (used by on line book   
   | > stores for example) is the mining of Association Rules.  Each   
   | > rule has the form   
   | >   
   | >   A => B   
   | >   
   | > where A and B are sets of objects (e.g., books), and each rule   
   | > can be interpreted as stating "The possession of A implies the   
   | > possession of B".   
   |   
   | Shouldn't it be possession of A implies interest in B? Clearly,
   | possession /= interest.
      
   No, it really should be "the possession of A implies the possession of B".   
   Association rules are defined in terms of possession; it is then assumed   
   that possession means interest, and absence of possession means absence of
   interest.  This binary modelling of interest is too coarse-grained for my
   purposes.  Possession is indeed not the same as the _degree_ of interest;   
   that is the whole reason for me writing my post: I am looking for a way to   
   predict '_degree_ of interest', not the _probability_ of a discrete   
   'yes/no'-kind of interest (as ordinary association rule systems do).   
      
   | > The 'degree of confidence' in a rule is defined as the conditional   
   | > probability that a subject (user) is interested in objects B under
   | > the condition that the subject already possesses objects A.  This
   | > confidence is thus computed using the familiar formula for
   | > conditional probabilities:
   | >
   | >   confidence(A => B) :=  P(B | A) := P(A and B) / P(A)
   |   
   | If A is a set of books, then P(A) is a probability of what? Are books   
   | random? Maybe P(A) = P(User has A)? Is this random?   
      
   P(A) is the so-called 'degree of support' of an object set: it is the
   frequency with which the set occurs in the database, i.e., the number of
   subjects owning the object set A, i.e., P(User has A).  (Note that using
   absolute or relative frequencies does not change the confidence, since the
   total number of subjects cancels in the ratio.)
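
   A minimal sketch of how support and confidence could be computed, assuming
   each subject's possessions are stored as a set of object names.  The names
   `support`, `confidence` and `baskets`, and the sample data, are illustrative
   assumptions and not part of any particular data mining library.  The
   confidence is taken as the standard conditional probability
   P(A and B) / P(A), where "A and B" means the subject owns every object in
   both sets.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Relative frequency of subjects who own every object in `itemset`."""
    items = frozenset(itemset)
    if not baskets:
        return 0.0
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(a: Iterable[str], b: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """confidence(A => B) = P(A and B) / P(A)."""
    a_set, b_set = frozenset(a), frozenset(b)
    p_a = support(a_set, baskets)
    if p_a == 0:
        raise ValueError("A never occurs, so the confidence is undefined")
    # Owning A and B together means owning the union of the two sets.
    return support(a_set | b_set, baskets) / p_a


# Hypothetical data: one set of owned objects per subject.
baskets = [
    frozenset({"book1", "book2"}),
    frozenset({"book1", "book2", "book3"}),
    frozenset({"book1"}),
    frozenset({"book3"}),
]

print(support({"book1"}, baskets))
print(confidence({"book1"}, {"book2"}, baskets))
```

   Dividing both counts by the same number of subjects leaves the ratio
   unchanged, which is why absolute counts would give the same confidence.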
      
   | > Such a system does _not_ distinguish between _degrees_ of preference,   
   | > i.e., it does not produce an _ordering_ of preference between
   | > different objects; and this is the crux of my post.   
   |   
   | Yes, it is the difference between possessing and having an interest in   
   | something, which the model above does not respond to.   
      
   Precisely, and I am looking for a way to predict (degree of) interest.   
      
   | > For input, the system could present two objects at a time and let the   
   | > subject choose which he prefers.  The choice of the subject would   
   | > reflect his relative preference for one of the two objects.  The   
   | > preference relation is a strict ordering relation between objects,   
   | > parametrized on the subject and time (but let us assume that the   
   | > subject's preferences do not change over time).   
   | >   
   | >   O1 <    O2   
   | >       S,t   
   |   
   | I vaguely remember a study that showed that the preference relation is
   | not transitive. So a person can give answers like: O1 < O2 < O3 < O1. The
   | problem is that the relation is of course fuzzy, and when the answer is
   | forced to a crisp Boolean, that heavily distorts the result.
      
   Yes, the preference ordering relation may not always _appear_ to be   
   transitive, but this is not due to fuzziness; it is due to the fact that
   preference may change over time (that is why I included a time parameter,   
   and the parenthetical remark that it should be ignored).  Your example of   
   inconsistent preference ordering can be resolved as follows:   
      
   The subject inputs the following preferences:   
     O1 <     O2   
         S,t1   
      
     O2 <     O3   
         S,t2   
   At this point everything is consistent.  When the user inputs   
     O3 <     O1   
         S,t3   
   an inconsistency occurs.  This means the subject's preferences have changed,   
   so the system simply throws away or ignores the oldest inputs that are   
   inconsistent with the newest ones, so the remaining set becomes:   
     O2 <     O3   
         S,t2   
      
     O3 <     O1   
         S,t3   
      
   (If a subject cannot choose between two objects, what he really does is   
   choose the 'absent object' (see initial post), the representation of 'no   
   object chosen'.)   
      
   Given this method of resolving inconsistencies, the preference relation can   
   _always_ be made strict.  That is why you should ignore the time parameter,   
   and assume the ordering relation is strict.  This is really not important   
   for my request.   
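
   The resolution scheme above can be sketched as follows.  This is only one
   possible reading of "throw away the oldest inconsistent inputs": whenever a
   new input would close a cycle, the oldest input on the conflicting chain is
   dropped, and this is repeated until no conflict remains.  The names
   `find_chain` and `add_preference`, and the (worse, better, time) tuple
   layout, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (worse, better, t) records the input "worse < better" given at time t.
Pref = Tuple[str, str, int]


def find_chain(prefs: List[Pref], start: str, goal: str) -> Optional[List[Pref]]:
    """Return the inputs forming a chain start < ... < goal, or None."""
    stack = [(start, [])]
    seen = {start}
    while stack:
        node, chain = stack.pop()
        if node == goal:
            return chain
        for pref in prefs:
            if pref[0] == node and pref[1] not in seen:
                seen.add(pref[1])
                stack.append((pref[1], chain + [pref]))
    return None


def add_preference(prefs: List[Pref], worse: str, better: str, t: int) -> List[Pref]:
    """Add `worse < better` at time t, dropping the oldest conflicting inputs."""
    if worse == better:
        raise ValueError("a strict preference needs two distinct objects")
    kept = list(prefs)
    while True:
        # A chain better < ... < worse contradicts the new input.
        chain = find_chain(kept, better, worse)
        if chain is None:
            break
        kept.remove(min(chain, key=lambda pref: pref[2]))
    kept.append((worse, better, t))
    return kept


history: List[Pref] = []
history = add_preference(history, "O1", "O2", 1)
history = add_preference(history, "O2", "O3", 2)
history = add_preference(history, "O3", "O1", 3)
print(history)
```

   Run on the three inputs from the example, this keeps the inputs from t2
   and t3 and discards the one from t1.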
      
   | > Alternatively, the input could consist of assigning a grade, or a   
   | > monetary amount to each object in some set of objects.  (This is   
   | > actually just a way of monotonically mapping the ordering relation   
   | > between objects on the ordering relation between numbers:   
   | >   value(O1) < value(O2) implies O1 < O2.)   
   |   
   | That changes nothing. I think that such measure (value : object ->   
   | ordered) simply does not exist, because preferences are not ordered   
   | in the strict sense.   
      
   See above.  Because the ordering relation _is_ strict, it can always be   
   monotonically mapped on numbers.   
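
   A minimal sketch of such a mapping, assuming the stored preferences are
   already consistent (contain no cycle).  It assigns each object its position
   in a topological order, so that O1 < O2 implies value(O1) < value(O2).  If
   the inputs do not compare every pair of objects, the result is only one of
   several possible numberings.  The function name `assign_values` is an
   assumption; `graphlib` is part of the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, Set, Tuple


def assign_values(prefs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map objects to integers so that worse < better implies a smaller value."""
    predecessors: Dict[str, Set[str]] = {}
    for worse, better in prefs:
        predecessors.setdefault(worse, set())
        # `worse` must be numbered before `better`.
        predecessors.setdefault(better, set()).add(worse)
    # static_order() raises graphlib.CycleError if the inputs are inconsistent.
    order = TopologicalSorter(predecessors).static_order()
    return {obj: rank for rank, obj in enumerate(order)}


values = assign_values([("O2", "O3"), ("O3", "O1")])
print(values)
```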
      
   | > So my questions are, roughly, these:  Has such a thing been done   
   | > before?  If so, could you provide me with some references to e.g.   
   | > books and/or articles? (I have looked into 'fuzzy association rules',   
   | > but as far as I can tell these do not meet my needs.)  How should I   
   | > input and represent the preference relation?   
   |   
   | I think that preference has a fine structure which gets lost when   
   | mapped to a numeric value. Minimally one should distinguish: O1 > O2   
   | and not (O1 > O2). This immediately leads to intuitionistic fuzzy   
   | preference values:   
   |   
   | Pos(O1 > O2), Nec(O1 > O2)   
   |   
   | [ Nec(O1 > O2) = 1 - Pos(not (O1 > O2)) ]   
   |   
   | Maybe it should be split even finer. Say you have some set of basic   
   | properties, features (typical are genre, volume, price, artwork on the   
   | cover ..., but you could invent something less evident). Users classify
   | objects by features, and the objects are compared in the feature space.
      
   Again, given the fact that preference is indeed strict at any given moment   
   in time, the use of intuitionistic fuzzy preference values needlessly   
   complicates things.  Please assume the ordering relation is strict.   
      
   | [ Basically all that is no more than to find a space of object's images   
   | where components (of the image vector) would become clearly ordered.   
   | Though the vectors themselves will still be incomparable. But at least it
   | would more realistically model what happens in someone's head. ]   
      
   See above.   
      
   | > What algorithms can be used to predict preferences?  Where can I   
   | > find out more?  Am I making any sense? ;-)   
   |   
   | I would try to formulate it in fuzzy terms and keep it fuzzy all the way   
   | until the final stage.   
      
   What fuzzy association rules do is divide a continuous value into discrete   
   yet fuzzy steps, so e.g. the continuous value 'length' which previously
   ranged over real numbers from 1 to 10 can now assume the ternary values   
      
   [continued in next message]   
      