|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 686 of 1,954    |
|    Dmitry A. Kazakov to Sebastian Stern    |
|    Re: Data Mining of Preference Orderings     |
|    03 Apr 05 23:47:37    |
   
   From: mailbox@dmitry-kazakov.de   
      
   On Sun, 03 Apr 2005 02:48:27 GMT, Sebastian Stern wrote:   
      
   > Dmitry A. Kazakov:   
   >| Sebastian Stern:   
   >|> For a project of mine, I have recently become interested in Data   
   >|> Mining. One common technique in Data Mining (used by on line book   
   >|> stores for example) is the mining of Association Rules. Each   
   >|> rule has the form   
   >|>   
   >|> A => B   
   >|>   
   >|> where A and B are sets of objects (e.g., books), and each rule   
   >|> can be interpreted as stating "The possession of A implies the   
   >|> possession of B".   
   >|   
   >| Shouldn't it be possession of A implies interest in B? Clearly,   
   >| possession /= interest.   
   >   
   > No, it really should be "the possession of A implies the possession of B".   
   > Association rules are defined in terms of possession; it is then assumed   
   > that possession means interest, and absence of possession means absence of   
   > interest. This binary modelling of interest is too coarse-grained for my   
   > purposes. Possession is indeed not the same as the _degree_ of interest;   
   > that is the whole reason for me writing my post: I am looking for a way to   
   > predict '_degree_ of interest', not the _probability_ of a discrete   
   > 'yes/no'-kind of interest (as ordinary association rule systems do).   
      
   Mmm, it could be the expectation of the degree of interest, if you wished   
   to have it finer, but there would still be   
      
   1. no link between possession and interest. (I have a lot of things I am   
   uninterested in, some of them I would be happy to get rid of.)   
      
   2. no reasonable explanation why interest is random (i.e., why it has a   
   probability at all). Is it because you have no user-specific data?   
      
   >|> The 'degree of confidence' in a rule is defined as the conditional   
   >|> probability that a subject (user) is interested in objects B under   
   >|> the condition that the subject already possesses objects A. This   
   >|> confidence is thus computed using the familiar formula for   
   >|> conditional probabilities:   
   >|>   
   >|> confidence(A => B) := P(B | A) := P(A and B) / P(A)   
   >|   
   >| If A is a set of books, then P(A) is a probability of what? Are books   
   >| random? Maybe P(A) = P(User has A)? Is this random?   
   >   
   > P(A) is the so-called 'degree of support' of an object set: it is the   
   > frequency with which the set occurs in the data base, i.e., the number of   
   > subjects owning the object set A, i.e., P(User has A). (Note that using absolute or   
   > relative frequencies does not change the result.)   
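
A minimal sketch of the support and confidence computation described in the quoted text, using the standard definition P(B | A) = P(A and B) / P(A). The function names `support` and `confidence` and the toy `baskets` data are assumptions made for illustration; they do not come from the thread.

```python
from typing import FrozenSet, Iterable, List


def support(baskets: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Relative frequency of users whose basket contains every object in `items`."""
    wanted = frozenset(items)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def confidence(baskets: List[FrozenSet[str]],
               a: Iterable[str], b: Iterable[str]) -> float:
    """confidence(A => B) = P(A and B) / P(A), estimated from the baskets."""
    a, b = frozenset(a), frozenset(b)
    p_a = support(baskets, a)
    if p_a == 0:
        raise ValueError("antecedent A never occurs, confidence is undefined")
    # "A and B" means the user owns every object of A and every object of B.
    return support(baskets, a | b) / p_a


# Hypothetical data base: one basket (set of owned books) per user.
baskets = [
    frozenset({"book1", "book2"}),
    frozenset({"book1", "book2", "book3"}),
    frozenset({"book1"}),
    frozenset({"book3"}),
]

print(confidence(baskets, {"book1"}, {"book2"}))
```

Because the same divisor `len(baskets)` appears in both supports, it cancels in the ratio, which is why absolute and relative frequencies give the same confidence.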
      
   I see, it is P(a randomly chosen user has A). Then you could try the   
   following. You define preference as a relation; whether it is crisp, fuzzy   
   or finer does not matter. Then by questioning users you evaluate   
   P(a>b | has {a,b}) and from that P(a>b) = P(an arbitrary user prefers a   
   to b). [ If > is fuzzy then you   
   will have the pair P(a>b), P(ab)+P(a [...]   
      
   >|> Such a system does _not_ distinguish between _degrees_ of preference,   
   >|> i.e., it does not produce an _ordering_ of preference between   
   >|> different objects; and this is the crux of my post.   
   >|   
   >| Yes, it is the difference between possessing and having an interest in   
   >| something, which the model above does not respond to.   
   >   
   > Precisely, and I am looking for a way to predict (degree of) interest.   
      
   For this, unlike the rough approach above, you would need a   
   model of interest. It should be a function of user and object properties.   
   It is not a direct function of popularity. Once you have the model you can   
   evaluate it assuming that users are random, as I did above.   
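
The rough estimate of P(a>b | has {a,b}) from questioning users, mentioned earlier in this reply, could be sketched as follows. It treats the relation as crisp (each answer names exactly one preferred object) and treats users as interchangeable random draws. The function name `pairwise_preference` and the sample answers are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pairwise_preference(
    answers: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Estimate P(a > b | has {a, b}) from crisp pairwise answers.

    Each answer is (preferred, other), given by a user who owns both objects.
    Only ordered pairs that were chosen at least once appear in the result.
    """
    wins: Counter = Counter()    # how often `preferred` beat `other`
    asked: Counter = Counter()   # how often the unordered pair was asked
    for preferred, other in answers:
        wins[(preferred, other)] += 1
        asked[frozenset((preferred, other))] += 1
    return {pair: n / asked[frozenset(pair)] for pair, n in wins.items()}


# Hypothetical answers collected from different users.
answers = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c")]
estimates = pairwise_preference(answers)
print(estimates[("a", "b")])
```

This only counts answers; it says nothing about why a user prefers one object, which is the gap a real model of interest would have to fill.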
      
   >|> For input, the system could present two objects at a time and let the   
   >|> subject choose which he prefers. The choice of the subject would   
   >|> reflect his relative preference for one of the two objects. The   
   >|> preference relation is a strict ordering relation between objects,   
   >|> parametrized on the subject and time (but let us assume that the   
   >|> subject's preferences do not change over time).   
   >|>   
   >|> O1 < O2   
   >|> S,t   
   >|   
   >| I vaguely remember a study that showed the preference relation is not   
   >| transitive. So a person can give answers of the sort: O1 < O2 < O3 < O1.   
   >| The problem is that the relation is of course fuzzy, and when the answer   
   >| is forced to a certain Boolean, that heavily distorts the result.   
   >   
   > Yes, the preference ordering relation may not always _appear_ to be   
   > transitive, but this is not due to fuzziness; it is due to the fact that   
   > preference may change over time (that is why I included a time parameter,   
   > and the parenthetical remark that it should be ignored).   
      
   The study I referred to showed that preferences are unordered not [only]   
   because they change, but because they simply are. In different contexts   
   preferences vary. The context is not only the time frame. It can be   
      
   O1 < O2 | I collect books of O2's author   
   O1 < O2 | I want to present it to my friend   
   O1 < O2 | I have only $20 to spend on books this month   
   O1 < O2 | My aunt asked me what I'd like to have, but she wouldn't buy   
   anything like O1   
   ...   
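
Whether a set of collected answers is consistent with any strict order at all can be checked mechanically: answers such as O1 < O2 < O3 < O1 form a cycle in the "is preferred less than" graph. A minimal sketch using depth-first search follows; the function name `find_cycle` and the dict-of-sets encoding are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def find_cycle(less_than: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one preference cycle as a list of objects, or None if there is none.

    less_than[x] is the set of objects y for which the subject answered x < y.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in sorted(less_than.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: the path from nxt closes a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(less_than):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# The intransitive answers from the discussion: O1 < O2 < O3 < O1.
print(find_cycle({"O1": {"O2"}, "O2": {"O3"}, "O3": {"O1"}}))
```

If a cycle is found, no monotone mapping value(O) onto numbers can reproduce all the answers. The check does not say whether the cause is context, time, or fuzziness.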
      
   >|> Alternatively, the input could consist of assigning a grade, or a   
   >|> monetary amount to each object in some set of objects. (This is   
   >|> actually just a way of monotonically mapping the ordering relation   
   >|> between objects on the ordering relation between numbers:   
   >|> value(O1) < value(O2) implies O1 < O2.)   
   >|   
   >| That changes nothing. I think that such a measure (value : object ->   
   >| ordered) simply does not exist, because preferences are not ordered   
   >| in the strict sense.   
   >   
   > See above. Because the ordering relation _is_ strict, it can always be   
   > monotonically mapped on numbers.   
      
   That could be an artificial order not reflecting the actual preferences.   
   After all the whole system is finite and discrete. You can map all its   
   states into integers and so "order" them...   
      
   >|> What algorithms can be used to predict preferences? Where can I   
   >|> find out more? Am I making any sense? ;-)   
   >|   
   >| I would try to formulate it in fuzzy terms and keep it fuzzy all the way   
   >| until the final stage.   
   >   
   > What fuzzy association rules do is divide a continuous value into discrete   
   > yet fuzzy steps, so e.g. the continuous value 'length' which previously   
   > ranged over real numbers from 1 to 10 can now assume the ternary values   
   > 'short', 'middle', and 'long'. Then association rules of the following form   
   > can be used:   
   >   
   > length(X)=short => length(Y)=long   
   >   
   > This is too coarse-grained for my project. I am looking for a way to input   
   > and predict arbitrary precision values, e.g. be able to calculate the   
   > expected value of predicted_preference(Y) given "input_preference(X) =   
   > 7.48".   
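
To make the fuzzification step described above concrete, here is a minimal sketch that maps a length in the range 1 to 10 onto membership degrees for 'short', 'middle' and 'long'. The piecewise-linear shape and the breakpoints 1, 5.5 and 10 are assumptions chosen for illustration; the thread does not specify any membership functions.

```python
from typing import Dict


def fuzzify_length(x: float) -> Dict[str, float]:
    """Map a length in [1, 10] to fuzzy membership degrees.

    Assumed shapes: 'short' falls linearly from 1 at x=1 to 0 at x=5.5,
    'long' rises linearly from 0 at x=5.5 to 1 at x=10, and 'middle' takes
    the remainder, so the three degrees always sum to 1.
    """
    lo, mid, hi = 1.0, 5.5, 10.0      # hypothetical breakpoints
    x = min(max(x, lo), hi)           # clamp to the stated range
    short = max(0.0, (mid - x) / (mid - lo))
    long_ = max(0.0, (x - mid) / (hi - mid))
    middle = 1.0 - short - long_
    return {"short": short, "middle": middle, "long": long_}


print(fuzzify_length(7.48))
```

A rule such as length(X)=short => length(Y)=long then operates on these three degrees rather than on the original number.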
      
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
(c) 1994, bbs@darkrealms.ca