
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 1,598 of 1,954   
   Ted Dunning to Rob   
   Re: Question Regarding latent dirichlet    
   11 Dec 07 04:43:46   
   
   From: ted.dunning@gmail.com   
      
   On Dec 9, 6:51 pm, Rob  wrote:   
   > 2. for each of the N words w_n:   
      
   Change the wording here to:   
      
   2. for each word POSITION n = 1..N for which the actual word is still   
   unknown   
      
   > (a)choose a topic z_n = mult(\theta)   
   > What does this mean?   
      
   This corresponds (roughly) to the intuition that there is some   
   unobservable something in your head when you generate (say, type) word   
   w_n.  It is considered to be discrete in order to make the problem   
   possible to analyze.   
      
   By making this hidden variable explicit in the model, the structure of   
   the model becomes tractable for Gibbs sampling or Jordan-style   
   optimization.   
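Not from the thread, but to make "tractable for Gibbs sampling" concrete: a sketch of what one collapsed Gibbs update for a single z_i might look like, assuming symmetric priors alpha and eta and count arrays ndk (document-topic), nkw (topic-word) and nk (topic totals), all illustrative names:

```python
import numpy as np

def resample_z(i, d, w, z, ndk, nkw, nk, alpha, eta, rng):
    """One collapsed Gibbs step for word position i (word w in document d)."""
    V = nkw.shape[1]
    k = z[i]
    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1        # remove current assignment
    # p(z_i = k | everything else) is proportional to
    #   (n_dk + alpha) * (n_kw + eta) / (n_k + V*eta)
    p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
    k = rng.choice(len(p), p=p / p.sum())
    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1        # record the new assignment
    z[i] = k

# tiny demo: one document, words [0, 1, 2, 0], random initial topics
rng = np.random.default_rng(0)
K, V, words = 2, 3, [0, 1, 2, 0]
z = [rng.integers(K) for _ in words]
ndk = np.zeros((1, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
for zi, wi in zip(z, words):
    ndk[0, zi] += 1; nkw[zi, wi] += 1; nk[zi] += 1
for i, wi in enumerate(words):                         # one sweep over the document
    resample_z(i, 0, wi, z, ndk, nkw, nk, 0.1, 0.1, rng)
```

The point of making z explicit is visible here: each update only touches integer counts, which is what makes the sampler cheap.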
      
   > Otherwise, how can I calculate this multinomial probability?   
      
   The draw z_n = mult(\theta) is an integer (a topic index), not a   
   density; you sample it rather than calculate it.   
      
   > (b) choose a word w_n from p(w_n|z_n,\beta)   
   >    This is the most confusing part. Since the Step 2 is "for each word w_n",   
   >  why are we "choose a word w_n" here again?   
      
   We aren't choosing it again.  For each word position, we do steps 2.a   
   and 2.b.  Step 2.b is where the word itself is chosen (as opposed to   
   the hidden state z_n).   
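Steps 2.a and 2.b for one document can be sketched like this (numpy; theta, beta and the sizes are illustrative draws, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N = 3, 5, 9                          # topics, vocabulary size, word positions
theta = rng.dirichlet(np.ones(K))          # per-document topic mixture
beta = rng.dirichlet(np.ones(V), size=K)   # one word distribution per topic

doc = []
for n in range(N):
    z_n = rng.choice(K, p=theta)           # 2.a: the hidden topic -- an integer
    w_n = rng.choice(V, p=beta[z_n])       # 2.b: the word itself, given that topic
    doc.append((z_n, w_n))
```

Each position n gets both a draw of z_n and a draw of w_n; the word is chosen once, in 2.b.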
      
   > I cannot understand this.   
      
   Sure you can!   
      
   > Or is this the conditional probability of w_n given z_n and parameters?   
      
   Basically, yes: this probability is what the multi-step process above   
   produces.   
      
   > I'm confused by the generative process, how do you actually "generate"   
   > words in real application, aren't they contained in the document?   
      
   Yes.  They are.   
      
   But they were generated by something.   
      
   And having a model that reflects the generation process even very   
   approximately is at the least comforting and at best informative.   
      
   So the generative model tries to explain what happened when the words   
   were selected to be in the document, using model parameters such as   
   \theta, \beta and z.   
      
   THEN, based on the observations that we get to make of which words are   
   in which document, we try to estimate the distribution of the   
   parameters and draw conclusions from this distribution.   
      
   > For example, if given the following training set   
   >   
   >             w_1  w_2   w_3   w_4   
   >     d1      1     0        3       5   
   >     d2      0     3        1       2   
   > .............................................   
   >   
   > what does the generative process look like? Anyone can help give a   
   > walkthrough example?   
      
   Imagine that d1 was written by person P1.  P1 had thoughts z_1 ... z_9   
   in their head as they wrote 9 words which we now call d1.  Person P2   
   then had thoughts z_10 ... z_15 in their head as they wrote 6 words   
   that we now call d2.  P1 and P2 could be the same person (we can't   
   know).   
      
   That is the generative process.   
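That walkthrough can be simulated; the lengths 9 and 6 are the row sums of d1 and d2 in the table above, while beta and each theta are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

K, V = 2, 4                                   # topics; vocabulary {w_1 .. w_4}
beta = rng.dirichlet(np.ones(V), size=K)      # thought -> word mapping (made up)

counts = []
for n_words in (9, 6):                        # lengths of d1 and d2 (row sums above)
    theta = rng.dirichlet(np.ones(K))         # this writer's mixture of thoughts
    z = rng.choice(K, size=n_words, p=theta)  # thoughts z_1 .. z_N (hidden)
    w = [rng.choice(V, p=beta[k]) for k in z] # the words we actually observe
    counts.append(np.bincount(w, minlength=V))  # term counts, one row per document
```

Only the rows in counts correspond to what you observe; beta, theta and z are exactly the things you then have to infer.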
      
   Your job is to divine the parameters describing the mapping from   
   thoughts (z) to words (w), the distribution of thoughts expressed in   
   each document, and also to estimate the thoughts that caused each   
   word.   
      
   Of course, you can only get distributional estimates of this.   
      
   Using these estimates, you want to answer real-world questions such   
   as whether P1 was thinking the same sort of stuff as P2.   
      
   Look at Buntine and Jakulin for an interesting alternative take on   
   this.   
      
   > Great thanks.   
   >   
      
   [ comp.ai is moderated ... your article may take a while to appear. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca