
comp.ai


1,954 messages


Message 613 of 1,954

Ted Dunning to All

Re: Logistic Regression Details and Pseu

21 Feb 05 10:30:26

   From: ted.dunning@gmail.com   
      
   I would recommend that you read the SAS documentation.   
      
   Of course, there is still a bit of construction on top of that, since
   many functions in the SAS implementation are essentially obsolete.  For
   instance, in variable selection (what SAS calls effect selection, I
   think), the only version that anybody uses in practice is step-wise
   selection.  Similarly, all of the options for inverting the order of
   the target values are just there to repair an error in the original
   specification.  Also, the ability to select whether you use Fisher
   scoring or Newton-Raphson iteration to find the optimum makes
   absolutely no difference to the normal practitioner (if you are using
   logistic regression, that is... for probit regression and the others
   this makes a tiny difference).
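
A short derivation may help show why the choice does not matter for logistic regression. This is a sketch for the binary case with the logit link; the symbols X (design matrix), y (0/1 targets), p (fitted probabilities) and W are notation assumed here, not taken from the SAS documentation.

```latex
\[
\ell(\beta) = \sum_i \left[ y_i\, x_i^\top \beta - \log\!\left(1 + e^{x_i^\top \beta}\right) \right],
\qquad
p_i = \frac{1}{1 + e^{-x_i^\top \beta}}
\]
\[
\nabla \ell(\beta) = X^\top (y - p),
\qquad
\nabla^2 \ell(\beta) = -X^\top W X,
\quad
W = \operatorname{diag}\bigl(p_i (1 - p_i)\bigr)
\]
\[
\text{Newton-Raphson: } \beta \leftarrow \beta - \bigl[\nabla^2 \ell(\beta)\bigr]^{-1} \nabla \ell(\beta),
\qquad
\text{Fisher scoring: } \beta \leftarrow \beta + \bigl[-\mathbb{E}\,\nabla^2 \ell(\beta)\bigr]^{-1} \nabla \ell(\beta)
\]
```

The Hessian above does not contain y, so its expectation over y is the Hessian itself and the two updates are identical step for step. With a non-canonical link such as probit, the observed Hessian does depend on y, so the iterates can differ slightly even though both methods converge to the same optimum.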
      
   If you really want to make a useful utility as opposed to a SAS clone,   
   I would recommend the following functionality:   
      
   1) simple logistic regression on binary or multinomial targets
      
   2) the ability to regularize this solution by supplying a penalty
   matrix.  This matrix would add an additional term to the log-likelihood
   of the form beta' A beta, where beta is the vector of coefficients and
   A is the penalty matrix.  If A is diagonal, then you have the
   equivalent of weight decay in neural networks.  If A is more complex,
   you can encode various kinds of expected results such as temporal or
   geographic continuity.
      
   3) the ability to do step-wise variable selection.  This is really just   
   a special form of (2), but isn't expressed as easily as a matrix.   
      
   4) the ability to integrate easily with a general framework for   
   transforming the inputs and outputs.  This handles all of the issues   
   with exploring interactions and such.   
      
   5) there should be *separate* facilities for easily doing
   cross-validation for overall performance evaluation, bootstrapping or
   jack-knife to evaluate confidence bounds on parameters, and graphical
   presentation of results.  These should emphatically NOT be built into
   the logistic regression, if only so that they can be tested separately.
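
Items (2) and (5) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it handles binary targets only, assumes the penalty term beta' A beta is subtracted from the log-likelihood with A symmetric and positive semidefinite, and uses plain Newton-Raphson with no step-size control. The names fit_penalized_logistic, cross_validated_log_loss, ridge and solve are hypothetical helpers invented for this example, and only the standard library is used to keep it self-contained.

```python
import math

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_penalized_logistic(X, y, A, iters=25):
    """Maximize loglik(beta) - beta' A beta (assumed sign) by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
        # Gradient of the penalized objective: X'(y - p) - 2 A beta
        grad = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                - 2.0 * sum(A[j][m] * beta[m] for m in range(k))
                for j in range(k)]
        # Negative Hessian: X' W X + 2 A, with W = diag(p (1 - p))
        H = [[sum(row[j] * row[m] * pi * (1.0 - pi) for row, pi in zip(X, p))
              + 2.0 * A[j][m] for m in range(k)] for j in range(k)]
        step = solve(H, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def cross_validated_log_loss(fit, X, y, folds=5):
    """K-fold CV kept outside the fitter; `fit` is any callable (X, y) -> beta."""
    n = len(y)
    total = 0.0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        beta = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            p = sigmoid(sum(xj * bj for xj, bj in zip(X[i], beta)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
            total -= y[i] * math.log(p) + (1 - y[i]) * math.log(1.0 - p)
    return total / n

def ridge(lam, k=2):
    """Diagonal penalty matrix lam * I (the weight-decay case)."""
    return [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]

# Toy data, made up for illustration: an intercept column plus one feature.
xs = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]
X = [[1.0, x] for x in xs]
y = [0, 0, 1, 0, 1, 0, 1, 1]

weak = fit_penalized_logistic(X, y, ridge(0.01))
strong = fit_penalized_logistic(X, y, ridge(10.0))
cv = cross_validated_log_loss(
    lambda Xt, yt: fit_penalized_logistic(Xt, yt, ridge(0.5)), X, y, folds=4)
```

Because the cross-validation routine only sees a callable, it can be tested against any fitter, which is the point of keeping it separate. A non-diagonal A, for instance one built from squared differences between neighbouring coefficients, would be one way to express the continuity idea in (2); this example does not attempt that, nor the multinomial case.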
      
   Hope this helps.  Your mileage may vary.   
      
   Remember that this is free advice although it wasn't free for me.   
      
      
