
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.ai      Awaiting the gospel from Sarah Connor      1,954 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 880 of 1,954   
   Belera to All   
   implementation of REINFORCE   
   02 Jan 06 00:05:26   
   
   XPost: comp.ai.neural-nets   
   From: belera@gmail.com   
      
   I've been trying to implement Ronald J. Williams's REINFORCE algorithm,   
   based on these two papers:   
   1- Simple statistical gradient-following algorithms for connectionist   
   reinforcement learning:   
   ftp://ftp.ccs.neu.edu/pub/people/rjw/conn-reinf-ml-92.ps   
   2- Function optimization using connectionist reinforcement learning   
   algorithms: ftp://ftp.ccs.neu.edu/pub/people/rjw/func-opt-cs-91.ps   
   I'm using a simple function of the form -abs(p1-39)-abs(p2-73), where   
   p1 and p2 are the parameters.   
   With only one parameter it converges in about 800 turns, but with two   
   or more it doesn't converge at all. Worst of all, I don't know what's   
   wrong with my code (obviously)!   
      
   Can anyone give me a hand, please?  Or is there any implementation of   
   it available so that I can learn what to do?   
   Here's my code in Matlab:   
      
   function rltest(p1,p2);   
      
   if nargin<2   
       p1=19;   
       p2=42;   
   end   
   % x is Input vector   
   x=[p1 p2];   
    % m: no. of inputs; n: no. of output cells. I've assumed output to be   
    % between 0 and 127, so each parameter needs 7 bits   
    m=size(x,2);   
    n=7*m;   
   % Learning Rate:   
   alpha=.00001;   
    % ybar and rbar: traces of the output and reward, respectively   
   ybar=zeros(n,1);   
   rbar=0;   
    % Weight decay rate:   
    delta=0.01;   
    % Trace decay rate:   
    gama=0.9;   
   turn=0;   
      
   % Matrix of Weights, and Weight vector for bias input:   
   W=50*rand(n,m);   
   W0=50*rand(n,1);   
      
    % Bit place values, for decoding each parameter from its output bits:   
    serie=[2.^[0:(n/m-1)]];   
      
   r=-10000;   
   while r<0   
       for i=1:n   
   %       s is weighted summation of inputs to each output unit   
           s(i)=sum(W(i,:).*x)+W0(i);   
   %       Output units are Bernoulli Logistic   
           f(i)=1/(1+exp(-s(i)));   
            y(i)=double(rand<f(i));   
       
    [ The rest of this message was truncated in transit. ]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
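
Since the message is cut off before the REINFORCE update itself, here is a minimal, self-contained sketch of the same experiment in Python rather than Matlab. It assumes the setup described in the post: each parameter is encoded by a group of Bernoulli-logistic bits (7 per parameter, for the stated 0..127 range, so the target 73 is reachable), the reward is -abs(p1-39)-abs(p2-73), and the weights follow Williams's rule dw_i = alpha*(r - rbar)*(y_i - p_i), with rbar an exponentially decaying reward trace used as a baseline. The constants (ALPHA, GAMMA, the turn limit) and all helper names are illustrative guesses, not recovered from the truncated code.

```python
import math
import random

random.seed(0)

M = 2         # number of parameters (p1, p2)
BITS = 7      # bits per parameter: values 0..127, enough to reach 73
N = M * BITS  # total number of Bernoulli-logistic units
ALPHA = 0.01  # learning rate (illustrative choice)
GAMMA = 0.9   # decay rate of the reward baseline trace

def sigmoid(s):
    # Numerically safe logistic, avoids overflow for large negative s.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def reward(params):
    # The objective from the post: maximized (r = 0) at p1=39, p2=73.
    p1, p2 = params
    return -abs(p1 - 39) - abs(p2 - 73)

def decode(bits):
    # Each consecutive group of BITS bits encodes one parameter, LSB first.
    return [sum(b << k for k, b in enumerate(bits[i*BITS:(i+1)*BITS]))
            for i in range(M)]

def run(max_turns=20000):
    w = [0.0] * N  # one adjustable weight per unit: P(y_i = 1) = sigmoid(w_i)
    rbar = 0.0     # baseline: exponentially decaying trace of past rewards
    best_r, best_p = -float('inf'), None
    for turn in range(1, max_turns + 1):
        probs = [sigmoid(wi) for wi in w]
        bits = [1 if random.random() < p else 0 for p in probs]
        r = reward(decode(bits))
        if r > best_r:
            best_r, best_p = r, decode(bits)
        # REINFORCE update: dw_i = alpha * (r - rbar) * (y_i - p_i),
        # where (y_i - p_i) is the characteristic eligibility of a
        # Bernoulli-logistic unit.
        for i in range(N):
            w[i] += ALPHA * (r - rbar) * (bits[i] - probs[i])
        rbar = GAMMA * rbar + (1.0 - GAMMA) * r
        if best_r == 0:
            break
    return best_r, best_p

best_r, best_p = run()
print(best_r, best_p)
```

With one parameter this kind of setup typically reaches the optimum quickly; with two or more, the baseline term (r - rbar) and a small learning rate matter a lot. A large alpha lets the unit probabilities saturate on early random samples, after which the search stops exploring, which is one common reason a multi-parameter REINFORCE run stalls.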



(c) 1994,  bbs@darkrealms.ca