|    comp.ai    |    Awaiting the gospel from Sarah Connor    |    1,954 messages    |
|    Message 880 of 1,954    |
|    Belera to All    |
|    implementation of REINFORCE    |
|    02 Jan 06 00:05:26    |
XPost: comp.ai.neural-nets
From: belera@gmail.com

I've been trying to implement Ronald J. Williams's REINFORCE algorithm,
based on these two papers:

1- Simple statistical gradient-following algorithms for connectionist
   reinforcement learning:
   ftp://ftp.ccs.neu.edu/pub/people/rjw/conn-reinf-ml-92.ps
2- Function optimization using connectionist reinforcement learning
   algorithms:
   ftp://ftp.ccs.neu.edu/pub/people/rjw/func-opt-cs-91.ps

I'm using a simple test function of the form -abs(p1-39)-abs(p2-73),
where p1 and p2 are the parameters.

With only one parameter it converges in about 800 turns, but with more
than one it doesn't converge at all. Worst of all, I don't know what's
wrong with my code (obviously)!

Can anyone give me a hand, please? Or is there an implementation
available that I can learn from?

Here's my code in Matlab:

function rltest(p1,p2);

if nargin<2
    p1=19;
    p2=42;
end
% x is the input vector
x=[p1 p2];
% m is the no. of inputs and n the no. of output cells. I've assumed
% the output to be between 0 and 127
m=size(x,2);
n=6*m;
% Learning rate:
alpha=.00001;
% ybar and rbar are traces of output and reward, respectively
ybar=zeros(n,1);
rbar=0;
% Weight decay rate
delta=0.01;
gama=0.9;
turn=0;

% Matrix of weights, and weight vector for the bias input:
W=50*rand(n,m);
W0=50*rand(n,1);

serie=[2.^[0:(n/m-1)]];

r=-10000;
while r<0
    for i=1:n
        % s is the weighted sum of the inputs to each output unit
        s(i)=sum(W(i,:).*x)+W0(i);
        % Output units are Bernoulli logistic
        f(i)=1/(1+exp(-s(i)));
        y(i)=double(rand
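
For comparison, here is a minimal Python sketch of REINFORCE with Bernoulli-logistic units and a reinforcement baseline, applied to the same test function. It is not the code from the post and it is simplified: each unit has only a bias weight (no input vector), and the names BITS, TARGETS, decode, reinforce_step and run are made up for illustration. It also assumes 7 bits per parameter. If `serie` in the listing above is used to turn each group of n/m = 6 output bits into an integer, the largest reachable value is 63, so p2 = 73 could never be hit and `while r<0` would not exit. That may be part of why the two-parameter case fails, but it is a guess, since the listing is cut off. The update is w_i += alpha * (r - baseline) * (y_i - p_i), with the baseline kept as an exponential trace of past rewards, similar in spirit to rbar and gama above.

```python
import math
import random

BITS = 7            # assumption: 7 bits per parameter, so 0..127 is reachable
TARGETS = (39, 73)  # optimum of the post's test function


def reward(params):
    """The post's objective: 0 at the optimum, negative elsewhere."""
    return -sum(abs(p - t) for p, t in zip(params, TARGETS))


def decode(bits, bits_per_param=BITS):
    """Read each group of bits as an unsigned integer, least significant bit first."""
    return [
        sum(b << k for k, b in enumerate(bits[i:i + bits_per_param]))
        for i in range(0, len(bits), bits_per_param)
    ]


def sigmoid(s):
    """Logistic function, clamped so math.exp cannot overflow."""
    s = max(-60.0, min(60.0, s))
    return 1.0 / (1.0 + math.exp(-s))


def reinforce_step(w, baseline, alpha, gamma, rng):
    """Sample every unit once, update w in place, return (reward, params, new baseline)."""
    probs = [sigmoid(wi) for wi in w]
    bits = [1 if rng.random() < p else 0 for p in probs]
    params = decode(bits)
    r = reward(params)
    for i, (y, p) in enumerate(zip(bits, probs)):
        # For a Bernoulli-logistic unit, (y - p) is d ln Pr(y) / d w.
        w[i] += alpha * (r - baseline) * (y - p)
    return r, params, gamma * baseline + (1.0 - gamma) * r


def run(steps=5000, alpha=0.01, gamma=0.9, seed=0):
    """Run the sketch and return the best (reward, params) pair that was sampled."""
    rng = random.Random(seed)
    w = [0.0] * (BITS * len(TARGETS))  # zero weights: every bit starts at p = 0.5
    first_bits = [rng.randint(0, 1) for _ in w]
    best_params = decode(first_bits)
    best_r = baseline = reward(best_params)  # start the baseline at a sampled reward
    for _ in range(steps):
        r, params, baseline = reinforce_step(w, baseline, alpha, gamma, rng)
        if r > best_r:
            best_r, best_params = r, params
    return best_r, best_params
```

Calling run() returns the best reward seen and the parameters that produced it. The step count and learning rate are untuned placeholders. Starting from zero weights keeps every unit at p = 0.5, whereas W=50*rand(...) with inputs of 19 and 42 gives large positive sums, so the logistic output sits near 1 and the (y - p) term is close to zero from the first turn.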