See this link to another thread: Unsupervised Learning : Week3 : Learning the state-value function - #3 by Luis.BR