programmingbee.net
RL Part 4.1 Dynamic Programming. Iterative Policy Evaluation.
So far in the series we’ve built an intuitive idea of what RL is and described the system using a Markov Reward Process and a Markov Decision Process. We know what the policy is, what the optima…