programmingbee.net
RL part 3. Markov Decision Process, policy, Bellman Optimality Equation.
Recall that in part 2 we introduced the notion of a Markov Reward Process, which is really just a building block, since our agent was not able to take actions. It was simply transitioning from one state to …