Exercises
- Calculate the n-step transition probabilities for the robot using the Markov chain model we introduced, with the state initialized at a corner of the grid. You will notice that it takes a bit more time for the system to reach the steady state.
- Modify the Markov chain to include an absorbing state for the robot crashing into the wall. What does your n-step transition matrix P^n look like for a large n?
- Using the state values in Figure 4.7, calculate the value of a corner state from the estimated values of its neighboring states.
- Iteratively estimate the state values in the grid world MRP using matrix forms and operations instead of a for loop over states.
- Calculate the action value q_π(s, up), where the policy π corresponds to taking no action in any state, using the values in Figure 4.7. Based on how q_π(s, up) compares to v_π(s), would you consider changing your policy to take the action 'up' instead of no action in state s?
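For the first exercise, the n-step transition probabilities can be computed by raising the transition matrix to the n-th power. A minimal sketch, using a hypothetical 3-state chain in place of the robot's actual transition matrix (which is not reproduced here):

```python
import numpy as np

# Hypothetical transition matrix (each row sums to 1);
# substitute the robot's actual transition matrix here.
P = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])

def n_step_distribution(P, initial_state, n):
    """Distribution over states after n steps from the given start state."""
    dist = np.zeros(P.shape[0])
    dist[initial_state] = 1.0  # probability 1 on the initial state
    return dist @ np.linalg.matrix_power(P, n)

# As n grows, the distribution approaches the steady state
# regardless of where the chain started.
for n in [1, 10, 100]:
    print(n, n_step_distribution(P, initial_state=0, n=n))
```

Comparing the output for different initial states shows how many steps the chain needs before the starting point no longer matters.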
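For the absorbing-state exercise, the qualitative behavior can be seen with a small sketch: an absorbing state is a row of the transition matrix that maps to itself with probability 1, and for large n all probability mass drains into it. The matrix below is a placeholder, not the book's robot model:

```python
import numpy as np

# Hypothetical chain with an absorbing "crashed" state (index 3):
# from each ordinary state there is a small chance of crashing,
# and once crashed, the robot stays crashed forever.
P = np.array([
    [0.7, 0.2, 0.0, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.0, 0.2, 0.7, 0.1],
    [0.0, 0.0, 0.0, 1.0],  # absorbing row: P[crash -> crash] = 1
])

Pn = np.linalg.matrix_power(P, 1000)
print(Pn.round(3))
# For large n, every row of P^n concentrates on the absorbing state:
# the robot eventually crashes with probability 1.
```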
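For the matrix-form iteration exercise, the Bellman update for an MRP can be written as v ← R + γPv, which replaces the per-state for loop with a single matrix-vector product. A sketch on a hypothetical 3-state MRP (substitute the grid world's actual P and R):

```python
import numpy as np

# Placeholder transition matrix and expected immediate rewards;
# not the grid world values from the text.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Iterate v <- R + gamma * P @ v until the update stops changing v.
v = np.zeros(3)
for _ in range(1000):
    v_new = R + gamma * P @ v
    converged = np.max(np.abs(v_new - v)) < 1e-10
    v = v_new
    if converged:
        break

# Cross-check against the closed-form solution v = (I - gamma*P)^-1 R.
v_exact = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v, v_exact)
```

Because the update is a γ-contraction, the iteration is guaranteed to converge to the same fixed point as the closed-form solve.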
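For the last exercise, the action value is a one-step lookahead, q(s, up) = Σ_s' p(s'|s, up) · (r + γ·v(s')), and the policy-improvement question is simply whether this exceeds v(s). The numbers below are placeholders, not the actual Figure 4.7 values:

```python
gamma = 0.9
v_s = 1.5  # placeholder for v(s) of the state in question

# Hypothetical model for taking 'up' in state s:
# (probability, reward, value of successor state) triples.
outcomes = [
    (0.8, 0.0, 2.0),  # moves up as intended
    (0.1, 0.0, 1.0),  # slips to one side
    (0.1, 0.0, 1.0),  # slips to the other side
]

# q(s, up) = sum over successors of p * (r + gamma * v(s'))
q_s_up = sum(p * (r + gamma * v_next) for p, r, v_next in outcomes)
print(q_s_up)

# If q(s, up) > v(s), switching the policy to 'up' in s is an improvement.
print(q_s_up > v_s)
```

The same one-step backup, applied with the true Figure 4.7 values, also answers the corner-state exercise above.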