Questions
Let's evaluate our newly acquired knowledge by answering these questions:
- How does RL differ from other ML paradigms?
- What is the environment in the RL setting?
- What is the difference between a deterministic and a stochastic policy?
- What is an episode?
- Why do we need a discount factor?
- How does the value function differ from the Q function?
- What is the difference between deterministic and stochastic environments?