Thus far, our discussion of RL has focused on simpler techniques for building agents: bandits and Q-learning. Q-learning is a popular algorithm and, as we learned, deep Q-networks (DQNs) give us a solid foundation for tackling harder problems, such as balancing a pole on a moving cart. The following table summarizes the various RL algorithms, the conditions they can work under, and how they function (a short code sketch contrasting two of them follows the table):
| Algorithm | Model | Policy | Action Space | Observation Space | Operator |
| --- | --- | --- | --- | --- | --- |
| Q-learning | Model-free | Off-policy | Discrete | Discrete | Q-value |
| SARSA – State Action Reward State Action | Model-free | On-policy | Discrete | Discrete | Q-value |
| DQN – Deep Q-Network | Model-free | Off-policy | Discrete | Continuous | Q-value |
| DDPG – Deep Deterministic Policy Gradient | Model-free | Off-policy | Continuous | Continuous | Q-value |
| TRPO – Trust Region Policy Optimization | Model-free | On-policy | Continuous or Discrete | Continuous | Advantage |
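Because Q-learning and SARSA differ only in which action they bootstrap from, the on-policy/off-policy column of the table is easy to show in code. Here is a minimal sketch of both tabular updates; the toy chain environment, its size, and all hyperparameters are illustrative assumptions rather than part of any particular library:

```python
import numpy as np

# Toy 5-state chain (an illustrative assumption, not a standard benchmark):
# action 0 moves left, action 1 moves right; reaching the last state pays 1.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def epsilon_greedy(Q, state, epsilon, rng):
    # Behavior policy for both agents: random with probability epsilon, else greedy.
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def train(on_policy, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            if on_policy:
                # SARSA: bootstrap from the action the agent will actually take.
                target = reward + gamma * Q[next_state, next_action] * (not done)
            else:
                # Q-learning: bootstrap from the greedy action, whatever is taken.
                target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state, action = next_state, next_action
    return Q

print("Q-learning (off-policy):\n", train(on_policy=False))
print("SARSA (on-policy):\n", train(on_policy=True))
```

Both agents explore with the same epsilon-greedy behavior and both learn a Q-value table, which is why they share the Q-value operator in the table; they differ only in the target they learn toward, which is exactly the on-policy versus off-policy distinction.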