CS138 | Reinforcement Learning

Definition - the agent learns from the rewards and punishments that its actions produce

MDP (Markov Decision Process)

state
transition function
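
A minimal sketch of how the state and transition function pieces of an MDP might be written down. The two states, the action names, the probabilities, and the rewards are all made up for illustration; they are not from the course material.

```python
import random

# Hypothetical two-state MDP (names and numbers are assumptions).
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 0.0)],
        "go": [(1.0, "s0", 0.0)],
    },
}


def step(state, action, rng=random):
    """Sample (next_state, reward) from the transition function."""
    outcomes = TRANSITIONS[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward
```

The Markov part is that `step` only looks at the current state and action, never at the history that led there.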

Q-learning - keeps a Q-table with one Q-value per (state, action) pair
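
A rough sketch of the standard tabular Q-learning update, to make the Q-table idea concrete. The learning rate, the discount factor, and the helper names `make_q_table` and `q_update` are assumptions chosen for the example, not values from the course.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def make_q_table():
    """Q-table mapping (state, action) -> value; unseen pairs default to 0.0."""
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, actions):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    # Move the old estimate a fraction ALPHA toward the target.
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]
```

Usage: starting from an empty table, `q_update(q, "s0", "go", 1.0, "s1", ["stay", "go"])` moves Q("s0", "go") from 0.0 halfway toward the target of 1.0.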

The heuristic would be computed from parameters such as the Q-value, among others.

Binary rewards - the agent only gets a yes/no reward signal, without actually tracking the pancake, like being blind