policy
Concept in [[decision-theory]], specifically [[markov-decision-process]]
For a history $h_t = (s_0, a_0, s_1, a_1, \dots, s_t)$ that accumulates all past states and actions, the policy is written as $\pi_t(h_t)$. For a stationary process, we can simply write this as $\pi(s)$, omitting the time-dependence.
An optimal policy $\pi^*$ is one that maximizes the expected [[utility]] (return), with discount factor $\gamma$: $\pi^* = \arg\max_{\pi} \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, \pi\right]$
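A minimal sketch of how an optimal policy can be obtained in practice: value iteration on a tiny made-up MDP (the transition matrix `P`, reward matrix `R`, and the two-state/two-action sizes are illustrative assumptions, not from this note), followed by extracting the greedy policy.

```python
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

# P[s, a, s'] = transition probability, R[s, a] = expected reward (made-up numbers)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Greedy (deterministic) policy: for each state, pick the action with highest Q-value
pi_star = Q.argmax(axis=1)
print("V* =", V, " pi* =", pi_star)
```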
Deterministic policies map each state to a single action, $a = \pi(s)$.
Stochastic policies assign each state a probability distribution over actions, $a \sim \pi(\cdot \mid s)$.
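A small sketch contrasting the two forms (the state names, actions, and probabilities below are illustrative assumptions): a deterministic policy is just a lookup table from states to actions, while a stochastic policy stores a distribution per state and samples from it.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = ["left", "right"]

# Deterministic: a plain lookup table, a = pi(s)
pi_det = {"s0": "left", "s1": "right"}

# Stochastic: a distribution over actions per state, a ~ pi(. | s)
pi_stoch = {"s0": [0.7, 0.3], "s1": [0.2, 0.8]}

def act(state, stochastic=False):
    if stochastic:
        return rng.choice(actions, p=pi_stoch[state])
    return pi_det[state]

print(act("s0"), act("s0", stochastic=True))
```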
#needs-expanding
Backlinks
markov-decision-process
A [[policy]] maps states to actions. In an MDP, we assume the next state depends only on the current state, so we can write $\pi(s)$; this is known as a *stationary policy*.