In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.[1]
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$, sometimes written as $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$, is a function $\kappa : \mathcal{B} \times X \to [0, 1]$ with the following properties:
- For every (fixed) $B \in \mathcal{B}$, the map $x \mapsto \kappa(B, x)$ is $\mathcal{A}$-measurable.
- For every (fixed) $x \in X$, the map $B \mapsto \kappa(B, x)$ is a probability measure on $(Y, \mathcal{B})$.
In other words it associates to each point $x \in X$ a probability measure $\kappa(\mathrm{d}y \mid x) : B \mapsto \kappa(B, x)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(B \mid x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]
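To make the definition concrete, here is a minimal Python sketch for finite spaces (the names `singleton` and `kappa`, and the particular numbers, are illustrative choices, not part of the definition). On a finite space the $\sigma$-algebra can be taken to be the power set, measurability is automatic, and the second condition reduces to $\kappa(Y \mid x) = 1$ for every $x$.

```python
# A minimal sketch of a Markov kernel on finite spaces X and Y.
# Here kappa(B, x) plays the role of the map  B x X -> [0, 1].

X = ["a", "b"]          # source space (finite, so A = power set of X)
Y = [0, 1, 2]           # target space (finite, so B = power set of Y)

# Singleton probabilities kappa({y} | x); any non-negative table whose
# rows sum to 1 over Y defines a Markov kernel.
singleton = {
    "a": {0: 0.5, 1: 0.5, 2: 0.0},
    "b": {0: 0.1, 1: 0.2, 2: 0.7},
}

def kappa(B, x):
    """kappa(B | x): probability of the measurable set B of Y, given x in X."""
    return sum(singleton[x][y] for y in B)

# Second property: for every fixed x, B -> kappa(B, x) is a probability measure on Y.
for x in X:
    assert abs(kappa(Y, x) - 1.0) < 1e-12

print(kappa({0, 1}, "a"))  # 1.0
print(kappa({2}, "b"))     # 0.7
```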
Simple random walk on the integers
Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m \in Y = \mathbb{Z}$, for each $n \in X = \mathbb{Z}$:
- $\kappa(B \mid n) = \sum_{m \in B} \kappa(\{m\} \mid n), \quad \forall n \in \mathbb{Z},\ \forall B \in \mathcal{B}.$
Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1 - p$ is defined by
- $\kappa(\{m\} \mid n) = p\,\delta_{m, n+1} + (1 - p)\,\delta_{m, n-1}, \quad \forall m, n \in \mathbb{Z},$
where $\delta$ is the Kronecker delta. The transition probabilities $P(m \mid n) = \kappa(\{m\} \mid n)$ for the random walk are equivalent to the Markov kernel.
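A minimal sketch of this kernel in Python (the step probability $p = 0.3$ and the helper names are illustrative): it evaluates $\kappa(\{m\} \mid n) = p\,\delta_{m,n+1} + (1-p)\,\delta_{m,n-1}$ and checks that, for every $n$, all the mass sits on $\{n-1, n+1\}$.

```python
p = 0.3  # probability of stepping to the right (illustrative value)

def kronecker(i, j):
    return 1.0 if i == j else 0.0

def kappa_singleton(m, n):
    """kappa({m} | n) for the simple random walk on the integers."""
    return p * kronecker(m, n + 1) + (1 - p) * kronecker(m, n - 1)

def kappa(B, n):
    """kappa(B | n) = sum over m in B of kappa({m} | n)."""
    return sum(kappa_singleton(m, n) for m in B)

# For each n, the mass is concentrated on {n - 1, n + 1} and sums to 1.
for n in range(-3, 4):
    assert abs(kappa({n - 1, n + 1}, n) - 1.0) < 1e-12

print(kappa({1}, 0))      # 0.3  (step right)
print(kappa({-1, 1}, 0))  # 1.0
```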
General Markov processes with countable state space
More generally take $X$ and $Y$ both countable and $\mathcal{A} = \mathcal{P}(X)$, $\mathcal{B} = \mathcal{P}(Y)$.
Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i \in X$:
- $\kappa(B \mid i) = \sum_{j \in B} \kappa(\{j\} \mid i), \quad \forall i \in X,\ \forall B \in \mathcal{B}.$
We define a Markov process by defining a transition probability $P(j \mid i) = K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.
- $K_{ji} \geq 0, \quad \forall (j, i) \in Y \times X,$
- $\sum_{j \in Y} K_{ji} = 1, \quad \forall i \in X.$
We then define
- $\kappa(\{j\} \mid i) = K_{ji} = P(j \mid i), \quad \forall i \in X,\ \forall j \in Y.$
Again the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
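For finite state spaces the equivalence is easy to exhibit in code. The following sketch (with illustrative numbers) stores the kernel as a matrix `K[j, i]` $= \kappa(\{j\} \mid i)$; note that with the article's index convention the matrix is column-stochastic, i.e. the sum over $j$ for each fixed $i$ equals 1.

```python
import numpy as np

# Finite state spaces X = {0, 1, 2} and Y = {0, 1, 2}.
# Following the article's convention K[j, i] = kappa({j} | i), so each
# *column* of K sums to 1 (a column-stochastic matrix).
K = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.5, 0.3],
    [0.0, 0.3, 0.7],
])

assert np.all(K >= 0)
assert np.allclose(K.sum(axis=0), 1.0)   # sum over j for every fixed i

def kappa(B, i):
    """kappa(B | i) = sum over j in B of K[j, i]."""
    return sum(K[j, i] for j in B)

print(kappa({0, 1}, 1))  # 0.7
```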
Markov kernel defined by a kernel function and a measure
Let $\nu$ be a measure on $(Y, \mathcal{B})$, and let $k : Y \times X \to [0, \infty]$ be a measurable function with respect to the product $\sigma$-algebra $\mathcal{B} \otimes \mathcal{A}$ such that
- $\int_Y k(y, x)\, \nu(\mathrm{d}y) = 1, \quad \forall x \in X,$
then $\kappa(\mathrm{d}y \mid x) = k(y, x)\, \nu(\mathrm{d}y)$, i.e. the mapping
- $\begin{cases} \kappa : \mathcal{B} \times X \to [0, 1] \\ \kappa(B \mid x) = \int_B k(y, x)\, \nu(\mathrm{d}y) \end{cases}$
defines a Markov kernel.[3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X = Y = \mathbb{R}$ with $\nu(\mathrm{d}x) = \mathrm{d}x$ the standard Lebesgue measure and
- $k(y, x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(y - x)^2}{2}}.$
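A quick numerical check of the Gaussian example (using `scipy.integrate.quad` purely for convenience; the point $x = 1.5$ and the interval $B = [0, 2]$ are arbitrary choices): the kernel density integrates to 1 over all of $\mathbb{R}$, so $\kappa(B \mid x) = \int_B k(y, x)\, \mathrm{d}y$ is indeed a probability measure for each $x$.

```python
import numpy as np
from scipy.integrate import quad  # used only for numerical integration

def k(y, x):
    """Gaussian (heat) kernel density k(y, x) with respect to Lebesgue measure."""
    return np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi)

x = 1.5  # an arbitrary source point

# Normalisation: the integral of k(., x) over all of Y = R is 1, so
# kappa(B | x) = integral of k(y, x) over B defines a Markov kernel.
total, _ = quad(lambda y: k(y, x), -np.inf, np.inf)
assert abs(total - 1.0) < 1e-8

# kappa(B | x) for the interval B = [0, 2]:
mass, _ = quad(lambda y: k(y, x), 0.0, 2.0)
print(mass)  # approx. 0.625: a Gaussian centred at 1.5 lands in [0, 2] with this probability
```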
Composition of Markov kernels
Given measurable spaces $(X, \mathcal{A})$ and $(Y, \mathcal{B})$, we consider a Markov kernel $\kappa : \mathcal{B} \times X \to [0, 1]$
as a morphism $\kappa : X \to Y$. Intuitively, rather than assigning to each $x \in X$ a sharply defined point $y \in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z, \mathcal{C})$ and probability kernels $\kappa : X \to Y$ and $\lambda : Y \to Z$, we can define a composition $\lambda \circ \kappa : X \to Z$ by the Chapman–Kolmogorov equation
- $(\lambda \circ \kappa)(\mathrm{d}z \mid x) = \int_Y \lambda(\mathrm{d}z \mid y)\, \kappa(\mathrm{d}y \mid x).$
The composition is associative by the Monotone Convergence Theorem, and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_{\mathrm{id}}(\mathrm{d}x' \mid x) = \delta_x(\mathrm{d}x')$) is the unit for this composition.
This composition defines the structure of a category on the measurable spaces with Markov kernels as morphisms, first defined by Lawvere,[4] the category of Markov kernels.
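On finite spaces the Chapman–Kolmogorov composition is just matrix multiplication of the corresponding (here column-)stochastic matrices, and the identity matrix plays the role of the delta-measure kernel. A small sketch with illustrative matrices:

```python
import numpy as np

# Finite-space illustration of the Chapman-Kolmogorov composition.
# With the column-stochastic convention K[y, x] = kappa({y} | x) and
# L[z, y] = lam({z} | y), the composite kernel is the matrix product L @ K:
#   (lam o kappa)({z} | x) = sum_y L[z, y] * K[y, x].
K = np.array([[0.8, 0.3],
              [0.2, 0.7]])        # kernel X -> Y
L = np.array([[0.5, 0.1],
              [0.4, 0.6],
              [0.1, 0.3]])        # kernel Y -> Z

composite = L @ K                 # kernel X -> Z

# The composite is again a Markov kernel: columns still sum to 1.
assert np.allclose(composite.sum(axis=0), 1.0)

# The identity kernel (delta measure) is the unit for the composition.
assert np.allclose(K @ np.eye(2), K) and np.allclose(np.eye(3) @ L, L)
print(composite)
```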
Probability space defined by probability measure and Markov kernel
A composition of a probability space $(X, \mathcal{A}, P_X)$ and a probability kernel $\kappa : (X, \mathcal{A}) \to (Y, \mathcal{B})$
defines a probability space $(Y, \mathcal{B}, P_Y = \kappa \circ P_X)$, where the probability measure is given by
- $P_Y(B) = \int_X \int_B \kappa(\mathrm{d}y \mid x)\, P_X(\mathrm{d}x) = \int_X \kappa(B \mid x)\, P_X(\mathrm{d}x) = \mathbb{E}_{P_X}\,\kappa(B \mid \cdot).$
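On finite spaces this pushforward is a matrix–vector product. A minimal sketch with illustrative numbers (same column-stochastic convention as in the earlier sketches):

```python
import numpy as np

# Pushing a probability measure P_X forward through a kernel kappa gives
# P_Y(B) = integral of kappa(B | x) over P_X; on finite spaces this is
# K @ p_x with the column-stochastic convention K[y, x] = kappa({y} | x).
K = np.array([[0.8, 0.3],
              [0.2, 0.7]])
p_x = np.array([0.25, 0.75])      # a probability measure on X = {0, 1}

p_y = K @ p_x                     # the induced probability measure on Y
assert abs(p_y.sum() - 1.0) < 1e-12
print(p_y)                        # [0.425, 0.575]
```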
Semidirect product
Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$. Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
- $Q(A \times B) = \int_A \kappa(B \mid x)\, P(\mathrm{d}x), \quad \forall A \in \mathcal{A},\ \forall B \in \mathcal{B}.$
The measure $Q$ is called the semidirect product of $P$ and $\kappa$.
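On finite spaces the semidirect product is simply the joint table $Q[x, y] = P(\{x\})\, \kappa(\{y\} \mid x)$, whose marginals recover $P$ and the pushforward measure from the previous sketch. A minimal sketch with illustrative numbers:

```python
import numpy as np

# Semidirect product measure Q on X x Y, specified on rectangles by
# Q(A x B) = integral over A of kappa(B | x) P(dx). On finite spaces Q is
# the table Q[x, y] = P({x}) * kappa({y} | x); its marginals recover P and
# the pushforward measure K @ p.
K = np.array([[0.8, 0.3],
              [0.2, 0.7]])          # kappa({y} | x), column-stochastic
p = np.array([0.25, 0.75])          # probability measure P on X

Q = p[:, None] * K.T                # Q[x, y] = P({x}) * K[y, x]

assert abs(Q.sum() - 1.0) < 1e-12
assert np.allclose(Q.sum(axis=1), p)        # X-marginal is P
assert np.allclose(Q.sum(axis=0), K @ p)    # Y-marginal is the pushforward
print(Q)
```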
Regular conditional distribution
Let $(S, \mathcal{S})$ be a Borel space, $X$ an $(S, \mathcal{S})$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{S})$ such that $\kappa(\cdot, B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in \mathcal{S}$, i.e.
- $P(X \in B \mid \mathcal{G}) = \mathbb{E}\left[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}\right] = \kappa(\cdot, B), \quad P\text{-a.s.},\ \forall B \in \mathcal{S}.$
It is called the regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
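Although the theorem is stated for Borel spaces, the structure is already visible on a finite probability space. In the following toy sketch (all names and numbers are illustrative), $\Omega$ has four equally likely outcomes, $\mathcal{G}$ is generated by a two-cell partition, and $\kappa(\omega, B) = P(X \in B \mid \text{cell containing } \omega)$ is a version of $\mathbb{E}[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$:

```python
# A toy regular conditional distribution on a finite probability space.
# Omega has 4 equally likely outcomes, X maps them into S = {0, 1, 2},
# and G is the sigma-algebra generated by the partition {{0, 1}, {2, 3}}.
P = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
X = {0: 0, 1: 1, 2: 1, 3: 2}
cells = [{0, 1}, {2, 3}]                      # partition generating G

def cell_of(omega):
    return next(c for c in cells if omega in c)

def kappa(omega, B):
    """kappa(omega, B) = P(X in B | G)(omega); constant on each cell of G."""
    c = cell_of(omega)
    pc = sum(P[w] for w in c)
    return sum(P[w] for w in c if X[w] in B) / pc

print(kappa(0, {1}))     # 0.5  -> P(X = 1 | {0, 1})
print(kappa(3, {1, 2}))  # 1.0  -> P(X in {1, 2} | {2, 3})
```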
Generalizations
Transition kernels generalize Markov kernels in the sense that, for all $x \in X$, the map
- $B \mapsto \kappa(B \mid x)$
can be any type of (non-negative) measure, not necessarily a probability measure.
References
- Çınlar, Erhan (2011). Probability and Stochastics. New York: Springer. pp. 37–38. ISBN 978-0-387-87858-4.