# Probability axioms


The Kolmogorov axioms are the foundations of probability theory introduced by Andrey Kolmogorov in 1933.[1] These axioms remain central to the field and underpin applications of probability in mathematics, the physical sciences, and real-world problems.[2] An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.[3]

## Axioms

The assumptions underlying the axioms can be summarised as follows: Let (Ω, F, P) be a measure space with ${\displaystyle P(E)}$ being the probability of some event E, and ${\displaystyle P(\Omega )=1}$. Then (Ω, F, P) is a probability space, with sample space Ω, event space F and probability measure P.[1]

### First axiom

The probability of an event is a non-negative real number:

${\displaystyle P(E)\in \mathbb {R} ,P(E)\geq 0\qquad \forall E\in F}$

where ${\displaystyle F}$ is the event space. It follows that ${\displaystyle P(E)}$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

### Second axiom

This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1:

${\displaystyle P(\Omega )=1.}$

### Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events) ${\displaystyle E_{1},E_{2},\ldots }$ satisfies
${\displaystyle P\left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }P(E_{i}).}$

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.[4] Quasiprobability distributions in general relax the third axiom.
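On a finite sample space the three axioms can be checked exhaustively. The following is a minimal sketch, assuming the event space is the full power set and that P is built from point masses; the helper names `power_set`, `make_measure` and `satisfies_axioms`, and the loaded three-outcome example, are illustrative and not part of the source. Because Ω is finite, any countable family of disjoint events has only finitely many non-empty members, so σ-additivity reduces to finite additivity, which in turn follows from the pairwise check used here.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """All subsets of a finite sample space, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def make_measure(weights):
    """Build P from point masses: P(E) is the sum of the weights in E."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P


def satisfies_axioms(omega, P):
    """Check the three axioms on the power set of a finite omega."""
    events = power_set(omega)
    non_negative = all(P(E) >= 0 for E in events)          # first axiom
    unit_measure = P(frozenset(omega)) == 1                # second axiom
    additive = all(                                        # third axiom
        P(A | B) == P(A) + P(B)
        for A in events for B in events if not (A & B)
    )
    return non_negative and unit_measure and additive


# Hypothetical example: a loaded experiment with three outcomes.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
P = make_measure(weights)
print(satisfies_axioms(weights.keys(), P))  # True
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by floating-point rounding.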

## Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs[5][6][7] of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four of the immediate corollaries and their proofs are shown below:

### Monotonicity

${\displaystyle \quad {\text{if}}\quad A\subseteq B\quad {\text{then}}\quad P(A)\leq P(B).}$

If A is a subset of, or equal to B, then the probability of A is less than, or equal to the probability of B.

#### Proof of monotonicity[5]

In order to verify the monotonicity property, we set ${\displaystyle E_{1}=A}$ and ${\displaystyle E_{2}=B\setminus A}$, where ${\displaystyle A\subseteq B}$ and ${\displaystyle E_{i}=\varnothing }$ for ${\displaystyle i\geq 3}$. It is easy to see that the sets ${\displaystyle E_{i}}$ are pairwise disjoint and ${\displaystyle E_{1}\cup E_{2}\cup \cdots =B}$. Hence, we obtain from the third axiom that

${\displaystyle P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})=P(B).}$

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to ${\displaystyle P(B)}$ which is finite, we obtain both ${\displaystyle P(A)\leq P(B)}$ and ${\displaystyle P(\varnothing )=0}$.
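The decomposition used in this proof can be checked on a concrete case. The sketch below assumes a fair six-sided die with ${\displaystyle P(E)=|E|/6}$; the die and the particular events A and B are illustrative choices, not part of the source.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, with P(E) = |E| / 6.
omega = frozenset(range(1, 7))


def P(event):
    return Fraction(len(event & omega), len(omega))


A = frozenset({2})          # "roll a 2"
B = frozenset({2, 4, 6})    # "roll an even number"
assert A <= B               # A is a subset of B

# B is the disjoint union of A and B \ A, so the third axiom gives
# P(A) + P(B \ A) = P(B); the remaining terms P(E_i), i >= 3, are all 0.
print(P(A) + P(B - A) == P(B))  # True
print(P(A) <= P(B))             # True
```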

### The probability of the empty set

${\displaystyle P(\varnothing )=0.}$

In some cases, ${\displaystyle \varnothing }$ is not the only event with probability 0.

#### Proof of probability of the empty set

As shown in the previous proof, ${\displaystyle P(\varnothing )=0}$. The argument is by contradiction: if ${\displaystyle P(\varnothing )=a}$, then the series in the left-hand side ${\displaystyle P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})}$ satisfies

${\displaystyle \sum _{i=3}^{\infty }P(E_{i})=\sum _{i=3}^{\infty }P(\varnothing )=\sum _{i=3}^{\infty }a={\begin{cases}0&{\text{if }}a=0,\\\infty &{\text{if }}a>0.\end{cases}}}$

If ${\displaystyle a>0}$, we obtain a contradiction, because the sum cannot exceed ${\displaystyle P(B)}$, which is finite. Thus ${\displaystyle a=0}$. We have shown, as a byproduct of the proof of monotonicity, that ${\displaystyle P(\varnothing )=0}$.

### The complement rule

${\displaystyle P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)}$

#### Proof of the complement rule

Given that ${\displaystyle A}$ and ${\displaystyle A^{c}}$ are mutually exclusive and that ${\displaystyle A\cup A^{c}=\Omega }$:

${\displaystyle P(A\cup A^{c})=P(A)+P(A^{c})}$ ... (by axiom 3)

and, ${\displaystyle P(A\cup A^{c})=P(\Omega )=1}$ ... (by axiom 2)

${\displaystyle \Rightarrow P(A)+P(A^{c})=1}$

${\displaystyle \therefore P(A^{c})=1-P(A)}$

### The numeric bound

It immediately follows from the monotonicity property that

${\displaystyle 0\leq P(E)\leq 1\qquad \forall E\in F.}$

#### Proof of the numeric bound

Given the complement rule ${\displaystyle P(E^{c})=1-P(E)}$ and axiom 1 ${\displaystyle P(E^{c})\geq 0}$:

${\displaystyle 1-P(E)\geq 0}$

${\displaystyle \Rightarrow 1\geq P(E)}$

${\displaystyle \therefore 0\leq P(E)\leq 1}$

## Further consequences

Another important property is:

${\displaystyle P(A\cup B)=P(A)+P(B)-P(A\cap B).}$

This is called the addition law of probability, or the sum rule. That is, the probability that A or B will happen is the sum of the probabilities that A will happen and that B will happen, minus the probability that both A and B will happen. The proof of this is as follows:

Firstly,

${\displaystyle P(A\cup B)=P(A)+P(B\setminus A)}$ ... (by Axiom 3)

So,

${\displaystyle P(A\cup B)=P(A)+P(B\setminus (A\cap B))}$ (by ${\displaystyle B\setminus A=B\setminus (A\cap B)}$).

Also,

${\displaystyle P(B)=P(B\setminus (A\cap B))+P(A\cap B)}$

and eliminating ${\displaystyle P(B\setminus (A\cap B))}$ from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
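Both the addition law and its extension can be verified numerically on a small example. This is a minimal sketch, assuming a fair six-sided die; the events A, B and C and the helper `inclusion_exclusion` are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: a fair six-sided die, with P(E) = |E| / 6.
omega = frozenset(range(1, 7))


def P(event):
    return Fraction(len(event), len(omega))


A = frozenset({2, 4, 6})      # even
B = frozenset({4, 5, 6})      # greater than 3
C = frozenset({1, 2, 3, 4})   # at most 4

# Addition law for two events.
print(P(A | B) == P(A) + P(B) - P(A & B))  # True


def inclusion_exclusion(events):
    """Alternating sum over the intersections of all non-empty sub-families."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * P(frozenset.intersection(*group))
    return total


# Inclusion-exclusion for three events agrees with the probability of the union.
print(inclusion_exclusion([A, B, C]) == P(A | B | C))  # True
```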

Setting B to the complement ${\displaystyle A^{c}}$ of A in the addition law gives

${\displaystyle P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)}$

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.

## Simple example: coin toss

Consider a single coin toss, and assume that the coin will land either heads (H) or tails (T), but not both. No assumption is made as to whether the coin is fair.

We may define:

${\displaystyle \Omega =\{H,T\}}$
${\displaystyle F=\{\varnothing ,\{H\},\{T\},\{H,T\}\}}$

Kolmogorov's axioms imply that:

${\displaystyle P(\varnothing )=0}$

The probability of neither heads nor tails is 0.

${\displaystyle P(\{H,T\})=1}$

The probability of either heads or tails is 1.

${\displaystyle P(\{H\})+P(\{T\})=1}$

The sum of the probability of heads and the probability of tails is 1.
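These statements hold whatever the bias of the coin. The sketch below assumes a hypothetical coin with ${\displaystyle P(\{H\})=3/10}$; the function `coin_measure` and the chosen bias are illustrative only, and any value between 0 and 1 would give the same three results.

```python
from fractions import Fraction


def coin_measure(p_heads):
    """Return P on the event space of a single toss, given an assumed P({H})."""
    weights = {"H": p_heads, "T": 1 - p_heads}

    def P(event):
        return sum((weights[w] for w in event), Fraction(0))

    return P


omega = frozenset({"H", "T"})
F = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]

P = coin_measure(Fraction(3, 10))  # hypothetical biased coin

# Every event in F has a probability between 0 and 1.
assert all(0 <= P(E) <= 1 for E in F)

print(P(frozenset()))                             # 0
print(P(omega))                                   # 1
print(P(frozenset({"H"})) + P(frozenset({"T"})))  # 1
```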

3. Terenin, Alexander; Draper, David (2015). "Cox's Theorem and the Jaynesian Interpretation of Probability". arXiv:1507.06597. Bibcode:2015arXiv150706597T.