
Can someone please provide a useful reference for the definition of a probabilistic distribution?

A very popular site (top of Google search) states:

A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.

https://stattrek.com/probability-distributions/probability-distribution.aspx

I feel that this definition is very unsatisfactory. I need a better one with a reference.

Thank you!

  • I think the term "probability distribution" is more common than "probabilistic distribution" (which I haven't heard anyone use), so maybe for clarity the question should ask for a definition of "probability distribution". I too would like to know the consensus about what precisely this term means, because I have heard it used in different ways. Sometimes it seems to be used interchangeably with the term "probability measure". However, I think the term "distribution" refers specifically to the distribution of a random variable $X$, which is the probability measure on $\mathbb R$ induced by $X$.
    – littleO
    Commented Feb 1, 2019 at 6:52
  • Before you answer, bounty hunters: I'm the person who made the bounty. The specific concern I have is as follows. People usually define distributions with respect to random variables. I sometimes hear the term "distribution" being applied to the probability function itself rather than any random variable associated with the probability space. For instance, a finite probability space $\Omega$ with $\mu(\{\omega\})=\frac{1}{|\Omega|}$ for each $\omega \in \Omega$ can be said to have a uniform distribution, but there is no random variable here.
    Commented Sep 14, 2020 at 19:30
  • Your example of a finite probability space $\Omega=\{\omega_1, ..., \omega_n\}$ can trivially be turned into a random variable problem by the mapping $X(\omega_1)=1, X(\omega_2)=2, ..., X(\omega_n)=n$. A robot may not like someone saying "uniform distribution" without first introducing this random variable $X$, but a human can fill in the gap pretty easily. It is common for a human to describe such a case of equally likely outcomes on a finite probability space as a "uniform distribution" because it is assumed that the audience can fill in the minor gap on their own.
    – Michael
    Commented Sep 18, 2020 at 15:46

6 Answers


To formally introduce the definition of a probability distribution, one has to have an appropriate notion of probability. Based on the axioms of probability laid down by Kolmogorov, let's start with a probability space $(\Omega,\mathscr{F},\mu)$ where

  1. $\Omega$ is some non-empty set (the sample space),
  2. $\mathscr{F}$ is a $\sigma$-algebra of subsets of $\Omega$ (measurable events),
  3. and $\mu$ is a positive, countably additive function on $\mathscr{F}$ with $\mu(\Omega)=1$.

Given another measurable space $(R,\mathscr{R})$, a random variable on $\Omega$ taking values in $R$ is a function $X:\Omega\rightarrow R$ such that $X^{-1}(A):=\{\omega\in\Omega: X(\omega)\in A\}\in\mathscr{F}$ for all $A\in\mathscr{R}$. Such an $X$ is also said to be $(\Omega,\mathscr{F})$-$(R,\mathscr{R})$ measurable.

Definition 1. The distribution of $X$ (which we may denote as $\mu_X$) is defined as the measure on $(R,\mathscr{R})$ induced by $X$, that is $$\begin{align} \mu_X(A):=\mu\big(X^{-1}(A)\big), \quad A\in\mathscr{R}\tag{1}\label{one} \end{align} $$

Note (to address one of the concerns of the bounty sponsor): Often in the literature (mathematical physics, probability theory, economics, etc.) the probability measure $\mu$ in the triplet $(\Omega,\mathscr{F},\mu)$ is also referred to as a probability distribution. This apparent ambiguity (there is no random variable to speak of) can be resolved by definition (1). To see this, consider the identity map $X:\Omega\rightarrow\Omega$, $\omega\mapsto\omega$, which can be viewed as a random variable taking values in $(\Omega,\mathscr{F})$. Since $X^{-1}(A)=A$ for all $A\in\mathscr{F}$, $$\mu_X(A)=\mu(X^{-1}(A))=\mu(A),\quad\forall A\in\mathscr{F}$$
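As a minimal computational sketch of definition (1) (my illustration, not part of the original answer): on a finite probability space, the pushforward $\mu_X$ is computed by summing $\mu$ over the fibers of $X$. The fair-die space below is an arbitrary choice.

```python
# Sketch of Definition 1 on a finite probability space (illustrative only):
# mu_X(A) = mu(X^{-1}(A)) reduces to summing mu over the fibers of X.
from collections import defaultdict

def pushforward(mu, X):
    """mu: dict {omega: probability}; X: map omega -> value.
    Returns mu_X as a dict {value: probability}."""
    mu_X = defaultdict(float)
    for omega, p in mu.items():
        mu_X[X(omega)] += p
    return dict(mu_X)

# Uniform measure on Omega = {1, ..., 6} (a fair die).
mu = {omega: 1 / 6 for omega in range(1, 7)}

# The identity map recovers mu itself, matching the note above.
assert pushforward(mu, lambda w: w) == mu

# A genuine random variable: the parity of the outcome.
print(pushforward(mu, lambda w: w % 2))  # {1: ~0.5, 0: ~0.5}
```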


A few examples:

To fix ideas, consider $(\Omega,\mathscr{F},\mu)=((0,1),\mathscr{B}((0,1)),\lambda_1)$, the Steinhaus space; that is, $\Omega$ is the unit interval, $\mathscr{F}$ is the Borel $\sigma$-algebra on $(0,1)$, and $\mu$ is the Lebesgue measure $\lambda_1$. (A short simulation sketch follows the list.)

  1. The identity map $X:(0,1)\rightarrow(0,1)$, $t\mapsto t$, considered as a random variable from $((0,1),\mathscr{B}(0,1))$ to $((0,1),\mathscr{B}(0,1))$, has the uniform distribution on $(0,1)$, that is, $\mu_X((a,b])=\lambda_1((a,b])=b-a$ for all $0\leq a<b<1$.

  2. The function $Y(t)=-\log(t)$, considered as a random variable from $((0,1),\mathscr{B}(0,1))$ to $(\mathbb{R},\mathscr{B}(\mathbb{R}))$, has the exponential distribution (with intensity $1$), i.e. $\mu_Y\big((0,x]\big)=1-e^{-x}$ for all $x>0$.

  3. $Z(t)=\mathbb{1}_{(0,1/2)}(t)$, viewed as a random variable from $((0,1),\mathscr{B}(0,1))$ to $(\{0,1\},2^{\{0,1\}})$ has the Bernoulli distribution (with parameter $1/2$), that is $$ \mu_Z(\{0\})=\mu_Z(\{1\})=\frac12 $$

  4. Any $t\in(0,1)$ admits a unique binary expansion $t=\sum^\infty_{n=1}\frac{r_n(t)}{2^n}$ where $r_n(t)\in\{0,1\}$ and $\sum_n r_n(t)=\infty$. It can be shown that each map $X_n(t)=r_n(t)$ is a Bernoulli random variable (as in example 3). Furthermore, consider $X:(0,1)\rightarrow\{0,1\}^\mathbb{N}$ as a random variable from $((0,1),\mathscr{B}(0,1))$ to the space of $0$-$1$ sequences, the latter equipped with the product $\sigma$-algebra (the $\sigma$-algebra generated by the sets $\{\mathbf{x}\in\{0,1\}^\mathbb{N}:x(1)=r_1,\ldots,x(m)=r_m\}$, where $m\in\mathbb{N}$ and $r_1,\ldots,r_m\in\{0,1\}$). The distribution of $X$ is such that $\{X_n:n\in\mathbb{N}\}$ becomes an independent, identically distributed (i.i.d.) sequence of Bernoulli (parameter $1/2$) random variables.
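A hedged numerical companion to examples 1-4 (my own sketch, not part of the answer): draw $t$ uniformly from $(0,1)$, push it through each map, and check that the empirical frequencies approach the claimed distributions.

```python
# Monte Carlo check of the Steinhaus-space examples (illustrative only).
import math, random

random.seed(0)
N = 100_000
ts = [random.random() for _ in range(N)]  # t ~ uniform on (0,1), example 1

# Example 2: Y(t) = -log(t) is exponential(1), so P(Y <= 1) = 1 - e^{-1}.
print(sum(1 for t in ts if -math.log(t) <= 1.0) / N, 1 - math.exp(-1))

# Example 3: Z(t) = 1_{(0,1/2)}(t) is Bernoulli(1/2).
print(sum(1 for t in ts if t < 0.5) / N)  # close to 0.5

# Example 4: the first two binary digits behave like i.i.d. Bernoulli(1/2):
# r_1(t) = 1 iff t >= 1/2, and r_2(t) = 1 iff frac(2t) >= 1/2.
r1 = [int(t >= 0.5) for t in ts]
r2 = [int((2 * t) % 1 >= 0.5) for t in ts]
print(sum(r1) / N, sum(r2) / N)                # each close to 0.5
print(sum(a & b for a, b in zip(r1, r2)) / N)  # close to 0.25, as independence predicts
```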


Cumulative distribution function

In many applications of Probability, the random variables of interest take values on the real line $\mathbb{R}$. The real line has a natural measurable structure given by the $\sigma$-algebra $\mathscr{B}(\mathbb{R})$ generated by the open intervals in $\mathbb{R}$. This $\sigma$-algebra is known as the Borel $\sigma$-algebra.

  • It turns out that $X$ is a (real-valued) random variable if and only if $\{X\leq a\}:=X^{-1}((-\infty,a])\in\mathscr{F}$ for all $a\in\mathbb{R}$.

  • The distribution $\mu_X$ of $X$ can be encoded by the function $$F_X(x):=\mu_X((-\infty,x])=\mu(\{X\leq x\})$$

  • $F_X$ has the following properties: $F_X$ is monotone non-decreasing and right-continuous, $\lim_{x\rightarrow-\infty}F_X(x)=0$, and $\lim_{x\rightarrow\infty}F_X(x)=1$.

  • It turns out that any function $F$ with the properties listed above gives rise to a probability measure $\nu$ on the real line. This rests on basic facts of measure theory, namely the Lebesgue-Stieltjes theorem. (A numerical sketch follows these bullets.)

  • For that reason, $F_X$ is commonly known as the cumulative distribution function of $X$, and very often it is simply referred to as the distribution function of $X$.
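The following is a hedged sketch of that converse direction (mine, not from the answer): the generalized inverse $F^{-1}(u)=\inf\{x:F(x)\geq u\}$ turns uniform samples into samples whose distribution function is $F$, which is one concrete way to exhibit the measure $\nu$. The bracketing interval and iteration count below are illustrative choices.

```python
# Inverse-transform sketch: a function F with the four listed properties
# determines a probability measure; sampling via F's generalized inverse
# makes that measure visible empirically.
import math, random

def quantile(F, u, lo=-1e6, hi=1e6):
    """Generalized inverse of a non-decreasing, right-continuous F,
    located by bisection (bracket and iteration count are illustrative)."""
    for _ in range(100):  # 100 halvings shrink the bracket to ~1e-24
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

F = lambda x: 1 - math.exp(-x) if x > 0 else 0.0  # exponential(1) CDF

random.seed(1)
xs = [quantile(F, random.random()) for _ in range(10_000)]
print(sum(1 for x in xs if x <= 1.0) / len(xs), F(1.0))  # both near 0.632
```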


Final Comments:

All these things are now discussed in courses on probability. At the basic level (by no means trivial; Feller, An Introduction to Probability Theory and Its Applications, Vol. I), people discuss mainly cumulative distribution functions of random variables; at the more advanced level (Feller, Vol. II), people work with more general random variables, and so the general notion of distribution (as in $\eqref{one}$) is discussed.

  • I sometimes hear the term "distribution" being applied to the probability function itself (which you denote $\mu$) rather than any random variable associated with the probability space. For instance, a finite probability space $\Omega$ with $\mu(\{\omega\})=\frac{1}{|\Omega|}$ for each $\omega \in \Omega$ can be said to have a uniform distribution, but there is no random variable here.
    Commented Sep 14, 2020 at 19:20
  • Isn't the codomain of a random variable a measurable space (usually $\mathbb{R}$)? You have it as $\Omega$, which is the sample space.
    Commented Sep 14, 2020 at 19:36
  • @MathematicsStudent1122: All I am saying is that the identity map can be viewed as a random variable from $(\Omega,\mathscr{F})$ to $(\Omega,\mathscr{F})$. This is not a very interesting example of course, but the following modification is: $X:(\Omega,\mathscr{F})\rightarrow(\Omega,\mathscr{A})$ given by $\omega\mapsto\omega$, where $\mathscr{A}$ is a $\sigma$-algebra contained in $\mathscr{F}$. Here $\mu_X$ is the restriction of $\mu$ to $\mathscr{A}$. This random variable is at the heart of the notion of conditional probability.
    – Mittens
    Commented Sep 14, 2020 at 19:47
  • Oliver, +1 for the nice answer. However, for the sake of completeness, let me play the role of the contrarian: despite not being an expert probabilist, recently I had to refresh my memory on the foundations of the theory of probability, and I can say that Axiom 3 is a strict characteristic of Kolmogorov's axiomatics, to which your answer adheres. (Continues in the following comment.)
    Commented Sep 15, 2020 at 9:35
  • @DanieleTampieri: If $(\Omega,\mathscr{F},\mu)$ is a general measure space, or $\mathscr{F}$ is an algebra and $\mu$ is a charge (a finitely additive set function that vanishes at $\emptyset$), then the pushforward of $\mu$ by $X$ (with $X$ still $\mathscr{F}$-measurable) given by $\mu_X(A)=\mu(X^{-1}(A))$ is also a measure or a charge, respectively. So yes, the notion of distribution as in (1) still carries over to the setting of de Finetti.
    – Mittens
    Commented Sep 15, 2020 at 13:52

To have a nice definition you need a nice object to define, so first of all, instead of speaking of a "probability distribution" it is better to refer, for example, to the

Cumulative Distribution Function

The Cumulative Distribution Function, CDF (also sometimes called the Probability Distribution Function), of a random variable $X$, denoted by $F_X(x)$, is defined to be the function with domain the real line and counterdomain the interval $[0,1]$ which satisfies

$$F_X(x)=\mathbb{P}[X \leq x]=\mathbb{P}[\{\omega:X(\omega)\leq x\}]$$

for every real number $x$.

A cumulative distribution function is uniquely defined for each random variable. If it is known, it can be used to find probabilities of events defined in terms of its corresponding random variable.

This definition is taken from Mood, Graybill, and Boes, Introduction to the Theory of Statistics, McGraw-Hill.
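As a small illustration of the remark above that a known CDF yields probabilities of events (my sketch, not from the book): $\mathbb{P}[a<X\leq b]=F_X(b)-F_X(a)$. The standard normal CDF can be written with math.erf; the endpoints are illustrative.

```python
# Interval probabilities straight from a CDF (illustrative sketch).
import math

def F(x, mean=0.0, sd=1.0):
    """CDF of a normal(mean, sd^2) random variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# P(a < X <= b) = F(b) - F(a)
print(F(1.0) - F(-1.0))  # ~0.6827, mass within one standard deviation
print(1 - F(1.96))       # ~0.025, the upper tail beyond 1.96
```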


The term "probability distribution" is ambiguous: it means two different things. One meaning is "probability measure", the precise definition of which is given in any modern probability textbook. The other is one particular way of uniquely specifying a probability measure on the real numbers $\mathbb R$, or on $\mathbb R^n$, namely, the "probability distribution function", a.k.a. "cumulative distribution function".

The intuition behind both is that they describe how "probability mass" is spread out over the space of possibilities. Given a probability measure $\mu$ on $\mathbb R$, one can recover its distribution function via $F(t)=\mu((-\infty,t])$; in fact, there is a theorem to the effect that given a probability distribution function $F$ there is a unique probability measure $\mu$ for which $F(t)=\mu((-\infty,t])$ holds for all $t$. So in a sense the distinction is not that important. Strictly speaking, neither concept requires the notion of "random variable", even though the study of random variables is the main use of probability distributions.
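A quick sketch of the recovery $F(t)=\mu((-\infty,t])$ for a discrete measure (my illustration; the binomial point masses are an arbitrary choice):

```python
# From a measure mu (given as point masses) to its distribution function F.
mu = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}  # binomial(3, 1/2) as atoms

def F(t):
    """F(t) = mu((-infinity, t]): total mass at or below t."""
    return sum(p for k, p in mu.items() if k <= t)

print(F(-1), F(0), F(1.5), F(3))  # 0, 0.125, 0.5, 1.0: a right-continuous step function
```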

This state of affairs, that there are two distinct but similar objects with similar names, arose about 100 years ago, as mathematicians were groping towards generalizations of the Lebesgue integral (such as the Lebesgue-Stieltjes integral) and so on. 150 years ago there were various discrete probability distributions (the Poisson, the binomial, etc.), and various continuous distributions with densities (the Gaussian, the Cauchy, etc.), and it was not clear that they were instances of the same sort of thing. The discovery of the Stieltjes integral was big news then, and more or less finished the measure theory of the real line: if you knew the probability distribution function, you knew (in principle) everything you needed to know about a real-valued random variable.

One attraction of the more abstract-seeming Kolmogorov version of probability theory was that it applied to such things as random functions, random sequences of events, and so on, not just random points in $\mathbb R^n$.


Perhaps it might help to define what probability is first. The easiest way to think about it, if you don't want to get into measure-theoretic definitions, is that a probability is a number between $0$ and $1$, assigned to a logical statement, that represents how likely it is to be true. A logical statement can be something like, "It will rain tomorrow" or "A fair coin was tossed $10$ times and came up heads $5$ times." The statement itself can only be true or false, but you don't know for certain; the probability then tells you how likely it is to be true. Such logical statements are called events. A probability measure is a function $P$ defined on the set of all events in your universe and obeying consistency properties such as "if event $A$ implies event $B$, then $P\left(A\right) \leq P\left(B\right)$".
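A tiny sketch of that consistency property (my own; the six-outcome space is arbitrary): if event $A$ implies event $B$, so that $A\subseteq B$ as sets, then $P(A)\leq P(B)$.

```python
# Monotonicity of a probability measure on a finite space (illustrative).
omega = range(1, 7)                                # one roll of a fair die
P = lambda A: sum(1 for w in omega if w in A) / 6  # uniform measure

A = {6}          # "the roll is a six"
B = {2, 4, 6}    # "the roll is even", which A implies
assert A <= B and P(A) <= P(B)
print(P(A), P(B))  # 0.1666..., 0.5
```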

If an event is a logical statement whose truth or falsity you don't know, then a random variable is a number whose value you don't know. If $X$ is such an unknown number, then you can come up with events related to that number, such as "$X \leq x$" for different fixed values of $x$. Since a probability measure maps events into $\left[0,1\right]$, any such event has a probability. The probability distribution of $X$ is characterized by the function

$$F\left(x\right) = P\left(X \leq x\right)$$

defined on all $x\in\mathbb{R}$. This is called the "cumulative distribution function" or cdf. The cdf always exists for every random variable. The distribution can also be characterized using other objects that sometimes can be constructed from the cdf, but the cdf is the fundamental object that determines the distribution.
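To illustrate the last sentence (a sketch under my own assumptions, not part of the answer): when $X$ happens to have a density, that density can be recovered from the cdf by differentiation, approximated below with a central difference and an illustrative step size.

```python
# Recovering a density from a cdf by numerical differentiation (sketch).
import math

F = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal cdf
h = 1e-5                                              # illustrative step

def density(x):
    """Central-difference approximation to f = F'."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(density(0.0))                # ~0.39894
print(1 / math.sqrt(2 * math.pi))  # exact normal density at 0, for comparison
```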

The above answer is not fully rigorous; in reality, events are defined to be subsets of a certain abstract "sample space" $\Omega$, and in order to define a probability measure, the set of events has to be "rich enough" (i.e., it has to be a sigma-algebra). A random variable is then a function $X:\Omega\rightarrow\mathbb{R}$. Nonetheless, even here you can still define events in terms of logical statements, e.g.,

$$\left\{X\leq x\right\} = \left\{\omega\in\Omega\,:\,X\left(\omega\right)\leq x\right\}$$

is one possible event. The vast majority of modeling and computational problems that you encounter in probability can be solved using the more intuitive notion of an event as a logical statement. It is quite rare that you actually need to dig into the sample space in detail. If I say that $X$ is normally distributed with mean $0$ and variance $1$, that fully characterizes the cdf of $X$ without really saying anything about $\Omega$ (I am implicitly assuming that some such $\Omega$ exists and $X$ is defined on it, but I don't know anything about the objects $\omega\in\Omega$).

Of course, for a deep understanding of the theory you will need to delve into the measure-theoretic foundation. If you want a good reference on measure-theoretic probability, I recommend "Probability and Stochastics" by Cinlar.


1: Formal definitions

To start with this question, one should define a probability space: A tuple of three items usually denoted $(\Omega,\mathcal{E},\Bbb{P})$ [or something of this nature].

$\Omega$ is the sample space - the set of all possible outcomes (not to be confused with events!) of our procedure, experiment, whatever. For instance, consider flipping a coin once: in this case, $\Omega=\{\text{H},\text{T}\}$. A random variable $X$ is the "result" of this experiment. You could define $X$ in this case as $$X=\begin{cases} 1 & \text{if the coin lands heads}\\ 0 & \text{if the coin lands tails} \end{cases}$$ Formally, one can define a measurement $M$ as a bijective map $M:\Omega\to\mathcal{X}$ that maps an outcome of our experiment to a value of the random variable. Here $\mathcal{X}$ is the set of all possible values of $X$. In this coin case, the "measurement" could be writing down a $0$ or $1$ in your notebook if you see tails or heads, respectively. Bijective means a one-to-one correspondence: no two outcomes have the same measurement, and no two measurements come from the same outcome.

$\mathcal{E}$ is the event space, which is the set of all subsets (or powerset) of the sample space $\Omega$. In set notation, $\mathcal{E}=\mathcal{P}(\Omega).$ In the coin case mentioned above, $\mathcal{E}=\{\varnothing,\{\text{H}\},\{\text{T}\},\{\text{H},\text{T}\}\}$.

$\mathbb{P}$ is a probability function or probability measure, which is a map or function that maps an event in the event space to a probability. Formally, $\mathbb{P}:\mathcal{E}\to[0,1].$ $\Bbb{P}$ always satisfies three conditions:

1: $\Bbb{P}(e)\in[0,1]~\forall e\in\mathcal{E}$

2: $\Bbb{P}(\varnothing)=0.$

3: $\Bbb{P}(\Omega)=1.$

In words, 1: every event has a probability. 2: the probability of nothing happening is $0$; our experiment must have some result. 3: the probability that something happens is $1$.

2: Distributions

A probability distribution is a map or function $p$ that assigns a number (positive or zero), not necessarily between $0$ and $1$, to every possible value of $X$. Formally, $p:\mathcal{X}\to\Bbb{R}_{\geq 0}$. In the discrete case, it is quite closely related to the probability measure mentioned before. Let $x\in\mathcal{X}$ be the result of a measurement of some possible outcome, say $x=M(\omega)$ for some $\omega\in\Omega$. It turns out that in the discrete case, $$p(x)=\Bbb{P}(\{\omega\}).$$ So one might ask: what is the difference between these two closely related things? Note that in the continuous case, the above equality does not hold. Since $\Omega$ is uncountably infinite, the probability of any single outcome, or indeed of any countable subset of outcomes, is zero. That is, $$\mathbb{P}(\{\omega\})=0$$ regardless of the value of $p(x)$.

In the discrete case, $p$ must satisfy the condition $$\sum_{x\in\mathcal{X}}p(x)=1$$ And in the continuous case $$\int_{\mathcal{X}}p(x)\mathrm{d}x=1$$
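A hedged numerical check of both conditions (my sketch; the Poisson(2) pmf, the exponential density, the truncation points, and the step size are all illustrative choices):

```python
# Normalization checks: a pmf sums to 1, a density integrates to 1 (sketch).
import math

# Discrete: Poisson(2) pmf, truncated where the tail is negligible.
print(sum(math.exp(-2) * 2**k / math.factorial(k) for k in range(60)))  # ~1.0

# Continuous: exponential(1) density p(x) = e^{-x}, midpoint rule on (0, 40).
step = 1e-3
xs = (step * (i + 0.5) for i in range(int(40 / step)))
print(sum(math.exp(-x) * step for x in xs))  # ~1.0
```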

How can we interpret the value of $p(x)$? In the discrete case this is rather simple: $p(x)$ is the probability of measuring a value $x$ from our experiment. That is, $$p(x)=\mathbb{P}(X=x).$$

But in the continuous case, one must be more careful with how we interpret things. Consider two possible measurements $x_1$ and $x_2$. If $p(x_1)>p(x_2)$ (and $p$ is continuous at these points), then $\exists\delta>0$ such that $\forall\epsilon<\delta$ (with $\epsilon>0$), $$\Bbb{P}(X\in[x_1-\epsilon,x_1+\epsilon])>\Bbb{P}(X\in[x_2-\epsilon,x_2+\epsilon])$$ In simple terms, we are more likely to measure a value close to $x_1$ than close to $x_2$.
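A quick sketch of this interval interpretation (mine; the exponential density and $\epsilon=0.01$ are illustrative choices): since $p(0.5)>p(2.0)$ for $p(x)=e^{-x}$, small intervals around $0.5$ carry more probability than equally small intervals around $2.0$.

```python
# Comparing probabilities of small intervals around two points (sketch).
import math

F = lambda x: 1 - math.exp(-x)  # exponential(1) cdf, density p(x) = e^{-x}
eps = 0.01                      # illustrative half-width

prob_near = lambda x: F(x + eps) - F(x - eps)
print(prob_near(0.5), prob_near(2.0))  # ~0.0121 vs ~0.0027
```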

I would recommend watching 3Blue1Brown's video about probability density functions.

  • Throughout this answer I'm doing my best to avoid measure theory. Mainly because I know very little about it, but also because it's probably too advanced to explain this concept.
    – K.defaoite
    Commented Sep 14, 2020 at 15:48
  • -1: Your definition of "distribution" is incorrect. The equation $p(x) = P[\{\omega\}]$ does not make sense, as the left-hand side has $x$ and the right-hand side has $\omega$. You are blurring the distinction between a density and a probability. Note that a density $f_X(x)$ is not a probability and it can take values larger than 1. A density function $f_X(x)$ must be defined over all $x \in \mathbb{R}$ because it must satisfy (by definition) $P[X\leq x] = \int_{-\infty}^{x} f_X(t)dt$ for all $x \in \mathbb{R}$.
    – Michael
    Commented Sep 18, 2020 at 15:18
  • @Michael That was for the discrete case. As for the use of omega, $\omega$ signifies the event in the sample space such that $M(\omega)=x$ as stated. I distinctly made the point that this does not hold for the continuous case.
    – K.defaoite
    Commented Sep 18, 2020 at 18:57
  • No, you are using $X$ and also $M$, the connection is unclear, and "the event in the sample space such that $M(\omega)=x$" does not make sense (for example, why is there only one such $\omega$?). It also seems to assume there are only discrete or continuous cases and no others. You may want to read the Tommik answer.
    – Michael
    Commented Sep 18, 2020 at 22:45

One reputable source which is commonly used as a textbook for undergraduates and graduates is Rick Durrett's "Probability: Theory and Examples", which is available as a free PDF at that link.

Many high-school and college level textbooks start by differentiating between "discrete" and "continuous" random variables, and define "probability mass functions" and "probability density functions" specific to these random variables. As @mathematicsstudent1122 requests, Durrett instead defines a "probability distribution" not in terms of a random variable, but a sample space.

Per Durrett, a "probability distribution" on a sample space $\Omega$ is a measure $P$ on $\Omega$ with the property that $P(\Omega) = 1$. "Events" are then just the measurable subsets of $\Omega$, and the "probability of an event" $E \subseteq \Omega$ is just the measure $P(E)$. If $\mathcal{S}$ is some other measure space, an $\mathcal{S}$-valued "random variable" $X$ on $\Omega$ is then a measurable function $X: \Omega \to \mathcal{S}$.
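A hedged rendering of those definitions on a finite sample space (the die example is mine, not Durrett's):

```python
# Durrett-style objects on a finite sample space (illustrative sketch).
omega = {1, 2, 3, 4, 5, 6}                 # sample space: one fair die
P = lambda E: len(E & omega) / len(omega)  # uniform probability measure

print(P(omega))      # 1.0, so P is a probability distribution on omega
print(P({2, 4, 6}))  # 0.5, the probability of the event "the roll is even"

X = lambda w: w % 3  # a {0,1,2}-valued random variable on omega
print(P({w for w in omega if X(w) == 0}))  # P(X = 0) = 1/3, X's distribution at 0
```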

The first chapter of Durrett's text is devoted to building up the standard relevant machinery of measure theory ($\sigma$-algebras, integration, and so forth). He offers an admirably lucid and concise encapsulation of what differentiates "probability theory" from "measure theory on a space of total measure $1$" at the start of Chapter 2:

"Measure theory ends and probability begins with the definition of independence."

The rest of the text lives up to that level of elegance and insight, and Durrett also offers thought-provoking exercises, including a resolution of the infamous St. Petersburg Paradox (on page 65). Durrett's presentation can be jarringly flippant at times, as exemplified by the following exercise on the Poisson process:

[screenshot of the Poisson process exercise omitted]

but especially in terms of free resources, you can't do better than Durrett as an introduction to the subject.

Remark: This gives the common definition of a "probability distribution" from the perspective of a working mathematician. Philosophically speaking, what one actually means by a "probability distribution" in everyday life may not exactly correspond to the mathematical formalisms. The Stanford Encyclopedia of Philosophy has an excellent overview of different interpretations of probability, not all of which are equivalent to the standard Kolmogorov axiomatization (which is the basis of Durrett's treatment of the subject, as well as any other textbook on standard probability theory).
