# Ewens's sampling formula

In population genetics, **Ewens's sampling formula** describes the probabilities associated with counts of how many different alleles are observed a given number of times in a sample.

## Definition

Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of *n* gametes is taken from a population and classified according to the gene at a particular locus, then the probability that there are *a*_{1} alleles represented once in the sample, *a*_{2} alleles represented twice, and so on, is

$$\Pr(a_1, \dots, a_n; \theta) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!}$$

for some positive number *θ* representing the population mutation rate, whenever *a*_{1}, ..., *a*_{n} is a sequence of nonnegative integers such that

$$a_1 + 2a_2 + 3a_3 + \cdots + n a_n = \sum_{i=1}^{n} i\, a_i = n.$$

The phrase "under certain conditions" used above is made precise by the following assumptions:

- The sample size *n* is small by comparison to the size of the whole population; and
- The population is in statistical equilibrium under mutation and genetic drift, and the role of selection at the locus in question is negligible; and
- Every mutant allele is novel. (See also infinite-alleles model.)

This is a probability distribution on the set of all partitions of the integer *n*. Among probabilists and statisticians it is often called the **multivariate Ewens distribution**.
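As a minimal sketch, the distribution can be evaluated exactly with rational arithmetic by multiplying *n*!/(*θ*(*θ*+1)⋯(*θ*+*n*−1)) by the factors *θ*^{a_j}/(*j*^{a_j} *a*_j!). The function names `partitions`, `allele_counts`, and `ewens_probability` are illustrative choices, not part of any library.

```python
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield each partition of n as a tuple of parts in non-increasing order."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def allele_counts(parts, n):
    """Convert a partition into (a_1, ..., a_n), where a_j counts parts equal to j."""
    return tuple(sum(1 for p in parts if p == j) for j in range(1, n + 1))


def ewens_probability(a, theta):
    """Probability of the count vector a = (a_1, ..., a_n) for theta > 0."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (j ** a_j * factorial(a_j))
    return prob


# Tabulate the whole distribution for a small sample and check that it sums to 1.
n, theta = 4, Fraction(3, 2)
dist = {}
for parts in partitions(n):
    a = allele_counts(parts, n)
    dist[a] = ewens_probability(a, theta)
assert sum(dist.values()) == 1
```

For *n* = 3 and *θ* = 1, this sketch gives 1/6, 1/2, and 1/3 for the count vectors (3, 0, 0), (1, 1, 0), and (0, 0, 1) respectively.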

## Mathematical properties

In the limit *θ* → 0, the probability that all *n* genes are the same approaches 1. When *θ* = 1, the distribution is precisely that of the integer partition induced by the cycle lengths of a uniformly distributed random permutation. As *θ* → ∞, the probability that no two of the *n* genes are the same approaches 1.
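These three cases can be checked numerically for a small sample. The sketch below is illustrative only: it enumerates all permutations of five elements, tallies their cycle types, and compares the frequencies with the formula at *θ* = 1; it then evaluates the two extreme configurations at a very small and a very large *θ*. The helper names `ewens_probability` and `cycle_type` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def ewens_probability(a, theta):
    """Probability of the count vector a = (a_1, ..., a_n) for theta > 0."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (j ** a_j * factorial(a_j))
    return prob


def cycle_type(perm):
    """Return (a_1, ..., a_n), where a_j is the number of cycles of length j."""
    n = len(perm)
    seen = [False] * n
    counts = [0] * n
    for start in range(n):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:  # walk around the cycle containing `start`
            seen[i] = True
            i = perm[i]
            length += 1
        counts[length - 1] += 1
    return tuple(counts)


n = 5
tally = Counter(cycle_type(p) for p in permutations(range(n)))

# theta = 1: the formula matches the cycle-type frequencies of uniform permutations.
for a, count in tally.items():
    assert ewens_probability(a, 1) == Fraction(count, factorial(n))

# Extreme values of theta, used here as stand-ins for the two limits.
all_same = (0,) * (n - 1) + (1,)      # one allele seen n times
all_distinct = (n,) + (0,) * (n - 1)  # n alleles each seen once
small = ewens_probability(all_same, Fraction(1, 10**6))
large = ewens_probability(all_distinct, 10**6)
```

Both `small` and `large` come out close to 1, in line with the limiting behaviour described above.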

This family of probability distributions enjoys the property that if after the sample of *n* is taken, *m* of the *n* gametes are chosen without replacement, then the resulting probability distribution on the set of all partitions of the smaller integer *m* is just what the formula above would give if *m* were put in place of *n*.
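The subsampling property can be verified exactly for one step (*m* = *n* − 1); repeating the step covers any smaller *m*. In the sketch below, which uses hypothetical helper names, a uniformly chosen gamete is deleted: with probability *j*·*a*_j/*n* it carries an allele seen *j* times, and that allele is then seen *j* − 1 times.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Probability of the count vector a = (a_1, ..., a_n) for theta > 0."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (j ** a_j * factorial(a_j))
    return prob


def count_vectors(n):
    """Yield every (a_1, ..., a_n) with a_1 + 2 a_2 + ... + n a_n = n."""
    def parts(remaining, largest):
        if remaining == 0:
            yield []
            return
        for part in range(min(remaining, largest), 0, -1):
            for rest in parts(remaining - part, part):
                yield [part] + rest
    for p in parts(n, n):
        yield tuple(p.count(j) for j in range(1, n + 1))


def remove_one(a):
    """Yield (probability, new counts) after deleting one uniformly chosen gamete."""
    n = len(a)
    for j, a_j in enumerate(a, start=1):
        if a_j == 0:
            continue
        b = list(a)
        b[j - 1] -= 1      # an allele seen j times loses one copy
        if j > 1:
            b[j - 2] += 1  # and is now seen j - 1 times
        yield Fraction(j * a_j, n), tuple(b[: n - 1])


def subsample_once(n, theta):
    """Distribution on partitions of n - 1 induced by dropping one gamete from n."""
    induced = defaultdict(Fraction)
    for a in count_vectors(n):
        p_a = ewens_probability(a, theta)
        for p_drop, b in remove_one(a):
            induced[b] += p_a * p_drop
    return dict(induced)


theta = Fraction(1, 2)
direct = {b: ewens_probability(b, theta) for b in count_vectors(4)}
assert subsample_once(5, theta) == direct
```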

The Ewens distribution arises naturally from the Chinese restaurant process.
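The connection can be illustrated with an exact computation. The sketch below assumes the standard one-parameter description of the process, which is not spelled out in this article: when *i* customers are seated, the next customer opens a new table with probability *θ*/(*θ* + *i*) and joins an occupied table holding *c* customers with probability *c*/(*θ* + *i*). The function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Probability of the count vector a = (a_1, ..., a_n) for theta > 0."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (j ** a_j * factorial(a_j))
    return prob


def crp_distribution(n, theta):
    """Exact law of the table-size counts (a_1, ..., a_n) after n customers."""
    theta = Fraction(theta)
    dist = {(): Fraction(1)}  # empty restaurant
    for i in range(n):        # i customers are already seated
        nxt = defaultdict(Fraction)
        for a, p in dist.items():
            a = a + (0,)      # make room for a possible table of size i + 1
            # Assumed rule: new table with probability theta / (theta + i).
            b = list(a)
            b[0] += 1
            nxt[tuple(b)] += p * theta / (theta + i)
            # Assumed rule: join one of the a_j tables of size j,
            # with total probability j * a_j / (theta + i).
            for j, a_j in enumerate(a, start=1):
                if a_j == 0:
                    continue
                b = list(a)
                b[j - 1] -= 1
                b[j] += 1
                nxt[tuple(b)] += p * j * a_j / (theta + i)
        dist = dict(nxt)
    return dist


theta = Fraction(3, 2)
seating = crp_distribution(6, theta)
assert sum(seating.values()) == 1
for a, p in seating.items():
    assert p == ewens_probability(a, theta)
```

Under these assumptions, the table-size counts after *n* customers have the same probabilities as the formula assigns to the allele counts in a sample of *n*.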

## See also

- Chinese restaurant table distribution
- Coalescent theory
- Unified neutral theory of biodiversity
- Biomathematics

## Notes

- Warren Ewens, "The sampling theory of selectively neutral alleles", *Theoretical Population Biology*, volume 3, pages 87–112, 1972.
- H. Crane, "The Ubiquitous Ewens Sampling Formula", *Statistical Science*, 31:1 (Feb 2016). This article introduces a series of seven articles about Ewens Sampling in a special issue of the journal.
- J.F.C. Kingman, "Random partitions in population genetics", *Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences*, volume 361, number 1704, 1978.
- S. Tavaré and W. J. Ewens, "The Multivariate Ewens distribution" (1997, Chapter 41 from the reference below).
- N.L. Johnson, S. Kotz, and N. Balakrishnan (1997), *Discrete Multivariate Distributions*, Wiley. ISBN 0-471-12844-9.