Probability Distributions: A Practical Guide

A probability distribution is a complete inventory of what a random quantity can do: every possible outcome, each weighted by how likely it is. Learn a handful of standard distributions and you gain reusable models for coin flips, call volumes, waiting times, and measurement error. This guide profiles the five families you will meet first and shows how to pick among them.

Two Kinds of Randomness: Discrete and Continuous

Every distribution belongs to one of two camps, and the split decides how probability is even written down.

Discrete

The variable takes separate, countable values: 0, 1, 2, and so on. A probability mass function (PMF) assigns a probability to each value directly, and questions like P(X = 3) have nonzero answers. Counts of defects, goals, clicks, and arrivals live here.

Continuous

The variable can land anywhere in a range: 12.4 minutes, 12.41, 12.417. A probability density function (PDF) spreads probability over intervals, so only ranges have probability — P(X = 12.4 exactly) is zero, but P(12 < X < 13) is an area under the curve. Times, weights, and concentrations live here.

The camp matters practically: discrete models answer "how many?" while continuous models answer "how much?" or "how long?". Misclassifying the variable is the fastest way to pick the wrong distribution, so settle this question before anything else. The probability calculator handles the basic event arithmetic that underlies both camps.
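
The two bookkeeping styles can be sketched in a few lines of Python using only the standard library. The fair die and the Normal(12.4, 0.5) timing model below are illustrative assumptions, not examples from this guide.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a PMF assigns probability to each value directly.
# Assumed example: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
p_three = die_pmf[3]                 # P(X = 3) is a genuine nonzero number
total_mass = sum(die_pmf.values())   # the point masses add up to 1

# Continuous: only intervals carry probability, computed as a
# difference of CDF values (the area under the density curve).
# Assumed example: a time in minutes modeled as Normal(12.4, 0.5).
wait = NormalDist(mu=12.4, sigma=0.5)
p_interval = wait.cdf(13) - wait.cdf(12)      # P(12 < X < 13)
p_point = wait.cdf(12.4) - wait.cdf(12.4)     # P(X = 12.4) collapses to 0

print("P(die = 3) =", p_three, "| total mass =", total_mass)
print("P(12 < X < 13) =", round(p_interval, 4), "| P(X = 12.4) =", p_point)
```

The discrete side looks up a value; the continuous side subtracts two cumulative probabilities. That difference in mechanics is what the PMF/PDF split amounts to in practice.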

Binomial: Successes in a Fixed Number of Trials

The binomial distribution counts successes across a fixed number of independent yes-or-no trials that share the same success probability. Its two parameters are the number of trials n and the per-trial success probability p.

mean = np    variance = np(1 − p)

  • Trials are fixed in advance — you know n before collecting data.
  • Each trial ends in exactly one of two outcomes.
  • Trials are independent with a constant p throughout.

Mini example

A student guesses on 10 true-false questions, so n = 10 and p = 0.5. The probability of exactly 5 correct answers is C(10,5) × 0.5¹⁰ = 252 ÷ 1024 ≈ 0.2461 — comfortably the most likely single outcome, yet still under 25%. The expected score is np = 5 correct.
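
The arithmetic in this example can be checked with a short sketch. The helper name `binomial_pmf` is a choice made here for illustration; only the standard-library `math.comb` is used.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_five = pmf[5]                                    # C(10,5) * 0.5^10
mode = max(range(n + 1), key=lambda k: pmf[k])     # most likely score
mean = sum(k * pk for k, pk in enumerate(pmf))     # should match n * p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # n*p*(1-p)

print(f"P(X = 5) = {p_five:.4f}, most likely score = {mode}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

Computing the mean and variance directly from the PMF, rather than from the formulas, is a quick way to confirm that np and np(1 − p) describe the same distribution.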

Typical uses: conversion counts out of a fixed number of visitors, defective units in a fixed batch, patients responding out of n enrolled. Explore any n and p combination with the binomial distribution calculator.

Poisson: Event Counts at an Average Rate

The Poisson distribution counts events that occur randomly but at a steady average rate over a fixed stretch of time or space — with no fixed ceiling on how many can occur. Its single parameter λ is the average number of events per window, and it does double duty:

mean = λ    variance = λ

  • Events occur one at a time, independently of each other.
  • The average rate is stable across the window you are modeling.
  • There is no natural fixed n — the count could in principle be any size.

Mini example

A print shop averages λ = 3 equipment jams per week. The probability of a jam-free week is P(X = 0) = e⁻³ ≈ 0.0498, about one week in twenty. The mean-equals-variance property (both equal λ) doubles as a diagnostic: if real counts vary much more than their average (overdispersion), the Poisson model is suspect.
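
As a rough check of the print-shop numbers, the sketch below evaluates the Poisson PMF and confirms numerically that mean and variance both come out at λ. The helper name `poisson_pmf` and the truncation point of 60 are choices made for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0
p_zero = poisson_pmf(0, lam)          # probability of a jam-free week

# The support is unbounded, so truncate the sums at 60 (an assumption);
# for lam = 3 the probability beyond that point is negligible.
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(f"P(X = 0) = {p_zero:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```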

Typical uses: support tickets per hour, typos per page, mutations per genome segment, accidents per month. Work through rates and counts with the Poisson distribution calculator.

Normal: The Bell Curve of Accumulated Influences

The normal distribution is the symmetric bell curve that tends to emerge when a quantity is the sum of many small, independent influences. By the central limit theorem, that covers an enormous range of real measurements, along with sample means from reasonably large samples (provided the underlying variance is finite). Its parameters are the center μ and spread σ.

mean = μ    variance = σ²

  • Values cluster symmetrically around a single peak.
  • About 68% of values fall within 1σ of μ, about 95% within 2σ.
  • Extreme values are rare but never strictly impossible.

Mini example

Adult male heights in a population run roughly normal with μ = 70 inches and σ = 3 inches. The chance a randomly chosen man stands over 76 inches is P(Z > 2) ≈ 0.0228 — about 1 in 44. Two standard deviations does the work: 76 = 70 + 2 × 3.
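
The height example can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)   # inches, as in the example

z = (76 - 70) / 3                      # 76 inches sits 2 sigma above the mean
p_over_76 = 1 - heights.cdf(76)        # upper-tail area, P(Z > 2)

# The 68% / 95% rules of thumb, computed rather than memorized
p_within_1 = heights.cdf(73) - heights.cdf(67)
p_within_2 = heights.cdf(76) - heights.cdf(64)

print(f"z = {z:.1f}, P(height > 76) = {p_over_76:.4f}")
print(f"within 1 sigma: {p_within_1:.4f}, within 2 sigma: {p_within_2:.4f}")
```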

Typical uses: measurement error, standardized test scores, process output around a target, and the sampling distributions behind confidence intervals and z-tests. Compute areas and critical values with the normal distribution calculator.

Uniform: Every Value in a Range Equally Likely

The continuous uniform distribution is the model of pure ignorance within known bounds: any value between a minimum a and a maximum b is equally plausible, and nothing outside is possible. Its density is a flat line of height 1/(b − a).

mean = (a + b) / 2    variance = (b − a)² / 12

  • Hard limits exist on both sides and are known.
  • No region inside the limits is favored over any other.
  • Probability of an interval is just its length divided by (b − a).

Mini example

A shuttle departs every 30 minutes and you arrive at a random moment, so your wait is Uniform(0, 30). The average wait is (0 + 30) ÷ 2 = 15 minutes, the variance is 30² ÷ 12 = 75, and P(wait < 10) = 10 ÷ 30 ≈ 0.3333.

Typical uses: rounding error, arrival offsets against a fixed schedule, and random number generation — the uniform on (0, 1) is the raw material from which simulations build every other distribution.
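
To illustrate how a uniform draw becomes another distribution, the sketch below applies inverse transform sampling, one common technique, to turn Uniform(0, 1) draws into exponential ones. The rate of 0.1, the seed, and the function name are assumptions made for this example.

```python
import random
from math import log
from statistics import fmean

def exponential_from_uniform(u: float, rate: float) -> float:
    """Map u in [0, 1) to an Exponential(rate) draw by inverting the CDF.

    The exponential CDF is F(x) = 1 - exp(-rate * x); solving F(x) = u
    for x gives x = -ln(1 - u) / rate.
    """
    return -log(1.0 - u) / rate

rng = random.Random(42)   # fixed seed so the run is repeatable
rate = 0.1                # assumed rate; the theoretical mean is 1 / rate = 10

draws = [exponential_from_uniform(rng.random(), rate) for _ in range(100_000)]
sample_mean = fmean(draws)

print(f"sample mean = {sample_mean:.2f} (theory: {1 / rate:.2f})")
```

The same recipe works for any distribution whose CDF can be inverted. Other families typically need different constructions.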

Exponential: Waiting Time Until the Next Event

The exponential distribution measures the waiting time until the next event when events occur at a steady average rate — it is the continuous companion of the Poisson. Its parameter is the rate λ (events per unit time), giving a heavily right-skewed curve: short waits are common, long waits increasingly rare.

mean = 1 / λ    variance = 1 / λ²    P(X > t) = e^(−λt)

  • Models time (or distance) between independent events.
  • Memoryless: having already waited 10 minutes does not change the distribution of the remaining wait.
  • If counts per window are Poisson(λ), gaps between events are Exponential(λ).

Mini example

A sensor fails on average once every 10 hours. Assuming a constant failure rate (no wear-out), the time to failure is exponential with mean 10 hours (λ = 0.1 per hour). The chance the sensor survives beyond 15 hours is P(X > 15) = e^(−0.1 × 15) = e^(−1.5) ≈ 0.2231.
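
A short sketch of the survival formula P(X > t) = e^(−λt), which also checks the memoryless property numerically. The function name `survival` is illustrative.

```python
from math import exp

def survival(t: float, rate: float) -> float:
    """P(X > t) for X ~ Exponential(rate)."""
    return exp(-rate * t)

rate = 1 / 10                       # one failure per 10 hours on average
p_beyond_15 = survival(15, rate)    # chance the sensor outlasts 15 hours

# Memorylessness: given survival to hour 10, the chance of lasting
# another 15 hours equals the unconditional chance of lasting 15 hours.
p_conditional = survival(10 + 15, rate) / survival(10, rate)

print(f"P(X > 15)          = {p_beyond_15:.4f}")
print(f"P(X > 25 | X > 10) = {p_conditional:.4f}")
```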

Typical uses: time between support calls, component lifetimes without wear-out effects, gaps between radioactive decays, time to the next website signup.

Side-by-Side Comparison

Distribution | Type       | Parameters                    | Mean        | Variance      | Question It Answers
Binomial     | Discrete   | n (trials), p (success prob.) | np          | np(1 − p)     | How many successes in n fixed trials?
Poisson      | Discrete   | λ (average rate)              | λ           | λ             | How many events in a fixed window?
Normal       | Continuous | μ (mean), σ (std. dev.)       | μ           | σ²            | Where does a measurement fall?
Uniform      | Continuous | a (min), b (max)              | (a + b) / 2 | (b − a)² / 12 | Where in a range, all spots equal?
Exponential  | Continuous | λ (rate), mean 1/λ            | 1 / λ       | 1 / λ²        | How long until the next event?

Notice the family resemblances: binomial and Poisson both count, normal and uniform both measure, and exponential times the gaps that Poisson counts. Those pairings are the backbone of the decision process below.
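
The Poisson and exponential pairing can be seen in a small simulation: generate exponential gaps, drop the resulting events onto a timeline, and count how many land in each unit-length window. The rate of 4, the seed, and the number of windows are assumptions chosen for this sketch.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(7)   # fixed seed so the run is repeatable
rate = 4.0               # assumed: 4 events per unit of time on average
n_windows = 20_000       # number of unit-length windows to fill

counts = [0] * n_windows
t = rng.expovariate(rate)            # time of the first event
while t < n_windows:
    counts[int(t)] += 1              # the event falls in window floor(t)
    t += rng.expovariate(rate)       # add the next exponential gap

mean_count = fmean(counts)
var_count = pvariance(counts)

# If the counts are Poisson(rate), both figures should sit near 4.
print(f"mean per window = {mean_count:.3f}, variance = {var_count:.3f}")
```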

How to Choose the Right Distribution

Distribution choice is a modeling decision, not a lookup. The reliable path is to interrogate the process that generates the data:

  1. Count or measurement? Counts push you toward binomial or Poisson; measurements toward normal, uniform, or exponential.
  2. Is there a fixed number of trials? A known n with yes-or-no outcomes says binomial. Events accumulating over open-ended time or space say Poisson.
  3. Modeling the wait instead of the count? If the question is "how long until the next event?" rather than "how many events?", switch from Poisson to exponential.
  4. Symmetric pile-up or hard bounds? Many small additive influences suggest normal. Known hard limits with no interior preference suggest uniform.
  5. Check the fit. Compare a histogram against the candidate shape, check whether the mean-variance relationship holds (Poisson counts should have variance near their mean; binomial variance below it), and be suspicious of normal models for data that cannot go below zero and sit close to it.
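
The mean-variance check in step 5 can be sketched as a dispersion index: sample variance divided by sample mean. Both data sets below are hypothetical, invented only to show the contrast, and with samples this small the index is a rough guide rather than a formal test.

```python
from statistics import fmean, variance

def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean; near 1 for Poisson-like counts."""
    return variance(counts) / fmean(counts)

# Hypothetical weekly event counts, both averaging 3 per week
steady = [3, 1, 4, 2, 6, 3, 0, 5, 2, 4]
bursty = [0, 0, 9, 1, 0, 12, 0, 2, 0, 6]

for label, data in (("steady", steady), ("bursty", bursty)):
    print(f"{label}: mean = {fmean(data):.1f}, "
          f"dispersion index = {dispersion_index(data):.2f}")
```

An index far above 1, as in the bursty series, suggests the events cluster and a plain Poisson model would understate the variability. An index well below 1 is what a binomial count would tend to produce.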

When two models seem plausible, prefer the one whose assumptions you can defend from the mechanism — and remember that a distribution can be good enough for one purpose (approximating a mean) while failing another (predicting extreme tails).

A Complete Worked Example: One Website, Three Distributions

A small e-commerce site receives signups at a steady average of 4 per hour. Watch how three different questions about the same process each pull in a different distribution.

  1. How many signups in the next hour? Counts at a steady rate over a fixed window: Poisson with λ = 4. P(exactly 2) = e⁻⁴ × 4² ÷ 2! = 8e⁻⁴ ≈ 0.1465, and a completely empty hour has probability P(0) = e⁻⁴ ≈ 0.0183 — rare, but it will happen roughly once every 55 hours.
  2. How long until the next signup? Same process, but now the question is a waiting time: exponential with mean 60 ÷ 4 = 15 minutes. The probability of a gap longer than 30 minutes is P(X > 30) = e^(−30/15) = e⁻² ≈ 0.1353. A half-hour silence is uncommon, not alarming.
  3. How many of the next 10 signups buy something? Now there is a fixed n = 10 and a per-signup conversion probability, say p = 0.2: binomial. P(exactly 3 buyers) = C(10,3) × 0.2³ × 0.8⁷ ≈ 0.2013, with an expected np = 2 buyers.
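
The three answers above can be reproduced with a few lines of standard-library Python. This is a minimal sketch and the variable names are illustrative.

```python
from math import comb, exp, factorial

rate_per_hour = 4.0

# 1. Count in a fixed window: Poisson with lambda = 4
def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

p_two = poisson_pmf(2, rate_per_hour)
p_empty = poisson_pmf(0, rate_per_hour)
hours_between_empty = 1 / p_empty          # average spacing of empty hours

# 2. Gap between events: exponential with mean 60 / 4 = 15 minutes
mean_gap_minutes = 60 / rate_per_hour
p_gap_over_30 = exp(-30 / mean_gap_minutes)

# 3. Successes in fixed trials: binomial with n = 10, p = 0.2
n, p = 10, 0.2
p_three_buyers = comb(n, 3) * p**3 * (1 - p) ** (n - 3)
expected_buyers = n * p

print(f"P(2 signups) = {p_two:.4f}, P(empty hour) = {p_empty:.4f}")
print(f"empty hour about once every {hours_between_empty:.0f} hours")
print(f"P(gap > 30 min) = {p_gap_over_30:.4f}")
print(f"P(3 buyers) = {p_three_buyers:.4f}, expected buyers = {expected_buyers:.0f}")
```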

One business process, three questions, three distributions — chosen by the structure of each question (count in a window, gap between events, successes in fixed trials), not by staring at a histogram. This is the habit worth building: let the mechanism nominate the model, then verify with data.

Frequently Asked Questions

What is the difference between discrete and continuous distributions?

A discrete distribution describes a variable that can only take separate, countable values, such as the number of defects in a batch or heads in ten coin flips; each possible value gets its own probability. A continuous distribution describes a variable that can take any value in a range, such as weight or waiting time; probability is spread over intervals and computed as area under a density curve rather than assigned to single points.

How do I know which probability distribution fits my situation?

Start from the mechanism that generates the data rather than from the data alone. Ask whether the outcome is a count or a measurement, whether there is a fixed number of trials, whether events occur at a steady average rate, and whether values are bounded. A fixed number of independent yes-or-no trials points to the binomial; counts of events in a fixed window point to the Poisson; waiting times between such events point to the exponential; sums of many small influences point to the normal. Then check the fit against actual data before relying on it.

When should I use the Poisson distribution instead of the binomial?

Use the binomial when there is a fixed, known number of trials n and each trial either succeeds or fails, such as 200 emails of which some bounce. Use the Poisson when events occur at an average rate over a continuum of time or space with no meaningful upper limit on the count, such as calls arriving per hour. The Poisson also serves as a limiting approximation of the binomial when n is large and p is small with np moderate.
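
The limiting approximation can be checked numerically. The sketch below compares the two PMFs for n = 1000 and p = 0.003, values assumed here purely for illustration, so that λ = np = 3.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003        # assumed: many trials, small success probability
lam = n * p               # matching Poisson rate

for k in range(6):
    print(f"k={k}: binomial {binomial_pmf(k, n, p):.5f}  "
          f"poisson {poisson_pmf(k, lam):.5f}")

# Largest pointwise disagreement over k = 0..10
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
              for k in range(11))
print(f"largest gap: {max_gap:.6f}")
```

Raising p while holding np fixed (for example n = 10, p = 0.3) widens the gap, which is why the approximation is usually reserved for rare events across many trials.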

Why is the probability of any single exact value zero in a continuous distribution?

Because a continuous variable has infinitely many possible values in any interval, the probability assigned to one exact point is zero — there is no area under a curve above a single point. That is why continuous distributions answer questions about ranges, such as the probability a delivery takes between 20 and 30 minutes, and why the density function can exceed 1 without contradiction: only areas, not heights, are probabilities.
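
A small numeric illustration of a density exceeding 1 while probabilities stay at or below 1. The narrow Normal(0, 0.1) and the Uniform(0, 0.5) are examples assumed for this sketch.

```python
from statistics import NormalDist

# A tightly concentrated normal: the curve is tall because it is narrow.
narrow = NormalDist(mu=0, sigma=0.1)
peak_height = narrow.pdf(0)                            # density, not probability
area_near_peak = narrow.cdf(0.1) - narrow.cdf(-0.1)    # an actual probability

# Uniform(0, 0.5): flat density of height 1 / (b - a) = 2
a, b = 0.0, 0.5
uniform_height = 1 / (b - a)
uniform_area = uniform_height * (b - a)                # total probability

print(f"normal peak height = {peak_height:.3f}, area within 1 sigma = {area_near_peak:.4f}")
print(f"uniform height = {uniform_height}, total area = {uniform_area}")
```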

What do the parameters of a distribution actually control?

Parameters are the tuning knobs that pick one specific member out of a family of curves. The normal family needs a center (the mean) and a spread (the standard deviation); the binomial needs the number of trials and the success probability; the Poisson needs only the average event rate. Changing a parameter changes the location, spread, or shape of the distribution, but not the family's basic mathematical form.

What should I do if my data do not match any standard distribution?

First try transformations or mixtures: right-skewed data sometimes become normal after a log transform, and data drawn from two regimes may be a mixture of two simple distributions. If no parametric family fits, nonparametric methods — empirical distributions, bootstrapping, rank-based tests — let you proceed without assuming a specific shape. A poor fit is information: it often reveals outliers, dependence, or a data-generating process different from the one you assumed.
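
As one example of the nonparametric route, the sketch below builds a percentile bootstrap interval for a median. The data values, the seed, and the 5,000 resamples are all hypothetical choices made for illustration; this is one simple bootstrap variant, not the only one.

```python
import random
from statistics import median

rng = random.Random(0)   # fixed seed so the run is repeatable

# Hypothetical right-skewed measurements (for example, task times in minutes)
data = [2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.2, 5.1, 6.8, 9.5, 14.2, 27.0]

# Resample the data with replacement and record each resample's median
n_resamples = 5_000
boot_medians = sorted(
    median(rng.choices(data, k=len(data))) for _ in range(n_resamples)
)

# Percentile interval: the middle 95% of the bootstrap medians
lower = boot_medians[int(0.025 * n_resamples)]
upper = boot_medians[int(0.975 * n_resamples) - 1]

print(f"sample median = {median(data):.2f}")
print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f})")
```

No distributional family is assumed anywhere in this calculation. The observed data stand in for the unknown distribution.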

Last reviewed: July 2, 2026

Maintained by MathCalculate Editorial as part of the public math and statistics reference library.