Confidence Intervals Explained

A point estimate like "average order value: $50" sounds precise and hides everything that matters: a different sample would have given a different number. A confidence interval replaces the single value with an honest range — and comes with an interpretation that is famously easy to get wrong. This guide covers what the interval really claims, how it is built, and what makes it wider or narrower.

What a Confidence Interval Is

A confidence interval is a range of plausible values for an unknown population quantity — a mean, a proportion, a difference between groups — computed from sample data by a procedure with a known long-run success rate. The "95%" in a 95% interval is that success rate: it describes how often the procedure captures the truth across repeated use, not the odds attached to any single result.

The interval bundles three facts into one statement: where the estimate landed, how much sampling noise surrounds it, and how cautious you asked the procedure to be. That is why intervals are more informative than bare estimates and more informative than a plain "significant or not" verdict from a hypothesis test: they show magnitude, direction, and precision all at once.

The Interpretation Most People Get Wrong

Suppose a 95% interval for mean order value comes out as (48.04, 51.96). The natural reading — "there is a 95% probability the true mean is between 48.04 and 51.96" — is the one interpretation the frequentist framework does not license. Here is the distinction:

Correct

"This interval was produced by a method that captures the true mean in 95% of repeated samples. We act as if this is one of the successful 95%."

Not quite

"There is a 95% probability that the true mean lies inside this particular interval."

Why the fuss? Once the data are in, nothing is random anymore. The true mean is a fixed (unknown) number; the computed interval is a fixed range. The interval either caught the mean or missed it — probability no longer applies to this one case, only to the procedure's track record. Think of a ring-toss player who rings the peg on 95% of throws: after a throw lands, the ring is either on the peg or not. The 95% describes the player, not the throw lying on the ground.

The distinction is not pedantry. Treating one interval as a personal 95% probability invites overconfidence in results that happened to come from an unlucky sample — and roughly 1 interval in 20 does, by design. If you want genuine probability statements about the parameter itself, that requires the Bayesian framework and a credible interval, which answers a different question from different ingredients.
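The long-run reading can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with a true mean of 50 and a known σ of 10 (numbers borrowed from the order-value example, not real data), draws many samples, builds a 95% z interval from each, and counts how often the interval catches the true mean. The function name `coverage` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(true_mean=50.0, sigma=10.0, n=100, level=0.95, trials=4000, seed=0):
    """Return the share of simulated intervals that contain true_mean."""
    rng = random.Random(seed)                    # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% level
    margin = z * sigma / n ** 0.5                # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        # One "study": n simulated orders from the assumed population.
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

print(f"share of intervals that caught the true mean: {coverage():.3f}")
```

The printed share should sit close to 0.95. Each simulated interval either caught the mean or missed it; the 95% only appears when the intervals are counted together.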

Anatomy of the Margin of Error

Nearly every basic confidence interval has the same skeleton:

interval = point estimate ± (critical value × standard error)

The part after the ± sign is the margin of error. The formula has three ingredients, and each answers to a different master:

Point estimate — the sample's best single guess, such as the sample mean x̄. It anchors the center of the interval.
Critical value — set by the confidence level you demand: 1.645 for 90%, 1.96 for 95%, 2.576 for 99% under the normal model. More confidence means a larger multiplier and a wider interval. This is the only knob that costs no data.
Standard error — the sampling noise in the estimate, σ/√n for a mean. It shrinks only with more data (larger n) or a less variable population (smaller σ).

Reading results in this decomposition is a useful habit: a wide interval from a huge critical value reflects a cautious analyst, while a wide interval from a huge standard error reflects noisy or scarce data. The margin of error calculator lets you isolate exactly this part of the computation.
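As a rough sketch of that decomposition, the snippet below builds an interval from its three parts and shows two intervals with the same width for different reasons. The helper names `critical_z` and `interval` are made up for illustration, and the normal model is assumed for the critical values.

```python
from statistics import NormalDist

def critical_z(level):
    """Two-sided normal critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def interval(estimate, critical, standard_error):
    """Return (lower, upper, margin) for estimate ± critical × standard_error."""
    margin = critical * standard_error
    return estimate - margin, estimate + margin, margin

# Wide because the analyst asked for 99% confidence (large critical value).
cautious = interval(50.0, critical_z(0.99), 1.0)

# Equally wide at 95% confidence because the standard error is larger.
inflated_se = critical_z(0.99) / critical_z(0.95)
noisy = interval(50.0, critical_z(0.95), inflated_se)

print(f"cautious analyst: margin {cautious[2]:.3f}")
print(f"noisy data:       margin {noisy[2]:.3f}")
```

Both margins print as the same number, which is the point: the width alone does not say whether caution or noise produced it.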

z or t: Which Critical Value?

The critical value comes from a reference distribution, and there are two candidates. Use z (standard normal) when the population standard deviation σ is genuinely known — rare outside of quality-control settings with long process history — or as a convenient approximation when the sample is large. Use t when σ is estimated by the sample standard deviation s, which is the everyday case. The t distribution's heavier tails widen the interval just enough to account for the extra uncertainty of estimating spread from the same data.

How different are they? The 95% critical value tells the story as the sample grows — t carries a small-sample penalty that fades:

t (df = 9) = 2.262
t (df = 29) = 2.045
t (df = 99) = 1.984
z = 1.960

At ten observations the t multiplier is 15% larger than z; at a hundred, the gap is barely 1%. The safe default is t whenever s comes from the data — it can only make you slightly more honest. The same distribution powers the t-test, which is a confidence interval's decision-making twin.
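The critical values above can be reproduced without a statistics package. The following is a standard-library sketch, not a production routine: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names are hypothetical, and the search bracket of 0 to 100 is an assumption that would fail for extreme cases such as df = 1 at very high confidence. In everyday work a library call such as SciPy's `scipy.stats.t.ppf` is the usual route.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral over [0, x]."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, level=0.95):
    """Two-sided t critical value, located by bisection on the CDF."""
    target = 0.5 + level / 2
    lo, hi = 0.0, 100.0                # assumed bracket for the answer
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (9, 29, 99):
    t = t_critical(df)
    print(f"df = {df:>2}: t = {t:.3f}, {100 * (t / z - 1):.1f}% above z")
```

The loop prints the t multiplier next to its percentage excess over z, so the fading small-sample penalty is visible directly.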

A Complete Worked Example

An online store samples n = 100 orders and finds a mean order value of x̄ = $50 with sample standard deviation s = $10. Build the 95% confidence interval for the true mean order value:

Standard error: SE = 10 ÷ √100 = 10 ÷ 10 = 1.
Critical value: at 95% confidence with a large sample, z = 1.96.
Margin of error: 1.96 × 1 = 1.96.
Interval: 50 ± 1.96 = (48.04, 51.96).

The store's report reads: "we are 95% confident the true average order value lies between $48.04 and $51.96" — with "confident" carrying the long-run meaning from earlier, a claim about the reliability of the method rather than a probability for this specific range.

Since s was estimated from the sample, a purist would use t with 99 degrees of freedom: 50 ± 1.984 × 1 = (48.02, 51.98). The two intervals differ by two cents on a fifty-dollar mean, which is why large-sample practice tolerates z. At n = 10 the same substitution would visibly matter. The confidence interval calculator runs both versions of this exact computation.
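The arithmetic of this worked example fits in a few lines. This is a minimal sketch using the figures quoted above; the t critical value 1.984 is taken from the text rather than computed, and the helper name `ci` is made up for illustration.

```python
import math

n, mean, s = 100, 50.0, 10.0          # figures from the worked example
se = s / math.sqrt(n)                 # standard error: 10 / 10 = 1.0

def ci(critical):
    """Interval for the mean, rounded to cents."""
    margin = critical * se
    return round(mean - margin, 2), round(mean + margin, 2)

z_interval = ci(1.96)     # large-sample normal critical value
t_interval = ci(1.984)    # t critical value for df = 99, as quoted in the text

print("z interval:", z_interval)   # (48.04, 51.96)
print("t interval:", t_interval)   # (48.02, 51.98)
```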

What Moves the Width

Staying with the order-value example (x̄ = 50, SE = 1), watch the interval respond to the three things that set its width:

Confidence level. At 90%, the margin is 1.645; at 95% it is 1.96; at 99% the interval becomes 50 ± 2.576 = (47.42, 52.58). Raising confidence from 95% to 99% stretches the width by about 31% — you buy security with vagueness, using the same data.
Sample size. Quadrupling the sample to n = 400 halves the standard error to 0.5, shrinking the 95% interval to 50 ± 1.96 × 0.5 = 50 ± 0.98. Precision follows √n: each halving of the width costs four times the data.
Population spread. A more variable population (larger σ) widens the interval proportionally. No choice of formula can change that, though better measurement or a stratified sampling design can sometimes reduce the variability the estimate has to absorb.

One subtlety: the confidence level is chosen, not discovered. Reporting a 90% interval because the 95% one looked too wide is moving the goalposts after the kick — the level belongs in the analysis plan, before the data arrive.
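The width figures quoted in this section can be reproduced with a short script. It is a sketch under the normal model, using the example's s = 10; the function name `margin` is hypothetical.

```python
import math
from statistics import NormalDist

def margin(level, s, n):
    """Margin of error for a mean: normal critical value × s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * s / math.sqrt(n)

# Lever 1: confidence level, with the data held fixed at n = 100.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} margin at n = 100: {margin(level, 10, 100):.3f}")

stretch = margin(0.99, 10, 100) / margin(0.95, 10, 100) - 1
print(f"moving from 95% to 99% widens the interval by {stretch:.0%}")

# Lever 2: sample size, with the level held fixed at 95%.
print(f"95% margin at n = 400: {margin(0.95, 10, 400):.2f}")
```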

Planning Sample Size Backward

The margin-of-error formula runs in reverse, which turns it into a planning tool. Fix the margin E you can tolerate, and solve for the sample size that delivers it:

n = (z × σ / E)²

Suppose the store wants next quarter's estimate pinned to within ±$1 at 95% confidence, and past data suggest σ ≈ 10. Then n = (1.96 × 10 ÷ 1)² = 384.16, rounded up to 385 orders — rounding down would miss the target margin. Tightening the goal to ±$0.50 quadruples the requirement to 1,537. Running this calculation before collecting data is one of the cheapest quality controls in statistics, and it is exactly what the sample size calculator automates. Skipping it risks the most common outcome in underpowered studies: an interval technically valid and practically useless, spanning every answer anyone proposed.
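The planning formula translates directly into code. The sketch below assumes the normal critical value 1.96 and a guessed σ, exactly as in the example; `required_n` is a hypothetical name, and the result is only as good as the σ estimate fed into it.

```python
import math

def required_n(sigma, target_margin, z=1.96):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most target_margin."""
    return math.ceil((z * sigma / target_margin) ** 2)   # always round up

print(required_n(10, 1.0))    # 385
print(required_n(10, 0.5))    # 1537
```

Halving the target margin roughly quadruples the answer, which matches the square-root behavior of the standard error.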

Other Misreadings to Avoid

It is not a range for individuals. (48.04, 51.96) bounds the plausible mean, not typical orders. Individual orders spread with a standard deviation of about $10, so most of them fall outside the interval.
It is not a 95% catchment for future sample means. A future study has its own sampling noise; predicting its mean needs a wider prediction interval.
Overlapping intervals do not prove no difference. Two group intervals can overlap while the interval for their difference excludes zero. Compare differences directly.
Coverage depends on assumptions. Independence, random sampling, and (for small n) approximate normality do real work. Biased sampling breaks the 95% guarantee silently — the formula computes either way.
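Three of these misreadings can be put into numbers. The sketch below rests on assumptions the text does not state: order values are treated as roughly normal, a future study is assumed to use the same sample size, and the second group (mean 53.5, SE 1) is invented purely to show the overlap point.

```python
import math
from statistics import NormalDist

mean, s, se = 50.0, 10.0, 1.0
lower, upper = 48.04, 51.96

# 1. Share of individual orders inside the interval for the mean,
#    ASSUMING roughly normal order values with mean 50 and SD 10.
orders = NormalDist(mean, s)
share_inside = orders.cdf(upper) - orders.cdf(lower)

# 2. Long-run chance that a same-size replication's mean lands in the
#    original interval: the gap between two independent sample means
#    has standard deviation se * sqrt(2).
capture = 2 * NormalDist().cdf(1.96 / math.sqrt(2)) - 1

# 3. Overlapping intervals with a difference that excludes zero
#    (group B is hypothetical).
def ci(estimate, std_err, z=1.96):
    return estimate - z * std_err, estimate + z * std_err

group_a = ci(50.0, 1.0)
group_b = ci(53.5, 1.0)
difference = ci(53.5 - 50.0, math.sqrt(1.0 ** 2 + 1.0 ** 2))
intervals_overlap = group_a[1] > group_b[0]
difference_excludes_zero = difference[0] > 0

print(f"individual orders inside the interval: {share_inside:.1%}")
print(f"future sample means captured:          {capture:.1%}")
print(f"intervals overlap: {intervals_overlap}, "
      f"difference excludes zero: {difference_excludes_zero}")
```

Under these assumptions only about one order in six falls inside the interval, and noticeably fewer than 95% of replication means do.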

Frequently Asked Questions

What does 95% confident actually mean?

The 95% describes the procedure, not the single interval in front of you. If you repeated the same study many times and built an interval from each sample the same way, about 95% of those intervals would capture the true parameter and about 5% would miss it. Any one interval either contains the parameter or it does not — the confidence level is the long-run success rate of the method that produced it.

Is it wrong to say there is a 95% probability the true mean is in my interval?

In the frequentist framework that defines confidence intervals, yes. Once the interval is computed, the true mean is a fixed number and the interval is a fixed range, so there is no probability left — the statement is either true or false, you just do not know which. Probability statements about the parameter itself belong to Bayesian credible intervals, which are built from a different framework with an explicit prior distribution.

When should I use a t interval instead of a z interval?

Use t whenever you estimate the population standard deviation from the sample itself, which is nearly always in practice. The t distribution has heavier tails that pay for the extra uncertainty in that estimate. With large samples the two converge — at 100 observations the t critical value is 1.984 versus 1.96 for z — so the choice matters most for small samples, where t intervals are meaningfully wider and z intervals would be overconfident.

Does a 95% confidence interval contain 95% of the data?

No. A confidence interval for a mean describes uncertainty about the average, not the spread of individual observations. In the worked example, the interval (48.04, 51.96) says the population mean order value is plausibly near 50; individual orders still range far outside it. An interval meant to cover most individual values is a prediction or tolerance interval, and it is much wider because individual values vary by the full standard deviation, not the standard error.

How much does doubling the sample size narrow the interval?

Width shrinks with the square root of the sample size, so doubling n divides the width by about 1.41, not by 2. To cut the margin of error in half you must quadruple the sample: in the worked example, moving from 100 to 400 orders shrinks the 95% margin from 1.96 to 0.98. This square-root economics is why precision gets expensive quickly and why studies plan sample size around a target margin in advance.

Why not always use a 99% confidence interval to be safe?

Higher confidence costs width. With the same data, moving from 95% to 99% swaps the critical value 1.96 for 2.576, stretching the interval by about 31% and making it less informative for decisions. An interval so wide it includes every plausible option offers certainty about nothing. The working compromise in most fields is 95%, with 90% used when narrower ranges are acceptable and 99% when the cost of missing the parameter is high.

Last reviewed: July 2, 2026

Maintained by MathCalculate Editorial as part of the public math and statistics reference library.