Outlier Calculator
Screen any data set for outliers with two standard methods and see exactly which values get flagged, which thresholds did the flagging, and what your data looks like with the outliers set aside. Both methods are stated precisely below, because they do not always agree — and the disagreement is often the most informative part.
Choosing a Method
1.5 × IQR fences (Tukey's rule): flags values far from the quartiles. Robust — the suspect values cannot distort the thresholds that judge them. The default choice, especially for small or skewed data sets.
Z-score threshold: flags values more than a chosen number of standard deviations from the mean (3 is conventional). Appropriate for larger samples that are roughly bell-shaped; see the caveats below for why it struggles on small ones.
Data Entry Tips
- Separate values with commas, spaces, or line breaks; the tool sorts internally, so order is irrelevant.
- Screen one variable at a time — mixing units or merging groups manufactures false outliers.
- A flagged value is a question, not a verdict. The sections after the calculator cover when to remove and when to keep.
Related Calculators
Quartile Calculator
Calculate Q1, Q2, Q3, IQR, and Tukey fences with automatic outlier detection.
Empirical Rule Calculator
Apply the 68-95-99.7 rule to get one, two, and three standard deviation ranges.
Normal Distribution Calculator
Work with the normal curve, cumulative probabilities, and related z-score outputs.
Both Methods, Stated Precisely
Outlier detection only means something when the rule is written down. These are the exact rules this calculator applies:
IQR method (Tukey's fences):
outlier if x < Q1 − 1.5 × IQR or x > Q3 + 1.5 × IQR
extreme outlier if x < Q1 − 3 × IQR or x > Q3 + 3 × IQR
Z-score method:
z = (x − x̄) / s, outlier if |z| > t
Where:
- Q1, Q3 = quartiles by the median-split (Moore & McCabe) method, IQR = Q3 − Q1
- x̄ = mean, s = sample standard deviation (n − 1 divisor)
- t = your threshold (default 3)
The quartiles here are computed exactly as in our quartile calculator and IQR calculator, so all three tools flag identical values. Individual z-scores for any single value can be explored in the z-score calculator.
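The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function names `median_split_quartiles`, `iqr_screen`, and `z_screen` are hypothetical, the sample data is made up, and the sketch assumes at least two values that are not all identical (otherwise the quartiles or the standard deviation are undefined).

```python
from statistics import mean, median, stdev


def median_split_quartiles(values):
    """Return (Q1, Q3) by the median-split rule: when the count is odd,
    the median itself is left out of both halves."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    return median(lower), median(upper)


def iqr_screen(values, k=1.5):
    """Return (low_fence, high_fence, flagged) for fences at k * IQR.
    Use k=1.5 for the inner fences and k=3 for the outer fences."""
    q1, q3 = median_split_quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in values if x < low or x > high]
    return low, high, flagged


def z_screen(values, t=3.0):
    """Return the values with |z| > t, using the sample SD (n - 1 divisor)."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs((x - m) / s) > t]


# Made-up sample, for illustration only.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 45]
print(iqr_screen(sample))        # (8.5, 24.5, [45])
print(iqr_screen(sample, k=3))   # (2.5, 30.5, [45])
print(z_screen(sample))          # []
print(z_screen(sample, t=2.5))   # [45]
```

On this sample the fences flag 45 as an extreme outlier, while the z-score screen at the default threshold of 3 flags nothing.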
Why the Two Methods Disagree
The z-score method has a structural weakness called masking: the outlier it is hunting inflates both the mean and the standard deviation used to judge it, so the culprit drags the goalposts toward itself. The IQR method is immune, because quartiles depend on ranks — a value can wander arbitrarily far without moving Q1 or Q3 by a hair.
Small samples make masking mathematically inescapable. Shiffler (1988) proved that the largest possible |z| in a sample of size n is (n − 1)/√n when z is computed with the sample standard deviation, and this bound stays below 3 whenever n ≤ 10. In other words, a data set of ten or fewer values cannot contain a value with |z| > 3, no matter how extreme that value is; the worked example below shows this happening to a value more than ten times larger than anything else in its data set. The two methods also part ways on skewed data: z-scores assume symmetric spread around the mean, so on a right-skewed distribution they over-flag the long tail and under-flag the short one, while quartile-based fences adapt to where the data actually sits.
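To see how the ceiling grows with sample size, the bound can be tabulated directly. The sketch below only evaluates the formula (n − 1)/√n quoted above; the helper names `max_abs_z` and `smallest_n_for_threshold` are hypothetical.

```python
import math


def max_abs_z(n):
    """Largest |z| a single value can reach in a sample of size n,
    when z uses the sample standard deviation (n - 1 divisor)."""
    return (n - 1) / math.sqrt(n)


def smallest_n_for_threshold(t):
    """Smallest sample size at which |z| > t is achievable at all."""
    n = 2
    while max_abs_z(n) <= t:
        n += 1
    return n


for n in (5, 10, 11, 20, 30):
    print(n, round(max_abs_z(n), 3))

print(smallest_n_for_threshold(3))  # 11
```

With a threshold of 3, the screen can only fire once the sample has at least 11 values; at a threshold of 2.5 the minimum drops to 9.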
Remove, Keep, or Investigate?
Detection is the easy part; the decision is where analyses go wrong. A defensible workflow:
- Trace the value first. A decimal slip, unit mix-up, or sensor glitch should be corrected at the source or removed — with a note saying so.
- Keep genuine rare events. If a 30-hour delivery really happened, deleting it fabricates a rosier process than the one you run. Consider reporting medians and IQRs, which tolerate the value without being dominated by it.
- Report both versions when in doubt. Running the analysis with and without flagged values shows readers exactly how much the conclusions lean on a handful of points.
- Never delete silently. Documented exclusion criteria, set before looking at results, are what separate cleaning from cherry-picking.
The cleaned data set preview in the results is for exactly this comparison work — it is a what-if view, not a recommendation to discard.
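A with-and-without comparison of the kind described in the workflow can be sketched as follows. The function name `compare_with_and_without` and the delivery times are invented for illustration; the flagged list would come from whichever screen you chose.

```python
from statistics import mean, median, stdev


def compare_with_and_without(values, flagged):
    """Summarize the data twice: all values, then with the flagged
    values set aside. Nothing is deleted from the input list."""
    remaining = list(values)
    for x in flagged:
        remaining.remove(x)  # drops one occurrence per flagged entry

    def summary(xs):
        return {"n": len(xs), "mean": mean(xs),
                "median": median(xs), "sd": stdev(xs)}

    return {"all": summary(values), "without_flagged": summary(remaining)}


# Hypothetical delivery times in hours; 30 is the flagged value.
hours = [2, 3, 3, 4, 4, 5, 30]
report = compare_with_and_without(hours, flagged=[30])
for label, stats in report.items():
    print(label, stats)
```

In this made-up case the mean falls from about 7.3 to 3.5 hours when the flagged value is set aside, while the median only moves from 4 to 3.5, which is the argument for reporting robust summaries alongside the mean.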
Mild vs Extreme Outliers
Tukey's original scheme used two rings of fences. Values between 1.5 × IQR and 3 × IQR beyond the quartiles are mild outliers — unusual, worth a look, but expected occasionally even in clean data (roughly 0.7% of values from a normal distribution land there). Values beyond 3 × IQR are extreme outliers, which clean normal data essentially never produces; they nearly always trace back to an error or to a genuinely different regime, such as a system failure mixed in with routine measurements. In a box plot, both kinds appear as individual points beyond the whiskers; this calculator separates them so you can prioritize which values to chase down first.
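The 0.7% figure can be checked against the standard normal distribution. This sketch uses the theoretical quartiles of a normal curve rather than sample quartiles, so it describes the long-run rate; small samples will vary around it.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

q3 = std_normal.inv_cdf(0.75)  # about 0.6745; Q1 is its mirror image
iqr = 2 * q3                   # about 1.349
inner = q3 + 1.5 * iqr         # inner fence, about 2.698 SDs from the mean
outer = q3 + 3.0 * iqr         # outer fence, about 4.721 SDs from the mean

# Two-sided shares of a normal population beyond each pair of fences.
beyond_inner = 2 * (1 - std_normal.cdf(inner))
beyond_outer = 2 * (1 - std_normal.cdf(outer))
mild_share = beyond_inner - beyond_outer

print(f"mild: {mild_share:.2%}")
print(f"extreme: {beyond_outer:.5%}")
```

The mild share comes out at roughly 0.7%, and the extreme share at a few values per million.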
Worked Example: One Value, Two Verdicts
Take the ten values 1, 2, 3, 4, 5, 6, 7, 8, 9, 100 — nine small numbers and one that is obviously different. Both methods, by hand:
- IQR method: the lower half is 1, 2, 3, 4, 5, so Q1 = 3; the upper half is 6, 7, 8, 9, 100, so Q3 = 8; IQR = 5.
- Inner fences: 3 − 7.5 = −4.5 and 8 + 7.5 = 15.5. The value 100 lies far outside, so it is flagged.
- Outer fences: 3 − 15 = −12 and 8 + 15 = 23. The value 100 lies beyond the upper outer fence as well, so it is an extreme outlier.
- Z-score method: the mean is 145 ÷ 10 = 14.5 and the sample standard deviation is √(8182.5 ÷ 9) ≈ 30.1524 — both hugely inflated by the 100.
- The verdict flips: z for 100 = (100 − 14.5) ÷ 30.1524 ≈ 2.8356, just under the threshold of 3. The z-score method reports no outliers at all.
This is masking in action, and also Shiffler's bound: with n = 10, no z-score can exceed 9/√10 ≈ 2.846, so the conventional |z| > 3 rule literally cannot fire. Lowering the threshold to 2.5 makes the z-score method flag the 100 — but the deeper lesson is that on small samples, the IQR fences are the trustworthy screen.
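The hand calculation can be replayed with Python's standard library. This is a check of the arithmetic above, not the calculator's own code; the last lines also push the extreme value out to one million (an illustrative choice) to show that the z-score still cannot pass the ceiling.

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# IQR method: ten values, so the halves are the first five and the last five.
xs = sorted(data)
q1, q3 = median(xs[:5]), median(xs[5:])
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer = (q1 - 3 * iqr, q3 + 3 * iqr)

# Z-score method with the sample standard deviation (n - 1 divisor).
m, s = mean(data), stdev(data)
z_100 = (100 - m) / s

print(q1, q3, iqr)    # 3 8 5
print(inner, outer)   # (-4.5, 15.5) (-12, 23)
print(round(m, 4), round(s, 4), round(z_100, 4))  # 14.5 30.1524 2.8356

# Making the extreme value far larger does not rescue the z-score screen.
far = data[:-1] + [1_000_000]
z_far = (1_000_000 - mean(far)) / stdev(far)
bound = 9 / 10 ** 0.5  # Shiffler's ceiling for n = 10, about 2.846
print(z_far < bound)  # True
```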
Frequently Asked Questions
Which detection method should I use?
Default to the 1.5 × IQR fences: they are robust, work on skewed data, and behave sensibly on small samples. Choose the z-score method when your data set is large (a few dozen values or more) and roughly bell-shaped, or when your field's reporting conventions are written in standard deviations. When the two methods disagree, trust the IQR verdict on small or skewed data.
Why did the z-score method find nothing when one value is obviously extreme?
The extreme value inflates the very mean and standard deviation used to test it, an effect called masking. There is also a hard ceiling: in a sample of n values, no z-score can exceed (n − 1)/√n, which is below 3 for any sample of 10 or fewer values. Either lower the threshold or, better, switch to the IQR method.
Does the z-score method here use the population or sample standard deviation?
The sample standard deviation, with the n − 1 divisor, following the standard NIST definition of the z-score screen. On small data sets this makes the standard deviation slightly larger than the population version, so every |z| is slightly smaller and the screen flags slightly fewer values. With more than a few dozen values the two versions give practically identical z-scores.
What is the difference between an outlier and an extreme outlier?
Both come from Tukey's fences. Values beyond 1.5 × IQR from the quartiles are outliers (mild ones, in Tukey's terms); values beyond 3 × IQR are extreme outliers. Mild outliers occur naturally in about 0.7% of normally distributed data, so a few are unremarkable in large samples. Extreme outliers essentially never arise from clean normal data and deserve immediate investigation.
Is it statistically acceptable to delete the outliers this tool finds?
Only with justification. Remove a value when you can show it is an error: a typo, a broken instrument, a unit mix-up. Keep it when it is a real observation, and consider robust summaries (median, IQR) or reporting results with and without it. Deleting real data points solely because they are inconvenient biases every statistic you compute afterward.