Covariance Calculator
Measure whether two variables move together. Paste paired X and Y data, choose the sample or population formula, and get the covariance with the means, the Pearson correlation for context, and — for small data sets — a full deviation-products table showing every step of the computation.
Population or Sample Covariance?
Sample covariance (divide by n − 1): your pairs are a sample standing in for a larger population — the usual case in coursework and research.
Population covariance (divide by n): your pairs are the entire group of interest, with nothing left unmeasured.
The choice changes only the divisor, but on small data sets the difference is visible, so decide before you quote a number.
Preparing Paired Data
- X and Y must have the same number of values, and order matters: the third X is paired with the third Y.
- Separate values with commas, spaces, or line breaks; a spreadsheet column pasted into each box works directly, one value per line.
- Do not sort either list independently — sorting breaks the pairing and destroys the covariance you are trying to measure.
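The input rules above can be sketched in Python. This is a minimal illustration, not the calculator's actual code: `parse_values` and `parse_pairs` are hypothetical names, and the sketch assumes a regular-expression split is an acceptable stand-in for whatever parser the page really uses.

```python
import re

def parse_values(text):
    """Split on commas, spaces, or line breaks and convert each token to a float."""
    # Any run of commas and/or whitespace counts as one separator.
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]

def parse_pairs(x_text, y_text):
    """Pair the i-th X with the i-th Y; refuse lists of unequal length."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    # zip preserves input order: neither list is sorted, so the pairs stay intact.
    return list(zip(xs, ys))

# X typed with commas, Y pasted as a spreadsheet column (one value per line).
pairs = parse_pairs("2, 4, 6, 8, 10", "65\n70\n75\n85\n90")
print(pairs)  # [(2.0, 65.0), (4.0, 70.0), (6.0, 75.0), (8.0, 85.0), (10.0, 90.0)]
```

Raising an error on unequal lengths, rather than silently truncating the longer list, keeps a missing value from quietly shifting every later pair.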
Related Calculators
Variance Calculator
Compute population or sample variance with every deviation and squared term shown.
Standard Deviation Calculator
Calculate standard deviation, variance, and spread with clear statistical outputs.
Mean Absolute Deviation (MAD) Calculator
Calculate the average distance from the mean step by step, plus the median absolute deviation.
What Covariance Measures
Covariance asks one question of every pair: when X sits above its mean, does Y tend to sit above its mean too? Each pair contributes the product of its two deviations. If both deviations are positive or both are negative, the product is positive and pushes the covariance up; if one is positive and the other negative, the product is negative and pulls it down. Summing these products over all pairs and dividing by the chosen divisor gives the final number.
Sample: cov(X, Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
Population: cov(X, Y) = Σ(xᵢ − μₓ)(yᵢ − μᵧ) / n
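The two formulas differ only in the divisor, so a single flag can switch between them. A minimal Python sketch; the function name `covariance` and its `sample` flag are illustrative assumptions, not part of the calculator:

```python
def covariance(xs, ys, sample=True):
    """Sum of deviation products divided by n - 1 (sample) or n (population)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    if n == 0:
        raise ValueError("no data")
    if sample and n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Each pair contributes the product of its two deviations.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    divisor = n - 1 if sample else n
    return total / divisor


xs = [2, 4, 6, 8, 10]
ys = [65, 70, 75, 85, 90]
print(covariance(xs, ys))                # 32.5
print(covariance(xs, ys, sample=False))  # 26.0
```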
A positive covariance means the variables rise together, a negative one means one rises as the other falls, and a value near zero means no linear pattern. What the raw number cannot tell you is how strong the relationship is: covariance carries the units of X times the units of Y, so measuring height in centimeters instead of meters multiplies the covariance by 100 without changing the relationship at all.
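To see the units effect concretely, the sketch below rescales one variable from meters to centimeters. The heights and weights are made-up values for illustration only; the covariance grows by a factor of 100 (up to floating-point rounding) even though the data describe the same three people.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance: sum of deviation products over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Made-up heights and weights for three people (illustration only).
heights_m = [1.60, 1.70, 1.80]
weights_kg = [55.0, 65.0, 80.0]
heights_cm = [h * 100 for h in heights_m]  # same people, different unit

cov_m = sample_cov(heights_m, weights_kg)    # units: m·kg, about 1.25
cov_cm = sample_cov(heights_cm, weights_kg)  # units: cm·kg, about 125
print(cov_cm / cov_m)  # 100, up to floating-point rounding
```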
From Covariance to Correlation
The fix for the units problem is standardization: divide the covariance by the product of the two standard deviations. The result is the Pearson correlation coefficient — literally a standardized covariance — which always lands between −1 and +1 regardless of units:
r = cov(X, Y) / (sₓ · sᵧ)
This page reports r alongside the covariance because the two answer different questions: covariance gives the direction and the raw co-movement in original units (which matrix methods like PCA and portfolio variance need), while r grades the strength of the linear relationship on a universal scale. Conveniently, r is the same whether you standardize the sample or the population covariance, provided the standard deviations use the same divisor: the n − 1 or n appears in both numerator and denominator and cancels. For scatter plots, interpretation guides, and significance testing of r, see the correlation calculator.
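The following sketch computes r from the formula above and checks that the divisor cancels. The `ddof` parameter name ("delta degrees of freedom": 1 for sample, 0 for population) is borrowed from NumPy's convention as a naming assumption; the code itself uses only the standard library.

```python
import math

def cov(xs, ys, ddof=1):
    """Covariance with divisor n - ddof: ddof=1 for sample, ddof=0 for population."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - ddof)

def pearson_r(xs, ys, ddof=1):
    """r = cov(X, Y) / (s_x * s_y), with all three using the same divisor."""
    s_x = math.sqrt(cov(xs, xs, ddof))  # standard deviation of X
    s_y = math.sqrt(cov(ys, ys, ddof))  # standard deviation of Y
    return cov(xs, ys, ddof) / (s_x * s_y)

xs = [2, 4, 6, 8, 10]
ys = [65, 70, 75, 85, 90]
print(round(pearson_r(xs, ys, ddof=1), 4))  # 0.9912
print(round(pearson_r(xs, ys, ddof=0), 4))  # 0.9912
```

Mixing divisors (a sample covariance over population standard deviations, say) would scale r by n/(n − 1) and could push it past 1, which is why all three quantities share one `ddof` here.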
Sample vs Population: The n − 1 Question
Dividing by n − 1 (Bessel's correction) compensates for using the sample means x̄ and ȳ instead of the unknown population means. Because the sample means are computed from the same data, they sit closer to the points than the true means do, so deviations measured from them are systematically a little too small and their sum of products is biased toward zero. Dividing by the slightly smaller n − 1 re-inflates the estimate to make it unbiased. It is the same correction, for the same reason, as in the variance calculator; in fact, the variance of X is just cov(X, X).
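Because variance is cov(X, X), a covariance routine fed the same list twice should agree with Python's standard-library `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n). A quick check, using a hypothetical `covariance` helper written for this illustration:

```python
import math
import statistics

def covariance(xs, ys, sample=True):
    """Sum of deviation products over n - 1 (sample) or n (population)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

xs = [2, 4, 6, 8, 10]
# Feeding X in twice turns every deviation product into a squared deviation.
var_sample = covariance(xs, xs, sample=True)  # 40 / 4 = 10.0
var_pop = covariance(xs, xs, sample=False)    # 40 / 5 = 8.0
print(math.isclose(var_sample, statistics.variance(xs)))  # True
print(math.isclose(var_pop, statistics.pvariance(xs)))    # True
```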
With 500 pairs the two divisors differ by 0.2% and the distinction is academic. With 5 pairs, the sample covariance is 25% larger than the population version — exactly the ratio you can verify in the worked example below.
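The gap between the two conventions is 100/(n − 1) percent, since n/(n − 1) − 1 = 1/(n − 1). A short loop reproduces the figures above for any n; `gap_percent` is an illustrative name.

```python
def gap_percent(n):
    """How much larger (in %) sample covariance is than population covariance."""
    # n / (n - 1) - 1 simplifies to 1 / (n - 1).
    return 100 / (n - 1)

for n in (5, 30, 500):
    print(f"n = {n:>3}: sample covariance is {gap_percent(n):.1f}% larger")
# n =   5: sample covariance is 25.0% larger
# n =  30: sample covariance is 3.4% larger
# n = 500: sample covariance is 0.2% larger
```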
Worked Example: Study Hours and Exam Scores
Five students report study hours X = 2, 4, 6, 8, 10 and score Y = 65, 70, 75, 85, 90 on the exam. Step by step:
- Means: x̄ = 30 ÷ 5 = 6 hours; ȳ = 385 ÷ 5 = 77 points.
- Deviations: X gives −4, −2, 0, 2, 4; Y gives −12, −7, −2, 8, 13.
- Products: (−4)(−12) = 48, (−2)(−7) = 14, (0)(−2) = 0, (2)(8) = 16, (4)(13) = 52.
- Sum: 48 + 14 + 0 + 16 + 52 = 130.
- Sample covariance: 130 ÷ (5 − 1) = 32.5 hour·points; population covariance: 130 ÷ 5 = 26 hour·points.
- Correlation: with Σ(xᵢ − x̄)² = 40 and Σ(yᵢ − ȳ)² = 430, r = 130 ÷ √(40 × 430) ≈ 0.9912.
Every deviation product is zero or positive: students above the average study time were above the average score without exception. The covariance of 32.5 reports that co-movement in raw hour·point units, and the correlation of 0.9912 grades it — an almost perfectly linear relationship. Note the sample figure (32.5) is exactly 5/4 of the population figure (26), the n/(n − 1) ratio at n = 5.
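The deviation-products table for this example can be reproduced in a few lines of Python. This is a sketch of the hand calculation, not the calculator's own implementation; the values in the comments match the steps above.

```python
import math

xs = [2, 4, 6, 8, 10]      # study hours
ys = [65, 70, 75, 85, 90]  # exam scores
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n  # 6.0 and 77.0

print(f"{'x':>4} {'y':>4} {'dx':>5} {'dy':>5} {'dx*dy':>6}")
products = []
for x, y in zip(xs, ys):
    dx, dy = x - x_bar, y - y_bar  # deviations from each mean
    product = dx * dy + 0.0        # + 0.0 turns a negative zero (-0.0) into 0.0 for display
    products.append(product)
    print(f"{x:>4} {y:>4} {dx:>5g} {dy:>5g} {product:>6g}")

sxy = sum(products)                      # sum of (x - x_bar)(y - y_bar) = 130
sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of (x - x_bar)^2 = 40
syy = sum((y - y_bar) ** 2 for y in ys)  # sum of (y - y_bar)^2 = 430
r = sxy / math.sqrt(sxx * syy)

print("sample covariance:", sxy / (n - 1))  # 32.5
print("population covariance:", sxy / n)    # 26.0
print("r:", round(r, 4))                    # 0.9912
```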
Frequently Asked Questions
What does the sign of covariance tell me, and what does it not tell me?
The sign gives direction: positive means the variables tend to rise together, negative means one rises as the other falls, and near zero means no linear pattern. The magnitude, however, is not a strength grade: it depends on the units of both variables, so a covariance of 500 can describe a weaker relationship than a covariance of 0.5 measured on different scales. Use the Pearson r shown alongside for strength.
Why did I get a different covariance than my textbook for the same data?
Almost certainly a divisor mismatch. This calculator offers both conventions: sample covariance divides by n − 1 and population covariance divides by n. Textbook exercises usually specify which to use. If your answer is exactly n/(n − 1) times the expected one, you used the sample divisor where the population one was wanted; if it is (n − 1)/n times the expected one, it is the reverse.
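If you want to confirm that a mismatch is purely a divisor issue, a check like the one below compares the two answers against both ratios. `divisor_mismatch` is a hypothetical helper written for this illustration, and its default floating-point tolerance is an assumption you may want to loosen for rounded textbook answers.

```python
import math

def divisor_mismatch(mine, expected, n):
    """Hypothetical helper: say whether two answers differ only by the divisor."""
    if math.isclose(mine, expected):
        return "same convention"
    if math.isclose(mine, expected * n / (n - 1)):
        return "you divided by n - 1; the expected answer divides by n"
    if math.isclose(mine, expected * (n - 1) / n):
        return "you divided by n; the expected answer divides by n - 1"
    return "not a divisor mismatch; recheck the arithmetic"

# Worked-example values: sample 32.5 vs population 26 at n = 5.
print(divisor_mismatch(32.5, 26.0, 5))
print(divisor_mismatch(26.0, 32.5, 5))
```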
Can covariance be zero even when the variables are clearly related?
Yes. Covariance only detects linear relationships. A perfect U-shaped pattern that is symmetric about the mean of X, where Y is high at both extremes and low in the middle, produces deviation products whose positive and negative values cancel, giving a covariance at or near zero. Always plot paired data before concluding that no relationship exists.
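A minimal demonstration with illustrative data chosen to be symmetric about x̄ = 0: with X = −2, −1, 0, 1, 2 and Y = X², Y is completely determined by X, yet the deviation products cancel exactly and the covariance is zero. The `sample_cov` helper is written here for the example.

```python
def sample_cov(xs, ys):
    """Sample covariance: sum of deviation products over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # 4, 1, 0, 1, 4: a perfect U, fully determined by X
# Products are -4, 1, 0, -1, 4: the left arm cancels the right arm.
print(sample_cov(xs, ys))  # 0.0
```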
Why does the calculator say Pearson r is undefined for my data?
Correlation divides by the standard deviations of both variables. If every X value is identical, or every Y value is identical, that variable has zero standard deviation and the division is impossible. The covariance itself is still computed (it is zero in that case), but no strength grade exists for a variable that never moves.
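The sketch below mirrors that rule: it still returns the covariance but reports r as `None` when either list never changes. Testing `len(set(...)) < 2` checks literally for all-identical values, which avoids relying on a floating-point sum of squares landing on exactly zero. The function name `cov_and_r` and the choice of `None` as the "undefined" marker are assumptions for illustration.

```python
import math

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson r); r is None when it is undefined."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    cov = sxy / (n - 1)
    # A list whose values are all identical has zero standard deviation.
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return cov, None
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return cov, sxy / math.sqrt(sxx * syy)

print(cov_and_r([3, 3, 3, 3], [1, 5, 2, 8]))  # (0.0, None)
```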
Does the order of my X and Y lists matter?
Pairing matters; which list you call X does not. Covariance is symmetric, so cov(X, Y) = cov(Y, X) and swapping the two boxes gives the same result. But the i-th X must belong with the i-th Y: shuffling or sorting one list independently rewires the pairs and produces a covariance for data that never existed.
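Both halves of that answer can be checked with the worked-example pairs listed in a shuffled order: swapping the arguments leaves the covariance unchanged, while sorting X alone rewires the pairs and, here, even flips the sign. The `sample_cov` helper is written for this illustration.

```python
def sample_cov(xs, ys):
    """Sample covariance: sum of deviation products over n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# The worked-example pairs (hours, score), listed in a different order.
xs = [10, 2, 8, 4, 6]
ys = [90, 65, 85, 70, 75]

print(sample_cov(xs, ys), sample_cov(ys, xs))  # 32.5 32.5 (symmetric)
print(sample_cov(sorted(xs), ys))              # -12.5 (pairs rewired)
```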