Bayes' Theorem Calculator

Update a probability in the light of new evidence. Give the calculator a prior P(A), the likelihood P(B|A), and the false positive rate P(B|¬A), and it returns the posterior P(A|B) — the revised probability of A once B has actually been observed — together with the total probability of the evidence and a plain-language reading of the result.

The Three Inputs

P(A) — prior: how probable the hypothesis is before any evidence, such as the prevalence of a condition.

P(B|A) — likelihood: how often the evidence appears when A is true; for a diagnostic test this is the sensitivity.

P(B|¬A) — false positive rate: how often the same evidence appears when A is false.

Percentages or Decimals — Both Work

Enter probabilities on either scale. If every value is at most 1, the calculator reads all three as decimals (0.01 means 1%). If any value is greater than 1, all three are read as percentages (in the entry 1, 95, 5 the 1 means 1%). The result panel states which reading was used, so a mixed-scale entry never passes silently. Keep all three inputs on the same scale.
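The scale rule above can be sketched in a few lines of Python. This is an illustration only, not the calculator's actual source: the function name `normalize_inputs` is hypothetical, and the range check at the end is an assumption that the text does not describe.

```python
def normalize_inputs(prior, likelihood, false_positive_rate):
    """Return the inputs as decimal probabilities plus the scale that was assumed.

    Hypothetical helper: the name and the range check are assumptions.
    """
    values = (prior, likelihood, false_positive_rate)
    # Rule from the text: any value above 1 means all three are percentages.
    if any(v > 1 for v in values):
        scale = "percent"
        values = tuple(v / 100 for v in values)
    else:
        scale = "decimal"
    # Assumed validation: reject anything that is still not a probability.
    for v in values:
        if not 0 <= v <= 1:
            raise ValueError(f"probability out of range: {v}")
    return values, scale
```

Calling `normalize_inputs(50, 0.5, 5)` shows the mixed-scale trap: the 50 switches every input to the percentage reading, so the 0.5 becomes 0.005 (half a percent).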

P(A): baseline probability of the hypothesis, e.g. 1 or 0.01 for 1%

P(B|A): probability of the evidence when A is true, e.g. 95 or 0.95 for 95% sensitivity

P(B|¬A): probability of the evidence when A is false, e.g. 5 or 0.05 for 5%

Bayes' Theorem and the Law of Total Probability

Bayes' theorem inverts a conditional probability: from how often evidence follows the hypothesis, it recovers how probable the hypothesis is once the evidence is seen. The denominator comes from the law of total probability, which splits the evidence across the two ways it can arise. The last formula below applies the same reasoning to the case where B is not observed:

P(A|B) = P(B|A) × P(A) / P(B)

P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A)

P(A|¬B) = (1 − P(B|A)) × P(A) / (1 − P(B))

Terminology:

  • P(A) — prior probability of the hypothesis
  • P(B|A) — likelihood of the evidence given A
  • P(A|B) — posterior probability after observing B
  • P(¬A) = 1 − P(A) — prior of the complement

The numerator P(B|A) × P(A) is the "true positive" path: A happens and produces B. The second term of P(B) is the "false positive" path: A does not happen, yet B appears anyway. The posterior is simply the share of all B-occurrences that came through the first path.
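The two paths translate directly into code. The following is a minimal sketch that assumes the inputs are already decimal probabilities; the function name `bayes_update` is hypothetical, and the calculator's own implementation may differ.

```python
def bayes_update(prior, likelihood, false_positive_rate):
    """Return (posterior P(A|B), total evidence P(B)).

    Hypothetical helper; inputs are assumed to be decimals in [0, 1].
    """
    # "True positive" path: A happens and produces B.
    true_positive_path = likelihood * prior
    # "False positive" path: A does not happen, yet B appears anyway.
    false_positive_path = false_positive_rate * (1 - prior)
    # Law of total probability: P(B) is the sum of both paths.
    evidence = true_positive_path + false_positive_path
    if evidence == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    # The posterior is the share of P(B) that came through the first path.
    return true_positive_path / evidence, evidence
```

For example, `bayes_update(0.01, 0.95, 0.05)` returns a posterior of about 0.161 and a total evidence probability of about 0.059.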

Why a 95% Accurate Test Can Be Wrong Most of the Time

When the condition is rare, the small false positive rate applies to a huge group, while the high sensitivity applies to a tiny one — so false positives can outnumber true positives. With a 1% prevalence, 95% sensitivity, and a 5% false positive rate, the two evidence paths contribute 0.95 × 0.01 = 0.0095 and 0.05 × 0.99 = 0.0495. The false-positive path is more than five times heavier, and the posterior lands at just 16.1%. Ignoring that imbalance — judging from the test's accuracy alone — is the base-rate fallacy.
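To see how strongly the prior drives the result, the sketch below holds the test fixed at 95% sensitivity and a 5% false positive rate and varies only the prevalence. The function name `posterior` and the chosen prevalence values are illustrative assumptions.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) from Bayes' theorem with the law of total probability."""
    true_positive_path = sensitivity * prior
    false_positive_path = false_positive_rate * (1 - prior)
    return true_positive_path / (true_positive_path + false_positive_path)

# Same test characteristics, different (illustrative) prevalences.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    result = posterior(prevalence, 0.95, 0.05)
    print(f"prevalence {prevalence:.1%} -> posterior {result:.1%}")
```

With these inputs the posterior climbs from roughly 1.9% at 0.1% prevalence, through 16.1% at 1% and about 67.9% at 10%, to 95% at 50% prevalence. The test never changes; only the base rate does.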

The lesson generalizes far beyond medicine: spam filters, fraud detection, security screening, and quality inspection all hinge on the same arithmetic. Whenever the target event is rare, the prior dominates, and a positive signal mostly reflects the size of the unaffected group. For the conceptual background behind priors and posteriors, see our guide to Bayesian statistics.

Thinking in Natural Frequencies

The same computation is easier to trust when phrased as counts of people instead of probabilities. Take 10,000 people through the screening example:

With the condition: 10,000 × 0.01 = 100 → test positive: 100 × 0.95 = 95

Without the condition: 10,000 × 0.99 = 9,900 → test positive: 9,900 × 0.05 = 495

All positives: 95 + 495 = 590

P(A|B) = 95 / 590 ≈ 0.161017

Out of 590 people who receive a positive result, only 95 actually have the condition. Framing the problem this way makes the posterior almost self-evident and is a reliable way to sanity-check the calculator's output — the fraction 95/590 equals 0.0095/0.0590, the exact ratio Bayes' theorem produces.
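The head count above can be reproduced with a short sketch. The function name `natural_frequencies` is hypothetical, and rounding each group to whole people is an assumption that works cleanly for this example but would distort the result for small populations.

```python
def natural_frequencies(population, prior, likelihood, false_positive_rate):
    """Return (true positives, false positives, all positives) as head counts.

    Hypothetical helper; groups are rounded to whole people (an assumption).
    """
    with_condition = round(population * prior)
    without_condition = population - with_condition
    true_positives = round(with_condition * likelihood)
    false_positives = round(without_condition * false_positive_rate)
    return true_positives, false_positives, true_positives + false_positives

tp, fp, positives = natural_frequencies(10_000, 0.01, 0.95, 0.05)
print(f"{tp} of {positives} positives are genuine: {tp / positives:.6f}")
```

Comparing `tp / positives` with the posterior computed from probabilities is a quick consistency check: both should agree to within rounding.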

Chaining Evidence: the Posterior Becomes the Next Prior

Bayesian updating is iterative. After one piece of evidence, the posterior simply becomes the prior for the next piece. In the screening example, a person who tests positive carries a 16.1017% probability of the condition. Feed that back in as P(A) with the same test characteristics, and a second independent positive lifts the probability to about 78.48% — the evidence compounds. This is why confirmatory retesting is standard practice after a surprising positive.

The chain is only as good as its independence assumption: if the second test fails for the same reason the first did (the same interfering condition, the same lab error), the update overstates the certainty. In practice, confirmatory tests are chosen to use a different mechanism precisely to keep the errors independent.
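A minimal sketch of this chaining, under the assumption that every test has the same characteristics and that the results are independent given the true condition. Both function names are hypothetical.

```python
def posterior_given_positive(prior, likelihood, false_positive_rate):
    """One Bayesian update on a positive result."""
    true_positive_path = likelihood * prior
    false_positive_path = false_positive_rate * (1 - prior)
    return true_positive_path / (true_positive_path + false_positive_path)

def chain_positives(prior, likelihood, false_positive_rate, n_positives):
    """Apply the update repeatedly; each posterior becomes the next prior.

    Assumes the tests are conditionally independent, as discussed above.
    """
    history = []
    for _ in range(n_positives):
        prior = posterior_given_positive(prior, likelihood, false_positive_rate)
        history.append(prior)
    return history

for step, p in enumerate(chain_positives(0.01, 0.95, 0.05, 2), start=1):
    print(f"after positive {step}: {p:.4%}")
```

For the screening example this yields about 16.10% after the first positive and about 78.48% after the second. If the independence assumption fails, these numbers overstate the certainty.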

Worked Example: A Medical Screening Test

A condition affects 1% of a population. A screening test has 95% sensitivity and a 5% false positive rate. A randomly screened person tests positive — how probable is it that they actually have the condition? Enter 1, 95, and 5 (read as percentages), and the calculator works through:

  1. Normalize: P(A) = 0.01, P(B|A) = 0.95, P(B|¬A) = 0.05, so P(¬A) = 0.99.
  2. True-positive path: P(B|A) × P(A) = 0.95 × 0.01 = 0.0095.
  3. False-positive path: P(B|¬A) × P(¬A) = 0.05 × 0.99 = 0.0495.
  4. Total evidence: P(B) = 0.0095 + 0.0495 = 0.0590.
  5. Posterior: P(A|B) = 0.0095 ÷ 0.0590 = 0.161017, i.e. about 16.1%.

A positive result therefore raises the probability of the condition from 1% to about 16.1%. That is a sixteen-fold update, yet the result is still far more likely to be a false alarm than a true detection. The calculator also reports P(A|¬B) = (1 − 0.95) × 0.01 ÷ (1 − 0.0590) = 0.0005 ÷ 0.9410 ≈ 0.000531: a negative result shrinks the probability to about 0.05%, which is why a clean screening result is genuinely reassuring even when a positive one is inconclusive.
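The negative-result figure can be checked with a short sketch of the P(A|¬B) formula. The function name `posterior_given_negative` is hypothetical, and the guard for P(¬B) = 0 is an assumption about how such a case might be handled.

```python
def posterior_given_negative(prior, likelihood, false_positive_rate):
    """P(A|not B): probability of A after the evidence is NOT observed."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    if evidence == 1:
        # Assumed guard: B always occurs, so "not B" cannot be conditioned on.
        raise ValueError("P(not B) is zero, so P(A|not B) is undefined")
    # Missed cases (A true, B absent) as a share of all negatives.
    return (1 - likelihood) * prior / (1 - evidence)

negative = posterior_given_negative(0.01, 0.95, 0.05)
print(f"P(A|not B) = {negative:.6f}")  # prints P(A|not B) = 0.000531
```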

Frequently Asked Questions

How does the calculator decide between percentages and decimals?

If every input is 1 or less, all three are read as decimal probabilities, so 0.01 means 1%. If any input is greater than 1, all three are read as percentages, so in the entry 1, 95, 5 the 1 means 1%. The result panel always states which reading was applied. Keep all inputs on one scale: entering 50 and 0.5 together would read both as percentages, making 0.5 mean half a percent.

Why is the posterior only 16% when the test is 95% accurate?

Because the condition is rare, the 5% false positive rate applies to the 99% of people without it, producing 495 false positives per 10,000 screened, while 95% sensitivity applied to the 1% with it produces only 95 true positives. Among the 590 positives, just 95/590 = 16.1% are genuine. Test accuracy alone never determines the posterior; the prior always matters.

What is the difference between the false positive rate and specificity?

They are complements: specificity is the probability of a negative result in people without the condition, and the false positive rate P(B|¬A) equals 1 minus specificity. If a test is quoted as 95% specific, enter 5 (or 0.05) in the false positive field.
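As a small illustration, the conversion is a single subtraction. The function name `false_positive_rate_from_specificity` is hypothetical, and the sketch assumes specificity is given as a decimal.

```python
def false_positive_rate_from_specificity(specificity):
    """Convert specificity (decimal) to the false positive rate P(B|not A)."""
    if not 0 <= specificity <= 1:
        raise ValueError(f"specificity out of range: {specificity}")
    # The two quantities are complements.
    return 1 - specificity

rate = false_positive_rate_from_specificity(0.95)
print(f"false positive rate: {rate:.2%}")
```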

Can I apply the calculator to two tests in a row?

Yes. Run the first test, then use the resulting posterior as the new prior for the second run. In the screening example, a 16.1017% posterior fed back in with the same test characteristics gives about 78.48% after a second positive. This treats the tests as independent given the true condition; if both can fail for the same reason, the chained result is too confident.

Why do I get an error saying P(B) is zero?

P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A) is the overall probability of ever observing the evidence. It is zero only when both paths are impossible, for example a prior of 0 combined with a false positive rate of 0, or a prior of 1 combined with a likelihood of 0. Conditioning on an impossible event is undefined, so no posterior exists; give at least one path a positive probability.
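The sketch below shows one way the zero-evidence case might be detected before dividing. The function name `safe_posterior` is hypothetical, and returning `None` instead of reporting an error is a choice made for this illustration only.

```python
def safe_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B), or None when P(B) is zero and no posterior exists."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    if evidence == 0:
        # Both paths are impossible, so B can never be observed.
        return None
    return likelihood * prior / evidence

print(safe_posterior(0.0, 0.95, 0.0))   # prior 0 and false positive rate 0
print(safe_posterior(1.0, 0.0, 0.05))   # prior 1 and likelihood 0
print(safe_posterior(0.01, 0.95, 0.05))
```

The first two calls return `None` because neither path can produce the evidence; the third returns the usual posterior of about 0.161.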