Confidence Intervals

Estimation & Confidence Intervals

There are two ways to estimate a value. With a point estimate, you represent the value you’re trying to estimate with one number. Most commonly that number is the sample mean (x), although the sample median or sample mode could also be used.

With an interval estimate we represent the value we’re trying to estimate with a range of values, along with how confident we are that the true value falls within that range. A common place you’ll see this is in polling data for an election. Pollsters report that “we believe Candidate O is 5% ahead of Candidate R, plus or minus 3 percentage points”: 5% ± 3%. The thing they don’t tell you is how confident they are in that estimate. Usually it is implied that they are 95% confident.

More on Confidence Intervals

The confidence intervals we use most often are 95% confidence intervals, but these are not the only ones. You can also construct 99%, 90%, or even 12% or 25% confidence intervals. The 95% represents our confidence that our interval contains the population mean (the population mean, μ, is the value we are interested in but can never really know). When we move to a higher level of confidence, say from 95% to 99%, we have more confidence that the population mean is in our interval. The price is that the interval gets wider, so of course it’s more likely to contain μ. Here are some commonly used confidence intervals. In each one, the percentage is the confidence level and the ± term sets the precision.

90% CI: x ± 1.64 σ_x
95% CI: x ± 1.96 σ_x
99% CI: x ± 2.58 σ_x
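
The multipliers in the list above come from the standard normal (z) distribution. As a minimal sketch, they can be recovered with Python's standard library; the function name z_multiplier is made up for this illustration.

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Two-sided z multiplier for a confidence level given as a fraction (e.g. 0.95)."""
    tail = (1 - confidence_level) / 2        # probability left over in each tail
    return NormalDist().inv_cdf(1 - tail)    # quantile of the standard normal

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI multiplier: {z_multiplier(level):.3f}")
```

To two decimal places these agree with the 1.64, 1.96 and 2.58 used above.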

We can make the confidence intervals narrower by increasing the sample size, n. Remember that σ_x = σ / √n, so increasing n makes the standard error (σ_x) smaller.
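
Here is a small sketch of that effect. The population standard deviation of 10 and the sample sizes are assumed values chosen only for illustration.

```python
import math

def half_width_95(sigma, n):
    """Half-width of a 95% CI when the population standard deviation is known."""
    standard_error = sigma / math.sqrt(n)    # sigma_x = sigma / sqrt(n)
    return 1.96 * standard_error

sigma = 10.0  # assumed population standard deviation
for n in (25, 100, 400):
    print(f"n = {n:>3}: x ± {half_width_95(sigma, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width of the interval.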

One final point. We are not saying that there is a 95% probability that the mean is in the confidence interval. Technically that is not correct: the population mean is a fixed value, so any one interval either contains it or it doesn’t. That’s why we don’t call it a probability interval. Instead we call it a “confidence” interval, because the 95% describes how reliable the procedure is. If we took 100 samples of size n and constructed 100 confidence intervals, we would expect about 95 of them to contain the population mean (I said “sample mean” in the video; that’s not correct). About 5 of them would miss it.
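
The repeated-sampling idea can be checked with a quick simulation. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample size, the number of trials, and the function name coverage_rate are all assumptions made for the example.

```python
import math
import random

def coverage_rate(mu=50.0, sigma=10.0, n=30, trials=2000, seed=1):
    """Fraction of 95% confidence intervals that contain the population mean."""
    rng = random.Random(seed)                # seeded so the run is repeatable
    standard_error = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        low = sample_mean - 1.96 * standard_error
        high = sample_mean + 1.96 * standard_error
        if low <= mu <= high:                # did this interval capture mu?
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the population mean: {rate:.1%}")
```

The share should land close to 95%, though not exactly on it, since the simulation is itself a sample.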

Precision and Accuracy

Accuracy of an estimate is the degree of closeness to the actual (true) value. In our examples, this means how close the sample mean (x) is to the population mean (μ), the true value we are trying to estimate. The precision of our estimate can also be called reproducibility or repeatability. It is how often repeated measurements will show the same value, that is how closely clustered together they are.
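
To make the distinction concrete, here is a sketch with two made-up sets of repeated measurements of a quantity whose true value is assumed to be 50. The distance of the mean from the true value stands in for accuracy, and the standard deviation stands in for precision.

```python
import statistics

TRUE_VALUE = 50.0  # assumed true value, for illustration only

accurate_but_imprecise = [44.0, 56.0, 48.0, 52.0, 50.0]
precise_but_inaccurate = [53.9, 54.0, 54.1, 54.0, 54.0]

def accuracy_error(measurements, true_value):
    """How far the average measurement is from the true value (smaller is more accurate)."""
    return abs(statistics.mean(measurements) - true_value)

def spread(measurements):
    """Sample standard deviation (smaller is more precise)."""
    return statistics.stdev(measurements)

for name, data in [("accurate but imprecise", accurate_but_imprecise),
                   ("precise but inaccurate", precise_but_inaccurate)]:
    print(f"{name}: error = {accuracy_error(data, TRUE_VALUE):.2f}, "
          f"spread = {spread(data):.2f}")
```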

Student’s T-test

The final piece of the puzzle with our confidence intervals is the fact that we have been assuming that we know what the standard error of the sample mean (σ_x) is. Chances are that we don’t. This is usually an unknown quantity. However we can estimate it using the standard deviation (s) of the sample. What we get is the estimated standard error (s_x). The formula for this is:

s_x = s / √n
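
As a minimal sketch, the estimated standard error can be computed from a sample like this. The five data values are invented for the example.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13]   # made-up data

n = len(sample)
s = statistics.stdev(sample)    # sample standard deviation (divides by n - 1)
s_x = s / math.sqrt(n)          # estimated standard error of the mean

print(f"n = {n}, s = {s:.3f}, s_x = {s_x:.3f}")
```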

And we’ll use this estimate in our formula for calculating the confidence intervals.

95% CI: x ± 1.96 σ_x

… now becomes…

95% CI: x ± 1.96 s_x

If we do this, we can no longer use our z-scores and z-tables. We instead need to use t-scores and t-tables, which take into account the extra uncertainty that comes from estimating the standard error.

The t-tables also take into account the sample size (n). The bigger the sample size, the more closely the t-distribution looks like the z-distribution. The smaller the sample, the more squished flat (widely distributed) the t-distribution looks compared to the z-distribution. When using the t-table, you need to find the entry that corresponds to the degrees of freedom, which is n – 1 (one less than the sample size).
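
To see the t-distribution closing in on the z-distribution, the sketch below compares a few two-tailed 95% critical values, copied from a standard t-table, with the z value of about 1.96. The selection of degrees of freedom is arbitrary.

```python
from statistics import NormalDist

# Two-tailed 95% critical values from a standard t-table (degrees of freedom -> t).
T_CRITICAL_95 = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

z = NormalDist().inv_cdf(0.975)   # the matching z value, about 1.96

for df, t in T_CRITICAL_95.items():
    print(f"df = {df:>3}: t = {t:.3f}, which is {t - z:.3f} above z")
```

The gap shrinks steadily as the degrees of freedom grow, which is why small samples need noticeably wider intervals.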

Student’s T-test Problem Example

So now we’ll use this t-score and t-table to calculate a 95% confidence interval.

95%-CI = x ± t (s_x) = x ± t (s / √n)
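
Here is a sketch of that calculation for a small sample. The ten data values are invented, the function name is made up, and the t values are a handful of entries copied from a standard two-tailed 95% t-table rather than a full table.

```python
import math
import statistics

# Two-tailed 95% critical values from a standard t-table (degrees of freedom -> t).
T_CRITICAL_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def t_confidence_interval_95(sample):
    """95% CI for the mean: x ± t * (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    t = T_CRITICAL_95[n - 1]           # raises KeyError if df is not in the small table
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation
    s_x = s / math.sqrt(n)             # estimated standard error
    margin = t * s_x
    return mean - margin, mean + margin

sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 4.8, 5.0, 5.1, 4.9]   # made-up data, n = 10
low, high = t_confidence_interval_95(sample)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

With n = 10 the degrees of freedom are 9, so the multiplier is 2.262 instead of 1.96 and the interval comes out wider than the z-based version would.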

One fun piece of statistics trivia (and perhaps the ONLY fun piece of statistics trivia): Student’s t-test was actually created by a mathematician working at the Guinness Brewery in Dublin. He was afraid to publish using his real name (Gosset), so he used the pseudonym “Student” since he was also taking career advancement classes at the time. Just goes to show: Guinness is good for you!

Test your comprehension

With this problem set on confidence intervals.