Hypothesis Testing

Let’s Go To Court

Before we start our journey into hypothesis testing, let’s first look at an analogy: the courtroom.

The 10 steps

To do hypothesis testing, we go through 10 different steps. This ensures we don’t forget anything. It can be done in fewer steps, and in some of the later examples I’ll show you how, but for now let’s look at these ten:

  1. Data
  2. Assumptions
  3. Hypothesis
  4. Test Statistic
  5. Distribution of the test statistic
  6. Decision Rule
  7. Calculation of the test statistic
  8. Statistical decision
  9. Conclusion
  10. p-values

An example

In this example, we’ll go through the 10 steps and see whether a sample from a population supports the hypothesis that the mean age of people in that population is 30.

  1. Data: a sample of 10 people whose ages we measure; they have a sample mean age of 27.
  2. Assumptions: here we know the population’s standard deviation (σ). This is something you typically won’t know. For this example, we know it; in the next example, we don’t.
  3. Hypothesis: the null hypothesis is that the mean age IS 30. The alternative hypothesis, the one we want to prove, is that it is NOT 30.
  4. Test Statistic: because we know the population’s standard deviation, we can use z = (x̄ – μ)/σx̄ = (x̄ – μ)/(σ/√n)
  5. Distribution of the test statistic: we’ll use the z-distribution (since the underlying distribution is normal and we know σ)
  6. Decision Rule: calculated from the z-table
  7. Calculation of the test statistic
  8. Statistical decision
  9. Conclusion
  10. p-values: also calculated from the z-table
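The ten steps above can be sketched in code. The sample mean (27), hypothesized mean (30), and n = 10 come from the example; the value σ = 5 is an assumption made for illustration, since the text doesn’t give the population’s standard deviation.

```python
import math

# Values from the example; sigma = 5 is assumed for illustration
x_bar = 27      # sample mean age (step 1: data)
mu_0 = 30       # hypothesized population mean (step 3: null hypothesis)
sigma = 5       # population standard deviation (ASSUMED -- not given in the text)
n = 10          # sample size

# Steps 4 and 7: calculate the test statistic z = (x_bar - mu_0) / (sigma / sqrt(n))
z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Step 10: two-sided p-value from the standard normal CDF
def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p_value = 2 * phi(-abs(z))

# Steps 6 and 8: decision rule and statistical decision at alpha = 0.05
alpha = 0.05
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these assumed numbers, z ≈ −1.897 and p ≈ 0.058, so at the 5% level we fail to reject the null hypothesis.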

We can do the same with confidence intervals

You sat through some long videos, so here’s a short one.
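Under the same assumptions as the example above (sample mean 27, n = 10, and an assumed σ = 5 for illustration), the confidence-interval version of the test looks like this: build a 95% CI around the sample mean and check whether the hypothesized mean of 30 falls inside it.

```python
import math

x_bar = 27        # sample mean from the example
sigma = 5         # population standard deviation (ASSUMED for illustration)
n = 10
z_crit = 1.96     # two-sided critical z value for 95% confidence

margin = z_crit * sigma / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# If 30 lies inside the interval, we fail to reject H0 at the 5% level
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print("fail to reject H0" if ci[0] <= 30 <= ci[1] else "reject H0")
```

Here the interval is roughly (23.9, 30.1), which contains 30, matching the hypothesis-test conclusion.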

If we don’t know the population parameter σ, use the t-table

As we said before, you normally will not know the standard deviation of the population; most population parameters are unknown. So we need to estimate it using the standard deviation of the sample (s). Using this estimate means we need to use the t-distribution instead of the z-distribution, with estimated standard error sx̄ = s/√n.
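Here is a sketch of the t-version of the same test. The ages below are made-up illustrative data (chosen so the sample mean is 27, matching the earlier example); s is estimated from the sample, and the critical value 2.262 is the two-sided 5% cutoff from the t-table with n − 1 = 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of 10 ages (made up for illustration; mean = 27)
ages = [25, 31, 28, 22, 30, 26, 29, 24, 27, 28]
mu_0 = 30
n = len(ages)

x_bar = statistics.mean(ages)
s = statistics.stdev(ages)     # sample standard deviation, our estimate of sigma

# t statistic uses the estimated standard error s / sqrt(n)
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-sided critical value from the t-table, df = 9, alpha = 0.05
t_crit = 2.262

print(f"t = {t:.3f} with {n - 1} degrees of freedom")
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

Note that the only structural change from the z-test is swapping σ for s and the z-table for the t-table; the ten steps themselves are unchanged.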


Why confidence intervals are better than p-values

Now that we’ve looked at hypothesis testing, confidence intervals, and the difference between clinical and statistical significance, we can examine the difference between p-values and CIs. Most journals report a point-estimate of what they’re trying to measure with a p-value. This gives you an idea of whether this point-estimate is statistically significant.

Many journals are now requiring the reporting of confidence intervals over p-values. CIs give you additional information:

  • First, they give you an idea of the precision of the point-estimate. You’re given a range, and a huge range is worse than a nice, tight, narrow one.
  • Second, they give you an idea of whether the estimate is clinically significant. If the extreme ends of that range include values that are clinically important, then the study found a clinically significant difference.

Reporting only p-values and point-estimates eliminates this extra (and very useful) information.

The other point covered in this video is the concept of the point of no difference to determine statistical significance. There are two ways to compare numbers: subtraction and division.

  • In subtraction type comparisons, the point-of-no-difference will be zero. One thing minus the same thing equals zero. If your confidence interval includes zero, the study is not statistically significant. You can recognize subtraction type comparisons by words such as “difference” or “reduction” (e.g., “risk reduction”).
  • In division type comparisons, the point-of-no-difference will be one. One thing divided by the same thing equals one. If your confidence interval includes one, the study is not statistically significant. You can recognize these division type comparisons by the word “ratio” (e.g., “odds ratio” or “risk ratio”).
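A tiny helper makes the two rules concrete. The function name and example intervals below are my own illustration, not from the text:

```python
def is_significant(ci_low, ci_high, comparison):
    """Check whether a confidence interval excludes the point of no difference.

    comparison: "subtraction" (difference, reduction) -> point of no difference is 0
                "division"   (odds ratio, risk ratio) -> point of no difference is 1
    """
    point = 0.0 if comparison == "subtraction" else 1.0
    return not (ci_low <= point <= ci_high)

# A risk reduction whose CI spans zero is not statistically significant
print(is_significant(-0.5, 2.0, "subtraction"))  # False

# An odds ratio whose CI lies entirely above one is statistically significant
print(is_significant(1.2, 1.8, "division"))      # True
```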

Test your comprehension

With this problem set on hypothesis testing.
