Distribution of Sample Means

Populations and Samples, Parameters and Statistics

Inferential statistics (as opposed to descriptive statistics) allows us to make informed guesses about values we don’t know. First, it is important to distinguish between a population and a sample. A population is the entire group about which we want to know something, for instance everyone in the USA (if we want to know their height) or all adults (if we want to know how they will respond to a given drug). Populations tend to be too large to measure everyone, so instead we measure a smaller subgroup of the population, called a sample.

We can calculate measures of central tendency (like the mean) and variability (like the standard deviation) for both the population and the sample. For the population, these are called parameters. These are the true values we want to know, but they are usually too hard to measure directly. For samples, these are called statistics. These we are able to measure, but they don’t include everyone. However, we can use the statistics to estimate the parameters.

Distribution of the Sample Means and The Central Limit Theorem

From our population we can take samples and calculate statistics of the sample, such as the mean (which is represented by x-bar, or x̄).

We can take several samples, each producing its own mean (x̄₁, x̄₂, x̄₃, etc.). If you collect all these means and form their own distribution, you get the distribution of sample means. When you take enough samples, and the sample size is big enough (as a rule of thumb, at least 30), the central limit theorem tells you three things about this distribution of the sample means:

The distribution of sample means will be approximately normal if you take enough samples and the sample size (n) is big enough, usually around 30 or more. If the population itself is normally distributed, the sample means are normally distributed for any sample size.
The mean of the distribution of sample means equals the mean of the population: μ_x̄ = μ.
The standard deviation of the distribution of sample means, also called the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size: σ_x̄ = σ/√n.
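
The three statements above can be checked with a small simulation. The sketch below is only an illustration: the skewed (exponential) population, the sample counts, and the helper name simulate_sample_means are all assumptions made for this example, not part of the original material. It draws many samples of size 30 with replacement and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import random
import statistics


def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples samples of size n (with replacement); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]


# Hypothetical, deliberately non-normal population (assumption for illustration).
population_rng = random.Random(42)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)      # population mean (parameter)
sigma = statistics.pstdev(population)  # population standard deviation (parameter)

n = 30
sample_means = simulate_sample_means(population, n, num_samples=5_000)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_standard_error = sigma / n ** 0.5

print("population mean:            ", round(mu, 3))
print("mean of sample means:       ", round(mean_of_means, 3))
print("predicted standard error:   ", round(predicted_standard_error, 3))
print("std. dev. of sample means:  ", round(sd_of_means, 3))
```

With these settings the two means should land close together, and so should the two spread values. A histogram of sample_means would also look roughly bell-shaped even though the population is skewed.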

Sample Problem

The mean and standard deviation of serum iron values for healthy men are 120 mcg/dL and 15 mcg/dL, respectively. What is the probability that a random sample of 50 normal men will have a mean between 115 and 125 mcg/dL? You’ll need these z tables.
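
If you want to check your table lookup, the same steps can be sketched in Python. This is a minimal sketch, assuming the sample means are approximately normal (reasonable here since n = 50); it uses the standard library's NormalDist in place of a printed z table, and the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120.0, 15.0, 50   # values from the problem statement

# Step 1: standard error of the mean, sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Step 2: convert the bounds on the sample mean to z-scores
z_low = (115 - mu) / standard_error
z_high = (125 - mu) / standard_error

# Step 3: area under the standard normal curve between the two z-scores
standard_normal = NormalDist()   # mean 0, standard deviation 1
probability = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print("standard error:", round(standard_error, 4))
print("z-scores:", round(z_low, 3), round(z_high, 3))
print("P(115 < sample mean < 125):", round(probability, 4))
```

Note that the z-scores use the standard error, not the population standard deviation, because the question is about a sample mean rather than a single man's value.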

Sample Problem #2

If uric acid levels in males are approximately normally distributed with a mean of 5.7 mg percent and a standard deviation of 1 mg percent, find the probability that a sample of size 9 will yield a mean:

more than 6
between 5 and 6
less than 5.2

You’ll need these z tables.
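
As an alternative to converting each bound to a z-score by hand, the distribution of sample means can be modeled directly as a normal distribution with mean μ and standard deviation σ/√n. The sketch below assumes exactly that (the problem states the population is approximately normal, so n = 9 is acceptable); the helper name sampling_distribution is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sampling_distribution(mu, sigma, n):
    """Normal model for the sample mean: mean mu, standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


# Values from the problem statement: mu = 5.7, sigma = 1, n = 9
xbar = sampling_distribution(5.7, 1.0, 9)

p_more_than_6 = 1 - xbar.cdf(6)               # upper tail
p_between_5_and_6 = xbar.cdf(6) - xbar.cdf(5)  # area between the bounds
p_less_than_5_2 = xbar.cdf(5.2)               # lower tail

print("P(mean > 6):     ", round(p_more_than_6, 4))
print("P(5 < mean < 6): ", round(p_between_5_and_6, 4))
print("P(mean < 5.2):   ", round(p_less_than_5_2, 4))
```

The results should agree with a z-table lookup to within rounding, since cdf here does the same job as the table.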

Test your comprehension

With this distribution of sample means problem set.