Lecture Notes

Business Statistics (QMB 2100)

Sampling, Confidence Intervals, and Hypothesis Testing

Introduction

In general, we do not know what the mean and standard deviation of a population under study are. In such cases, we can't, for example, take a single data value, compute its z-score, and use the binomial or normal models to determine if it is "unusual".

This chapter discusses procedures for estimating the mean and standard deviation of a population by sampling, to a desired level of confidence. We can then use these estimates to test hypotheses about the populations.


Sampling and What Can Go Wrong

Here are five key definitions we need for the material in this chapter:

Population
The entire group of objects about which you want some information.
Census
The collection of data from all members of a population.
Sample
Data collection from part of a population, used to gain knowledge about the population.
Parameter
A numerical fact about a population (like the mean or standard deviation), usually not known exactly unless determined by a census.
Statistic
A numerical fact about a sample, used to estimate a parameter.

To illustrate: A hospital wants to shift to a new supplier of latex gloves due to cost considerations. The gloves they had been using had a mean defect rate of one in 1,000. The new supplier insists their gloves meet or exceed this standard. The hospital has asked for a trial shipment that can be tested to validate this claim.

  • Population: All the new gloves that will be obtained from the new supplier over the life of the contract
  • Parameter: The actual mean defect rate for the population of new gloves. (The population mean is referred to as μ, the lowercase Greek letter "mu".) This can be calculated only by testing every pair of gloves (i.e., a census), but this is not practical, since the gloves are rendered unusable after the test.
  • Sample: The collection of gloves the hospital will actually test (the initial shipment)
  • Statistic: The calculated mean of the sample data (usually called the sample mean, and given the symbol x̄, read "x-bar").

Pictorially, this example looks like this:

The accuracy of our estimate of the population mean depends on how representative the test shipment is of the population. Two factors affect how representative a sample is.

Sample Bias. If a sample statistic consistently under- or overestimates the population parameter, it is due to sample bias, of which there are three main types:

  1. Selection Bias occurs when the sample is chosen in such a way as to systematically exclude or over-represent some segment of the population. The goal is to obtain a simple random sample, in which each individual in the population has an equal chance of being included in the sample. (The glove sample above is not a simple random sample. Why?) Simply increasing sample size will not remove selection bias.

  2. Response Bias is a factor in public opinion polling, where individuals' responses may be influenced by the interviewer or the wording of a question. Increasing the sample size will not remove response bias.

  3. Non-response Bias. In any poll or survey, members of a chosen sample may choose not to participate. To the extent non-respondents differ from respondents, this can result in the equivalent of selection bias: some segment of the sample (and therefore the population) will be underrepresented. Increasing the sample size by re-sampling non-responders can reduce non-response bias.

Sampling Error. Even if we eliminate all sample bias, a sample statistic from any particular sample may not accurately estimate the true population parameter. This is the sampling error. The more spread out the population is around the mean, the higher the probability that even a simple random sample may, by chance, include some outliers which will affect our estimates of the population parameters. Since standard deviation is a measure of spread, we can conclude that the smaller the population standard deviation, the smaller the sampling error will be for the same sample size. Increasing an unbiased sample's size will decrease sampling error. There are statistical techniques to determine the sample size required to attain a given degree of confidence in estimating a population parameter.
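
The difference between sample bias and sampling error can be seen in a small simulation. This is only a sketch under stated assumptions: the population is hypothetical (10,000 values from a normal model with mean 50 and standard deviation 10), and the "biased" sampling frame is invented for illustration by allowing only the upper half of the population to be chosen.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption, not data from these notes).
population = [random.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# A biased sampling frame: only the upper half of the population can be chosen.
upper_half = sorted(population)[len(population) // 2:]

def average_abs_error(frame, n, trials=200):
    """Average distance between the sample mean and the true population mean."""
    errors = []
    for _ in range(trials):
        sample = random.sample(frame, n)  # simple random sample from the frame
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

for n in (10, 100, 1000):
    random_err = average_abs_error(population, n)
    biased_err = average_abs_error(upper_half, n)
    print(f"n={n:5d}  unbiased sample error={random_err:5.2f}  "
          f"biased sample error={biased_err:5.2f}")
```

In a run of this sketch, the error of the unbiased samples should shrink as n grows, while the error of the biased samples should stay roughly constant: a larger sample reduces sampling error but not selection bias.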

How Gallup Selects Its Sample

A reputable polling organization like Gallup carefully manages its sampling to minimize or correct for both sample bias and sampling error. Through the use of techniques like multistage cluster sampling, sample sizes of only 1,500 respondents can predict within a few percentage points the true population parameters of the USA on issues and candidates.


The Sampling Distribution and the Central Limit Theorem

The purpose of inferential statistics is to estimate population parameters from sample statistics, then use these estimates to test claims about the populations. Different samples are likely to yield different sample statistics, and hence different estimates of the population parameters. Which is correct?

If we took every possible simple random sample (of the same size), we could calculate a sample statistic (like x-bar) for each. The outcomes would form a probability distribution, which is called the sampling distribution. We will use an example below, where the population is size 5, and we take all possible samples of size 2 and calculate the mean.

Some definitions for the discussion that follows:

Terminology Table
                     Sample Statistic   Population Parameter   Sampling Distribution
Size                 n                  N                      n
Mean                 x-bar              mu (µ)                 mu(x-bar)
Variance             s²                 sigma² (σ²)            sigma²(x-bar)
Standard Deviation   s                  sigma (σ)              sigma(x-bar)

Here is a population of 5 students in a typing class, and the number of words per minute each has achieved on a mid-term test: Sue (58), Sam (51), Sal (40), Sid (42), and Sol (68).

Assume we want to estimate the population mean (the mean of the five students' typing speeds) by finding the sample mean (x-bar) of a sample of size 2. What would we find? Here are all the possible outcomes as a table and a histogram.

Samples of Size 2
Sample xi x-bar
Sue, Sam 58, 51 54.5
Sue, Sal 58, 40 49.0
Sue, Sid 58, 42 50.0
Sue, Sol 58, 68 63.0
Sam, Sal 51, 40 45.5
Sam, Sid 51, 42 46.5
Sam, Sol 51, 68 59.5
Sal, Sid 40, 42 41.0
Sal, Sol 40, 68 54.0
Sid, Sol 42, 68 55.0

mu(x-bar) = 51.8

Mean of the Sampling Distribution. As you can see, depending on which of the 10 samples we picked, the resulting x-bar would be different. But if we look at the x-bar from all the possible samples, their resulting histogram takes on a roughly "normal" shape, implying that the means of the samples will tend toward the mean of the population. In fact, if you change the vertical axis on the histogram from "Number of Samples" to "Probability" (0.1, 0.2, 0.3 ...), you will see that we really have a probability distribution.

Probability distributions have a mean, which we can calculate as mu(x-bar) in the table above. The result of 51.8 is exactly the same as the true population mean. The sample mean is called an unbiased estimator of the population mean, in that it tends neither to overstate nor to understate the population mean.

Standard Deviation of the Sampling Distribution. The sampling distribution also has a standard deviation, sigma(x-bar), which in our example can be calculated to be 6.3. We don't directly calculate this value, though, because in real life we would never have the results of all possible samples of size n. We instead use this formula:

standard deviation of the sampling distribution = population standard deviation / √(sample size)

Or, more commonly in statistics texts, with this notation:

sigma(x-bar) = sigma / √n

This is called the standard error of the mean. We can make two observations: (1) To find it, we need to know the standard deviation of the population (sigma) and, (2) to reduce the standard error of the mean, we use as large a sample size as practical.
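
The typing-class numbers can be reproduced by listing every sample of size 2. The sketch below uses only the five speeds given earlier; the variable names are illustrative. It also computes sigma / √n for comparison. One caveat that goes slightly beyond these notes: sigma / √n assumes the sample is a small fraction of the population, so for 2 students drawn without replacement from only 5 it comes out larger (about 7.3) than the directly computed 6.3. Multiplying by the finite population correction √((N - n) / (N - 1)) closes the gap.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Words per minute for the five students in the typing class.
speeds = {"Sue": 58, "Sam": 51, "Sal": 40, "Sid": 42, "Sol": 68}
values = list(speeds.values())
N = len(values)  # population size
n = 2            # sample size

# Every possible sample of size 2 (order ignored), and the mean of each.
sample_means = [mean(pair) for pair in combinations(values, n)]

mu = mean(values)                  # population mean
sigma = pstdev(values)             # population standard deviation
mu_xbar = mean(sample_means)       # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)  # standard deviation of the sampling distribution

# Standard error formula, and the same value with the finite population
# correction (needed here only because the population is so small).
standard_error = sigma / sqrt(n)
corrected = standard_error * sqrt((N - n) / (N - 1))

print(f"number of samples = {len(sample_means)}")
print(f"mu = {mu:.1f}, mu(x-bar) = {mu_xbar:.1f}")
print(f"sigma(x-bar), computed directly = {sigma_xbar:.2f}")
print(f"sigma / sqrt(n) = {standard_error:.2f}")
print(f"sigma / sqrt(n), with correction = {corrected:.2f}")
```

For the large populations met in practice, the correction factor is very close to 1 and sigma / √n is used as is.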

What Good is Knowing the Mean and Standard Error of the Sampling Distribution? These are the two factors we will need to draw inductive inferences about a population from a sample.

The Central Limit Theorem. The key to our being able to draw inductive inferences is a theorem we will not prove (but see a good Wikipedia article on the CLT if you like math):

For a population with any distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases. And, if the original population has mean mu and standard deviation sigma, then the mean of the sampling distribution will also be mu, and the standard deviation of the sampling distribution will be sigma / √n.

Here is a picture of the sampling distribution for a large sample size (n). It is symmetric about the mean of the sampling distribution mu(x-bar), which equals the population mean mu. The horizontal axis is given in units of standard error of the mean, sigma / √n.
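
The two numerical claims of the theorem (the mean of the sampling distribution is mu, and its standard deviation is sigma / √n) can be checked by simulation. This is a sketch, not a proof. The population is an assumption chosen for illustration: a strongly right-skewed exponential model whose mean and standard deviation are both 1. The function name is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population: exponential model, mean 1 and sd 1.
mu, sigma = 1.0, 1.0

def simulate_sample_means(n, trials=5000):
    """Draw `trials` random samples of size n and return their sample means."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

for n in (2, 10, 50):
    means = simulate_sample_means(n)
    print(f"n={n:2d}  mean of x-bar={mean(means):.3f} (mu={mu})  "
          f"sd of x-bar={pstdev(means):.3f} "
          f"(sigma/sqrt(n)={sigma / sqrt(n):.3f})")
```

Even though the population is far from normal, the simulated means should center on mu, and their spread should track sigma / √n as n increases.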


Drawing Inductive Inferences Through Confidence Intervals

The key to this section, and chapter, is to realize that when we estimate a parameter by calculating a sample statistic, there is a degree of uncertainty in our estimate. There is no way of knowing whether we picked a sample whose mean is close to the population mean. But we can, using the Central Limit Theorem, construct a confidence interval that will contain the true population mean at any stated level of confidence.

This confidence interval (assuming the population standard deviation is known) is defined as:

sample mean ± z · (population standard deviation / √(sample size))

Or, in conventional statistics notation:

x-bar ± z · sigma / √n

where z is the number of standard deviations from the mean (found in a normal table). The term z · sigma / √n is the margin of error, indicating how close the sample mean is likely to be to the actual population mean. This table gives the required z value for the most common desired levels of confidence:

Confidence Levels and Required z Value
Desired Level of Confidence   Required z Value (Critical Value)
80%                           1.282
90%                           1.645
95%                           1.960
99%                           2.575

To see why this is true, consider a table of the normal curve: the area (probability) under the curve from the mean to z = 1.96 is 0.475, and twice that area, covering both sides of the mean, is 0.95. This is why a z of 1.96 is required for a confidence level of 95%.
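
The same reasoning can be turned around to compute a critical value from a confidence level, instead of reading it from a table. The sketch below uses Python's standard-library normal model; the function name is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: the z that leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The computed 99% value is about 2.576. Many printed tables, including the one above, list 2.575, a rounding difference that rarely matters in practice.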

Example: A random sample of 28 families of the 322 in a precinct indicated a mean family income of $2,830 per month. Construct the 90% and 95% confidence intervals for the mean monthly income of all families in the precinct. (The standard deviation of the population is known to be 750 from previous studies.)

90% confidence interval = 2,830 ± margin of error
     = 2,830 ± (1.645 · 750 / √28)
     = 2,830 ± 233.16
     = [2,596.84, 3,063.16]

95% confidence interval = 2,830 ± margin of error
     = 2,830 ± (1.960 · 750 / √28)
     = 2,830 ± 277.80
     = [2,552.20, 3,107.80]
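
The precinct example can be checked with a few lines of code. This is a minimal sketch: the z values are the table values from these notes, the function name is illustrative, and √28 is carried at full precision, so the margins may differ by a few tenths from a hand calculation that rounds intermediate steps.

```python
from math import sqrt

# Critical values copied from the table in these notes.
Z_TABLE = {0.80: 1.282, 0.90: 1.645, 0.95: 1.960, 0.99: 2.575}

def mean_confidence_interval(x_bar, sigma, n, confidence):
    """Interval for a population mean when sigma is known: x-bar ± z · sigma / √n."""
    z = Z_TABLE[confidence]
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

for level in (0.90, 0.95):
    low, high, margin = mean_confidence_interval(2830, 750, 28, level)
    print(f"{level:.0%}: 2,830 ± {margin:.2f} = [{low:,.2f}, {high:,.2f}]")
```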

We can also use the equation above to solve for the sample size n required to estimate a population mean at a given confidence level with a desired margin of error E:

n = (z · sigma / E)²

Example (from Triola p. 352): How many adults must be randomly selected to estimate the mean FICO (credit rating) score of working adults in the United States? We want 95% confidence that the sample mean is within 3 points of the population mean, and the population standard deviation is 68.

n = (1.960 · 68 / 3)²
     = 1,973.73, which we round up to 1,974 adults
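
Solving the margin of error E = z · sigma / √n for n gives n = (z · sigma / E)². A short sketch of that calculation for the FICO example follows; the function name is illustrative. The result is always rounded up, because rounding down would leave the margin of error slightly larger than desired.

```python
from math import ceil

def required_sample_size(z, sigma, margin):
    """Smallest n for which z · sigma / √n does not exceed the desired margin."""
    return ceil((z * sigma / margin) ** 2)

n_fico = required_sample_size(z=1.960, sigma=68, margin=3)
print(n_fico)
```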

So Who Really Knows the Population Standard Deviation?

We usually don't know the population standard deviation (sigma), unless there have been previous studies on our population (like IQ, for example, which has been extensively studied). When we don't know sigma, we can use the sample standard deviation (s) in its place, but we can no longer use the normal distribution as above. Rather, we will use the Student's t distribution, which has the effect of expanding our confidence intervals.

Sir R.A. Fisher (1890-1962), English statistician and geneticist, studied the t distribution and devised the "t-test" discussed below.

There are different curves for the t distribution, which depend on the sample size. Specifically, the curves represent different degrees of freedom, where the degrees of freedom is equal to the sample size minus one. We need to use a different table to obtain the appropriate t value for the formulas that follow:

A Short Table of t Values
Sample Size Minus One (n - 1)   Desired Level of Confidence
                                80%     90%     95%     99%
4                               1.533   2.132   2.776   4.604
9                               1.383   1.833   2.262   3.250
16                              1.337   1.746   2.120   2.921
25                              1.316   1.708   2.060   2.787
36                              1.306   1.689   2.030   2.722
Normal curve (z value)          1.282   1.645   1.960   2.575

As you can see from the table above, the larger the sample size, the more closely the t value approaches the z value at the same desired level of confidence. For sample sizes above 100, they are nearly identical. For a more complete table, see here.

To use this table, we estimate the population standard deviation with the sample standard deviation, which we have already seen to be:

s = √( Σ(x - x-bar)² / (n - 1) )

Constructing Confidence Intervals for the Unknown Population Mean when the Population Standard Deviation is Unknown. Here is the formula for the confidence interval for the mean when the population standard deviation is unknown:

x-bar ± t · s / √n

Example: Assume you are determining the number of pounds of fish eaten per week per family, and have these results from a sample of 10 families:

1.5, 2.0, 0.0, 1.0, 2.5, 2.0, 0.0, 2.0, 3.5, 1.0

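You can compute the sample mean to be 1.55, and the standard deviation to be 1.09. What is your estimate for the population mean at the 90% confidence level?

90% confidence interval = 1.55 ± (1.833 · 1.09 / √10)
     = 1.55 ± 0.63
     = [0.92, 2.18]

If you knew the population standard deviation was 1.09 (say from a previous national study), what would be your estimate at the same level of confidence?

90% confidence interval = 1.55 ± (1.645 · 1.09 / √10)
     = 1.55 ± 0.57
     = [0.98, 2.12]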
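
Both versions of the fish example can be sketched in code. The critical values are taken from the tables in these notes (t = 1.833 for 9 degrees of freedom at 90%, z = 1.645 at 90%); the variable names are illustrative. The t interval comes out wider, which is the price of estimating sigma from the sample.

```python
from math import sqrt
from statistics import mean, stdev

# Pounds of fish eaten per week by each of the 10 sampled families.
pounds = [1.5, 2.0, 0.0, 1.0, 2.5, 2.0, 0.0, 2.0, 3.5, 1.0]
n = len(pounds)
x_bar = mean(pounds)
s = stdev(pounds)  # sample standard deviation (divides by n - 1)

T_90_DF9 = 1.833   # t table: 90% confidence, n - 1 = 9
Z_90 = 1.645       # normal table: 90% confidence

# Sigma unknown: use s with the t value.
t_margin = T_90_DF9 * s / sqrt(n)

# Sigma assumed known to be 1.09: use the z value.
sigma = 1.09
z_margin = Z_90 * sigma / sqrt(n)

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"t interval: {x_bar:.2f} ± {t_margin:.2f}")
print(f"z interval: {x_bar:.2f} ± {z_margin:.2f}")
```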

Constructing Confidence Intervals for the Unknown Population Percentage (Proportion). Sometimes we are interested in estimating a population proportion (such as "what proportion of American adults drink milk at least once a week"). We do this much like our estimate of the population mean, but with different notation.

We will call the sample percentage "p-hat", r the number of sample "successes" and n the sample size. Then:

p-hat = r / n

And the confidence interval for an unknown population proportion is:

p-hat ± z · √(p-hat · (1 - p-hat) / n)

Example: (From Triola, p. 342): A study of 420,095 Danish cell phone users found that 135 of them developed cancer of the brain or nervous system. Prior to this study of cell phone use, the rate of such cancer was found to be 0.0340% for those not using cell phones. The data are from the Journal of the National Cancer Institute.

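(a) Construct a 95% confidence interval estimate of the percentage of cell phone users who develop cancer of the brain or nervous system:

p-hat = 135 / 420,095 = 0.000321 (0.0321%)
95% confidence interval = 0.000321 ± (1.960 · √(0.000321 · 0.999679 / 420,095))
     = 0.000321 ± 0.000054
     = [0.0267%, 0.0376%]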

(b) Do cell phone users appear to have a rate of cancer of the brain or nervous system that is different from the rate of such cancer among those not using cell phones?

No, because the rate of cancer among non-cell phone users (0.0340%) is within the confidence interval computed above.
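
The cell phone example can be checked with a short sketch of the proportion interval, p-hat ± z · √(p-hat · (1 - p-hat) / n). The function name is illustrative, and z = 1.960 is the 95% table value from these notes.

```python
from math import sqrt

def proportion_confidence_interval(successes, n, z):
    """Interval for a population proportion: p-hat ± z · √(p-hat(1 - p-hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(135, 420_095, z=1.960)
print(f"95% interval: {low:.4%} to {high:.4%}")

# Rate among people not using cell phones, from the problem statement.
non_user_rate = 0.000340
print("non-user rate inside interval:", low <= non_user_rate <= high)
```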


Some Final Thoughts on Confidence Intervals

Summarizing what we have done above:

  • Confidence level and confidence interval size are directly related: that is, the higher the confidence you want, the wider the confidence interval will be (for a fixed sample size)
  • Sample size and confidence interval size are inversely related: the larger the sample size, the smaller the confidence interval (at a given confidence level)
  • If we can reduce the inherent variability in the population, we reduce the population standard deviation, which in turn decreases the confidence interval size; this is the basic principle underlying statistical quality control

Hypothesis Testing: An Alternative Approach to Confidence Intervals

Although based on the same principles, hypothesis testing is often used in statistics because it is easier to state problems using hypotheses than using confidence intervals. This section discusses how to test hypotheses about population means and population proportions.

Testing Hypotheses About Unknown Population Means. A hypothesis is a claim about an unknown population mean or proportion. Each problem has two hypotheses:

  • The null hypothesis (called "H-naught" and abbreviated H0) is the claim of the status quo: that a treatment had no effect on the mean, for example. The null hypothesis is of the form mu ≥ 5, or mu ≤ 2.34. The null hypothesis will always have a claim of equality in it (greater than or equal to, less than or equal to).
  • The alternative hypothesis (abbreviated H1) is the opposing claim, for example mu < 5 or mu > 2.34

We assume the null hypothesis is true unless we can demonstrate, based on sample data at the desired level of confidence, that the alternative hypothesis is true. The level of confidence is related to two potential types of errors we can make:

Type I and Type II Errors
                     Reality
Decision             H0 is true             H0 is false
Reject H0            Type I Error           Correct decision
                     P(Type I error) = α
Fail to reject H0    Correct decision       Type II Error
                                            P(Type II error) = β

For any given problem you will need to define the hypotheses and weigh the consequences of making a Type I or Type II error when setting the level of significance: the maximum risk you are willing to accept of making a Type I error (called α). The text suggests:

  1. If the Type I error is potentially costly and the Type II error is not, set α very low, at 0.05 or less; this is the most common case and most common value for α
  2. If the Type I error is not costly but the Type II error is, set α higher, perhaps ≥ 0.25
  3. If both types of errors are costly, set α very low and increase the sample size to reduce the probability of Type II errors

Once we can state the hypotheses, evaluate the consequences, and choose a level of significance, the statistics underlying a hypothesis test are straightforward: if the probability of our sample statistic occurring by chance (assuming the null hypothesis is true) is less than our level of significance, we reject the null hypothesis; otherwise, we fail to reject the null hypothesis. Formally:

  • The critical value (CV) is the value of the test statistic that equates to our tolerance of a Type I error
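  • The test statistic for claims about an unknown population mean is the z-score of the sample mean:

z = (sample mean - claimed population mean) / (population standard deviation / √(sample size))

Or, in common statistics notation:

z = (x-bar - mu) / (sigma / √n)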

Here are the rules for rejecting or failing to reject H0.

Critical Values for Population Mean Tests
α      Left-tailed (H1: mu < x)   Right-tailed (H1: mu > x)
       test z < CV to reject      test z > CV to reject
0.10   -1.282                     +1.282
0.05   -1.645                     +1.645
0.02   -2.055                     +2.055
0.01   -2.326                     +2.326
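
The link between a critical value and α can be illustrated by simulation: when the null hypothesis is actually true, a right-tailed test with CV = 1.645 should wrongly reject about 5% of the time. The sketch below rests on assumptions made only for illustration: a hypothetical normal population with mean 100 and standard deviation 15, samples of size 30, and a made-up function name.

```python
import random
from math import sqrt

random.seed(7)  # fixed seed so the run is repeatable

def type_i_error_rate(critical_value, mu0=100.0, sigma=15.0, n=30, trials=10_000):
    """Fraction of right-tailed z tests that reject H0 when H0 is in fact true."""
    rejections = 0
    for _ in range(trials):
        # The sample really does come from a population with mean mu0.
        sample = [random.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / sqrt(n))
        if z > critical_value:
            rejections += 1  # a Type I error
    return rejections / trials

for alpha, cv in ((0.10, 1.282), (0.05, 1.645)):
    print(f"alpha={alpha:.2f}  CV={cv}  simulated Type I rate="
          f"{type_i_error_rate(cv):.3f}")
```

The simulated rates should land close to the stated α values, which is what "P(Type I error) = α" means in practice.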

Example (Triola p. 430): When 40 people used the Weight Watchers diet for one year, their mean weight loss was 3.0 lb. Assume the standard deviation of all such weight changes is 4.9 lb and use a 0.01 significance level to test the claim that the mean weight loss is greater than 0. Based on these results, does the diet appear to be effective?  Does the diet appear to have practical significance?

H0: mu ≤ 0 lb weight loss
H1: mu > 0 lb weight loss
α = 0.01
n = 40
sigma = 4.9
x-bar = 3.0

z = (3.0 - 0) / (4.9 / √40) = 3.87
CV = 2.326

Since z > CV we reject H0. There is sufficient evidence to support the claim that the mean weight loss is greater than 0. But since the mean weight loss over a year was only 3 lb, the practicality of the diet is in question.
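
The Weight Watchers test can be sketched as follows. The function name is illustrative, and the critical value is computed from the standard-library normal model for a right-tailed test (the z that leaves α = 0.01 in the upper tail) rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic_for_mean(x_bar, mu0, sigma, n):
    """z-score of the sample mean under the null hypothesis mu = mu0."""
    return (x_bar - mu0) / (sigma / sqrt(n))

alpha = 0.01
z = z_statistic_for_mean(x_bar=3.0, mu0=0.0, sigma=4.9, n=40)
critical_value = NormalDist().inv_cdf(1 - alpha)  # right-tailed test

print(f"z = {z:.2f}, CV = {critical_value:.3f}")
print("reject H0" if z > critical_value else "fail to reject H0")
```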

Testing Hypotheses About Unknown Population Proportions. Testing claims about population proportions is analogous to testing claims about means, except the test statistic is:

z = (p-hat - p) / √(p · (1 - p) / n)

where p-hat is the sample proportion, p is the population proportion (the value used in the null hypothesis), and n is the sample size.

Example: The company Drug Test Success provides a "1-Panel-THC" test for marijuana usage. Among 300 tested subjects, results from 27 subjects were wrong (either false positive or false negative). Use a 0.05 significance level to test the claim that less than 10% of the test results are wrong. Does the test appear to be good for most purposes?

H0: p ≥ 0.10 of test results are wrong
H1: p < 0.10 of test results are wrong
α = 0.05
n = 300
p-hat = 27 / 300 = 0.09

z = (0.09 - 0.10) / √(0.10 · 0.90 / 300) = -0.58
CV = -1.645

Since this is a left-tailed test, and z is not < the critical value, we do not reject the null hypothesis that 10% or more of the results are wrong. The test appears to be of limited value due to its error rate.
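
The drug test example follows the same pattern, sketched below with an illustrative function name. Note that n is the number of subjects tested (300), not the number of wrong results (27), and that for a left-tailed test the critical value is the z that leaves α in the lower tail.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic_for_proportion(p_hat, p0, n):
    """z-score of the sample proportion under the null hypothesis p = p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
n = 300
p_hat = 27 / n
z = z_statistic_for_proportion(p_hat, p0=0.10, n=n)
critical_value = NormalDist().inv_cdf(alpha)  # left-tailed test

print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, CV = {critical_value:.3f}")
print("reject H0" if z < critical_value else "fail to reject H0")
```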

 Updated 11.14.2009