Lecture Notes
Business Statistics (QMB 2100)
Sampling, Confidence Intervals, and
Hypothesis Testing
Introduction
In general, we do not know what the mean and standard
deviation of a population under study are. In such cases, we
can't, for example, take a single data value, compute its z-score,
and use the binomial or normal models to determine if it is
"unusual".
This chapter discusses procedures for estimating the mean and
standard deviation of a population by sampling, to a
desired level of confidence. We can then use these
estimates to test hypotheses about the populations.
Sampling and What Can Go Wrong
Here are five key definitions we need for the material in
this chapter:
- Population
- The entire group of objects about which
you want some information.
- Census
- The collection of data from all members
of a population.
- Sample
- Data collection from part of a
population, used to gain knowledge about the
population.
- Parameter
- A numerical fact about a population
(like the mean or standard deviation), usually
not known exactly unless determined by a
census.
- Statistic
- A numerical fact about a sample, used to
estimate a parameter.
|
To illustrate: A hospital wants to shift to a new
supplier of latex gloves due to cost considerations. The
gloves they had been using had a mean defect rate of one in
1,000. The new supplier insists their gloves meet or exceed
this standard. The hospital has asked for a trial shipment
that can be tested to validate this claim.
- Population: All the new gloves that will be
obtained from the new supplier over the life of the
contract
- Parameter: The actual mean defect rate for
the population of new gloves. (The population mean is
referred to as μ, the
lowercase Greek letter "mu".) This can be
calculated only by testing every pair of gloves (i.e.,
a census), but this is not practical, since
the gloves are rendered unusable after the test.
- Sample: The collection of gloves the hospital
will actually test (the initial shipment)
- Statistic: The calculated mean of the sample
data (usually called the sample mean, and given
the symbol x̄,
or x-bar).
Pictorially, this example looks like this:

It is obvious that the accuracy of our estimate
of the population mean depends on how representative a
sample the test shipment is of the population. There are two
factors that affect how representative a sample is.
Sample Bias. If a sample statistic
consistently under- or overestimates the population
parameter, it is due to sample bias, of which there are
three main types:
-
Selection Bias occurs when the sample
is chosen in such a way as to systematically exclude or
over-represent some segment of the population. The goal is
to obtain a simple random sample, in which each
individual in the population has an equal chance of being
included in the sample. (The glove sample above is not a
simple random sample. Why?) Simply increasing sample size
will not remove selection bias.
-
Response Bias is a factor in public
opinion polling, where individuals' responses may be
influenced by the interviewer or the wording of a question.
Increasing a sample size will not remove response bias.
-
Non-response Bias. In any poll or
survey, members of a chosen sample may choose not to
participate. To the extent non-respondents differ from
respondents, this can result in the equivalent of selection
bias: some segment of the sample (and therefore the
population) will be underrepresented. Increasing the sample
size by re-sampling non-responders can reduce non-response
bias.
Sampling Error. Even if we eliminate all
sample bias, a sample statistic from any particular sample may
not accurately estimate the true population parameter. This is
the sampling error. The more spread out the population
around the mean, the higher the probability that even a simple
random sample may, by chance, include some outliers which will
affect our estimates of the population parameters. Since
standard deviation is a measure of spread, we can conclude that
the smaller the population standard deviation, the smaller will be
the sampling error for the same sample size. Increasing an
unbiased sample's size will decrease sampling error. There are
statistical techniques to determine the sample size required to
attain a given degree of confidence in estimating a population
parameter.
How Gallup Selects Its Sample
A reputable polling organization like Gallup
carefully manages its sampling to minimize or correct for both
sample bias and sampling error. Through the use of techniques
like multistage cluster sampling, sample sizes of only
1,500 respondents can predict within a few percentage points the
true population parameters of the USA on issues and
candidates.
The Sampling Distribution and the Central Limit
Theorem
The purpose of inferential statistics
is to estimate population parameters from sample statistics,
then use these estimates to test claims about the populations.
Different samples are likely to yield different sample
statistics, and hence different estimates of the population
parameters. Which is correct?
If we take every possible simple random sample (of the
same size), we could calculate a sample statistic (like x-bar)
for each. The outcomes would form a probability distribution,
which is called the sampling distribution. We will use an
example below, where the population is size 5, and we take all
possible samples of size 2 and calculate the mean.
Some definitions for the discussion that
follows:
| Terminology Table |
| | Sample Statistic | Population Parameter | Sampling Distribution |
| Size | n | N | n |
| Mean | x-bar | mu (µ) | mu(x-bar) |
| Variance | s² | sigma² (σ²) | sigma²(x-bar) |
| Standard Deviation | s | sigma (σ) | sigma(x-bar) |
Here is a population of 5 students in a typing class, and the
number of words per minute each has achieved on a mid-term test:
Sue (58), Sam (51), Sal (40), Sid (42), and Sol (68).
Assume we want to estimate the population mean (the mean of
the five students' typing speeds) by finding the sample mean
(x-bar) of a sample of size 2. What would we find? Here are all
the possible outcomes as a table and a histogram.
| Samples of Size 2 |
| Sample | xi | x-bar |
| Sue, Sam | 58, 51 | 54.5 |
| Sue, Sal | 58, 40 | 49.0 |
| Sue, Sid | 58, 42 | 50.0 |
| Sue, Sol | 58, 68 | 63.0 |
| Sam, Sal | 51, 40 | 45.5 |
| Sam, Sid | 51, 42 | 46.5 |
| Sam, Sol | 51, 68 | 59.5 |
| Sal, Sid | 40, 42 | 41.0 |
| Sal, Sol | 40, 68 | 54.0 |
| Sid, Sol | 42, 68 | 55.0 |
| mu(x-bar) | | 51.8 |
|
 |
Mean of the Sampling Distribution. As you can see,
depending on which of the 10 samples we picked, the resulting
x-bar would be different. But if we look at the x-bar from all
the possible samples, their resulting histogram takes on a
roughly "normal" shape, implying that the means of the samples
will tend toward the mean of the population. In fact, if you
change the vertical axis on the histogram from "Number of
Samples" to "Probability" (0.1, 0.2, 0.3 ...), you will see that
we really have a probability distribution.
Probability distributions have a mean, which we can calculate
as mu(x-bar) in the table above. The result of 51.8 is
exactly the same as the true population mean. The sample mean is
called an unbiased estimator of the population mean, in
that it tends neither to over- nor understate the population
mean.
Standard Deviation of the Sampling Distribution. The
sampling distribution also has a standard deviation, sigma(x-bar),
which in our example can be calculated to be 6.3. We don't
directly calculate this value, though, because in real life we
would never have the results of all possible samples of size
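n. We instead use this formula:
standard deviation of the sampling distribution = population standard deviation / √(sample size)
Or, more commonly in statistics texts, with this notation:
sigma(x-bar) = sigma / √n
This is called the standard error of the mean. We can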
make two observations: (1) To find it, we need to know the
standard deviation of the population (sigma) and, (2) to
reduce the standard error of the mean, we use as large a sample
size as practical.
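The two values quoted for the typing-class example (51.8 and 6.3) can be checked by listing every sample of size 2. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and are not part of the original notes.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Typing speeds (words per minute) of the five students in the example.
speeds = [58, 51, 40, 42, 68]  # Sue, Sam, Sal, Sid, Sol

# The mean of every possible sample of size 2, drawn without replacement.
sample_means = [mean(pair) for pair in combinations(speeds, 2)]

population_mean = mean(speeds)
mean_of_sample_means = mean(sample_means)   # mu(x-bar)
sd_of_sample_means = pstdev(sample_means)   # sigma(x-bar), from all 10 samples

# The shortcut formula sigma / sqrt(n), with n = 2.
sigma_over_root_n = pstdev(speeds) / sqrt(2)

print(len(sample_means), population_mean, mean_of_sample_means)
print(round(sd_of_sample_means, 1), round(sigma_over_root_n, 1))
```

For a population this tiny the last two numbers differ (about 6.3 from the full list of samples versus about 7.3 from the formula), because the formula assumes the population is much larger than the sample. In realistic problems, where N is far larger than n, the difference is negligible.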
What Good is Knowing the Mean and Standard Error of the
Sampling Distribution? These are the two factors we will
need to draw inductive inferences about a population from a
sample.
The Central Limit Theorem. The key to our being able
to draw inductive inferences is a theorem we will not prove (but
see a good
Wikipedia article on the CLT if you like math):
| For a population with any
distribution, the sampling distribution
approaches a normal distribution as the sample
size increases. And, if the original population
has mean mu and standard deviation
sigma, then the mean of the sampling
distribution will also be mu, and the
standard deviation of the sampling distribution
will be sigma /
√n. |
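The theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential (strongly skewed, with mean 1 and standard deviation 1), and the sample size, number of samples, and random seed are arbitrary choices that do not come from the notes.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(2100)      # arbitrary fixed seed so the run is repeatable

n = 30                 # size of each sample (assumed)
num_samples = 5000     # number of samples drawn (assumed)

# Draw many samples from a skewed population (exponential, mu = 1, sigma = 1)
# and record the mean of each sample.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print("mean of sample means:", round(mean(sample_means), 3), "(theory: 1.0)")
print("sd of sample means:  ", round(pstdev(sample_means), 3),
      "(theory:", round(1 / sqrt(n), 3), ")")
```

Even though the population is far from normal, the sample means cluster around mu with a spread close to sigma / √n, which is what the theorem predicts.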
Here is a picture of the sampling distribution for a large
sample size (n). It is symmetric about the mean of the
sampling distribution mu(x-bar), and the population mean mu. The
horizontal axis is given in units of standard error of the mean,
sigma / √n.

Drawing Inductive Inferences Through Confidence Intervals
The key to this section, and chapter, is to realize that when
we estimate a parameter by calculating a sample statistic, there
is a degree of uncertainty in our estimation. There is no way of
knowing whether we picked a sample whose mean is close to
the population mean. But we can, using the Central Limit
Theorem, construct a confidence interval that will
contain the true population mean with any stated level of
confidence.
This confidence interval (assuming the population standard
deviation is known) is defined as:
confidence interval = sample mean ± z · (population standard deviation / √(sample size))
Or, in conventional statistics notation:
x-bar ± z · sigma / √n
where z is the number of standard deviations from the
mean (found in a normal table). The term z · sigma / √n
is the margin of error, indicating how close
the sample mean is likely to be to the actual population mean.
This table gives the required z value for the most common
desired levels of confidence:
| Confidence Levels and Required z Value |
| Desired Level of Confidence | Required z Value (Critical Value) |
| 80% | 1.282 |
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.575 |
To see why this is true, consider this table of the
normal curve, showing why a z of 1.96 is required for a
confidence level of 95%: the area (probability) under the
curve from the mean to z = 1.96 is 0.475, and twice that area is 0.95.
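The critical values can also be computed rather than looked up. This is a minimal sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z such that `confidence` of the area lies between -z and +z."""
    # Half of the leftover area sits in each tail, so look up the
    # cumulative probability 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The 99% value comes out nearer 2.576; many textbook tables show 2.575 instead, and the difference has no practical effect.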

Example: A random sample of 28 families of the
322 in a precinct indicated a mean family income of $2,830 per
month. Construct the 90% and 95% confidence intervals for the
mean monthly income of all families in the precinct. (The
standard deviation of the population is known to be $750 from
previous studies.)
90% confidence interval = 2,830 ± margin of error
= 2,830 ± (1.645 · 750 / √28)
= 2,830 ± 233.16
= [2,596.84, 3,063.16]
95% confidence interval = 2,830 ± margin of error
= 2,830 ± (1.960 · 750 / √28)
= 2,830 ± 277.80
= [2,552.20, 3,107.80]
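The same calculation can be sketched in a few lines of Python. The function name `z_interval` is illustrative, and √28 is carried at full precision rather than rounded, so the margins may differ by a few tenths from a hand calculation that rounds intermediate steps.

```python
from math import sqrt

def z_interval(sample_mean, sigma, n, z):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z * sigma / sqrt(n)          # margin of error
    return sample_mean - margin, sample_mean + margin

# Precinct income example: x-bar = 2,830, sigma = 750, n = 28.
for label, z in (("90%", 1.645), ("95%", 1.960)):
    low, high = z_interval(2830, 750, 28, z)
    print(f"{label} confidence interval: [{low:,.2f}, {high:,.2f}]")
```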
We can also use the
equation above to solve for the sample size n required to
estimate a population mean at a given confidence level with a
desired margin of error E:
n = (z · sigma / E)²
Example (from
Triola p. 352): How many adults must be randomly selected to
estimate the mean FICO (credit rating) score of working adults
in the United States? We want 95% confidence that the sample mean
is within 3 points of the population mean, and the population
standard deviation is 68.
n = (1.960 · 68 / 3)² = 1,973.7, which we round up to 1,974 adults.
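A short Python sketch of the sample-size calculation follows. The function name is illustrative; the result is always rounded up, since rounding down would leave the margin of error slightly larger than requested.

```python
from math import ceil

def required_sample_size(z, sigma, margin_of_error):
    """Smallest n for which z * sigma / sqrt(n) is at most the margin of error."""
    return ceil((z * sigma / margin_of_error) ** 2)

# FICO example: 95% confidence (z = 1.960), sigma = 68, E = 3 points.
print(required_sample_size(1.960, 68, 3))
```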

So Who Really Knows the Population Standard Deviation?
We usually don't know the population standard deviation (sigma),
unless there have been previous studies on our population (like
IQ, for example, which has been extensively studied). When we
don't know sigma, we can use the sample standard
deviation (s) in its place, but we can no
longer use the normal distribution as above. Rather, we will use
the Student's t distribution, which has the effect of
expanding our confidence intervals.
The photograph at left is Sir R.A. Fisher,
1890-1962, English statistician and geneticist, who studied the
t distribution and devised the "t-test" discussed
below.
There are different curves for the t
distribution, which depend on the sample size. Specifically, the
curves represent different degrees of freedom, where the
degrees of freedom is equal to the sample size minus one. We
need to use a different table to obtain the appropriate t
value for the formulas that follow:
| A Short Table of t Values |
| Sample Size Minus One (n - 1) | Desired Level of Confidence |
| | 80% | 90% | 95% | 99% |
| 4 | 1.533 | 2.132 | 2.776 | 4.604 |
| 9 | 1.383 | 1.833 | 2.262 | 3.250 |
| 16 | 1.337 | 1.746 | 2.120 | 2.921 |
| 25 | 1.316 | 1.708 | 2.060 | 2.787 |
| 36 | 1.306 | 1.689 | 2.030 | 2.722 |
| Normal curve (z value) | 1.282 | 1.645 | 1.960 | 2.575 |
As you can see from the table above, the larger the sample
size, the more closely the t value approaches the z
value at the same desired level of confidence. For sample sizes
above 100, they are nearly identical. For a more complete table,
see
here.
Using this table, we can estimate the population standard
deviation with the sample standard deviation, which we have
already seen to be:
s = √( Σ (xi - x-bar)² / (n - 1) )
Constructing Confidence Intervals for the
Unknown Population Mean when the Population Standard Deviation
is Unknown. Here is the formula for the confidence interval
for the mean when the population standard deviation is unknown:
x-bar ± t · s / √n
Example: Assume you are determining the number of pounds of fish
eaten per week per family, and have these results from a sample
of 10 families:
1.5, 2.0, 0.0, 1.0, 2.5, 2.0, 0.0, 2.0, 3.5, 1.0
You can compute the sample mean to be 1.55, and the standard
deviation to be 1.09. What is your estimate for the population
mean at the 90% confidence level?
90% confidence interval = 1.55 ± (1.833 · 1.09 / √10), using the t value for n - 1 = 9
= 1.55 ± 0.63
= [0.92, 2.18]
If you knew the population standard deviation was 1.09 (say
from a previous national study), what
would be your estimate at the same level of confidence?
90% confidence interval = 1.55 ± (1.645 · 1.09 / √10), using the z value
= 1.55 ± 0.57
= [0.98, 2.12]
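The fish example can be worked from the raw data in Python. This is a sketch using only the standard library: the t value (1.833 for 9 degrees of freedom at 90%) and the z value (1.645) are hard-coded from the tables in these notes rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

pounds = [1.5, 2.0, 0.0, 1.0, 2.5, 2.0, 0.0, 2.0, 3.5, 1.0]
n = len(pounds)

x_bar = mean(pounds)   # sample mean
s = stdev(pounds)      # sample standard deviation (divides by n - 1)

T_90_DF9 = 1.833       # t value from the short table: 90%, n - 1 = 9
Z_90 = 1.645           # z value from the normal table: 90%

# Sigma unknown: use s with the t value.
t_margin = T_90_DF9 * s / sqrt(n)
# Sigma assumed known to be 1.09: use it with the z value.
z_margin = Z_90 * 1.09 / sqrt(n)

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"t interval: [{x_bar - t_margin:.2f}, {x_bar + t_margin:.2f}]")
print(f"z interval: [{x_bar - z_margin:.2f}, {x_bar + z_margin:.2f}]")
```

The t interval is the wider of the two, which reflects the extra uncertainty of estimating the standard deviation from only 10 observations.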

Constructing Confidence Intervals for the Unknown
Population Percentage (Proportion). Sometimes we are
interested in estimating a population proportion (such as "what
proportion of American adults drink milk at least once a week").
We do this much like our estimate of the population mean, but
with different notation.
We will call the sample proportion "p-hat", r
the number of sample "successes" and n the sample size.
Then:
p-hat = r / n
And the confidence interval for an unknown population
proportion is:
p-hat ± z · √(p-hat · (1 - p-hat) / n)
Example: (From Triola, p. 342): A study of 420,095 Danish
cell phone users found that 135 of them developed cancer of the
brain or nervous system. Prior to this study of cell phone use,
the rate of such cancer was found to be 0.0340% for those not
using cell phones. The data are from the Journal of the
National Cancer Institute.
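(a) Construct a 95% confidence interval estimate of the
percentage of cell phone users who develop cancer of the
brain or nervous system:
p-hat = 135 / 420,095 = 0.0003214 (0.0321%)
95% confidence interval = 0.0003214 ± (1.960 · √(0.0003214 · 0.9996786 / 420,095))
= 0.0003214 ± 0.0000542
= [0.0002672, 0.0003756], or 0.0267% to 0.0376%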


(b) Do cell phone users appear to have a rate of cancer of
the brain or nervous system that is different from the rate
of such cancer among those not using cell phones?
No, because the rate of cancer among non-cell phone users
(0.0340%) is within the confidence interval computed above.
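Both parts of the cell phone example can be sketched in Python. The function name `proportion_interval` is illustrative, and the z value of 1.960 is taken from the table of confidence levels.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# (a) 135 cases among 420,095 cell phone users, 95% confidence.
low, high = proportion_interval(135, 420_095, 1.960)
print(f"95% confidence interval: {low:.4%} to {high:.4%}")

# (b) Is the rate for non-users (0.0340%) inside the interval?
baseline = 0.000340
print("baseline inside interval:", low <= baseline <= high)
```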
Some Final Thoughts on Confidence Intervals
Summarizing what we have done above:
- Confidence level and confidence interval size are
directly related: that is, the higher the confidence you
want, the wider the confidence interval will be (for a fixed
sample size)
- Sample size and confidence interval size are inversely
related: the larger the sample size, the smaller the
confidence interval (at a given confidence level)
- If we can reduce the inherent variability in the
population, we reduce the population standard deviation,
which in turn decreases the confidence interval size; this
is the basic principle underlying statistical quality
control
Hypothesis Testing: An Alternative Approach to Confidence
Intervals
Although based on the same principles, hypothesis testing is
often used in statistics because it is easier to state problems
using hypotheses than by using confidence intervals. This
section discusses how to test hypotheses about population
means and population proportions.
Testing Hypotheses About Unknown Population Means. A
hypothesis is a claim about an unknown population
mean or proportion. Each problem has two hypotheses:
- The null hypothesis (called "h-naught" and
abbreviated H0) is the claim of the status quo:
that a treatment had no effect on the mean, for example. The
null hypothesis is of the form mu
≥ 5, or mu
≤ 2.34. The null hypothesis
will always have a claim of equality in it (greater than or
equal to, less than or equal to).
- The alternative hypothesis (abbreviated H1)
is the opposing claim, for example mu < 5 or mu
> 2.34
We assume the null hypothesis is true unless we can
demonstrate, based on sample data at the desired level of
confidence, that the alternative hypothesis is true. The level
of confidence is related to two potential types of errors we can
make:
| Type I and Type II Errors |
| | Reality |
| Decision | H0 is true | H0 is false |
| Reject H0 | Type I Error, P(Type I error) = α | Correct decision |
| Fail to reject H0 | Correct decision | Type II Error, P(Type II error) = β |
For any given problem you will need to define the hypotheses
and estimate the consequences of making a Type I or Type II
error, in setting the level of significance: the maximum
risk you are willing to accept in making a Type I error (called
α). The text suggests:
- If the Type I error is potentially costly and the
Type II error is not, set α
very low, at 0.05 or less; this is the most common case and
most common value for α
- If the Type I error is not
costly but the Type II error is, set α higher, perhaps ≥
0.25
- If both types of errors are
costly, set α very low and increase the sample size to
reduce the probability of Type II errors
Once we can state the
hypotheses, evaluate the consequences, and choose a level of
significance, the statistics underlying a hypothesis test are
straightforward: if our sample statistic is less likely to have
(randomly) occurred than our level of significance, we reject
the null hypothesis; otherwise, we fail to reject the null
hypothesis. Formally:
- The critical value (CV)
is the value of the test statistic that equates to our
tolerance of a Type I error
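- The test statistic
for claims about an unknown population mean is the
z-score of the sample mean:
z = (sample mean - hypothesized population mean) / (population standard deviation / √(sample size))
Or, in common statistics notation:
z = (x-bar - mu) / (sigma / √n)
Here are the rules for rejecting or failing to reject H0.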
| Critical Values for Population Mean Tests |
| α | Left-tailed test (H1: mu < x): reject if z < CV | Right-tailed test (H1: mu > x): reject if z > CV |
| 0.10 | -1.282 | +1.282 |
| 0.05 | -1.645 | +1.645 |
| 0.02 | -2.054 | +2.054 |
| 0.01 | -2.326 | +2.326 |
Example (Triola p. 430): When 40
people used the Weight Watchers diet for one year, their mean
weight loss was 3.0 lb. Assume the standard deviation of all
such weight changes is 4.9 lb and use a 0.01 significance level
to test the claim that the mean weight loss is greater than 0.
Based on these results, does the diet appear to be effective?
Does the diet appear to have practical significance?
H0: mu ≤ 0 lb weight loss
H1: mu > 0 lb weight loss
α = 0.01
n = 40
sigma = 4.9
x-bar = 3.0
CV = 2.326 (right-tailed test)
z = (3.0 - 0) / (4.9 / √40) = 3.87
Since z > CV we reject H0. There is sufficient
evidence to support the claim that the mean weight loss is
greater than 0. But since the mean weight loss over a year was
only 3 lb, the practicality of the diet is in question.
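The Weight Watchers test can be sketched in Python as follows. This assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `z_statistic` is illustrative. The critical value is computed from α instead of being hard-coded; for a right-tailed test at α = 0.01 it is the z value with 1% of the area to its right, about 2.326.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu_0, sigma, n):
    """z-score of the sample mean under the null hypothesis mu = mu_0."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))

alpha = 0.01
z = z_statistic(3.0, 0.0, 4.9, 40)

# Right-tailed test: reject H0 when z exceeds the critical value.
critical_value = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > critical_value

print(f"z = {z:.2f}, CV = {critical_value:.3f}, reject H0: {reject_h0}")
```

The test statistic is well beyond the critical value, so the decision to reject H0 does not hinge on small differences in how the critical value is rounded.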
Testing Hypotheses About Unknown Population Proportions.
Testing claims about population proportions is analogous to
testing claims about means, except the test statistic is:
z = (p-hat - p) / √(p · (1 - p) / n)
where p-hat is the sample proportion, p is the
population proportion (the value used in the null hypothesis),
and n is the sample size.
Example: The company Drug Test Success provides a
"1-Panel-THC" test for marijuana usage. Among 300 tested
subjects, results from 27 subjects were wrong (either false
positive or false negative). Use a 0.05 significance level to
test the claim that less than 10% of the test results are wrong.
Does the test appear to be good for most purposes?
H0: p ≥ 0.10 of test results are wrong
H1: p < 0.10 of test results are wrong
α = 0.05
n = 300
p-hat = 27 / 300 = 0.09
CV = -1.645
z = (0.09 - 0.10) / √(0.10 · 0.90 / 300) = -0.58
Since this is a left-tailed test, and z is not less than the critical
value, we fail to reject the null hypothesis that 10% or more of
the results are wrong. The test appears to be of limited value
due to its error rate.
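The drug test example can be sketched the same way. This assumes Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_z` is illustrative. Note that the standard error uses the hypothesized proportion p from H0, not p-hat.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p_0):
    """Test statistic for a claim about a population proportion."""
    p_hat = successes / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

alpha = 0.05
z = proportion_z(27, 300, 0.10)

# Left-tailed test: reject H0 when z falls below the critical value.
critical_value = NormalDist().inv_cdf(alpha)
reject_h0 = z < critical_value

print(f"z = {z:.2f}, CV = {critical_value:.3f}, reject H0: {reject_h0}")
```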
|