Lecture Notes
Business Statistics (QMB 2100)
Probability Distributions
Introduction
When we calculated the probability of an
event in the last chapter, we assumed that all
the outcomes of a trial were equally likely.
This is an overly-restrictive assumption. This
chapter develops techniques to estimate
probabilities when the outcomes of a trial have
different probabilities, described by a probability distribution.
Random Variables and Probability Models
Random Variables. We will define a random variable this way:
| A random variable is an
event (the outcome of a trial) for
which you would like to calculate a
probability. |
The possible
number of values for the event (the number
of possible outcomes of the trial) must be
greater than one; events that can only be a
single value are constants.
There are two
types of random variables:
- Discrete random variables are
those whose possible values can be
enumerated (a finite set)
- Continuous random variables are those that can take on any
of a set of values that cannot be
enumerated (an infinite or very large
set)
Here is a table of the examples given in the text (p. 72):
Examples of Discrete and Continuous Random Variables
| Event | Possible Values | Type of Random Variable |
| Number of boys in three children | 0, 1, 2, 3 | ? |
| Number of ovens not working out of five in the bakery | 0, 1, 2, 3, 4, 5 | ? |
| Lifetime of a computer chip | 800-1,300 hours | ? |
| Weight of a package | 1 to 5 lbs | ? |
Probability Models,
Distributions, and Histograms
A
probability model is used to
help determine the probabilities of
events.
This chapter discusses two commonly used
probability models:
- For discrete random variables, the binomial probability model
- For continuous random variables, the normal probability model
To make a probability
model useful, we need a probability
distribution:
| A discrete probability
distribution consists of all the
possible values of a random variable
and their associated probabilities.
|
For example, for the random variable
"the number of heads in 3 flips of a coin",
we can determine from the previous chapter
that the possible values are 0, 1, 2, and 3,
with the probabilities 0.125, 0.375, 0.375,
and 0.125, respectively. (Draw a tree
diagram to see this.) This constitutes a
probability distribution, because every
possible outcome has been identified, along with
its corresponding probability of occurrence.
We can also construct a probability
histogram, which displays the same
information:

This histogram is a picture
of a probability distribution; note the
total area of the bars is 1.000 (because the
probabilities in a distribution must sum to
1).
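The coin-flip distribution above can be checked by brute force. The following is a minimal Python sketch, assuming a fair coin; the function name heads_distribution is made up for this illustration and does not come from the text. It lists every equally likely sequence of flips and counts the heads in each.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(flips):
    """Map each possible number of heads to its probability (fair coin assumed)."""
    sequences = list(product("HT", repeat=flips))   # every equally likely sequence
    counts = Counter(seq.count("H") for seq in sequences)
    return {k: Fraction(c, len(sequences)) for k, c in sorted(counts.items())}

dist = heads_distribution(3)
for heads, prob in dist.items():
    print(heads, float(prob))
print("total:", float(sum(dist.values())))
```

The probabilities sum to 1, as they must for any probability distribution.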
Here is the probability histogram for the random
variable "rolling a pair of dice". The probabilities are based
on the tree diagram in the text (p. 35).

A Bernoulli Process and Statistical
Independence
The Characteristics of a Bernoulli
Process.

Jacob (Jacques) Bernoulli (1654-1705) was a Swiss
mathematician, most noted for his work
in differential equations and
probability theory. He defined and
studied the probability distributions
describing a Bernoulli process,
which is a series of trials with these
characteristics:
- Each trial has only two possible outcomes
- The outcome probabilities remain constant from trial to trial
The first condition can usually be
satisfied by the design of the
experiment; for example, if the results
of a survey can be reported as
"opposed", "indifferent", and "in
favor", we could classify the actual
results as "opposed" and "not opposed".
The second condition is a characteristic
of the physical process, and has to do
with whether successive trials are
statistically independent; if they are
not, we do not have a Bernoulli process;
for example:
- A coin flip and drawing from a jar of M&Ms of two
colors with replacement are both Bernoulli processes,
because successive trials are independent
- Drawing from a jar of M&Ms of two colors without
replacement is not a Bernoulli process, because the
result of one draw affects the next (each draw changes
the relative proportions of the two colors)
Computing Joint
Probabilities for a Bernoulli Process.
Joint probability is the probability
that two or more events occur. The joint
probability of events in a Bernoulli process
can be calculated with the special rule
of multiplication:
| The joint probability of
two or more independent
events is equal to the
product of their simple
probabilities. |
Or, describing joint
probability with a formula for two
independent events A and B:
P(A and B) = P(A) · P(B)
For example: The probability
of randomly drawing a red M&M from a jar
containing 3 red and 7 green M&Ms on one
trial is 0.3. What is the probability of
drawing two red M&Ms in succession, with replacement?
P(Red and Red) = (3 / 10) · (3 / 10) = 0.090 (or 9.0%)
Here is a
"puzzler" from Car Talk©: Hank
and Moe went job-hunting. At one likely
employer, the boss said, "You guys look
promising, but I need a guarantee you can be
here 90% of the time." Hank said, "Geez, we
both have old cars. Mine only starts 80% of
the time, and Moe's is even worse – it only
starts 70% of the time." The boss hired them
anyway and they turned out to be the
hardest-working, if not the brightest,
workers he ever had. What did the boss know?
Characteristics of a
non-Bernoulli Process. A non-Bernoulli
process is one in which there are more than
two possible outcomes for each trial, or the
probabilities of each outcome are not the
same from trial to trial. The first we can
usually account for by properly categorizing
the results into two categories. The second
may not be possible to adjust for.
For example, consider the
jar of 3 red and 7 green M&Ms. If we
randomly draw two M&Ms with replacement,
we have a Bernoulli process. If we draw
two M&Ms without replacement, the
probabilities of outcomes on successive
trials are not the same, because they are
affected by what has already happened: If we
draw a red M&M on trial 1, there are only 2
red and 7 green M&Ms left in the jar for the
next draw, so the probability of drawing a
red is 2 / 9, not 3 / 10.

Computing Joint
Probabilities for a non-Bernoulli Process.
To compute the joint probability of a
non-Bernoulli process, we need the general rule of multiplication:
| The joint probability of
two or more dependent
events is equal to the
product of their simple and
conditional probabilities. |
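As a formula, for two dependent
events A and B:
P(A and B) = P(A) · P(B | A)
where P(B | A) is the conditional probability of B, given that A has occurred.
For the M&M example without
replacement,
P(Red and Red) = (3 / 10) · (2 / 9) = 0.067 (or 6.7%)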
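The two M&M calculations differ only in the probability used for the second draw. Here is a small Python sketch of that difference; the function name prob_two_reds and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def prob_two_reds(red, green, replace):
    """Probability that two successive draws from the jar are both red."""
    total = red + green
    first = Fraction(red, total)
    if replace:
        second = Fraction(red, total)           # jar unchanged: independent draws
    else:
        second = Fraction(red - 1, total - 1)   # one red removed: conditional probability
    return first * second

with_replacement = prob_two_reds(3, 7, replace=True)
without_replacement = prob_two_reds(3, 7, replace=False)
print(float(with_replacement))                # 0.09
print(round(float(without_replacement), 3))   # 0.067
```

Using exact fractions avoids rounding until the final step.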
Here is another example.
Assume there is a department in your company
that has 2 men and 5 women employees. Two
employees are to be selected at random to
serve on the Holiday Party Committee. What
is the probability of two men being
selected? Two women? One man and one woman?
Here is a table (similar to the tree
diagrams in the text):
Diagram for the Holiday Party Committee
| Choice 1 | Choice 2 | Outcome |
| M (2/7) | M (1/6) | MM (2/7 · 1/6 = 0.048) |
| M (2/7) | W (5/6) | MW (2/7 · 5/6 = 0.238) |
| W (5/7) | M (2/6) | WM (5/7 · 2/6 = 0.238) |
| W (5/7) | W (4/6) | WW (5/7 · 4/6 = 0.476) |
Note the probabilities
sum to 1, which is a requirement of a
probability distribution. So we have
these calculated joint probabilities:
P(man and man) = 4.8%
P((man and woman) or (woman and man)) = 23.8% + 23.8% = 47.6%
P(woman and woman) = 47.6%
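As a cross-check on the committee probabilities, we can list every possible pair of employees and count how many pairs fall into each category. This is only an illustrative Python sketch; the variable names are invented here, and it treats a committee as an unordered pair, so MW and WM are counted together.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

staff = ["M"] * 2 + ["W"] * 5   # 2 men and 5 women in the department

# Every unordered pair of distinct employees is equally likely to be chosen.
pairs = list(combinations(range(len(staff)), 2))
tally = Counter("".join(sorted(staff[i] + staff[j])) for i, j in pairs)
committee = {kind: Fraction(n, len(pairs)) for kind, n in tally.items()}

for kind in ("MM", "MW", "WW"):
    print(kind, round(float(committee[kind]), 3))
```

There are 21 possible pairs: 1 with two men, 10 with one man and one woman, and 10 with two women.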
More on a Bernoulli Process
Sometimes it is obvious whether a
process meets the requirements to be a
Bernoulli process; other times it is less
clear and you have to think about it a
little. For example:
- There is a 2.5% probability that an individual selected
at random in the United States will have an IQ above 130.
What is the probability that 3 individuals selected at random from
across the country will all have IQs above 130?
- What is the probability that three siblings from families
selected at random in the United States
will all have IQs above 130?
In both cases there are only two outcomes
per trial (IQ > 130, IQ ≤ 130),
so one condition is satisfied. As to the other (condition of
independent outcomes), the first case meets it because
choosing 3 people at random, even without replacement, will
only make a very small difference in the probabilities of
outcomes because the population in question is so large. The
second case does not meet this condition, assuming we
believe there is a genetic component to IQ. If one sibling
has an IQ > 130, the probability that his or her siblings also
have a high IQ is greater than 2.5%.
The question of whether a
process qualifies as a Bernoulli process or
not is critical, because as we will see, we
can only use the binomial model to calculate
probabilities if the problem can be
represented as a Bernoulli process.
A process can only be a
Bernoulli process if the trials are
statistically independent. We have already
seen how to estimate independence: with a
contingency table. Here is part of the
contingency table given in the text (p 84):
Latest Child, by Number and Sex of Previous Children
| Latest Child | None | 1 Boy | 2 Boys | 3 Boys | 1 Girl | 2 Girls | 3 Girls |
| Boy | 487 | 488 | 500 | 600 | 480 | 460 | 380 |
| Girl | 513 | 512 | 500 | 400 | 520 | 540 | 620 |
| Total | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
Are the chances of having a
boy independent of the outcomes of previous
births? We don't need to compute every
probability. Consider just:
P(boy, given no previous children) = 487 / 1000 = 0.487
while at the same time,
P(boy, given 3 previous girls) = 380 / 1000 = 0.380
Since these two
probabilities are substantially different,
we would estimate that the sex
of a child is not independent of the sex of
previous children. Hence, we do not have a
Bernoulli process, and will not be able to
use the binomial model described in the next
section.
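The same comparison can be made for every column of the contingency table at once. The Python sketch below copies the "Boy" row from the table above; the variable names are assumptions made for this illustration. If births were independent of family history, all of these conditional probabilities would be roughly equal.

```python
# Number of boys among 1000 latest births, for each family history
# (the "Boy" row of the contingency table).
boys = {"None": 487, "1 Boy": 488, "2 Boys": 500, "3 Boys": 600,
        "1 Girl": 480, "2 Girls": 460, "3 Girls": 380}
births_per_group = 1000

# Conditional probability of a boy, given each history.
p_boy_given = {history: n / births_per_group for history, n in boys.items()}
spread = max(p_boy_given.values()) - min(p_boy_given.values())

for history, prob in p_boy_given.items():
    print(f"P(boy, given {history}) = {prob:.3f}")
print(f"largest difference between columns: {spread:.3f}")
```

How large a difference is too large to attribute to chance is a question for the statistical reasoning developed in later chapters.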
The Binomial Model
Probability models are used
to calculate the probabilities for random
variables. Random variables can be discrete
or continuous. This chapter discusses the binomial model and how we can use it to
calculate the probabilities for a random
variable obtained through a Bernoulli
process. That is, the binomial model will
calculate the probability that an outcome occurs
a given number of times in n trials, where each trial has
only two possible outcomes, and the trials
are independent.
Estimating Binomial
Probabilities by Simulation. One way to
estimate a binomial probability is by
simulating a real world process with random
numbers. A random number is a number from a
series of numbers, each of which has the
property that its probability of appearance
cannot be determined from the pattern of the
previous numbers. In other words, the
numbers are independent of each other.
The text gives a small table
of random digits. Here is another, generated
by this program (if
you are interested in the C source code,
here it is: random.c).
Like the table in the text, these random digits are grouped
in twos just to make the table a little easier to read.
| A Small Table of Random
Digits |
24 47 39 49 85 56 74
60 98 51 89 91 91 79 30 91 10 02 23 12 00 53 88 58 38 41 30 51 21 53 81 25 24 56 84 44 18 49 15 69 35 15 07 07 43 06 85 46 72 94 47 12 65 08 88 18 80 59 45 46 76 77 86 59 55 95 96 55 56 73 46 96 64 57 57 11 67 86 90 88 09 24 05 47 15 44 10 04 02 83 97 06 86 43 18 16 12 54 75 39 |
We can use this table to
simulate what might happen in the real
world. Here is the "bakery problem" from
the text (p. 87):
A bakery has five
ovens. Four of the ovens must be
working in order to meet customer
demand on any given day. The owner
keeps records for major repairs. A
major repair is one that requires a
service call and puts an oven out of
service for the day. A minor repair
... takes only a few minutes to fix
and there are no lost sales.
Based on the owner's
records, each oven has a 10% chance of a
major breakdown on any given day. What
is the probability that at least four
ovens are working on a given day?
First, note that this
problem does describe a Bernoulli
process: each trial (one oven on one
day) has only two possible outcomes
(working, not working), and the
trials are independent (the status of
one oven does not affect the status of
another).
We can use our table of
random digits to solve this problem this
way: let the digits 0-8 represent the
condition "oven is working", and the
digit 9 to mean "oven is not working."
We will run the simulation by picking
five random digits from the table, one
for each oven, each "day", and recording
the results. Using the table above, the
results for the first day would be
2-4-4-7-3, which we will interpret as
"all ovens working." For the second day,
the series would be 9-4-9-8-5. This
indicates that two ovens are not working
that day.
If we kept up this
procedure, we could determine the number
of days at least four ovens were
working, which, divided by the total
number of days, would approximate the
true probability of at least four ovens
working. The longer we ran the
simulation, the more accurate our
estimate would be. Here is a program
that runs this simulation (executable,
C source code),
and the results of one simulation run of
10,000 days:
| Bakery Simulation |
How many days to run simulation? 10000
Number of days (of 10000) with n ovens working:
0: 0 (0.000)
1: 5 (0.001)
2: 74 (0.007)
3: 665 (0.067)
4: 3267 (0.327)
5: 5989 (0.599)
From the above, we can
calculate that the probability of at
least 4 ovens working on any given day
will be approximately:
P(at least 4 ovens working) = 0.327 + 0.599 = 0.926 (or 92.6%)
Note that this differs
from the text estimate of 0.96, because
a different set of random numbers was
used, and because the above simulation
ran for 10,000 days, rather than the 25
in the text (one of the advantages of
simulation by computer).
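For readers who want to experiment, here is a minimal Python sketch of the same kind of simulation. It is not the C program mentioned above; the function name simulate_bakery and the seed value are arbitrary choices made for this illustration. It follows the digit scheme described earlier: one random digit per oven per day, with 0-8 meaning "working" and 9 meaning "not working".

```python
import random

def simulate_bakery(days, ovens=5, seed=None):
    """Return counts, where counts[k] is the number of days with exactly k ovens working."""
    rng = random.Random(seed)
    counts = [0] * (ovens + 1)
    for _day in range(days):
        # One random digit per oven: 0-8 means working, 9 means broken down.
        working = sum(1 for _oven in range(ovens) if rng.randrange(10) != 9)
        counts[working] += 1
    return counts

days = 10_000
counts = simulate_bakery(days, seed=2100)
for k, n in enumerate(counts):
    print(f"{k}: {n} ({n / days:.3f})")
estimate = (counts[4] + counts[5]) / days
print("P(at least 4 ovens working) is approximately", round(estimate, 3))
```

Changing the seed, or passing seed=None, gives a different run and a slightly different estimate, which is exactly the run-to-run variation discussed above.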
Computing Binomial
Probabilities. If you don't have a
computer program handy, or a table of
random numbers or the time to use it to
run a simulation of sufficient length,
we can determine probabilities directly
from the Binomial probability
distribution.
Let's assume we have n trials, each of which can result
in the outcomes {success, failure}, each
with probabilities p and q,
respectively. (Note that since there can
be only two outcomes in a Bernoulli
process, q = 1 – p.) We
designate P(x) as the
probability of x successes in n trials.
The text describes how
to calculate P(x):
| The probability of
x successes is
the number of ways x
can occur, times the
probability of any one
way to obtain x successes. |
For example, what is the
probability of two heads (successes) in
three flips of a coin? The probability
of one way of obtaining two heads, say
HHT, is 0.5 x 0.5 x 0.5 = 0.125. But
there are three ways to obtain two heads
in three throws: HHT, HTH, and THH. So:
P(2 heads in three flips) = 3 x 0.125 = 0.375 (or 37.5%)
The difficulty is
usually not figuring out the probability
of one way for x to occur, but
how many possible ways x can
occur in n trials. For this, we
need the math concept of the factorial, so
we may as well introduce the binomial
probability formula:
P(x) = [n! / (x! · (n – x)!)] · p^x · q^(n – x)
Don't let this equation
scare you! Just because it has a lot of
! in it! Really! It is just another
"plug and chug" formula. The first
factor is the number of ways x
successes can occur in n trials.
(In math it is called a combination.)
The second factor, p^x · q^(n – x), is the
probability of any one way to obtain x successes. The exclamation point
is the factorial operator, which
just means that number multiplied by all
the whole numbers less than itself, down
to 1; for example: 5! = 5 · 4 · 3 · 2 · 1 = 120.
So, what is the probability of
getting 2 heads in 3 flips?
P(2) = [3! / (2! · 1!)] · 0.5^2 · 0.5^1 = 3 · 0.125 = 0.375
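Another example: What is
the probability of four ovens out of
five working? How about all five? How
about at least four? (Here p = 0.9, q = 0.1, and 0! is defined to be 1.)
P(4) = [5! / (4! · 1!)] · 0.9^4 · 0.1^1 = 5 · 0.6561 · 0.1 = 0.328
P(5) = [5! / (5! · 0!)] · 0.9^5 · 0.1^0 = 1 · 0.5905 · 1 = 0.590
P(at least 4) = P(4) + P(5) = 0.328 + 0.590 = 0.918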
Compare this with the
estimated probability obtained by
simulation, above.
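The binomial formula is easy to turn into a small function. The sketch below uses Python's built-in math.comb for the number of combinations; the function name binomial_probability is just a label chosen for this illustration.

```python
from math import comb

def binomial_probability(x, n, p):
    """P(x successes in n independent trials), each with success probability p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Two heads in three flips of a fair coin.
print(round(binomial_probability(2, 3, 0.5), 3))   # 0.375

# Ovens: each has a 0.9 probability of working on a given day.
p4 = binomial_probability(4, 5, 0.9)
p5 = binomial_probability(5, 5, 0.9)
p_at_least_4 = p4 + p5
print(round(p4, 4), round(p5, 4), round(p_at_least_4, 4))
```

Carrying extra decimal places gives about 0.9185 for at least four ovens working, in line with the hand calculation from rounded terms.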
Expected Value and
Standard Deviation of the Binomial
Probability Distribution. Every
probability distribution has an expected
value (mean) and standard deviation. The
formulas for these were given in Chapter
2. For the binomial distribution, these
are:
Expected value (mean): μ = n · p
Standard deviation: σ = √(n · p · q)
Example (from Triola, pg 232): The
midterm exam in a nursing course
consists of 75 true/false questions.
Assume that an unprepared student makes
random guesses for each of the answers.
(a) Find the mean and standard deviation
for the number of correct answers for
such students. (b) Would it be unusual
for a student to pass this exam by
guessing and getting at least 50 correct
answers?
(a) μ = n · p = 75 · 0.5 = 37.5
σ = √(n · p · q) = √(75 · 0.5 · 0.5) = √18.75 = 4.3
(b) Taking "usual" values to be those within two standard
deviations of the mean, the usual range is 37.5 ± 2 · 4.3, or
from 28.9 to 46.1 correct answers.
We would expect an
unprepared student to "guess" correctly
on 37.5 questions, with a standard
deviation of 4.3. Guessing 50 (or more)
answers correctly would fall outside of
our range of "usual" values.
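The true/false exam example can be checked numerically. The Python sketch below computes the mean and standard deviation from the binomial formulas, and also adds up the exact binomial probabilities of 50 through 75 correct guesses. The exact tail probability is an extra step not used in the worked example, and the variable names are illustrative only.

```python
from math import comb, sqrt

n, p = 75, 0.5          # 75 true/false questions, random guessing
q = 1 - p
mean = n * p
sd = sqrt(n * p * q)

# Exact binomial probability of 50 or more correct guesses.
p_pass = sum(comb(n, k) * p**k * q**(n - k) for k in range(50, n + 1))

print("mean:", mean, "standard deviation:", round(sd, 2))
print("50 correct is", round((50 - mean) / sd, 2), "standard deviations above the mean")
print("P(at least 50 correct):", round(p_pass, 4))
```

A score of 50 lies nearly three standard deviations above the mean, so passing by guessing would indeed be unusual.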
The Binomial Model
and Decision-Making.
We can use the binomial
probability distribution model to
support our management decision-making.
For example (from the text): assume the
cost in unfulfilled orders is $100 for
each day on which fewer than four ovens
are available. Also assume the
bakery can buy another oven for
$175/month (monthly payment on a
business loan). Neglecting taxes,
depreciation, salvage costs and
opportunity costs, would this be a good
investment?
We have already
determined the probability of at least
four ovens working out of five is 0.918,
so the probability of two or more ovens
broken is 1 – 0.918 or 0.082. So, in a
30-day month, we would expect np
= 30 ·
0.082 = 2.46 days in which two or more
ovens are broken, each day of which we
will lose $100, or an expected monthly
loss of $246. We can do the same
calculations to determine the
probability that three or more ovens
are broken out of six (if we buy the new
oven), which is 0.016. The expected
number of days in which this will occur
in a month is 30 · 0.016 = 0.48 days,
for an expected loss of $48. So:
Expected Revenue/Loss for Oven Buying Decision
| Decision | Expected Monthly Revenue/Loss from Lost Orders |
| Don't Buy | –$246 |
| Buy | –$48 |
So buying the oven looks
like it will reduce our expected monthly
loss in revenue due to lost orders by
$198. If the oven costs only $175/month,
we would have a decrease in
expected net loss of $23.
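The oven-buying comparison can be reproduced in a few lines. This Python sketch uses the figures given above (30 days, $100 per short day, $175 monthly payment, 10% daily breakdown chance); the function and variable names are invented for this illustration. Because it does not round the probabilities to three decimal places, its dollar figures come out a dollar or two away from the hand calculation.

```python
from math import comb

def prob_at_least_broken(k, ovens, p_broken=0.10):
    """P(at least k of the ovens are broken on a given day)."""
    return sum(comb(ovens, b) * p_broken**b * (1 - p_broken)**(ovens - b)
               for b in range(k, ovens + 1))

DAYS, LOSS_PER_DAY, OVEN_PAYMENT = 30, 100, 175

# With five ovens, a short day means two or more broken;
# with six ovens, it takes three or more broken.
loss_keep = DAYS * prob_at_least_broken(2, 5) * LOSS_PER_DAY
loss_buy = DAYS * prob_at_least_broken(3, 6) * LOSS_PER_DAY
net_gain = loss_keep - loss_buy - OVEN_PAYMENT

print("expected monthly loss, don't buy:", round(loss_keep, 2))
print("expected monthly loss, buy:", round(loss_buy, 2))
print("expected net gain from buying:", round(net_gain, 2))
```

The conclusion is the same either way: buying the oven reduces the expected net loss by a little over $20 a month.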
Continuous Random
Variables and the Normal Model
So
far we have only looked at calculating
the probabilities of discrete random
variables – those that have a finite
number of outcomes. A Bernoulli random
variable has only two outcomes. But many
processes have continuous outcomes, such
as the expected lifetime of a computer
chip, or the weight of a package, where
the random variable can take on
non-discrete values. The most common
model for calculating probabilities for
a continuous random variable is the
normal (Gaussian) model, studied by Carl
Gauss (German mathematician, 1777-1855)
and many others.
Why is the Normal
Model so Useful? The normal model is
useful because:
- Many processes in nature are naturally "normal"
- The normal model can be used to approximate other models,
particularly when the number of trials is high
With regard to the
second, here is a probability histogram
of the number of heads in 100 flips of a
coin:

You can see the familiar
normal curve emerging. If we
increased the number of trials, it would
more closely approximate a real normal
curve.
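To see how close the approximation already is at 100 flips, we can compare exact binomial probabilities with the height of a normal curve that has the same mean (n · p = 50) and standard deviation (√(n · p · q) = 5). This is only an illustrative Python sketch using the standard library's statistics.NormalDist; the variable names are arbitrary.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.5
mean = n * p                        # 50
sd = (n * p * (1 - p)) ** 0.5       # 5
normal = NormalDist(mean, sd)

comparison = {}
for heads in (40, 45, 50, 55, 60):
    exact = comb(n, heads) * p**heads * (1 - p)**(n - heads)   # binomial probability
    approx = normal.pdf(heads)                                 # height of the normal curve
    comparison[heads] = (exact, approx)
    print(heads, round(exact, 4), round(approx, 4))
```

The two columns agree to within about a thousandth at each of these points.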
Features of the
Normal Curve. The normal curve has
these characteristics:
- The normal curve is symmetric about its mean value
- The total area under the normal curve is 1
- The normal curve never falls below the x-axis
(it is never negative).
Computing Normal
Probabilities. There are an infinite
number of normal curves, each one
describing a population with a different
mean or standard deviation. The standard normal curve is the curve
that describes a population whose mean
is 0 and standard deviation 1. Of course
there are almost no populations in real
life that meet this specification. But
we can normalize any population
by converting the data into their
corresponding z-scores:
| A z-score
tells you how many
standard deviations the
value of the random
variable is above or
below the mean. |
You calculate a z-score
for a data value xi from a population with mean
x̄ and standard deviation
s like this:
z = (xi – x̄) / s
A negative z-score indicates the
value is less than the mean; a positive z-score
indicates a value is greater than the mean.
Example: What is the z-score for an IQ of 132 (IQ mean =
100, st. dev. = 15)?
z = (132 – 100) / 15 = 2.13
Since z-scores
change any normal distribution into one
with a mean of 0 and standard deviation
of 1, we can use just one curve (and one
statistical table) to model any normal
distribution.

Here is a small table of
the standard normal curve, giving the
area under the curve between the mean
and the given z-score. This value
can be interpreted as the probability of
a random variable having a value between
the mean and that z-score.
| Area Under the Standard Normal Curve – A Small Table |
| Number of Standard Units (z-score) | Area Between the Mean and the Number of Standard Units Away from the Mean |
| 0.5 | 0.1915 |
| 0.75 | 0.2734 |
| 1.0 | 0.3413 |
| 1.2 | 0.3849 |
| 1.5 | 0.4332 |
| 1.6 | 0.4452 |
| 1.75 | 0.4599 |
| 2.0 | 0.4772 |
| 2.5 | 0.4938 |
| 2.8 | 0.4974 |
Since the area is directly
related to probability, we can use the table
above and the properties of the normal
curve's symmetry to calculate the
probability of a random variable falling in
any given range.
Example: What is the
probability of having an IQ of more than 118
(mean = 100, st. dev. = 15)? First calculate
the z-score:
z = (118 – 100) / 15 = 1.2
Now look in the table above
to find that a z-score of 1.2
represents 0.3849 of the area under the
curve from the mean. Since the area to the
right of the mean is 0.5, that leaves
0.5 – 0.3849 = 0.1151 above the z-score of 1.2, or
11.5%.
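If a printed table is not handy, the same areas can be computed directly. The Python sketch below uses statistics.NormalDist from the standard library; the helper names z_score and area_mean_to_z are made up for this illustration. It reworks the IQ example: the z-score for 118, the area between the mean and that z-score, and the area remaining in the upper tail.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean and z (for z >= 0)."""
    return NormalDist().cdf(z) - 0.5

z = z_score(118, 100, 15)
area = area_mean_to_z(z)
tail = 0.5 - area        # area to the right of z
print(round(z, 2), round(area, 4), round(tail, 4))
```

For z = 1.2 the area between the mean and z is about 0.3849, leaving about 0.1151 in the upper tail.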
Decision-Making and the
Normal Curve. Just as we applied the
binomial model to decision-making with
discrete random variables, we can apply the
normal curve to decision-making with
continuous random variables.
Example (from the text): Two
overnight delivery services make the
following claims:
- Federal Service: average delivery time 30 hours, st. dev. 5 hours
- United Express: average delivery time 34 hours, st. dev. 2 hours
Assuming delivery times can
be modeled by the normal distribution, which
service has the higher chance of meeting a
36-hour deadline?
To solve this, we need to convert 36 to a
z-score for each delivery service:
Federal Service: z = (36 – 30) / 5 = 1.2
United Express: z = (36 – 34) / 2 = 1.0
We can interpret this to mean that the probability
of Federal Service delivering your
package within 36 hours is 0.5 (area to left
of mean) + 0.3849 (area to right of mean up
to 1.2 standard units) = 0.8849 (or 88.5%),
while the probability of a United Express delivery within 36
hours is 0.5 + 0.3413 = 0.8413 (or 84.1%).
So we would go with Federal Service. (But
try the same analysis if you could afford to
wait 38 hours, instead of just 36.)
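The delivery comparison can also be set up so that the deadline is easy to change. This Python sketch again relies on statistics.NormalDist and takes the two services' claimed means and standard deviations at face value; the dictionary and function names are assumptions made for this illustration.

```python
from statistics import NormalDist

# Claimed delivery-time distributions, in hours.
services = {
    "Federal Service": NormalDist(mu=30, sigma=5),
    "United Express": NormalDist(mu=34, sigma=2),
}

def on_time_probabilities(deadline_hours):
    """Probability that each service delivers within the deadline."""
    return {name: dist.cdf(deadline_hours) for name, dist in services.items()}

p36 = on_time_probabilities(36)
for name, prob in p36.items():
    print(name, round(prob, 4))
```

Calling on_time_probabilities with a different deadline repeats the analysis without any table lookups.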
How Do We Really Know the Mean and
Standard Deviation? It's clear from the
above that we need to know the mean and
standard deviation of a probability
distribution to be able to calculate
probabilities. In the case of coin flips we
can base these on theoretical assumptions.
But in the case of delivery services, the
only way to determine the mean and standard
deviation for the population as a
whole is by sampling and estimating using
statistical reasoning, the subject of the
next chapter.