Lecture Notes
Business Statistics (QMB 2100)
Basic Probability
Introduction to probability
We talk every day about the probability that
something will occur. What does the term probability mean? How do we estimate the
probability of an event?
These are the topics of this chapter.
Here are some important definitions for our
study of probability:
- Trial
- A trial is a single occurrence of some procedure that has a set of possible outcomes.
- Event
- The actual outcome of a trial is an event.
For example, the flip of a
coin is a trial, with the possible outcomes
of heads or tails, which we can write in set
notation as {H, T}. If the trial actually happened,
the event would be the actual outcome of the
coin flip. Flipping a coin twice in succession
can also be considered a trial,
with the possible outcomes of {HH, HT, TH,
TT}.
As another example, consider the
trial of whether or not it will rain in zip code
32205 on March 19. The outcomes are {Rain, no
Rain}, each of which has a probability of
occurring.
The concept of randomness
can be defined like this (from Brightman):
- Randomness
- Random means that the exact outcome [of a trial] is not predictable in advance. However, a predictable long-run pattern of the outcomes will emerge after many trials.
Considering
the case of flipping a coin twice, if you
perform the procedure many times, you will
find the outcomes follow this pattern:
- Two heads or two tails: about a quarter of the time each
- One head and one tail, in either order: about half the time
So even though you don't know for sure what will happen in
the next two flips of a coin, you can
estimate that the event HH will occur one
quarter of the time (25.0% or 0.250
probability).
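The long-run pattern described above can be seen in a short simulation. The following is a minimal sketch, assuming Python's standard pseudo-random generator is an adequate stand-in for a fair coin; the function name `simulate_two_flips` and the number of trials are illustrative choices, not part of the text.

```python
import random
from collections import Counter


def simulate_two_flips(num_trials, seed=0):
    """Flip a fair coin twice per trial; return the relative frequency of each outcome."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(
        rng.choice("HT") + rng.choice("HT") for _ in range(num_trials)
    )
    return {outcome: counts[outcome] / num_trials for outcome in ("HH", "HT", "TH", "TT")}


frequencies = simulate_two_flips(100_000)
for outcome, freq in frequencies.items():
    print(f"{outcome}: {freq:.3f}")

# One head and one tail, in either order
print(f"HT or TH: {frequencies['HT'] + frequencies['TH']:.3f}")
```

No single trial is predictable, but with many trials each of the four frequencies should settle near 0.250, and the combined "one head and one tail" frequency near 0.500.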
In the case
of rain on a particular day at a particular
location, assume you have the weather
records for the last 20 years that show it
rained only 3 times on March 19 in zip code
32205. You could estimate the probability
that it rains on March 19 at 3
÷ 20 (15.0% or
0.150 probability), neglecting any other
factors. (Of course this is not how a
meteorologist estimates this probability;
hopefully, they consider some of these "other
factors".)
Note in
either case you could be wrong in using
these probabilities to predict the outcome
of the next two coin flips or rain the next
March 19, because each of these estimates
contains a margin of error. What is
the margin of error? Can it be computed?
Basic Probability Concepts
Here is a more formal definition of probability, for events with equally likely
outcomes:
- Probability
- The chance or probability of an event is equal to the number of outcomes for the event for which you wish to compute a probability, divided by the total number of possible outcomes for the trial.
Computing Relative
Frequency Probabilities
Assume you
have a jar that contains 100 M&Ms, 70
red and 30 green.
If you choose one at random (say
blindfolded), what is the probability of
getting a red M&M?
Using the above
definition we could estimate:
P(red) = 70 / 100 = 0.700
So we would estimate
we would choose a red M&M 70% of the
time.
Key to this procedure is the assumption
that all outcomes of the trial are equally likely.
If, for example, green M&Ms were much
larger in size than the red ones, they
might be more likely to be chosen, and
the above analysis would not hold.
Another example: What is
the probability of obtaining two heads
in two flips of a coin? Here we must do
some analysis by counting the number of
possible outcomes (each of which is
equally likely, assuming the coin is
"fair"). Consider this table:
Diagram for the Fair Coin Example
| Flip 1 | Flip 2 | Outcome |
| H | H | HH |
| H | T | HT |
| T | H | TH |
| T | T | TT |
Since all outcomes are
equally likely, and only one of the four
is the outcome HH, we can
calculate the probability as:
P(HH) = 1 / 4 = 0.250
Another example: What is the probability
of rolling a 3 with two dice? When you
roll two dice, there are 11 possible
outcomes (2 through 12). But they are not
all equally likely outcomes! This is because there is
only one way to roll a 2 (snake eyes), but
six ways to roll a 7. We will
need to determine two numbers – the
number of rolls (outcomes) that result
in a 3, and the total number of possible
outcomes. The text does this by
exhaustively listing all possible
outcomes of rolling a pair of dice. We
can try a little reasoning:
- There are only two ways to roll a 3 with two dice labeled A and B: 1 on die A and 2 on die B, or 2 on die A and 1 on die B
- There are 36 total possible outcomes: 6 possibilities for die A, and for each of these, 6 possibilities for die B, so there are 6 x 6 = 36 possible outcomes
Then we can calculate
the probability of rolling a 3 as:
P(3) = 2 / 36 = 0.056
A couple of
observations:
- Sometimes the most difficult part of
the calculation is figuring out how to
count the total number of outcomes and
the number of outcomes in which you are
interested
- Unless otherwise stated, round
probabilities to three decimal places (if
using percentages, to one decimal place),
although trailing zeros are often omitted
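When counting by reasoning feels error-prone, the counts in the two-dice example can be checked by listing every outcome, much as the text does by hand. This is a small sketch; the function name `probability_of_sum` is an illustrative choice and assumes two fair six-sided dice.

```python
from itertools import product


def probability_of_sum(target, sides=6):
    """Probability that two fair dice sum to `target`, found by listing every outcome."""
    outcomes = list(product(range(1, sides + 1), repeat=2))  # (die A, die B) pairs
    favorable = [pair for pair in outcomes if sum(pair) == target]
    return len(favorable), len(outcomes), len(favorable) / len(outcomes)


favorable, total, p = probability_of_sum(3)
print(f"P(3) = {favorable} / {total} = {p:.3f}")  # P(3) = 2 / 36 = 0.056
```

Calling `probability_of_sum(2)` and `probability_of_sum(7)` gives 1 and 6 favorable outcomes respectively, which is why the 11 possible totals are not equally likely.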
Rules of Probability.
Here are two
basic rules of probability:
- Rules of Probability
- Probabilities are always between 0 (impossible) and 1 (certain); in terms of percentages, probabilities must lie between 0% (will never occur) and 100% (will always occur)
- The sum of the probabilities of all possible outcomes must equal 1 (100%), which is just another way of saying that something (some outcome) must occur.
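Both rules are easy to check mechanically for any list of probabilities. The sketch below is one possible check; the function name `is_valid_distribution` and the rounding tolerance are assumptions made for illustration.

```python
import math


def is_valid_distribution(probabilities, tolerance=1e-9):
    """Check the two basic rules: each value lies in [0, 1], and the values sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tolerance)
    return in_range and sums_to_one


print(is_valid_distribution([0.25, 0.25, 0.25, 0.25]))  # two coin flips: True
print(is_valid_distribution([0.5, 0.6]))                # sums to 1.1: False
print(is_valid_distribution([1.2, -0.2]))               # values out of range: False
```

The tolerance is there only because decimal probabilities rarely add to exactly 1 in floating-point arithmetic.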
Subjective vs. Relative Frequency
Probabilities
The above
examples use our theoretical understanding of
how a system works (a coin or a pair of dice)
to calculate the frequencies of the outcomes.
If we do not
understand the system completely (like the
weather), we may use historical
knowledge to estimate a frequency (Will it
rain?), or may just have to rely on our best
judgment – this is called personal or
subjective probability.
Subjective
probability estimates still need to conform
to the two basic rules of probability listed
above. Estimate the probability of the grade
you will earn in this course:
Subjective Probability Estimates of Grade to be Earned
| Grade | Probability Estimate (%) |
| A | |
| B | |
| C | |
| D | |
| F or Other | |
Your probability
estimates for the outcomes are most likely
different from each other. We can't use the
procedure above to simply estimate the grade
– just because
there are five possible outcomes, only one of
which is a grade of A, this does not mean the
probability of you earning an A is 1 / 5 =
0.200. So how do we compute the likely
grade you will earn, based on your probability
estimates?
Expected Value and Standard Deviation. We can compute the
expected value of an
event if each outcome has a numeric value and we
know the probability of each outcome. The
expected value E is the sum, over all outcomes, of the value
of each outcome (x) multiplied by the
probability of that outcome (P(x));
or, in a more compact formula:
$$E = \sum x \, P(x)$$
The important thing in applying this
formula is to make sure that all the
probabilities add to 1 (or else there is an
outcome not included, or the sum of
all outcomes is more than certain!).
For example, here is some data from
an unnamed airline on their history of
overbooking. What is the expected number
of passengers who will be bumped from
any flight?
Overbookings by Airline XYZ
| Number of Booked Passengers Who Could Not Be Boarded | Probability of Occurrence |
| 0 | 0.480 |
| 1 | 0.223 |
| 2 | 0.155 |
| 3 | 0.101 |
| 4 | 0.041 |
First, check to make sure the
probabilities add to 1 (they do). Then
calculate the expected value like this:
$$E = 0(0.480) + 1(0.223) + 2(0.155) + 3(0.101) + 4(0.041) = 1.000$$
So we would expect, on average, about 1
person to be "bumped" per flight. But
this is only part of the answer. Is
there a large variation about the expected
value? To determine this we need to compute
the standard deviation.
The standard deviation of a probability
distribution can be calculated with this
formula:
$$\sigma = \sqrt{\sum (x - E)^2 \, P(x)}$$
If the standard deviation is
"small" relative to the expected value, then
the expected value is a meaningful estimate
of what to expect for a single event. In
this example, the standard deviation is:
$$\sigma = \sqrt{(0-1)^2(0.480) + (1-1)^2(0.223) + (2-1)^2(0.155) + (3-1)^2(0.101) + (4-1)^2(0.041)} = \sqrt{1.408} \approx 1.187$$
The standard deviation of 1.187 is
approximately the same "size" as the
expected value of 1.000, so we cannot conclude
that the expected value is a good guess of
how many persons will be bumped the next
time we show up for a flight on this
airline. It is, however, a very good estimate of the
average number of persons that will be bumped
over many flights.
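As a check on the arithmetic, the expected value and standard deviation can be computed directly from the overbooking table. This is only a sketch; the names `bumped`, `expected_value`, and `standard_deviation` are illustrative and do not come from the text.

```python
import math

# Number of passengers bumped -> probability, copied from the Airline XYZ table
bumped = {0: 0.480, 1: 0.223, 2: 0.155, 3: 0.101, 4: 0.041}


def expected_value(dist):
    """E = sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())


def standard_deviation(dist):
    """Square root of the sum of (x - E)^2 * P(x)."""
    mean = expected_value(dist)
    return math.sqrt(sum((x - mean) ** 2 * p for x, p in dist.items()))


assert math.isclose(sum(bumped.values()), 1.0)  # probabilities must add to 1

sd = standard_deviation(bumped)
print(f"E        = {expected_value(bumped):.3f}")  # 1.000
print(f"variance = {sd ** 2:.3f}")                 # 1.408
print(f"SD       = {sd:.3f}")                      # 1.187
```

Note that 1.408 is the sum under the square root (the variance); the standard deviation is its square root, about 1.187.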
How do we calculate an expected value if the
outcomes are not numeric, as in the grade earned
example? One way is to assign a numeric value to
each outcome. For example, instead of {A, B, C,
D, F}, we could use {4, 3, 2, 1, 0}, much the
same as we do to compute your grade point
average.
Calculating Expected Value Using a Payoff
Matrix. Constructing a payoff matrix is a way to easily compute the expected value of
an event. Here is an example, using the expected
gains/losses to an insurance company on a
life insurance policy (see text, pg 43). The
event: a 42-year-old man not dying in the
next year;
the possible outcomes: {lives (probability
0.998), dies (probability 0.002)}; annual insurance premium:
$222; the payoff in case of death: $100,000.
Payoff Matrix for Insuring a 42-year-old Man for One Year
| Possible Outcomes | Revenue | Probability | Revenue x Probability |
| Lives | +$222 | 0.998 | $221.56 |
| Dies | +$222 – $100,000 = –$99,778 | 0.002 | –$199.56 |
| Expected value = | | | $22 |
The standard deviation can be calculated
to be $4,467.66. Since this is very much
larger than the expected value, we would
conclude that $22 is an unlikely revenue for
any one policy, but represents the average
payoff of very many policies.
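The payoff matrix for the insurance policy translates naturally into a short loop. The sketch below rebuilds the "Revenue x Probability" column and the two summary figures; the variable names and the print layout are illustrative assumptions.

```python
import math

# (outcome, revenue in dollars, probability) for the one-year policy
payoff_matrix = [
    ("Lives", 222.0, 0.998),
    ("Dies", 222.0 - 100_000.0, 0.002),
]

expected = 0.0
for outcome, revenue, prob in payoff_matrix:
    contribution = revenue * prob  # the "Revenue x Probability" column
    expected += contribution
    print(f"{outcome:<6}{revenue:>12,.2f}{prob:>8.3f}{contribution:>12,.2f}")

variance = sum((revenue - expected) ** 2 * prob for _, revenue, prob in payoff_matrix)
std_dev = math.sqrt(variance)

print(f"Expected value:     ${expected:,.2f}")  # $22.00
print(f"Standard deviation: ${std_dev:,.2f}")   # $4,467.66
```

The same structure works for any payoff matrix: add one tuple per outcome and confirm the probabilities add to 1 before trusting the result.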
Here is another problem: You are considering
investing $1000 in one of two junk bonds. Bond A
has a 6% annual yield, with a default rate of
1%. Bond B yields 8% annually, but with a
default rate of 4%. (If a bond defaults, the
total investment is lost.) Construct a payoff matrix for
each bond to compare them, using this format:
Payoff Matrix: Junk Bond A
| Possible Outcomes | Return | Prob | Return x Prob |
| Pays off | | | |
| Defaults | | | |
| Expected value = | | | |
Payoff Matrix: Junk Bond B
| Possible Outcomes | Return | Prob | Return x Prob |
| Pays off | | | |
| Defaults | | | |
| Expected value = | | | |
Which bond, if either, should you buy?
Introduction to Statistical Independence
This section introduces another important
probability concept: the statistical
independence of events:
- Statistical Independence
- Two events are statistically independent if the occurrence of one event does not affect, and is not affected by, the occurrence of the other event.
For example, we saw in an earlier example that the
estimated probability of rain (based on
historical data) on March 19 in zip code 32205 is
0.150. If you accept this estimate, but then see
a black cat cross your path, do you change
your estimate of rain? Suppose you look up and
see dark storm clouds gathering. Do you change
your estimate now? Which events are independent?
Which are dependent?
When Logic Works. This section in the
text provides an example analogous to that
above.
Historical
data: 80% of students pass statistics; if
you choose a statistics student at random,
what would you estimate is the probability
that he/she will pass statistics? What if
you had the additional information that:
- The student is over six feet tall
- The student spends over 30
hours/week studying statistics
Does your
estimate of his/her passing change? Due to
which piece of additional information?
Conditional Probabilities. A simple probability or
unconditional probability is the probability
of an outcome of an event given no other relevant information. A
conditional
probability is the probability of an event
given that another, relevant (i.e.,
non-statistically independent) event
occurred.
We can restate this in terms of probability,
using this notation: P(A | B), meaning "the
probability of event A, given that event B has occurred":
If P(A) = P(A | B), then A and B are independent.
If P(A) ≠ P(A | B), then A and B are not independent.
When Logic Fails.
Sometimes logic alone can determine if
outcomes are independent or not, even if our
conclusion is counterintuitive. For example:
Trial: flip a fair coin. Possible
outcomes: {H, T}. Probabilities: P(H) = 0.5,
P(T) = 0.5. Our initial estimate of an H
result: 0.5. Additional information: My
three previous flips have resulted in an H
result. Are the outcomes independent?
Why?
For more complicated situations, our logic
may not be enough to determine whether two
outcomes are independent or not. For example:
Trial: Randomly select an adult
from Limón province, Costa Rica. Possible
outcomes: {has malaria, does not have
malaria}. Historical probabilities for
untreated individuals in Limón province,
Costa Rica are: P(malaria) = 0.035, P(no
malaria) = 0.965. Our initial estimate of a
random individual contracting malaria:
0.035. Additional information: The
individual sleeps under mosquito netting.
In situations like this, we will need to
collect data and analyze it to determine if the
outcomes are independent or not. One way to do
this is with contingency tables, as discussed in
the next section.
Statistical Independence and Contingency
Tables
Assume that, in a sex discrimination suit, we
have a random sample of 200 employees of a
company, including their gender and promotion
history over the past five years: 120 men, of
whom 50 were promoted, and 80 women, of whom
15 were promoted.
We can organize this data into a 2-way
contingency table like this:
Contingency Table for Sex Discrimination Suit
| | Male | Female | Total |
| Promoted | 50 | 15 | 65 |
| Not Promoted | 70 | 65 | 135 |
| Total | 120 | 80 | 200 |
Note that all the 200 observations must be
accounted for in the table, in mutually
exclusive categories. The column totals and the
row totals must each sum to the same grand total (here, 200).
For an employee selected at random, are the
promotion event outcomes (promoted, not
promoted) independent of the gender outcomes
(male, female)? We can determine these
probabilities from our equation for relative
frequencies:
P(promoted) = 65 / 200 = 0.325
P(promoted | male) = 50 / 120 = 0.417
P(promoted | female) = 15 / 80 = 0.188
In the last two calculations above, the
vertical line ( | ) is used to mean "given
that", and the probabilities are called conditional probabilities.
We can conclude
that the two outcomes are likely not
independent: Knowing the gender of the employee
raises (for men) or lowers (for women) our
estimate of the probability they are promoted.
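The comparison of the simple and conditional probabilities can be reproduced from the raw counts. This is a minimal sketch; the dictionary layout and the helper names `p_simple` and `p_given_column` are illustrative choices.

```python
# Counts from the promotion contingency table: counts[row][column]
counts = {
    "promoted":     {"male": 50, "female": 15},
    "not promoted": {"male": 70, "female": 65},
}

total = sum(sum(row.values()) for row in counts.values())  # 200 employees


def p_simple(row):
    """P(row): row total divided by the grand total."""
    return sum(counts[row].values()) / total


def p_given_column(row, column):
    """P(row | column): cell count divided by that column's total."""
    column_total = sum(r[column] for r in counts.values())
    return counts[row][column] / column_total


print(f"P(promoted)          = {p_simple('promoted'):.3f}")                  # 0.325
print(f"P(promoted | male)   = {p_given_column('promoted', 'male'):.3f}")    # 0.417
print(f"P(promoted | female) = {p_given_column('promoted', 'female'):.3f}")  # 0.188
```

If promotion and gender were independent in this sample, all three printed values would be the same.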
However, how sure can we be? This data is
only based on a sample. A different sample is
likely to yield different results. Which will be
correct? The degree of uncertainty
in calculations like these is called the margin of error.
There are ways to determine
how large our sample must be to attain any
degree of certainty required. (See Chapter 4.)
Joint Probability
Simple probability and conditional
probability both apply to the outcome of a
single event. Joint probability is
the probability of the outcomes of two or more
events. Using the table above:
Simple, Conditional, and Joint Probabilities
| | Example | Calculation |
| Simple probability | P(male) | 120 / 200 = 0.600 |
| Conditional probability | P(male | promoted) | 50 / 65 = 0.769 |
| Joint probability | P(male AND promoted) | 50 / 200 = 0.250 |
Remember the definition of trial: a single occurrence of a procedure with a
set of possible outcomes. An event is the
actual outcome of that trial.
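The three kinds of probability differ only in which counts go in the numerator and denominator. The sketch below makes that explicit for the promotion data; the variable names are illustrative.

```python
# Counts from the promotion table, keyed by (gender, promotion outcome)
counts = {
    ("male", "promoted"): 50,
    ("female", "promoted"): 15,
    ("male", "not promoted"): 70,
    ("female", "not promoted"): 65,
}
total = sum(counts.values())  # 200

male = sum(n for (gender, _), n in counts.items() if gender == "male")            # 120
promoted = sum(n for (_, outcome), n in counts.items() if outcome == "promoted")  # 65
male_and_promoted = counts[("male", "promoted")]                                  # 50

p_male = male / total                                 # simple: one characteristic, everyone
p_male_given_promoted = male_and_promoted / promoted  # conditional: only the promoted group
p_male_and_promoted = male_and_promoted / total       # joint: both characteristics, everyone

print(f"{p_male:.3f} {p_male_given_promoted:.3f} {p_male_and_promoted:.3f}")  # 0.600 0.769 0.250
```

The conditional and joint probabilities share the numerator 50; only the denominator changes, from the 65 promoted employees to all 200.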
Revising Probabilities: The Idea Behind
Bayes' Theorem
[Portrait: Thomas Bayes (1702-1761)]
The above discussion of conditional
probabilities shows we should revise our
estimate of the simple probability of event A
based on the additional information of the
outcome of event B, if the events are related
(i.e., they are not statistically independent).
- How much should we revise our initial
estimate?
- By human nature, not enough (conservatism
bias)
- Bayes' theorem provides a way to
calculate this revision of probability
estimates
- Brightman uses a contingency table
rather than a formula to demonstrate this
Here is an example. (This one uses the same
data as the text "oil well" example in this
section, but with a different scenario):
- A random sample of 200 3rd-graders was
selected from Franklin County
public schools and tested for reading
proficiency. Of these, 160 were found to be
reading at grade level or below, and 40 were
above grade level. A new reading text had
been introduced in some of the reading
classes the previous year. This new text had
been used by 40 of the "at or below grade
level" readers and 28 of the "above
grade level" readers. The rest of the
students had used the old reading text.
How do our estimates of probability
change with additional data? Do you think
that proficiency in reading is related to
the textbook used? Here is the contingency
table:
Contingency Table for Reading Proficiency
| | New Text | Old Text | Total |
| At/below grade level | 40 | 120 | 160 |
| Above grade level | 28 | 12 | 40 |
| Total | 68 | 132 | 200 |
Let us define event A as
picking, at random, a student reading
above grade level, so:
P(A) = 40 / 200 = 0.200
We will define event B
as picking a student who has used the
new textbook. What is the probability of
picking a student reading above grade
level, given that the student used the
new text?
P(A | B) = 28 / 68 = 0.412
Our estimate of the probability of picking
a student performing above grade level
more than doubles if we are given the
additional information that he/she used
the new textbook. We should suspect that
these two events are not independent!
- Another example (from Triola, pg 610): An opinion polling
firm is interested in whether the gender
of the interviewer had any effect on how
women interview subjects answered the
same question. 800 interviews were
conducted by men and 400 by women
interviewers. Women respondents agreed
with the question asked 512 times when
asked by a man, and 336 times when the
same question was asked by a woman.
Would you suspect the
gender of the interviewer and the
response to the question are not
independent? First, construct the
contingency table:
Contingency Table for Interviewer Gender
| | Male Interviewer | Female Interviewer | Total |
| Women who agree | 512 | 336 | 848 |
| Women who disagree | 288 | 64 | 352 |
| Total | 800 | 400 | 1200 |
If A is "women who agree" and B is "women
interviewed by a man", then:
P(A) = 848 / 1200 = 0.707
P(A | B) = 512 / 800 = 0.640
Although there is a difference between our
original estimate and the estimate updated based
on additional information, is it enough to
conclude that there is actually a difference
in women's responses based on the gender of
the interviewer? We don't know enough about
inferential statistics yet to determine this
– wait until chapter 4. (But it turns out,
statistically speaking, that because of the
large sample size, we can be 99% sure that
there is an actual difference in the
responses of women based on the gender of
the interviewer.)
- Sometimes the information in
a problem is given in terms of percentages rather than
specific numbers. In this case, just use the percentages to
construct the contingency table. (But make sure you are
using all percentages or all actual numbers in the table.)
Example: Of all the students taking COP2220, 72% passed. Of
those that passed, 60% had received a grade of A in COP1000
(the prerequisite course). Of those that failed COP2220, 80%
received a grade of B or lower in COP1000. Does a student's
grade in COP1000 appear to be an indicator of how well they
will do in COP2220? Fill in this contingency table:
Contingency Table for COP1000/2220
| | Passed COP2220 | Failed COP2220 | Total |
| A in COP1000 | | | |
| B or less in COP1000 | | | |
| Total | | | |
- One more example (from NYT,
October 20, 2009, pg D9): Is garlic helpful
in warding off a cold? In one double-blind
study, published in 2001, British scientists
followed 146 healthy adults over 12 weeks
from November to February. Those who had
been randomly selected [number not stated,
but let's assume it is 58] to receive a
daily garlic supplement came down with 24
colds during the study period, compared with
65 colds in the placebo group. Can you
construct a contingency table for this
study? Do taking garlic and getting a cold
appear to be dependent or independent?
This is a limited application of Bayes'
Theorem, but one sufficient for this course,
that will give you a good tool to analyze quite
a few different problems. For the record, here
is Bayes' Theorem:
$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$
If you are adventurous, try to verify this
formula with the above data.
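One way to carry out that verification is with the reading-proficiency data, where A is "above grade level" and B is "used the new text". The sketch below assumes the standard statement of Bayes' Theorem, P(A | B) = P(B | A) P(A) / P(B), and uses exact fractions so that no rounding hides a mismatch; the variable names are illustrative.

```python
from fractions import Fraction

# Reading-proficiency counts: A = above grade level, B = used the new text
above_and_new = 28
above_total = 40
new_total = 68
grand_total = 200

p_a = Fraction(above_total, grand_total)            # P(A)     = 40 / 200
p_b = Fraction(new_total, grand_total)              # P(B)     = 68 / 200
p_b_given_a = Fraction(above_and_new, above_total)  # P(B | A) = 28 / 40

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b, f"= {float(p_a_given_b):.3f}")  # 7/17 = 0.412
assert p_a_given_b == Fraction(above_and_new, new_total)  # same as 28 / 68 read from the table
```

The formula and the contingency table give the same answer because the grand total of 200 cancels, leaving 28 / 68.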