Lecture Notes

Business Statistics (QMB 2100)

Basic Probability

Introduction to probability

We talk every day about the probability that something will occur. What does the term probability mean? How do we estimate the probability of an event? These are the topics of this chapter.

Here are some important definitions for our study of probability:

Trial
A trial is a single occurrence of some procedure that has a set of possible outcomes.
Event
The actual outcome of a trial is an event.

For example, the flip of a coin is a trial, with the possible outcomes of heads or tails, which we can write in set notation as {H, T}. If the trial actually happened, the event would be the actual outcome of the coin flip. Flipping a coin twice in succession can also be considered a trial, with the possible outcomes of {HH, HT, TH, TT}.

As another example, consider the trial of whether or not it will rain in zip code 32205 on March 19. The outcomes are {Rain, no Rain}, each of which has a probability of occurring.

The concept of randomness can be defined like this (from Brightman):

Randomness

Random means that the exact outcome [of a trial] is not predictable in advance. However, a predictable long-run pattern of the outcomes will emerge after many trials.

Considering the case of flipping a coin twice, if you perform the procedure many times, you will find the outcomes follow this pattern:

  • Two heads or two tails: about a quarter of the time each
  • One head and one tail, in either order, about half the time

So even though you don't know for sure what will happen in the next two flips of a coin, you can estimate that the event HH will occur one quarter of the time (25.0% or 0.250 probability).
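The long-run pattern described above can be checked with a small simulation. This is only a sketch: the function name, the number of trials, and the random seed are illustrative choices, not part of the course text.

```python
import random
from collections import Counter

def simulate_two_flips(n_trials, seed=0):
    """Flip a fair coin twice, n_trials times; return each outcome's relative frequency."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = Counter(rng.choice("HT") + rng.choice("HT") for _ in range(n_trials))
    return {outcome: counts[outcome] / n_trials for outcome in ("HH", "HT", "TH", "TT")}

freqs = simulate_two_flips(100_000)
for outcome, freq in freqs.items():
    print(f"{outcome}: {freq:.3f}")
print(f"One head and one tail, either order: {freqs['HT'] + freqs['TH']:.3f}")
```

No single pair of flips is predictable, but with many trials each of the four outcomes should settle near 0.250, and the two mixed outcomes together near 0.500.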

In the case of rain on a particular day at a particular location, assume you have the weather records for the last 20 years that show it rained only 3 times on March 19 in zip code 32205. You could estimate the probability that it rains on March 19 at 3 ÷ 20 (15.0% or 0.150 probability), neglecting any other factors. (Of course this is not how a meteorologist estimates this probability; hopefully, they consider some of these "other factors".)

Note that in either case you could be wrong in using these probabilities to predict the outcome of the next two coin flips or rain the next March 19, because each of these estimates contains a margin of error. What is the margin of error? Can it be computed?


Basic Probability Concepts

Here is a more formal definition of probability, for trials with equally likely outcomes:

Probability
The chance or probability of an event is equal to the number of outcomes that make up the event, divided by the total number of possible outcomes for the trial.

Computing Relative Frequency Probabilities

Assume you have a jar that contains 100 M&Ms, 70 red and 30 green. If you choose one at random (say blindfolded), what is the probability of getting a red M&M?

Using the above definition we could estimate:

P(red) = 70 / 100 = 0.700

So we would expect to choose a red M&M 70% of the time.

Key to this procedure is the assumption that all outcomes of the trial are equally likely. If, for example, green M&Ms were much larger in size than the red ones, they might be more likely to be chosen, and the above analysis would not hold.

Another example: What is the probability of obtaining two heads in two flips of a coin? Here we must do some analysis by counting the number of possible outcomes (each of which is equally likely, assuming the coin is "fair"). Consider this table:

Diagram for the Fair Coin Example
Flip 1   Flip 2   Outcome
H        H        HH
H        T        HT
T        H        TH
T        T        TT

Since all outcomes are equally likely, and only one of the four is the outcome HH, we can calculate the probability as:

P(HH) = 1 / 4 = 0.250

Another example: What is the probability of rolling a 3 with two dice? When you roll two dice, there are 11 possible outcomes (2 through 12). But they are not all equally likely outcomes! This is because there is only one way to roll a 2 (snake eyes), but six ways to roll a 7. We will need to determine two numbers – the number of rolls (outcomes) that result in a 3, and the total number of possible outcomes. The text does this by exhaustively listing all possible outcomes of rolling a pair of dice. We can try a little reasoning:

  • There are only two ways to roll a 3 with two dice labeled A and B: 1 on die A and 2 on die B, or 2 on die A and 1 on die B
  • There are 36 total possible outcomes: 6 possibilities for die A, and for each of these, 6 possibilities for die B, so there are 6 x 6 = 36 possible outcomes

Then we can calculate the probability of rolling a 3 as:

P(3) = 2 / 36 = 0.056

A couple of observations:

  • Sometimes the most difficult part of the calculation is figuring out how to count the total number of outcomes and the number of outcomes in which you are interested
  • Unless otherwise stated, round probabilities to three decimal places (if using percentages, to one decimal place), although trailing zeros are often omitted
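When counting by reasoning gets difficult, the outcomes can be listed by brute force instead. The sketch below enumerates all pairs for two dice and counts those with a given total; the function name probability_of_sum is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

def probability_of_sum(target, sides=6):
    """Probability that two fair dice total `target`, found by listing every roll."""
    rolls = list(product(range(1, sides + 1), repeat=2))  # 36 equally likely (A, B) pairs
    favorable = [roll for roll in rolls if sum(roll) == target]
    return Fraction(len(favorable), len(rolls))

p3 = probability_of_sum(3)
print(p3, round(float(p3), 3))  # prints: 1/18 0.056
```

Calling probability_of_sum(2) and probability_of_sum(7) shows why the 11 possible totals are not equally likely: there is one way to roll a 2 but six ways to roll a 7.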

Rules of Probability. Here are two basic rules of probability:

Rules of Probability
  1. Probabilities are always between 0 (impossible) and 1 (certain); in terms of percentages, probabilities must lie between 0% (will never occur) and 100% (will always occur)
     
  2. The sum of the probabilities of all possible outcomes must equal 1 (100%), which is just another way of saying that something (some outcome) must occur.
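Both rules are easy to check mechanically for any list of probability estimates. The helper below is a minimal sketch; its name and the tolerance used for rounding error are assumptions made for illustration.

```python
import math

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Return True if the values satisfy both basic rules of probability."""
    in_range = all(0 <= p <= 1 for p in probabilities)                      # rule 1
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tolerance)  # rule 2
    return in_range and sums_to_one

print(is_valid_distribution([0.25, 0.25, 0.25, 0.25]))  # two coin flips: HH, HT, TH, TT
print(is_valid_distribution([0.70, 0.40]))              # sums to 1.10, so not valid
```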

Subjective vs. Relative Frequency Probabilities

The above examples use our theoretical understanding of how a system works (a coin or a pair of dice) to calculate the frequencies of the outcomes.

If we do not understand the system completely (like the weather), we may use historical knowledge to estimate a frequency (Will it rain?), or may just have to rely on our best judgment – this is called personal or subjective probability.

Subjective probability estimates still need to conform to the two basic rules of probability listed above. Estimate the probability of each grade you might earn in this course:

Subjective Probability Estimates of Grade to be Earned
Grade         Probability Estimate (%)
A             ____
B             ____
C             ____
D             ____
F or Other    ____

Your probability estimates for the outcomes are most likely different from each other. We can't use the procedure above to simply estimate the grade – just because there are five possible outcomes, only one of which is a grade of A, this does not mean the probability of you earning an A is 1 / 5 = 0.200. So how do we compute the likely grade you will earn, based on your probability estimates?

Expected Value and Standard Deviation. We can compute the expected value of an event if each outcome has a numeric value and we know the probability of each outcome. The expected value E is the sum of the values of the outcomes (x), each multiplied by the probability of that outcome (P(x)); or, in a more compact formula:

E = Σ [ x · P(x) ]

The important thing in applying this formula is to make sure that all the probabilities add to 1 (or else there is an outcome not included, or the sum of all outcomes is more than certain!).

For example, here is some data from an unnamed airline on their history of overbooking. What is the expected number of passengers who will be bumped from any flight?

Overbookings by Airline XYZ
Number of Booked Passengers     Probability of
Who Could not be Boarded        Occurrence
0                               0.480
1                               0.223
2                               0.155
3                               0.101
4                               0.041

First, check to make sure the probabilities add to 1 (they do). Then calculate the expected value like this:

E = (0)(0.480) + (1)(0.223) + (2)(0.155) + (3)(0.101) + (4)(0.041) = 1.000

So we would expect, on average, about 1 person to be "bumped" per flight. But this is only part of the answer. Is there a large variation about the expected value? To determine this we need to compute the standard deviation.

The standard deviation of a probability distribution can be calculated with this formula:

σ = √( Σ [ (x − E)² · P(x) ] )

If the standard deviation is "small" relative to the expected value, then the expected value is a meaningful estimate of what to expect for a single event. In this example, the standard deviation is:

σ = √( (0 − 1)²(0.480) + (1 − 1)²(0.223) + (2 − 1)²(0.155) + (3 − 1)²(0.101) + (4 − 1)²(0.041) ) = √1.408 = 1.187

The standard deviation of 1.187 is approximately the same "size" as the expected value of 1.000, so the expected value is not a reliable guess of how many persons will be bumped the next time we show up for a flight on this airline. It is, however, a very good estimate of the average number of persons that will be bumped over many flights.
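The same arithmetic can be done directly from the overbooking table. This is a sketch, with illustrative function names; it prints the expected value and the standard deviation (the square root of the probability-weighted squared deviations, which sum to 1.408 for this table).

```python
import math

# Overbooking table from the example: passengers bumped -> probability
bumped = {0: 0.480, 1: 0.223, 2: 0.155, 3: 0.101, 4: 0.041}

def expected_value(dist):
    """Sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

def standard_deviation(dist):
    """Square root of the probability-weighted squared deviations from E."""
    mean = expected_value(dist)
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

assert math.isclose(sum(bumped.values()), 1.0)  # probabilities must add to 1
print(f"E = {expected_value(bumped):.3f}")
print(f"sigma = {standard_deviation(bumped):.3f}")
```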

How do we calculate an expected value if the outcomes are not numeric, as in the grade earned example? One way is to assign a numeric value to each outcome. For example, instead of {A, B, C, D, F}, we could use {4, 3, 2, 1, 0}, much the same as we do to compute your grade point average.

Calculating Expected Value Using a Payoff Matrix. Constructing a payoff matrix is a way to easily compute the expected value of an event. Here is an example, using the expected gains/losses to an insurance company on a life insurance policy (see text, pg 43). The event: a 42-year-old man not dying in the next year; the possible outcomes: {lives (probability 0.998), dies (0.002)}; annual insurance premium: $222; the payoff in case of death: $100,000.

Payoff Matrix for Insuring a 42-year-old Man for One Year
Possible Outcomes   Revenue                         Probability   Revenue x Probability
Lives               +$222                           0.998         $221.56
Dies                +$222 – $100,000 = –$99,778     0.002         –$199.56
Expected value = $22

The standard deviation can be calculated to be $4,467.66. Since this is very much larger than the expected value, we would conclude that $22 is an unlikely revenue for any one policy, but represents the average payoff of very many policies.
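A payoff matrix maps naturally onto a list of rows. The sketch below recomputes the expected value and standard deviation for the insurance example; the function name payoff_summary and the row layout are illustrative assumptions.

```python
import math

def payoff_summary(outcomes):
    """outcomes: rows of a payoff matrix as (label, revenue, probability)."""
    expected = sum(revenue * prob for _, revenue, prob in outcomes)
    variance = sum((revenue - expected) ** 2 * prob for _, revenue, prob in outcomes)
    return expected, math.sqrt(variance)

premium, payout = 222, 100_000
policy = [
    ("Lives", premium, 0.998),
    ("Dies", premium - payout, 0.002),  # premium collected, then the payoff is made
]
expected, spread = payoff_summary(policy)
print(f"Expected value = ${expected:,.2f}")
print(f"Standard deviation = ${spread:,.2f}")
```

The same function can be reused for any payoff matrix whose probabilities add to 1.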

Here is another problem: You are considering investing $1000 in one of two junk bonds. Bond A has a 6% annual yield, with a default rate of 1%. Bond B yields 8% annually, but with a default rate of 4%. (If the bonds default, the total investment is lost.) Construct a payoff matrix for each bond to compare them, using this format:

Payoff Matrix Junk Bond A
Possible Outcomes   Return   Prob   Return x Prob
Pays off            ____     ____   ____
Defaults            ____     ____   ____
Expected value = ____

Payoff Matrix Junk Bond B
Possible Outcomes   Return   Prob   Return x Prob
Pays off            ____     ____   ____
Defaults            ____     ____   ____
Expected value = ____

Which bond, if either, should you buy?


Introduction to Statistical Independence

This section introduces another important probability concept: the statistical independence of events:

Statistical Independence

Two events are statistically independent if the occurrence of one event does not affect or is not affected by the occurrence of the other event.

For example, we saw in an earlier example that the estimated probability of rain (based on historical data) on March 19 in zip code 32205 is 0.15. If you accept this estimate, but then see a black cat cross your path, do you change your estimate of rain? Suppose you look up and see dark storm clouds gathering. Do you change your estimate now? Which events are independent? Which are dependent?

When Logic Works. This section in the text provides an example analogous to that above.

Historical data: 80% of students pass statistics; if you choose a statistics student at random, what would you estimate is the probability that he/she will pass statistics? What if you had the additional information that:

  1. The student is over six feet tall
  2. The student spends over 30 hours/week studying statistics

Does your estimate of his/her passing change? Due to which piece of additional information?

Conditional Probabilities. A simple probability or unconditional probability is the probability of an outcome of an event given no other relevant information. A conditional probability is the probability of an event given that another, relevant (i.e., not statistically independent) event occurred.

We can restate this in terms of probability, using this notation: P(A | B), meaning "the probability of event A, given that event B has occurred":

If P(A) = P(A | B), then A and B are independent.

If P(A) ≠ P(A | B), then A and B are not independent.

When Logic Fails.

Sometimes logic alone can determine if outcomes are independent or not, even if our conclusion is counterintuitive. For example:

Trial: flip a fair coin. Possible outcomes: {H, T}. Probabilities: P(H) = 0.5, P(T) = 0.5. Our initial estimate of an H result: 0.5. Additional information: My three previous flips have resulted in an H result. Are the outcomes independent? Why?
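One way to test the intuition here is to simulate a long run of fair flips and look only at the flips that follow three heads in a row. This is a sketch under the assumption of a fair, memoryless coin; the function name, flip count, and seed are illustrative.

```python
import random

def heads_after_streak(n_flips, streak=3, seed=1):
    """Estimate P(H | the previous `streak` flips were all H) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    flips = [rng.choice("HT") for _ in range(n_flips)]
    followers = [
        flips[i]
        for i in range(streak, n_flips)
        if all(f == "H" for f in flips[i - streak:i])
    ]
    return followers.count("H") / len(followers)

p_heads_given_streak = heads_after_streak(200_000)
print(f"P(H | three previous H) is about {p_heads_given_streak:.3f}")
```

If the estimate stays close to P(H) = 0.5, the previous flips carry no information about the next one, which is what independence means.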

For more complicated situations, our logic may not be enough to determine whether two outcomes are independent or not. For example:

Trial: Randomly select an adult from Limón province, Costa Rica. Possible outcomes: {has malaria, does not have malaria}. Historical probabilities for untreated individuals in Limón province, Costa Rica are: P(malaria) = 0.035, P(no malaria) = 0.965. Our initial estimate of a random individual contracting malaria: 0.035. Additional information: The individual sleeps under mosquito netting.

In situations like this, we will need to collect data and analyze it to determine if the outcomes are independent or not. One way to do this is with contingency tables, as discussed in the next section.


Statistical Independence and Contingency Tables

Assume that, in a sex discrimination suit, we have a random sample of 200 employees of a company, including their gender and promotion history over the past five years: 120 men, of whom 50 were promoted, and 80 women, of whom 15 were promoted.

We can organize this data into a 2-way contingency table like this:

Contingency Table for Sex Discrimination Suit
                Male   Female   Total
Promoted        50     15       65
Not Promoted    70     65       135
Total           120    80       200

Note that all 200 observations must be accounted for in the table, in mutually exclusive categories. The column totals and the row totals must each add up to the same grand total (here, 200).

For an employee selected at random, are the promotion event outcomes (promoted, not promoted) independent of the gender outcomes (male, female)? We can determine these probabilities from our equation for relative frequencies:

P(promoted) = 65 / 200 = 0.325
P(promoted | male) = 50 / 120 = 0.417
P(promoted | female) = 15 / 80 = 0.188

In the last two calculations above, the vertical line ( | ) is used to mean "given that", and the probabilities are called conditional probabilities. We can conclude that the two outcomes are likely not independent: Knowing the gender of the employee raises (for men) or lowers (for women) our estimate of the probability they are promoted.
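The three probabilities above can be read straight off the contingency table. The sketch below stores the counts in a nested dictionary; the layout and function names are illustrative choices.

```python
# Counts from the promotion table: table[row][column]
table = {
    "promoted":     {"male": 50, "female": 15},
    "not promoted": {"male": 70, "female": 65},
}

def simple_probability(table, row):
    """Row total divided by the grand total."""
    total = sum(sum(cols.values()) for cols in table.values())
    return sum(table[row].values()) / total

def conditional_probability(table, row, given_column):
    """Cell count divided by the total of the column we are told about."""
    column_total = sum(cols[given_column] for cols in table.values())
    return table[row][given_column] / column_total

p = simple_probability(table, "promoted")
p_male = conditional_probability(table, "promoted", "male")
p_female = conditional_probability(table, "promoted", "female")
print(f"P(promoted) = {p:.3f}")
print(f"P(promoted | male) = {p_male:.3f}")
print(f"P(promoted | female) = {p_female:.3f}")
```

The key difference is the denominator: a simple probability divides by all 200 employees, while a conditional probability divides only by the employees in the given group.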

However, how sure can we be? This data is only based on a sample. A different sample is likely to yield different results. Which will be correct?  The degree of uncertainty in calculations like these is called the margin of error. There are ways to determine how large our sample must be to attain any degree of certainty required. (See Chapter 4.)


Joint Probability

Simple probability and conditional probability both apply to the outcome of a single event.  Joint probability is the probability of the outcomes of two or more events. Using the table above:

Simple, Conditional, and Joint Probabilities
                          Example                 Calculation
Simple probability        P(male)                 120 / 200 = 0.600
Conditional probability   P(male | promoted)      50 / 65 = 0.769
Joint probability         P(male AND promoted)    50 / 200 = 0.250
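The three kinds of probability differ only in which count is divided by which total. The sketch below uses exact fractions and also checks one standard relationship that these notes do not state explicitly (so treat it as an aside): the joint probability equals the conditional probability times the simple probability of the condition.

```python
from fractions import Fraction

# Counts from the promotion table
male_promoted, female_promoted = 50, 15
male_total, grand_total = 120, 200
promoted_total = male_promoted + female_promoted  # 65

p_male = Fraction(male_total, grand_total)                       # simple
p_male_given_promoted = Fraction(male_promoted, promoted_total)  # conditional
p_male_and_promoted = Fraction(male_promoted, grand_total)       # joint
p_promoted = Fraction(promoted_total, grand_total)

# P(male AND promoted) = P(male | promoted) x P(promoted)
assert p_male_and_promoted == p_male_given_promoted * p_promoted
print(float(p_male), round(float(p_male_given_promoted), 3), float(p_male_and_promoted))
```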

Remember the definition of a trial: a single occurrence of a procedure with a set of possible outcomes. An event is the actual outcome of that trial.


Revising Probabilities: The Idea Behind Bayes' Theorem


Thomas Bayes (1702-1761)

The above discussion of conditional probabilities shows we should revise our estimate of the simple probability of event A based on the additional information of the outcome of event B, if the events are related (i.e., they are not statistically independent).
  • How much should we revise our initial estimate?
  • By human nature, not enough (conservatism bias)
  • Bayes' theorem provides a way to calculate this revision of probability estimates
  • Brightman uses a contingency table rather than a formula to demonstrate this

 

Here is an example. (This one uses the same data as the text "oil well" example in this section, but with a different scenario):

  • A sample of 200 3rd-graders was selected at random from Franklin County public schools and tested for reading proficiency. Of these, 160 were found to be reading at grade level or below, and 40 were above grade level. A new reading text had been introduced in some of the reading classes the previous year. This new text had been used by 40 of the "at or below grade level" readers and 28 of the "above grade level" readers. The rest of the students had used the old reading text.

How do our estimates of probability change with additional data? Do you think that proficiency in reading is related to the textbook used? Here is the contingency table:

Contingency Table for Reading Proficiency
                       New Text   Old Text   Total
At/below grade level   40         120        160
Above grade level      28         12         40
Total                  68         132        200

Let us define event A as picking, at random, a student reading above grade level, so:

P(A) = 40 / 200 = 0.200

We will define event B as picking a student who has used the new textbook. What is the probability of picking a student reading above grade level, given that the student used the new text?

P(A | B) = 28 / 68 = 0.412

Our estimate of picking a student performing above grade level more than doubles if we are given the additional information that he/she used the new textbook. We should suspect that these two events are not independent!

  • Another example (From Triola, pg 610): An opinion polling firm is interested in whether the gender of the interviewer had any effect on how women interview subjects answered the same question. 800 interviews were conducted by men and 400 by women interviewers. Women respondents agreed with the question asked 512 times when asked by a man, and 336 times when the same question was asked by a woman.

Would you suspect the gender of the interviewer and the response to the question are not independent? First, construct the contingency table:

Contingency Table for Interviewer Gender
                     Gender of Interviewer
                     Male   Female   Total
Women who agree      512    336      848
Women who disagree   288    64       352
Total                800    400      1200

If A is "women who agree" and B is "women interviewed by a man", then:

P(A) = 848 / 1200 = 0.707

P(A | B) = 512 / 800 = 0.640

Although there is a difference between our original estimate and the estimate updated with the additional information, is it enough to conclude that there is actually a difference in women's responses based on the gender of the interviewer? We don't know enough about inferential statistics yet to determine this – wait until chapter 4. (But it turns out, statistically speaking, that because of the large sample size, we can be 99% sure that there is an actual difference in the responses of women based on the gender of the interviewer.)

  • Sometimes the information in a problem is given in terms of percentages rather than specific numbers. In this case, just use the percentages to construct the contingency table. (But make sure you are using all percentages or all actual numbers in the table.) Example: Of all the students taking COP2220, 72% passed. Of those that passed, 60% had received a grade of A in COP1000 (the prerequisite course). Of those that failed COP2220, 80% received a grade of B or lower in COP1000. Does a student's grade in COP1000 appear to be an indicator of how well they will do in COP2220? Fill in this contingency table:

Contingency Table for COP1000/2220
                       Passed COP2220   Failed COP2220   Total
A in COP1000           ____             ____             ____
B or less in COP1000   ____             ____             ____
Total                  ____             ____             ____

  • One more example (from NYT, October 20, 2009, pg D9): Is garlic helpful in warding off a cold? In one double-blind study, published in 2001, British scientists followed 146 healthy adults over 12 weeks from November to February. Those who had been randomly selected [number not stated, but let's assume it is 58] to receive a daily garlic supplement came down with 24 colds during the study period, compared with 65 colds in the placebo group. Can you construct a contingency table for this study? Do taking garlic and getting a cold appear to be dependent or independent?

This is a limited application of Bayes' Theorem, but one sufficient for this course, and it gives you a good tool for analyzing quite a few different problems. For the record, here is Bayes' Theorem:

P(A | B) = P(B | A) · P(A) / P(B)

If you are adventurous, try to verify this formula with the above data.
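As one possible check, the sketch below applies the standard statement of Bayes' Theorem, P(A | B) = P(B | A) · P(A) / P(B), to the reading proficiency table, with A = "reading above grade level" and B = "used the new text". The variable names are illustrative.

```python
from fractions import Fraction

# Counts from the reading proficiency table
above_new, above_total = 28, 40   # above grade level: used new text, row total
new_total, grand_total = 68, 200  # new text column total, all students

p_a = Fraction(above_total, grand_total)        # P(A)
p_b = Fraction(new_total, grand_total)          # P(B)
p_b_given_a = Fraction(above_new, above_total)  # P(B | A)

p_a_given_b = p_b_given_a * p_a / p_b           # Bayes' Theorem
print(p_a_given_b, round(float(p_a_given_b), 3))  # prints: 7/17 0.412
```

The result agrees with the value read directly from the table, 28 / 68 = 0.412.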

 Updated 11.03.2009