Lecture Notes

Business Statistics (QMB 2100)

Probability Distributions

Introduction

When we calculated the probability of an event in the last chapter, we assumed that all the outcomes of a trial were equally likely. This is an overly-restrictive assumption. This chapter develops techniques to estimate probabilities when the outcomes of a trial have different probabilities, described by a probability distribution.


Random Variables and Probability Models

Random Variables. We will define a random variable this way:

A random variable is an event (the outcome of a trial) for which you would like to calculate a probability.

The possible number of values for the event (the number of possible outcomes of the trial) must be greater than one; events that can only be a single value are constants.

There are two types of random variables:

  • Discrete random variables are those whose possible values can be enumerated (a finite set)
  • Continuous random variables are those that can take on any of a set of values that cannot be enumerated (an infinite or very large set)

Here is a table of the examples given in the text (p 72):

Examples of Discrete and Continuous Random Variables

Event                                                   Possible Values      Type of Random Variable
Number of boys in three children                        0, 1, 2, 3           ?
Number of ovens not working out of five in the bakery   0, 1, 2, 3, 4, 5     ?
Lifetime of a computer chip                             800-1,300 hours      ?
Weight of a package                                     1 to 5 lbs           ?

Probability Models, Distributions, and Histograms

A probability model is used to help determine the probabilities of events. This chapter discusses two commonly used probability models:

  • For discrete random variables, the binomial probability model
  • For continuous random variables, the normal probability model

To make a probability model useful, we need a probability distribution:

A discrete probability distribution consists of all the possible values of a random variable and their associated probabilities.

For example, for the random variable "the number of heads in 3 flips of a coin", we can determine from the previous chapter that the possible values are 0, 1, 2, and 3, with the probabilities 0.125, 0.375, 0.375 and 0.125, respectively. (Draw a tree diagram to see this.) This constitutes a probability distribution, because every possible outcome has been identified, along with its corresponding probability of occurrence.
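The coin-flip distribution above can also be built by listing every equally likely sequence of flips and counting heads. The following is a minimal sketch, not part of the text; the function name heads_distribution is made up for illustration, and a fair coin is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_flips):
    """Return {number of heads: probability} for n_flips of a fair coin."""
    # Every sequence such as ('H', 'T', 'H') is equally likely.
    sequences = list(product("HT", repeat=n_flips))
    counts = Counter(seq.count("H") for seq in sequences)
    return {k: Fraction(c, len(sequences)) for k, c in sorted(counts.items())}

dist = heads_distribution(3)
for heads, prob in dist.items():
    print(heads, float(prob))
```

Because the probabilities are kept as exact fractions, it is easy to confirm that they sum to exactly 1, as a probability distribution requires.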

We can also construct a probability histogram, which displays the same information:

This histogram is a picture of a probability distribution; note the total area of the bars is 1.000 (because the probabilities in a distribution must sum to 1).

Here is the probability histogram for the random variable "rolling a pair of dice". The probabilities are based on the tree diagram in the text (p. 35).
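As a rough stand-in for the histogram, the distribution can be tabulated in a few lines. This sketch assumes the random variable is the sum of the two dice (the notes do not say so explicitly) and that both dice are fair; the names rolls, totals and dice_dist are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) results.
rolls = list(product(range(1, 7), repeat=2))
totals = Counter(a + b for a, b in rolls)
dice_dist = {t: Fraction(c, len(rolls)) for t, c in sorted(totals.items())}

# Print a simple text histogram: one '#' per way of rolling that total.
for total, prob in dice_dist.items():
    print(f"{total:2d}  {float(prob):.3f}  {'#' * totals[total]}")
```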


A Bernoulli Process and Statistical Independence

The Characteristics of a Bernoulli Process.


Jacob (Jacques) Bernoulli (1654-1705) was a Swiss mathematician, most noted for his work in differential equations and probability theory. He defined and studied the probability distributions describing a Bernoulli process, which is a series of trials with these characteristics:

  1. Each trial has only two possible outcomes
  2. The outcome probabilities remain constant from trial to trial

The first condition can usually be satisfied by the design of the experiment; for example, if the results of a survey can be reported as "opposed", "indifferent", and "in favor", we could classify the actual results as "opposed" and "not opposed".

The second condition is a characteristic of the physical process, and has to do with whether successive trials are statistically independent; if they are not, we do not have a Bernoulli process; for example:

  • A coin flip, or drawing from a jar of M&Ms of two colors with replacement are both Bernoulli processes, because successive trials are independent
  • Drawing from a jar of M&Ms of two colors without replacement is not a Bernoulli process, because the results of one draw will affect the next (because you are affecting the relative ratios of the two colors)

Computing Joint Probabilities for a Bernoulli Process. Joint probability is the probability that two or more events occur. The joint probability of events in a Bernoulli process can be calculated with the special rule of multiplication:

The joint probability of two or more independent events is equal to the product of their simple probabilities.

Or, describing joint probability with a formula for two independent events A and B:

P(A and B) = P(A) · P(B)

For example: The probability of randomly drawing a red M&M from a jar containing 3 red and 7 green M&Ms on one trial is 0.3. What is the probability of drawing two red M&Ms in succession, with replacement?

P(Red and Red) = (3 / 10) · (3 / 10) = 0.090 (or 9.0%)

Here is a "puzzler" from Car Talk©: Hank and Moe went job-hunting. At one likely employer, the boss said, "You guys look promising, but I need a guarantee you can be here 90% of the time." Hank said, "Geez, we both have old cars. Mine only starts 80% of the time, and Moe's is even worse – it only starts 70% of the time." The boss hired them anyway and they turned out to be the hardest-working, if not the brightest, workers he ever had. What did the boss know?

Characteristics of a non-Bernoulli Process. A non-Bernoulli process is one in which there are more than two possible outcomes for each trial, or the probability of each outcome is not the same from trial to trial. We can usually account for the first by sorting the results into two categories. It may not be possible to adjust for the second.

For example, consider the jar of 3 red and 7 green M&Ms. If we randomly draw two M&Ms with replacement, we have a Bernoulli process. If we draw two M&Ms without replacement, the probability of each outcome is not the same on successive trials, because it is affected by what has already happened: if we draw a red M&M on trial 1, there are only 2 red and 7 green M&Ms left in the jar for the next draw, so the probability of drawing a red is 2 / 9, not 3 / 10.

Computing Joint Probabilities for a non-Bernoulli Process. To compute the joint probability of a non-Bernoulli process, we need the general rule of multiplication:

The joint probability of two or more dependent events is equal to the product of their simple and conditional probabilities.

As a formula, for two dependent events A and B:

P(A and B) = P(A) · P(B | A)

For the M&M example without replacement,

P(Red and Red) = (3 / 10) · (2 / 9) = 0.067 (or 6.7%)
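The two M&M calculations (with and without replacement) can be checked side by side. This is a small sketch using exact fractions; the function name p_two_reds and its parameters are illustrative, not from the text.

```python
from fractions import Fraction

def p_two_reds(red, green, replace):
    """Probability of drawing two reds in a row from a jar of red and green M&Ms."""
    total = red + green
    p_first = Fraction(red, total)
    if replace:
        # Independent trials: the jar is unchanged for the second draw.
        p_second = Fraction(red, total)
    else:
        # Dependent trials: one red is gone, so use P(red | red on first draw).
        p_second = Fraction(red - 1, total - 1)
    return p_first * p_second

with_repl = p_two_reds(3, 7, replace=True)
without_repl = p_two_reds(3, 7, replace=False)
print(round(float(with_repl), 3), round(float(without_repl), 3))
```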

Here is another example. Assume there is a department in your company that has 2 men and 5 women employees. Two employees are to be selected at random to serve on the Holiday Party Committee. What is the probability of two men being selected? Two women? One man and one woman? Here is a table (similar to the tree diagrams in the text):

Diagram for the Holiday Party Committee

Choice 1     Choice 2     Outcome
M (2/7)      M (1/6)      MM (2/7 · 1/6 = 0.048)
             W (5/6)      MW (2/7 · 5/6 = 0.238)
W (5/7)      M (2/6)      WM (5/7 · 2/6 = 0.238)
             W (4/6)      WW (5/7 · 4/6 = 0.476)

Note the probabilities sum to 1, which is a requirement of a probability distribution. So we have these calculated joint probabilities:

P(man and man) = 4.8%
P((man and woman) or (woman and man)) = 23.8% + 23.8% = 47.6%
P(woman and woman) = 47.6%
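The committee probabilities can be confirmed by brute force, listing every ordered way of picking two different employees. This is only a sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

staff = ["M"] * 2 + ["W"] * 5  # 2 men and 5 women

# Every ordered pick of two different people: 7 * 6 = 42 equally likely pairs.
ordered_pairs = list(permutations(range(len(staff)), 2))
tally = Counter(staff[i] + staff[j] for i, j in ordered_pairs)
committee = {k: Fraction(v, len(ordered_pairs)) for k, v in tally.items()}

p_two_men = committee["MM"]
p_mixed = committee["MW"] + committee["WM"]
p_two_women = committee["WW"]
print(float(p_two_men), float(p_mixed), float(p_two_women))
```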


More on a Bernoulli Process

Sometimes it is obvious whether a process meets the requirements to be a Bernoulli process; other times it is less clear and you have to think about it a little. For example:

  1. There is a 2.5% probability that an individual selected at random in the United States will have an IQ above 130. What is the probability that 3 individuals selected at random from across the country will all have IQ's above 130?

  2. What is the probability that three siblings from families selected at random in the United States will all have IQ's above 130?

In both cases there are only two outcomes per trial (IQ > 130, IQ ≤ 130), so one condition is satisfied. As to the other (the condition of independent outcomes), the first case meets it: choosing 3 people at random, even without replacement, makes only a very small difference in the probabilities of the outcomes, because the population in question is so large. The second case does not meet this condition, assuming we believe there is a genetic component to IQ. If one sibling has an IQ > 130, the probability that his or her siblings have a high IQ is greater than 2.5%.

The question of whether a process qualifies as a Bernoulli process or not is critical, because as we will see, we can only use the binomial model to calculate probabilities if the problem can be represented as a Bernoulli process.

A process can only be a Bernoulli process if the trials are statistically independent. We have already seen how to estimate independence: with a contingency table. Here is part of the contingency table given in the text (p 84):
 

                    Number and Sex of Previous Children
Latest Child    None    1 Boy    2 Boys    3 Boys    1 Girl    2 Girls    3 Girls
Boy              487      488       500       600       480        460        380
Girl             513      512       500       400       520        540        620
Total           1000     1000      1000      1000      1000       1000       1000

Are the chances of having a boy independent of the outcomes of previous births? We don't need to compute every probability. Consider just:

P(boy having no previous children) = 487 / 1000 = 0.487

while at the same time,

P(boy having 3 previous girls) = 380 / 1000 = 0.380

Since these two probabilities are significantly different, we would estimate that the event of the sex of a child is not independent of the sex of previous children. Hence, we do not have a Bernoulli process, and will not be able to use the Binomial model described in the next section.
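The same comparison can be made for every column of the table at once. The sketch below simply divides each "Boy" count by its column total of 1000; the dictionary and variable names are illustrative.

```python
# Number of boys among the latest children, by sex of previous children.
boys_per_1000 = {
    "None": 487, "1 Boy": 488, "2 Boys": 500, "3 Boys": 600,
    "1 Girl": 480, "2 Girls": 460, "3 Girls": 380,
}
BIRTHS_PER_COLUMN = 1000  # every column in the table totals 1000 births

p_boy = {history: n / BIRTHS_PER_COLUMN for history, n in boys_per_1000.items()}
for history, p in p_boy.items():
    print(f"P(boy | previous children: {history}) = {p:.3f}")

# If births were independent, these probabilities would all be about equal.
spread = max(p_boy.values()) - min(p_boy.values())
print(f"Largest difference between columns: {spread:.3f}")
```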


The Binomial Model

Probability models are used to calculate the probabilities for random variables. Random variables can be discrete or continuous. This chapter discusses the binomial model and how we can use it to calculate the probabilities for a random variable obtained through a Bernoulli process. That is, the binomial model will calculate the probability of x occurrences of an outcome in n trials, where each trial has only two possible outcomes, and the trials are independent.

Estimating Binomial Probabilities by Simulation. One way to estimate a binomial probability is by simulating a real world process with random numbers. A random number is a number from a series of numbers, each of which has the property that its probability of appearance cannot be determined from the pattern of the previous numbers. In other words, the numbers are independent of each other.

The text gives a small table of random digits. Here is another, generated by this program (if you are interested in the C source code, here it is: random.c). Like the table in the text, these random digits are grouped by two's just to make the table a little easier to read.

A Small Table of Random Digits
24 47 39 49 85 56 74 60 98 51
89 91 91 79 30 91 10 02 23 12
00 53 88 58 38 41 30 51 21 53
81 25 24 56 84 44 18 49 15 69
35 15 07 07 43 06 85 46 72 94
47 12 65 08 88 18 80 59 45 46
76 77 86 59 55 95 96 55 56 73
46 96 64 57 57 11 67 86 90 88
09 24 05 47 15 44 10 04 02 83
97 06 86 43 18 16 12 54 75 39

We can use this table to simulate what might happen in the real world. Here is the "bakery problem" from the text (p. 87):

A bakery has five ovens. Four of the ovens must be working in order to meet customer demand on any given day. The owner keeps records for major repairs. A major repair is one that requires a service call and puts an oven out of service for the day. A minor repair ... takes only a few minutes to fix and there are no lost sales.

Based on the owner's records, each oven has a 10% chance of a major breakdown on any given day. What is the probability that at least four ovens are working on a given day?

First, note that this problem does describe a Bernoulli process: each trial (one oven on one day) has only two possible outcomes (working, not working), and the trials are independent (the status of one oven does not affect the status of another).

We can use our table of random digits to solve this problem this way: let the digits 0-8 represent the condition "oven is working", and the digit 9 to mean "oven is not working." We will run the simulation by picking five random digits from the table, one for each oven, each "day", and recording the results. Using the table above, the results for the first day would be 2-4-4-7-3, which we will interpret as "all ovens working." For the second day, the series would be 9-4-9-8-5. This indicates that two ovens are not working that day.

If we kept up this procedure, we could determine the number of days at least four ovens were working, which, divided by the total number of days, would approximate the true probability of at least four ovens working. The longer we ran the simulation, the more accurate our estimate would be. Here is a program that runs this simulation (executable, C source code), and the results of one simulation run of 10,000 days:

Bakery Simulation
How many days to run simulation? 10000

Number of days (of 10000) with n ovens working:     

     0:      0 (0.000)
     1:      5 (0.001)
     2:     74 (0.007)
     3:    665 (0.067)
     4:   3267 (0.327)
     5:   5989 (0.599)
 

From the above, we can calculate that the probability of at least 4 ovens working on any given day will be approximately:

P(at least 4 ovens working) = 0.327 + 0.599 = 0.926 (or 92.6%)

Note that this differs from the text estimate of 0.96, because a different set of random numbers was used, and because the above simulation ran for 10,000 days, rather than the 25 in the text (one of the advantages of simulation by computer).
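For readers who want to run their own simulation, here is a short Python sketch of the same idea. It is not the C program mentioned above; the function name simulate_bakery and the seed value are made up for illustration. It follows the random-digit scheme described earlier: a digit 0-8 means an oven is working, and a 9 means it is not.

```python
import random

def simulate_bakery(days, ovens=5, seed=None):
    """Count how many simulated days had 0, 1, ..., `ovens` ovens working."""
    rng = random.Random(seed)
    counts = [0] * (ovens + 1)
    for _day in range(days):
        # One random digit per oven, as with the table of random digits.
        digits = [rng.randrange(10) for _oven in range(ovens)]
        working = sum(1 for d in digits if d != 9)  # digit 9 means "not working"
        counts[working] += 1
    return counts

days = 10_000
counts = simulate_bakery(days, seed=2009)
for n_working, n_days in enumerate(counts):
    print(f"{n_working}: {n_days:6d} ({n_days / days:.3f})")

estimate = (counts[4] + counts[5]) / days
print(f"P(at least 4 ovens working) is roughly {estimate:.3f}")
```

Changing the seed (or leaving it as None) gives a different run, so the estimate will vary slightly from one run to the next.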

Computing Binomial Probabilities. If you don't have a computer program handy, or a table of random numbers and the time to run a simulation of sufficient length, you can determine probabilities directly from the binomial probability distribution.

Let's assume we have n trials, each of which can result in the outcomes {success, failure}, each with probabilities p and q, respectively. (Note that since there can be only two outcomes in a Bernoulli process, q = 1 – p.) We designate P(x) as the probability of x successes in n trials.

The text describes how to calculate P(x):

The probability of x successes is the number of ways x can occur, times the probability of any one way to obtain x successes.

For example, what is the probability of two heads (successes) in three flips of a coin? The probability of one way of obtaining two heads, say HHT, is 0.5 · 0.5 · 0.5 = 0.125. But there are three ways to obtain two heads in three throws: HHT, HTH, and THH. So:

P(2 heads in three flips) = 3 · 0.125 = 0.375 (or 37.5%)

The difficulty is usually not figuring out the probability of one way for x to occur, but how many possible ways x can occur in n trials. For this, we need the math concept of the factorial, so we may as well introduce the binomial probability formula:

P(x) = [n! / (x! · (n – x)!)] · p^x · q^(n – x)

Don't let this equation scare you! Just because it has a lot of ! in it! Really! It is just another "plug and chug" formula. The first factor is the number of ways x successes can occur in n trials. (In math it is called a combination.) The second factor is the probability of any one way to obtain x successes. The exclamation point is the factorial operator, which just means that number multiplied by all the whole numbers less than itself, down to 1; for example: 5! = 5 · 4 · 3 · 2 · 1 = 120. So, what is the probability of getting 2 heads in 3 flips?

P(2) = [3! / (2! · 1!)] · 0.5^2 · 0.5^1 = 3 · 0.25 · 0.5 = 0.375

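Another example: What is the probability of four ovens out of five working? How about all five? How about at least four? Each oven works with probability p = 0.9 (recall that 0! = 1):

P(4) = [5! / (4! · 1!)] · 0.9^4 · 0.1^1 = 5 · 0.6561 · 0.1 = 0.328
P(5) = [5! / (5! · 0!)] · 0.9^5 · 0.1^0 = 1 · 0.5905 · 1 = 0.590
P(at least 4) = P(4) + P(5) = 0.328 + 0.590 = 0.918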

Compare this with the estimated probability obtained by simulation, above.
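The binomial formula is also easy to evaluate in code. This is a minimal sketch using math.comb from the Python standard library for the number of combinations; the function name binomial_pmf is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    # (number of ways x successes can occur) * (probability of any one way)
    return comb(n, x) * p**x * q**(n - x)

# Bakery problem: each of 5 ovens works with probability 0.9.
p4 = binomial_pmf(4, 5, 0.9)
p5 = binomial_pmf(5, 5, 0.9)
print(f"P(4) = {p4:.3f}, P(5) = {p5:.3f}, P(at least 4) = {p4 + p5:.3f}")
```

The same function reproduces the coin example: binomial_pmf(2, 3, 0.5) gives 0.375.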

Expected Value and Standard Deviation of the Binomial Probability Distribution. Every probability distribution has an expected value (mean) and standard deviation. The formulas for these were given in Chapter 2. For the binomial distribution, these are:

Mean: μ = n · p
Standard deviation: σ = √(n · p · q)

Example (from Triola, pg 232): The midterm exam in a nursing course consists of 75 true/false questions. Assume that an unprepared student makes random guesses for each of the answers. (a) Find the mean and standard deviation for the number of correct answers for such students. (b) Would it be unusual for a student to pass this exam by guessing and getting at least 50 correct answers?

μ = n · p = 75 · 0.5 = 37.5
σ = √(n · p · q) = √(75 · 0.5 · 0.5) = √18.75 = 4.3

We would expect an unprepared student to "guess" correctly on 37.5 questions, with a standard deviation of 4.3. Taking "usual" values to be those within two standard deviations of the mean (about 28.8 to 46.2), guessing 50 (or more) answers correctly would fall outside of our range of "usual" values.
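The exam example can be checked with a few lines of code. This sketch assumes that "usual" means within two standard deviations of the mean (the range rule of thumb); the function name binomial_mean_sd is illustrative.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial distribution."""
    return n * p, sqrt(n * p * (1 - p))

mean, sd = binomial_mean_sd(75, 0.5)

# Assumption: "usual" values lie within 2 standard deviations of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
print(f"mean = {mean}, sd = {sd:.1f}, usual range = {low:.1f} to {high:.1f}")
print("50 correct is unusual:", 50 > high)
```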

The Binomial Model and Decision-Making.

We can use the binomial probability distribution model to support our management decision-making. For example (from the text): assume the cost of unfilled orders on a day when fewer than four ovens are working is $100. Also assume the bakery can buy another oven for $175/month (monthly payment on a business loan). Neglecting taxes, depreciation, salvage costs and opportunity costs, would this be a good investment?

We have already determined that the probability of at least four ovens working out of five is 0.918, so the probability of two or more ovens being broken is 1 – 0.918, or 0.082. So, in a 30-day month, we would expect np = 30 · 0.082 = 2.46 days in which two or more ovens are broken. Each such day costs us $100, for an expected monthly loss of $246. We can do the same calculations to determine the probability that three or more ovens are broken out of six (if we buy the new oven), which is 0.016. The expected number of days in which this will occur in a month is 30 · 0.016 = 0.48 days, for an expected loss of $48. So:

Expected Revenue/Loss for Oven Buying Decision

Decision       Expected Monthly Revenue/Loss from Lost Orders
Don't Buy      –$246
Buy            –$48

So buying the oven looks like it will reduce our expected monthly loss in revenue due to lost orders by $198. If the oven costs only $175/month, we would have a decrease in expected net loss of $23.
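The oven-buying comparison can be reproduced in code. This is a sketch under the same assumptions as the example (each oven works with probability 0.9, four working ovens are needed, a 30-day month, $100 lost per short day, $175/month for the new oven); the function and constant names are illustrative. Because the code does not round the probabilities to three places first, its dollar figures differ by a dollar or two from the rounded values above.

```python
from math import comb

def p_at_most(k, n, p):
    """P(X <= k) for a binomial count X of n trials with success probability p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

P_WORKING = 0.9     # chance a given oven is working on a given day
OVENS_NEEDED = 4    # ovens required to meet demand
DAYS = 30           # days in the month
LOSS_PER_DAY = 100  # dollars lost on a day with too few ovens
OVEN_PAYMENT = 175  # monthly payment for one more oven

def expected_monthly_loss(ovens):
    # A "short" day is one with fewer than OVENS_NEEDED ovens working.
    p_short = p_at_most(OVENS_NEEDED - 1, ovens, P_WORKING)
    return DAYS * p_short * LOSS_PER_DAY

loss_keep = expected_monthly_loss(5)
loss_buy = expected_monthly_loss(6)
net_gain = loss_keep - loss_buy - OVEN_PAYMENT
print(f"Don't buy: ${loss_keep:.2f}  Buy: ${loss_buy:.2f}  Net gain: ${net_gain:.2f}")
```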


Continuous Random Variables and the Normal Model

So far we have only looked at calculating the probabilities of discrete random variables – those that have a finite number of outcomes. A Bernoulli random variable has only two outcomes. But many processes have continuous outcomes, such as the expected lifetime of a computer chip, or the weight of a package, where the random variable can take on non-discrete values. The most common model for calculating probabilities for a continuous random variable is the normal (Gaussian) model, studied by the German mathematician Carl Gauss (1777-1855) and many others.

 
Why is the Normal Model so Useful?
The normal model is useful because:

  • Many processes in nature are naturally "normal"
  • The normal model can be used to approximate other models, particularly when the number of trials is high

With regard to the second, here is a probability histogram of the number of heads in 100 flips of a coin:

You can see the familiar normal curve emerging. If we increased the number of trials, it would more closely approximate a real normal curve.
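To see how close the match is, the exact binomial probabilities for 100 coin flips can be compared with the height of a normal curve that has the same mean and standard deviation. This is only a sketch; it uses NormalDist from the Python standard library (Python 3.8 or later), and the names binom_pmf and normal are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
# Normal curve with the binomial's mean (n*p) and standard deviation sqrt(n*p*q).
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def binom_pmf(x):
    """Exact probability of x heads in n flips."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

for heads in range(40, 61, 5):
    print(f"{heads:3d} heads: binomial {binom_pmf(heads):.4f}  normal {normal.pdf(heads):.4f}")
```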

Features of the Normal Curve. The normal curve has these characteristics:

  1. The normal curve is symmetric about its mean value
  2. The total area under the normal curve is 1
  3. The normal curve never falls below the x-axis (it is never negative).

Computing Normal Probabilities. There are an infinite number of normal curves, each one describing a population with a different mean or standard deviation. The standard normal curve is the curve that describes a population whose mean is 0 and standard deviation 1. Of course there are almost no populations in real life that meet this specification. But we can normalize any population by converting the data into their corresponding z-scores:

A z-score tells you how many standard deviations the value of the random variable is above or below the mean.

You calculate a z-score for a data value xi from a population with mean x̄ and standard deviation s like this:

z = (xi – x̄) / s

A negative z-score indicates the value is less than the mean; a positive z-score indicates a value is greater than the mean.

Example: What is the z-score for an IQ of 132 (IQ mean = 100, st. dev. = 15)?

z = (132 – 100) / 15 = 32 / 15 = 2.13

Since z-scores change any normal distribution into one with a mean of 0 and standard deviation of 1, we can use just one curve (and one statistical table) to model any normal distribution.
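As a small illustration, the z-score calculation is a one-line function. The name z_score and its parameter names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# IQ scores with mean 100 and standard deviation 15.
print(round(z_score(132, 100, 15), 2))  # above the mean, so positive
print(round(z_score(85, 100, 15), 2))   # below the mean, so negative
```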

Here is a small table of the standard normal curve, giving the area under the curve between the mean and the given z-score. This value can be interpreted as the probability of a random variable having a value between the mean and that z-score.

Area Under the Standard Normal Curve – A Small Table

Number of Standard     Area Between the Mean and the Number
Units (z-score)        of Standard Units Away from the Mean
0.5                    0.1915
0.75                   0.2734
1.0                    0.3413
1.2                    0.3849
1.5                    0.4332
1.6                    0.4452
1.75                   0.4599
2.0                    0.4772
2.5                    0.4938
2.8                    0.4974

Since the area is directly related to probability, we can use the table above and the properties of the normal curve's symmetry to calculate the probability of a random variable falling in any given range.

Example: What is the probability of having an IQ of more than 118 (mean = 100, st.dev. = 15)? First calculate the z-score:

z = (118 – 100) / 15 = 18 / 15 = 1.2

Now look in the table above to find that a z-score of 1.2 represents 0.3849 of the area under the curve from the mean. Since the area to the right of the mean is 0.5, that leaves only 0.5 – 0.3849 = 0.1151 above the z-score of 1.2, or 11.5%.
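If a statistical table is not at hand, the same areas can be computed directly. This sketch uses NormalDist from the Python standard library (Python 3.8 or later); the helper name area_mean_to_z is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean and z (for z >= 0)."""
    return standard_normal.cdf(z) - 0.5

# Reproduce a few rows of the small table.
for z in (0.5, 1.0, 1.2, 2.0):
    print(f"z = {z}: area = {area_mean_to_z(z):.4f}")

# Probability of an IQ above 118 when the mean is 100 and st. dev. is 15.
iq = NormalDist(mu=100, sigma=15)
p_above_118 = 1 - iq.cdf(118)
print(f"P(IQ > 118) = {p_above_118:.4f}")
```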

Decision-Making and the Normal Curve. Just as we applied the binomial model to decision-making with discrete random variables, we can apply the normal curve to decision-making with continuous random variables.

Example (from the text): Two overnight delivery services make the following claims:

  • Federal Service: average delivery time 30 hours, st.dev. 5 hours

  • United Express: average delivery time 34 hours, st.dev. 2 hours

Assuming delivery times can be modeled by the normal distribution, which service has the highest chance of meeting a 36-hour deadline?

To solve this, we need to convert 36 to a z-score for each delivery service:

Federal Service: z = (36 – 30) / 5 = 1.2
United Express: z = (36 – 34) / 2 = 1.0

We can interpret this to mean that the probability of Federal Service delivering your package within 36 hours is 0.5 (area to the left of the mean) + 0.3849 (area to the right of the mean, up to 1.2 standard units) = 0.8849 (or 88.5%), while the probability of a United Express delivery within 36 hours is 0.5 + 0.3413 = 0.8413 (or 84.1%).

So we would go with Federal Service. (But try the same analysis if you could afford to wait 38 hours, instead of just 36.)
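The comparison is easy to repeat for other deadlines in code. This sketch assumes, as the example does, that delivery times are normally distributed with the claimed means and standard deviations; it uses NormalDist from the Python standard library, and the names services and on_time are illustrative.

```python
from statistics import NormalDist

# Claimed delivery-time distributions (hours): mean and standard deviation.
services = {
    "Federal Service": NormalDist(mu=30, sigma=5),
    "United Express": NormalDist(mu=34, sigma=2),
}

def on_time(deadline_hours):
    """Probability that each service delivers within the deadline."""
    return {name: dist.cdf(deadline_hours) for name, dist in services.items()}

for deadline in (36, 38):
    for name, prob in on_time(deadline).items():
        print(f"{deadline} hours, {name}: {prob:.4f}")
```

The cdf method gives the whole area to the left of the deadline, so there is no need to add 0.5 by hand.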

How Do We Really Know the Mean and Standard Deviation? It's clear from the above that we need to know the mean and standard deviation of a probability distribution to be able to calculate probabilities. In the case of coin flips we can base these on theoretical assumptions. But in the case of delivery services, the only way to determine the mean and standard deviation for the population as a whole is by sampling and estimating using statistical reasoning, the subject of the next chapter.

 Updated 10.31.2009