Category Archives: Math and Statistics

The Birthday Paradox: an interesting probability problem involving “statistically independent” events

Following up on last week’s blog posting entitled “Statistical Independence,” consider the so-called “Birthday Paradox”. The Birthday Paradox pertains to the probability that, in a set of randomly chosen people, some pair of them will have the same birthday. Counter-intuitively, in a group of just 23 randomly chosen people, there is slightly more than a 50% probability that at least two of them share the same birthday.

To compute the probability that at least two people in a group of n people share a birthday, we disregard variations in the distribution of birthdays, such as leap years, twins, and seasonal or weekday effects, and assume that the 365 possible birthdays are equally likely.[1] Thus, we assume that birth dates are statistically independent events. Consequently, the probability of two randomly chosen people not sharing the same birthday is 364/365. According to the combinatorial equation, the number of unique pairs in a group of n people is n!/[2!(n – 2)!] = n(n – 1)/2. Treating these pairs as (approximately) independent and assuming a uniform distribution (i.e., that all dates are equally probable), the probability that no pair in a group of n people shares the same birthday is p(n) = (364/365)^[n(n – 1)/2]. The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability is p’(n) = 1 – (364/365)^[n(n – 1)/2]. (Strictly speaking, the pairs are not mutually independent, so this formula is a very close approximation rather than an exact probability, but it is more than adequate for our purposes.)
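The formula above is easy to check numerically. The short Python sketch below (the function names are mine, not from any textbook) computes both the pairwise formula used here and the exact probability obtained by multiplying out 365/365 × 364/365 × … under the uniform-birthday assumption:

```python
def p_shared_pairwise(n):
    """Pairwise approximation: treat the n(n-1)/2 pairs as independent,
    each failing to match with probability 364/365."""
    pairs = n * (n - 1) // 2
    return 1 - (364 / 365) ** pairs

def p_shared_exact(n):
    """Exact probability under the uniform-birthday assumption:
    1 - (365/365)(364/365)...((365 - n + 1)/365)."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different

print(round(p_shared_pairwise(23), 4))  # → 0.5005
print(round(p_shared_exact(23), 4))     # → 0.5073
```

Both versions put the 50% crossover at n = 23, which is why the pairwise formula is good enough for our purposes.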

Given the assumptions listed in the previous paragraph, suppose that we are interested in determining how many randomly chosen people are needed in order for there to be a 50% probability that at least two persons share the same birthday. In other words, we are interested in finding the value of n which causes p(n) to equal 0.50. Therefore, 0.50 = (364/365)^[n(n – 1)/2]; taking natural logs of both sides and rearranging, we obtain (ln 0.50)/(ln 364/365) = n(n – 1)/2. Solving for n, we obtain 505.304 = n(n – 1); since 23 × 22 = 506, n is approximately equal to 23.[2]
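The algebra above can be double-checked in a few lines of Python (a quick sketch; the variable names are mine):

```python
import math

# Required number of pairs: n(n-1)/2 = ln(0.50) / ln(364/365)
target_pairs = math.log(0.5) / math.log(364 / 365)
print(round(2 * target_pairs, 3))  # → 505.304, matching n(n-1) above

# Solve n^2 - n - 2*target_pairs = 0 with the quadratic formula
n = (1 + math.sqrt(1 + 8 * target_pairs)) / 2
print(round(n, 2))  # → 22.98, i.e., n ≈ 23
```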

The following graph illustrates how the probability that a pair of people share the same birthday varies as the number of people in the sample increases:

[Graph: probability that at least two people share the same birthday, as a function of group size n]

[1] It is worth noting that real-life birthday distributions are not uniform, since not all dates are equally likely. For example, in the northern hemisphere, many children are born in the summer, especially during the months of August and September. In the United States, many children are conceived around the holidays of Christmas and New Year’s Day. Also, because hospitals rarely schedule C-sections and induced labor on the weekend, more Americans are born on Mondays and Tuesdays than on weekends; where many of the people share a birth year (e.g., a class in a school), this creates a tendency toward particular dates. These factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case, if everyone were born on just three days of the week, there would obviously be many identical birthdays!).

[2] Note that since 26 students are enrolled in Finance 4366 this semester, the probability that two Finance 4366 students share the same birthday is roughly p’(26) = 1 – (364/365)^[26(25)/2] = 59%.

Quote for the day

There are very few things which we know, which are not capable of being reduced to a mathematical reasoning. And when they cannot, it’s a sign our knowledge of them is very small and confused. Where a mathematical reasoning can be had, it’s as great a folly to make use of any other, as to grope for a thing in the dark, when you have a candle standing by you.

—Of the Laws of Chance, Preface (1692)
John Arbuthnot (1667–1735)

Statistical Independence

During last Thursday’s Finance 4366 class meeting, I introduced the concept of statistical independence. This coming Tuesday, much of our class discussion will focus on the implications of statistical independence for probability distributions such as the binomial and normal distributions which we will rely upon throughout the semester.

Whenever risks are statistically independent of each other, this implies that they are uncorrelated; i.e., random variations in one variable are not meaningfully related to random variations in another. For example, auto accident risks are largely uncorrelated random variables; just because I happen to get into a car accident, this does not make it any more likely that you will suffer a similar fate (that is, unless we happen to run into each other!). Another example of statistical independence is a sequence of coin tosses. Just because a coin toss comes up “heads,” this does not make it any more likely that subsequent coin tosses will also come up “heads.”

Computationally, the joint probability that we both get into car accidents, or that heads comes up on two consecutive tosses of a coin, is equal to the product of the two event probabilities. Suppose your probability of getting into an auto accident during 2017 is 1%, whereas my probability is 2%. Then the likelihood that we both get into auto accidents during 2017 is .01 x .02 = .0002, or .02% (1/50th of 1 percent). Similarly, when tossing a “fair” coin, the probability of observing two “heads” in a row is .5 x .5 = .25, or 25%. The probability rule which emerges from these examples can be generalized as follows:

Suppose Xi and Xj are statistically independent events with probabilities pi and pj, respectively. Then the joint probability that both Xi and Xj occur is equal to pipj. (Independence is what justifies multiplying; uncorrelatedness alone is not enough.)
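As a quick illustration of this rule, here is the arithmetic from the two examples above in Python (the probabilities are the ones assumed in the examples, not real accident statistics):

```python
# Independent auto-accident risks from the example above
p_you = 0.01  # your probability of an accident during 2017
p_me = 0.02   # my probability of an accident during 2017
print(round(p_you * p_me, 6))  # → 0.0002, i.e., .02%

# Two consecutive heads with a fair coin
p_heads = 0.5
print(p_heads * p_heads)  # → 0.25
```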

The 17 equations that changed the course of history (spoiler alert: we use 4 of these equations in Finance 4366!)

I especially like the fact that Ian Stewart includes the famous Black-Scholes equation (equation #17) on his list of the 17 equations that changed the course of history; Equations (2), (3), (7), and (17) play particularly important roles in Finance 4366!

From Ian Stewart’s book, here are the 17 math equations that changed the course of human history.