Section 11 Computing accurate Monte Carlo estimates

Returning to the result of our original for loop, we had

numberUnique
## [1] 19 19 20 20 20

From this, we estimate the probability that at least two people have the same birthday, by counting the proportion of times there were fewer than 20 unique birthdays:

sum(numberUnique < 20) / 5
## [1] 0.4

So our estimated probability is 0.4. Remember that, on its own,

numberUnique < 20

produces

## [1]  TRUE  TRUE FALSE FALSE FALSE

so that sum(numberUnique < 20) adds up the number of TRUEs (each TRUE is interpreted as a 1, and each FALSE is interpreted as a zero.)

11.1 Using the mean() command

It’s slightly easier to use the mean() command instead of sum(), as it will add up the values and divide by the number of values there are (it will do the dividing by 5 for us).

mean(numberUnique < 20) 
## [1] 0.4

11.2 Improving the accuracy of the estimate

Roughly speaking, the more simulations we do, the better the estimate will be. Simulating 5 sets of 20 birthdays wasn’t enough, so let’s try a larger number: 100,000:

numberUnique <- rep(0, 100000)
for(i in 1:100000){
  birthdays <- sample(x = 1:365, size = 20, replace = TRUE)
  numberUnique[i] <- length(unique(birthdays))
}
mean(numberUnique < 20)
## [1] 0.40967

We can repeat this a few times to see if the answer changes much. The estimate looks fairly stable correct to two decimal places. If we needed to, we could do even more simulations (millions), though we’ll start noticing that the code takes longer to run.

Exercise 11.1 Suppose there are, instead, 30 people in the room, and they are all born in a leap year. Assume each person is equally likely to have been born on any day of the year. Use the Monte Carlo method to

  1. estimate the probability that at least two people have birthdays on the same day;
  2. estimate the probability that two people only have birthdays on the same day.