Section 11 Computing accurate Monte Carlo estimates
Returning to the result of our original for loop, we had
## [1] 19 19 20 20 20
From this, we estimate the probability that at least two people have the same birthday, by counting the proportion of times there were fewer than 20 unique birthdays:
## [1] 0.4
So our estimated probability is 0.4. Remember that, on its own,
produces
## [1] TRUE TRUE FALSE FALSE FALSE
so that sum(numberUnique < 20)
adds up the number of TRUE
s (each TRUE
is interpreted as a 1, and each FALSE
is interpreted as a zero.)
11.1 Using the mean()
command
It’s slightly easier to use the mean()
command instead of sum()
, as it will add up the values and divide by the number of values there are (it will do the dividing by 5 for us).
## [1] 0.4
11.2 Improving the accuracy of the estimate
Roughly speaking, the more simulations we do, the better the estimate will be. Simulating 5 sets of 20 birthdays wasn’t enough, so let’s try a larger number: 100,000:
numberUnique <- rep(0, 100000)
for(i in 1:100000){
birthdays <- sample(x = 1:365, size = 20, replace = TRUE)
numberUnique[i] <- length(unique(birthdays))
}
mean(numberUnique < 20)
## [1] 0.40967
We can repeat this a few times to see if the answer changes much. The estimate looks fairly stable correct to two decimal places. If we needed to, we could do even more simulations (millions), though we’ll start noticing that the code takes longer to run.
Exercise 11.1 Suppose there are, instead, 30 people in the room, and they are all born in a leap year. Assume each person is equally likely to have been born on any day of the year. Use the Monte Carlo method to
- estimate the probability that at least two people have birthdays on the same day;
- estimate the probability that two people only have birthdays on the same day.