Understanding The Monte Carlo Method
An introduction and intuitive overview of the Monte Carlo method.
The Monte Carlo method is a powerful technique that uses computers to simulate various random events. It has become essential for scientists to model a wide range of probabilistic phenomena.
Imagine that we want to calculate the surface area of an irregularly shaped lake. The lake's random and curving boundaries make it challenging to find a straightforward mathematical formula to determine its area. Instead, using the Monte Carlo method, we can generate random points and check how many fall within the lake's boundaries. By comparing the number of points inside the lake to the total number of points generated, we can estimate its surface area probabilistically.
A Simple Example:
We’ll provide an example in Python to illustrate how Monte Carlo sampling can be used to estimate areas. To do this, we’ll plot a simple lake and try to obtain its area without having to use any advanced mathematical techniques or formulas.
Let’s first plot our lake using the Python Matplotlib library:
import matplotlib.pyplot as plt
import numpy as np
def irregular_lake(x, y):
    # Define the first circular region with center (2, 4) and radius 2
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    # Define the second circular region with center (6, 6) and radius 3
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    # Define the third circular region with center (8, 2) and radius 2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    # Combine the circular regions to get the irregular lake shape:
    # the minimum is <= 0 whenever the point lies inside at least one circle
    return np.minimum(np.minimum(circle1, circle2), circle3)
# Generate x and y values from 0 to 11 with a step of 0.1
x_values = np.arange(0, 11, 0.1)
y_values = np.arange(0, 11, 0.1)
# Create a meshgrid from x and y values
X, Y = np.meshgrid(x_values, y_values)
# Calculate the irregular lake function for each point in the meshgrid
Z = irregular_lake(X, Y)
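# Create the plot, filling every point where the lake function is <= 0
# (levels run from the minimum of Z up to 0 so the whole lake is shaded)
plt.contourf(X, Y, Z, levels=[Z.min(), 0], colors='blue', alpha=0.5)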
plt.xlabel('x')
plt.ylabel('y')
plt.title('Irregular Lake')
# Show the plot
plt.show()
The figure produced by our code is provided below:
As you can probably notice, calculating the area of this shape isn’t straightforward. Although it’s composed of 3 simple circles, the circles overlap, so simply summing their individual areas would count the overlapping regions twice. Is there a different methodology which we can use to obtain the total area of our shape?
Well, this is where the Monte Carlo method comes in! Using Monte Carlo, we shall estimate its area probabilistically!! How do we do this?
First, we enclose the region within the rectangle shown in the diagram. Our plot spans 11 units along the x-axis and 11 units along the y-axis, so the rectangle has an area of 11 x 11 = 121 square units:
Next, we’ll generate hundreds of (x, y) points within this rectangle. For instance, the computer might generate points (6.3, 5.2) or (8.1, 9.2) as shown in the image below:
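A minimal sketch of this sampling step is shown below. It assumes NumPy's default_rng generator with a fixed seed so the run is reproducible; the seed and the variable names are illustrative choices rather than part of the original script:

```python
import numpy as np

# Assumption: a seeded generator so the sketch gives the same points each run
rng = np.random.default_rng(seed=42)

num_points = 500  # "hundreds" of points

# Draw x and y independently and uniformly from [0, 11)
random_x = rng.uniform(0, 11, num_points)
random_y = rng.uniform(0, 11, num_points)

# Stack into an array of (x, y) pairs, one row per point
points = np.column_stack((random_x, random_y))

print(points[:3])  # Show the first three generated points
```

Drawing x and y separately from a uniform distribution gives points that are spread evenly over the rectangle, which is what the area estimate relies on.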
For each generated point, we ask whether it falls inside of our lake or not. To do this within our Python code, we plug the point into the irregular_lake() function (which represents the lake shape). This function calculates a value for each random point based on its position in the x-y plane. For points inside the irregular lake, the function value will be less than or equal to zero, and for points outside the lake, the function value will be positive.
Let’s show the output for this function for a few random points to illustrate this behavior:
>>> # Point (6.3, 5.2) -> contained within our lake
>>> print(irregular_lake(x = 6.3, y = 5.2 ))
-8.27
>>> # Point (7.0, 7.1) -> contained within our lake
>>> print(irregular_lake(x = 7.0, y = 7.1 ))
-6.79
>>> # Point (2.1, 4.2) -> contained within our lake
>>> print(irregular_lake(x = 2.1, y = 4.2 ))
-3.95
>>> # Point (0.0, 4.0) -> right on the boundary (contained)
>>> print(irregular_lake(x = 0.0, y = 4.0 ))
0.0
>>> # Point (8.1, 9.2) -> NOT contained within our lake
>>> print(irregular_lake(x = 8.1, y = 9.2 ))
5.65
>>> # Point (10.5, 10.0) -> NOT contained within our lake
>>> print(irregular_lake(x = 10.5, y = 10.0 ))
27.25
>>> # Point (0.2, 0.1) -> NOT contained within our lake
>>> print(irregular_lake(x = 0.2, y = 0.1 ))
14.45
We should be able to easily see and confirm from the code above that the irregular lake function produces a value which is either negative or 0 for any points which lie within our lake while producing positive values for any points which lie outside. In other words, we can calculate whether our generated points lie within the lake by using the code provided below:
>>> # Point (6.3, 5.2) does lie within our boundary:
>>> lies_within_lake = (irregular_lake(x = 6.3, y = 5.2 ) <= 0)
>>> print(lies_within_lake)
True
>>> # Point (8.1, 9.2) doesn't lie within boundary
>>> lies_within_lake = (irregular_lake(x = 8.1, y = 9.2 ) <= 0)
>>> print(lies_within_lake) # Prints False
False
To summarize, to check whether a point lies within our lake, we simply evaluate (irregular_lake(x, y) <= 0). This expression returns a Boolean value indicating whether the point lies within our bounded lake region or not.
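Because irregular_lake() is built from NumPy operations, the same check can be applied to many points at once. The sketch below is one way to do that; the helper name is_inside_lake is hypothetical and introduced here only for illustration:

```python
import numpy as np

def irregular_lake(x, y):
    # Same lake function as in the plotting code above
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    return np.minimum(np.minimum(circle1, circle2), circle3)

def is_inside_lake(x, y):
    """Hypothetical helper: True where (x, y) lies in the lake or on its boundary."""
    return irregular_lake(x, y) <= 0

# Four of the sample points used above, passed in as arrays
xs = np.array([6.3, 7.0, 8.1, 10.5])
ys = np.array([5.2, 7.1, 9.2, 10.0])

inside = is_inside_lake(xs, ys)
print(inside)                    # Boolean mask, one entry per point
print(np.count_nonzero(inside))  # Number of points that landed in the lake
```

The first two points land in the lake and the last two do not, so the mask is True, True, False, False and the count is 2.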
Great, but how do we use this information to estimate the total area within our plot?
Well, now comes the critical observation behind the Monte Carlo method: the probability that a uniformly random point in the rectangle falls within the lake is exactly the fraction of the 11 x 11 rectangle's area that the lake occupies. That is:
P(random point lands in lake) = Area of lake / Area of enclosing rectangle = Area of lake / 121
Of course, we cannot calculate this probability exactly unless we know the lake's area, which is the very unknown we’re looking to find. We can, however, estimate the probability of hitting the lake by sampling random points and calculating the proportion that land within the boundary. This use of the long-run proportion of successes to approximate the true probability of success is a direct application of the law of large numbers. We can then use this proportionality to obtain our total area!
To do this, we take the proportion of random points that fall within the lake and multiply it by the area of the enclosing rectangle. The result is an estimate of the lake's area that becomes more accurate as we sample more points. This is, in essence, what the Monte Carlo method does!
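Written as a formula, the estimate looks as follows. The symbols are introduced here for illustration: $N$ is the number of sampled points, $N_{\text{hit}}$ is the number that land in the lake, and $A_{\text{rect}} = 121$ is the area of the enclosing rectangle.

```latex
\[
\hat{A}_{\text{lake}} = A_{\text{rect}} \cdot \frac{N_{\text{hit}}}{N},
\qquad
\frac{N_{\text{hit}}}{N} \;\longrightarrow\; \frac{A_{\text{lake}}}{A_{\text{rect}}}
\quad \text{as } N \to \infty .
\]
```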
Let’s create a Python script which does this for us!
import numpy as np
def irregular_lake(x, y):
    # Define the first circular region with center (2, 4) and radius 2
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    # Define the second circular region with center (6, 6) and radius 3
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    # Define the third circular region with center (8, 2) and radius 2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    # Combine the circular regions to get the irregular lake shape
    return np.minimum(np.minimum(circle1, circle2), circle3)
# Generate x and y values from 0 to 11 with a step of 0.01
x_values = np.arange(0, 11, 0.01)
y_values = np.arange(0, 11, 0.01)
# Create a meshgrid from x and y values
X, Y = np.meshgrid(x_values, y_values)
# Calculate the irregular lake function for each point in the meshgrid
Z = irregular_lake(X, Y)
# Set the number of random points to generate
num_points = 10000
# Generate random x and y coordinates within the bounding box of the lake
random_x = np.random.uniform(0, 11, num_points)
random_y = np.random.uniform(0, 11, num_points)
# Count the number of points that fall inside the lake area
points_inside_lake = np.count_nonzero(irregular_lake(random_x, random_y) <= 0)
# Estimate the area of the lake based on the ratio of points inside the lake
# to total points:
total_area = 11 * 11 # Total area of the bounding box
lake_area = total_area * (points_inside_lake / num_points)
print(f"Total points sampled: {num_points}")
print(f"Total points which fall inside the lake area: {points_inside_lake}")
print(f"Estimated lake area: {lake_area}")
Running the code for 10,000 sample points produces the below output:
Total points sampled: 10000
Total points which fall inside the lake area: 4286
Estimated lake area: 51.8606
For this example, our code selected 10,000 points in the rectangle (which has a total area of 11 x 11 = 121) and found that 4,286 of them hit the lake. Thus, we can estimate that:
4,286 (number of points within area) / 10,000 (number of points sampled) = 0.4286 ( 42.86% probability that random point lands in lake)
which after cross-multiplication becomes:
Area of lake = 0.4286 (probability random point lands in lake) * 121 (area of enclosing rectangle) = 51.86 square units
Could we get a sharper estimate? To do this, we simply change our code to select more points! Instead of sampling 10,000 points, let’s sample 100,000 and see what we get!
In this case, it found that 42,784 out of 100,000 points fell within the lake area yielding an estimate of 51.77 – extremely close to our original value!
Total points sampled: 100000
Total points which fall inside the lake area: 42784
Estimated lake area: 51.76864
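Because this particular lake happens to be a union of three circles, its exact area can be worked out as a cross-check on the estimates. The sketch below is not part of the Monte Carlo method itself. It assumes the standard formula for the lens-shaped overlap of two circles, and it relies on the first and third circles not touching (their centers are about 6.32 units apart, more than the sum of their radii), so only two pairwise overlaps need subtracting. The function name lens_area is introduced here for illustration:

```python
import math

def lens_area(d, r1, r2):
    """Overlap area of two circles with radii r1 and r2 whose centers are d apart.

    Assumes the circles partially overlap, i.e. |r1 - r2| < d < r1 + r2.
    """
    a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    a3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - a3

# Circle centers and radii taken from irregular_lake()
c1, r1 = (2, 4), 2
c2, r2 = (6, 6), 3
c3, r3 = (8, 2), 2

# Sum of the three circle areas, counting each overlap twice
circle_areas = math.pi * (r1**2 + r2**2 + r3**2)

# Subtract each overlap once (circles 1 and 3 do not intersect)
overlap_12 = lens_area(math.dist(c1, c2), r1, r2)
overlap_23 = lens_area(math.dist(c2, c3), r2, r3)

exact_area = circle_areas - overlap_12 - overlap_23
print(f"Exact lake area: {exact_area:.2f}")
```

The result is about 51.85 square units, so both Monte Carlo estimates above land within roughly 0.1 of the exact value. For a real lake no such formula exists, which is where the sampling approach earns its keep.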
Of course, we could now ask the computer for 1 million random points, or 1 billion, or even more! The more samples we take, the greater our confidence in the accuracy of the area estimate.
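To see this effect in numbers, the sketch below repeats the estimate for several sample sizes and attaches an approximate standard error to each. It assumes each point is an independent hit-or-miss trial, so the standard error of the area estimate is roughly 121 * sqrt(p * (1 - p) / N), where p is the observed proportion of hits. The function name estimate_lake_area and the seed are illustrative choices:

```python
import numpy as np

def irregular_lake(x, y):
    # Same lake function as in the script above
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    return np.minimum(np.minimum(circle1, circle2), circle3)

def estimate_lake_area(num_points, rng, side=11):
    """Return (area estimate, approximate standard error) from num_points samples."""
    x = rng.uniform(0, side, num_points)
    y = rng.uniform(0, side, num_points)
    # Observed proportion of points that land in the lake
    p_hat = np.count_nonzero(irregular_lake(x, y) <= 0) / num_points
    box_area = side * side
    # Binomial standard error of the proportion, scaled up to an area
    std_error = box_area * np.sqrt(p_hat * (1 - p_hat) / num_points)
    return box_area * p_hat, std_error

rng = np.random.default_rng(seed=0)  # Assumption: fixed seed for reproducibility
results = {n: estimate_lake_area(n, rng) for n in (1_000, 10_000, 100_000, 1_000_000)}

for n, (area, err) in results.items():
    print(f"N = {n:>9,}: area = {area:.2f} +/- {err:.2f}")
```

The standard error shrinks in proportion to 1/sqrt(N), so each additional decimal digit of accuracy costs about 100 times as many points.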
Of course, this is a very elementary and artificial example. Real-world phenomena have nuances that add complexity to real-world Monte Carlo methods, but we should now know the gist of what this methodology provides: a numerical technique that uses random sampling to estimate complex mathematical results or simulate probabilistic events. It involves generating a large number of random samples to approximate an outcome or calculate probabilities, making it useful when exact solutions are difficult or impossible to obtain analytically.
Some examples of how the Monte Carlo method is used in the real world are provided below:
Finance: Monte Carlo simulations help estimate financial risks, such as stock price movements or option pricing by simulating a large number of possible scenarios.
Physics: It is employed to solve complex problems such as simulating particle interactions or estimating material properties.
Engineering: Monte Carlo simulations are used in structural analysis to estimate stress, fatigue life, and failure probabilities in complex structures.
Risk Analysis: In insurance and risk management, Monte Carlo simulations help assess potential losses and set appropriate premiums.
Environmental Modeling: It is applied to simulate weather patterns, predict natural disasters, and study climate change effects.
Manufacturing: Monte Carlo simulations aid in quality control and process optimization to minimize defects and improve production efficiency.
Artificial Intelligence: In AlphaGo (the artificial intelligence program that defeated a world champion in Go), Monte Carlo Tree Search was used to simulate potential moves, allowing the AI to make strategic decisions.
There are many other real-world use cases for the Monte Carlo method, but we don’t have time to go over all of them. Here, we simply wanted to provide a simple overview of what it is and how it can be implemented! Hopefully you found this tutorial useful!