In this assignment you will reinforce your intuition about the concepts covered in the lectures by taking the dice example to the next level.
This assignment will not evaluate your coding skills but rather your intuition and analytical skills. You can answer the exercise questions by any means you like: take the analytical route and compute the exact values, or write code that simulates the situation at hand and provides approximate values (grading has some tolerance to allow for approximate solutions). It is up to you which route you want to take!
Note that every exercise has a blank cell that you can use for your calculations. This cell is there for your convenience and will not be graded, so you can leave it empty if you want to.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utils
During this assignment you will be presented with various scenarios that involve dice. The dice can have different numbers of sides and can be either fair or loaded.
Let’s get started!
Given a 6-sided fair dice (all of the sides have equal probability of showing up), compute the mean and variance of the probability distribution that models it. The next figure shows a visual representation of this distribution:
Submission considerations:
Hints:
# You can use this cell for your calculations (not graded)
n_sides = 6
die = np.array([i for i in range(1, n_sides + 1)])
n_rolls = 20000
rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
print(f"Mean: {np.mean(rolls):.3f}, variance: {np.var(rolls):.3f}")
Mean: 3.495, variance: 2.917
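The simulated estimates can be cross-checked against exact values computed directly from the probability mass function. This is a minimal sketch, assuming the fair dice is modeled as a uniform PMF over the faces 1 to 6; the names `faces`, `pmf`, `exact_mean`, and `exact_var` are illustrative and not part of the assignment.

```python
import numpy as np

# Illustrative names; none of these come from the assignment's utils module
faces = np.arange(1, 7)            # faces 1..6
pmf = np.full(faces.size, 1 / 6)   # fair dice: every face has probability 1/6

# Mean: probability-weighted average of the faces
exact_mean = np.sum(faces * pmf)
# Variance: probability-weighted squared deviation from the mean
exact_var = np.sum((faces - exact_mean) ** 2 * pmf)

print(f"Exact mean: {exact_mean:.3f}, exact variance: {exact_var:.3f}")
```

With enough rolls, the simulated mean and variance should land close to these exact values.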
# Run this cell to submit your answer
utils.exercise_1()
FloatText(value=0.0, description='Mean:')
FloatText(value=0.0, description='Variance:')
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Now suppose you throw the same dice as in the previous exercise two times and record the sum of the two throws. Which of the following probability mass functions should you get?
[Figure: three candidate probability mass functions, shown left, center, and right]
Hints:
You can add two numpy arrays element-wise with the + operator like this: sum = first_throw + second_throw
# You can use this cell for your calculations (not graded)
n_rolls = 20000
first_throw = np.array([np.random.choice(die) for _ in range(n_rolls)])
second_throw = np.array([np.random.choice(die) for _ in range(n_rolls)])
sum_throw = first_throw + second_throw
sns.histplot(sum_throw, stat = "probability")
plt.show()
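For the analytical route, the PMF of the sum of two independent throws is the convolution of the single-throw PMF with itself. The sketch below assumes a fair 6-sided dice and uses `np.convolve`; the variable names are illustrative.

```python
import numpy as np

pmf_die = np.full(6, 1 / 6)               # fair 6-sided dice, faces 1..6
pmf_sum = np.convolve(pmf_die, pmf_die)   # PMF of the sum of two independent throws
sums = np.arange(2, 13)                   # possible sums: 2..12

for s, p in zip(sums, pmf_sum):
    print(f"P(sum = {s:2d}) = {p:.4f}")
```

Comparing the shape of these exact probabilities with the simulated histogram is one way to decide among the candidate plots.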
# Run this cell to submit your answer
utils.exercise_2()
ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given a fair 4-sided dice, you throw it two times and record the sum. The figure on the left shows the probabilities of the dice landing on each side, and the figure on the right shows the histogram of the sum. Fill in the probabilities of each sum (notice that the distribution of the sum is symmetrical, so you only need to input 4 values in total):
Submission considerations:
# You can use this cell for your calculations (not graded)
n_sides = 4
n_times = 50000
die = np.array([i for i in range(1, n_sides + 1)])
first_throw = np.array([np.random.choice(die) for _ in range(n_times)])
second_throw = np.array([np.random.choice(die) for _ in range(n_times)])
sum_throw = first_throw + second_throw
u, f = np.unique(sum_throw, return_counts = True)
freq = f / sum_throw.shape[0]
print(f"Values: {u}")
print(np.around(freq, 3))
Values: [2 3 4 5 6 7 8]
[0.06 0.124 0.189 0.249 0.19 0.124 0.064]
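The simulated frequencies can also be compared with exact probabilities obtained by enumerating every equally likely pair of throws. This is a sketch under the assumption of a fair 4-sided dice; `n_faces`, `counts`, and `exact_pmf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n_faces = 4
faces = range(1, n_faces + 1)

# Count how many of the n_faces**2 equally likely (first, second) pairs give each sum
counts = Counter(first + second for first, second in product(faces, repeat=2))
exact_pmf = {s: Fraction(c, n_faces ** 2) for s, c in sorted(counts.items())}

for s, p in exact_pmf.items():
    print(f"P(sum = {s}) = {p} = {float(p):.4f}")
```

Using `Fraction` keeps the probabilities exact, which makes the symmetry of the distribution easy to see.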
# Run this cell to submit your answer
utils.exercise_3()
FloatText(value=0.0, description='P for sum=2|8', style=DescriptionStyle(description_width='initial'))
FloatText(value=0.0, description='P for sum=3|7:', style=DescriptionStyle(description_width='initial'))
FloatText(value=0.0, description='P for sum=4|6:', style=DescriptionStyle(description_width='initial'))
FloatText(value=0.0, description='P for sum=5:', style=DescriptionStyle(description_width='initial'))
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Using the same scenario as in the previous exercise, compute the mean and variance of the sum of the two throws and the covariance between the first and the second throw:
Hints:
# You can use this cell for your calculations (not graded)
mean = np.mean(sum_throw)
var = np.var(sum_throw)
covar = np.around(np.cov(first_throw, second_throw), 3)
print(f"Mean: {mean:.3f}, Var: {var:.3f}, Covar: {covar}")
Mean: 5.013, Var: 2.486, Covar: [[ 1.253 -0.006]
[-0.006 1.244]]
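The same three quantities can be computed exactly from the joint distribution. The sketch below assumes two independent throws of a fair 4-sided dice, so the joint PMF is the outer product of the single-throw PMF with itself; all names are illustrative.

```python
import numpy as np

faces = np.arange(1, 5)
pmf = np.full(4, 1 / 4)
joint = np.outer(pmf, pmf)   # independence: P(X=i, Y=j) = P(X=i) * P(Y=j)

mean_single = faces @ pmf
var_single = ((faces - mean_single) ** 2) @ pmf

# Cov(X, Y) = E[XY] - E[X] E[Y], with E[XY] summed over the joint PMF
e_xy = faces @ joint @ faces
cov_xy = e_xy - mean_single * mean_single

# E[X + Y] = E[X] + E[Y] and Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
mean_sum = 2 * mean_single
var_sum = 2 * var_single + 2 * cov_xy

print(f"Mean: {mean_sum:.3f}, Var: {var_sum:.3f}, Covar: {cov_xy:.3f}")
```

Note that `np.cov(first_throw, second_throw)` returns a 2x2 matrix; the covariance between the two throws is its off-diagonal entry.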
# Run this cell to submit your answer
utils.exercise_4()
FloatText(value=0.0, description='Mean:')
FloatText(value=0.0, description='Variance:')
FloatText(value=0.0, description='Covariance:')
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Now suppose you have a loaded 4-sided dice (it is loaded so that it lands twice as often on side 2 compared to the other sides):
You throw it two times and record the sum of the two throws. Which of the following probability mass functions should you get?
[Figure: three candidate probability mass functions, shown left, center, and right]
Hints:
You can use the p parameter of np.random.choice to simulate a loaded dice.
# You can use this cell for your calculations (not graded)
def load_dice(n_sides, loaded_number):
    # The loaded side counts twice, so the total probability is split into
    # n_sides + 1 equal parts and every side starts with one part
    probs = np.array([1 / (n_sides + 1) for _ in range(n_sides)])
    # Assign the loaded side twice the probability of each other side
    probs[loaded_number - 1] = 2 / (n_sides + 1)
    # Check that all probabilities sum up to 1
    if not np.isclose(sum(probs), 1):
        print("All probabilities should add up to 1")
        return
    return probs
n_sides = 4
n_rolls = 20000
loaded_number = 2
probs_loaded_die = load_dice(n_sides = n_sides, loaded_number = loaded_number)
die = np.array([i for i in range(1, n_sides + 1)])
first_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
sum_of_rolls = first_rolls + second_rolls
sns.histplot(sum_of_rolls, stat = "probability")
plt.show()
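As an alternative to calling `np.random.choice` once per roll, all rolls can be drawn in a single vectorized call. This sketch assumes NumPy's `Generator` API (`np.random.default_rng`) and writes the loaded probabilities out by hand; the seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # arbitrary seed, only for reproducibility
faces = np.arange(1, 5)
probs = np.array([0.2, 0.4, 0.2, 0.2])     # side 2 is twice as likely as each other side
n_samples = 20_000

# Draw every roll at once using the p parameter
first = rng.choice(faces, size=n_samples, p=probs)
second = rng.choice(faces, size=n_samples, p=probs)
totals = first + second

# Empirical PMF of the sum
values, counts = np.unique(totals, return_counts=True)
freqs = counts / n_samples
for v, fr in zip(values, freqs):
    print(f"sum = {v}: {fr:.3f}")
```

The vectorized call gives the same kind of sample as the list comprehension but is usually much faster for large numbers of rolls.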
# Run this cell to submit your answer
utils.exercise_5()
ToggleButtons(description='Your answer:', options=('left', 'center', 'right'), value='left')
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
You have a 6-sided dice that is loaded so that it lands twice as often on side 3 compared to the other sides:
You record the sum of throwing it twice. What is the highest value of the sum that yields a cumulative probability less than or equal to 0.5?
Hints:
# You can use this cell for your calculations (not graded)
n_sides = 6
n_rolls = 20000
loaded_number = 3
probs_loaded_die = load_dice(n_sides = n_sides, loaded_number = loaded_number)
die = np.array([i for i in range(1, n_sides + 1)])
first_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
sum_of_rolls = first_rolls + second_rolls
sns.histplot(sum_of_rolls, cumulative=True, stat = "probability")
plt.show()
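Reading a threshold off a cumulative histogram can be imprecise, so it may help to compute the cumulative distribution exactly. The sketch below assumes a 6-sided dice with side 3 twice as likely as each other side, builds the PMF of the sum by convolution, and accumulates it with `np.cumsum`; the names are illustrative.

```python
import numpy as np

weights = np.ones(6)
weights[3 - 1] = 2                    # side 3 counts twice
pmf_die = weights / weights.sum()     # [1/7, 1/7, 2/7, 1/7, 1/7, 1/7]

pmf_sum = np.convolve(pmf_die, pmf_die)   # PMF of the sum of two independent throws
cdf_sum = np.cumsum(pmf_sum)              # cumulative probability P(sum <= s)
sums = np.arange(2, 13)

for s, c in zip(sums, cdf_sum):
    print(f"P(sum <= {s:2d}) = {c:.4f}")

# Largest sum whose cumulative probability does not exceed 0.5
highest = sums[cdf_sum <= 0.5].max()
print(f"Highest sum with cumulative probability <= 0.5: {highest}")
```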
# Run this cell to submit your answer
utils.exercise_6()
IntSlider(value=2, continuous_update=False, description='Sum:', max=12, min=2)
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given a 6-sided fair dice, you try a new game: you only throw the dice a second time if the result of the first throw is less than or equal to 3. Which of the following probability mass functions should you get given this new constraint?
[Figure: four candidate probability mass functions, shown left-most, left-center, right-center, and right-most]
Hints:
# You can use this cell for your calculations (not graded)
n_rolls = 50000
first_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
# Keep the result of the second throw only if the first roll was less than or equal to 3
second_rolls = np.where(first_rolls<=3, second_rolls, 0)
sum_of_rolls = first_rolls + second_rolls
sns.histplot(sum_of_rolls, stat="probability")
plt.show()
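The conditional game can also be worked out exactly by enumerating the outcomes. This is a sketch, assuming a fair dice and a total equal to the first throw alone when no second throw happens; `conditional_sum_pmf` and its `rethrow` argument are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_sum_pmf(n_faces, rethrow):
    """Exact PMF of the total when the second throw happens only if rethrow(first) is True."""
    p_face = Fraction(1, n_faces)
    pmf = defaultdict(Fraction)
    for first in range(1, n_faces + 1):
        if rethrow(first):
            # Second throw happens: each (first, second) pair has probability 1 / n_faces**2
            for second in range(1, n_faces + 1):
                pmf[first + second] += p_face * p_face
        else:
            # No second throw: the total is just the first throw
            pmf[first] += p_face
    return dict(sorted(pmf.items()))


pmf_le3 = conditional_sum_pmf(6, lambda first: first <= 3)
for total, p in pmf_le3.items():
    print(f"P(total = {total}) = {p} = {float(p):.4f}")
```

Other conditions can be explored by passing a different predicate, for example `lambda first: first >= 3`.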
# Run this cell to submit your answer
utils.exercise_7()
ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given the same scenario as in the previous exercise, but with the twist that you only throw the dice a second time if the result of the first throw is greater than or equal to 3, which of the following probability mass functions should you get given this new constraint?
[Figure: four candidate probability mass functions, shown left-most, left-center, right-center, and right-most]
# You can use this cell for your calculations (not graded)
n_rolls = 50000
first_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
# Keep the result of the second throw only if the first roll was greater than or equal to 3
second_rolls = np.where(first_rolls>=3, second_rolls, 0)
sum_of_rolls = first_rolls + second_rolls
sns.histplot(sum_of_rolls, stat="probability")
plt.show()
# Run this cell to submit your answer
utils.exercise_8()
ToggleButtons(description='Your answer:', options=('left-most', 'left-center', 'right-center', 'right-most'), …
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given an n-sided fair dice, you throw it twice and record the sum. How does increasing the number of sides n of the dice impact the mean and variance of the sum and the covariance of the joint distribution?
# You can use this cell for your calculations (not graded)
n_rolls = 50000
max_sides = 20
for side in range(1, max_sides + 1):
    # Build a dice with the current number of sides
    # (using n_sides here would reuse the stale value from an earlier cell)
    die = np.array([i for i in range(1, side + 1)])
    first_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
    second_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
    sum_of_rolls = first_rolls + second_rolls
    mean = np.mean(sum_of_rolls)
    var = np.var(sum_of_rolls)
    # 2x2 covariance matrix; the off-diagonal entry is the covariance between the throws
    covar = np.cov(first_rolls, second_rolls)
    print(f"Sides: {side}, mean = {mean:.3f}, var = {var:.3f}")
    print(covar)
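For a fair n-sided dice thrown twice independently, the exact mean and variance of the sum can be computed from the PMF and compared with the closed forms n + 1 and (n^2 - 1) / 6; the covariance between the two throws is zero by independence. The sketch below is one way to tabulate this; `fair_sum_stats` is a hypothetical helper name.

```python
import numpy as np


def fair_sum_stats(n_faces):
    """Exact mean and variance of the sum of two independent throws of a fair dice."""
    faces = np.arange(1, n_faces + 1)
    pmf = np.full(n_faces, 1 / n_faces)
    mean_single = faces @ pmf
    var_single = ((faces - mean_single) ** 2) @ pmf
    # Independent throws: means add, and variances add because the covariance is zero
    return 2 * mean_single, 2 * var_single


for n in (2, 4, 6, 10, 20):
    mean_sum, var_sum = fair_sum_stats(n)
    print(
        f"n = {n:2d}: mean = {mean_sum:.3f} (n + 1 = {n + 1}), "
        f"var = {var_sum:.3f} ((n^2 - 1) / 6 = {(n ** 2 - 1) / 6:.3f})"
    )
```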
# Run this cell to submit your answer
utils.exercise_9()
As the number of sides in the die increases:
ToggleButtons(description='The mean of the sum:', options=('stays the same', 'increases', 'decreases'), value=…
ToggleButtons(description='The variance of the sum:', options=('stays the same', 'increases', 'decreases'), va…
ToggleButtons(description='The covariance of the joint distribution:', options=('stays the same', 'increases',…
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given a 6-sided loaded dice, you throw it twice and record the sum. Which of the following statements is true?
# You can use this cell for your calculations (not graded)
n_sides = 6
n_rolls = 50000
for loaded_number in range(1, n_sides + 1):
    probs_loaded_die = load_dice(n_sides = n_sides, loaded_number = loaded_number)
    die = np.array([i for i in range(1, n_sides + 1)])
    first_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
    second_rolls = np.array([np.random.choice(die, p=probs_loaded_die) for _ in range(n_rolls)])
    sum_of_rolls = first_rolls + second_rolls
    mean = np.mean(sum_of_rolls)
    var = np.var(sum_of_rolls)
    print(f"loaded_side = {loaded_number}, mean = {mean:.3f}, var = {var:.3f}")
loaded_side = 1, mean = 6.275, var = 6.531
loaded_side = 2, mean = 6.577, var = 5.512
loaded_side = 3, mean = 6.874, var = 5.063
loaded_side = 4, mean = 7.164, var = 5.068
loaded_side = 5, mean = 7.452, var = 5.542
loaded_side = 6, mean = 7.725, var = 6.504
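The effect of the loaded side can also be computed exactly. This sketch assumes the same loading rule as the earlier exercises (the loaded side is twice as likely as each other side) and two independent throws; `loaded_sum_stats` is a hypothetical helper name.

```python
import numpy as np


def loaded_sum_stats(n_faces, loaded_face):
    """Exact mean and variance of the sum of two independent throws of a loaded dice."""
    weights = np.ones(n_faces)
    weights[loaded_face - 1] = 2          # the loaded face counts twice
    pmf = weights / weights.sum()
    faces = np.arange(1, n_faces + 1)
    mean_single = faces @ pmf
    var_single = ((faces - mean_single) ** 2) @ pmf
    # Independent throws: means add and variances add
    return 2 * mean_single, 2 * var_single


for loaded_face in range(1, 7):
    mean_sum, var_sum = loaded_sum_stats(6, loaded_face)
    print(f"loaded_side = {loaded_face}, mean = {mean_sum:.3f}, var = {var_sum:.3f}")
```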
# Run this cell to submit your answer
utils.exercise_10()
RadioButtons(layout=Layout(width='max-content'), options=('the mean and variance is the same regardless of whi…
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given a fair n-sided dice, you throw it twice and record the sum, but the second throw depends on the result of the first one, as in exercises 7 and 8. Which of the following statements is true?
# You can use this cell for your calculations (not graded)
n_sides = 6
n_rolls = 50000
die = np.array([i for i in range(1, n_sides + 1)])
first_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(die) for _ in range(n_rolls)])
second_rolls = np.where(first_rolls>3, second_rolls, 0)
sum_of_rolls = first_rolls + second_rolls
mean = np.mean(sum_of_rolls)
var = np.var(sum_of_rolls)
covar = np.cov(first_rolls, second_rolls)
print(f"mean = {mean:.3f}, var = {var:.3f}")
print(covar)
mean = 5.255, var = 12.658
[[2.91186951 2.6219118 ]
[2.6219118 4.50285004]]
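To see how the condition affects the dependence between the throws, the covariance can be computed exactly for different conditions. This is a sketch, assuming a fair dice and a second throw recorded as 0 when it does not happen (as in the simulation above); `first_second_cov` and `rethrow` are hypothetical names.

```python
from fractions import Fraction


def first_second_cov(n_faces, rethrow):
    """Exact Cov(first, second), where second is 0 unless rethrow(first) is True."""
    p = Fraction(1, n_faces)
    e_first = e_second = e_product = Fraction(0)
    for first in range(1, n_faces + 1):
        if rethrow(first):
            for second in range(1, n_faces + 1):
                e_first += p * p * first
                e_second += p * p * second
                e_product += p * p * first * second
        else:
            # No second throw: it contributes 0 to E[second] and E[first * second]
            e_first += p * first
    return e_product - e_first * e_second


cov_gt3 = first_second_cov(6, lambda first: first > 3)
cov_le3 = first_second_cov(6, lambda first: first <= 3)
print(f"Rethrow if first > 3:  covariance = {float(cov_gt3):.3f}")
print(f"Rethrow if first <= 3: covariance = {float(cov_le3):.3f}")
```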
# Run this cell to submit your answer
utils.exercise_11()
RadioButtons(layout=Layout(width='max-content'), options=('changing the direction of the inequality will chang…
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Given an n-sided dice (which could be fair or not), you throw it twice and record the sum (there is no dependence between the throws). If you are only given the histogram of the sums, can you use it to determine the probabilities of the dice landing on each side?
# You can use this cell for your calculations (not graded)
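One way to explore this question is to try undoing the convolution. Because the throws are independent, the PMF of the sum q is the convolution of the dice PMF p with itself, so q_0 = p_0^2 and q_k = 2 p_0 p_k + sum of p_i p_(k-i) for i from 1 to k-1. The sketch below solves these equations one side at a time. It assumes the exact PMF of the sum is available (a histogram from a finite sample would only give an approximation) and that the lowest side has nonzero probability; `recover_die_pmf` is a hypothetical helper name.

```python
import numpy as np


def recover_die_pmf(pmf_sum):
    """Recover the single-throw PMF from the PMF of the sum of two independent throws."""
    pmf_sum = np.asarray(pmf_sum, dtype=float)
    n_faces = (pmf_sum.size + 1) // 2       # the sum of two throws has 2n - 1 possible values
    p = np.zeros(n_faces)
    p[0] = np.sqrt(pmf_sum[0])              # q_0 = p_0^2 (assumes p_0 > 0)
    for k in range(1, n_faces):
        # Contribution of the already recovered sides: sum_{i=1}^{k-1} p_i * p_{k-i}
        cross = np.dot(p[1:k], p[1:k][::-1])
        p[k] = (pmf_sum[k] - cross) / (2 * p[0])
    return p


true_pmf = np.array([0.2, 0.4, 0.2, 0.2])       # 4-sided dice loaded on side 2
pmf_of_sum = np.convolve(true_pmf, true_pmf)    # what the histogram of sums approximates
recovered = recover_die_pmf(pmf_of_sum)
print(np.around(recovered, 3))
```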
# Run this cell to submit your answer
utils.exercise_12()
RadioButtons(layout=Layout(width='max-content'), options=('yes, but only if one of the sides is loaded', 'no, …
Button(button_style='success', description='Save your answer!', style=ButtonStyle())
Output()
Run the next cell to check that you have answered all of the exercises
utils.check_submissions()
All answers saved, you can submit the assignment for grading!
Congratulations on finishing this assignment!
During this assignment you tested your knowledge of probability distributions, descriptive statistics, and the visual interpretation of these concepts. You had the choice to compute everything analytically or to create simulations to help you get the right answer. You probably also realized that some exercises could be answered without any computation, just by looking at certain hidden cues that the visualizations revealed.
Keep up the good work!