In this assignment you will solve a simple optimization problem for a function of one variable. Given a dataset of historical prices of a product from two suppliers, your task is to identify what share of the product you should buy from each of the suppliers to make the best possible investment in the future. Stating the problem mathematically, you will construct a target function to minimize, evaluate its minimum and investigate how its derivative is connected with the result.
Your Company is aiming to minimize production costs of some goods. During the production process, an essential product P is used, which can be supplied from one of two partners - supplier A and supplier B. Your consultants requested the historical prices of product P from both suppliers A and B, which were provided as monthly averages for the period from February 2018 to March 2020.
Preparing Company Budget for the coming twelve months period, your plan is to purchase the same amount of product P monthly. Choosing the supplier, you noticed, that there were some periods in the past, when it would be more profitable to use supplier A (the prices of product P were lower), and other periods to work with supplier B. For the Budget model you can set some percentage of the goods to be purchased from supplier A (e.g. 60%) and the remaining part from supplier B (e.g. 40%), but this split should be kept consistent for the whole of the twelve months period. The Budget will be used in preparation for the contract negotiations with both suppliers.
Based on the historical prices, is there a particular percentage which will be more profitable to supply from Company A, and the remaining part from Company B? Or maybe it does not matter and you can work just with one of the suppliers?
Denoting prices of the product P from Company A and Company B as $p_A$ (USD) and $p_B$ (USD) respectively, and the volume of the product to be supplied per month as $n$ (units), the total cost in USD is:
$$f\left(\omega\right) = p_A \omega ,n+ p_B \left(1 - \omega\right) n,$$
where $0\leq\omega\leq1$ is the parameter. If $\omega = 1$, all goods will be supplied from Company A, and if $\omega = 0$, from Company B. In case of $0<\omega<1$, some percentage will be allocated to both.
As it is planned to keep the volume $n$ constant over the next twelve months, in the mathematical model the common approach is to put $n = 1$. You can do this, because nothing depends on the volume and the end result will be the same. Now the total cost will be simpler:
$$f\left(\omega\right) = p_A \omega+ p_B \left(1 - \omega\right) \tag{1}$$
Obviously, you do not know the future prices $p_A$ and $p_B$, only historical values (prices ${p_A^1, \cdots, p_A^k}$ and ${p_B^1, \cdots, p_B^k}$ for $k$ months). And historically there were various periods of time when it was better to take $\omega = 1$ ($p_A^i < p_B^i$) or $\omega = 0$ ($p_A^i >p_B^i$). Is it possible now to choose some $\omega$ value that would provide some evidence of minimum costs in the future?
This is a standard portfolio management (investment) problem well known in statistics, where based on the historical prices you need to make investment decision to maximize profit (minimize costs). Since statistics has not been covered in this Course, you do not need to understand the details about the function $\mathcal{L}\left(\omega\right)$ (called loss function) to minimize, explained in the next paragraph.
The approach is to calculate $f\left(\omega\right)$ for each of the historical prices $p_A^i$ and $p_B^i$, $f^i\left(\omega\right)=p_A^i \omega+ p_B^i \left(1 - \omega\right)$. Then take an average of those values, $\overline{f\left (\omega\right)}=\text{mean}\left(f^i\left(\omega\right)\right) = \frac{1}{k}\sum_{i=1}^{k}f^i\left(\omega\right)$ and look for such value of $\omega$ which makes $f^i\left(\omega\right)$ as “stable” as possible - varying as little as possible from the average $\overline{f\left (\omega\right)}$. This means that you would want to minimize the sum of the differences $\left(f^i \left(\omega\right) - \overline{f\left (\omega\right)}\right)$. As the differences can be negative or positive, a common approach is to take the squares of those and take an average of the squares:
$$\mathcal{L}\left(\omega\right) = \frac{1}{k}\sum_{i=1}^{k}\left(f^i \left(\omega\right) - \overline{f\left (\omega\right)}\right)^2\tag{2}$$
In statistics $\mathcal{L}\left(\omega\right)$ is called a variance of ${f^1 \left(\omega\right), \cdots , f^k \left(\omega\right)}$. The aim is to minimize the variance $\mathcal{L}\left(\omega\right)$, where $\omega\in\left[0, 1\right]$. Again, do not worry if you do not understand deeply why particularly this function $\mathcal{L}\left(\omega\right)$ was chosen. You might think if it is logical to minimize an average $\overline{f\left (\omega\right)}$, but risk management theory states that in this problem variance needs to be optimized.
Statistical theory shows that there is an $\omega\in\left[0, 1\right]$ value which minimizes function $\mathcal{L}\left(\omega\right)$ and it can be found using some properties of the datasets ${p_A^1, \cdots, p_A^k}$ and ${p_B^1, \cdots, p_B^k}$. However, as this is not a Course about statistics, the example is taken to illustrate an optimization problem of one variable based on some dataset. It is a chance for you to check your understanding and practice this week material.
Now let’s upload a dataset and explore if it is possible to find a minimum point for the corresponding function $\mathcal{L}\left(\omega\right)$.
Let’s import all of the required packages. In addition to the ones you have been using in this Course before, you will need to import pandas
library. It is a commonly used package for data manipulation and analysis.
# A function to perform automatic differentiation.
from jax import grad
# A wrapped version of NumPy to use JAX primitives.
import jax.numpy as np
# A library for programmatic plot generation.
import matplotlib.pyplot as plt
# A library for data manipulation and analysis.
import pandas as pd
# A magic command to make output of plotting commands displayed inline within the Jupyter notebook.
%matplotlib inline
Load the unit tests defined for this notebook.
import w1_unittest
# Please ignore the warning message about GPU/TPU if it appears.
WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Historical prices for both suppliers A and B are saved in the file data/prices.csv
. To open it you can use pandas
function read_csv
. This example is very simple, there is no need to use any other parameters.
df = pd.read_csv('data/prices.csv')
The data is now saved in the variable df
as a DataFrame, which is the most commonly used pandas
object. It is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it as a table or a spreadsheet. Full documentation can be found here.
View the data with a standard print
function:
print(df)
date price_supplier_a_dollars_per_item \
0 1/02/2016 104
1 1/03/2016 108
2 1/04/2016 101
3 1/05/2016 104
4 1/06/2016 102
5 1/07/2016 105
6 1/08/2016 114
7 1/09/2016 102
8 1/10/2016 105
9 1/11/2016 101
10 1/12/2016 109
11 1/01/2017 103
12 1/02/2017 93
13 1/03/2017 98
14 1/04/2017 92
15 1/05/2017 97
16 1/06/2017 96
17 1/07/2017 94
18 1/08/2017 97
19 1/09/2017 93
20 1/10/2017 99
21 1/11/2017 93
22 1/12/2017 98
23 1/01/2018 94
24 1/02/2018 93
25 1/03/2018 92
26 1/04/2018 96
27 1/05/2018 98
28 1/06/2018 98
29 1/07/2018 93
30 1/08/2018 97
31 1/09/2018 102
32 1/10/2018 103
33 1/11/2018 100
34 1/12/2018 100
35 1/01/2019 104
36 1/02/2019 100
37 1/03/2019 103
38 1/04/2019 104
39 1/05/2019 101
40 1/06/2019 102
41 1/07/2019 100
42 1/08/2019 102
43 1/09/2019 108
44 1/10/2019 107
45 1/11/2019 107
46 1/12/2019 103
47 1/01/2020 109
48 1/02/2020 108
49 1/03/2020 108
price_supplier_b_dollars_per_item
0 76
1 76
2 84
3 79
4 81
5 84
6 90
7 93
8 93
9 99
10 98
11 96
12 94
13 104
14 101
15 102
16 104
17 106
18 105
19 103
20 106
21 104
22 113
23 115
24 114
25 124
26 119
27 115
28 112
29 111
30 106
31 107
32 108
33 108
34 102
35 104
36 101
37 101
38 100
39 103
40 106
41 100
42 97
43 98
44 90
45 92
46 92
47 99
48 94
49 91
To print a list of the column names use columns
attribute of the DataFrame:
print(df.columns)
Index(['date', 'price_supplier_a_dollars_per_item',
'price_supplier_b_dollars_per_item'],
dtype='object')
Reviewing the displayed table and the column names you can conclude that monthly prices are provided (in USD) and you only need the data from the columns price_supplier_a_dollars_per_item
and price_supplier_b_dollars_per_item
. In real life the datasets are significantly larger and require a proper review and cleaning before injection into models. But this is not the focus of this Course.
To access the values of one column of the DataFrame you can use the column name as an attribute. For example, the following code will output date
column of the DataFrame df
:
df.date
0 1/02/2016
1 1/03/2016
2 1/04/2016
3 1/05/2016
4 1/06/2016
5 1/07/2016
6 1/08/2016
7 1/09/2016
8 1/10/2016
9 1/11/2016
10 1/12/2016
11 1/01/2017
12 1/02/2017
13 1/03/2017
14 1/04/2017
15 1/05/2017
16 1/06/2017
17 1/07/2017
18 1/08/2017
19 1/09/2017
20 1/10/2017
21 1/11/2017
22 1/12/2017
23 1/01/2018
24 1/02/2018
25 1/03/2018
26 1/04/2018
27 1/05/2018
28 1/06/2018
29 1/07/2018
30 1/08/2018
31 1/09/2018
32 1/10/2018
33 1/11/2018
34 1/12/2018
35 1/01/2019
36 1/02/2019
37 1/03/2019
38 1/04/2019
39 1/05/2019
40 1/06/2019
41 1/07/2019
42 1/08/2019
43 1/09/2019
44 1/10/2019
45 1/11/2019
46 1/12/2019
47 1/01/2020
48 1/02/2020
49 1/03/2020
Name: date, dtype: object
Load the historical prices of supplier A and supplier B into variables prices_A
and prices_B
, respectively. Convert the price values into NumPy
arrays with elements of type float32
using np.array
function.
### START CODE HERE ### (~ 4 lines of code)
prices_A = df["price_supplier_a_dollars_per_item"]
prices_B = df["price_supplier_b_dollars_per_item"]
prices_A = np.array(prices_A).astype('float32')
prices_B = np.array(prices_B).astype('float32')
### END CODE HERE ###
# Print some elements and mean values of the prices_A and prices_B arrays.
print("Some prices of supplier A:", prices_A[0:5])
print("Some prices of supplier B:", prices_B[0:5])
print("Average of the prices, supplier A:", np.mean(prices_A))
print("Average of the prices, supplier B:", np.mean(prices_B))
Some prices of supplier A: [104. 108. 101. 104. 102.]
Some prices of supplier B: [76. 76. 84. 79. 81.]
Average of the prices, supplier A: 100.799995
Average of the prices, supplier B: 100.0
>Some prices of supplier A: [104. 108. 101. 104. 102.]
Some prices of supplier B: [76. 76. 84. 79. 81.]
Average of the prices, supplier A: 100.799995
Average of the prices, supplier B: 100.0
w1_unittest.test_load_and_convert_data(prices_A, prices_B)
[92m All tests passed
Average prices from both suppliers are similar. But if you will plot the historical prices, you will see that there were periods of time when the prices were lower for supplier A, and vice versa.
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
plt.plot(prices_A, 'g', label="Supplier A")
plt.plot(prices_B, 'b', label="Supplier B")
plt.legend()
plt.show()
Based on the historical data, can you tell which supplier it will be more profitable to work with? As discussed in the section 1.3, you need to find such an $\omega \in \left[0, 1\right]$ which will minimize function $(2)$.
Calculate f_of_omega
, corresponding to the $f^i\left(\omega\right)=p_A^i \omega+ p_B^i \left(1 - \omega\right)$. Prices ${p_A^1, \cdots, p_A^k}$ and ${p_B^1, \cdots, p_B^k}$ are saved in the arrays prices_A
and prices_B
. Thus, multiplying them by the scalars omega
and 1 - omega
and adding together the resulting arrays, you will get an array containing ${f^1\left(\omega\right), \cdots, f^k\left(\omega\right)}$.
Then array f_of_omega
can be taken to calculate L_of_omega
, according to the expression $(2)$:
$$\mathcal{L}\left(\omega\right) = \frac{1}{k}\sum_{i=1}^{k}\left(f^i \left(\omega\right) - \overline{f\left (\omega\right)}\right)^2$$
def f_of_omega(omega):
### START CODE HERE ### (~ 1 line of code)
f = prices_A * omega + prices_B * (1 - omega)
### END CODE HERE ###
return f
def L_of_omega(omega):
return 1/len(f_of_omega(omega)) * np.sum((f_of_omega(omega) - np.mean(f_of_omega(omega)))**2)
print("L(omega = 0) =",L_of_omega(0))
print("L(omega = 0.2) =",L_of_omega(0.2))
print("L(omega = 0.8) =",L_of_omega(0.8))
print("L(omega = 1) =",L_of_omega(1))
L(omega = 0) = 110.72
L(omega = 0.2) = 61.1568
L(omega = 0.8) = 11.212797
L(omega = 1) = 27.48
>L(omega = 0) = 110.72
L(omega = 0.2) = 61.1568
L(omega = 0.8) = 11.212797
L(omega = 1) = 27.48
w1_unittest.test_f_of_omega(f_of_omega)
[92m All tests passed
Analysing the output above, you can notice that values of the function $\mathcal{L}$ are decreasing for $\omega$ increasing from $0$ to $0.2$, then to $0.8$, but there is an increase of the function $\mathcal{L}$ when $\omega = 1$. What will be the $\omega$ giving the minimum value of the function $\mathcal{L}$?
In this simple example $\mathcal{L}\left(\omega\right)$ is a function of one variable and the problem of finding its minimum point with a certain accuracy is a trivial task. You just need to calculate function values for each $\omega = 0, 0.001, 0.002, \cdots , 1$ and find minimum element of the resulting array.
Function L_of_omega
will not work if you will pass an array instead of a single value of omega
(it was not designed for that). It is possible to rewrite it in a way that it would be possible, but here there is no need in that right now - you can calculate the resulting values in the loop as there will be not as many of them.
Evaluate function L_of_omega
for each of the elements of the array omega_array
and pass the result into the corresponding element of the array L_array
with the function .at[<index>].set(<value>)
.
Note: jax.numpy
has been uploaded instead of the original NumPy
. Up to this moment jax
functionality has not been actually used, but it will be called in the cells below. Thus there was no need to upload both versions of the package, and you have to use .at[<index>].set(<value>)
function to update the array.
# Parameter endpoint=True will allow ending point 1 to be included in the array.
# This is why it is better to take N = 1001, not N = 1000
N = 1001
omega_array = np.linspace(0, 1, N, endpoint=True)
# This is organised as a function only for grading purposes.
def L_of_omega_array(omega_array):
N = len(omega_array)
L_array = np.zeros(N)
for i in range(N):
### START CODE HERE ### (~ 2 lines of code)
L = L_of_omega(omega_array[i])
L_array = L_array.at[i].set(L)
### END CODE HERE ###
return L_array
L_array = L_of_omega_array(omega_array)
print("L(omega = 0) =",L_array[0])
print("L(omega = 1) =",L_array[N-1])
L(omega = 0) = 110.72
L(omega = 1) = 27.48
>L(omega = 0) = 110.72
L(omega = 1) = 27.48
w1_unittest.test_L_of_omega_array(L_of_omega_array)
[92m All tests passed
Now a minimum point of the function $\mathcal{L}\left(\omega\right)$ can be found with a NumPy
function argmin()
. As there were $N = 1001$ points taken in the segment $\left[0, 1\right]$, the result will be accurate to three decimal places:
i_opt = L_array.argmin()
omega_opt = omega_array[i_opt]
L_opt = L_array[i_opt]
print(f'omega_min = {omega_opt:.3f}\nL_of_omega_min = {L_opt:.7f}')
omega_min = 0.702
L_of_omega_min = 9.2497196
This result means that, based on the historical data, $\omega = 0.702$ is expected to be the most profitable choice for the share between suppliers A and B. It is reasonable to plan $70.2%$ of product P to be supplied from Company A, and $29.8%$ from Company B.
If you would like to improve the accuracy, you just need to increase the number of points N. This is a very simple example of a model with one parameter, resulting in optimization of a function of one variable. It is computationally cheap to evaluate it in many points to find the minimum with certain accuracy. But in machine learning the models have hundreds of parameters, using similar approach you would need to perform millions of target function evaluations. This is not possible in most of the cases, and that’s where Calculus with its methods and approaches comes into play.
In the next weeks of this Course you will learn how to optimize multivariate functions using differentiation. But for now as you are on the learning curve, let’s evaluate the derivative of the function $\mathcal{L}\left(\omega\right)$ at the points saved in the array omega_array
to check that at the minimum point the derivative is actually the closest to zero.
For each $\omega$ in the omega_array
calculate $\frac{d\mathcal{L}}{d\omega}$ using grad()
function from JAX
library. Remember that you need to pass the function which you want to differentiate (here $\mathcal{L}\left(\omega\right)$) as an argument of grad()
function and then evaluate the derivative for the corresponding element of the omega_array
. Then pass the result into the corresponding element of the array dLdOmega_array
with the function .at[<index>].set(<value>)
.
# This is organised as a function only for grading purposes.
def dLdOmega_of_omega_array(omega_array):
N = len(omega_array)
dLdOmega_array = np.zeros(N)
for i in range(N):
### START CODE HERE ### (~ 2 lines of code)
dLdOmega = grad(L_of_omega)(omega_array[i])
dLdOmega_array = dLdOmega_array.at[i].set(dLdOmega)
### END CODE HERE ###
return dLdOmega_array
dLdOmega_array = dLdOmega_of_omega_array(omega_array)
print("dLdOmega(omega = 0) =",dLdOmega_array[0])
print("dLdOmega(omega = 1) =",dLdOmega_array[N-1])
dLdOmega(omega = 0) = -288.96
dLdOmega(omega = 1) = 122.47999
>dLdOmega(omega = 0) = -288.96
dLdOmega(omega = 1) = 122.47999
w1_unittest.test_dLdOmega_of_omega_array(dLdOmega_of_omega_array)
[92m All tests passed
Now to find the closest value of the derivative to $0$, take absolute values $\left|\frac{d\mathcal{L}}{d\omega}\right|$ for each omega and find minimum of them.
i_opt_2 = np.abs(dLdOmega_array).argmin()
omega_opt_2 = omega_array[i_opt_2]
dLdOmega_opt_2 = dLdOmega_array[i_opt_2]
print(f'omega_min = {omega_opt_2:.3f}\ndLdOmega_min = {dLdOmega_opt_2:.7f}')
omega_min = 0.702
dLdOmega_min = -0.1290760
The result is the same: $\omega = 0.702$. Let’s plot $\mathcal{L}\left(\omega\right)$ and $\frac{d\mathcal{L}}{d\omega}$ to visualize the graphs of them, minimum point of the function $\mathcal{L}\left(\omega\right)$ and the point where its derivative is around $0$:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# Setting the axes at the origin.
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
plt.plot(omega_array, L_array, "black", label = "$\mathcal{L}\\left(\omega\\right)$")
plt.plot(omega_array, dLdOmega_array, "orange", label = "$\mathcal{L}\'\\left(\omega\\right)$")
plt.plot([omega_opt, omega_opt_2], [L_opt,dLdOmega_opt_2], 'ro', markersize=3)
plt.legend()
plt.show()
Congratulations, you have finished the assignment for this week! This example illustrates how optimization problems may appear in real life, and gives you an opportunity to explore the simple case of minimizing a function with one variable. Now it is time to learn about optimization of multivariate functions!