Programming Assignment - Choice of Loss Function

I’m completing the “Math for ML and DS” and “ML” Specializations in parallel. I know it’s beyond the scope of the initial material, but I have been curious about how loss functions are chosen.

  • Variance vs Other Methods: in the Programming Assignment, why did we use the variance of our model instead of other loss functions such as the mean squared error (MSE), mean absolute error (MAE), etc.? Was that for simplicity, or is using the variance a common approach in ML?
  • General Approach: to ML practitioners, how do you usually choose your loss functions? My guess is that you first try common loss functions and only rarely (10% of the time or less) create custom loss functions when needed.

Thank you!

I’m not a mentor for this course, but in general:

Loss functions are chosen based on the type of problem you’re working on.

  • If you're predicting a real value, you'd typically use the linear regression (squared-error) cost function.
  • If it's a true/false (binary) classification, you'd use the logistic regression cost function (log loss).
  • If there are multiple classes, or the data set is complex, you'd typically use a neural network whose dense output layer applies softmax, trained with a (sparse) categorical cross-entropy loss.
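For concreteness, here is a minimal NumPy sketch of the three losses named in the list above. The function names are my own, not from any course library, and inputs are assumed to be NumPy arrays. Note that the ML Specialization's linear-regression cost divides by 2m rather than m; the extra factor of 1/2 doesn't change where the minimum is.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Squared-error cost (as in linear regression), averaged over examples."""
    return np.mean((y_pred - y_true) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Logistic-regression (log) loss; p_pred holds predicted P(y = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sparse_categorical_cross_entropy(labels, logits):
    """Softmax followed by cross-entropy, with integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each example
    return -np.mean(log_probs[np.arange(len(labels)), labels])

print(mse_loss(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5])))
print(sparse_categorical_cross_entropy(np.array([0, 2]), np.zeros((2, 3))))
```

In practice you'd use the framework's built-ins instead, e.g. `tf.keras.losses.MeanSquaredError`, `BinaryCrossentropy`, and `SparseCategoricalCrossentropy` in TensorFlow.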

Thank you for sharing that perspective. I also found this article helpful: https://builtin.com/machine-learning/common-loss-functions

I'd still appreciate it if any instructors could chime in: why did we use variance (instead of MSE), and what is your process for choosing loss functions? For example, I'm curious whether there are "go-to" loss functions that you always use, or whether it's more of a custom, experimental approach each time.

Personally, I've never used variance as a loss, so I don't have an opinion on that.

The loss function is generally based on the problem you’re trying to solve, as I mentioned.

The ML arts are mature enough that new loss functions aren’t really needed, unless you’re solving a unique new type of problem.

Hey @farmerinatechstack,
I just completed the assignment myself, and indeed, the choice of loss function makes many learners curious. You might be wondering about two things in particular:

  1. Why are we minimising the variance (as the assignment does) instead of the average?
  2. Why aren’t we using any other loss functions, such as logistic loss, MSE, MAE, etc?

Let me answer the second question first. To compute the logistic loss, MSE, MAE, etc., we need target values to compare our predictions against, i.e., the true value of w or the true total cost for each data point in our dataset of past prices. We have neither, so we can't formulate the loss function as MAE, MSE, etc. Feel free to think this through on your own.
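One way to see why variance still works without labels: the (population) variance is just the MSE of the values against their own mean, so the "target" comes from the data itself. A minimal sketch, assuming the assignment's L(w) averages the squared deviations of f(w) from its mean over the number of months; the prices below are made up for illustration.

```python
import numpy as np

def mse(target, pred):
    """Mean squared error between predictions and a target (scalar or array)."""
    return np.mean((np.asarray(pred) - target) ** 2)

def variance_loss(values):
    """Population variance, written as the MSE of the values against their own mean."""
    return mse(np.mean(values), values)

# Hypothetical monthly prices of the supplier mix f(w) for some fixed w
f_values = np.array([1.0, 2.0, 3.0, 4.0])
print(variance_loss(f_values), np.var(f_values))  # both 1.25
```

So minimising the variance is "MSE-like", but it measures how far each month strays from the typical month rather than from an external ground truth.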


Now, let's come to the first question: why not minimise the average? The resource mentioned in the assignment states that "Minimising the variance" is a popular strategy that helps to maximize returns and minimize risks. So let me give my 2 cents on why I think minimising the variance could be better than minimising the mean.
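For two suppliers, the variance objective even has a closed-form minimiser, the classic two-asset minimum-variance result (my addition, not something the assignment derives). Expanding Var(wA + (1 - w)B) = w²·Var(A) + (1 - w)²·Var(B) + 2w(1 - w)·Cov(A, B) and setting the derivative to zero gives w* = (Var(B) - Cov(A, B)) / (Var(A) + Var(B) - 2·Cov(A, B)). The sketch below checks this against a grid search like the assignment's, using randomly generated hypothetical prices rather than the assignment data.

```python
import numpy as np

def min_variance_weight(a, b):
    """Closed-form w minimising Var(w*a + (1-w)*b), using population (ddof=0) moments."""
    var_a, var_b = np.var(a), np.var(b)
    cov_ab = np.mean((a - a.mean()) * (b - b.mean()))
    return (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)

rng = np.random.default_rng(0)
a = 100 + rng.normal(0, 5, size=60)   # hypothetical supplier A prices (less volatile)
b = 100 + rng.normal(0, 10, size=60)  # hypothetical supplier B prices (more volatile)

# Grid search over w in [0, 1], as in the assignment
w_grid = np.linspace(0, 1, 1001)
variances = np.array([np.var(w * a + (1 - w) * b) for w in w_grid])
w_grid_best = w_grid[variances.argmin()]

# If w* falls outside [0, 1], the constrained optimum is the nearest endpoint
w_closed = np.clip(min_variance_weight(a, b), 0.0, 1.0)
print(f"grid search: {w_grid_best:.3f}, closed form: {w_closed:.3f}")
```

Intuitively, the more volatile supplier gets the smaller weight, and a positive covariance between the two price series reduces the benefit of mixing.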

For this, I will use a little extra code. First, let's find the w that minimises the mean of the combined past prices. The code for this is as follows:

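### MY CODE
# Uses the notebook's f_of_omega and omega_array.
def avg_of_omega(omega):
    # Average of the combined monthly prices f(omega) over all past months
    return np.mean(f_of_omega(omega))

def avg_of_omega_array(omega_array):
    N = len(omega_array)
    # Note: .at[i].set is JAX's functional update, so avg_array must be a JAX array
    # (a plain NumPy array has no .at attribute)
    avg_array = np.zeros(N)

    for i in range(N):
        # Evaluate the average for each candidate omega and store it at index i
        L = avg_of_omega(omega_array[i])
        avg_array = avg_array.at[i].set(L)

    return avg_array

avg_array = avg_of_omega_array(omega_array)
i_opt = avg_array.argmin()        # index of the omega with the lowest average price
omega_opt = omega_array[i_opt]
avg_opt = avg_array[i_opt]
print(f'omega_min = {omega_opt:.3f}\nL_of_omega_min = {avg_opt:.7f}')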

It will produce the following output:

omega_min = 0.000
L_of_omega_min = 100.0000000
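The result w = 0 is not a coincidence: mean(f(w)) = w·mean(A) + (1 - w)·mean(B) is a straight line in w, so its minimum over [0, 1] always sits at an endpoint (w = 0 if B is cheaper on average, w = 1 if A is). Since omega_min came out as 0.000, supplier B must be the cheaper one on average in the assignment data. The sketch below shows this on hypothetical prices and, as a side note, replaces the Python loop with NumPy broadcasting; `mean_cost_grid` is a name I made up.

```python
import numpy as np

def mean_cost_grid(w_grid, a, b):
    """Average of f(w) = w*a + (1-w)*b over all months, for every w in w_grid at once."""
    # Broadcasting: row i holds f(w_i) for each month; then average across months.
    f = w_grid[:, None] * a[None, :] + (1 - w_grid[:, None]) * b[None, :]
    return f.mean(axis=1)

# Hypothetical prices (not the assignment data): B is cheaper on average than A.
a = np.array([110.0, 120.0, 130.0])
b = np.array([100.0, 95.0, 105.0])
w_grid = np.linspace(0, 1, 11)

avg = mean_cost_grid(w_grid, a, b)
print(np.diff(avg))          # equal steps: the average is linear in w
print(w_grid[avg.argmin()])  # 0.0, the endpoint of the cheaper supplier
```

So "minimising the mean" simply means "buy everything from the supplier that was cheaper on average", which ignores how much each supplier's price swings from month to month.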

Now, say we need to estimate the price of the product in the future from suppliers A and B. The simplest answer is to take the average of each supplier's past prices. So, let's compute those. The code for that is as follows:

# The historical average is a simple point estimate of future prices
prices_A_avg = np.mean(prices_A)
prices_B_avg = np.mean(prices_B)

Now, say we are preparing an annual budget (as per the problem statement). In it, we can't write 12 different prices for the 12 months; we have to include a single cost for procuring n units of the product each month. So, once again, what should this cost be? One simple answer is the average procurement cost in the past. So, for both values of w (0 from minimising the mean, 0.702 from minimising the variance), we will compute the avg_price and the relative deviation of each month's actual price from it. The code for this is as follows:

## If we use the minimisation of mean strategy (w = 0), deviation in the price (per month)
w = 0
avg_price = w * prices_A_avg + (1 - w) * prices_B_avg  # budgeted unit price
true_prices = w * prices_A + (1 - w) * prices_B        # actual unit price in each past month
dev_prices = np.abs(true_prices - avg_price) / avg_price
print(f"Maximum Deviation (in the case of minimisation of mean): {np.max(dev_prices):.4f}")

## If we use the minimisation of variance strategy (w = 0.702), deviation in the price (per month)
w = 0.702
avg_price = w * prices_A_avg + (1 - w) * prices_B_avg
true_prices = w * prices_A + (1 - w) * prices_B
dev_prices = np.abs(true_prices - avg_price) / avg_price
print(f"Maximum Deviation (in the case of minimisation of variance): {np.max(dev_prices):.4f}")

The output is as follows:

Maximum Deviation (in the case of minimisation of mean): 0.2400
Maximum Deviation (in the case of minimisation of variance): 0.0722

The interesting thing to note here is that if we keep w = 0, the monthly cost of procurement may deviate by up to 24% from what we have included in our budget, which seems like a huge risk, and the amount at risk grows as the cost of procurement grows. On the other hand, if we keep w = 0.702, the cost of procurement may deviate by only up to 7.22% from the budgeted amount, i.e., roughly a third of the risk (0.24 / 0.0722 ≈ 3.3). The trade-off is that we may end up paying more money to procure the same amount of product over a period of time, say a year, since w = 0 is the choice with the lowest average price. But the advantage is that the cost of procurement fluctuates less from month to month, and hence it could be easier to manage with the monthly funds the company gets as per the annual budget.
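The same comparison can be wrapped in a small helper to see the trade-off across several weights at once. This is a hypothetical sketch on made-up prices (supplier A pricier but stable, supplier B cheaper but volatile), not the assignment data; `budget_stats` is my own name.

```python
import numpy as np

def budget_stats(w, a, b):
    """Return (budgeted unit price, worst-case relative deviation) for weight w."""
    budget = w * np.mean(a) + (1 - w) * np.mean(b)  # single budgeted price per unit
    actual = w * a + (1 - w) * b                    # what each month would have cost
    return budget, np.max(np.abs(actual - budget)) / budget

# Hypothetical prices: A is pricier but stable, B is cheaper but volatile.
prices_A_demo = np.array([118.0, 122.0, 119.0, 121.0])
prices_B_demo = np.array([80.0, 120.0, 90.0, 110.0])

for w in (0.0, 0.5, 1.0):
    budget, dev = budget_stats(w, prices_A_demo, prices_B_demo)
    print(f"w = {w:.1f}: budget per unit = {budget:.2f}, max deviation = {dev:.4f}")
# w = 0.0: budget per unit = 100.00, max deviation = 0.2000
# w = 0.5: budget per unit = 110.00, max deviation = 0.1000
# w = 1.0: budget per unit = 120.00, max deviation = 0.0167
```

Keep in mind that variance and worst-case deviation are different risk measures: the minimum-variance weight usually lowers the maximum deviation too, as in the results above, but it isn't guaranteed to minimise it.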

Let me know if this helps.

Cheers,
Elemento
