I am not an expert at differentiating the log loss. Could someone help me understand why this is the answer?

J = \ln \left(\frac{e^{\theta_1}}{\sum_{i=0}^K e^{\theta_i}}\right)

Let’s take K=3 for now (which means 3 units only)

\frac{\partial J}{\partial \theta_1} = 1 - \frac{e^{\theta_1}}{\sum_{i=0}^K e^{\theta_i}}

Check this online

Let n(\theta_1) = e^{\theta_1} and d(\theta_i) = \sum_{i=0}^K e^{\theta_i}, and finally f(\theta_i) = n(\theta_1) / d(\theta_i), such that J(\theta_i) = \ln f(\theta_i). By the chain rule, we have

\frac{\partial J}{\partial \theta_1} = \frac{1}{f(\theta_i)} \frac{\partial f(\theta_i)}{\partial \theta_1}.

Next, you need the following rule to differentiate quotients:

\frac{\partial f(\theta_i)}{\partial \theta_1} = \frac{1}{d(\theta_i)^2} \left[ d(\theta_i) \frac{\partial n(\theta_1)}{\partial \theta_1} - n(\theta_1) \frac{\partial d(\theta_i)}{\partial \theta_1} \right].

This is easy to compute, since

\frac{\partial n(\theta_1)}{\partial \theta_1} = e^{\theta_1} = \frac{\partial d(\theta_i)}{\partial \theta_1}

whence

\frac{\partial f(\theta_i)}{\partial \theta_1} = \frac{e^{\theta_1}}{d(\theta_i)^2} \left[ d(\theta_i) - n(\theta_1) \right] = \frac{n(\theta_1)}{d(\theta_i)} \left[ 1- \frac{e^{\theta_1}}{d(\theta_i)} \right].

At the end of the day,

\frac{\partial J}{\partial \theta_1} = \frac{d(\theta_i)}{n(\theta_1)} \frac{\partial f(\theta_i)}{\partial \theta_1} = 1- \frac{e^{\theta_1}}{d(\theta_i)}

as you set out to prove.
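
If you want to convince yourself numerically, here is a minimal sketch that compares the closed form above against a central finite difference. The function names, the test values, and the step size h are my own choices for illustration, and I am following the sum as written (i = 0..K with K = 3, so four values of theta).

```python
import math

def J(theta, k=1):
    """J = ln(e^{theta_k} / sum_i e^{theta_i}), computed directly from the definition."""
    total = sum(math.exp(t) for t in theta)
    return math.log(math.exp(theta[k]) / total)

def analytic_grad(theta, k=1):
    """Closed form derived above: 1 - e^{theta_k} / sum_i e^{theta_i}."""
    total = sum(math.exp(t) for t in theta)
    return 1.0 - math.exp(theta[k]) / total

def numeric_grad(theta, k=1, h=1e-6):
    """Central finite difference of J with respect to theta_k (h is an assumed step size)."""
    up = list(theta)
    up[k] += h
    down = list(theta)
    down[k] -= h
    return (J(up, k) - J(down, k)) / (2 * h)

# Arbitrary illustrative values for theta_0..theta_3 (not from the question).
theta = [0.3, -1.2, 0.8, 2.0]
print(analytic_grad(theta), numeric_grad(theta))
```

The two printed numbers should agree to several decimal places; if they do not, the usual culprit is a step size that is too large or too small.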

There are different ways to compute the same derivative. For example, you can write J as \ln n(\theta_1) - \ln d(\theta_i) using the quotient property of the logarithm, \ln(a/b) = \ln a - \ln b, and then notice that \ln n(\theta_1) = \theta_1, which has derivative 1. If you are happy using this property of \ln, it makes the computation more streamlined.
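
Spelled out, this shorter route looks as follows. It is only a sketch using the same n and d as above, and the only extra ingredient is the chain rule applied to \ln d(\theta_i).

```latex
J = \ln n(\theta_1) - \ln d(\theta_i) = \theta_1 - \ln \sum_{i=0}^K e^{\theta_i}

\frac{\partial J}{\partial \theta_1}
  = 1 - \frac{1}{d(\theta_i)} \frac{\partial d(\theta_i)}{\partial \theta_1}
  = 1 - \frac{e^{\theta_1}}{\sum_{i=0}^K e^{\theta_i}}
```

The second term is simply the softmax probability of unit 1, so the result matches the quotient-rule computation.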