Why "unnormalized log probabilities"?

Hi,

Why are the probabilities in the rnn_step_forward function in utils.py called "unnormalized log probabilities"? This is in the 2nd assignment of the 1st week.
Maybe only the second comment, "probabilities for next chars", should be kept?

def rnn_step_forward(parameters, a_prev, x):
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    a_next = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)  # hidden state
    p_t = softmax(np.dot(Wya, a_next) + by)  # unnormalized log probabilities for next chars # probabilities for next chars
    return a_next, p_t

Shouldn’t only the argument of the softmax function be called "unnormalized log probabilities"?
Softmax is defined as follows (for simplicity, I omitted the max(z) subtraction that is included in the original definition in utils.py):

p_{j}=\frac{e^{z_{j}}}{\sum_{i} e^{z_{i}}}
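For concreteness, here is a minimal NumPy sketch of that simplified definition (my own version, without the max(z) stabilization that utils.py adds):

```python
import numpy as np

def softmax(z):
    # simplified softmax: exp(z_j) / sum_i exp(z_i), no max(z) subtraction
    e = np.exp(z)
    return e / np.sum(e)

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p, p.sum())  # the outputs are positive and sum to 1
```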

If we take -log(p_{j}), it equals -z_{j} up to an additive constant, so it is not normalized. Now it makes sense to me to call z_{j} an unnormalized log probability. The expression for -log(p_{j}) is:

-\log p_{j}=-\log \frac{e^{z_{j}}}{\sum_{i} e^{z_{i}}}=-\left(\log e^{z_{j}}-\log \sum_{i} e^{z_{i}}\right)=-\left(z_{j}-\log \sum_{i} e^{z_{i}}\right),

and after subtracting the constant term \log \sum_{i} e^{z_{i}}, the proportionality coefficient is equal to -1:

k=\frac{-\log p_{j}-\log \sum_{i} e^{z_{i}}}{z_{j}}=-1

As an example in Python:

z = np.random.randn(5)
print(-np.log(softmax(z)))

[1.77946694 1.5168839 2.20200317 0.82859276 2.73903499]
which is clearly not normalized, but is proportional to z_j once the constant is subtracted:

k = -((np.log(softmax(z)) + np.log(np.sum(np.exp(z))))/z)
print(k)

[-1. -1. -1. -1. -1.]

Is my reasoning correct?

I agree; since the activation function there is tanh(), "log" does not seem appropriate.

Dear Tom,

Why does the activation function matter?
In principle, we could use any activation function to compute a_next, then multiply by the Wya weights and add by before the softmax, couldn’t we? So softmax would still operate on np.dot(Wya, a_next) + by.

Henrikh

Yes, it would.
But there’s no “log” involved unless you’re using something that has a log function in the cost equation.
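To illustrate that point with a small sketch of my own (not from the assignment): in the cross-entropy cost, the log is applied to the softmax output, and it cancels the exp, leaving exactly the z_y minus log-sum-exp form derived earlier in this thread:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # stabilized, as in utils.py
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 next characters
p = softmax(z)
y = 0                          # index of the true next character
cost = -np.log(p[y])           # cross-entropy: this is where the log appears

# identity from the derivation above: -log p_y = log(sum_i exp(z_i)) - z_y
print(np.isclose(cost, np.log(np.sum(np.exp(z))) - z[y]))  # True
```

So the "log" in "log probabilities" comes from the cost function, not from the tanh() hidden-state update.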

I think I get it now. Thanks!