# Batch Norm at test time doubt

Hello,

I wanted a little clarification about something.

As seen in the image, Prof. Ng writes the X's as X^{\{1\}}, X^{\{2\}}, X^{\{3\}}, …, and below them he writes μ^{\{1\}[l]}, μ^{\{2\}[l]}, μ^{\{3\}[l]}, …
What I was wondering is:

1. Did Prof. Ng deliberately skip writing [l] for the X's? That is, should they be
X^{\{1\}[l]}, X^{\{2\}[l]}, X^{\{3\}[l]}, …?
2. Also, would it be more accurate to replace X with A, i.e.,
A^{\{1\}[l]}, A^{\{2\}[l]}, A^{\{3\}[l]}, …?

Also, can you please verify whether my understanding of layers and mini-batches is correct?

Thanks,
Sushant

Please pay attention to the notation:

1. m: batch size
2. \mu^{\{i\}[l]}: mean of the i^{th} mini-batch obtained at the l^{th} layer. Andrew is showing the forward-propagation part of the network for each mini-batch.
3. X^{\{i\}} is the i^{th} mini-batch which is of shape (# features per row, mini-batch size)
4. Unless a specific context is assumed, the input to a NN is represented by X. Z is the output of the linear step W^{T} X. A refers to the output of the activation function applied to Z.
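To illustrate this notation, here is a minimal sketch (all shapes are made up for the example) of the linear step and activation for a single mini-batch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))   # one mini-batch: (# features per row, mini-batch size)
W = rng.standard_normal((2, 3))   # layer weights (hypothetical sizes)
b = np.zeros((2, 1))

Z = W @ X + b                     # linear output Z
A = np.maximum(0, Z)              # A: activation function (here ReLU) applied to Z
```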

Your understanding of mini-batch in the last figure is correct. The entire batch X can be split into smaller mini-batches and passed through layer l.
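To make the split concrete, a minimal sketch (all sizes hypothetical) that cuts X into mini-batches X^{\{1\}}, X^{\{2\}}, … along the example axis and computes the per-mini-batch mean μ^{\{i\}[l]} at a layer, as in batch norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m_total, m_batch = 3, 12, 4                 # hypothetical sizes

X = rng.standard_normal((n_x, m_total))          # full training batch
# X^{1}, X^{2}, ... : split along the example (column) axis
minibatches = [X[:, i:i + m_batch] for i in range(0, m_total, m_batch)]

W = rng.standard_normal((5, n_x))                # layer-l weights (hypothetical)
b = np.zeros((5, 1))

for Xi in minibatches:
    Zi = W @ Xi + b                              # pre-activation at layer l
    mu = Zi.mean(axis=1, keepdims=True)          # mu^{i}[l]: mean over mini-batch i
    var = Zi.var(axis=1, keepdims=True)
    Zi_norm = (Zi - mu) / np.sqrt(var + 1e-8)    # batch-norm normalization step
```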


Ok. So @balaji.ambresh, can you please tell me whether statement 1 is correct? Your explanation helps me understand statement 2 and the figure, but after reading it I am still not sure about statement 1.

Your understanding is correct in point 1. Mini-batch i serving as input to layer l should be written as X^{\{i\}[l]}.


Thanks a lot @balaji.ambresh! Glad to know that my concepts are in good shape.

Nitpick: Since X is usually associated with input to the overall NN and not to any particular layer, consider using A^{\{i\}[l]} instead of X^{\{i\}[l]} (see course 1 week 4 assignment 2 L_model_forward)

```python
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation="relu")
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation="sigmoid")
    caches.append(cache)

    assert AL.shape == (1, X.shape[1])

    return AL, caches
```
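For context, here is a self-contained sketch of how a forward pass like this runs end to end. The helper and the parameter shapes below are hypothetical simplified stand-ins for the course's versions, just to show the shapes flowing through (L-1) ReLU layers followed by a sigmoid output:

```python
import numpy as np

# Hypothetical minimal version of the course helper linear_activation_forward.
def linear_activation_forward(A_prev, W, b, activation):
    Z = W @ A_prev + b
    if activation == "relu":
        A = np.maximum(0, Z)
    else:  # "sigmoid"
        A = 1.0 / (1.0 + np.exp(-Z))
    return A, (A_prev, W, b, Z)

# Same control flow as L_model_forward: (L-1) ReLU layers, then one sigmoid layer.
def forward_pass(X, parameters):
    caches, A = [], X
    L = len(parameters) // 2                  # number of layers
    for l in range(1, L):
        A, cache = linear_activation_forward(A, parameters['W' + str(l)],
                                             parameters['b' + str(l)], "relu")
        caches.append(cache)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)],
                                          parameters['b' + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches

rng = np.random.default_rng(0)
# Hypothetical 2-layer net: 3 input features -> 4 hidden units -> 1 output.
parameters = {
    "W1": rng.standard_normal((4, 3)) * 0.01, "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.01, "b2": np.zeros((1, 1)),
}
X = rng.standard_normal((3, 5))               # (input size, number of examples)
AL, caches = forward_pass(X, parameters)
print(AL.shape)                               # (1, 5): one sigmoid output per example
```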


Thanks for the information @balaji.ambresh. It further confirms that my statement 2 is correct.

Thanks a lot again