Hello,
I wanted a little clarification about something.
As seen in the image, Prof. Ng writes the inputs as X^{\{1\}}, X^{\{2\}}, X^{\{3\}}, … and below he writes \mu^{\{1\}[l]}, \mu^{\{2\}[l]}, \mu^{\{3\}[l]}, …
What I was wondering is:
- Did Prof. Ng deliberately skip writing [l] for the X's, i.e.,
X^{\{1\}[l]}, X^{\{2\}[l]}, X^{\{3\}[l]}, …?
- Also, is it more accurate to replace X by A, i.e.,
A^{\{1\}[l]}, A^{\{2\}[l]}, A^{\{3\}[l]}, …?
Also, can you please verify if my understanding of layers and minibatches is correct?

Thanks,
Sushant
Please pay attention to the notation:
- m: batch size
- \mu^{\{i\}[l]}: mean of the i^{th} mini-batch obtained at the l^{th} layer. Andrew is showing the forward-propagation part of the network for each mini-batch.
- X^{\{i\}}: the i^{th} mini-batch, which has shape (# features per row, mini-batch size)
- Unless a specific context is assumed, the input to a NN is represented by X. Z is the output of W^{T}X, and A refers to the output of the activation function applied to Z.
Your understanding of the mini-batch in the last figure is correct. The entire batch X can be split into smaller mini-batches and passed through layer l, as in the sketch below.
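To make the shapes concrete, here is a minimal NumPy sketch (not course code) that splits a full batch X into mini-batches X^{\{i\}} of shape (# features, mini-batch size) and, for each one, computes a layer-l linear output and its per-unit mean, i.e. the \mu^{\{i\}[l]} quantity above. All sizes, the relu helper, and the single-layer setup are illustrative assumptions.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# Illustrative sizes -- these numbers are assumptions, not from the figure.
n_x = 4        # features per example
m = 12         # full batch size
mb_size = 4    # mini-batch size
n_l = 3        # number of units in layer l (assumed)

rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))      # full batch X, shape (n_x, m)
W = rng.standard_normal((n_l, n_x))    # layer-l weights (single layer, for brevity)
b = np.zeros((n_l, 1))

# Split X column-wise into mini-batches X^{i}, each of shape (n_x, mb_size)
mini_batches = [X[:, t:t + mb_size] for t in range(0, m, mb_size)]

for i, X_i in enumerate(mini_batches, start=1):
    Z = W @ X_i + b                          # linear output of layer l for mini-batch i
    mu = np.mean(Z, axis=1, keepdims=True)   # mu^{{i}[l]}: per-unit mean over mini-batch i
    A = relu(Z)                              # A^{{i}[l]}: activation output for mini-batch i
    print("mini-batch", i, "X shape:", X_i.shape, "mean shape:", mu.shape)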
Ok. So @balaji.ambresh, can you please tell me whether statement 1 is correct? Your explanation helps me understand statement 2 and the figure, but after reading it I am still not sure whether statement 1 is correct or not.
Your understanding is correct in point 1. Mini-batch i serving as input to layer l should be written as X^{\{i\}[l]}.
Thanks a lot @balaji.ambresh! Glad to know that my concepts are in good shape.
Nitpick: since X is usually associated with the input to the overall NN and not with any particular layer, consider using A^{\{i\}[l]} instead of X^{\{i\}[l]} (see L_model_forward in course 1, week 4, assignment 2):
def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
              every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
              the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation="relu")
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation="sigmoid")
    caches.append(cache)

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
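For context, a rough usage sketch: initialize_parameters_deep and linear_activation_forward are assumed to be defined as in the assignment, and the layer sizes below are made up for illustration.

import numpy as np

np.random.seed(1)
X = np.random.randn(5, 8)                            # (input size, number of examples)
parameters = initialize_parameters_deep([5, 4, 1])   # assignment helper; layer sizes are illustrative
AL, caches = L_model_forward(X, parameters)
print(AL.shape)                                      # expect (1, 8): one sigmoid output per example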
Thanks for the information @balaji.ambresh. It further drives home the point that what I stated in statement 2 is correct.
Thanks a lot again!