Batch Norm at test time: notation question

Hello,

I wanted a little clarification about something.


As seen in the image, Prof. Ng writes the inputs as X^{\{1\}}, X^{\{2\}}, X^{\{3\}}, … and below them he writes \mu^{\{1\}[l]}, \mu^{\{2\}[l]}, \mu^{\{3\}[l]}, …
What I was wondering is:

  1. Did Prof. Ng deliberately skip writing the [l] for the X's, i.e.
    X^{\{1\}[l]}, X^{\{2\}[l]}, X^{\{3\}[l]}, …?
  2. Also, would it be more accurate to replace X with A, i.e.
    A^{\{1\}[l]}, A^{\{2\}[l]}, A^{\{3\}[l]}, …?

Also, can you please verify whether my understanding of layers and mini-batches is correct?
[image: diagram of the full batch split into mini-batches and passed through layer l]

Thanks,
Sushant

Please pay attention to the notation:

  1. m: the number of examples in a mini-batch (the mini-batch size)
  2. \mu^{\{i\}[l]}: the mean computed for the i^{th} mini-batch at the l^{th} layer. Andrew is showing the forward-propagation part of the network for each mini-batch.
  3. X^{\{i\}} is the i^{th} mini-batch, which has shape (number of features, mini-batch size)
  4. Unless a specific context is assumed, the input to a NN is represented by X. Z is the output of the linear step W^{T}X + b, and A refers to the output of the activation function applied to Z.

Your understanding of mini-batch in the last figure is correct. The entire batch X can be split into smaller mini-batches and passed through layer l.
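
To make the indexing concrete, here is a minimal numpy sketch (my own illustration, not course code; the sizes are made up, and I compute the statistics on the pre-activations Z^{[l]}, which is the more common choice Andrew mentions). It splits the full batch X into mini-batches X^{\{i\}} and computes \mu^{\{i\}[l]} for a single layer l:

import numpy as np

np.random.seed(0)

# Made-up sizes: 4 input features, 64 examples, mini-batches of 16,
# and a single layer l with 5 hidden units.
n_x, m_total, batch_size, n_l = 4, 64, 16, 5

X = np.random.randn(n_x, m_total)      # full training set, shape (n_x, m_total)
W = np.random.randn(n_l, n_x) * 0.01   # W^[l], shared by every mini-batch
b = np.zeros((n_l, 1))                 # b^[l]

# Split the columns of X into mini-batches X^{1}, X^{2}, ...
mini_batches = [X[:, k:k + batch_size] for k in range(0, m_total, batch_size)]

for i, X_i in enumerate(mini_batches, start=1):
    Z_i = W @ X_i + b                            # Z^{i}[l], shape (n_l, batch_size)
    mu_i = np.mean(Z_i, axis=1, keepdims=True)   # mu^{i}[l], one mean per hidden unit
    print(f"mini-batch {i}: X shape {X_i.shape}, mu shape {mu_i.shape}")

Each \mu^{\{i\}[l]} is a per-unit mean over that mini-batch's examples; these are the per-mini-batch means that the "Batch Norm at test time" video then combines with an exponentially weighted average to estimate the \mu used at test time.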


Ok. So @balaji.ambresh, can you please tell me whether my statement 1 is correct? Your explanation helps me understand my statement 2 and the figure, but after reading it I am still not sure whether statement 1 is correct or not.

Your understanding is correct in point 1. Mini-batch i serving as input to layer l should be written as X^{\{i\}[l]}.
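
For reference, this is how the two indices combine (my own write-up of the convention, not copied from a slide): the mini-batch index \{i\} travels with the data flowing through the network, while the layer index [l] marks where in the network you are.

% forward propagation of mini-batch i through layer l
Z^{\{i\}[l]} = W^{[l]} A^{\{i\}[l-1]} + b^{[l]}, \qquad
A^{\{i\}[l]} = g^{[l]}\left(Z^{\{i\}[l]}\right), \qquad
A^{\{i\}[0]} = X^{\{i\}}

Note that W^{[l]} and b^{[l]} carry only the layer index, because the same parameters are reused for every mini-batch.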


Thanks a lot @balaji.ambresh! Glad to know that my concepts are in good shape :slightly_smiling_face:

Nitpick: since X is usually associated with the input to the overall NN and not with any particular layer, consider using A^{\{i\}[l]} instead of X^{\{i\}[l]} (see Course 1, Week 4, Assignment 2, L_model_forward):

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
    
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network
    
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A 
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")
        caches.append(cache)
    
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")
    caches.append(cache)
    
    assert(AL.shape == (1,X.shape[1]))
            
    return AL, caches
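
As a quick sanity check, here is a hypothetical usage sketch (the layer sizes are made up, and it assumes initialize_parameters_deep and linear_activation_forward are defined as in that assignment):

import numpy as np

np.random.seed(1)

# Made-up example: 12288 input features (a flattened 64x64x3 image), 10 examples
X = np.random.randn(12288, 10)

# Arbitrary layer sizes; initialize_parameters_deep comes from the same assignment
layer_dims = [12288, 20, 7, 5, 1]
parameters = initialize_parameters_deep(layer_dims)

AL, caches = L_model_forward(X, parameters)
print(AL.shape)     # (1, 10) -- one sigmoid output per example
print(len(caches))  # 4 -- one cache per layer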

Thanks for the information, @balaji.ambresh. It further drives home the point that what I stated in statement 2 is correct.

Thanks a lot again :smile: