Batch Norm at test time: notation question

Hello,

I wanted a little clarification about something.


As seen in the image, Prof. Ng writes the inputs as X^{\{1\}}, X^{\{2\}}, X^{\{3\}}, … and below them he writes \mu^{\{1\}[l]}, \mu^{\{2\}[l]}, \mu^{\{3\}[l]}, …
What I was wondering is:

  1. Did Prof. Ng deliberately skip writing the [l] for the X's, i.e.
    X^{\{1\}[l]}, X^{\{2\}[l]}, X^{\{3\}[l]}, …?
  2. Also, would it be more accurate to replace X with A, i.e.
    A^{\{1\}[l]}, A^{\{2\}[l]}, A^{\{3\}[l]}, …?

Also, can you please verify whether my understanding of layers and mini-batches is correct?
[image: diagram of the full batch split into mini-batches and passed through layer l]

Thanks,
Sushant

Please pay attention to the notation:

  1. m: the number of examples in a mini-batch (the mini-batch size)
  2. \mu^{\{i\}[l]}: the mean computed for the i^{th} mini-batch at the l^{th} layer. Andrew is showing the forward-propagation part of the network for each mini-batch.
  3. X^{\{i\}} is the i^{th} mini-batch, which has shape (number of features, mini-batch size)
  4. Unless a specific context is assumed, the input to a NN is represented by X. Z is the output of the linear step W^{T}X + b, and A refers to the output of the activation function applied to Z.

Your understanding of mini-batch in the last figure is correct. The entire batch X can be split into smaller mini-batches and passed through layer l.
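
To make the indexing concrete, here is a minimal numpy sketch (my own illustration, not course code; the sizes are made up, and I compute the statistics on the pre-activations Z^{[l]}, which is the more common choice Andrew mentions). It splits the full batch X into mini-batches X^{\{i\}} and computes \mu^{\{i\}[l]} for a single layer l:

import numpy as np

np.random.seed(0)

# Made-up sizes: 4 input features, 64 examples, mini-batches of 16,
# and a single layer l with 5 hidden units.
n_x, m_total, batch_size, n_l = 4, 64, 16, 5

X = np.random.randn(n_x, m_total)      # full training set, shape (n_x, m_total)
W = np.random.randn(n_l, n_x) * 0.01   # W^[l], shared by every mini-batch
b = np.zeros((n_l, 1))                 # b^[l]

# Split the columns of X into mini-batches X^{1}, X^{2}, ...
mini_batches = [X[:, k:k + batch_size] for k in range(0, m_total, batch_size)]

for i, X_i in enumerate(mini_batches, start=1):
    Z_i = W @ X_i + b                            # Z^{i}[l], shape (n_l, batch_size)
    mu_i = np.mean(Z_i, axis=1, keepdims=True)   # mu^{i}[l], one mean per hidden unit
    print(f"mini-batch {i}: X shape {X_i.shape}, mu shape {mu_i.shape}")

Each \mu^{\{i\}[l]} is a per-unit mean over that mini-batch's examples; these are the per-mini-batch means that the "Batch Norm at test time" video then combines with an exponentially weighted average to estimate the \mu used at test time.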


Ok. So @balaji.ambresh, can you please tell me whether my statement 1 is correct? Your explanation helps me understand my statement 2 and the figure, but after reading it I am still not sure whether statement 1 is correct or not.

Your understanding is correct in point 1. Mini-batch i serving as input to layer l should be written as X^{\{i\}[l]}.
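
For reference, this is how the two indices combine (my own write-up of the convention, not copied from a slide): the mini-batch index \{i\} travels with the data flowing through the network, while the layer index [l] marks where in the network you are.

% forward propagation of mini-batch i through layer l
Z^{\{i\}[l]} = W^{[l]} A^{\{i\}[l-1]} + b^{[l]}, \qquad
A^{\{i\}[l]} = g^{[l]}\left(Z^{\{i\}[l]}\right), \qquad
A^{\{i\}[0]} = X^{\{i\}}

Note that W^{[l]} and b^{[l]} carry only the layer index, because the same parameters are reused for every mini-batch.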


Thanks a lot @balaji.ambresh! Glad to know that my concepts are in good shape :slightly_smiling_face:

Nitpick: since X is usually associated with the input to the overall NN and not with any particular layer, consider using A^{\{i\}[l]} instead of X^{\{i\}[l]} (see Course 1, Week 4, Assignment 2, L_model_forward):

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation
    
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
    
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2                  # number of layers in the neural network
    
    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A 
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation = "relu")
        caches.append(cache)
    
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation = "sigmoid")
    caches.append(cache)
    
    assert(AL.shape == (1,X.shape[1]))
            
    return AL, caches
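
As a quick sanity check, here is a hypothetical usage sketch (the layer sizes are made up, and it assumes initialize_parameters_deep and linear_activation_forward are defined as in that assignment):

import numpy as np

np.random.seed(1)

# Made-up example: 12288 input features (a flattened 64x64x3 image), 10 examples
X = np.random.randn(12288, 10)

# Arbitrary layer sizes; initialize_parameters_deep comes from the same assignment
layer_dims = [12288, 20, 7, 5, 1]
parameters = initialize_parameters_deep(layer_dims)

AL, caches = L_model_forward(X, parameters)
print(AL.shape)     # (1, 10) -- one sigmoid output per example
print(len(caches))  # 4 -- one cache per layer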

Thanks for the information, @balaji.ambresh. It further drives home the point that what I stated in statement 2 is correct.

Thanks a lot again :smile: