How does the size of the training data affect the size of the activation function?

Hi, I just completed Lab 1 as part of the first week’s coursework, and I was wondering why the array size of the training data, i.e. “X”, does not affect the size of the activation function a1. Can someone please explain why?

What do you mean by the size of the activation function? The activation function takes an input and produces an output; the size of the data list does not affect it!

Hello @VikJagger,

Could you please provide some screenshots that show how you discovered that the size of X did not impact the size of a1?


Hi, I meant the matrix dimension of a1. How is that not dependent on X’s dimension, especially the size of the training data?

In regression, the sample size “m” affects the cost function. But I didn’t see the sample size affect any of the equations in neural networks.

Hi @rmwkwok, I inferred it (maybe wrongly) from this part of the assignment C2_W1.

X in the assignment has a shape of 1000x400. The shape of W1 is 400x25. So even if the training size were 2000, the shape of W1 wouldn’t change, right?

Yes, it is correct that it won’t change. Do you think this may be a problem? :wink:

I think that’s a very correct and important observation: the sample size SHOULD NOT affect them. Otherwise, we would be saying that “a neural network can only make predictions when supplied with a certain number of samples”, which is nonsense, right? Can you imagine a neural network that can make predictions if we provide it with 5 samples, but not 4, 6, or 10?

Although the sample size does not affect the equations or the shape of any weight matrix, it does affect the size of the output from each layer. You said there is an m in the cost function, but there is also a summation sign in the cost function, right? What does it sum over?
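To make the shapes concrete, here is a minimal NumPy sketch using the dimensions quoted earlier in the thread (X is m x 400, W1 is 400 x 25). The random weights, the sigmoid activation, and the helper name `layer1` are assumptions for illustration only, not the assignment's actual code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer parameters depend only on the number of features and units,
# never on the number of samples m.
n_features, n_units = 400, 25
W1 = 0.01 * rng.normal(size=(n_features, n_units))  # hypothetical weights
b1 = np.zeros(n_units)

def layer1(X):
    # (m, 400) @ (400, 25) -> (m, 25): one row of activations per sample
    return sigmoid(X @ W1 + b1)

shapes = {}
for m in (1000, 2000):
    X = rng.normal(size=(m, n_features))  # stand-in for the training data
    a1 = layer1(X)
    shapes[m] = a1.shape
    print(f"m={m}: X {X.shape}, W1 {W1.shape}, a1 {a1.shape}")
```

The same W1 handles both batches; only the row count of a1 follows m.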


@rmwkwok It sums over the samples and divides by the sample size. So I guess it averages the effect. It makes sense that the sample size should not affect the network. Your response was very insightful. Thanks a lot.
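That averaging can be checked with a small sketch. It assumes a squared-error cost of the form (1/2m) times the sum of squared errors; the function name `squared_error_cost` and the synthetic data are hypothetical, chosen only to show that the cost stays a single number of the same scale whatever m is.

```python
import numpy as np

def squared_error_cost(predictions, targets):
    # Sum over all m samples, then divide by 2m (assumed cost form).
    m = targets.shape[0]
    return np.sum((predictions - targets) ** 2) / (2 * m)

rng = np.random.default_rng(0)

costs = {}
for m in (1000, 2000):
    y = rng.normal(size=m)   # synthetic targets
    y_hat = y + 0.5          # every prediction is off by exactly 0.5
    costs[m] = squared_error_cost(y_hat, y)
    print(f"m={m}: cost = {costs[m]:.3f}")
```

With the same per-sample error, doubling m doubles the sum but also doubles the divisor, so the cost is unchanged.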



You are welcome, Vik!


This seems like an interesting observation. However, it also seems counterintuitive, doesn’t it? Let’s take the example of face recognition again. Wouldn’t a larger number of samples mean the algorithm ends up with a different number of hidden layers? E.g., if your sample set covers only, let’s say, the Latin American population, would this system really identify Asian faces?