Dimensions of activation function


Video: neural network layer

I’m looking at this video, and trying to figure out the dimension of the activation function.

X: (4, 1)
w: (1, 1)
z = Xw + b: (4, 1)(1, 1) + b = (4, 1)
a = g(z): (4, 1)

But the video shows the activation as a scalar? Where did the four dimensions go? Is the activation shown here a sum over the samples of X?
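For concreteness, here is a small numpy sketch of the shapes I have in mind (the (4, 1) and (1, 1) shapes are just my reading of the video, and sigmoid is only a stand-in for whatever g is):

```python
import numpy as np

X = np.random.randn(4, 1)    # 4 samples, 1 feature
w = np.random.randn(1, 1)    # single weight
b = 0.5                      # scalar bias

z = X @ w + b                # (4, 1) @ (1, 1) + b -> (4, 1)
a = 1 / (1 + np.exp(-z))     # elementwise activation, still (4, 1)

print(z.shape, a.shape)      # (4, 1) (4, 1) -- not a scalar, hence my confusion
```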

Thanks,
Steven

Edit:

It does occur to me that it is possible this is addressed in later videos. But right now, my best guess is that the number of samples is always equal to the number of features in any neuron. In the language of statistics, the number of degrees of freedom of the fit is one greater than the number of data points at every neuron (because of b). It seems like it would be hard to add or remove data with that choice, and that everything would be overfitting, so I’m a bit confused, but maybe this will be clarified later on.

Considering philosophy more than I would really like to for the purposes of math, I think this is because ML is not statistical, but rather is seeking complicated deterministic patterns. So, for a data set X with m samples and n features, we have

X: (m, n)
w_j^{[1]}: (1, m)
z_j^{[1]} = w_j^{[1]} X + b: (1, n)
a_j^{[1]} = g(z_j^{[1]}): (1, n)

At this point there are N_L neurons in the layer (keeping the layer width arbitrary):
a^{[1]}: (N_L, n)

To iterate to the next layer:
w_j^{[L+1]}: (1, N_L)
z_j^{[L+1]}: (1, n)
a_j^{[L+1]}: (1, n)
a^{[L+1]}: (N_{L+1}, n)
and if the last layer has a single neuron (N_L = 1), then a: (1, n), which can match y_{out}: (1, n), maybe.

That’s my guess for how this works so that the dimensions match even with n features in the original data as well as m samples. It is weird that w_j^{[1]} is dotted with the m samples rather than with the n features, but that is my best guess as to how it works.
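A small numpy sketch of the shape bookkeeping I am guessing at (the sizes are made up, and this whole convention is only my guess, not something from the course):

```python
import numpy as np

m, n = 5, 3        # m samples, n features (made-up sizes)
N_1, N_2 = 4, 1    # units in layer 1 and layer 2 (also made up)

X = np.random.randn(m, n)

# Layer 1: each w_j is (1, m), dotted with the samples as I guessed above
W1 = np.random.randn(N_1, m)
b1 = np.random.randn(N_1, 1)
A1 = np.tanh(W1 @ X + b1)     # (N_1, m) @ (m, n) -> (N_1, n)

# Layer 2: each w_j is (1, N_1)
W2 = np.random.randn(N_2, N_1)
b2 = np.random.randn(N_2, 1)
A2 = np.tanh(W2 @ A1 + b2)    # (N_2, N_1) @ (N_1, n) -> (N_2, n)

print(A1.shape, A2.shape)     # (4, 3) (1, 3)
```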

I’m still only just about to start this video.

Next video

I don’t see the double exponent it’s mentioning. The numbers after the colons are matrix dimensions.

Does anyone see what’s wrong here?

The videos do a really bad job of consistently addressing the situations where there are multiple features.

The best way to think of it is that, for each example, each layer’s output is the product of the previous layer’s output (a 1D vector) and a 2D weight matrix. Then you apply some activation function. The result is another 1D vector.
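In numpy, the per-example version looks roughly like this (the sizes and the ReLU are just for illustration):

```python
import numpy as np

n, k = 3, 4                   # n input features, k units in the layer
a_prev = np.random.randn(n)   # output of the previous layer (1D vector)
W = np.random.randn(n, k)     # 2D weight matrix for this layer

z = a_prev @ W                # (n,) @ (n, k) -> (k,)
a = np.maximum(0, z)          # some activation function, e.g. ReLU

print(a.shape)                # (4,) -- another 1D vector
```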

If you expand this to do batch processing on all of the examples (as an input matrix X of size (m x n)), you get this dimensional analysis for each layer L:

Let ‘n’ be the number of input features, and ‘k’ be the number of units in the next hidden (or output) layer. * represents a dot product. g() represents an activation function.

L = g(X * W)

where:
(m x n) * (n x k) = (m x k)
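A quick numpy sketch of that, with made-up sizes (sigmoid standing in for g):

```python
import numpy as np

m, n, k = 5, 3, 4            # m examples, n features, k units
X = np.random.randn(m, n)    # all examples at once
W = np.random.randn(n, k)

def g(z):                    # any elementwise activation, e.g. sigmoid
    return 1 / (1 + np.exp(-z))

L_out = g(X @ W)             # (m, n) @ (n, k) -> (m, k)
print(L_out.shape)           # (5, 4)
```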


Here I have neglected the bias weight; it is a scalar value for each unit, added prior to applying the activation.
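If you add it back in, it just broadcasts across the examples. Roughly, with the same made-up sizes as the sketch above:

```python
import numpy as np

m, n, k = 5, 3, 4
X = np.random.randn(m, n)
W = np.random.randn(n, k)
b = np.random.randn(1, k)               # one bias value per unit

out = 1 / (1 + np.exp(-(X @ W + b)))    # bias broadcasts: (m, k) + (1, k) -> (m, k)
print(out.shape)                        # (5, 4)
```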


I think there was a bug in your LaTeX. I did a single character edit that I think is what you intended, but please check me on that. If you haven’t tried that before, you should be able to view the edit history by clicking the little orange pencil icon in the upper right.

IDK. I thought it was right, but thank you for making it display correctly. LaTeX can be finicky like that. It always took a long time to get things to stay on the right page and for figures to be a pleasant size when I was writing drafts of papers in the past. I certainly am aware that I don’t always write it correctly on the first try though. Is there a way to see a preview before submitting the post?

When I am creating a post, it shows me a rendering of it on the right side of the screen, as in the attached screenshot:

\sin^2\theta + \cos^2\theta = 1

Are you not seeing that?

Not every time, no. I see it when I edit but I don’t see it when I post. Maybe there’s a popup over it or something?

Yeah, it’s a popup. If I close the popup, I can see it. Thanks 🙂
