I am learning about vectorization in week 2 of the Neural Networks and Deep Learning course, and I'm curious about when Professor Ng uses superscripts versus subscripts; I just wanted some clarity. I understand that the feature vector he is using is an “unrolled” image of size 12288 or something like that. When he writes out his formula, he says that there are w1, x1, w2, x2, etc., each with a superscript (i). Do w1 and w2 represent individual pixels, and does the superscript (i) represent each feature vector? If so, does that mean the deep learning algorithm is optimizing 12288 coefficients? That seems like way too many, even for a computer. How am I thinking about this wrong, or am I just underestimating what these algorithms can do?
In general, superscripts identify one object out of many (a layer in the network, or one particular input vector or weight vector), while subscripts identify one element within a larger object (one element of a weight vector or of a sample vector).
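To make that concrete with the formula you're describing, the logistic regression prediction for the $i$-th training example looks roughly like

$$z^{(i)} = w^T x^{(i)} + b = w_1 x_1^{(i)} + w_2 x_2^{(i)} + \dots + w_{n_x} x_{n_x}^{(i)} + b$$

where $n_x = 64 \times 64 \times 3 = 12288$ for these images. The superscript $(i)$ says “this is the $i$-th training example,” and the subscripts $1, \dots, n_x$ pick out individual features (pixel color values) within that example, along with the matching weights $w_1, \dots, w_{n_x}$.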
You are vastly underestimating what these algorithms can do. Please stay tuned. 64 x 64 x 3 images are actually pretty small in the grand scheme of things, and 12288 is not a big number. It is common to train networks that have millions or even billions of parameters. The recent GPT-3 model, which OpenAI trained for language generation, has 175 billion parameters. Admittedly that is a pretty extreme case, and of course you need some pretty powerful compute resources to train a network like that. Your home PC won’t cut it, even if you have a couple of GPUs.
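Just to give a feel for the scale, here is a minimal NumPy sketch (made-up random data, not the actual assignment code) of one vectorized gradient-descent step of logistic regression with all 12288 weights updated at once. It runs essentially instantly on an ordinary laptop:

```python
import numpy as np

n_x = 64 * 64 * 3        # 12288 features per unrolled image
m = 1000                 # pretend we have 1000 training images

# made-up data, just to illustrate the sizes involved
X = np.random.rand(n_x, m)           # each column is one unrolled image x^(i)
Y = np.random.randint(0, 2, (1, m))  # labels y^(i)

w = np.zeros((n_x, 1))   # 12288 weights -- one per pixel color value
b = 0.0                  # plus one bias, so 12289 parameters in total

# one vectorized gradient-descent step over all m examples at once
Z = w.T @ X + b                      # z^(i) = w^T x^(i) + b for every i
A = 1 / (1 + np.exp(-Z))             # sigmoid activations a^(i)
dw = (X @ (A - Y).T) / m             # gradient w.r.t. the 12288 weights
db = np.sum(A - Y) / m               # gradient w.r.t. the bias
w -= 0.01 * dw
b -= 0.01 * db
```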
dang, that’s soooo cool! So does each subscript refer to each pixel’s RGB value in this particular example?
Yes, the subscripts on w_i and x_i select one particular color value (RGB) at one particular position in the image. Here’s a thread which discusses in detail how the “unrolling” works. If you read through the whole thread, you’ll learn that there is more than one way to do it.
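In case a concrete example helps, here is roughly what the unrolling looks like in NumPy (the exact flattening order is one of the things that thread gets into), with a made-up random image standing in for a real one:

```python
import numpy as np

# a single 64 x 64 RGB image (random values here, just for illustration)
image = np.random.rand(64, 64, 3)

# unroll it into a column vector: one entry per pixel color value
x = image.reshape(64 * 64 * 3, 1)   # shape (12288, 1)

# x[j] is what the lectures call x_j -- one color channel of one pixel
print(x.shape)                      # (12288, 1)
```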