# Can anyone explain this?

How are images represented as input features for a neural network? I have understood how an image is represented inside a computer. Are the input features x1, x2, …, xm the matrix representations of each image?

Does the matrix X (of dimension n_x x m) contain the matrix representations of all m images?

Yes. The way the Neural Networks that we are learning about here in Course 1 work is that each input (“sample”) needs to be formatted as a vector. Prof Ng has chosen to use column vectors as opposed to row vectors, but that is just a choice he made. For efficiency, we want to handle multiple samples at once using vectorized instructions, so when we have multiple input samples, they are concatenated into a matrix usually called X. The standard convention is to use n_x as the number of features (the number of elements in each input vector) and the number of samples in the given set is called m. So given that each image is “unrolled” into a column vector, the dimensions of X end up being n_x x m. Each “sample” (image) is one column of X.
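To make the column-stacking concrete, here is a minimal NumPy sketch (the numbers n_x = 3 and m = 4 are just toy values for illustration): each sample is a column vector of n_x features, and X is built by concatenating those columns.

```python
import numpy as np

# Toy example: m = 4 samples, each a column vector of n_x = 3 features,
# following the column-vector convention described above.
n_x, m = 3, 4
samples = [np.random.rand(n_x, 1) for _ in range(m)]  # m column vectors

# Stack the column vectors side by side into the matrix X.
X = np.concatenate(samples, axis=1)

print(X.shape)  # (3, 4), i.e. (n_x, m): one column per sample
```

Stacking samples as columns is what lets a single matrix multiply like `np.dot(W, X)` process all m samples at once instead of looping over them.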

Of course, images are normally represented as 3D arrays with dimensions height x width x channels. The input images here in Course 1 Week 2 are 64 x 64 RGB images, so the arrays are 64 x 64 x 3, with the “channels” being the R, G and B color values for each pixel. 64 * 64 * 3 = 12288, so we have:

n_x = 12288

in this particular case, but the ideas generalize to images of any size.

Then we have to preprocess those images by “unrolling” or “flattening” them from 3D arrays to 1D vectors. They explain this and basically write the logic for you in the Week 2 assignment notebook, but here is a thread which explains in detail how that all works.
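As a sketch of that flattening step, suppose the dataset arrives as an array of shape (m, height, width, channels), which is how the assignment's images are stored (the random pixel data and m = 5 here are just placeholders):

```python
import numpy as np

# Hypothetical stand-in for the dataset: m images of shape 64 x 64 x 3.
m = 5
images = np.random.randint(0, 256, size=(m, 64, 64, 3))

# "Unroll" each 64 x 64 x 3 image into a 12288-element column vector.
# reshape(m, -1) flattens each image into one row; .T turns rows into
# columns so each image becomes one column of X.
X = images.reshape(m, -1).T

# Standardize pixel values from [0, 255] to [0, 1].
X = X / 255.0

print(X.shape)  # (12288, 5), i.e. (n_x, m)
```

The `-1` lets NumPy infer the 12288 (= 64 * 64 * 3) automatically, so the same two lines work for images of any size.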


Hello, thanks for the reply. I have a doubt in Week 3. While teaching about a neural network with one hidden layer, Prof. Andrew Ng takes x1, x2, x3 as inputs in the input layer. What exactly are these x1, x2, x3?

I initially thought that each x_i in the input layer represents the feature vector of an image. But later, when he explained multiple examples, I became very confused. What are those x_i's that he showed in the slides?
Please illustrate with a concrete example, like the same logistic regression used in the course.

I have understood everything that happens inside the neural network, like all the calculations. It is only the input layer that I wasn't able to understand.