I found the practice lab session of week 2 frankly a bit overwhelming. Since the comments in the code were designed so that pretty much anyone could execute it, I understand that maybe I’m not expected to understand everything that’s going on, but I still want a firmer grasp of what was done. So I have a set of questions and assertions; please answer them or correct them if possible.
Q1) If train_orig_x is a matrix of dimensions (Nx, m), then for the model we built, Nx refers to the total number of pixel brightness values and m refers to the number of pixels, right? So each column (labelled, say, k1, k2, …, km) would be one grid of the image? But an image has 3 layers, so how exactly should I visualize the data set in my head? I’m picturing it as a table of columns and rows with various values. Are those values real numbers, or are they tuples of (px_height, px_width, 3)?
Q2) When we flatten x_orig into x_flatten, we use the reshape function and set one dimension to -1, or “unknown”, but we set the first parameter to x_orig.shape[0], which means the reshaped array will have the same number of rows. We set -1 to let NumPy calculate the columns for us, but if the number of rows is the same, then to prevent loss of data the number of columns has to be the same as well, right? I understand that we flatten the image to better accommodate it in memory, but could someone please explain the implementation?
Q3) Can someone point me to some resources for practicing on more models and understanding all of that implementation in depth? I don’t like having only a superficial knowledge of things.
Welcome to the community. It is great that you are sharing your questions with us. I will try to answer them as best I can. However, if anything is still unclear, please let us know.
A coloured image is made up of 3 component colours, red, green and blue, the primary colours; these are referred to as the RGB channels. Each channel has values ranging from 0 to 255, so a colour pixel has 3 numbers, one from each channel. The attached diagram may help you visualise what a pixel looks like. So a coloured image of size 4 by 5 will be represented as height x width x 3, which is
number of pixels in height x number of pixels in width x 3 channels = 4 x 5 x 3 = 60 values,
and each of these values is a number between 0 and 255.
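To connect this with Q1: the values stored in the array are plain numbers, not tuples; a pixel is simply three of them sitting along the last axis. Here is a minimal NumPy sketch using a made-up random 4 by 5 image (the data is hypothetical, only the shape matters):

```python
import numpy as np

# A hypothetical 4 x 5 colour image: height 4, width 5, 3 channels (R, G, B).
# uint8 holds integers 0..255, the usual range for each channel value.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

print(image.shape)  # (4, 5, 3)
print(image.size)   # 60 numbers in total: 4 * 5 * 3

# One pixel is not a single number but three: its R, G and B intensities.
pixel = image[0, 0]
print(pixel.shape)  # (3,)
```

Every entry of `image` is an ordinary integer; `image[row, col]` returns the three channel values of one pixel.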
In this assignment, train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3), where each image has the same height and width (num_px), and m_train is the number of training examples in the training set.
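As a sketch of how the sizes can be read off that 4-D shape, here is a tiny stand-in array. The sizes 6 and 8 are made up for illustration and are not the assignment's real dimensions:

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 6 images of 8 x 8 pixels, 3 channels.
train_set_x_orig = np.zeros((6, 8, 8, 3), dtype=np.uint8)

m_train = train_set_x_orig.shape[0]  # number of training examples
num_px = train_set_x_orig.shape[1]   # height (= width) of each image in pixels

print(m_train, num_px)            # 6 8
print(train_set_x_orig[0].shape)  # (8, 8, 3): one image
```

So the raw data set is 4-dimensional, and indexing the first axis gives back one 3-D image.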
So a single flattened image will be of shape (num_px ∗ num_px ∗ 3, 1), the 1 indicating a single image (one column).
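A minimal sketch of flattening one image into a column, assuming a hypothetical num_px of 8. It also shows the order in which the numbers end up in the column:

```python
import numpy as np

num_px = 8  # hypothetical image size
image = np.arange(num_px * num_px * 3).reshape(num_px, num_px, 3)

# Flatten one image into a single column vector of shape (num_px*num_px*3, 1).
column = image.reshape(-1, 1)
print(column.shape)  # (192, 1) since 8 * 8 * 3 = 192

# NumPy reads values in row-major (C) order, so the first three entries
# are the R, G, B values of the top-left pixel.
print(column[:3, 0].tolist() == image[0, 0].tolist())  # True
```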
As the hint in the instructions says: A trick when you want to flatten a matrix X of shape (a,b,c,d) to a matrix X_flatten of shape (b ∗ c ∗ d, a) is to use:
X.reshape(X.shape[0], -1).T
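Here is a small sketch checking what that trick does, on a hypothetical data set of 4 tiny images. It confirms that each column of X_flatten is exactly one image, and that reshaping straight to (b ∗ c ∗ d, a) without the transpose would not give that:

```python
import numpy as np

# Hypothetical data set: a=4 images, each b=2 x c=3 pixels with d=3 channels.
a, b, c, d = 4, 2, 3, 3
X = np.arange(a * b * c * d).reshape(a, b, c, d)

X_flatten = X.reshape(X.shape[0], -1).T
print(X_flatten.shape)  # (18, 4): one column per image

# Each column holds exactly the values of one image.
for i in range(a):
    assert np.array_equal(X_flatten[:, i], X[i].ravel())

# Reshaping directly to (b*c*d, a) without the transpose does NOT keep
# images in columns: values from different images get mixed together.
wrong = X.reshape(b * c * d, a)
print(np.array_equal(wrong[:, 0], X[0].ravel()))  # False
```

The first reshape keeps one image per row (the -1 becomes 18 = 2 ∗ 3 ∗ 3), and `.T` then turns those rows into columns.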
Here is a detailed explanation of how the reshape function works.
It is good to read other resources and practise, but it is very important to have the basics in place first, to build a good foundation. All the mentors are here to support your learning journey, so if anything worries you, please do not hesitate to post your query here.
Hi, Anurag Ravi Nimonkar.
Welcome!
The explanation provided by Kic is impressive. I would like to add some further explanation as well. An image is made up of small square boxes, known as pixels, and each pixel has its own color. Machines cannot perceive the edges and colors of an image the way humans do, so instead they store images as numbers. As a simplified illustration, if an image contained only three or four colors, a machine could label each color with a number: say, 1 for red, 2 for black, 3 for grey, etc. Colored images are stored as 3-D matrices (height x width x channels), whereas grayscale images, which have a single channel, are stored as 2-D matrices (height x width).
In general, we use the primary colors red, green and blue (RGB) to encode and decode colored images. In computer vision, the light intensity of each pixel in each channel is measured in the range 0 to 255; this is the pixel value. The three RGB colors therefore produce three matrices of the same image, called channels: index 0 holds the red intensities, index 1 the green intensities and index 2 the blue intensities. So, in this assignment, the shape tuple of any image contains the height, the width and the number of channels, for example (num_px, num_px, 3). Multiplying these three values gives the total number of values stored in the image’s array.
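A short sketch of the channel layout described above, using a hypothetical 4 x 4 image whose red channel is filled at full intensity:

```python
import numpy as np

num_px = 4  # hypothetical size
image = np.zeros((num_px, num_px, 3), dtype=np.uint8)
image[:, :, 0] = 255  # fill the red channel (index 0) at full intensity

red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]
print(red.shape)   # (4, 4): each channel is a 2-D matrix
print(image.size)  # 48 = 4 * 4 * 3 values in total

# One pixel: R=255, G=0, B=0, i.e. pure red.
print(image[0, 0].tolist())  # [255, 0, 0]
```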
In some pipelines, a later step averages the channel values to convert a colored image to grayscale. Here, the values of the three channels at each pixel are averaged into a single value for that pixel.
This reduces the image from a 3-D matrix (height x width x 3) to a 2-D matrix (height x width). Note that the flattening in this assignment keeps all three channels, as the flattened shape (num_px ∗ num_px ∗ 3, 1) shows.
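A minimal sketch of that channel averaging, on a hypothetical random image. This uses a plain, unweighted mean; other tools often weight the channels differently when converting to grayscale:

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)  # hypothetical colour image

# Average the R, G and B values of each pixel (axis 2 is the channel axis).
gray = image.mean(axis=2)
print(image.shape, "->", gray.shape)  # (4, 5, 3) -> (4, 5)
```

The result is still 2-D; turning it into a 1-D vector would be a separate flattening step.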
Up to this point, the machine has a color-based representation of the image, but there is still work left: it needs to detect edges. So, how can it do that?
It scans through the pixel values of the (grayscale) image, for example along each row or down each column.
Suppose there are five values in a horizontal row: 145, 78, 85, 89, 65. To identify edges, it selects a pixel and computes the difference between the values on either side of it. If the difference is large, there is a significant transition around that pixel and likely an edge nearby; otherwise, there is probably no edge there.
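The comparison described above can be sketched as a simple central difference on the example row. The threshold of 30 is a hypothetical cut-off for “large”; practical edge detectors use 2-D filters (such as Sobel) rather than this one-row version:

```python
import numpy as np

# The row of pixel values from the example (int, so subtraction can go negative).
row = np.array([145, 78, 85, 89, 65], dtype=int)

# For each interior pixel, subtract the value on its left from the value on its right.
diffs = row[2:] - row[:-2]
print(diffs.tolist())  # [-60, 11, -20]

threshold = 30  # hypothetical cut-off for "large"
# +1 because diffs[0] belongs to row[1], the first interior pixel.
edge_positions = np.where(np.abs(diffs) > threshold)[0] + 1
print(edge_positions.tolist())  # [1]
```

Only the pixel at index 1 (value 78), sitting between 145 and 85, shows a large transition, so that is where an edge would be reported.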
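The later part of your question is about passing -1 to reshape. A value of -1 tells NumPy to infer the length of that dimension from the total number of elements and the other dimensions you specified, so all of the original data is kept. With X.reshape(X.shape[0], -1), the number of rows stays X.shape[0] and NumPy works out the number of columns as num_px ∗ num_px ∗ 3. The number of columns is therefore not the same as in the original array: the other three dimensions are merged into one.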
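A small sketch of how -1 is inferred, using a hypothetical 2 x 3 x 4 array. It also shows that NumPy raises an error rather than dropping data when the requested shape cannot hold every element:

```python
import numpy as np

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 24 values

# -1 asks NumPy to infer that dimension: 24 values / 2 rows = 12 columns.
print(X.reshape(2, -1).shape)  # (2, 12)
print(X.reshape(-1).shape)     # (24,)

# If the total cannot be split evenly, NumPy refuses instead of losing data.
raised = False
try:
    X.reshape(5, -1)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The row count stays 2 while the column count becomes 12, not the original 3 or 4, because the trailing dimensions are merged.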
As for your last question, Kic has already given you a good tip: all the mentors are here to clarify any queries you have related to this course.
Thank you so much! @Kic, I had been confusing the raw data set with the flattened version from the start and expected both to be 2-dimensional. You also helped me understand what values make up a pixel; thank you for the clarification! It highlights the importance of data preprocessing for me. @Rashmi, thank you so much for the visualization of what an image is. I now understand to look at it from a basic 3-D matrix perspective, which makes things much easier for me!