I’m having a hard time understanding, or having an intuition about the so-called “input features”. How should I think about them in a real-life deep learning example?
When I analyze the wording, I first think of “input” as something that I have to provide to the deep learning model: something I have to specify explicitly in the source code. I’m not sure whether that is the correct way of thinking about it?
Second, when I think about “features”, I can think of them as, e.g., the features of my car: it has a max speed of x mph, it has 4 wheels, it has an electric engine, it has a radio, … and so on.
If I put these two thoughts together in my brain, my understanding of “input features” is that I somehow tell the deep learning algorithm, e.g., the features of a cat image. So the cat always has 2 eyes, one mouth, a nose with two holes, two ears, …
But obviously, I’m not explicitly telling all these details to the deep learning algorithm. So what should be my understanding of this term “input features” in a real example? What does it stand for? I’m confused here and would not be able to explain it to someone else.
PS: I have to admit, I have not yet done the programming assignment of week 4. If doing that first would help my understanding, and I should complete it before reading your explanation of this topic, then please let me know and I will do so. But other than that, I’d be very happy to know your thoughts on what I should have in mind (or visualize) when thinking of “input features”.
BTW: There is a second question bothering me, but I guess it will be part of a future course? For binary classification, how many positive and how many negative training examples are needed to get good results? How much does it depend on the classification task itself? Any rule of thumb for this?
The input dataset for Week 4 is exactly the same as the input dataset for the Logistic Regression exercise in Week 2, so I don’t think it is necessary for you to complete Week 4 in order to address your question.
“Features” in the context of ML/DL just means whatever data you feed directly to your “model” (algorithm). Here our goal is to produce a model which can take as input a 64 x 64 RGB image and tell us whether that image contains a cat or not. So the inputs are the pixel values of the image, which will be 64 * 64 * 3 = 12288 unsigned 8-bit integers representing the RGB color values of each of the pixels.

If you look at the early parts of the notebook for Logistic Regression in Week 2, you’ll see that we do some preprocessing of those images to get them into the form we need. It turns out that this type of Neural Network needs the inputs to be structured as vectors, but a set of RGB images is given to us as a numpy array with 4 dimensions. We first have to “flatten” or unroll those 64 x 64 x 3 arrays into vectors. Here’s a thread which discusses the details of that. Then we also “normalize” or “standardize” the pixel values by dividing them by 255, so that we end up with floating point numbers between 0 and 1, rather than unsigned integers between 0 and 255. That’s a separate topic, but it turns out to be helpful in making our iterative approximation method for finding the solution work well.
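To make the shapes concrete, here is a minimal sketch of that preprocessing in numpy. The variable names are my own (not necessarily the ones the notebook uses), and I’m using a small batch of random images just to illustrate the shapes:

```python
import numpy as np

# Hypothetical small batch: 5 RGB images of size 64 x 64
# (the course dataset is larger, but the shapes work the same way).
m = 5
images = np.random.randint(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

# "Flatten" each 64 x 64 x 3 image into a column vector of 12288 features,
# so each training example is one column of X.
X_flat = images.reshape(m, -1).T      # shape (12288, m)

# "Standardize" the pixel values: floats between 0 and 1
# instead of unsigned integers between 0 and 255.
X = X_flat / 255.0

print(X.shape)   # (12288, 5)
```

So each of those 12288 numbers per image is one “input feature”: the model sees only raw pixel values, nothing higher-level.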
So we do not tell the model anything about higher level attributes of cats (e.g. the shapes of their ears or tails): we simply give it the pictures and the “labels” for our training set. The labels give it the real answers for the training images: this image contains a cat and this one does not. Here’s where the real magic happens: just based on the raw images and the labeled answers, the training is able to “learn” a model which can accomplish that recognition. How it does that will be discussed at a number of different levels by Prof Ng as we go through these courses. If you want a preview of perhaps the most interesting explanation of that, here’s a video of a lecture that Prof Ng gives in Course 4 of this series that delves into what is happening in the internal layers of a Convolutional Neural Network.
Of course the example we are working on here of trying to learn how to recognize objects in photographic images is just one type of problem you can try to solve using DL models. You can also apply these techniques to other types of predictions or recognition problems. E.g. you can try to predict the weather or a stock price or the price of a house based on various input data. Obviously the input data will be different in all those cases. In the case of predicting a house price, you might have the number of bedrooms, bathrooms, square footage, distance from schools or shopping, previous price data from other houses in the area … Those would be the “features” for that type of problem.
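For a tabular problem like house prices, the “features” are just the columns of a data matrix. Here’s a tiny illustrative sketch (all the numbers and feature choices are made up, not from any real dataset), using the same convention as the course of one training example per column:

```python
import numpy as np

# Hypothetical features for three houses:
# [bedrooms, bathrooms, square footage, distance to school in miles]
X = np.array([
    [3, 2.0, 1500, 0.8],
    [4, 2.5, 2100, 1.2],
    [2, 1.0,  950, 0.3],
]).T                                   # shape (4, 3): 4 features, 3 examples

# Labels: the known sale prices for those same three houses.
y = np.array([[340000, 475000, 210000]])   # shape (1, 3)

print(X.shape)   # (4, 3)
```

The structure is the same as for the cat images: a features matrix X and a labels vector y. Only the meaning and number of the features change with the problem.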
As to your final question about how much data you need in order to train an effective model, there is no one answer. “It all depends.” I guess if there is a rule of thumb, it would be “lots”. As in, you need lots of data. What that means all depends on your particular case. Prof Ng will discuss this and say much more useful things than I just did in the first week of Course 2 and also in Course 3 of this series, so please stay tuned for that.
Many thanks for the explanations. That makes things clearer. And I bet, once I have more practical experience, it will become a lot more familiar to me.