Features or Training Set data?


I’m a little confused about the difference between training set data and features. In one video, (X1 X2 … Xm) are described as training data (where m is the number of training examples); however, in another one, (X1 X2 … Xn) are described as features (where n is the number of input features). Please rescue me from this confusion:

Hi @simamh,

You already figured it out :slight_smile:

A single training example has n features, denoted x with dimension n x 1 (column vector).
A training set contains m training examples and is vectorized by a matrix X of dimension n x m, where each column is a training example.
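
To make the shapes concrete, here is a minimal NumPy sketch. The values of `n` and `m` and the toy data are made up purely for illustration; they do not come from the course notebook.

```python
import numpy as np

n = 3  # number of features per example (hypothetical value)
m = 4  # number of training examples (hypothetical value)

# Each training example is a column vector x of shape (n, 1).
examples = [np.arange(n, dtype=float).reshape(n, 1) + 10 * i for i in range(m)]

# Stack the examples side by side: X has shape (n, m), one column per example.
X = np.hstack(examples)

print(X.shape)         # rows = features, columns = training examples
print(X[:, [1]].shape) # a single example, kept as an (n, 1) column vector
```

Rows index features and columns index training examples, which is why the same letter X can appear with a subscript running to n in one context and to m in another.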


Thanks a lot :innocent:

So in this code n_x is the number of features, right? And where do we specify the number of training examples? I cannot find the line of code related to the number of training examples.

Yes, n_x is the number of features. You use all the m training examples in gradient descent to calculate good estimates of the parameters W1, b1, W2, and b2. The weight matrices and bias terms are the same for every training example. These parameters make up the model that you later use for prediction on unseen data.
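
As a rough illustration of why the parameter shapes do not depend on m, here is a small sketch of the first linear step of a two-layer network. The hidden size `n_h`, the concrete sizes, and the random data are assumptions made for this example, not the notebook's code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, n_h, m = 3, 4, 5  # features, hidden units, examples (hypothetical sizes)

X = rng.standard_normal((n_x, m))            # one column per training example
W1 = rng.standard_normal((n_h, n_x)) * 0.01  # shape depends on n_x, not on m
b1 = np.zeros((n_h, 1))                      # broadcast across all m columns

Z1 = W1 @ X + b1  # the same W1 and b1 are applied to every example

print(W1.shape, b1.shape, Z1.shape)
```

Changing `m` changes only the number of columns in `X` and `Z1`; `W1` and `b1` keep the same shape.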

As to where m is in the code, check out how you solved Exercise 1 in the same notebook.

Now that you know how it works, please remove the picture since it contains solution code :slight_smile:


I get it.

Immensely grateful for your clarification.

I removed the picture :raised_back_of_hand:


Hi, since this topic is about features, I have to admit that the word sometimes confuses me. I understand features as “characteristics”: a TV’s features would be its shape, its size, whether it supports Bluetooth, and so on. But when it is said that a sentence is represented by its features, e.g. “I drink coffee” is represented by features x1, x2 and x3, I have to admit I don’t get it clearly.

Hi @cirediallo,

Indeed. Sometimes getting used to the vocabulary takes some time when learning.

“Characteristics” is a very good term to use: we characterize a data sample with features x1, x2, and x3. What do you think of this interpretation?

Features can also be interpreted as “useful stuff” or “useful info”. For example, “the new Android OS comes with exciting features including …” We can also say we have engineered a new feature x4 based on x1, x2, and x3, and x4 is going to be another useful piece of information that also characterizes our samples.

It’s the language we use, and let’s start getting used to it from today!


Thanks for your response @rmwkwok. The way I understand this is as follows in this picture. Tell me if I’m wrong.

For example, how are the features of this going to be represented?

Could you explain this to me too? The way I understand these definitions, new characteristics can appear depending on the method chosen to be applied.

OK. Let’s look at a more concrete example.

Now let’s say we are an estate agency, and we want to predict the selling price of our houses. Obviously, we want to collect some data about our houses. Can you imagine some characteristics of a house that can help model the selling price? What are the deciding factors of the price? Can you list 2 to 3 such characteristics/factors?


That’s like the example I gave for the TV (shape, size, …). For a house, it could be its size, its number of bedrooms, its location, and so on; those could be a house’s characteristics. But how does this relate to feature engineering? Is the explanation I gave above about it not correct?

this one.

The problem with this sentence is that it is too vague. What do you mean by “the method”, and “applied to what”.

Now, let’s look at our example again:

I want to make 2 points here:

  1. size and number of bedrooms are already called “features” in the language of Machine Learning.
  2. On top of these two features, we can engineer a new one such as $\frac{\text{size}}{\text{number of bedrooms}}$. I am not saying that this new feature must be good. But just for the sake of engineering a new feature, we can say this ratio tells us how spacious the living area is: if there are too many rooms, the ratio will be small and the living room might not be very spacious.
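
The two points above can be sketched in a few lines of NumPy. The house data and the variable names below are invented for illustration only.

```python
import numpy as np

# Two raw features for three hypothetical houses.
size = np.array([120.0, 80.0, 200.0])  # floor area, e.g. in square metres
bedrooms = np.array([3.0, 2.0, 8.0])   # number of bedrooms

# Engineered feature: average floor area per bedroom.
size_per_bedroom = size / bedrooms

# Feature matrix with one row per feature and one column per house.
X = np.vstack([size, bedrooms, size_per_bedroom])

print(size_per_bedroom)
print(X.shape)
```

The third house is the largest but has the smallest ratio, which is exactly the “many rooms, less space per room” signal described above. Whether this feature actually helps the model is something to check on real data.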

Would you like to suggest a few more features, and try to engineer something out? We need to try it in order to feel it.


We can try to find out whether there is a garden or not by subtracting the size of the house from the overall area of the land.

Yes, that sounds like a good one!

The idea is that, given a dataset with certain features (size, number of bedrooms, overall area of the land), and based on domain knowledge (as a property agency that knows the market very well), we engineer some additional features (e.g. garden) that are likely to be predictive of (in our example) house prices.
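
Here is a minimal sketch of the garden idea, again with made-up numbers. Treating any leftover land as a garden is a simplifying assumption for this example; in practice the leftover area could be a driveway or parking space, and a threshold might be more sensible.

```python
import numpy as np

# Raw features for three hypothetical properties.
land_area = np.array([300.0, 80.0, 500.0])   # overall area of the land
house_size = np.array([120.0, 80.0, 200.0])  # footprint of the house

# Engineered features: leftover outdoor area, and a 0/1 "has garden" flag.
outdoor_area = land_area - house_size
has_garden = (outdoor_area > 0).astype(float)  # assumption: any leftover land counts

print(outdoor_area)
print(has_garden)
```

Either `outdoor_area` or `has_garden` (or both) could be appended as extra rows of the feature matrix alongside the original features.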