Hands-on ML book from Aurelien Geron

I have started studying the book “Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow” by Aurelien Geron. The following questions refer to the 2nd edition.

I understand the goal of this chapter is to predict the commercial value of houses in the state of California. The prediction is made using information about different districts (number of rooms, bedrooms, population, etc.). Am I right about this?

Now some specific questions (page numbers are from the 2nd edition of the book).

Page 65: the three new categories created are simply an exercise; they are never used later on nor added to the data set. True or false?

What are the hyperparameters? What is the score (mentioned in the section “Select and train model”)? It is also mentioned that the model is overfitting the data, but I don't understand why.

Page 72: is the code from this page also never used?

Thanks for your help

Sorry, I have not read the book.

Okay
First, yes: the end-to-end project in chapter 2 is about predicting the median house value from features such as the number of bedrooms, area, income and other features.
Second, please give more details about which categories you are talking about.
Third, hyperparameters are specific settings of a given model that you can tune to get better model performance.
Finally, the score is an accuracy measure used to determine whether the model performs well or not.
Maybe it says the model is overfitting because there is no error on the training set?
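
To make the last two points concrete, here is a minimal sketch in plain Python. It does not use the book's housing data or its models: the synthetic data, the tiny k-nearest-neighbours regressor and the helper names (`rmse`, `knn_predict`) are all assumptions made up for illustration. With k = 1 the model simply memorizes the training set, so its training error is zero while its error on held-out data is not. That gap is the usual sign of overfitting, and k is an example of a hyperparameter because we choose it rather than the model learning it.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error over N samples."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

random.seed(0)
# Synthetic data (NOT the housing set): y = 2x plus Gaussian noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2 * x + random.gauss(0, 1) for x in xs]
train_x, train_y = xs[:150], ys[:150]
valid_x, valid_y = xs[150:], ys[150:]

results = {}
for k in (1, 5, 15):  # k is the hyperparameter: set by us, not learned
    train_pred = [knn_predict(train_x, train_y, x, k) for x in train_x]
    valid_pred = [knn_predict(train_x, train_y, x, k) for x in valid_x]
    results[k] = (rmse(train_y, train_pred), rmse(valid_y, valid_pred))
    print(f"k={k:2d}  train RMSE={results[k][0]:.3f}  "
          f"validation RMSE={results[k][1]:.3f}")
```

Comparing the validation column across the values of k is, in miniature, what hyperparameter tuning does: pick the setting that scores best on data the model has not seen.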

Thank you very much for your answers. Just one last question: do you know where I can find the definition of the score, or where I can read more about it? I'm curious about how it is computed.

@lic.lvp So I have the third edition of this book and I think the page numbers are a little different, but I tried to have a look for you.

The most common score he uses is RMSE, which he requests in his scoring calls as:

scoring="neg_root_mean_squared_error"

The definition (formula) of RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - P}}$$

where:
y_i is your actual value
\hat{y}_i is your predicted value
N is the size of your sample
P is the number of parameter estimates (the number of variables you are predicting on)

N - P here is what is known as ‘degrees of freedom’.

If you are calculating the error measure on an entire population, replace N - P with N.
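
As a rough sketch of how the two variants differ, the snippet below computes RMSE by hand in plain Python. The function name `rmse`, its `n_params` argument and the sample values are made up for illustration and are not from the book. As far as I know, scikit-learn's "neg_root_mean_squared_error" scorer divides by N (no degrees-of-freedom correction) and returns the negated value, so that a higher score always means a better model.

```python
import math

def rmse(y_true, y_pred, n_params=0):
    """RMSE with an optional degrees-of-freedom correction.

    n_params=0 divides by N (the plain definition);
    n_params=P divides by N - P, as in the definitions above.
    """
    n = len(y_true)
    if n - n_params <= 0:
        raise ValueError("need more samples than parameters")
    squared_error = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    return math.sqrt(squared_error / (n - n_params))

# Made-up values, purely for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

plain = rmse(y_true, y_pred)                  # divides by N = 4
corrected = rmse(y_true, y_pred, n_params=2)  # divides by N - P = 2
score = -plain  # "neg_" convention: higher (closer to 0) is better
print(plain, corrected, score)
```

The corrected value is always at least as large as the plain one, and the difference shrinks as N grows relative to P.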

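As for hyperparameters, what they are depends entirely on the model you are using. In this case, if you go to the ‘Fine-tune your model’ section, where he is using a RandomForestRegressor, you will see the hyperparameters being tuned are ‘n_clusters’ and ‘max_features’. Note that ‘max_features’ belongs to the forest itself, while ‘n_clusters’ is not a RandomForestRegressor parameter, so it must belong to another step of the pipeline.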

Again, as I said, maybe our page 72 is different. But if you mean one-hot encoding, he does use that, to encode the ocean proximity category. After converting these values to one-hot vectors, he concatenates the results and folds them back into the dataset.

Hope this helps a little.

Actually I don’t have a source, but from my understanding accuracy here is the opposite of MSE, meaning that if a model makes a 10% error, its accuracy is 90%. As mentioned in the book, we aim to make the score higher and the error lower.

@lic.lvp
