Hands-on ML book from Aurelien Geron

lic.lvp · May 6, 2024, 6:42am

I have started studying the book “Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow” by Aurelien Geron. The following questions relate to the 2nd edition.

I understand the goal of this chapter is to predict the commercial value of some houses in California state. This prediction is performed using information from different districts (number of rooms, bedrooms, population, etc). Am I right about this?

Now some specific questions (page numbers are from the 2nd edition of the book).

Page 65: the three new categories created are simply an exercise; they are never used later on nor added to the data set. True or false?

What are the hyperparameters? What is the score (mentioned in the section “Select and train a model”)? It is also mentioned that the model is overfitting the data, but I don't understand why.

Page 72: is the code from this page also never used?

Thanks for your help

TMosh · May 6, 2024, 4:50pm

Sorry, I have not read the book.

Ahmed112 · May 17, 2024, 2:15am

Okay
First, yes: the end-to-end project in chapter 2 is about predicting the median house value from features like the number of bedrooms, area, income, and other features.
Second, please give more details about which categories you are speaking about.
Third, hyperparameters are the specific settings of a given model that I can tune to get better model performance.
Finally, the score is an accuracy measure used to determine whether my model performs well or not.
Maybe the book says the model is overfitting because there is no error on the training set?
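
To make the three ideas above concrete (hyperparameter, score, overfitting), here is a minimal sketch in plain Python. It is not the book's code: the data is made up, and `knn_predict`, `rmse`, `train`, and `valid` are hypothetical names chosen for illustration. The hyperparameter is `k`, the number of neighbours a tiny nearest-neighbour regressor averages over, and the score is the RMSE on data the model has not seen.

```python
import math

def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(train, data, k):
    """Root mean squared error of the k-NN model on `data` (lower is better)."""
    squared = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))

# Made-up data (an assumption for this sketch): y is roughly 2 * x plus noise.
train = [(1, 2.5), (2, 3.4), (3, 6.8), (4, 7.5), (5, 10.9), (6, 11.6)]
valid = [(1.5, 3.0), (3.5, 7.0), (5.5, 11.0)]

# Try several values of the hyperparameter k and record both errors.
results = {}
for k in (1, 2, 3):
    results[k] = (rmse(train, train, k), rmse(train, valid, k))
    print(f"k={k}  train RMSE={results[k][0]:.3f}  validation RMSE={results[k][1]:.3f}")

# Pick the k with the lowest validation error, not the lowest training error.
best_k = min(results, key=lambda k: results[k][1])
print("best k:", best_k)
```

With `k=1` the model simply memorises the training points, so its training error is exactly zero while its validation error is not. A training error that is much lower than the error on unseen data is the usual sign of overfitting, and choosing `k` by validation error is a small-scale version of hyperparameter tuning.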

lic.lvp · May 17, 2024, 4:42am

Thank you very much for your answers. Just one last question: do you know where I can find the definition of the score, or where I can read more about it? I'm curious about how it is computed.

Nevermnd · May 17, 2024, 6:23am

@lic.lvp So I have the third edition of this book and I think the page numbers are a little different, but I tried to have a look for you.

So the most common score he uses is RMSE, which he passes to his functions as:

scoring="neg_root_mean_squared_error"

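The definition (formula) of RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N - P} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}$$

where:
y_i is your actual value
\hat{y}_i is your predicted value
N is the size of your sample
P is the number of your parameter estimates (the number of variables you are predicting on)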

N - P here is what is known as ‘degrees of freedom’.

If you are calculating the error measure on an entire population, just replace N-P with N.
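
As a rough sketch of how this could be computed by hand, using the symbols defined above (y_i, \hat{y}_i, N, P): the function name `rmse`, its `n_params` argument, and the sample numbers are made up for illustration and are not from the book.

```python
import math

def rmse(y_true, y_pred, n_params=0):
    """RMSE with N - P in the denominator; n_params=0 gives the plain N version."""
    n = len(y_true)
    if n - n_params <= 0:
        raise ValueError("need more samples than parameters")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / (n - n_params))

# Made-up actual and predicted values (assumption for this sketch).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

plain = rmse(y_true, y_pred)              # divides by N = 4
adjusted = rmse(y_true, y_pred, n_params=2)  # divides by N - P = 2

print(f"RMSE with N:     {plain:.4f}")
print(f"RMSE with N - P: {adjusted:.4f}")
```

Note that scikit-learn's "neg_root_mean_squared_error" scorer uses the plain N version and returns the negated value, so that a higher score always means a better model.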

As to hyperparameters, what those are will depend entirely on what model you are using. But in this case, if you go to the ‘Fine-tune your model’ section, where he is using a RandomForestRegressor, you will see the hyperparameters being tuned are ‘n_clusters’ (for the clustering step in the preprocessing pipeline) and ‘max_features’ (for the random forest itself).

Again, as I said, maybe our page 72 is different? But if you mean one-hot encoding, he does use that, to turn the ocean proximity category into numbers. After converting these values to one-hot vectors, he concatenates the results and folds them back into the dataset.
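
A minimal sketch of what one-hot encoding and "folding back" mean, in plain Python. This is not the book's code (scikit-learn provides `OneHotEncoder` for this); the `one_hot_encode` helper and the three sample districts are hypothetical.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Made-up districts (assumption for this sketch).
districts = [
    {"median_income": 8.3, "ocean_proximity": "NEAR BAY"},
    {"median_income": 3.1, "ocean_proximity": "INLAND"},
    {"median_income": 5.6, "ocean_proximity": "NEAR BAY"},
]

categories, encoded = one_hot_encode([d["ocean_proximity"] for d in districts])

# Fold the one-hot columns back in next to the numeric feature.
features = [[d["median_income"]] + row for d, row in zip(districts, encoded)]

print(categories)
for row in features:
    print(row)
```

Each text category becomes its own 0/1 column, so the model never treats "INLAND" and "NEAR BAY" as if they were ordered numbers.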

Hope this helps a little.

Ahmed112 · May 17, 2024, 12:04pm

Actually I don’t have a source, but from my understanding, accuracy here is the opposite of MSE. That means if a model makes 10% error, the accuracy is 90%. As mentioned in the book, we aim to make the score higher and the error lower.

Nevermnd · May 17, 2024, 12:28pm

@lic.lvp
