I need some help. I’ve developed a model trained on a housing dataset using logistic regression. However, the accuracy I’m achieving is only 4.96%. Can you help me figure out why it’s not doing better?

Having categorical features does not require you to use logistic regression. Categorical features are usually converted to one-hot columns, where each category becomes its own true (1) / false (0) value.
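Here is a minimal sketch of one-hot encoding with pandas. It assumes the categorical column is `ocean_proximity`, as in the housing data discussed in this thread, and the rows are made-up toy values:

```python
import pandas as pd

# Toy rows standing in for the housing data (values are illustrative only).
df = pd.DataFrame({
    "total_rooms": [880, 7099, 1467],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"],
})

# One-hot encode the categorical column: each category becomes its own 0/1 column,
# while non-categorical columns such as total_rooms are kept unchanged.
encoded = pd.get_dummies(df, columns=["ocean_proximity"], dtype=int)
print(encoded)
```

The original text column disappears and is replaced by `ocean_proximity_INLAND` and `ocean_proximity_NEAR BAY`, so any regression model can consume it as numbers.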

The key difference between linear and logistic regression is what is being predicted.

If the output is a real value, then it’s linear regression.

If the output is true/false or a classification, then it’s logistic regression.
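To illustrate the distinction, here is a sketch using scikit-learn on synthetic data (not the housing set). A real-valued target goes to `LinearRegression`. A true/false target, here a hypothetical "expensive" flag made by thresholding the price, goes to `LogisticRegression`. Note that `score()` returns R² for the regressor but accuracy for the classifier, so a figure such as 4.96% only makes sense once you know which model and metric produced it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))  # one synthetic feature

# Real-valued target (like a house price): use linear regression.
y_price = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)
lin = LinearRegression().fit(X, y_price)
print("R^2:", lin.score(X, y_price))  # score() returns R^2 for regressors

# True/false target (hypothetical "price above 15" flag): use logistic regression.
y_expensive = (y_price > 15).astype(int)
log = LogisticRegression().fit(X, y_expensive)
print("accuracy:", log.score(X, y_expensive))  # score() returns accuracy for classifiers
```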

@TMosh, in that picture I separated the feature variables by removing the target variable from the dataset, and I assigned these features to the ‘x’ variable. I assigned the target variable ‘median_house_value’ to the ‘y’ variable.
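In pandas, that split might look like the sketch below. The toy DataFrame stands in for the real data, and only the column name `median_house_value` comes from the thread:

```python
import pandas as pd

# Toy stand-in for the housing DataFrame (values are illustrative only).
df = pd.DataFrame({
    "housing_median_age": [41.0, 21.0, 52.0],
    "total_rooms": [880, 7099, 1467],
    "median_house_value": [452600.0, 358500.0, 352100.0],
})

# Features: every column except the target.
x = df.drop(columns=["median_house_value"])
# Target: the column we want to predict.
y = df["median_house_value"]
```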

Can you explain how the housing price changes with the median house value variable?

Also, when you say you removed the target variable from the dataset, do you mean you extracted only that particular categorical column from the dataset?
If the answer to that question is yes, you do not need df.drop. Instead, inspect the data with df.head, select the column you need (making sure you have removed any null values), and then check how that column relates to the housing price.
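A rough sketch of that workflow in pandas, on a toy DataFrame with a deliberately missing value. `total_rooms` is used here only as an example column, and Pearson correlation is one simple way (an assumption, not the only option) to check how it relates to the house value:

```python
import pandas as pd

# Toy stand-in for the housing data, including one missing value.
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, None, 1467.0, 1274.0],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
})

print(df.head())  # quick look at the first rows

# Keep only the column of interest plus the target, dropping rows with nulls.
subset = df[["total_rooms", "median_house_value"]].dropna()

# Pearson correlation between the selected column and the house value.
r = subset["total_rooms"].corr(subset["median_house_value"])
print(f"correlation: {r:.3f}")
```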

I checked your notebook and dataset. Can you explain what kind of model you are trying to create? Nowhere in the post do you explain what relationship your model is meant to capture.

You have used latitude (which is a negative variable) and total rooms to predict the median housing value. This is done incorrectly: I cannot see you establishing any relationship between these variables, other than a graph showing that latitude would not be the right variable for predicting median housing value.

Next, as @TMosh mentioned, your data seems to call for logistic regression, but you have created a linear regression, which is causing all the issues.

So, could you first briefly tell us what kind of model you are trying to create, based on which features, and what you are trying to analyse?

If you are running a regression analysis between median housing value and total rooms, first try to find out what the relationship between the two is.

My suggestion would be to model median housing value as a function of median housing age and total rooms. (This suggestion is made without knowing what you are ultimately looking for in your model.)

Looking at the data set, it appears to me that it depicts the median house price within geographic areas, each identified by a central latitude/longitude point.

The other columns form the X training features.

So the goal is to predict the median house price as a function of the location and ocean proximity.
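One way to set this up is sketched below with scikit-learn. It assumes the columns are named `longitude`, `latitude`, and `ocean_proximity`, as in the common California housing CSV: one-hot encode the category, pass the coordinates through unchanged, and fit a linear regression on the real-valued target. The six rows are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative rows only; the real data set has many more rows and columns.
df = pd.DataFrame({
    "longitude": [-122.23, -122.22, -118.30, -117.90, -121.50, -119.80],
    "latitude": [37.88, 37.86, 34.05, 33.70, 38.60, 36.70],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "<1H OCEAN", "NEAR OCEAN", "INLAND", "INLAND"],
    "median_house_value": [452600, 358500, 250000, 300000, 120000, 95000],
})

X = df[["longitude", "latitude", "ocean_proximity"]]
y = df["median_house_value"]

# One-hot encode the categorical column; pass the numeric location columns through.
preprocess = ColumnTransformer(
    [("ocean", OneHotEncoder(handle_unknown="ignore"), ["ocean_proximity"])],
    remainder="passthrough",
)

# Preprocessing and regression chained into a single estimator.
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(X, y)

preds = model.predict(X)
print(preds.round(0))
```

With only six rows the model fits them exactly, so this demonstrates the mechanics rather than predictive quality. On the real data you would hold out a test set (for example with `train_test_split`) and report RMSE or R² rather than classification accuracy.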