House prices in Mumbai vary noticeably within a distance of 4-5 km. Should I apply clustering to form clusters and then apply regression to get a better prediction score?
It sounds like a logical flow.
Thank you @gent.spah.
That said, the choice of ML algorithm should be looked into carefully. Regression may not be the right one, so check out others as well, such as trees.
@gent.spah Thank you so much for the suggestion.
OK, I should apply regression trees for this project. Can you briefly explain how I can use them?
Initial reaction: can you use neighborhood as a categorical feature? It seems like things such as neighborhood and school district would have a premium or discount built into the historical data, so you just need to include that feature, along with number of rooms, lot size, age, etc., in the regression on price. What am I overlooking?
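As a rough illustration of the neighborhood-as-categorical-feature idea above, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for the example: the data is synthetic, the neighborhood labels "A", "B", "C", the premiums, and the single `area` feature are made up, and the helper names `design_matrix` and `predict` are hypothetical. It one-hot encodes the neighborhood and fits an ordinary least-squares regression, so each neighborhood gets its own price offset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not real Mumbai prices): three neighborhoods,
# each with a made-up premium added on top of an area-driven price.
neighborhoods = np.array(["A", "B", "C"])
premium = {"A": 0.0, "B": 50.0, "C": 120.0}
n = 300
hood = rng.choice(neighborhoods, size=n)
area = rng.uniform(30, 150, size=n)
price = (20.0 + 1.5 * area
         + np.array([premium[h] for h in hood])
         + rng.normal(0, 5, size=n))


def design_matrix(area, hood):
    """Columns: intercept, area, one dummy per neighborhood except the first.

    The first neighborhood ("A") is the reference level, so the dummy
    coefficients are premiums relative to "A".
    """
    dummies = [(hood == h).astype(float) for h in neighborhoods[1:]]
    return np.column_stack([np.ones_like(area), area, *dummies])


# Ordinary least squares on the one-hot encoded design matrix.
X = design_matrix(area, hood)
coef, *_ = np.linalg.lstsq(X, price, rcond=None)


def predict(area, hood):
    """Predict prices for new houses given their area and neighborhood label."""
    return design_matrix(np.asarray(area, dtype=float), np.asarray(hood)) @ coef


print("intercept, area slope, premium B, premium C:", np.round(coef, 2))
print("predicted price, 100 units of area in C:", predict([100.0], ["C"]))
```

With real data, the same idea applies with more features (rooms, lot size, age), and a high-cardinality neighborhood column may need grouping of rare levels.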
Hi @ai_curious, my idea was that it sounds logical to create clusters first and then fit a decision tree for each cluster, because the elements within a cluster should have a similar distribution, so each cluster could be easier to model than, say, the entire city. But yes, one could just apply a tree-based model (decision tree, random forest, etc.) directly without clustering, or some other appropriate ML algorithm. I think trees do well on this kind of data.
It's just my opinion, really.
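To make the two options discussed above concrete, here is a hedged sketch using scikit-learn. The data is synthetic and every specific is an assumption for illustration: three made-up localities a few km apart, a single `area` feature, three clusters, and the chosen tree depths. It compares (1) k-means clustering on location followed by one regression tree per cluster against (2) a single regression tree that receives the location as features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (not real Mumbai prices): three localities a few
# km apart, each with a made-up base price level.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # km offsets
base = np.array([100.0, 200.0, 350.0])
n = 600
which = rng.integers(0, 3, size=n)
coords = centers[which] + rng.normal(0, 0.5, size=(n, 2))
area = rng.uniform(30, 150, size=n)
price = base[which] + 1.5 * area + rng.normal(0, 10, size=n)

X = np.column_stack([coords, area])  # columns: x_km, y_km, area
X_tr, X_te, y_tr, y_te = train_test_split(
    X, price, test_size=0.25, random_state=0
)

# Option 1: cluster on location only, then fit one regression tree per cluster
# using the non-location features.
n_clusters = 3
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_tr[:, :2])
trees = {}
for c in range(n_clusters):
    mask = km.labels_ == c
    tree = DecisionTreeRegressor(max_depth=4, random_state=0)
    trees[c] = tree.fit(X_tr[mask][:, 2:], y_tr[mask])

# Route each test house to the tree of its nearest cluster.
te_labels = km.predict(X_te[:, :2])
pred_clustered = np.empty(len(X_te))
for c in range(n_clusters):
    mask = te_labels == c
    if mask.any():
        pred_clustered[mask] = trees[c].predict(X_te[mask][:, 2:])

# Option 2: a single regression tree that sees location and area together.
single = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
pred_single = single.predict(X_te)

mae_clustered = mean_absolute_error(y_te, pred_clustered)
mae_single = mean_absolute_error(y_te, pred_single)
print(f"per-cluster trees MAE: {mae_clustered:.1f}")
print(f"single tree MAE:       {mae_single:.1f}")
```

Which option wins depends on the data, so it is worth comparing both on a held-out set (or with cross-validation) rather than assuming that clustering helps.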
Considering the variation in house prices across Mumbai’s neighborhoods, applying mixed models sounds like a very good approach. These models account for both attributes whose effect is the same everywhere (fixed effects) and neighborhood-specific influences (random effects), giving a fuller picture of how house features and location combine to affect prices. This strategy can improve predictive accuracy and also provides insight into pricing dynamics within different areas.
Best regards
elirod
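For reference, the simplest form of the mixed model suggested above is a random-intercept model. The notation here is an assumption chosen for illustration, not taken from the post: y_ij is the price of house i in neighborhood j, x_ij is its vector of features (rooms, area, age, and so on), and u_j is the neighborhood-specific deviation from the city-wide intercept.

```latex
y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{ij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here \beta_0 and \boldsymbol{\beta} are the fixed effects shared by all neighborhoods, and u_j is the random effect. Because the u_j are modeled as draws from a common distribution, neighborhoods with few sales are pulled toward the city-wide average instead of getting a noisy estimate of their own.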