Continuing the discussion from Linear Regression Model implementation:
Hello again,
I implemented a Multiple Linear Regression model from scratch, and it worked fine. I compared my results with the scikit-learn implementation. At first there were large deviations in cost and weights, but when I changed the learning rate, scikit-learn's SGDRegressor worked fine. This time I was careful with the parameters of SGDRegressor so that it also implements batch gradient descent. These are the parameters I used (with the help of LLMs; it is not my own parameter initialization):
model = SGDRegressor(max_iter=3000, learning_rate='constant', eta0=0.001, penalty=None, warm_start=True)
What the LLM suggested was to use partial_fit instead of the normal fit so that batch gradient descent is implemented in SGDRegressor. Was this necessary?
What I observed is that my implementation converged perfectly for a learning rate of 0.01, but SGDRegressor gave large deviations in cost and weights. To be specific, the final cost from my implementation was around 5, while for SGDRegressor it was around 10^22, so clearly the weights blew up. So I reduced the learning rate for SGDRegressor alone, and with a learning rate of 0.001 it gave the same result, running for the same number of iterations as my implementation. But when I changed the learning rate of my implementation to 0.001, my implementation blew up for the same number of iterations. Why did this happen?
In short, my implementation converged for a larger value of alpha, while SGDRegressor converged only for a smaller one. Also, increasing the number of iterations did not help SGDRegressor.
Note: the dataset used was different from the one used for Simple Linear Regression.
Hi @Saat
Using partial_fit with SGDRegressor is not necessary to implement batch gradient descent. The regular fit method already processes the entire dataset you pass it in a single call, making repeated passes over it (and shuffle=False keeps the sample order fixed), whereas partial_fit is meant for incremental or online learning where data arrives in small chunks.
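For illustration, here is a minimal sketch of the two calling patterns, reusing the hyperparameters you posted; the toy data below is made up and only stands in for your actual dataset:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Toy data purely for illustration (hypothetical, not the original dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Single call to fit: the whole dataset is passed at once
model = SGDRegressor(max_iter=3000, learning_rate='constant', eta0=0.001,
                     penalty=None, shuffle=False)
model.fit(X, y)

# partial_fit: meant for incremental/online learning, one chunk at a time
online = SGDRegressor(learning_rate='constant', eta0=0.001, penalty=None)
for idx in np.array_split(np.arange(len(X)), 10):
    online.partial_fit(X[idx], y[idx])
```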
I think your code does true batch gradient descent (the gradient is computed over all the data at once), which tends to be more stable and can tolerate larger learning rates. SGDRegressor, on the other hand, by default performs stochastic updates with data shuffling. Also, increasing the number of iterations won't help if the learning rate is too large, since the weights keep exploding.
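For reference, a generic full-batch gradient descent loop for linear regression looks roughly like this; it is a sketch of the usual textbook update, not your exact implementation:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=3000):
    """Full-batch gradient descent for linear regression with MSE cost."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        err = X @ w + b - y           # residuals over ALL m examples
        w -= alpha * (X.T @ err) / m  # one weight update per full pass
        b -= alpha * err.mean()
    return w, b
```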
Hope it helps! Feel free to ask if you need further assistance.
Thanks for the reply! It was really helpful. So what I understand is that stochastic gradient descent updates the weights after each training example, while batch gradient descent updates them after iterating over all the training examples, right? Also, is there any module in scikit-learn that implements batch gradient descent?
Hi @Saat,
Yes, exactly! In stochastic gradient descent, weights are updated after each training example, while in batch gradient descent, updates happen after computing the gradient over the entire dataset.
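To make the contrast concrete, here is a sketch of a per-example (stochastic) update loop; the only structural difference from the batch loop above is that the weights change inside the inner loop, after every single example. This is a generic illustration, not scikit-learn's internal code:

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.001, n_epochs=100, shuffle=True, seed=0):
    """Stochastic gradient descent: one weight update per training example."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_epochs):
        order = rng.permutation(m) if shuffle else np.arange(m)
        for i in order:
            err = X[i] @ w + b - y[i]  # error on a single example
            w -= alpha * err * X[i]    # immediate update
            b -= alpha * err
    return w, b
```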
As for scikit-learn, it doesn't have a dedicated batch gradient descent module. However, models like LinearRegression use a closed-form solution (the Normal Equation), which is equivalent to the result of batch gradient descent at convergence. For iterative training with full-batch updates, you'd need to implement it manually or use libraries like TensorFlow or PyTorch, where you can fully control the training loop and gradient updates.
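For example, you can verify that LinearRegression agrees with the closed-form Normal Equation solution; the toy data below is only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 2.0 + 0.05 * rng.normal(size=100)

# Normal Equation: w = (X_b^T X_b)^{-1} X_b^T y, with a bias column added
X_b = np.hstack([np.ones((len(X), 1)), X])
w_closed = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# scikit-learn's LinearRegression solves the same least-squares problem
lr = LinearRegression().fit(X, y)
print(w_closed)                 # [intercept, w1, w2]
print(lr.intercept_, lr.coef_)  # should match closely
```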
Hope it helps! Feel free to ask if you need further assistance.