Scikit-learn vs TensorFlow

Hi everybody,
now that I have completed the first course in the Machine Learning Specialization and am halfway through the second one, I am eager to get some hands-on experience with machine learning by attempting to solve problems in some of the training datasets on Kaggle.

By now, I know that pandas and NumPy are must-have libraries to learn, but I am a little undecided whether to use TensorFlow or scikit-learn for modelling purposes.

I have gone through discussions on Stack Overflow and other blogs about the differences between the two, and everybody seems to suggest that beginners should start with the easier-to-implement library, scikit-learn, rather than TensorFlow. Actually, Course 1 also used scikit-learn for implementing linear regression and logistic regression models. Then in Course 2, TensorFlow is used, as it introduces artificial neural network architectures.

So, considering that I may also study or work on problems best addressed with deep learning, should I skip learning scikit-learn and use TensorFlow even for simple ML models? Or should I first learn the easier-to-implement scikit-learn library?

I feel I should stick with TensorFlow from the very start, yet I wanted to get the community's opinion before moving on to the next step in my data science journey.
Thank you in advance for your response.

Mehmet Deniz


Hi there,

congrats on completing the course. Well done!

Regarding your next steps, I would suggest considering what challenges you are going to solve in the future, and then working towards the methods and tools to solve them:

  • Would you rather use classic ML models to tackle your challenges with classic data science, including feature engineering to incorporate your domain knowledge? Then scikit-learn is great for exploring and practicing these methods (see the short sketch after this list).
  • On the other hand, if you know that you will work with highly unstructured and high-dimensional data, e.g. in computer vision or large language models, and deep learning seems to be the right tool for your challenges, then TensorFlow or PyTorch can be the next step to focus on.
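
To make the contrast concrete, here is a minimal sketch of the first route, classic ML with explicit feature engineering in scikit-learn (the toy data and pipeline steps are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy tabular data standing in for a real Kaggle-style dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature engineering and the model live in one pipeline: interaction terms
# act as a stand-in for domain knowledge before a classic linear classifier.
clf = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```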

Here is a thread that might be of interest with respect to classic ML and DL: Do traditional algorithms perform better than CNN? - #2 by Christian_Simonis

Long story short: I would suggest anticipating which method would be of greater benefit to you, based on your future challenges and the industry you work in or the field you want to join. You can then answer the library question as the next step.

Hope that helps!

Best regards
Christian


Hello @mehmet_baki_deniz,

I think Sklearn doesn’t overlap much with TensorFlow in the landscape of machine learning.

Sklearn comes with many handy utilities for data preprocessing and model selection, plus algorithms (such as clustering) that TensorFlow doesn’t have. TensorFlow has a maths library and is geared towards neural network learning, and of course, as a framework, it also comes with some data preprocessing tools for completeness. Let me bring XGBoost (or LightGBM) into this discussion, because they have proven to work well in many situations, and sometimes it is easier to build a better model with them than with a neural network.

Which one would I suggest you learn first? Sklearn? TensorFlow? Or XGBoost? I would go for Sklearn and XGBoost first, unless I were dealing with image or language data, as Christian explained. I don’t necessarily want to use the modelling algorithms offered by Sklearn, but rather its data preprocessing and model selection tools. With those tools, I get my dataset ready for feeding into an XGBoost model. Also, I might use clustering algorithms or dimensionality reduction tools in Sklearn to help me visualize and understand the data. Since you also mentioned Pandas: I use Pandas heavily in the data inspection stage. I would use whatever capability I need in my work from those packages.
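
As an illustration of that workflow, here is a minimal sketch, assuming xgboost is installed and using a tiny made-up DataFrame in place of a real Kaggle dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

# Tiny DataFrame standing in for a dataset you would normally load with pd.read_csv.
df = pd.DataFrame({"rooms": [2, 3, 4, 3, 5, 2, 4, 3],
                   "area":  [50, 70, 90, 65, 120, 45, 95, 72],
                   "price": [150, 200, 260, 190, 340, 140, 275, 205]})
X, y = df[["rooms", "area"]], df["price"]

# Sklearn handles the train/test split and the preprocessing...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_train)

# ...and the prepared data is fed into an XGBoost model.
model = XGBRegressor(n_estimators=100, max_depth=3)
model.fit(scaler.transform(X_train), y_train)
print(model.predict(scaler.transform(X_test)))
```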

After I build an XGBoost model (and perhaps some baseline models, like a linear regression, for comparison), I might or might not go for TensorFlow to build a neural network for further comparison. My ordering is therefore Sklearn, XGBoost, and then TensorFlow. However, I don’t learn Sklearn from 0 to 100%; I just take whatever I need. Also, as I said, I would skip XGBoost if image or language data was all I had in mind.
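
Continuing the same idea, a baseline comparison could look like this (synthetic data, purely for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A plain linear regression as the baseline, XGBoost as the candidate model.
models = {"linear baseline": LinearRegression(),
          "xgboost": XGBRegressor(n_estimators=200, max_depth=3)}

for name, m in models.items():
    m.fit(X_train, y_train)
    print(name, "MAE:", round(mean_absolute_error(y_test, m.predict(X_test)), 1))

# Only if a neural network seems worth the extra effort would I then build one
# in TensorFlow and compare it against these numbers.
```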

Happy new year, and cheers,
Raymond


Many thanks to both of you for your responses.


Another dimension that wasn’t emphasized in the excellent responses above: scale. TensorFlow was created with extremely large datasets in mind and has native support both for datasets too large to fit in memory and for widely parallel training deployment topologies. I’m no expert on it, but my understanding is that neither XGBoost nor scikit-learn offers the same level of support. If you can envision that being important, it might influence your choice; at the very least it is worth a deeper dive to rule out. HTH
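
To give a flavour of that native support, here is a minimal sketch using tf.data and a distribution strategy (the file pattern is hypothetical, the model is a placeholder, and the record-parsing step is omitted for brevity):

```python
import tensorflow as tf

# Stream examples from files on disk instead of loading everything into memory.
# "data/train-*.tfrecord" is a made-up file pattern; a real pipeline would also
# .map() a parsing function over the serialized records before training.
files = tf.data.Dataset.list_files("data/train-*.tfrecord")
dataset = (tf.data.TFRecordDataset(files)
           .shuffle(10_000)
           .batch(256)
           .prefetch(tf.data.AUTOTUNE))

# Replicate training across all GPUs visible on the machine.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(dataset, epochs=10)  # once the parsing/decoding step is added
```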


Very good point made by @ai_curious. Thanks!

In addition, with respect to deep learning frameworks: PyTorch is also designed to handle very large datasets and is quite flexible due to its dynamic computational graph (which many researchers appreciate a lot), compared to the "define-and-run" approach of the static computational graph definition in TensorFlow 1.x (TensorFlow 2 defaults to eager execution, which narrows this gap).
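
A tiny sketch of what that dynamic graph means in practice (the architecture is made up purely for illustration):

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 16)
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        # Ordinary Python control flow shapes the graph at run time:
        # the number of hidden steps can differ on every forward pass.
        for _ in range(int(torch.randint(1, 4, (1,)))):
            x = torch.relu(self.layer(x))
        return self.head(x)

model = DynamicNet()
out = model(torch.randn(8, 16))  # the graph is built on the fly during this call
print(out.shape)
```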

Best regards
Christian