Key Attributes of a Successful Data Scientist

mamba824 · August 28, 2024, 2:52pm

Hello everyone!
I am a newbie and have recently started learning ML. I was curious to know what attributes a successful data scientist possesses. (I am asking because I recently tried a Kaggle challenge: implementing the ML model (regression) was quite easy, but a lot of time went into data preparation and only a few minutes into running the model itself. Is it like that for everyone, or am I focusing too much on the data preparation part?)

carlosrl · August 28, 2024, 9:26pm

Yes, data preparation is one of the key parts of the entire process. If you don’t have a good dataset, it can lead to bad results, such as poor predictions. In that case, you will need to go back to data analysis and improve your dataset. In your case, since you got good results, it means you are on the right track with how you managed your dataset.
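A minimal sketch of how the effort often splits in a small regression task: most of the code goes into cleaning the data, while the model fit is a few lines. The columns (size vs. price), the sample rows, and the cleaning rules (drop missing targets, drop impossible values, median-impute missing features) are hypothetical assumptions chosen for illustration, not a recipe from this thread. Only the Python standard library is used so the sketch runs as-is.

```python
import statistics

# Hypothetical raw rows, e.g. parsed from a competition CSV: (size_m2, price).
# None marks a missing value; the negative size is a likely data-entry error.
RAW_ROWS = [
    (50.0, 150.0),
    (60.0, 180.0),
    (None, 200.0),   # missing feature
    (80.0, 240.0),
    (70.0, None),    # missing target
    (-1.0, 999.0),   # impossible size
    (90.0, 270.0),
]


def prepare(rows):
    """Data preparation: the part that usually takes the most time."""
    # A row without a target cannot be used for supervised training.
    rows = [(x, y) for x, y in rows if y is not None]
    # Drop physically impossible feature values (keep missing ones for now).
    rows = [(x, y) for x, y in rows if x is None or x > 0]
    # Impute missing features with the median of the observed values.
    median_x = statistics.median(x for x, _ in rows if x is not None)
    return [(median_x if x is None else x, y) for x, y in rows]


def fit_line(rows):
    """The 'model': ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = prepare(RAW_ROWS)
slope, intercept = fit_line(rows)
print(slope, intercept)  # → 3.0 -2.0
```

The four complete rows lie exactly on price = 3 * size, but the median-imputed row (size 70, price 200) pulls the intercept to -2, a small illustration of how preparation choices feed straight into the model. In practice, libraries such as pandas and scikit-learn would handle most of these steps.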
Keep learning!

Rorisang · August 30, 2024, 6:40am

Key attributes of a data scientist:

curiosity,
mathematical aptitude,
coding competency,
creativity.

mamba824 · August 31, 2024, 9:21am

Hi @carlosrl ,
Appreciate the support :D, thank you for clearing that up.

mamba824 · August 31, 2024, 9:21am

Gotcha! Thanks a ton @Rorisang , appreciate it.

mamba824 · August 31, 2024, 9:25am

@carlosrl @Rorisang I was curious to know how the career paths differ for a Data Scientist as opposed to an ML Developer. Which is the better option in today’s world? (I know it’s subjective, but I would like to know your POVs.)

Rorisang · August 31, 2024, 9:54am

Hi, @mamba824
The tools for the two roles are mainly the same. You definitely will need Python coding. Master Python to the point where you are able to ingest, transform and use the data to get insights. This will be the data science part. This then extends/feeds the ML (that is part of the ‘use the data to get insights’). As an ML developer you will be training ML models to automate tasks such as image classification, forecasting, etc. You can go the ML developer route exclusively, but that will leave you with big blind spots. You need to be able to curate your own data so that it fits the purposes of building the ML apps. It is very rare in the real world to get data that is fit for purpose and ready to be used to train ML models. I would suggest starting with data science; the ML part will be ‘automatically’ included there to some extent, and you will build ML apps under data science. The advantage is that you will have the expertise of ETL’ing your own data and judging whether it will meet your model’s needs. You don’t want your ML products to be held back because you cannot source, clean and prepare your own data.
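A minimal sketch of what ETL’ing your own data and judging whether it fits the model’s needs can look like, using only the Python standard library. The CSV columns (`id`, `temperature_c`, `label`), the cleaning rules, and the fitness thresholds (`min_rows`, `max_class_share`) are hypothetical assumptions for illustration. The load step is left out; in practice it would write the cleaned records to whatever storage your project uses.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export: inconsistent label casing, stray whitespace,
# and one row whose numeric field cannot be parsed.
RAW_CSV = """id,temperature_c,label
1, 21.5 ,Sunny
2,19.0,sunny
3,n/a,Rainy
4,15.2, rainy
5,22.8,SUNNY
"""


def extract(text):
    """Extract: read the raw CSV into a list of dicts of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: strip whitespace, normalise label casing, convert types,
    and drop rows whose temperature cannot be parsed."""
    clean = []
    for rec in records:
        try:
            temp = float(rec["temperature_c"].strip())
        except ValueError:
            continue  # e.g. "n/a"; this sketch drops the row instead of imputing
        clean.append({
            "id": int(rec["id"]),
            "temperature_c": temp,
            "label": rec["label"].strip().lower(),
        })
    return clean


def fit_for_purpose(records, min_rows=3, max_class_share=0.8):
    """Check the cleaned data against (hypothetical) model needs:
    enough rows, and no single class dominating the labels."""
    if len(records) < min_rows:
        return False
    counts = Counter(r["label"] for r in records)
    return max(counts.values()) / len(records) <= max_class_share


rows = transform(extract(RAW_CSV))
print(len(rows), fit_for_purpose(rows))  # → 4 True
```

Keeping the fitness check as its own function makes the "will this data meet my model’s needs?" question explicit, so it can be tightened as the project’s requirements become clearer.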

Rorisang · August 31, 2024, 9:55am

As they say, ‘garbage in, garbage out’. Data wrangling is the foundation, as @carlosrl is pointing out.

mamba824 · September 1, 2024, 1:05pm

Hi @Rorisang ,
This is the best answer I’ve gotten so far; it gives me more clarity on how to go about things. You have really helped me, and I appreciate it.
Thank you so much

Rorisang · September 1, 2024, 5:55pm

A pleasure, @mamba824 .

vishal_sivakumar · September 4, 2024, 3:44am

I’m not an expert in ML, but I built a few projects at my university, and I’ll share the things nobody told me when I was working on them.
If you just count the models or methods available for classification tasks alone, there are well over 50, so ML has reached a point where you have plenty of tools at hand and you want to get the best out of them…

1. Spend time finding the best pipeline for the task at hand with the help of research papers; depending on the task you’re working on, these papers could also be older ones.
2. Once you have a general approach, i.e. the pipeline from your research in step 1, choose the best practices for each component of that pipeline, again through research. For example, if you’re doing feature selection and the model you want to use after step 1 is an SVM, look for research papers along the lines of “optimal feature selection for SVM”.
3. Dive deep into the selected approach and try to customize it to optimize further for your application at hand…
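The steps above treat the workflow as a pipeline whose components can be researched and replaced one at a time. Below is a dependency-free Python sketch of that idea. The class names `VarianceSelector`, `NearestCentroid`, and `Pipeline` are hypothetical illustrations that loosely mirror scikit-learn’s fit/transform/predict convention, not its actual API. A nearest-centroid classifier stands in for the SVM mentioned above, and a simple variance filter stands in for a properly researched feature-selection method, purely so the example runs with the standard library.

```python
import statistics


class VarianceSelector:
    """Feature-selection stage (hypothetical choice): keep the k highest-variance features."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        # zip(*X) iterates over columns; pvariance is the population variance.
        variances = [statistics.pvariance(col) for col in zip(*X)]
        ranked = sorted(range(len(variances)), key=variances.__getitem__, reverse=True)
        self.keep = sorted(ranked[: self.k])  # preserve original column order
        return self

    def transform(self, X):
        return [[row[j] for j in self.keep] for row in X]


class NearestCentroid:
    """Model stage: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [statistics.fmean(col) for col in zip(*members)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab])) for x in X]


class Pipeline:
    """Transformers followed by a final model; each stage can be swapped independently."""

    def __init__(self, *steps):
        self.transformers = list(steps[:-1])
        self.model = steps[-1]

    def fit(self, X, y):
        for step in self.transformers:
            X = step.fit(X, y).transform(X)
        self.model.fit(X, y)
        return self

    def predict(self, X):
        for step in self.transformers:
            X = step.transform(X)
        return self.model.predict(X)


# Toy data: feature 0 separates the classes; features 1 and 2 are low-variance noise.
X_train = [
    [1.0, 5.0, 0.1], [1.2, 5.1, 0.0], [0.9, 4.9, 0.1],
    [8.0, 5.0, 0.0], [8.3, 5.1, 0.1], [7.9, 4.9, 0.0],
]
y_train = ["a", "a", "a", "b", "b", "b"]

pipe = Pipeline(VarianceSelector(k=1), NearestCentroid()).fit(X_train, y_train)
print(pipe.predict([[1.1, 5.0, 0.0], [7.5, 4.9, 0.1]]))  # → ['a', 'b']
```

Swapping the classifier for a real SVM, or the variance filter for a supervised selector, changes one constructor argument and nothing else, which is what makes component-by-component research practical. Note that a variance filter ignores the labels, so it would be a poor choice if the noisy features happened to have the largest variance.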

mamba824 · September 9, 2024, 3:52pm

Thank you so much @vishal_sivakumar for such an elaborate response. This is something I will try out; it looks like a fine approach!
Appreciate it.
