Key Attributes of a Successful Data Scientist

Hello everyone!
I am a newbie and have recently started learning ML. I was curious to know what attributes a successful data scientist possesses. (I am asking because I recently tried a Kaggle challenge. Implementing the ML model (regression) was quite easy, but most of my time went into data preparation and only a few minutes into running the model. Is it like that for everyone, or am I focusing too much on data preparation?)

Yes, data preparation is one of the key parts of the entire process. A poor dataset leads to poor results, such as bad predictions, and when that happens you need to go back to data analysis and improve the dataset. In your case, the good results suggest that you handled your dataset the right way.
Keep learning!

Key attributes of a data scientist:

  • curiosity,
  • mathematical aptitude,
  • coding competency,
  • creativity.

Hi @carlosrl ,
Appreciate the support :D, thank you for clearing that up.

Gotcha! Thanks a ton @Rorisang , appreciate it.

@carlosrl @Rorisang I was curious to know how the career paths differ for a Data Scientist as opposed to an ML Developer. Which is the better option in today’s world? (I know it’s subjective, but I would like to hear your points of view.)

Hi, @mamba824
The tools for the two roles are mainly the same. You will definitely need Python. Master it to the point where you can ingest, transform and use data to get insights. That is the data science part, and it feeds the ML part (ML is one way to ‘use the data to get insights’). As an ML developer you will be training ML models to automate tasks such as image classification, forecasting, etc. You can go the ML developer route exclusively, but that will leave you with big blind spots. You need to be able to curate your own data so that it fits the purpose of the ML apps you are building. It is very rare in the real world to get data that is fit for purpose and ready to be used to train ML models. I would suggest starting with data science; the ML part will be ‘automatically’ included there to some extent, because you will build ML apps as part of data science. The advantage is that you will have the expertise to ETL your own data and judge whether it meets your model’s needs. You don’t want your ML products to be held back because you cannot source, clean and prepare your own data.

Like they say, ‘garbage in, garbage out’. Data wrangling is the foundation, as @carlosrl is pointing out.
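
To make the point about preparation versus modelling concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the records in RAW_ROWS, the field names area_sqm and price, and the helper names to_float, prepare and fit_line are all hypothetical. It is not taken from any real dataset or Kaggle challenge. Most of the code deals with parsing, missing values and duplicates; the regression itself is only a few lines.

```python
# Hypothetical raw records, as they might arrive from a CSV export.
RAW_ROWS = [
    {"area_sqm": "50", "price": "150"},
    {"area_sqm": " 80 ", "price": "240"},   # stray whitespace
    {"area_sqm": "", "price": "200"},       # missing feature
    {"area_sqm": "100", "price": "n/a"},    # unparseable target
    {"area_sqm": "50", "price": "150"},     # exact duplicate
    {"area_sqm": "120", "price": "360"},
]


def to_float(text):
    """Return the value as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def prepare(rows):
    """Data preparation: parse fields, drop incomplete rows and duplicates."""
    seen = set()
    clean = []
    for row in rows:
        pair = (to_float(row["area_sqm"]), to_float(row["price"]))
        if None in pair or pair in seen:
            continue
        seen.add(pair)
        clean.append(pair)
    return clean


def fit_line(points):
    """Modelling: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


clean_rows = prepare(RAW_ROWS)
slope, intercept = fit_line(clean_rows)
print(f"kept {len(clean_rows)} of {len(RAW_ROWS)} rows")
print(f"price = {slope:.2f} * area_sqm + {intercept:.2f}")
```

Only three of the six rows survive the cleaning step here. In a real project the preparation part usually grows much faster than the modelling part, which matches the experience described at the top of the thread.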

Hi @Rorisang ,
This is the best answer I’ve gotten so far; it gives me more clarity on how to go about things. You have really helped me, and I appreciate it.
Thank you so much

A pleasure, @mamba824 .

I’m not an expert in ML, but I built a few projects at my university, and I’ll share the things nobody told me while I was working on them.
If you just count the models or methods available for classification tasks alone, there may well be more than 50. So, given that ML has reached a point where you have plenty of options at hand and you want to get the best out of them…

  1. Spend time finding the best pipeline for the task at hand with the help of research papers. Depending on the task, the relevant papers may be old ones.
  2. Once step 1 has given you a general approach (the pipeline), use research again to choose the best practice for each component in that pipeline. For example, if you are doing feature selection and the model you chose in step 1 is an SVM, look for research papers along the lines of “optimal feature selection for SVM”.
  3. Dive deep into the selected approach and try to customize it to optimize further for your application…
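
As a rough illustration of step 2, the sketch below scores every feature subset for one fixed classifier and keeps the best one. It rests on several assumptions: the DATA values are a made-up toy set, the helper names centroid, predict and loo_accuracy are hypothetical, and a nearest-centroid classifier is swapped in for the SVM so the example runs with the Python standard library alone. With a real SVM you would plug your library's model into the same loop.

```python
from itertools import combinations

# Hypothetical toy data: (features, label). Feature 0 separates the classes,
# feature 1 is noise, feature 2 is informative but on a small scale.
DATA = [
    ((1.0, 9.0, 0.2), 0), ((1.2, 1.0, 0.1), 0),
    ((0.8, 5.0, 0.3), 0), ((1.1, 8.0, 0.2), 0),
    ((5.0, 2.0, 0.8), 1), ((5.2, 9.0, 0.9), 1),
    ((4.8, 4.0, 0.7), 1), ((5.1, 1.0, 0.8), 1),
]


def centroid(rows, cols):
    """Mean of the selected feature columns over the given rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]


def predict(train, cols, x):
    """Nearest-centroid classifier, a simple stand-in for the SVM."""
    best_label, best_dist = None, float("inf")
    for label in sorted({y for _, y in train}):
        rows = [f for f, y in train if y == label]
        centre = centroid(rows, cols)
        dist = sum((x[c] - m) ** 2 for c, m in zip(cols, centre))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(data, cols):
    """Leave-one-out accuracy using only the feature columns in cols."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, cols, x) == y
    return hits / len(data)


# Score every non-empty feature subset and keep the best one.
n_features = len(DATA[0][0])
scores = {
    cols: loo_accuracy(DATA, cols)
    for k in range(1, n_features + 1)
    for cols in combinations(range(n_features), k)
}
best_cols = max(scores, key=scores.get)
best_score = scores[best_cols]
print(f"best feature subset: {best_cols}, accuracy: {best_score:.2f}")
```

Exhaustive search like this only works for a handful of features. For wider datasets, the papers you find in step 2 will usually point to cheaper selection methods suited to the model you picked.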

Thank you so much @vishal_sivakumar for such a detailed response. This is something that I will try out; it looks like a fine approach!
Appreciate it.