Need advice on Exploratory Data Analysis

Hi everyone,

I recently completed the first two courses in the Machine Learning Specialization on Coursera and have been trying to apply what I’ve learned by participating in beginner-level competitions on Kaggle and exploring some DrivenData challenges.

However, I noticed that I’m struggling when it comes to effectively cleaning the data and drawing meaningful insights. Specifically, I find it difficult to decide which features to include or exclude based on their relevance to the target variable — something I’ve seen other participants do quite well.

I’m wondering if it would be beneficial to first take a course on Exploratory Data Analysis (EDA) or a similar topic to strengthen my understanding. If so, I’d be grateful if you could recommend any good resources.

On the other hand, it’s also possible that I may have chosen datasets that aren’t very beginner-friendly. If that’s the case, I’d really appreciate any suggestions for simpler datasets that are well-suited for someone at my level.

Thank you so much in advance for your help!

Unless you have a very good reason not to, start by including all the data.
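
As a rough illustration of that advice, here is a minimal sketch, assuming pandas and scikit-learn are installed, that keeps every column and scores each one against the target before deciding anything. The data is synthetic and the column names (`signal`, `noise_a`, `noise_b`) are made up for the example; mutual information is just one possible relevance score, not the only way to do this.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: one column drives the binary target, two are pure noise.
signal = rng.normal(size=n)
target = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

features = pd.DataFrame({
    "signal": signal,
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})

# Keep all columns and estimate how much each one tells us about the target.
scores = mutual_info_classif(features, target, random_state=0)
ranking = pd.Series(scores, index=features.columns).sort_values(ascending=False)
print(ranking)
```

Low-scoring columns are candidates for a closer look rather than automatic drops, since a feature can matter only in combination with others.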

Hi @Karthik14, this is a great question, and I relate to your experience. Kaggle can be an intimidating place, and many times I didn’t understand the code or the solutions either. Keep in mind that this code (at least the top solutions) is written by the top few percent of data scientists and machine learning engineers, so it can be difficult to follow.

What I did was learn more Python. A weak grasp of Python often leads to a poor understanding of the code and the solution. I also concluded that a broad understanding of every data field is what makes the difference. A medical student, for instance, studies everything from obstetrics to traumatology: whatever you want to specialize in, you first need to learn the broad aspects of medicine. The same applies here. Understanding data analytics, data engineering, data science, and machine learning engineering is what will make you stand out, no matter which specialized path you choose.

There is currently a specialization available for each of the fields I just mentioned. I haven’t taken the data analytics one, so I can’t comment on it.

I took the data analytics career path on Dataquest, and it was really useful for its incremental learning approach and practical exercises.

If you have any questions, feel free to reach out!

@Karthik14 Since you’re looking to improve your Exploratory Data Analysis (EDA) and feature selection skills, here are some great courses to check out:

Data Analysis with Python (IBM)

  • Covers EDA, data wrangling, and visualization with Pandas, Matplotlib, and Seaborn.
  • Hands-on projects with real datasets.

Feature Engineering for Machine Learning (Google Cloud)

  • Focuses on selecting, creating, and transforming features for better model performance.
  • Uses Python, TensorFlow, and Google Cloud tools.

For beginner-friendly datasets, try:

UCI Machine Learning Repository – Many simple datasets with clear feature-target relationships.

Hope this helps. :blush:

I also recommend that every beginner start with the Iris dataset from the UCI Machine Learning Repository, and also try PyCaret.
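
For anyone who wants to try this, below is a minimal sketch of a first EDA pass on Iris. It assumes pandas and scikit-learn are installed and uses the copy of Iris bundled with scikit-learn rather than a download from UCI. Treating the integer class label as a number for the correlation step is a rough shortcut for a first look, not a rigorous relevance test.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as a single pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

# Basic checks: size, missing values, and summary statistics.
print(df.shape)
print(df.isna().sum())
print(df.describe())

# How does each measurement differ between the classes?
print(df.groupby("target").mean())

# Rough relevance check: correlation of each feature with the class label.
feature_corr = df.corr()["target"].drop("target").sort_values(ascending=False)
print(feature_corr)
```

Features whose per-class means differ clearly, or that correlate strongly with the label, are the ones worth plotting first.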

Thank you so much for your thoughtful advice!

I really appreciate your suggestions and will definitely work on gaining knowledge in the areas you’ve highlighted. Also, thanks for recommending Dataquest. It looks like a fantastic resource, and I’m excited to give it a try.

Thank you so much for sharing these resources!

They align perfectly with what I’m looking for, and I’m excited to check them out. I really appreciate it. :blush:
