Need advice in Exploratory Data Analysis

Karthik14 · March 29, 2025, 11:12am

Hi everyone,

I recently completed the first two courses in the Machine Learning Specialization on Coursera and have been trying to apply what I’ve learned by participating in beginner-level competitions on Kaggle and exploring some DrivenData challenges.

However, I noticed that I’m struggling when it comes to effectively cleaning the data and drawing meaningful insights. Specifically, I find it difficult to decide which features to include or exclude based on their relevance to the target variable — something I’ve seen other participants do quite well.

I’m wondering if it would be beneficial to first take a course on Exploratory Data Analysis (EDA) or a similar topic to strengthen my understanding. If so, I’d be grateful if you could recommend any good resources.

On the other hand, it’s also possible that I may have chosen datasets that aren’t very beginner-friendly. If that’s the case, I’d really appreciate any suggestions for simpler datasets that are well-suited for someone at my level.

Thank you so much in advance for your help!

TMosh · March 29, 2025, 11:40am

Unless you have a very good reason not to, start by including all the data.
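As a rough illustration of what a "very good reason" to drop a column might look like, here is a minimal first-pass check with pandas. The toy data, the column names, and the `first_pass_report` helper are all made up for this sketch; they are not from any dataset mentioned in this thread.

```python
import pandas as pd

# Toy data invented for illustration; column names are hypothetical.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "income": [28.0, 52.0, None, 80.0, 41.0, 95.0],
    "country": ["IN", "IN", "IN", "IN", "IN", "IN"],  # same value in every row
    "target": [0, 0, 1, 1, 0, 1],
})


def first_pass_report(frame, target):
    """Summarize each feature: missing fraction, distinct values, correlation."""
    features = frame.drop(columns=[target])
    report = pd.DataFrame({
        "missing_frac": features.isna().mean(),
        "n_unique": features.nunique(),
    })
    # Pearson correlation only makes sense for numeric columns;
    # non-numeric columns are left as NaN in the report.
    numeric = features.select_dtypes("number")
    report["corr_with_target"] = numeric.corrwith(frame[target])
    return report


report = first_pass_report(df, "target")
print(report)

# A column with a single distinct value carries no information for a model.
drop_candidates = report.index[report["n_unique"] <= 1].tolist()
print(drop_candidates)
```

Everything else stays in for the first model. A low linear correlation on its own is usually not enough reason to drop a feature, since the relationship with the target may be non-linear.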

pastorsoto · March 29, 2025, 12:56pm

Hi @Karthik14, this is a great question, and I relate to your experience. Kaggle is an intimidating place, and many times, I didn’t understand the code or the solutions. Keep in mind that this code (at least the top solutions) is written by the top percent of data scientists and machine learning engineers, so it might be difficult to understand.

What I did was learn more Python. Often a weak grasp of Python leads to a poor understanding of the code and the solution. I also concluded that a broad understanding of every data field is what makes the difference. For instance, a medical student studies everything from obstetrics to traumatology: whatever you want to master, you first need to learn the broad aspects of medicine. The same applies here. Understanding data analytics, data engineering, data science, and machine learning engineering is what will make you stand out, no matter which specialized path you choose.

Currently, there is a specialization going on for all the fields I just mentioned. I haven’t taken the data analytics specialization, so I cannot comment on it.

I took the data analytics career path on Dataquest, and it was really useful for its incremental learning approach and practical exercises.

If you have any questions, feel free to reach out!

Prashant_Upadhyaya · March 29, 2025, 6:15pm

@Karthik14 Since you’re looking to improve your Exploratory Data Analysis (EDA) and feature selection skills, here are some great courses to check out:

Data Analysis with Python (IBM)

Covers EDA, data wrangling, and visualization with Pandas, Matplotlib, and Seaborn.
Hands-on projects with real datasets.

Feature Engineering for Machine Learning (Google Cloud)

Focuses on selecting, creating, and transforming features for better model performance.
Uses Python, TensorFlow, and Google Cloud tools.

Try beginner-friendly datasets from

UCI Machine Learning Repository – Many simple datasets with clear feature-target relationships.

Hope this helps you.

Prashant_Upadhyaya · March 29, 2025, 6:20pm

I also recommend that every beginner start with the Iris dataset from the UCI Machine Learning Repository, and try PyCaret as well.
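For anyone who wants to try this right away, here is a minimal first look at Iris. It is only a sketch, and it assumes scikit-learn's bundled copy of the dataset (`load_iris`) rather than a download from the UCI site; the `species` column is added here for readability.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of Iris that ships with scikit-learn as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

print(df.shape)         # number of rows and columns
print(df.isna().sum())  # missing values per column
print(df.describe())    # basic statistics for each numeric column

# Replace the integer labels with species names for easier reading.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Compare the average of each measurement across species: features whose
# means differ a lot between classes are likely to be useful predictors.
group_means = df.drop(columns="target").groupby("species").mean()
print(group_means)
```

The per-species means are a quick way to see which measurements separate the classes before training any model.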

Karthik14 · March 30, 2025, 5:37am

Thank you so much for your thoughtful advice!

I really appreciate your suggestions and will definitely work on gaining knowledge in the areas you’ve highlighted. Also, thanks for recommending Dataquest; it looks like a fantastic resource, and I’m excited to give it a try.

Karthik14 · March 30, 2025, 5:41am

Thank you so much for sharing these resources!

They align perfectly with what I’m looking for, and I’m excited to check them out. I really appreciate it.

Zain_Abbas1 · May 4, 2025, 11:24am

I am stuck on a similar problem. I completed the ML specialization and then the first course of the DL specialization. They are all wonderful, but now that I have started to practice what I learned, I feel a bit lost on how to deal with the raw data I got from Kaggle. By the way, I picked the most beginner-level dataset on Kaggle just for learning.

Can you share any updates on what approach you have taken and where you are at the moment?
Thanks

Karthik14 · May 4, 2025, 4:53pm

Thanks for sharing your experience—it’s reassuring to know that I’m not the only one navigating this learning curve!

I’ve been practicing on datasets from the UCI Machine Learning Repository. It has many simple datasets, with filtering options that help narrow down your search. Apart from that, I’ve looked through the contents of the Data Analysis with Python (IBM) course suggested by @Prashant_Upadhyaya, and it seems promising. I’ve decided to take this course in June, hoping it will help me learn some data preprocessing techniques.

Furthermore, I’ve used the “Deep Research” feature in ChatGPT to come up with a comprehensive roadmap that I’m planning to refer to for my practice.
The link to the chat → ChatGPT - ML Datasets Roadmap Guide

I’d love to hear more about how things go on your end if you try out any of the resources I mentioned. Please feel free to share your findings—maybe we can help each other fill in the gaps!

Karthik14 · May 12, 2025, 9:42am

By the way, I would really appreciate it if anyone could give me feedback on the dataset roadmap I generated: how relevant it is, and whether it needs any modifications.

Thank you
