Hello my fellow ambitious learners! Practice makes perfect!

Hi!

As a mentor, I constantly come across questions like, “Where can I practice the skills?” I agree that getting the certificate or checking off every course item won’t suddenly make us super confident in solving real-world problems, but at the end of the day, isn’t that confidence what we want? (Although I believe we should be a little confident already, because we need to be brave to take the first step - just as an engine needs that first spark to ignite.)

Additionally, because the courses are self-paced, some of you may complete them quickly while others may need more time to really take in the information. However, I believe that what we genuinely need both time and intensity for is the practice, rather than the classes. Why? Because it takes time for humans to develop muscle memory through experience. Just as you don’t need to “recall” how to brush your teeth, because it is ingrained in your muscle memory, you want to train your brain to handle data problems the same way.

We know that building it up requires time and extensive trial and error, so if you need some advice on how to go about it, this is the post to read.

  1. I would group the skills into four categories: Python / ML concepts / hands-on experience / debugging. We need to know what we are not good at in order to improve it, so keep those categories in mind.

  2. If you are still working through the courses and find it very difficult to move on, you may want to make a study plan. That is not the main idea of this post, but you may check this out.

  3. Find a challenge. These Getting-Started problems and these Playground problems are available on Kaggle. Spend 30 to 60 minutes looking through some of them and select 2-3 potential problems for yourself. I wouldn’t pick a time-series, image recognition, or natural language challenge, because those topics are not covered in the MLS. The best candidates are those with tabular, non-time-series datasets.

  4. Make a plan for one to two hours of work. List what you’re going to do with the data by consulting the course tables of contents, the labs, or your notes - for instance, loading the dataset, checking it out, visualizing a few things, and so on. You don’t need a precise plan to start with, because it should develop as you practice, but make an effort to include everything you learned in these classes, including:
    a. C1 W2 feature scaling and engineering
    b. C2 W3 model evaluation, bias and variance
    c. For the other course material, you must choose what to include because they use different modeling techniques that won’t always be applicable to any given scenario.
    The plan can also be used for revision and as a firm foundation for your future work, even though eventually you won’t need to rely on it because it will have ingrained itself in your muscle memory. (A rough sketch of what such a plan could look like in code follows this list.)

  5. Start coding. In my opinion, you should set up your own Jupyter environment. To find instructions, google something like “how to install Jupyter notebook in Windows 10”.

  6. Search on Google for bug fixes or for possible ways of doing things. For instance, “How to import a csv file in Python using Pandas” or “How to visualize data in Python”; or, if you run into a problem, “NameError: name x is not defined”. I agree that searching for an error message is simpler than searching for a solution, because the latter requires the right search terms. As a result, you might need a few more searches than usual to develop that sense.

  7. Run and evaluate your program against your plan. As you carry out the plan from step 4, you’ll need to learn how to code each step, which you can do by going back to the course material or by googling as in step 6. Because you did the research yourself, you may also find items that the plan overlooked. They may not be covered by the courses, but they can still be helpful - for example, novel ways to visualize data or fresh approaches to data preprocessing. If you think they are necessary, add some of them directly into your plan, or put others on a separate waiting list so they won’t interfere.

  8. Share your research with us. You might share your notebook with your findings (better to have it organized, explained, and free of any temporary items :wink:) so that we can all offer advice on how you might have done it better or differently. Please do share your discoveries, but remember to google “how to accomplish XYZ” and error fixes yourself first, as stated in step 6. Even if your method is not the most effective, what matters most are your discoveries and your approach. Your own work should make up 90% of it; others’ suggestions should be minimal.

  9. Repeat steps 3, 6, 7, and 8 to work on more and more problems.
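
To make step 4 (and steps 5-7) more concrete, here is a minimal sketch of what such a plan could look like once it is turned into code. It assumes a hypothetical tabular dataset saved as train.csv with a column named target - the file name, column names, and model are placeholders, so adjust everything to your own problem and treat this as a starting skeleton rather than a recipe.

```python
# A minimal practice-plan skeleton (hypothetical file and column names - adjust to your dataset).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load the dataset and take a first look
df = pd.read_csv("train.csv")            # placeholder path
print(df.shape)
print(df.head())
print(df.isna().sum())                   # where is data missing?

# 2. Very simple preprocessing: keep numeric columns, fill missing values
X = df.drop(columns=["target"]).select_dtypes("number").fillna(0)
y = df["target"]

# 3. Split into training / cross-validation / test sets (C2 W3)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# 4. Feature scaling (C1 W2) - fit the scaler on the training set only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_cv_s = scaler.transform(X_cv)
X_test_s = scaler.transform(X_test)

# 5. Fit a simple baseline model and compare train vs cv performance (bias / variance, C2 W3)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s, y_train)
print("train accuracy:", accuracy_score(y_train, model.predict(X_train_s)))
print("cv accuracy:   ", accuracy_score(y_cv, model.predict(X_cv_s)))
# Only touch the test set once you are happy with the cv result.
```

Each numbered comment maps back to an item in your written plan, so you can tick items off (and extend them) as you practice.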

Some advice/tips:

  1. It may take you days to solve each of your initial challenges, but don’t give up; days of practice are nothing compared to years of application. It takes time for a human to develop that muscle memory.

  2. Do more research and experimentation, as if you were already working autonomously. When you are stuck for ideas, search “How to XYZ”; if you are unsure whether an idea will work, try it. This is also the quickest way to learn the answer without having to wait for others.

  3. Aim harder and higher. Building the ideal model is a worthy goal, but keep in mind that we are also aiming for many other things besides the ideal model. You will come across methods for fine-tuning your model; feel free to get carried away with them if you like, but please remember that you always have other things to work on and that you still have a long way to go before you can actually create the best, most beneficial model using real-world data.

  4. Here are links to some official tutorials. If you enjoy searching for solutions yourself, they could be very boring ;).

Good luck everyone!
Raymond


Thank you Raymond for sharing your thoughts with us :grin:

No problem :wink:

To be honest, this approach is quite self-guided and probably won’t be easy for everyone at first. However, I believe it will also be very rewarding, because it is a process in which we need to think a lot and try a lot; as a result, I hope we will also learn a lot and develop that muscle memory and confidence.

Also, we can discuss our findings.

Raymond


This is beyond awesome Raymond!! Thank you!!!


Raymond,
Thank you so much for your thoughtful guidance on this matter. It answers my question, as I need actual practice to become more confident.


Hello Brigitte @GwadaDLT,

You are welcome! Good luck with your journey, and feel free to share any findings and maybe we can provide some different angles.

Cheers,
Raymond

Thank you for this comprehensive list. I will start doing Kaggle competitions now, starting with the Titanic one. I want to write my first entry myself, using logistic regression. Should I use scikit-learn for it? I’m not sure how I should approach this.

Hello @Clemens! I am glad to hear that you are making a move! Yes, please use scikit-learn and explore all the input arguments of its logistic regression class. If you have time, please also try to implement it with TensorFlow Keras (one input layer + one output layer with a sigmoid activation). I think these will help you become familiar with the existing tools. Also, with sklearn’s result as a baseline, you can practice tuning your TensorFlow neural network (such as its learning rate) with the objective of matching sklearn’s result. As a further step, you can expand the network by adding hidden layer(s).

Lastly, please remember to implement 4a and 4b from the first post of this topic. In particular, you will need to split your dataset into training / cv / test sets for 4b.
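
To make the idea concrete, here is a rough sketch of that comparison. The placeholder arrays are only there so the snippet runs on its own - replace them with your scaled Titanic features and your actual training / cv split; this is a starting point under those assumptions, not a complete Titanic solution.

```python
# Sketch: scikit-learn logistic regression as a baseline vs. the same model in Keras.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

# Placeholder data so the sketch is runnable on its own - swap in your real features/labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5)); y_train = (X_train[:, 0] > 0).astype(int)
X_cv    = rng.normal(size=(80, 5));  y_cv    = (X_cv[:, 0] > 0).astype(int)

# Baseline: scikit-learn logistic regression
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("sklearn cv accuracy:", baseline.score(X_cv, y_cv))

# The same model in Keras: a single Dense unit with a sigmoid activation
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),  # try tuning this
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(X_train, y_train, epochs=100, verbose=0)
print("keras cv accuracy:  ", model.evaluate(X_cv, y_cv, verbose=0)[1])

# Next step: add a hidden layer, e.g. Dense(8, activation="relu"), and compare the cv accuracy.
```

Once the two results roughly agree, you know your Keras setup is sound, and adding hidden layers becomes a controlled experiment rather than a guess.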

Good luck!
Raymond
