I am not familiar with Python so I am totally lost with the required loops, dictionaries, and match statements - basically I don’t know which values I need where or how to access the values I think I need. Also, in Exercise 5, in the case parts of the match statement, there are no "None"s indicated - but I clearly need to put something there, right?
For reference, I have managed to finish everything else in this entire course as it was primarily “math/stats” based and not as much coding based. The only coding I have done in the last 20 years is in VBA. I took this course to gain insight for my students (I teach Statistics and other math courses) in how these topics relate to Machine Learning. If there is anything you can suggest to help someone as clueless as myself get through these last exercises, I would greatly appreciate it.
I would suggest taking an online Python course; it is beneficial in many ways. There is a good one on Coursera, “Python for Everybody”, by the University of Michigan I think.
Hello @DebbieA.
This course seems to depend on an intermediate knowledge of Python. I think the use of classes and data frames is the challenge for you on this assignment. Exercise #5 uses both, and if you get through it you’ll finish the assignment (with some effort, but you’ll have the tools).
I recommend the basic “Python 3 programming” course from Michigan, which is offered on Coursera, if you want to get the tools.
I’ll try to give you some tips for all the Python tools you need in Ex #5, if you want to keep trying:
- Remember the breeds are numbered 0, 1, 2, so you can loop over them as a list, which in Python is written as [0, 1, 2].
You can iterate over a list with “for item in list_name” or “for item in […]”.
First you’ll iterate over the breeds list.
The second iteration, nested inside the first, is over the features list.
- The case statements are just equivalent to a series of “if - else if” conditionals. Each case is about a “feature”.
- I recommend using two statements for each “case” conditional, because it is easier to visualize. Let’s see each one separately:
4.1 First you read (mu, sigma) or (n, p) or (a, b), depending on the “case”. Assign them from the proper “estimate function” for each case; these are the functions you coded just before.
As the functions return two parameters, you can assign them this way:
x, y = estimate_function_name(data_frame_column)
The estimate function needs the correct “column” from the data frame “df_breed”. You can read a column using df_breed[column_name]. Pass it to the estimate function.
4.2 Now it’s time for the second statement, which is saving the parameters to a dataclass object. You can do it with this command:
dataclass_object = proper_params_function(x, y)
The proper_params_function is different for each case depending on its appropriate distribution (gaussian, binomial or uniform).
- Finally, you create the dictionary. As it is a dictionary within a dictionary, the exercise divides it into two parts. You have a dictionary of “features” for each “breed”. So the inner dictionary is the one for the features.
First save the dataclass to the inner_dict:
inner_dict[inner_part] = dataclass_object
And then save the inner_dict for each outer variable:
params_dict[outer_variable] = inner_dict
I hope you succeed!
thank you - this is very helpful!
The only part that is still confusing to me is where you mentioned the proper_params_functions(x,y) for defining the dataclass object - as we already utilized the distributions themselves to get the parameters, I don’t see why we need the distributions again - is the only difference that you are returning different parameters? And if so, how do we get two values into the dataclass object? Should I be thinking of it as an array? I think I am just not (once again) understanding the different data type constructs here.
Thank you again for your help. Some things are making sense now.
First, sorry! I mixed things up a little. I also struggle with Python classes/objects.
Think of proper_params_functions as proper_class instead. Earlier in the provided notebook you can find the code that declares the classes. There are three, for gaussian, binomial and uniform. Pick the appropriate one for each “case”.
With this you’re saving the params in an object that soon you’ll save in the inner dictionary.
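On the question of how two values get into one dataclass object, a small sketch may help. The class name GaussianParams and the numbers are assumptions for illustration, not the names used in the notebook. A dataclass is less like an array and more like a small record whose fields you read by name.

```python
from dataclasses import dataclass


# Hypothetical class: two named fields, nothing else.
@dataclass
class GaussianParams:
    mu: float
    sigma: float


# Calling the class with two values creates one object holding both.
params = GaussianParams(35.0, 1.5)

# Fields are read by name, not by position as in an array.
print(params.mu)
print(params.sigma)
print(params)
```

So “calling the class” plays the role that the params function played in my earlier description: it takes x and y and gives back a single object.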
About using the distributions “again”: there’s a subtle difference. First you generated artificial data to build a dataset, using the inverse distributions to create data from a given mean, standard deviation, n, p, etc. In real life, you would collect lots of data to build the dataset of breeds and features. Later you would fit the distributions and estimate their parameters (mu, sigma, a, b, etc.), which is exactly what you’re doing in ex5.
The confusing part is that we simulated data from known distributions, and then we have to pretend we didn’t, so that we can fit a distribution for each breed-feature pair in ex5.
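A rough sketch of the two directions, using only the standard library. The true values 35.0 and 1.5 are made up, and random.gauss is just a convenient way to simulate data here, not necessarily how the notebook generates it.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Direction 1: known parameters -> simulated data.
true_mu, true_sigma = 35.0, 1.5
sample = [random.gauss(true_mu, true_sigma) for _ in range(10_000)]

# Direction 2: pretend the parameters are unknown and estimate them
# from the data alone.
mu_hat = mean(sample)
sigma_hat = stdev(sample)

print(round(mu_hat, 2), round(sigma_hat, 2))
```

The estimates come out close to, but not exactly equal to, the values used to simulate the data, which is what you would expect with real collected data as well.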
I hope I made it clear
And finally, the reason for using classes… I can’t help you much there, because I wouldn’t use them if I wrote this code. But I’m not a programmer. I really struggle with classes. I never use them unless I have to. I use them reluctantly in some specific types of scripts. But experienced programmers say they make things easier…
Thank you! I think exercise 5 is sorted now.
Nice job on sorting this @DebbieA! And thanks to @Joao_Carlos_Lima_Sel for providing such detailed feedback.
The use of dataclasses was meant to showcase a Python feature that a lot of people don’t use but that is extremely helpful, especially in cases where you just want to hold some data. That being said, we will consider reducing the complexity of the coding exercises, since it looks like some learners are struggling with it. May I ask, @DebbieA, did you find value in learning about these features, or was it just annoying for you? Thanks!
@a-zarta
I would say there was some value in learning about the data classes; however, there was a great deal of frustration, too. I felt very much out of place at times. It did prompt me to learn more about Python, which I found interesting. As I mentioned in my original post, my hope was to learn more about the specific math used in Machine Learning to share with my students. The math itself was not new to me but the application to machine learning was. Learning some new data structures was an unexpected “bonus”. Thank you for asking!
how did you sort it out?
Going back a little in the notebook, you can see the expected format of the dictionary. You can see that there is one dictionary nested inside another, and also the information to store in it.
The code itself gives a hint about it, requiring two parts for saving first the inner part and then the outer part.
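To get a feel for the nesting, it can help to build a tiny dictionary by hand and print each level. In this sketch the feature names and numbers are invented, and plain tuples stand in for the dataclass objects.

```python
# Hypothetical values: tuples stand in for the dataclass objects.
params_dict = {
    0: {"height": ("gaussian", 35.0, 1.5), "bark_days": ("binomial", 30, 0.8)},
    1: {"height": ("gaussian", 30.0, 2.0), "bark_days": ("binomial", 30, 0.5)},
}

# Outer level: one entry per breed.
print(list(params_dict.keys()))

# Inner level: one entry per feature of that breed.
inner = params_dict[0]
print(list(inner.keys()))

# Two lookups in a row reach a single stored value.
print(params_dict[1]["height"])

# Walking through everything with two nested loops.
for breed, features in params_dict.items():
    for feature, value in features.items():
        print(breed, feature, value)
```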
But don’t think I got it easily
I remember that I needed to print each variable, list, dict and class, and reason about each step of this function. I think it was the most difficult part of the assignment. It depended too much, in my opinion, on Python skills.
I am a MATLAB user; Python is more difficult to understand.
My first Machine Learning course used Matlab. I agree with you that it is much simpler, maybe because I used it a lot during my engineering degree.
However, I have spent the last few years working with Python and now I’m really used to it. It has some advantages because of its open source libraries, which are really good, and the huge community where you can share thoughts and insights or get help sometimes.
I would encourage you to take a basic Python course. It could be “Python 3 programming - specialization” or “Python for Everybody”, both offered on Coursera. It will certainly open new roads for you.
I couldn’t have completed this exercise without your explanation, thank you very much!
@a-zarta for future reference, perhaps these hints could be included under each match-case statement?
None, None = estimate_None(None[None])
dataclass_object = params_None(None, None)
Even though I have been using Python for more than 2 years, I have never used the match-case statement so far. I looked at how to use it from this link:
Hi Everyone, thanks for your help. I feel this assignment exercise required more information, especially for those like me who are pretty weak in probability and are using this as an opportunity to improve in this area.
I am struggling on this exercise for sure!!
It was absolutely annoying for me. You should consider that people don’t generally learn well when they are frustrated or thinking about another goal. In this case, we’re trying to complete an assignment and unlikely to explore features that are used in it, especially when, as it seems to me here, they’re only there to add needless complexity.