Hello,
Would someone please explain the function “load_planar_dataset()” from the “planar_utils” module, and specifically what the term “np.c_[r*np.sin(t), r*np.cos(t)]” is for?
Does it have an actual use other than making the scatter plot look pretty (like a flower)?
If it has another use, please clarify, and explain the whole function if possible.
Thanks
np.c_ is used to concatenate arrays column-wise. Use help(np.c_) to learn more about it. Each row represents the position of a datapoint in 2D space. You can see the plot in the notebook. The dataset aims to show that there is no straight line that splits the 2 classes with 100% accuracy.
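To make the np.c_ part concrete, here is a minimal sketch. The arrays r and t are made-up values for illustration, not the ones from the assignment. It shows two 1-D arrays becoming the two columns of a single array, one row per point.

```python
import numpy as np

# Illustrative values only: three radii and three angles.
r = np.array([1.0, 2.0, 3.0])
t = np.array([0.0, np.pi / 2, np.pi])

# np.c_ places each 1-D array in its own column,
# so every row is one (x1, x2) point in the plane.
points = np.c_[r * np.sin(t), r * np.cos(t)]

print(points.shape)  # (3, 2)
```

np.column_stack((a, b)) gives the same result for 1-D inputs, if you prefer a regular function call.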
Thank you, your reply was really helpful, but I still have a question.
Why does it have to be drawn this way (like a flower)?
I mean, you could use “r = np.linspace(0.0,1,N)” and still make your point that there is no straight line that splits the classes, and you would get a propeller-looking scatter plot instead.
I think what I am not understanding is the math the whole function is built on, so if there is a resource that explains why to use this and not that, and how the whole “load_planar_dataset()” function works, please post it.
I am not sure about the following idea, but I believe that if you changed “Theta” in a certain way you could split the two classes easily, so why not do that?
Again, I am not sure about the lines in italics.
Thanks.
The dataset is created such that the classes are not linearly separable. Neural networks are capable of figuring out non-linear decision boundaries. A linear classifier like LogisticRegressionCV is incapable of achieving good accuracy on it.
A point in the plane can be represented by its horizontal and vertical components in polar form: (r * cos(\theta), r * sin(\theta)). This dataset just swaps the two, using (r * sin(\theta), r * cos(\theta)). The swap doesn’t really affect the focus of the assignment. If you need more details, look into the function that generates the data points.
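For anyone who wants to experiment, below is a rough sketch of how a flower-shaped dataset like this can be generated. It is written in the spirit of the course helper but is not a verified copy of it. The name make_flower and every parameter (max_radius, noise, seed, the 3.12 angle span, the factor 4 inside the sine) are assumptions chosen for illustration.

```python
import numpy as np

def make_flower(m=400, max_radius=4.0, noise=0.2, seed=1):
    """Hypothetical generator for a two-class, flower-shaped 2-D dataset."""
    rng = np.random.default_rng(seed)
    n = m // 2                      # points per class
    X = np.zeros((m, 2))
    y = np.zeros(m, dtype=int)
    for label in range(2):
        idx = slice(n * label, n * (label + 1))
        # Each class sweeps its own range of angles, with some jitter.
        t = np.linspace(label * 3.12, (label + 1) * 3.12, n)
        t = t + rng.normal(scale=noise, size=n)
        # The radius oscillates with the angle; this is what draws the petals.
        r = max_radius * np.sin(4 * t) + rng.normal(scale=noise, size=n)
        # Polar to Cartesian (sin and cos swapped, as in the term asked about).
        X[idx] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[idx] = label
    return X, y

X, y = make_flower()
print(X.shape, y.shape)  # (400, 2) (400,)
```

The petals come from the radius depending on sin(4 * t). Replacing that line with a radius that only grows linearly would keep the classes hard to separate with a line but change the shape, which matches the propeller-like plot described above.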
Thanks again. Would you please elaborate on “The dataset is created such that the classes are not linearly separable”?
Maybe by answering:
- How would you create such an image dataset?
- What tools would you use to create it?
- Is there a way to turn a linearly separable dataset into a non-linearly separable one, and vice versa?
Note: I already have the course certificate and only used the assignment in the title for reference. I am curious about how the assignment model is fully built because I am trying to build my own logistic regression project, and it feels like I have so much to learn before doing so, but I am not sure what. I think I am feeling lost.
Please keep in mind that the dataset only looks like a flower when plotted. No single datapoint represents an image, so it’s safe to call it just a dataset and not an image dataset. MNIST is an example of an image dataset, since each row holds the pixel values of an image.
You can generate data that doesn’t have a linear decision boundary by applying non-linear transformations, as shown in the assignment.
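As a small illustration of converting between the two kinds of data, here is a sketch under made-up assumptions (two rings with radii drawn from non-overlapping ranges, 200 points each). In polar coordinates a single threshold on the radius separates the classes. After the non-linear map to (x, y), no straight line does, and computing the radius again restores separability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # points per class (arbitrary choice)

# Class 0 lives on an inner ring, class 1 on an outer ring.
# The radius ranges are chosen so the rings never overlap.
r = np.concatenate([rng.uniform(0.5, 1.5, n), rng.uniform(2.5, 3.5, n)])
theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Non-linear map: polar -> Cartesian. In (x1, x2) the classes are
# concentric rings, so no straight line can split them.
X = np.c_[r * np.cos(theta), r * np.sin(theta)]

# Inverse idea: add the radius as a feature. A single threshold on it
# (a linear rule in that feature) separates the classes again.
radius = np.hypot(X[:, 0], X[:, 1])
pred = (radius > 2.0).astype(int)
accuracy = (pred == y).mean()
print(accuracy)
```

This is the same idea as hand-crafting features such as x1**2 + x2**2 for a linear model. A neural network is expected to learn a comparable transformation on its own.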
Here’s an example in TensorFlow Playground where you can look at decision boundaries of both kinds.
Dataset that’ll have a linear decision boundary:
Dataset that’ll have a non-linear decision boundary:
Packages like scikit-learn provide data generators to produce synthetic data.
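A short sketch of what that can look like, assuming scikit-learn is installed. The sample sizes, noise levels and blob centers are arbitrary choices for illustration. It builds one linearly separable dataset and two that are not, then fits a plain logistic regression on each to compare training accuracy.

```python
from sklearn.datasets import make_blobs, make_circles, make_moons
from sklearn.linear_model import LogisticRegression

# Two well-separated clusters: a straight line splits them.
X_blobs, y_blobs = make_blobs(
    n_samples=200, centers=[[-5, 0], [5, 0]], cluster_std=1.0, random_state=0
)
# Concentric circles and interleaved half-moons: no straight line works well.
X_circles, y_circles = make_circles(
    n_samples=200, noise=0.05, factor=0.5, random_state=0
)
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=0)

datasets = {
    "blobs": (X_blobs, y_blobs),
    "circles": (X_circles, y_circles),
    "moons": (X_moons, y_moons),
}

# Training accuracy of a linear classifier on each dataset.
scores = {}
for name, (X, y) in datasets.items():
    scores[name] = LogisticRegression().fit(X, y).score(X, y)
    print(f"{name}: training accuracy {scores[name]:.2f}")
```

The linear model should do very well on the blobs and noticeably worse on the circles, which is the same contrast the flower dataset is meant to show.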