In the practice lab, Exercise 2, there is the following line:
“X, y, centers, classes, std = gen_blobs()”
I tried printing it, and it is a large array of values. Does it always have a fixed size of 800 samples that is then divided into our train, cv and test sets?
I could not find anything more about it on the internet. Is it a ready-built function from one of the libraries or toolkits we are using here?
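A quick way to check the size yourself, and to see how a set of 800 samples could be cut into train, cv and test parts, is sketched below. The data is a random placeholder of the same shape, not the lab's output, and both the 60/20/20 proportions and the `split_train_cv_test` helper are hypothetical; the lab may split differently (scikit-learn's `train_test_split`, called twice, is a common alternative).

```python
import numpy as np

# Placeholder data: 800 random 2D points with integer labels 0..5.
# NOT the lab's gen_blobs() output, just arrays of the same shape.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 2))
y = rng.integers(0, 6, size=800)

print(X.shape, y.shape)  # (800, 2) (800,)

def split_train_cv_test(X, y, train_frac=0.6, cv_frac=0.2, seed=1):
    """Hypothetical helper: shuffle once, then cut into train / cv / test.
    The 60/20/20 proportions are an assumption, not taken from the lab."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train = round(train_frac * len(X))
    n_cv = round(cv_frac * len(X))
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

(X_train, y_train), (X_cv, y_cv), (X_test, y_test) = split_train_cv_test(X, y)
print(len(X_train), len(X_cv), len(X_test))  # 480 160 160
```

Shuffling before cutting matters because generated data is often ordered by class; slicing it unshuffled could leave whole classes out of the test set.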
As far as I understand it, it is a function that the team built themselves.
These values are likely randomly generated and are used as a dataset for subsequent modelling.
X = the 2D input values; imagine dots on a graph
y = the target value (class label) for each dot
centers = not too sure, but these are probably the centers of each class's distribution
classes = the possible target classes
std = the standard deviation of the distribution
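To make the meaning of the five returned values concrete, here is a hypothetical stand-in that returns the same tuple. It is not the lab's code: the centers, `std`, sample count and seed are made up, and `gen_blobs_sketch` is an invented name. scikit-learn's `sklearn.datasets.make_blobs` (with `n_samples`, `centers`, `cluster_std`, `random_state`) is a ready-made library function that produces this kind of data, though I can't say whether the lab helper uses it.

```python
import numpy as np

def gen_blobs_sketch(m=800, std=0.4, seed=2):
    """Hypothetical stand-in for the lab's gen_blobs(); returns the same 5-tuple
    (X, y, centers, classes, std). Centers, std, m and seed are assumptions."""
    centers = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])  # one 2D center per class
    classes = len(centers)                                # number of possible target classes
    rng = np.random.default_rng(seed)
    y = rng.integers(0, classes, size=m)                  # target class for each dot
    X = centers[y] + rng.normal(0.0, std, size=(m, 2))    # each dot = its class center + Gaussian noise
    return X, y, centers, classes, std

X, y, centers, classes, std = gen_blobs_sketch()
print(X.shape, y.shape, centers.shape, classes, std)  # (800, 2) (800,) (4, 2) 4 0.4
```

Each point is its class center plus noise scaled by `std`, which is why the dots form separate "blobs" when plotted.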
To make things easier, you could plot X and see for yourself that these really are just blobs: plt.scatter(X[:, 0], X[:, 1])
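A slightly fuller version of that plot colors each dot by its label `y` and marks the centers, which shows how `X`, `y` and `centers` relate. The data below is a hypothetical placeholder (three made-up centers, 100 points each); with the lab's arrays you would pass its `X`, `y` and `centers` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder blobs (hypothetical data, not the lab's): 3 centers, 100 points each.
rng = np.random.default_rng(0)
centers = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
y = np.repeat(np.arange(len(centers)), 100)
X = centers[y] + rng.normal(0.0, 0.3, size=(len(y), 2))

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y, s=10, cmap="viridis")              # dots colored by class
ax.scatter(centers[:, 0], centers[:, 1], marker="x", c="red", s=100)  # blob centers
ax.set_xlabel("x0")
ax.set_ylabel("x1")
plt.show()
```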
You’re welcome. It does indeed seem to be an elaborate data generator. I usually draw random samples from distributions or ranges with the numpy or random library, but once you need to do this more often and reliably, I think it makes sense to write a function that does it all for you.
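As a small illustration of the "reliably" part, a helper can own a seeded generator so the same call always returns the same sample. This sketch uses only the standard-library `random` module; `sample_blob` is a hypothetical name, not something from the lab.

```python
import random

def sample_blob(n, center, std, seed=None):
    """Hypothetical helper: draw n 2D points from a Gaussian blob around
    `center`, using only the standard-library random module."""
    rng = random.Random(seed)  # private generator: a fixed seed gives the same points every call
    cx, cy = center
    return [(rng.gauss(cx, std), rng.gauss(cy, std)) for _ in range(n)]

a = sample_blob(5, center=(1.0, 0.0), std=0.4, seed=42)
b = sample_blob(5, center=(1.0, 0.0), std=0.4, seed=42)
print(a == b)  # True: same seed, same sample
```

Using a private `random.Random(seed)` instead of the module-level functions keeps the helper reproducible without affecting random numbers drawn elsewhere in the program.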