What does "gen_blobs()" do?

In the practice lab, Exercise 2, there is the following line:

X, y, centers, classes, std = gen_blobs()

I tried to print it out and it is a large array of values. Does it have a constant size of 800 samples that is then divided into our train, cv and test sets?

I could not find out more about it on the internet. Is it a ready-built function from one of the libraries or toolkits we are using here?

As far as I understand it, it is a function that the team has built themselves.
These values are likely randomly generated and are used as a dataset for subsequent modelling.

X = 2D values, imagine dots on a graph
y = target values for each dot
centers = not too sure, but these could be the distribution centers of each class
classes = possible target classes
std = standard deviation of the distribution

To make things easier, you could plot X and see for yourself that these are really just blobs:
import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1], c=y)  # color each dot by its target class

Hope it helps (and that it’s correct)

Thanks! It is also plotted on the next line of the exercise.

I was just wondering what it is, because it seems to generate neat data for analyzing :slight_smile:

You’re welcome. It indeed seems to be an elaborate data generator. I usually draw random samples from distributions/ranges using the numpy or random library. But once you need to do this more often and reliably, I think it makes sense to create a function that does it all for you.
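To illustrate what I mean by drawing samples by hand, here is a rough sketch using only numpy. The helper name sample_blobs, the two centers and the std value are made up for this example and have nothing to do with the lab's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the same points come out every run

def sample_blobs(centers, n_per_class=100, std=0.5):
    """Hypothetical helper: draw Gaussian points around each given center."""
    centers = np.asarray(centers, dtype=float)
    n_features = centers.shape[1]
    # One block of normally distributed points per center, stacked row-wise
    X = np.vstack([
        rng.normal(loc=c, scale=std, size=(n_per_class, n_features))
        for c in centers
    ])
    # Class label i for every point drawn around center i
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

X_demo, y_demo = sample_blobs([[0, 0], [3, 3]])
print(X_demo.shape, y_demo.shape)
```

Wrapping it in a function like this is what makes it reusable: you fix the seed, the centers and the spread in one place and get the same dataset every time.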

Generating blobs is a tool provided with sklearn (the make_blobs function in sklearn.datasets).
It creates a data set without having to package a static data file along with the assignment.
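For anyone curious, a wrapper like this could be sketched roughly as follows, assuming it is built on sklearn.datasets.make_blobs. The name gen_blobs_sketch, the centers, the std and the 60/20/20 split are my own guesses for illustration; only the 800 samples and the five return values come from the question above:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

def gen_blobs_sketch(m=800, std=0.4, seed=2):
    """Hypothetical stand-in for the lab's gen_blobs(); all values are assumed."""
    # Assumed cluster centers, one row per class
    centers = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
    classes = len(centers)
    # make_blobs draws m points in total, spread evenly over the centers
    X, y = make_blobs(n_samples=m, centers=centers,
                      cluster_std=std, random_state=seed)
    return X, y, centers, classes, std

X, y, centers, classes, std = gen_blobs_sketch()

# Assumed 60/20/20 split into train, cv and test sets
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)

print(X.shape, X_train.shape, X_cv.shape, X_test.shape)
```

Because random_state is fixed, every student gets the same 800 points, which is why no data file needs to ship with the assignment.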