make_blobs function

Hi dear community,
I don't understand what exactly the make_blobs function does. The sklearn documentation is also not very explanatory.

I just presume that it creates an array or a 1-by-n matrix as an example, which is fine. Maybe I should not really care what the function does at this point.
But then this code segment gives a perplexing output:

p_nonpreferred = model.predict(X_train)
print(p_nonpreferred[:2])
print("largest value", np.max(p_nonpreferred), "smallest value", np.min(p_nonpreferred))

out:
[[3.54e-03 3.31e-03 9.71e-01 2.21e-02]
[9.93e-01 6.77e-03 6.97e-05 3.26e-05]]

Why is the result of the output layer, which should be an array of probabilities, a matrix with dimensions (m, n) instead?

Hello @mehmet_baki_deniz,

I am not sure which lab this code is from, but if I just look at these outputs:

[[3.54e-03 3.31e-03 9.71e-01 2.21e-02]
[9.93e-01 6.77e-03 6.97e-05 3.26e-05]]

It is an array of shape (2, 4) because print(p_nonpreferred[:2]) printed the first 2 samples, and the problem is a classification into 4 categories. They ARE probabilities, because the numbers in each row add up to 1 (up to rounding in the printed values).

In general, p_nonpreferred should have a shape of (m, 4), where m is the number of samples and 4 is the number of classes.
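To make this concrete, here is a small sketch that checks the two rows printed above. It only uses NumPy and the numbers copied from the question; the variable names `p`, `row_sums`, and `predicted` are mine and do not come from the lab.

```python
import numpy as np

# The two rows printed in the question:
# one row per sample, one column per class.
p = np.array([
    [3.54e-03, 3.31e-03, 9.71e-01, 2.21e-02],
    [9.93e-01, 6.77e-03, 6.97e-05, 3.26e-05],
])

print(p.shape)  # (2, 4): 2 samples, 4 classes

# Summing across the columns (axis=1) gives one total per sample.
# Each total is close to 1; it is not exactly 1 because the printed values are rounded.
row_sums = p.sum(axis=1)
print(row_sums)

# The predicted class of each sample is the column with the largest probability.
predicted = np.argmax(p, axis=1)
print(predicted)  # [2 0]
```

So each row is the probability distribution for one sample. If you want a single class label per sample, taking the argmax along axis 1 is one common way to get it.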

============================

Regarding make_blobs, if you look at this part of the documentation:

[image: screenshot from the make_blobs documentation]

It says it returns an array of shape (n_samples, n_features) for X, where each row is a sample, and the samples' labels are in y. This example has a graph for a set of data generated by make_blobs. The dataset has 4 features, but only the first 2 features are used for the visualization.

Besides reading the documentation, I also suggest that you call the function and print the results to inspect them. To make it easier, you might use small values for n_samples and n_features, such as 10 and 2.
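For example, a minimal sketch of such an experiment could look like this. The values `centers=4` and `random_state=0` are my own assumptions (4 to match the 4 classes in your output, and a fixed seed so the numbers are reproducible); your lab may use different settings.

```python
from sklearn.datasets import make_blobs

# Generate 10 samples with 2 features each, drawn around 4 cluster centers.
# centers=4 and random_state=0 are illustrative choices, not taken from the lab.
X, y = make_blobs(n_samples=10, n_features=2, centers=4, random_state=0)

print(X.shape)  # (10, 2): one row per sample, one column per feature
print(y.shape)  # (10,): one integer label per sample

print(X)  # the feature values
print(y)  # the cluster index (0 to 3) that each sample was drawn from
```

Printing X and y side by side should show that row i of X is one sample and y[i] is the cluster it belongs to, which is what a classifier is then trained to predict.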

Cheers,
Raymond
