Hi! I am looking at the hints for your solution of compute_centroids(X, idx, K) and I really can't understand what you are suggesting. See screenshot.
Is it some kind of internal NumPy matrix syntax that holds a link to each cluster?
“idx” is an array (list-like) that holds, for each data point, the index of its nearest cluster.
You create this array in an earlier step, before you get to the compute_centroids function.
For example:
idx=[1,0,2,1]
datapoint 0 - is nearest to cluster 1
datapoint 1 - is nearest to cluster 0
datapoint 2 - is nearest to cluster 2
datapoint 3 - is nearest to cluster 1
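“idx” can be replaced with any name of your choice; it's not a built-in function, just a variable.
X is the array that contains all the data points (one row per data point).
X[idx==1] gives you all the data points that have been assigned to cluster 1. (This works because idx and X are NumPy arrays; with a plain Python list, idx==1 just evaluates to False.)
In the example above, X[idx==1] returns the rows X[0] and X[3], i.e. the data points assigned to cluster 1 according to idx.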
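To make this concrete, here is a minimal sketch with made-up data. The values in X are invented for illustration, and compute_centroids_sketch is a hypothetical helper, not the course's reference solution. It shows the filter on its own and then one way to use it to compute each centroid as the mean of its assigned points, assuming every cluster has at least one point.

```python
import numpy as np

# Hypothetical data: 4 data points with 2 features each (values invented for illustration)
X = np.array([[1.0, 2.0],
              [5.0, 6.0],
              [9.0, 9.0],
              [3.0, 4.0]])

# idx must be a NumPy array (not a plain list) for idx == 1 to compare elementwise
idx = np.array([1, 0, 2, 1])

mask = idx == 1                 # one boolean per data point: True, False, False, True
points_in_cluster_1 = X[mask]   # keeps rows 0 and 3
print(points_in_cluster_1)


def compute_centroids_sketch(X, idx, K):
    """Hypothetical sketch: centroid k is the mean of the rows of X with idx == k.

    Assumes X has shape (m, n), idx has shape (m,), and every cluster
    0..K-1 has at least one point (an empty cluster would give NaN).
    """
    n = X.shape[1]
    centroids = np.zeros((K, n))
    for k in range(K):
        centroids[k] = X[idx == k].mean(axis=0)  # average over the selected rows
    return centroids


print(compute_centroids_sketch(X, idx, 3))
```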
I would have included the words "boolean" and "indexing" in the explanation.
This code leverages some of the power of Python (here NumPy) to achieve multiple steps in a single expression. First, the comparison operator == creates a new array holding the boolean result of comparing each element of an existing array with a value (here 0). Then that array of booleans is used to index, or filter, a subset out of some existing array. This second new array is composed of each element where the comparison evaluated to True. The array used to generate the booleans and the array the booleans are applied to can be the same or, as in this example, different, as long as the boolean array has the same length as the indexed axis of the target (for X, the number of rows).
boolean_list = (one_existing_list == 0)  # create the first new array of boolean values, e.g. [True False …True]
filtered_list = another_existing_list[boolean_list]  # create the second new array by keeping only the True ‘rows’
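Here is a runnable version of those two lines. The contents of one_existing_list and another_existing_list are made up; the only requirement is that both arrays have the same length.

```python
import numpy as np

# Hypothetical example arrays: different contents, same length
one_existing_list = np.array([0, 3, 0, 7])
another_existing_list = np.array(["a", "b", "c", "d"])

boolean_list = (one_existing_list == 0)              # True where the element equals 0
filtered_list = another_existing_list[boolean_list]  # keep elements at the True positions
print(boolean_list)
print(filtered_list)
```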
You can find discussions of this usage by searching the interweb for python boolean mask or python boolean indexing. It shows up mainly in the context of NumPy and pandas (plain Python lists don't support it). It's pretty common in Python-based data sciencey stuff and can also be applied in multidimensional cases.
HTH
What is this syntax? Can you explain it to me? I'm seeing it for the first time. Is it something like:
boolean_list = (i for i in one_existing_list if i == 0)
And this too? Indexing a list with a list??
Not exactly. As I attempted to describe in the narrative in my first reply above, this step creates a new list of the same length as one_existing_list where each element of the new list has a boolean value. Whether each element value is True or False depends on whether the conditional == is satisfied.
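The difference is easy to see side by side. In this sketch the values are made up: the generator from the question filters (it keeps only the matching elements), while the NumPy comparison maps (it produces one boolean per element, the same length as the input).

```python
import numpy as np

values = [0, 3, 0, 7]          # hypothetical data
arr = np.array(values)

# The generator from the question *filters*: only elements equal to 0 survive
filtered = list(i for i in values if i == 0)

# The NumPy comparison *maps*: one boolean per element, same length as the input
boolean_list = (arr == 0)

# A plain-Python equivalent of the comparison (still one value per element)
boolean_py = [i == 0 for i in values]

print(filtered)
print(boolean_list)
print(boolean_py)
```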
Yes, but it is using the list of boolean values based on the conditional, and only slicing (keeping) the elements of the target list where the elements of the boolean list have value True
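On the "indexing a list by a list" part: this is something NumPy arrays support and plain Python lists do not. A minimal sketch with hypothetical values and mask, plus the plain-Python equivalent using zip:

```python
import numpy as np

values = [10, 20, 30, 40]           # hypothetical data
mask = [True, False, True, False]   # hypothetical mask, same length as values

# NumPy: a boolean array of the same length can be used as an index
selected = np.array(values)[np.array(mask)]
print(selected)

# Plain Python lists reject a list as an index and raise TypeError
list_index_raised = False
try:
    values[mask]
except TypeError as err:
    list_index_raised = True
    print("TypeError:", err)

# Plain-Python equivalent: pair each value with its flag and keep the True ones
selected_py = [v for v, keep in zip(values, mask) if keep]
print(selected_py)
```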
Can you give me links to the docs, please? I can't understand any of it and can't reproduce it in the Python console.
As I wrote above…
Maybe worth reading through some of those
So this is only NumPy functionality, right?
True positives (tp) is the number of times the prediction is positive and correct, right? So how do you know whether a prediction is correct? You create a boolean array of the predictions that have a positive value (True iff == 1) and a boolean array of the ground-truth labels (y_val) that have a positive value (True iff == 1), then AND those two arrays together and count how many True values result.
I’m not sure what your background is, or your purpose for taking these classes, but I highly recommend adopting the practice of writing little test scripts to work through these questions on your own. In this example, print out what predictions consists of. Print out what (predictions == 1) consists of. Print out what ((predictions == 1) & (y_val == 1)) consists of. Now not only do you have the answer to this question, but you start to develop a skill for answering future questions as well. Good luck on your journey.
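A little test script along those lines might look like the following. The predictions and y_val values here are invented for illustration; in the exercise they come from your model and the validation set.

```python
import numpy as np

# Hypothetical values for illustration only
predictions = np.array([1, 0, 1, 1, 0])
y_val = np.array([1, 0, 0, 1, 1])

print(predictions)
print(predictions == 1)                    # True where the prediction is positive
print((predictions == 1) & (y_val == 1))   # True where prediction AND label are positive

# Summing a boolean array counts its True values
tp = np.sum((predictions == 1) & (y_val == 1))
print(tp)
```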
It works with other libraries too, like pandas.
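For example, in pandas a comparison on a column gives a boolean Series, and indexing the DataFrame with it keeps only the matching rows. The DataFrame and its column names below are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame: one row per data point with its assigned cluster
df = pd.DataFrame({"point": ["p0", "p1", "p2", "p3"], "cluster": [1, 0, 2, 1]})

mask = df["cluster"] == 1   # boolean Series, one value per row
in_cluster_1 = df[mask]     # keeps only the rows where mask is True
print(in_cluster_1)
```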
So, to conclude:
In NumPy and pandas there is a mechanism called boolean masking (or boolean indexing): a comparison converts an array of elements into an array of True and False values, which can then be used to select elements.
>>> import numpy as np
>>> l = np.array([1,2,3])
>>> l>1
array([False, True, True])
>>> l==2
array([False, True, False])
This new array of True and False values can be applied as a filter to another array that is related to the first and has the same length. We get back only the elements of another_existing_list at the positions where boolean_list is True.
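A short sketch of applying the mask from l to other arrays. names and X are hypothetical; the 2-D case mirrors how idx filters the rows of X. If the lengths do not match, NumPy raises an error instead of guessing.

```python
import numpy as np

l = np.array([1, 2, 3])
names = np.array(["one", "two", "three"])  # hypothetical 1-D array, same length as l
X = np.array([[1, 1], [2, 2], [3, 3]])     # hypothetical 2-D array, one row per element of l

mask = l > 1
print(names[mask])   # elements at the True positions
print(X[mask])       # for 2-D data, the mask keeps whole rows (rows 1 and 2 here)

# A mask whose length does not match the indexed axis raises IndexError
mismatch_raised = False
try:
    np.array(["x", "y"])[mask]
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)
```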
In the original case, the filter was derived from idx and applied to the array X.
Sometimes we want to keep only the positions that are True in both of two filters (masks). We can do it like this:
>>> (l>1) & (l>2)
array([False, False, True])
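Masks can also be combined with | (either is True) and inverted with ~. The parentheses around each comparison are needed: & binds more tightly than >, so writing it without them does not do what you might expect. A small sketch reusing the l from above:

```python
import numpy as np

l = np.array([1, 2, 3])

both = (l > 1) & (l > 2)     # True only where both masks are True
either = (l < 2) | (l > 2)   # True where at least one mask is True
inverted = ~(l > 1)          # flips every boolean

print(both)
print(either)
print(inverted)

# Without parentheses, l > 1 & l > 2 parses as the chained comparison
# l > (1 & l) > 2, which needs the truth value of a whole array -> ValueError
precedence_raised = False
try:
    l > 1 & l > 2
except ValueError as err:
    precedence_raised = True
    print("ValueError:", err)
```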