# C3_W2 final lab: struggling with the def sq_dist(a,b): function

In C3_W2, I am working on the final lab “C3_W2_RecSysNN_Assignment”, but I am struggling with the exercise to write a function that computes the square distance. I have gone back over the lectures and past exercises, but nowhere do I see any reference to the def sq_dist(a,b): function. Is it correct that this is a new function for us to learn in this final lab of C3_W2? If so, I am at a loss as to what code is required. Are vectors a and b the movie vector and the user vector? We had not defined a and b as such earlier in the lab. Any guidance would be much appreciated.

It’s a good chance to try to translate some maths or a concept into code ourselves! Let’s take a look at the distance definition required by the exercise (you can also find it in the exercise’s description, right above the exercise’s code cell).
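For readers who cannot see the rendered equation, the definition can be written as below. This is reconstructed from the description in this thread (two movie vectors, a sum over $l=1$ to $n$ of squared component differences), so treat the lab notebook's own rendering as the authoritative one.

$$\left\Vert \mathbf{v_m^{(k)}} - \mathbf{v_m^{(i)}} \right\Vert^2 = \sum_{l=1}^{n}\left(v_{m_l}^{(k)} - v_{m_l}^{(i)}\right)^2$$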

This is called the square distance, as the function’s name suggests. It takes two vectors ($\mathbf{v_m^{(k)}}$ and $\mathbf{v_m^{(i)}}$) as input and computes their distance. The symbols for the vectors look complicated, so why don’t we just call them $\mathbf{a}$ and $\mathbf{b}$ instead? I hope you can see that we have some freedom to choose which symbols to use, as long as we stick with that choice in the rest of the calculation.

So we are changing from $\mathbf{v_m^{(k)}}$ to $\mathbf{a}$, and from $\mathbf{v_m^{(i)}}$ to $\mathbf{b}$. I will leave the work of changing the symbols in the above formula to you. The change should be very simple: it is like swapping apples for oranges by taking away all the apples and putting an orange in every position an apple originally held. However, note that on the right-hand side of the formula there is an additional subscript $l$ on each symbol. We cannot drop it when we change the symbols. The subscript means that we are taking the $l$-th component of the corresponding vector. This has nothing to do with the change of symbol, so it should be kept before and after the change. For example, you should end up with $a_l$ after the change: see, the $l$ remains.

Now we also know that the formula works on the components. As the summation sign and its index running from $l=1$ to $n$ suggest, we need to take the components of the two vectors pair by pair, subtract one from the other within each pair, square the difference, and add up the squared results.
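As a quick hand check with made-up numbers (they are not from the lab), take $\mathbf{a} = (1, 2, 3)$ and $\mathbf{b} = (2, 4, 7)$, so that $n = 3$:

$$\sum_{l=1}^{3}(a_l - b_l)^2 = (1-2)^2 + (2-4)^2 + (3-7)^2 = 1 + 4 + 16 = 21$$

If the formula is implemented as described, the function should give 21 for these two inputs.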

In courses 1 and 2, we had plenty of practice going through the elements of an array (the computational equivalent of the mathematical term vector) using loops, and a loop is what you may use to compute this.

Lastly, in the function’s def line, a and b are provided as the function’s input arguments, so in your code you need to manipulate these two variables as explained above and return a float value, which is the distance.

I hope this is clear! Give it a try, and good luck!

Raymond

But should it be:

So we are changing from $\mathbf{v_m^{(k)}}$ to $\mathbf{a}$, and $\mathbf{v_m^{(i)}}$ to $\mathbf{b}$.

So the second k vector in what you wrote should be the i vector?

You are absolutely right. It was my mistake. I will change that back to i.

Thank you.
Raymond

P.S. I edited your post to better format things and show the equations.


Yes, sorry, the formatting was messy. How did you fix it?

Start the quotation block on a new line, and wrap the equation syntax in a pair of dollar signs. For example, $syntax syntax syntax$

You should also be able to see the result of my edit by clicking the edit button on your post.

Thanks for your feedback so far, Raymond. I am struggling with how we define m. I understand we will loop with for l in range(m): and I believe we can use np.sum to compute d, but how do I define m?

Is your m the size of the vector? If so, you may use len to find m.

If you collect your squared results in a list, then you may apply np.sum to the list to get the result.

The usual trick in this course is instead to use an accumulator variable, say total, assign 0 as its initial value, and then add each result to it.

Either way will work. Which one to use is your choice, and something for you to experiment with.
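To make the two options concrete without touching the exercise itself, here is a minimal sketch on a different, simpler calculation: adding up the squares of the elements of a single vector. The vector v and the variable names are made up for illustration and are not part of the lab.

```python
import numpy as np

# Hypothetical example vector, not taken from the lab.
v = np.array([1.0, 2.0, 3.0])

# Way 1: collect each squared element in a list, then sum the list.
squares = []
for l in range(len(v)):
    squares.append(v[l] ** 2)
total_from_list = np.sum(squares)

# Way 2: keep a running total (the accumulator) that starts at 0.
total = 0
for l in range(len(v)):
    total += v[l] ** 2

print(total_from_list, total)  # 14.0 14.0, since 1 + 4 + 9 = 14
```

Both loops visit the same elements. The only difference is where the intermediate results are kept before they are added together.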

Mmm, OK. But len() takes exactly one argument, so I cannot define m as the len of a and b. And trying to take the len() of sq_dist gives TypeError: object of type 'function' has no len().

Going with total and setting it to 0 initially: would I then define m as equal to total and loop with for l in range(m)?

Again, you need to consider what your m means. Would you please tell me? If m is the size of the vector, then you pass the vector (either a or b) to the len function to find its length.
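A small sketch of what passing one vector to len looks like. The arrays below are made-up stand-ins for a and b, and the sketch assumes, as the formula does, that both vectors have the same number of components.

```python
import numpy as np

# Hypothetical stand-ins for the function's two input vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 7.0])

# len takes one object at a time; for a 1-D array it returns
# the number of elements.
n_a = len(a)
n_b = len(b)

print(n_a, n_b)  # 3 3
```

Because the two lengths match, either one can serve as the number of components to loop over.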

If you know what you are looping over, you should know the meaning of m.

m is the number of movies in the dataset for which we will provide recommendations. Is that correct?

I suggest you focus on this formula, and only on this formula, because you are implementing this very simple maths formula.

Why do you need a loop? Because you need to repeat some calculations (it doesn’t matter whether they involve movies, users, apples, or oranges).

What calculation do you want to repeat?

How many times do you repeat it? To repeat is to loop, so how many times should you loop?

What is the purpose of range(something)? What is the meaning of for i in range(2), and how is it different from for i in range(3)?
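If it helps to experiment, a tiny sketch like the one below shows what each range produces. The variable names are only for illustration.

```python
# range(n) yields the integers 0, 1, ..., n - 1, so a for loop
# over range(n) runs its body exactly n times.
values_2 = list(range(2))
values_3 = list(range(3))

print(values_2)  # [0, 1]
print(values_3)  # [0, 1, 2]

# Count how many times the loop body runs for range(3).
count = 0
for i in range(3):
    count += 1
print(count)  # 3
```

The number passed to range is the number of repetitions, which is why it matters what that number stands for in the formula.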