Hi, I’m struggling to understand the choice of 32 output-layer neurons, and what exactly is represented in each neuron. Also, how do we know that the output layers of the user and movie models line up across those 32?

I am not familiar with this course, but the number of outputs usually matches the number of classes that you want to detect!

I guess I’m wondering where and how that’s determined when the input model has fewer than 32 input features.

Are you referring to this?

But it’s a good question, I’m working through the details to see if I understand how this specific lab is implemented. There appear to be some key items in the model that are not explained very clearly in the lab.

Exactly. The users model has 14 features and the items/movies model has 16 features. Both models have 32 outputs, which are then combined via a dot product. I don’t understand where the 32 comes from or what each output represents for either model (other than perhaps the 14 genres). Since they are combined via a dot product, the outputs of both models must align on some consistent configuration, otherwise wouldn’t we just be taking the dot product of arbitrary numbers?

The 14 user features are genres, as are 14 of the 16 features in the movies model (and they’re ordered the same way), so that seems to account for at least 14 of the 32 outputs.

I feel like there’s a key bit of information I’m missing here somewhere.

In general, the number of output units of an NN does not have to equal the number of input features. They’re independent.

Say you were trying to predict the current temperature, and you had 20 different types of measurements in the data set (location, humidity, cloud cover, day of the year, time of day, etc…)

That would be 20 inputs and 1 output.

Ok, fair statement and I understand your point. But in that scenario you’d also have a single target, temperature, that would be part of your training data. In this case there are 32 outputs for both models and I have no idea what they are or why we have 32 of them. :-/

I agree, and that’s what I’m looking into.

At least the final output is a single number (a prediction) because of the dot-product of the 32 value output vectors from the user and movie models. So at least that part makes sense.

I’m starting to think that the 32 is a somewhat arbitrary choice to drive precision. Though they are the output layers of two independent models, you could kind of also think of them as being hidden layers one step before getting the single final output. In that sense, perhaps they’re no different than the 256 or 128 neuron hidden layers.
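To make that concrete, here is a minimal NumPy sketch of the shape flow being described, using random untrained weights purely for illustration (the 256/128/32 layer sizes and the 14/16 feature counts are taken from this discussion; the real lab presumably uses trained Keras layers):

```python
import numpy as np

def tower(x, layer_sizes, rng):
    """Stand-in for one tower: random dense + ReLU layers, just to
    show how the shapes flow from the input features to 32 outputs."""
    for size in layer_sizes[:-1]:
        x = np.maximum(0.0, x @ rng.normal(size=(x.shape[-1], size)))
    # final 32-unit layer, linear here
    return x @ rng.normal(size=(x.shape[-1], layer_sizes[-1]))

rng = np.random.default_rng(0)
user_features = rng.normal(size=(1, 14))   # 14 user features
movie_features = rng.normal(size=(1, 16))  # 16 movie features

v_user = tower(user_features, [256, 128, 32], rng)   # shape (1, 32)
v_movie = tower(movie_features, [256, 128, 32], rng)  # shape (1, 32)

# dot product of the two 32-value output vectors -> one scalar prediction
prediction = np.sum(v_user * v_movie, axis=-1)
print(v_user.shape, v_movie.shape, prediction.shape)
```

Note that nothing forces the two 32-dimensional vectors to "mean" the same thing up front; training against the rating labels is what pushes the two towers to agree on a shared 32-dimensional space.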

That’s entirely possible.

Hello @as75

You may consider it as 32 latent factors configured to represent an item or a user. If it were set to 1 instead, then each user would be represented by just one value, which might not be enough to differentiate one user from another. Two users might be similar in one latent aspect but different in another, which means we need a representation richer than just 1 dimension.
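A tiny made-up example (2 latent dimensions rather than 32, and numbers invented for illustration, not taken from the lab) shows why one dimension isn’t enough:

```python
import numpy as np

# user_a and user_b agree on the first latent aspect but disagree on the second
user_a = np.array([1.0, 1.0])
user_b = np.array([1.0, -1.0])

movie_1 = np.array([1.0, 0.0])  # appeal lies in the first latent aspect
movie_2 = np.array([0.0, 1.0])  # appeal lies in the second latent aspect

# dot-product scores: identical for movie_1, opposite for movie_2
print(user_a @ movie_1, user_b @ movie_1)  # 1.0 1.0
print(user_a @ movie_2, user_b @ movie_2)  # 1.0 -1.0
```

With a single latent dimension, the two users would be forced to be either similar everywhere or different everywhere; they could never be "the same on one aspect but different on another" at once.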

They are latent because we can’t interpret them, but in reality we know that we can’t describe that diversity with just one dimension, can we?

Of course precision matters, and a good representation should drive precision.

Cheers,

Raymond