# C3_W2_RecSysNN_Assignment dataset questions

I have a couple of questions on this assignment:

1. "The reduced dataset has 𝑛𝑢=397 users, 𝑛𝑚=847 movies". Later on we have this:

I thought “item_train” was an array of movies where each row is a separate movie (with the movie features as columns), of which we have 847. Where does the 50884 come from, and what are these items?

2. This line of code produces 5 identical rows:

What does each row represent? The user features repeated for each movie (which would all be identical)? Why do we need an array like this instead of just using a 1-dimensional vector per user?
I am just confused about the data we are using for training and how we arrive at it from the original catalog of 397 users, 847 movies and 25521 ratings.

Thank you!

Hello @Svetlana_Verthein

Please check this thread out which will explain how we have gone from 25521 ratings to 50884 records.

For the 5 identical rows, please check this post out.

If you have other questions, let me know here.

Cheers,
Raymond

1. Thank you. From that thread:
“If you de-duplicate the data, you get 25521 unique rows.” De-duplicating 50884 comes to 25442, not 25521, which is 79 fewer. Does that mean that 25442 movie ratings were duplicated and 79 were not?

2. From the second thread: “I think we can attack a data problem in many different ways, and the way this assignment adopts is to ‘expand’ user-movie ratings into several rows”. So the user vector gets replicated 𝑛𝑚 times, for ease of programming, calculation, and understanding?
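To make the replication idea concrete, here is a minimal sketch. It is not the assignment's code: `user_vec` and `item_vecs` are made-up arrays, and the assumption is that a two-input model wants one user row paired with each movie row.

```python
import numpy as np

# Hypothetical feature vector for a single user (values are made up).
user_vec = np.array([[4.0, 3.5, 0.0, 5.0]])

# Hypothetical item matrix: 5 movies, 3 features each.
item_vecs = np.arange(15, dtype=float).reshape(5, 3)

# Repeat the single user row once per movie so that row i of user_vecs
# pairs with row i of item_vecs.
user_vecs = np.tile(user_vec, (len(item_vecs), 1))

print(user_vecs.shape)  # one identical user row per movie
print(user_vecs)
```

Every row of `user_vecs` is the same user, which would explain seeing identical rows: only the movie side changes from row to row.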

thank you

@Svetlana_Verthein

Where do you get this number: 25442?

because 50884 / 2 = 25442

Why do you divide it by 2?

Isn’t that what “de-duplicate” means?

No.

Some ratings are repeated to boost the number of training examples of underrepresented genres.

First, only “some ratings” get repeated, not “all ratings”. Also, getting repeated doesn’t necessarily mean “getting repeated once”.

“De-duplicate” means undoing those repetitions.
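A small sketch with made-up records (not the assignment's data) shows why halving the row count is not the same as de-duplicating. The `(user_id, movie_id, rating)` tuple layout is an assumption for illustration only.

```python
from collections import Counter

# Hypothetical (user_id, movie_id, rating) records; values are made up.
records = [
    (1, 10, 4.0),
    (1, 10, 4.0),
    (1, 10, 4.0),  # this rating appears three times
    (2, 11, 3.5),  # this one appears once
    (3, 12, 5.0),
    (3, 12, 5.0),  # this one appears twice
    (4, 13, 2.0),  # this one appears once
]

# Count how often each distinct record occurs.
counts = Counter(records)

# De-duplicating keeps one copy of each distinct record.
unique_records = list(counts)

print(len(records))         # total rows, repeats included
print(len(unique_records))  # distinct rows after de-duplication
print(len(records) / 2)     # halving the total gives a different number
print(counts)               # repeat counts differ from record to record
```

Because some records are repeated more than once and others not at all, the number of unique rows cannot be recovered by dividing the total by 2.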