In Lesson 3 (Recommender Systems), the default practice notebook loads 100 rows from the .csv file (nrows=100), creates embedding vectors from df['article'],
and stores them in the Pinecone vector DB. Why does the index report 1000 vectors, as shown here:
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {'': {'vector_count': 1000}},
'total_vector_count': 1000}
Similarly, in the instructor's demo, he selected 1000 rows from the CSV but ended up with 10500 vectors. I think the reason has to do with the chunking of the data by text_splitter, since each article may be split into several chunks and each chunk gets its own vector. Can someone explain the calculation behind this? Thank you.
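
To make the chunking guess concrete, here is a minimal sketch of the arithmetic I have in mind. It does not use the notebook's actual text_splitter: `split_text` is a hypothetical fixed-size character splitter, and the chunk size, overlap, and sample articles are made-up values for illustration only.

```python
def split_text(text, chunk_size=400, chunk_overlap=20):
    """Hypothetical stand-in for text_splitter: fixed-size character windows."""
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# Made-up "articles" of different lengths (assumption, not the course data).
articles = ["a" * 1000, "b" * 400, "c" * 4000]

# One vector is stored per chunk, so the index count is the sum of chunks.
chunks_per_article = [len(split_text(a)) for a in articles]
total_vectors = sum(chunks_per_article)

print("rows:", len(articles))
print("chunks per row:", chunks_per_article)
print("total vectors:", total_vectors)
```

If chunking is indeed the cause, the vector count would be the sum of chunks over all rows rather than the row count. By that reading, 1000 vectors from 100 rows would mean about 10 chunks per article on average, and 10500 from 1000 rows about 10.5, but I have not confirmed the splitter settings used in the notebook.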