Why is there a mismatch between the number of data points and the number of vectors created?

vsrinivas · August 14, 2024, 9:24am

In Lesson 3 (Recommender Systems), the default practice notebook loads 100 rows from the .csv file (nrows=100), creates embedding vectors for df['article'], and stores them in the Pinecone vector database. Why does the index then report 1000 vectors, as shown here:

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 1000}},
 'total_vector_count': 1000}

Similarly, in the instructor's demo, he selected 1000 rows from the CSV but ended up with 10500 vectors. I think the reason has to do with the chunking of the data by text_splitter. Can someone explain the calculation behind this? Thank you.
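
If chunking is indeed the cause, the vector count would be the total number of chunks across all articles rather than the number of rows. The two counts above would then correspond to an average of 1000 / 100 = 10 chunks per article in the practice notebook and 10500 / 1000 = 10.5 in the demo. Below is a minimal sketch of that arithmetic. It assumes a plain fixed-size character splitter: the function split_text and the values chunk_size=400 and chunk_overlap=20 are hypothetical and are not taken from the notebook, whose text_splitter may split differently.

```python
def split_text(text, chunk_size=400, chunk_overlap=20):
    """Hypothetical fixed-size splitter: cut text into overlapping chunks."""
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


# Three made-up "articles" of different lengths (stand-ins for df['article']).
articles = ["a" * 1000, "b" * 250, "c" * 400]

# One embedding vector is created per chunk, not per row.
chunks_per_article = [len(split_text(article)) for article in articles]
total_vectors = sum(chunks_per_article)

print("rows:", len(articles))
print("chunks per article:", chunks_per_article)
print("vectors that would be upserted:", total_vectors)
```

With a splitter like this, longer articles contribute more vectors than short ones, so the total depends on the article lengths and the splitter settings and cannot be derived from nrows alone.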
