Hi everyone,
I’m currently going through the RAG course and noticed something in the get_chunks_fixed_size_with_overlap function that I’d like to clarify.
In the current implementation, the loop steps by chunk_size, but the overlap is added by extending the start index backwards:
for i in range(0, len(text_words), chunk_size):
chunk_words = text_words[max(i - overlap_int, 0): i + chunk_size]
This means that from the second chunk onwards, each chunk actually contains chunk_size + overlap_int words instead of a fixed chunk_size. For example, with chunk_size=10 and overlap_fraction=0.2:
- Chunk 1: words[0:10] → 10 words
- Chunk 2: words[8:20] → 12 words
- Chunk 3: words[18:30] → 12 words
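To make this concrete, here is a minimal sketch reproducing the behavior (the variable names mirror the lab code, but the dummy word list is mine):

```python
# Toy input: 30 dummy "words"
text_words = [f"w{n}" for n in range(30)]
chunk_size = 10
overlap_int = int(chunk_size * 0.2)  # overlap_fraction = 0.2 -> 2 words

chunks = []
for i in range(0, len(text_words), chunk_size):
    # Overlap is added by extending the start index backwards, so every
    # chunk after the first picks up overlap_int extra words.
    chunks.append(text_words[max(i - overlap_int, 0): i + chunk_size])

print([len(c) for c in chunks])  # [10, 12, 12]
```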
However, the diagram in section 2.2 shows chunks of equal/fixed size, where the step size is reduced to maintain the overlap while keeping chunk size constant.
To match the diagram, I believe the implementation should be:
step = chunk_size - overlap_int
for i in range(0, len(text_words), step):
chunk_words = text_words[i : i + chunk_size]
This way, every chunk stays at exactly chunk_size words, and the overlap is achieved by reducing the step between chunks.
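A quick sketch of this variant on the same toy input (again, the dummy word list is mine, not the lab's):

```python
text_words = [f"w{n}" for n in range(30)]
chunk_size = 10
overlap_int = 2
step = chunk_size - overlap_int  # advance by 8, so consecutive chunks share 2 words

chunks = [text_words[i: i + chunk_size]
          for i in range(0, len(text_words), step)]

# Every chunk has exactly chunk_size words, except possibly the final one,
# which can be shorter when the text runs out.
print([len(c) for c in chunks])  # [10, 10, 10, 6]
```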
Is this intentional, or could it be a small bug? Would love to hear thoughts from the instructors or other learners.
Thanks!
Hi @ulyaaliyeva206,
Great focused learning is a sign of enjoying what you are learning. The instructor really did explain the RAG techniques in detail.
As far as I remember from when I took the course, the reason for including the end of the previous chunk in the next one is mainly to reduce semantic fragmentation, maintain contextual continuity, and prevent hallucinations, thereby improving overall retrieval accuracy.
The instructor, @Zain_Hassan, mentions this in one of the videos, but I cannot remember the particular video title.
Regards
Dr. Deepti
Hi @Deepti_Prasad, thanks for the response!
I totally agree that overlap is important for maintaining context and improving retrieval accuracy; that's not what I'm questioning.
My point is specifically about the chunk size growing. With the current implementation:
- Chunk 1: 10 words (correct)
- Chunk 2: 12 words (chunk_size + overlap)
- Chunk 3: 12 words (chunk_size + overlap)
The overlap is achieved by extending the chunk backwards, which makes it larger. But the diagram in section 2.2 shows fixed-size chunks where the overlap is achieved by reducing the step instead.
Both approaches give you overlap, but:
- Current code: chunks become bigger than chunk_size
- Diagram approach: chunks stay exactly chunk_size
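A side-by-side sketch makes the difference easy to see (toy data and names of my own, not the lab code):

```python
words = list(range(30))
chunk_size, overlap = 10, 2

# Current code: overlap by extending the chunk start backwards
extend_back = [words[max(i - overlap, 0): i + chunk_size]
               for i in range(0, len(words), chunk_size)]

# Diagram approach: overlap by reducing the step between chunks
reduce_step = [words[i: i + chunk_size]
               for i in range(0, len(words), chunk_size - overlap)]

print([len(c) for c in extend_back])  # [10, 12, 12]
print([len(c) for c in reduce_step])  # [10, 10, 10, 6]
```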
So the question isn’t “should we have overlap?” (yes!) but rather “should chunks stay at fixed size while overlapping?” — because the diagram shows one thing and the code does another.
Small difference, but wanted to flag it for accuracy!

I did see the diagram, but I read the instructions in that diagram as matching the lab code's implementation.
Probably the figure needs to be updated.
I will inform the staff. Thank you for flagging this.
Regards
Dr. Deepti