C1M3 ungraded lab mixed_chunking() docstring is a little bit misleading :)

Hey all! :waving_hand:

Loving the course! Quick note on mixed_chunking :memo:

The docstring says:

“larger chunks can be further split at the middle or specific markers”

But the code only handles small chunks (merging them). Large chunks just get appended as-is :thinking:

if len(new_buffer_words) < min_length:
    chunk_buffer = new_buffer    # small → merge ✅
else:
    new_chunks.append(new_buffer)  # big → no split ❌

This can be a bit misleading when reading the docstring — I initially thought the function would also split large chunks, and spent some time looking for that logic in the code :sweat_smile:
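For context, a splitting branch that would match the docstring might look roughly like this. This is purely a hypothetical sketch — `split_large_chunk` and its `max_length` parameter are my own names, not anything from the actual lab code:

```python
def split_large_chunk(chunk, max_length):
    """Hypothetical helper: split a chunk at the middle word boundary
    if it exceeds max_length words; otherwise return it unchanged.
    Not part of the lab — just what the docstring seems to describe."""
    words = chunk.split()
    if len(words) <= max_length:
        return [chunk]          # already small enough, keep as-is
    mid = len(words) // 2       # split "at the middle", per the docstring
    return [" ".join(words[:mid]), " ".join(words[mid:])]
```

Something like this (or a split at specific markers) is what I went looking for in the function and couldn't find.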

Maybe the docstring could be updated to reflect the current behavior? Or was the split logic planned for later? :blush:

hi @ulyaaliyeva206

Glad you are enjoying the course.

Did you check the metadata? They probably only provided the instruction but didn’t implement it in the function.

Probably because of the min_length, chunk_size, and number of tokens the lab is working with?

Regards

Dr. Deepti

Hi @Deepti_Prasad, thanks for the reply!

Yes, that’s exactly my point — the docstring describes behavior that isn’t actually implemented in the code. The function only merges small chunks (< min_length), but doesn’t split large ones.

You’re right that with the current lab data and parameters, it probably doesn’t cause issues. But it can be confusing for learners (like me :sweat_smile:) who read the docstring first and then try to find the splitting logic in the code.

A small update to the docstring to match the actual behavior would help avoid that confusion.

Thanks! :folded_hands:

I will convey to learning technologist of the course about your feedback.

Regards

Dr. Deepti