Sentence chunking is easy to code up: just split a corpus of text on newlines and periods. This works well when each chunk has a self-contained subject-verb-object structure, like:
Llamas are vegetarians and have very efficient digestive systems.
Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old.
Llamas will sometimes spit on people.
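A minimal sketch of that naive approach, for concreteness. The function name `naive_sentence_chunks` and the exact regex are illustrative assumptions, not taken from any particular library; this version keeps the trailing period on each chunk.

```python
import re

def naive_sentence_chunks(text):
    """Hypothetical helper: split on newlines, or on whitespace following a period."""
    parts = re.split(r"\n+|(?<=\.)\s+", text)
    # Drop empty pieces and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]

text = (
    "Llamas are vegetarians and have very efficient digestive systems. "
    "Llamas live to be about 20 years old, though some only live for 15 years "
    "and others live to be 30 years old.\n"
    "Llamas will sometimes spit on people."
)

chunks = naive_sentence_chunks(text)
for chunk in chunks:
    print(chunk)
```

Each of the three chunks names its own subject ("Llamas"), so each one can be understood without the others.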
But what is the strategy if the text looks like this:
In general, llamas:
- will sometimes spit on people
- can live to be 20 years old
- are vegetarians and have very efficient digestive systems
Here the subject, “llamas”, is on a different line from the details (the bullet points), so splitting on newlines separates each detail from the thing it describes. What should the strategy be in this case?
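To make the problem concrete, here is a small sketch of what a newline-and-period splitter does to the bulleted text. The helper `naive_sentence_chunks` is a hypothetical illustration, not a library function, and the check for "llama" is just a crude way to spot chunks that lost their subject.

```python
import re

def naive_sentence_chunks(text):
    """Hypothetical helper: split on newlines, or on whitespace following a period."""
    parts = re.split(r"\n+|(?<=\.)\s+", text)
    return [p.strip() for p in parts if p.strip()]

bulleted = (
    "In general, llamas:\n"
    "- will sometimes spit on people\n"
    "- can live to be 20 years old\n"
    "- are vegetarians and have very efficient digestive systems"
)

chunks = naive_sentence_chunks(bulleted)

# Chunks that never mention the subject are ambiguous on their own.
orphaned = [c for c in chunks if "llama" not in c.lower()]

for chunk in orphaned:
    print(repr(chunk))
```

All three bullet chunks end up in `orphaned`: a chunk such as "- can live to be 20 years old" no longer says what lives that long.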