Combining data parallelism with model parallelism

The Week 1 quiz had a question about combining data parallelism with model parallelism. I don’t remember seeing this in any of the Week 1 lecture videos. Where was this covered? Is it from the optional video (if so, it shouldn’t be optional) or from one of the linked reading materials?