Doubt regarding implementation of an idea

To better understand the text generation portion, I was thinking of creating a project implementing it. To put it simply, the task is to fill in the blanks, with the program finding the most suitable word for it.

Unlike the example covered, this will have to take into account both the text on the left and the one on the right. The one in the course only looks at one direction (left to right). So I was wondering if there was a good approach to doing so.

My current idea:
Have two different sequences. One for predicting left to right (for the text on the left), and one for predicting right to left for the text on the right. The final value, will take into account both of them, to choose the best label.

Please search online for masked language modeling.

Thanks for pointing me in the right direction. This will probably make the implementation much easier for me.