LLM to generate Russian text using only common words

I plan to create an app for learning Russian quickly. I’ll use open-source models from the Hugging Face Hub. The app should generate text using common words. Initially, users will become familiar with these words. Later, the network will create short stories to help users memorize them. Over time, users will build a substantial vocabulary. Ideally, the network should generate text using only common words or words the user has already learned. I see two options:

  1. Train the network using a corpus of common words, although finding such a corpus might be challenging.
  2. Fine-tune the network after training it on a general corpus, but this approach also requires a common words text corpus.

Additionally, you can encourage the network to use specific words by providing constraints during text generation.

Have you posted this topic in the correct forum area?

It sounds like a project description, but you posted it in “AI Questions”.

Sorry, it was a question.

Edited:

I plan to create an app for learning Russian quickly. I’ll use open-source models from the Hugging Face Hub. The app should generate text using common words. Initially, users will become familiar with these words. Later, the network will create short stories to help users memorize them. Over time, users will build a substantial vocabulary. Ideally, the network should generate text using only common words or words the user has already learned. I see two options:

  1. Train the network using a corpus of common words, but I can’t find such a corpus.
  2. Fine-tune the network after training it on a general corpus, but this approach also requires a common words text corpus.
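One way around the missing corpus, for either option, is to build one yourself: take a general Russian corpus and keep only the sentences fully covered by a frequency list. Here is a minimal sketch; the word list and sample sentences are invented for illustration, and a real list would come from published frequency data.

```python
import re

# Hypothetical common-word list; in practice, load a published
# Russian frequency list here instead.
ALLOWED = {"я", "ты", "и", "в", "дом", "кот", "вижу"}

def sentence_words(sentence: str) -> list[str]:
    """Lowercase Cyrillic word tokens from one sentence."""
    return re.findall(r"[а-яё]+", sentence.lower())

def keep(sentence: str, allowed=ALLOWED, coverage=1.0) -> bool:
    """Keep a sentence if the fraction of its words found in the
    allowed vocabulary meets the coverage threshold."""
    words = sentence_words(sentence)
    if not words:
        return False
    hits = sum(w in allowed for w in words)
    return hits / len(words) >= coverage

corpus = ["Я вижу дом.", "Кот в доме.", "Я и ты."]
# "Кот в доме." is dropped: "доме" is an unseen inflected form.
filtered = [s for s in corpus if keep(s)]
```

Lowering `coverage` below 1.0 admits sentences with a few unknown words, which trades purity for corpus size. Note that inflected forms (`дом` vs. `доме`) count as different words here; lemmatizing both the frequency list and the corpus would fix that.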

Additionally, how can I encourage the network to use specific words by providing constraints during text generation?
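One common approach is to mask the model’s output scores so that only allowed tokens can be sampled. The toy sketch below uses an invented five-word vocabulary and hand-written scores to show the mechanism; with a real Hugging Face model you would do the same masking inside a custom `LogitsProcessor`, or exclude words with the `bad_words_ids` argument of `generate()`.

```python
import math

# Invented toy vocabulary and learner word list for illustration.
VOCAB = ["я", "вижу", "дом", "кот", "катастрофа"]
KNOWN = {"я", "вижу", "дом", "кот"}  # words the learner has studied

def mask_unknown(scores: list[float]) -> list[float]:
    """Set scores of unknown words to -inf so they can never be chosen."""
    return [s if VOCAB[i] in KNOWN else -math.inf
            for i, s in enumerate(scores)]

def greedy_pick(scores: list[float]) -> str:
    """Pick the highest-scoring token after masking."""
    masked = mask_unknown(scores)
    return VOCAB[max(range(len(masked)), key=masked.__getitem__)]

# "катастрофа" has the highest raw score, but masking excludes it,
# so generation falls back to the best known word.
raw_scores = [0.1, 0.2, 0.3, 0.4, 5.0]
picked = greedy_pick(raw_scores)
```

A hard mask guarantees the constraint but can hurt fluency; a softer variant subtracts a penalty from unknown-word scores instead of setting them to negative infinity.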