How can I implement the RL solution shown in the image below in TensorFlow?

Here is a rough overview:

Input: inputs of some shape given to the model

Output: output of some shape from the model

```
reward = reward_function()
memory = Sequential_Memory()
NN = Sequential_Tensorflow_model()
dqn = DQN(model=NN)  # DQN agent from keras-rl; NN is passed as the model parameter
```
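For concreteness, here is a minimal replay-buffer sketch of roughly what `Sequential_Memory` does in the setup above. This is a hypothetical plain-Python simplification, not keras-rl's actual `SequentialMemory` implementation:

```python
import random
from collections import deque

class SequentialMemory:
    """Minimal replay buffer (hypothetical simplification of keras-rl's SequentialMemory)."""
    def __init__(self, limit=10000):
        # deque with maxlen drops the oldest experiences once the limit is reached
        self.buffer = deque(maxlen=limit)

    def append(self, observation, action, reward, terminal):
        # store one experience tuple
        self.buffer.append((observation, action, reward, terminal))

    def sample(self, batch_size):
        # draw a random minibatch for learning
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

memory = SequentialMemory(limit=1000)
memory.append(observation=[0.1, 0.2], action=1, reward=0.5, terminal=False)
print(len(memory.buffer))  # 1
```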

  • I want to train on X_train, y_train: the inputs go to the NN model first and are then passed to the DQN model, which stores the experiences from this training in its memory via DQN.forward(reward=reward_function, observation=X_train[some_index]).
  • Here the NN model is trained while the DQN model stores experiences and learns from the reward.
  • After training on X_train, y_train, I want to test on the test set, where the predictions should come from both the NN model and the DQN model. This differs from training, where the NN model is fitted on X_train, y_train while the DQN only stores experiences and learns how to predict actions from the reward.
  • In short, I want predictions on X_test to which both the DQN and the NN model contribute.
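The training flow in the bullets above could be sketched like this, with a toy logistic model standing in for the NN and a tabular Q-update standing in for the DQN. The data, `reward_function`, and all hyperparameters here are hypothetical placeholders, not keras-rl's API:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 4))
y_train = (X_train.sum(axis=1) > 0).astype(int)  # toy binary labels

# Toy "NN": logistic regression trained by gradient descent (stand-in for the Keras model)
w = np.zeros(4)
def nn_predict(x):
    return 1 / (1 + np.exp(-x @ w))

# Toy DQN side: one Q-value per action, updated from the reward signal
n_actions = 2
Q = np.zeros(n_actions)
experiences = []

def reward_function(action, label):
    # +1 when the chosen action matches the label, -1 otherwise (hypothetical)
    return 1.0 if action == label else -1.0

alpha, lr = 0.1, 0.5
for i in range(len(X_train)):
    # supervised update of the NN on (X_train[i], y_train[i])
    p = nn_predict(X_train[i])
    w += lr * (y_train[i] - p) * X_train[i]
    # DQN "forward": pick an action, store the experience, learn from the reward
    action = int(np.argmax(Q + rng.normal(scale=0.01, size=n_actions)))
    r = reward_function(action, y_train[i])
    experiences.append((X_train[i], action, r))
    Q[action] += alpha * (r - Q[action])

print(len(experiences))  # 20
```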

In Simple Terms:

  1. During training → the NN model is trained on X_train, y_train, while DQN(model=NN) stores experiences and learns through the reward.
  2. During testing → I want predictions on X_test to which both the DQN and NN models contribute, not only the NN model.
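For the testing step, one simple way to let both models contribute is to add the DQN's learned per-action values as a bias to the NN's class scores before taking the argmax. The score arrays, `Q`, and the weight `beta` below are all hypothetical; this is only one possible combination rule:

```python
import numpy as np

# nn_scores: class scores from the trained NN for each test sample (hypothetical values)
nn_scores = np.array([[0.2, 0.8],
                      [0.9, 0.1],
                      [0.4, 0.6]])
# Q: the DQN's learned per-action values (hypothetical values)
Q = np.array([0.05, 0.30])

# Add the Q-values as a bias to the NN scores, then take the argmax;
# beta controls how strongly the DQN influences the final prediction.
beta = 0.5
combined = nn_scores + beta * Q
predictions = combined.argmax(axis=1)
print(predictions.tolist())  # [1, 0, 1]
```

A larger `beta` shifts predictions toward the actions the DQN learned to prefer; `beta = 0` recovers the NN-only prediction.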

In the given image, the model is a Hugging Face transformer model for extractive question answering, and dqn = DQN(model=Hugging_face_model).

I want to train dqn as shown in the image, with the difference at testing time that I use the DQN for predicting. Can I do it like this, or is there an alternative?

I have searched a lot of the literature but could not find how to implement this.
