I have a problem in RL: how to exclude some outputs from training

Hi everyone,

I am working through some reinforcement learning exercises, using this model:

self.optimizer = tf.keras.optimizers.Adam(learning_rate=LR)

self.model = tf.keras.Sequential()
self.model.add(tf.keras.layers.Dense(self.deep, activation=tf.nn.relu, name='DenseInput', input_shape=[self.state_size]))
self.model.add(tf.keras.layers.Dense(self.deep, activation=tf.nn.relu, name='DenseDeep'))
self.model.add(tf.keras.layers.Dense(4, activation=None, name='DenseOutput'))  # one linear output per action

# 'accuracy' is not meaningful for a regression loss, so only mse/mae are tracked.
self.model.compile(optimizer=self.optimizer, loss='mean_squared_error', metrics=['mse', 'mae'])
self.model.summary()

At training time I’d like to ignore some of the labels, because I need to update the model to fit only one specific action. You can think of the model’s outputs as the actions we can take in each state (the states are the inputs).

Just as we do in our own minds, I want to ignore some labels that would otherwise be trained on.
Is it possible to do that with a label vector like [None, None, 0.76, None] for a single state?
Is there a way to tell TensorFlow to ignore the labels tagged as None?
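For illustration, a masked loss along these lines might do it (a minimal sketch, not a built-in TensorFlow feature: the helper name masked_mse is hypothetical, and the "ignore" entries are encoded as NaN because a tensor cannot hold None):

import numpy as np
import tensorflow as tf

def masked_mse(y_true, y_pred):
    # True where a real label is present; "ignore" entries are encoded as NaN.
    mask = tf.math.logical_not(tf.math.is_nan(y_true))
    # Replace the NaNs before subtracting so they do not propagate.
    y_true_safe = tf.where(mask, y_true, tf.zeros_like(y_true))
    squared = tf.math.squared_difference(y_true_safe, y_pred)
    squared = tf.where(mask, squared, tf.zeros_like(squared))
    # Average only over the entries that carry a real label.
    n_valid = tf.maximum(tf.reduce_sum(tf.cast(mask, squared.dtype)), 1.0)
    return tf.reduce_sum(squared) / n_valid

# self.model.compile(optimizer=self.optimizer, loss=masked_mse)
# label for one state: np.array([np.nan, np.nan, 0.76, np.nan], dtype=np.float32)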

On the internet I saw several solutions based on torch functions, or on the minimize functions we can get from our optimizers, but to be honest I don’t like them too much.

I would be very happy to find a way to train my model on only the labels I choose.
I can’t use softmax, because the outputs contain generic values (rewards) with a very large range.

To complete my description:

self.model = tf.keras.Sequential()
self.model.add(tf.keras.layers.Dense(self.deep, activation=tf.nn.relu, name='DenseInput', input_shape=[self.state_size]))
self.model.add(tf.keras.layers.Dense(self.deep, activation=tf.nn.relu, name='DenseDeep'))
self.model.add(tf.keras.layers.Dense(1, activation=None, name='DenseOutput'))  # single-output head

This is the intermediate model I need to train several times, and I need to transfer into it what the previous model learned.
I can’t do it that way, because if I train every single state like this I lose the power of the optimization → I need to train all these intermediate steps in the first model (so I can collect the loss in the main process).
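If the point is just to move what the shared hidden layers have learned from one model to the other, something like this could work (a sketch under my reading of the post; main_model and middle_model are placeholder names for the 4-output and 1-output models above):

# Copy the shared hidden layers from the 4-output model into the 1-output one;
# both models name them 'DenseInput' and 'DenseDeep' with identical shapes.
for name in ('DenseInput', 'DenseDeep'):
    weights = main_model.get_layer(name).get_weights()
    middle_model.get_layer(name).set_weights(weights)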

Best regards,
Samir (nickname)

Problem solved here: deepL_RL/exercise_space_ship at main · AmalLight/deepL_RL · GitHub.

For this purpose I used an RNN (short memory, fewer weights), not an LSTM (long memory).
To train a single action it is sufficient to build the target like this: (predicted_action_0, predicted_action_1, label_for_action_2, predicted_action_3).
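In code, the trick is roughly this (a sketch of the idea above; state, action_index, and label are placeholder names): predict first, then overwrite only the entry for the action being trained, so the other outputs contribute zero error.

import numpy as np

# Predict the current outputs for this state; shape (1, 4).
target = self.model.predict(state[np.newaxis, :], verbose=0)

# Overwrite only the action we want to train; the other three entries
# equal the predictions, so they contribute zero error and zero gradient.
target[0, action_index] = label

self.model.fit(state[np.newaxis, :], target, verbose=0)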

Thank you for your time :joy: