Please help me with reinforcement learning

I am currently working on a reinforcement learning project for an AI traffic management system. Although I have nearly completed the project, the model is not learning properly: it keeps predicting incorrect outputs, such as a clearance time of 0 or negative values. I have already tried training it for longer, but I believe my code might be incorrect. Here are my specific questions:

  1. In my state function, each road has 9 features, and there are n roads. I have created two classes: one for the environment (env()) to reset and take steps, and another for the DQN agent to take actions. Do I need to input data for all the roads when predicting the clearance time for the current road, or should I only use the data for the current road? Here is a code snippet:

    from tensorflow.keras import layers, models

    def build_model(state_size, action_size):
        # Simple feed-forward network: the state vector goes in,
        # and one value per action comes out.
        model = models.Sequential()
        model.add(layers.Dense(64, input_dim=state_size, activation='relu'))
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(action_size, activation='relu'))  # output layer: one value per action
        model.compile(loss='mse', optimizer='adam')
        return model
    
  2. I also suspect there might be something wrong with my DQN agent setup. I can’t provide my full code here, but could you recommend general things to watch out for when working with reinforcement learning?

Hello @danish19,

Assuming that all of the n roads can affect one another, it is reasonable to provide all of them as input to the model. From there, you can use the skills from MLS Course 2 Week 3 to make improvements. For example, with n × 9 input features, you might want to try a bigger network to lower the bias if both your training and validation losses are higher than you would like (i.e., the model is underfitting).
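
To make that concrete, here is a minimal sketch of the idea, assuming the state is simply the 9 features of every road concatenated into one vector, and using a somewhat wider network than your current 64-64. The constants (N_ROADS, the layer sizes) are placeholders rather than values from your project:

    import numpy as np
    from tensorflow.keras import layers, models

    N_ROADS = 4                               # placeholder number of roads
    FEATURES_PER_ROAD = 9
    state_size = N_ROADS * FEATURES_PER_ROAD  # all n x 9 features go in together
    action_size = N_ROADS                     # e.g. pick which road to clear next

    def build_state(road_features):
        # road_features: list of n arrays, each holding the 9 features of one road
        return np.concatenate(road_features)  # shape (n * 9,)

    def build_wider_model(state_size, action_size):
        # Wider than 64-64, to lower bias if both losses stay too high.
        model = models.Sequential([
            layers.Input(shape=(state_size,)),
            layers.Dense(256, activation='relu'),
            layers.Dense(256, activation='relu'),
            layers.Dense(action_size, activation='linear'),  # unconstrained outputs
        ])
        model.compile(loss='mse', optimizer='adam')
        return model

Note that I used a linear output layer in this sketch so the network can produce any value, including negative ones; keep that in mind for the next point about negating the times.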

You mentioned that the model predicts time, but from the code snippet it predicts one value per action. Is each of those values a time, and do you then choose the action with the minimal time? If so, that is the opposite of the usual Q-learning formulation, where a larger value (reward) is better, because in your case a smaller time is better. I am not sure of the consequences of such a change, but to bring it back to the standard convention, a quick thing to try is to negate the times and offset them back to positive values so that it becomes “larger is better”.
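
If you do try that, a quick sketch of what I mean could look like this; MAX_TIME is just an assumed upper bound on the clearance time, and choose_action is a hypothetical helper rather than code from your project:

    import numpy as np

    MAX_TIME = 300.0  # assumed upper bound on clearance time (seconds)

    def reward_from_time(clearance_time):
        # Negate the time and offset it so the reward stays positive:
        # a shorter clearance time now gives a larger reward.
        return MAX_TIME - clearance_time

    def choose_action(model, state, epsilon=0.1):
        # Standard "larger Q is better" selection with epsilon-greedy exploration.
        if np.random.rand() < epsilon:
            return np.random.randint(model.output_shape[-1])  # explore
        q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
        return int(np.argmax(q_values))                       # exploit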

Cheers,
Raymond