Please help me with reinforcement learning

I am currently working on a reinforcement learning project for an AI traffic management system. Although I have nearly completed the project, the model is not learning properly: it keeps predicting incorrect outputs, such as a clearance time of 0 or negative values. I have already tried training it for longer, but I believe my code might be incorrect. Here are my specific questions:

  1. In my state function, each road has 9 features, and there are n roads. I have created two classes: one for the environment (env()) to reset and take steps, and another for the DQN agent to take actions. Do I need to input data for all the roads when predicting the clearance time for the current road, or should I only use the data for the current road? Here is a code snippet:

    from tensorflow.keras import layers, models

    def build_model(state_size, action_size):
        # Simple feed-forward network: the state vector goes in,
        # and one value per action comes out.
        model = models.Sequential()
        model.add(layers.Dense(64, input_dim=state_size, activation='relu'))
        model.add(layers.Dense(64, activation='relu'))
        model.add(layers.Dense(action_size, activation='relu'))  # output layer: one value per action
        model.compile(loss='mse', optimizer='adam')
        return model
    
  2. I also suspect there might be something wrong with my DQN agent setup. I can’t provide my full code here, but could you recommend general things to watch out for when working with reinforcement learning?

Hello @danish19,

Assuming that all of the n roads can affect one another, it is reasonable to provide all of them as input to the model. From there, you can use the skills from MLS Course 2 Week 3 to make improvements. For example, with n × 9 input features, you might want to try a bigger network to lower the bias if both your training and validation losses are higher than you would like (i.e., the model is underfitting).
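
To make that concrete, here is a minimal sketch of the idea, assuming the state is simply the 9 features of every road concatenated into one vector, and using a somewhat wider network than your current 64-64. The constants (N_ROADS, the layer sizes) are placeholders rather than values from your project:

    import numpy as np
    from tensorflow.keras import layers, models

    N_ROADS = 4                               # placeholder number of roads
    FEATURES_PER_ROAD = 9
    state_size = N_ROADS * FEATURES_PER_ROAD  # all n x 9 features go in together
    action_size = N_ROADS                     # e.g. pick which road to clear next

    def build_state(road_features):
        # road_features: list of n arrays, each holding the 9 features of one road
        return np.concatenate(road_features)  # shape (n * 9,)

    def build_wider_model(state_size, action_size):
        # Wider than 64-64, to lower bias if both losses stay too high.
        model = models.Sequential([
            layers.Input(shape=(state_size,)),
            layers.Dense(256, activation='relu'),
            layers.Dense(256, activation='relu'),
            layers.Dense(action_size, activation='linear'),  # unconstrained outputs
        ])
        model.compile(loss='mse', optimizer='adam')
        return model

Note that I used a linear output layer in this sketch so the network can produce any value, including negative ones; keep that in mind for the next point about negating the times.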

You mentioned that the model predicts time, but from the code snippet it predicts one value per action. Is each of those values a time, and do you then choose the action with the minimal time? If so, that is the opposite of the usual Q-learning formulation, where a larger value (reward) is better, because in your case a smaller time is better. I am not sure of the consequences of such a change, but to bring it back to the standard convention, a quick thing to try is to negate the times and offset them back to positive values so that it becomes “larger is better”.
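
If you do try that, a quick sketch of what I mean could look like this; MAX_TIME is just an assumed upper bound on the clearance time, and choose_action is a hypothetical helper rather than code from your project:

    import numpy as np

    MAX_TIME = 300.0  # assumed upper bound on clearance time (seconds)

    def reward_from_time(clearance_time):
        # Negate the time and offset it so the reward stays positive:
        # a shorter clearance time now gives a larger reward.
        return MAX_TIME - clearance_time

    def choose_action(model, state, epsilon=0.1):
        # Standard "larger Q is better" selection with epsilon-greedy exploration.
        if np.random.rand() < epsilon:
            return np.random.randint(model.output_shape[-1])  # explore
        q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
        return int(np.argmax(q_values))                       # exploit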

Cheers,
Raymond