Seeking Assistance with Training a DQN Agent in a Two-Player Board Game Environment

Hello everyone,

I’m working on a reinforcement learning project where I need to train an agent to play a two-player board game called “Force 3”. I’m using a Deep Q-Network (DQN) for this. My Q-network is a stack of dense layers and outputs a distinct Q-value for every possible action combination, which makes the action space quite large (3 x 9 x 9 = 243 discrete actions).
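To make the setup concrete, the network has roughly this shape (shown in PyTorch purely for illustration; the exact layer widths in my code differ, and the input size of 9 assumes the flattened 3x3 board that the environment returns):

```python
import torch.nn as nn

N_ACTIONS = 3 * 9 * 9  # 243 (action_type, source cell, target cell) combinations

class QNetwork(nn.Module):
    def __init__(self, state_dim=9, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),  # one Q-value per flat action index
        )

    def forward(self, state):
        # state: flattened board, shape (batch, 9)
        return self.net(state)
```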

Here’s my challenge: despite running a significant number of training episodes (2000 episodes with 100 timesteps each), the agent’s cumulative rewards continue to decrease, suggesting that the agent is not improving its performance over time. I have reviewed and adjusted the reward function to balance rewards and penalties, but this does not seem to be enough.
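In its current form, calculate_reward has roughly this shape (the constants are illustrative placeholders, and “the agent is player +1” is an assumption I’m making here for the sketch):

```python
def calculate_reward(self, board, game_over, winner, action_type, row, col, target_row, target_col):
    # Illustrative shape only; the exact constants in my code differ.
    if game_over:
        if winner == 1:      # agent won (assuming the agent is player +1)
            return 1.0
        elif winner == -1:   # opponent won
            return -1.0
        return 0.0           # draw
    return -0.01             # small per-step penalty so the agent prefers shorter games
```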

I suspect the agent is not learning effectively because it’s playing “alone” without an active opponent. In the current code, the agent performs its action, receives a reward, and the game state updates, but no logic is implemented for the opponent’s actions between the agent’s turns.

Here’s a snippet of my step function in “Force3Env.py”:

```python
def step(self, action, move_two):
    # Convert the action (an integer) into a specific action in the game
    action_type, row, col, target_row, target_col = self.convert_to_action_tuple(action)

    # Apply the action in the game and retrieve the additional information
    board, game_over, winner, success, error_message = self.force3.step(
        (action_type, row, col, target_row, target_col), move_two)

    # Calculate the reward
    reward = self.calculate_reward(board, game_over, winner, action_type, row, col, target_row, target_col)

    done = game_over
    info = {'winner': winner}
    return np.array(board).reshape(-1), reward, done, info
```
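For completeness, convert_to_action_tuple just decodes the flat action index back into a move tuple, along these lines (the exact index ordering in my code may differ; this is the idea):

```python
def convert_to_action_tuple(self, action):
    # Decode a flat index in [0, 3*9*9) into (action_type, row, col, target_row, target_col).
    action_type_idx, rest = divmod(action, 9 * 9)
    source, target = divmod(rest, 9)
    action_type = ('place_round', 'move_square', 'move_round')[action_type_idx]
    row, col = divmod(source, 3)
    target_row, target_col = divmod(target, 3)
    return action_type, row, col, target_row, target_col
```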

And here’s the corresponding step function in “Force3.py”:

```python
def step(self, action, move_two):
    if self.game_over:
        return self.board, self.game_over, self.winner, False, "Game is over. Please reset."

    action_type, row, col, target_row, target_col = action

    # Check if the action is valid
    if not self.is_valid_move(action_type, row, col, target_row, target_col):
        return self.board, self.game_over, self.winner, False, "Invalid move."

    if action_type == 'place_round':
        self.board[target_row][target_col] = self.current_player
        self.round_tokens_placed[self.current_player] += 1
    elif action_type == 'move_square':
        self._move_square(row, col, target_row, target_col, move_two)
    elif action_type == 'move_round':
        self._move_round(row, col, target_row, target_col)

    # Search for a winner
    self.game_over, self.winner = self.check_winner()

    # Change the current player
    if not self.game_over:
        self.current_player = -self.current_player

    return self.board, self.game_over, self.winner, True, None  # No error
```

I’m seeking advice on the best way to integrate an opponent into this environment to enhance the agent’s learning. How do you usually handle opponent actions in such environments? Should I implement a basic AI for the opponent, or are there standard strategies I could employ to make training more realistic and effective?
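For example, one idea I had is to let the environment play a simple random opponent move inside step() before returning the next observation, roughly like this (the get_valid_actions helper is hypothetical and I would still need to implement it, and passing False for the opponent’s move_two is just an assumption):

```python
import random
import numpy as np

def step(self, action, move_two):
    # Agent's move, as in the current code
    action_type, row, col, target_row, target_col = self.convert_to_action_tuple(action)
    board, game_over, winner, success, error_message = self.force3.step(
        (action_type, row, col, target_row, target_col), move_two)
    reward = self.calculate_reward(board, game_over, winner, action_type, row, col, target_row, target_col)

    # Opponent's reply: a uniformly random valid move, played within the same env step.
    # get_valid_actions() is a hypothetical helper that would enumerate legal moves for the current player.
    if not game_over:
        opponent_action = random.choice(self.force3.get_valid_actions())
        board, game_over, winner, _, _ = self.force3.step(opponent_action, False)
        if game_over:
            # Re-score from the agent's point of view if the opponent's move ended the game
            reward = self.calculate_reward(board, game_over, winner, *opponent_action)

    info = {'winner': winner}
    return np.array(board).reshape(-1), reward, game_over, info
```

The other option I keep reading about is self-play, where the opponent is a periodically refreshed frozen copy of the agent’s own network, but I’m not sure how best to fit that into a single-agent DQN training loop.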

Any help or ideas on how to improve the situation would be greatly appreciated. I’m also open to sharing more details if needed.

Thank you in advance for your time and expertise!