Seeking Assistance with Training a DQN Agent in a Two-Player Board Game Environment

Hello everyone,

I’m working on a reinforcement learning project where I need to train an agent to play a two-player board game called “Force 3”. I’m using a Deep Q-Network (DQN) for this. My Q-network is a stack of dense layers and outputs a distinct Q-value for every possible action combination, which makes the action space quite large (3 x 9 x 9 = 243 discrete actions).
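To make the setup concrete, the network has roughly this shape (shown in PyTorch purely for illustration; the exact layer widths in my code differ, and the input size of 9 assumes the flattened 3x3 board that the environment returns):

```python
import torch.nn as nn

N_ACTIONS = 3 * 9 * 9  # 243 (action_type, source cell, target cell) combinations

class QNetwork(nn.Module):
    def __init__(self, state_dim=9, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),  # one Q-value per flat action index
        )

    def forward(self, state):
        # state: flattened board, shape (batch, 9)
        return self.net(state)
```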

Here’s my challenge: despite running a significant number of training episodes (2000 episodes with 100 timesteps each), the agent’s cumulative rewards continue to decrease, suggesting that the agent is not improving its performance over time. I have reviewed and adjusted the reward function to balance rewards and penalties, but this does not seem to be enough.
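In its current form, calculate_reward has roughly this shape (the constants are illustrative placeholders, and “the agent is player +1” is an assumption I’m making here for the sketch):

```python
def calculate_reward(self, board, game_over, winner, action_type, row, col, target_row, target_col):
    # Illustrative shape only; the exact constants in my code differ.
    if game_over:
        if winner == 1:      # agent won (assuming the agent is player +1)
            return 1.0
        elif winner == -1:   # opponent won
            return -1.0
        return 0.0           # draw
    return -0.01             # small per-step penalty so the agent prefers shorter games
```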

I suspect the agent is not learning effectively because it’s playing “alone” without an active opponent. In the current code, the agent performs its action, receives a reward, and the game state updates, but no logic is implemented for the opponent’s actions between the agent’s turns.

Here’s a snippet of my step function in “Force3Env.py”:

```python
def step(self, action, move_two):
    # Convert the action (an integer) into a specific action in the game
    action_type, row, col, target_row, target_col = self.convert_to_action_tuple(action)

    # Apply the action in the game and retrieve the additional information
    board, game_over, winner, success, error_message = self.force3.step(
        (action_type, row, col, target_row, target_col), move_two)

    # Calculate the reward
    reward = self.calculate_reward(board, game_over, winner, action_type, row, col, target_row, target_col)

    done = game_over
    info = {'winner': winner}
    return np.array(board).reshape(-1), reward, done, info
```
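For completeness, convert_to_action_tuple just decodes the flat action index back into a move tuple, along these lines (the exact index ordering in my code may differ; this is the idea):

```python
def convert_to_action_tuple(self, action):
    # Decode a flat index in [0, 3*9*9) into (action_type, row, col, target_row, target_col).
    action_type_idx, rest = divmod(action, 9 * 9)
    source, target = divmod(rest, 9)
    action_type = ('place_round', 'move_square', 'move_round')[action_type_idx]
    row, col = divmod(source, 3)
    target_row, target_col = divmod(target, 3)
    return action_type, row, col, target_row, target_col
```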

And here’s the corresponding step function in “Force3.py”:

```python
def step(self, action, move_two):
    if self.game_over:
        return self.board, self.game_over, self.winner, False, "Game is over. Please reset."

    action_type, row, col, target_row, target_col = action

    # Check if the action is valid
    if not self.is_valid_move(action_type, row, col, target_row, target_col):
        return self.board, self.game_over, self.winner, False, "Invalid move."

    if action_type == 'place_round':
        self.board[target_row][target_col] = self.current_player
        self.round_tokens_placed[self.current_player] += 1
    elif action_type == 'move_square':
        self._move_square(row, col, target_row, target_col, move_two)
    elif action_type == 'move_round':
        self._move_round(row, col, target_row, target_col)

    # Search for a winner
    self.game_over, self.winner = self.check_winner()

    # Change the current player
    if not self.game_over:
        self.current_player = -self.current_player

    return self.board, self.game_over, self.winner, True, None  # No error
```

I’m seeking advice on the best way to integrate an opponent into this environment to enhance the agent’s learning. How do you usually handle opponent actions in such environments? Should I implement a basic AI for the opponent, or are there standard strategies I could employ to make training more realistic and effective?
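For example, one idea I had is to let the environment play a simple random opponent move inside step() before returning the next observation, roughly like this (the get_valid_actions helper is hypothetical and I would still need to implement it, and passing False for the opponent’s move_two is just an assumption):

```python
import random
import numpy as np

def step(self, action, move_two):
    # Agent's move, as in the current code
    action_type, row, col, target_row, target_col = self.convert_to_action_tuple(action)
    board, game_over, winner, success, error_message = self.force3.step(
        (action_type, row, col, target_row, target_col), move_two)
    reward = self.calculate_reward(board, game_over, winner, action_type, row, col, target_row, target_col)

    # Opponent's reply: a uniformly random valid move, played within the same env step.
    # get_valid_actions() is a hypothetical helper that would enumerate legal moves for the current player.
    if not game_over:
        opponent_action = random.choice(self.force3.get_valid_actions())
        board, game_over, winner, _, _ = self.force3.step(opponent_action, False)
        if game_over:
            # Re-score from the agent's point of view if the opponent's move ended the game
            reward = self.calculate_reward(board, game_over, winner, *opponent_action)

    info = {'winner': winner}
    return np.array(board).reshape(-1), reward, game_over, info
```

The other option I keep reading about is self-play, where the opponent is a periodically refreshed frozen copy of the agent’s own network, but I’m not sure how best to fit that into a single-agent DQN training loop.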

Any help or ideas on how to improve the situation would be greatly appreciated. I’m also open to sharing more details if needed.

Thank you in advance for your time and expertise!