Multiple environment resets per iteration using Ray RLlib DQN (D3QN) with a simple custom Gym environment


Hello,

I’m experimenting with Ray RLlib’s DQN (Dueling Double DQN, i.e. D3QN) on a minimal custom environment, but I keep seeing many environment resets within a single training iteration, even though each episode terminates after one step. I’ve tried tuning all of the batch-size and horizon-related parameters, but the behavior persists.


1. Custom environment

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from ray.tune.registry import register_env

class SimpleEnv(gym.Env):
    def __init__(self, config):
        super().__init__()
        self.observation_space = spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.step_count = 0
        self.horizon = config.get("horizon", 1)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.step_count = 0
        print("=== RESET ===")           # I see this printed many times!
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        self.step_count += 1
        done = self.step_count >= self.horizon
        print(f"Step: {self.step_count}, Done: {done}")
        # obs, reward, terminated, truncated, info
        return np.zeros(1, dtype=np.float32), 1.0, done, done, {}

register_env("SimpleEnv-v0", lambda cfg: SimpleEnv(cfg))
  • horizon is set to 1, so each episode should be exactly one step long.
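
Outside of RLlib, a single reset/step round-trip should look like this (a quick standalone check, not part of my original run):

env = SimpleEnv({"horizon": 1})
obs, info = env.reset()                      # prints === RESET === exactly once
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(terminated, truncated)                 # -> True True after a single step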

2. DQN configuration

from ray.rllib.algorithms.dqn import DQNConfig

config = DQNConfig()

# Environment
config.environment("SimpleEnv-v0", env_config={"horizon": 1})

# Runner settings
config.env_runners(
    num_env_runners=0,
    rollout_fragment_length=1,
    batch_mode="complete_episodes"
)

# Training settings
config.training(
    dueling=True,
    double_q=True,
    train_batch_size=50,
    train_batch_size_per_learner=50,
    minibatch_size=25,
    num_steps_sampled_before_learning_starts=50,
    target_network_update_freq=1,
)

# Episode termination (legacy keys from the old config API; I'm not sure
# they are still read by Ray 2.42's new API stack, but I tried them anyway)
config.soft_horizon = True
config.no_done_at_end = False

# API stack
config.api_stack(
    enable_rl_module_and_learner=True,
    enable_env_runner_and_connector_v2=True
)

algo = config.build_algo()
algo.train()
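
(Not part of the run above, just a sanity check: printing a few of the resolved settings via config.to_dict(); the exact key names are my guess at how they appear in that dict, hence the .get().)

for key in ("train_batch_size", "rollout_fragment_length", "batch_mode",
            "num_steps_sampled_before_learning_starts"):
    print(key, "=", config.to_dict().get(key))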

3. Observed output


=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
[... same RESET / Step pattern repeated; the excerpt shows 14 "=== RESET ===" lines in a row ...]

Despite using batch_mode="complete_episodes", rollout_fragment_length=1, and setting the horizon to 1, RLlib collects multiple “episodes” (and prints many === RESET ===) before each training update.
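
One thing I have not ruled out: a single algo.train() call is a reporting iteration rather than a single sample-then-update step, so the amount of sampling per iteration may also depend on the reporting settings. A sketch of what I would try next, assuming AlgorithmConfig.reporting() accepts these arguments:

# Assumption: these bound how many env steps get sampled per algo.train() call;
# I have not verified this on the new API stack.
config.reporting(
    min_sample_timesteps_per_iteration=1,
    min_time_s_per_iteration=0,
)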


4. What I’ve tried

  • Tuned:
    • train_batch_size, train_batch_size_per_learner
    • num_steps_sampled_before_learning_starts
    • target_network_update_freq, and other hyperparameters
    • Disabling/enabling the RLModule API stack
  • Switched between single-agent and multi-agent replay buffers
  • Used both .env_runners(...) and .rollouts(...) (older-API variant sketched below)
  • Ran the minimal code above on the simplest possible custom env

Nothing stops RLlib from issuing multiple resets per iteration.
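
For completeness, this is roughly the older-API variant I mean; if I remember correctly, rollouts() is the deprecated alias of env_runners() in recent Ray versions, so the two should be equivalent:

config.rollouts(
    num_rollout_workers=0,            # same as num_env_runners=0
    rollout_fragment_length=1,
    batch_mode="complete_episodes",
)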


5. Question

  1. Why does Ray RLlib reset the environment multiple times per training iteration, even with batch_mode="complete_episodes" and a horizon of 1?
  2. How can I force exactly one environment reset (one 1-step episode) per training iteration when using DQN/D3QN in RLlib?

Any pointers to the correct combination of RLlib settings (or a minimal working example) would be greatly appreciated!


What platform are you running this on?
What version of the various packages are you using?

I'm running Python 3.11.4 on my machine, with gymnasium 1.0.0 and ray 2.42.1.
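
(For reference, I grabbed those versions with:)

import sys
import gymnasium
import ray
print(sys.version)            # 3.11.4
print(gymnasium.__version__)  # 1.0.0
print(ray.__version__)        # 2.42.1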

Thanks, that info will be useful to someone who is familiar with those packages.