Hello,
I’m experimenting with Ray RLlib’s DQN (Dueling Double DQN) on a minimal custom environment, but I keep seeing many resets within a single training iteration, even though each episode terminates after a single step. I’ve tried tuning the batch-size and horizon-related parameters, but the behavior persists.
1. Custom environment
import gymnasium as gym
from gymnasium import spaces
from ray.tune.registry import register_env

class SimpleEnv(gym.Env):
    def __init__(self, config):
        self.observation_space = spaces.Box(low=0, high=1, shape=(1,), dtype=float)
        self.action_space = spaces.Discrete(2)
        self.step_count = 0
        self.horizon = config.get("horizon", 1)

    def reset(self, seed=None, options=None):
        self.step_count = 0
        print("=== RESET ===")  # I see this printed many times!
        return [0.0], {}

    def step(self, action):
        self.step_count += 1
        done = self.step_count >= self.horizon
        print(f"Step: {self.step_count}, Done: {done}")
        return [0.0], 1.0, done, done, {}

register_env("SimpleEnv-v0", lambda cfg: SimpleEnv(cfg))
- horizon is set to 1, so each episode should be exactly one step long.
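Stepping the environment by hand (outside RLlib) behaves as intended, i.e. one reset followed by a single terminating step:

env = SimpleEnv({"horizon": 1})
obs, info = env.reset()                                  # prints === RESET === once
obs, reward, terminated, truncated, info = env.step(0)   # prints Step: 1, Done: True
print(terminated, truncated)                             # True True -> episode is over after one step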
2. DQN configuration
from ray.rllib.algorithms.dqn import DQNConfig

config = DQNConfig()

# Environment
config.environment("SimpleEnv-v0", env_config={"horizon": 1})

# Runner settings
config.env_runners(
    num_env_runners=0,
    rollout_fragment_length=1,
    batch_mode="complete_episodes",
)

# Training settings
config.training(
    dueling=True,
    double_q=True,
    train_batch_size=50,
    train_batch_size_per_learner=50,
    minibatch_size=25,
    num_steps_sampled_before_learning_starts=50,
    target_network_update_freq=1,
)

# Episode termination
config.soft_horizon = True
config.no_done_at_end = False

# API stack
config.api_stack(
    enable_rl_module_and_learner=True,
    enable_env_runner_and_connector_v2=True,
)

algo = config.build_algo()
algo.train()
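For what it's worth, the result dict returned by train() should also expose how much sampling happened in that iteration; the key names below are my assumption for the new API stack and may differ between Ray versions:

result = algo.train()  # same call as above, but keeping the returned metrics

# On the new API stack, sampling stats are reported under the env-runner results;
# the exact key names are an assumption and may vary with the Ray version.
env_runner_results = result.get("env_runners", {})
print(env_runner_results.get("num_env_steps_sampled"))
print(env_runner_results.get("num_episodes"))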
3. Observed output
Console output (many resets within a single training iteration):
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Step: 1, Done: True
=== RESET ===
Despite using batch_mode="complete_episodes", rollout_fragment_length=1, and a horizon of 1, RLlib collects multiple one-step "episodes" (and prints many "=== RESET ===" lines) before each training update.
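To put a number on this rather than counting print lines, a small variation of SimpleEnv (a hypothetical helper, not part of my original setup) can track reset() calls directly; with num_env_runners=0 the env lives in the driver process, so a class-level counter is enough:

class CountingEnv(SimpleEnv):
    # Hypothetical helper: identical to SimpleEnv, but counts reset() calls.
    reset_calls = 0

    def reset(self, seed=None, options=None):
        CountingEnv.reset_calls += 1
        return super().reset(seed=seed, options=options)

register_env("CountingEnv-v0", lambda cfg: CountingEnv(cfg))
# After pointing the config at "CountingEnv-v0" and calling algo.train() once,
# CountingEnv.reset_calls ends up much larger than 1, matching the prints above.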
4. What I’ve tried
- Tuned train_batch_size, train_batch_size_per_learner, num_steps_sampled_before_learning_starts, and target_network_update_freq
- Disabled/enabled the RLModule API stack
- Switched between single-agent and multi-agent replay buffers
- Used both .env_runners(...) and .rollouts(...) (see the sketch below)
Nothing stops RLlib from issuing multiple resets per iteration.
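For reference, these are the two runner-config spellings I mean; .rollouts(...) is the older name and, depending on the Ray version, may be deprecated in favor of .env_runners(...):

# Newer spelling (used in the config above):
config.env_runners(
    num_env_runners=0,
    rollout_fragment_length=1,
    batch_mode="complete_episodes",
)

# Older spelling of the same settings (may be deprecated depending on the Ray version):
config.rollouts(
    num_rollout_workers=0,
    rollout_fragment_length=1,
    batch_mode="complete_episodes",
)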
5. Question
- Why does Ray RLlib reset the environment multiple times per training iteration, even with batch_mode="complete_episodes" and a horizon of 1?
- How can I force exactly one environment reset (one 1-step episode) per training iteration when using DQN/D3QN in RLlib?

Any pointers to the correct combination of RLlib settings (or a minimal working example) would be greatly appreciated!