How does the MCTSGridWrapper determine what is an obstacle/starting position/goal?

info5ec · September 30, 2025, 12:46am

I’ve been scouring documentation trying to figure this out. I know that the MCTSGridWrapper is a project of its own, but even when looking through the source code I cannot figure out how it “knows” that the black boxes/1’s are obstacles and the end goal is 3 while starting at 2?

How does it just know this without anything that sets that up?

I appreciate that these short courses are free, and maybe it’s just me, but it seems like the information that is left out of these courses is usually the same information that would help people bridge the gap between what they don’t already know and what you’re trying to teach them.

SteveArthur · October 1, 2025, 2:31pm

Great question — you’re definitely not alone in noticing that gap.

The MCTSGridWrapper typically has these conventions hard-coded into its internal logic rather than being explicitly configured in the examples. Many grid-based environments simply assume default cell values such as:

1 = obstacle
2 = start
3 = goal

The wrapper then parses the grid, identifies those values, and builds the internal state representation for the MCTS algorithm based on those fixed mappings.

This can feel like “magic” if it’s not clearly documented — especially when the course focuses on higher-level concepts and skips over the environment setup details. A quick look at the source code usually reveals a section where the wrapper scans the grid and assigns roles (start, goal, obstacles) to specific cell values.
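To make that scanning step concrete, here is a minimal sketch of what such a pass could look like. It is not taken from the MCTSGridWrapper source: the function name `parse_grid`, the constant names, and the use of 0 for a free cell are all assumptions made for illustration. Only the 1/2/3 mapping comes from the convention discussed above.

```python
# Hypothetical cell codes. 1/2/3 follow the convention in this thread;
# 0 = free cell is an assumption, not confirmed by the wrapper's source.
FREE, OBSTACLE, START, GOAL = 0, 1, 2, 3


def parse_grid(grid):
    """Scan a 2D list of integer codes and return (start, goal, obstacles).

    start and goal are (row, col) tuples; obstacles is a set of (row, col).
    """
    start = None
    goal = None
    obstacles = set()
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == OBSTACLE:
                obstacles.add((r, c))
            elif value == START:
                start = (r, c)
            elif value == GOAL:
                goal = (r, c)
    if start is None or goal is None:
        raise ValueError("grid must contain a start (2) and a goal (3)")
    return start, goal, obstacles


grid = [
    [2, 0, 0],
    [1, 1, 0],
    [0, 0, 3],
]
start, goal, obstacles = parse_grid(grid)
print(start, goal, sorted(obstacles))
# prints: (0, 0) (2, 2) [(1, 0), (1, 1)]
```

If the real wrapper works along these lines, searching its source for comparisons against the literal values 1, 2, and 3 (or named constants standing in for them) should point to where the mapping is defined.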

It’s a really good point that these kinds of implicit assumptions often make it harder for newcomers to connect the dots. Surfacing this in the course material would definitely help bridge that gap.

info5ec · October 1, 2025, 4:17pm

Thanks for the response! Much appreciated. I’ll have to take another look at the source code.
