Cookbook

Common patterns for training agents and building custom environments.

Decode what the agent sees

The default observation image is not pixel values. It is a 7×7×3 array of integer indices: one (object, color, state) triple per cell of the agent's view, indexed as img[x, y]. Decode them like this:

import gymnasium as gym
import minigrid
from minigrid.core.constants import IDX_TO_OBJECT, IDX_TO_COLOR, STATE_TO_IDX

env = gym.make("MiniGrid-DoorKey-8x8-v0")
obs, _ = env.reset()
img = obs["image"]   # shape (7, 7, 3)

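IDX_TO_STATE = {v: k for k, v in STATE_TO_IDX.items()}   # 0 -> "open", 1 -> "closed", 2 -> "locked"

for y in range(img.shape[1]):
    for x in range(img.shape[0]):
        obj_idx, color_idx, state_idx = img[x, y]   # indexed [x, y], not [row, col]
        obj   = IDX_TO_OBJECT[obj_idx]    # "wall", "door", "goal", ...
        color = IDX_TO_COLOR[color_idx]   # "red", "blue", ...
        state = IDX_TO_STATE[state_idx]   # meaningful for doors; 0 is also the default for other objects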

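The agent always occupies img[3, 6], the bottom center of its view, and faces toward smaller y. The cell directly in front of it is therefore img[3, 5]. When the agent is carrying an object, img[3, 6] encodes that object.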
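The following is a minimal NumPy sketch of the same indexing. It assumes only that img is the (view, view, 3) encoded array described above. The helpers front_cell and rows_for_printing are hypothetical and are not part of MiniGrid. The second helper transposes the object channel so that printing it shows rows top to bottom.

```python
import numpy as np


def front_cell(img: np.ndarray) -> tuple[int, int, int]:
    """(object_idx, color_idx, state) of the cell directly in front of the agent.

    Assumes the agent sits at (view // 2, view - 1) and faces toward smaller y,
    so the front cell is one step "up" at (view // 2, view - 2).
    """
    view = img.shape[0]
    x, y = view // 2, view - 2
    obj_idx, color_idx, state = (int(v) for v in img[x, y])
    return obj_idx, color_idx, state


def rows_for_printing(img: np.ndarray) -> np.ndarray:
    """Object-index channel transposed from [x, y] to [row, col] order for display."""
    return img[:, :, 0].T


# Synthetic 7x7 view: all zeros except one arbitrary code in front of the agent.
img = np.zeros((7, 7, 3), dtype=np.uint8)
img[3, 5] = (2, 5, 0)
print(front_cell(img))         # (2, 5, 0)
print(rows_for_printing(img))  # the code 2 appears in row 5, column 3
```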

Use pixel observations for a CNN

Swap the encoded image for a rendered RGB image before passing it to a CNN:

from minigrid.wrappers import RGBImgPartialObsWrapper, ImgObsWrapper

env = gym.make("MiniGrid-Empty-8x8-v0")
env = RGBImgPartialObsWrapper(env, tile_size=8)  # agent POV, rendered pixels
env = ImgObsWrapper(env)                          # drop dict, keep only image
# obs.shape is (56, 56, 3)

Use RGBImgObsWrapper instead for a top-down view of the full grid.
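Many CNN libraries (PyTorch, for example) expect channel-first float input. The wrapped observation, however, is a channel-last uint8 array. Below is a minimal NumPy sketch of that conversion. to_chw_float is a hypothetical helper, and the [0, 1] scaling is one common convention rather than something MiniGrid requires.

```python
import numpy as np


def to_chw_float(obs: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 frame to a (3, H, W) float32 array in [0, 1]."""
    if obs.dtype != np.uint8 or obs.ndim != 3:
        raise ValueError(f"expected an (H, W, C) uint8 array, got {obs.dtype} {obs.shape}")
    return np.transpose(obs, (2, 0, 1)).astype(np.float32) / 255.0


# Stand-in for one RGBImgPartialObsWrapper frame at tile_size=8 (7 tiles * 8 px = 56).
frame = np.full((56, 56, 3), 255, dtype=np.uint8)
x = to_chw_float(frame)
print(x.shape, x.dtype, x.max())  # (3, 56, 56) float32 1.0
```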

Make the environment fully observable

Remove the partial FOV and expose the entire grid:

from minigrid.wrappers import FullyObsWrapper, ImgObsWrapper

env = gym.make("MiniGrid-FourRooms-v0")
env = FullyObsWrapper(env)
env = ImgObsWrapper(env)
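# obs.shape is (width, height, 3), e.g. (19, 19, 3) for FourRooms;
# the agent's cell holds an "agent" object whose state channel is agent_dir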
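In the fully observable encoding, the agent is written into its own cell as an "agent" object, and the state channel of that cell holds its direction. The hedged sketch below recovers the agent's position and direction from the array. find_agent is hypothetical. In real code, take agent_idx from OBJECT_TO_IDX["agent"] in minigrid.core.constants rather than hard-coding a placeholder as this sketch does.

```python
import numpy as np


def find_agent(full_obs: np.ndarray, agent_idx: int) -> tuple[tuple[int, int], int]:
    """Return ((x, y), direction) of the agent in a (width, height, 3) encoded grid."""
    xs, ys = np.nonzero(full_obs[:, :, 0] == agent_idx)
    if len(xs) != 1:
        raise ValueError(f"expected exactly one agent cell, found {len(xs)}")
    x, y = int(xs[0]), int(ys[0])
    direction = int(full_obs[x, y, 2])  # 0=right, 1=down, 2=left, 3=up
    return (x, y), direction


# Synthetic 19x19 grid with an agent code at (4, 7), facing down.
AGENT_IDX = 10  # placeholder; use OBJECT_TO_IDX["agent"] in practice
grid = np.zeros((19, 19, 3), dtype=np.uint8)
grid[4, 7] = (AGENT_IDX, 0, 1)
print(find_agent(grid, AGENT_IDX))  # ((4, 7), 1)
```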

Add exploration bonuses

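Wrap with ActionBonus or PositionBonus to add a count-based exploration bonus of 1/sqrt(n). Here n is how many times the agent has visited the current (position, direction, action) tuple (ActionBonus) or the current position (PositionBonus):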

from minigrid.wrappers import ActionBonus, PositionBonus

env = gym.make("MiniGrid-MultiRoom-N6-v0")
env = ActionBonus(env)          # bonus 1/sqrt(count) per (pos, dir, action)
# or
env = PositionBonus(env)        # bonus 1/sqrt(count) per position

The bonus is added to the reward returned by step(), so the training code needs no other changes.
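Both wrappers use the same count-based rule: the n-th visit to a key earns a bonus of 1/sqrt(n). The minimal sketch below shows that rule on its own, assuming a plain dictionary of visit counts. CountBonus is a hypothetical class, not the wrappers' source code.

```python
import math
from collections import defaultdict
from collections.abc import Hashable


class CountBonus:
    """Count-based exploration bonus: the n-th visit to a key earns 1 / sqrt(n)."""

    def __init__(self) -> None:
        self.counts: dict[Hashable, int] = defaultdict(int)

    def __call__(self, key: Hashable) -> float:
        self.counts[key] += 1
        return 1.0 / math.sqrt(self.counts[key])


bonus = CountBonus()
# ActionBonus-style key: (position, direction, action); PositionBonus would key on position only.
key = ((1, 1), 0, 2)
print([round(bonus(key), 3) for _ in range(4)])  # [1.0, 0.707, 0.577, 0.5]
```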

Prevent death on lava

Use NoDeath to train on lava environments without terminal failures. Entering a cell whose type is listed in no_death_types no longer ends the episode. Instead, death_cost is added to the reward:

from minigrid.wrappers import NoDeath

env = gym.make("MiniGrid-LavaCrossingS9N1-v0")
env = NoDeath(env, no_death_types=("lava",), death_cost=-1.0)
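The reward logic works as follows. If a step would have ended the episode because the agent entered or stood on a listed cell type, the episode keeps running and death_cost is added to the reward. The sketch below is a simplified, hypothetical version of that rule written as a pure function. The real wrapper inspects the grid itself to decide what caused the termination.

```python
from typing import Optional


def no_death_step(reward: float, terminated: bool, died_on: Optional[str],
                  no_death_types: tuple[str, ...] = ("lava",),
                  death_cost: float = -1.0) -> tuple[float, bool]:
    """Return the (reward, terminated) pair a NoDeath-style wrapper would report.

    died_on is the type of the cell that caused termination, or None.
    """
    if terminated and died_on in no_death_types:
        return reward + death_cost, False  # survive, but pay the penalty
    return reward, terminated              # any other outcome passes through unchanged


print(no_death_step(0.0, True, "lava"))  # (-1.0, False): episode continues
print(no_death_step(0.95, True, None))   # (0.95, True): e.g. goal reached, unchanged
```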

Fix seeds for reproducible evaluation

Cycle through a fixed list of seeds instead of random resets:

from minigrid.wrappers import ReseedWrapper

env = gym.make("MiniGrid-Empty-8x8-v0")
env = ReseedWrapper(env, seeds=[0, 1, 2, 3, 4])
# Each call to env.reset() cycles through 0→1→2→3→4→0→...
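The cycling behaves like indexing the seed list modulo its length. The tiny sketch below, which is independent of MiniGrid, lists the seeds that successive reset() calls would receive. seed_schedule is a hypothetical helper.

```python
from itertools import cycle, islice


def seed_schedule(seeds: list[int], n_resets: int) -> list[int]:
    """Seeds used by the first n_resets calls to reset(), cycling through `seeds`."""
    return list(islice(cycle(seeds), n_resets))


print(seed_schedule([0, 1, 2, 3, 4], 7))  # [0, 1, 2, 3, 4, 0, 1]
```

Evaluating over len(seeds) episodes therefore visits every seed exactly once.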

Train with Stable Baselines 3

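MiniGrid's encoded 7×7×3 observation is too small for SB3's default NatureCNN, so a CnnPolicy needs a custom feature extractor (see docs/content/training.md for a full example). The minimal setup below avoids this by using MlpPolicy, which flattens the image: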

import gymnasium as gym
import minigrid
from minigrid.wrappers import ImgObsWrapper
from stable_baselines3 import PPO

env = ImgObsWrapper(gym.make("MiniGrid-Empty-8x8-v0"))
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

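For pixel observations, use CnnPolicy with RGBImgPartialObsWrapper followed by ImgObsWrapper. At tile_size=8, the resulting 56×56 frames are large enough for SB3's default CNN.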

Create a minimal custom environment

Subclass MiniGridEnv, implement _gen_grid, register with Gymnasium:

import gymnasium as gym
from minigrid.minigrid_env import MiniGridEnv
from minigrid.core.grid import Grid
from minigrid.core.world_object import Goal
from minigrid.core.mission import MissionSpace


class MyEnv(MiniGridEnv):
    def __init__(self, size=8, **kwargs):
        super().__init__(
            mission_space=MissionSpace(mission_func=lambda: "reach the goal"),
            grid_size=size,
            max_steps=4 * size * size,
            **kwargs,
        )

    def _gen_grid(self, width, height):
        self.grid = Grid(width, height)
        self.grid.wall_rect(0, 0, width, height)  # border walls
        self.put_obj(Goal(), width - 2, height - 2)
        self.place_agent()
        self.mission = "reach the goal"


gym.register(id="MyEnv-v0", entry_point=MyEnv)
env = gym.make("MyEnv-v0", render_mode="human")

Create a multi-room environment

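Subclass RoomGrid to get room management for free. It lays the grid out as num_rows × num_cols rooms and provides helpers such as remove_wall, add_door, place_in_room, and place_agent(i, j):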

from minigrid.core.roomgrid import RoomGrid
from minigrid.core.mission import MissionSpace
from minigrid.core.world_object import Goal


class TwoRoomEnv(RoomGrid):
    def __init__(self, **kwargs):
        super().__init__(
            mission_space=MissionSpace(mission_func=lambda: "reach the goal"),
            room_size=6,
            num_rows=1,
            num_cols=2,
            max_steps=100,
            **kwargs,
        )

    def _gen_grid(self, width, height):
        super()._gen_grid(width, height)          # initialises room_grid and all room walls
        self.remove_wall(0, 0, 0)                 # wall_idx 0 = right wall: opens room (0, 0) into room (1, 0)
        self.place_in_room(1, 0, Goal())          # add_object() only builds keys, balls and boxes
        self.place_agent(0, 0)                    # start agent in room (0, 0)
        self.mission = "reach the goal"
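Adjacent rooms share their dividing wall, so the overall grid is smaller than num_cols × room_size. The small sketch below shows the arithmetic, assuming RoomGrid's sizing rule of (room_size - 1) * n + 1 cells per axis. roomgrid_dims is a hypothetical helper.

```python
def roomgrid_dims(room_size: int, num_rows: int, num_cols: int) -> tuple[int, int]:
    """(width, height) of a RoomGrid whose rooms share walls with their neighbours."""
    width = (room_size - 1) * num_cols + 1
    height = (room_size - 1) * num_rows + 1
    return width, height


print(roomgrid_dims(room_size=6, num_rows=1, num_cols=2))  # (11, 6): the TwoRoomEnv above
print(roomgrid_dims(room_size=7, num_rows=3, num_cols=3))  # (19, 19)
```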

Inspect what is on the grid

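import gymnasium as gym
import minigrid

env = gym.make("MiniGrid-DoorKey-8x8-v0")
obs, _ = env.reset(seed=0)
base = env.unwrapped   # gym.make adds wrappers; grid and agent state live on the base MiniGridEnv

# Pretty-print the grid to the terminal (MiniGridEnv.__str__ renders it as text)
print(base)

# Check a specific cell (None means empty)
obj = base.grid.get(3, 3)
if obj is not None:
    print(obj.type, obj.color)

# Check if the agent can see a cell
print(base.agent_sees(5, 2))

# Get agent state: (x, y) position, direction (0=right, 1=down, 2=left, 3=up), carried object or None
print(base.agent_pos, base.agent_dir, base.carrying)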
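agent_dir indexes a fixed list of unit vectors (0 = right, 1 = down, 2 = left, 3 = up, with y growing downward). Adding the matching vector to agent_pos gives the cell in front of the agent. The pure-Python sketch below shows this. front_of is hypothetical. The vector table matches DIR_TO_VEC in minigrid.core.constants, and MiniGridEnv exposes the result directly as its front_pos property, so base.grid.get(*base.front_pos) returns what the agent is facing.

```python
# Unit step for each agent_dir value; y grows downward in MiniGrid.
DIR_TO_VEC = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def front_of(agent_pos: tuple[int, int], agent_dir: int) -> tuple[int, int]:
    """Grid coordinates of the cell directly in front of the agent."""
    dx, dy = DIR_TO_VEC[agent_dir]
    return agent_pos[0] + dx, agent_pos[1] + dy


print(front_of((2, 3), 0))  # (3, 3): facing right
print(front_of((2, 3), 3))  # (2, 2): facing up
```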

Render a frame without a window

Capture an RGB image without opening a pygame window:

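import gymnasium as gym
import minigrid

env = gym.make("MiniGrid-Empty-8x8-v0", render_mode="rgb_array")
obs, _ = env.reset()
frame = env.render()         # np.ndarray (H, W, 3), uint8

# Or get the agent's point of view only (get_frame lives on the base env):
frame = env.unwrapped.get_frame(agent_pov=True)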
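The frame size follows directly from the tile size, because each grid cell is drawn as a tile_size × tile_size square. The sketch below shows the arithmetic, assuming MiniGrid's default tile_size of 32 pixels and the default 7×7 view. frame_shape is a hypothetical helper.

```python
def frame_shape(width: int, height: int, tile_size: int = 32) -> tuple[int, int, int]:
    """(H, W, 3) shape of a frame that draws a width x height grid of tiles."""
    return height * tile_size, width * tile_size, 3


print(frame_shape(8, 8))  # (256, 256, 3): full render of an 8x8 grid
print(frame_shape(7, 7))  # (224, 224, 3): agent POV with the default 7x7 view
```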

Stochastic actions

Simulate noisy actuators. With probability 1 - prob, the agent's intended action is replaced by a random action, which may happen to coincide with the intended one:

from minigrid.wrappers import StochasticActionWrapper

env = gym.make("MiniGrid-Empty-8x8-v0")
env = StochasticActionWrapper(env, prob=0.8)
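The replacement rule can be sketched with NumPy's random Generator: keep the intended action with probability prob, and otherwise draw one uniformly. noisy_action is hypothetical. The n_actions value of 7 matches MiniGrid's Discrete(7) action space, but the wrapper's exact sampling range may differ from this sketch.

```python
import numpy as np


def noisy_action(intended: int, prob: float, n_actions: int,
                 rng: np.random.Generator) -> int:
    """Keep `intended` with probability `prob`; otherwise return a uniformly random action."""
    if rng.uniform() < prob:
        return intended
    return int(rng.integers(0, n_actions))


rng = np.random.default_rng(0)
actions = np.array([noisy_action(2, prob=0.8, n_actions=7, rng=rng) for _ in range(10_000)])
# Roughly 0.8 + 0.2 / 7, about 0.83: random draws sometimes match the intended action.
print(np.mean(actions == 2))
```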

Change the agent’s field of view

from minigrid.wrappers import ViewSizeWrapper

env = gym.make("MiniGrid-Empty-16x16-v0")
env = ViewSizeWrapper(env, agent_view_size=11)  # wider FOV

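agent_view_size must be an odd integer ≥ 3. The image observation then has shape (agent_view_size, agent_view_size, 3), here (11, 11, 3).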
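The grid cells covered by the view depend on the agent's direction. The view is a view_size square that starts at the agent's cell, extends forward, and is centered sideways. The sketch below is modelled on the logic of MiniGridEnv.get_view_exts; treat it as an approximation, not the library's code. view_extents is hypothetical and returns exclusive bottom-right bounds.

```python
def view_extents(agent_pos: tuple[int, int], agent_dir: int,
                 view_size: int) -> tuple[int, int, int, int]:
    """(top_x, top_y, bot_x, bot_y) of the square the agent sees; bot_* are exclusive.

    agent_dir: 0=right, 1=down, 2=left, 3=up. Bounds may fall outside the grid.
    """
    if view_size < 3 or view_size % 2 == 0:
        raise ValueError("view_size must be an odd integer >= 3")
    x, y = agent_pos
    half = view_size // 2
    if agent_dir == 0:    # right: view extends toward +x
        top_x, top_y = x, y - half
    elif agent_dir == 1:  # down: view extends toward +y
        top_x, top_y = x - half, y
    elif agent_dir == 2:  # left: view extends toward -x
        top_x, top_y = x - view_size + 1, y - half
    elif agent_dir == 3:  # up: view extends toward -y
        top_x, top_y = x - half, y - view_size + 1
    else:
        raise ValueError("agent_dir must be 0, 1, 2 or 3")
    return top_x, top_y, top_x + view_size, top_y + view_size


print(view_extents((8, 8), 0, 11))  # (8, 3, 19, 14)
```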