API Reference¶
Public classes and methods. Private members (leading underscore) are excluded; the protected helpers (_rand_int, _rand_elem, etc.) on MiniGridEnv are included because subclasses commonly call them in _gen_grid.
MiniGridEnv¶
minigrid.minigrid_env.MiniGridEnv
Abstract base class for all grid-world environments. Subclass it and implement _gen_grid.
Constructor¶
MiniGridEnv(
    mission_space: MissionSpace,
    grid_size: int | None = None,
    width: int | None = None,
    height: int | None = None,
    max_steps: int = 100,
    see_through_walls: bool = False,
    agent_view_size: int = 7,
    render_mode: str | None = None,
    screen_size: int | None = 640,
    highlight: bool = True,
    tile_size: int = TILE_PIXELS,
    agent_pov: bool = False,
)
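| Parameter | Default | Description |
|---|---|---|
| mission_space | — | Mission space of the environment (a MissionSpace instance) |
| grid_size | None | Square grid side length. Use instead of width/height |
| width, height | None | Grid dimensions (minimum 3×3) |
| max_steps | 100 | Steps before episode is truncated |
| see_through_walls | False | Agent can see through walls if True |
| agent_view_size | 7 | FOV size in cells (odd integer ≥ 3) |
| render_mode | None | "human", "rgb_array", or None |
| highlight | True | Shade cells outside agent FOV when rendering |
| tile_size | TILE_PIXELS | Pixels per tile |
| agent_pov | False | Render from agent POV instead of top-down |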
Key public attributes¶
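| Attribute | Type | Description |
|---|---|---|
| grid | Grid | The world grid |
| agent_pos | (int, int) | Agent’s current position |
| agent_dir | int | Direction: 0=right 1=down 2=left 3=up |
| carrying | WorldObj or None | Object the agent holds |
| mission | str | Current mission string |
| step_count | int | Steps taken in current episode |
| actions | Actions | Enum of available actions |
| action_space | Discrete | Gymnasium action space |
| observation_space | Dict | Gymnasium observation space |
| width, height | int | Grid dimensions |
| max_steps | int | Max steps per episode |
| dir_vec | np.ndarray | Unit vector in agent’s facing direction |
| right_vec | np.ndarray | Unit vector to agent’s right |
| front_pos | np.ndarray | Cell directly in front of agent |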
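The relationship between the direction index, the two unit vectors, and the front cell can be sketched with plain tuples. The helper names and the DIR_TO_VEC list below are local to this example; they assume the 0=right, 1=down, 2=left, 3=up convention with y increasing downward, and the right-vector formula is an assumption rather than a quote from the library source.

```python
# Hypothetical stand-ins for the dir_vec, right_vec, and front_pos attributes.
# Assumes y grows downward, so "down" is (0, 1).
DIR_TO_VEC = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, down, left, up


def dir_vec(agent_dir):
    """Unit vector for a direction index in 0..3."""
    return DIR_TO_VEC[agent_dir]


def right_vec(agent_dir):
    """Unit vector pointing to the agent's right (90 degrees clockwise on screen)."""
    dx, dy = dir_vec(agent_dir)
    return (-dy, dx)


def front_pos(agent_pos, agent_dir):
    """Coordinates of the cell directly in front of the agent."""
    dx, dy = dir_vec(agent_dir)
    return (agent_pos[0] + dx, agent_pos[1] + dy)


# An agent at (3, 4) facing up looks at the cell one row above it.
print(front_pos((3, 4), 3))
```

In a real environment these values would be read from the env object rather than recomputed.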
Methods¶
reset(seed=None, options=None) → (obs, info)¶
Resets the environment. Calls _gen_grid to rebuild the grid, then generates the first observation.
obs, info = env.reset(seed=42)
step(action) → (obs, reward, terminated, truncated, info)¶
Executes one action and returns the next observation. terminated=True when the agent reaches a goal or enters lava. truncated=True once step_count reaches max_steps.
obs, reward, terminated, truncated, info = env.step(env.actions.forward)
place_obj(obj, top=None, size=None, reject_fn=None, max_tries=inf) → (int, int)¶
Places obj at a random empty cell within the rectangle defined by top and size. Uses rejection sampling.
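| Parameter | Description |
|---|---|
| obj | Object to place |
| top | Top-left corner of the search rectangle |
| size | Width and height of the search rectangle |
| reject_fn | Callable that returns True for candidate positions to reject |
| max_tries | Sampling limit before raising an error |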
Returns the position where the object was placed.
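The rejection-sampling idea behind place_obj can be sketched without the library. Everything in the block below is hypothetical: the function name, the use of a set of occupied cells instead of a Grid, a reject_fn that takes only the position, and the RuntimeError are assumptions made for illustration.

```python
import random


def place_randomly(occupied, top, size, reject_fn=None, max_tries=1000, rng=None):
    """Pick a random free cell inside the rectangle (top, size).

    occupied is a set of (x, y) cells already in use; it is updated in place.
    """
    rng = rng if rng is not None else random.Random()
    x0, y0 = top
    w, h = size
    for _ in range(max_tries):
        pos = (rng.randrange(x0, x0 + w), rng.randrange(y0, y0 + h))
        if pos in occupied:
            continue  # cell already holds something, sample again
        if reject_fn is not None and reject_fn(pos):
            continue  # caller-supplied filter rejected this cell
        occupied.add(pos)
        return pos
    raise RuntimeError("rejection sampling failed: no valid cell found")


# Keep the object out of column 0 inside a 3x3 area.
cells = set()
print(place_randomly(cells, (0, 0), (3, 3), reject_fn=lambda p: p[0] == 0))
```

A finite max_tries matters when the rectangle may be full; with an unbounded limit the loop would never end in that case.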
put_obj(obj, i, j)¶
Places obj at the exact position (i, j). No rejection sampling.
place_agent(top=None, size=None, rand_dir=True, max_tries=inf) → (int, int)¶
Places the agent at a random empty cell. Ensures the agent is not facing an obstacle immediately.
| Parameter | Description |
|---|---|
| top, size | Search rectangle (same as place_obj) |
| rand_dir | Randomise initial facing direction if True |
gen_obs() → dict¶
Returns the current observation dict with keys "image", "direction", "mission". Useful for inspection; step and reset call this automatically.
gen_obs_grid(agent_view_size=None) → (Grid, np.ndarray)¶
Returns (view_grid, vis_mask) — the sub-grid visible to the agent and a boolean visibility mask. Useful for custom reward shaping based on what the agent can see.
get_frame(highlight=True, tile_size=TILE_PIXELS, agent_pov=False) → np.ndarray¶
Returns an RGB image (H, W, 3) of the current state. Use render_mode="rgb_array" for training; call this when you need a frame outside the standard render cycle.
agent_sees(x, y) → bool¶
Returns True if the non-empty cell at (x, y) is within the agent’s visible area.
hash(size=16) → str¶
SHA-256 hash of the current grid + agent state, truncated to size hex characters. Useful for detecting duplicate states.
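A truncated SHA-256 digest of this kind can be sketched as follows. The function name and the way the state is serialised (string forms of an encoded grid, the agent position, and the agent direction) are assumptions for illustration; only the use of SHA-256 and the truncation to size hex characters come from the description above.

```python
import hashlib


def state_hash(encoded_grid, agent_pos, agent_dir, size=16):
    """Hash a state description and keep the first `size` hex characters."""
    digest = hashlib.sha256()
    for item in (encoded_grid, agent_pos, agent_dir):
        # Serialisation via str() is an assumption made for this sketch.
        digest.update(str(item).encode("utf8"))
    return digest.hexdigest()[:size]


# Deduplicate visited states by storing their hashes in a set.
seen = set()
for state in [([[1, 0], [0, 8]], (0, 0), 0), ([[1, 0], [0, 8]], (0, 0), 0)]:
    seen.add(state_hash(*state))
print(len(seen))
```

Shorter hashes save memory but raise the chance that two different states collide.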
pprint_grid() → str¶
Returns a human-readable string of the grid with the agent’s position marked. Useful for debugging.
Subclassing helpers¶
These protected methods are meant to be called from _gen_grid:
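| Method | Description |
|---|---|
| _rand_int(low, high) | Random int in [low, high) |
| _rand_float(low, high) | Random float in [low, high) |
| _rand_bool() | Random bool |
| _rand_elem(iterable) | Random element of iterable |
| _rand_subset(iterable, num_elems) | Random subset of num_elems distinct elements |
| _rand_color() | Random color name from COLOR_NAMES |
| _rand_pos(x_low, x_high, y_low, y_high) | Random (x, y) position |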
Grid¶
minigrid.core.grid.Grid
The world grid. Stored internally as a flat list; use get/set for access.
Constructor¶
Grid(width: int, height: int)
Methods¶
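| Method | Description |
|---|---|
| get(i, j) | Returns the object at (i, j), or None if the cell is empty |
| set(i, j, v) | Sets cell (i, j) to v |
| copy() | Deep copy |
| horz_wall(x, y, length=None, obj_type=Wall) | Horizontal run of obj_type |
| vert_wall(x, y, length=None, obj_type=Wall) | Vertical run of obj_type |
| wall_rect(x, y, w, h) | Rectangular border of walls |
| slice(topX, topY, width, height) | Extract sub-grid; out-of-bounds becomes Wall |
| rotate_left() | 90° CCW rotation, returns new Grid |
| encode(vis_mask=None) | Encodes the grid as an array of (object, color, state) indices |
| decode(array) | (classmethod) Reconstruct from encoded array |
| render(tile_size, agent_pos, agent_dir=None, highlight_mask=None) | RGB image of grid |
| process_vis(agent_pos) | Compute visibility boolean mask |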
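To make the "flat list behind get/set" idea concrete, here is a minimal sketch of a grid that stores cells in a single list. The FlatGrid class, the row-major index formula, and the bounds check are assumptions made for this example and are not the library's class.

```python
class FlatGrid:
    """Toy grid backed by one flat list, addressed by (column i, row j)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.cells = [None] * (width * height)  # None marks an empty cell

    def _index(self, i, j):
        if not (0 <= i < self.width and 0 <= j < self.height):
            raise IndexError(f"cell ({i}, {j}) is outside the grid")
        return j * self.width + i  # row-major layout (assumed)

    def get(self, i, j):
        return self.cells[self._index(i, j)]

    def set(self, i, j, value):
        self.cells[self._index(i, j)] = value


grid = FlatGrid(4, 3)
grid.set(1, 2, "wall")
print(grid.get(1, 2), grid.get(0, 0))
```

Going through get/set keeps the index arithmetic in one place, which is the reason direct list access is discouraged.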
WorldObj¶
minigrid.core.world_object.WorldObj
Base class for all objects. Subclass to create custom objects.
Constructor¶
WorldObj(type: str, color: str)
type must be a key in OBJECT_TO_IDX; color must be a key in COLOR_TO_IDX.
Attributes¶
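| Attribute | Type | Description |
|---|---|---|
| type | str | Object type string |
| color | str | Color name |
| contains | WorldObj or None | Nested object (used by Box) |
| init_pos | (int, int) or None | Position at placement |
| cur_pos | (int, int) or None | Current position |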
Methods to override¶
| Method | Default | Override when |
|---|---|---|
| can_overlap() | False | Agent should be able to walk into this cell |
| can_pickup() | False | Agent should be able to carry this object |
| can_contain() | False | Object can hold another inside it |
| see_behind() | True | Object blocks line of sight |
| toggle(env, pos) | False | Object reacts to the toggle action |
| encode() | (type_idx, color_idx, 0) | Object has meaningful state beyond default |
| render(img) | — | Always override for custom visuals |
Built-in subclasses¶
Wall(color="grey")¶
Blocks movement and vision.
Floor(color="blue")¶
Walkable decorative tile. can_overlap() = True.
Door(color, is_open=False, is_locked=False)¶
Open: walkable, transparent.
Closed: blocks movement and vision; toggle opens it.
Locked: toggle requires the agent to carry a Key of the same color.
Key(color="blue")¶
Pickupable. Unlocks Door of matching color on toggle.
Ball(color="blue")¶
Pickupable. No other special behaviour.
Box(color, contains=None)¶
Pickupable container. Toggle replaces the box with its contents on the grid.
Goal(color="green")¶
Walkable. Stepping onto it ends the episode with positive reward.
Lava()¶
Walkable. Stepping onto it ends the episode with reward 0.
RoomGrid¶
minigrid.core.roomgrid.RoomGrid
Extends MiniGridEnv for multi-room environments.
Constructor¶
RoomGrid(
    room_size: int = 7,
    num_rows: int = 3,
    num_cols: int = 3,
    max_steps: int = 100,
    **kwargs,
)
Total grid size is ((room_size-1)*num_cols + 1) × ((room_size-1)*num_rows + 1).
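The formula is easy to check by hand. The small helper below (a name invented for this example) evaluates it; the subtraction of 1 reflects that adjacent rooms share a wall.

```python
def room_grid_dims(room_size=7, num_rows=3, num_cols=3):
    """Return (width, height) of the full grid in cells."""
    width = (room_size - 1) * num_cols + 1
    height = (room_size - 1) * num_rows + 1
    return width, height


# With the constructor defaults: (7 - 1) * 3 + 1 = 19 in each dimension.
print(room_grid_dims())         # (19, 19)
print(room_grid_dims(5, 2, 3))  # (13, 9)
```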
Attributes¶
| Attribute | Type | Description |
|---|---|---|
| room_size | int | Side length of each room |
| num_rows, num_cols | int | Room grid dimensions |
| room_grid | list of lists | 2-D array of Room |
Methods¶
get_room(i, j) → Room¶
Room at column i, row j.
room_from_pos(x, y) → Room¶
Room that contains grid coordinate (x, y).
place_in_room(i, j, obj) → (WorldObj, (int, int))¶
Places an existing object inside room (i, j), avoiding walls and doors.
add_object(i, j, kind=None, color=None) → (WorldObj, (int, int))¶
Creates and places a new object in room (i, j).
| Parameter | Description |
|---|---|
| kind | Object type: "key", "ball", or "box". Random if None |
| color | Color name. Random if None |
add_door(i, j, door_idx=None, color=None, locked=None) → (Door, (int, int))¶
Cuts a door in room (i, j)’s wall toward a neighbour.
| Parameter | Description |
|---|---|
| door_idx | 0=right, 1=down, 2=left, 3=up. Random if None |
| color | Door color. Random if None |
| locked | Whether door starts locked. Random if None |
remove_wall(i, j, wall_idx)¶
Removes the shared wall between room (i, j) and its neighbour in direction wall_idx (0=right, 1=down, 2=left, 3=up). No door may already exist there.
place_agent(i=None, j=None, rand_dir=True) → np.ndarray¶
Places the agent in room (i, j) (random room if None).
connect_all(door_colors=COLOR_NAMES, max_itrs=5000) → list[Door]¶
Adds unlocked doors until all rooms are reachable from each other. Returns the list of doors added.
add_distractors(i=None, j=None, num_distractors=10, all_unique=True) → list[WorldObj]¶
Adds random objects to a room (or the whole grid if i/j are None) to increase difficulty.
| Parameter | Description |
|---|---|
| all_unique | No two distractors share the same type and color combination |
Room¶
minigrid.core.roomgrid.Room
Returned by RoomGrid.get_room. Mostly read-only in practice.
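| Attribute | Type | Description |
|---|---|---|
| top | (int, int) | Top-left grid coordinate |
| size | (int, int) | Width × height in cells |
| doors | list of Door or None | 4 doors (right, down, left, up) |
| door_pos | list of (int, int) or None | Door positions |
| neighbors | list of Room or None | Adjacent rooms |
| locked | bool | Room is behind a locked door |
| objs | list of WorldObj | Objects currently in room |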
| Method | Description |
|---|---|
| rand_pos(env) | Random position inside the room |
| pos_inside(x, y) | Whether (x, y) is inside the room |
Actions¶
minigrid.core.actions.Actions — IntEnum
| Name | Value | Effect |
|---|---|---|
| left | 0 | Turn counter-clockwise |
| right | 1 | Turn clockwise |
| forward | 2 | Move one cell forward |
| pickup | 3 | Pick up object in front |
| drop | 4 | Drop carried object in front |
| toggle | 5 | Toggle/activate object in front |
| done | 6 | Signal task complete (no movement) |
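Because Actions is an IntEnum, its members behave like plain integers. The stand-in below mirrors the values 0 to 6 from the table; the member names are assumed to be left, right, forward, pickup, drop, toggle, and done, and the class is redefined locally only so the snippet runs without the library installed.

```python
from enum import IntEnum


class Actions(IntEnum):
    """Local stand-in with the same values as the table above (names assumed)."""

    left = 0
    right = 1
    forward = 2
    pickup = 3
    drop = 4
    toggle = 5
    done = 6


# Members compare equal to their integer values and can be looked up by value,
# which is handy when a policy outputs a raw integer.
print(Actions.forward == 2)
print(Actions(5).name)
print([int(a) for a in Actions])
```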
Constants¶
minigrid.core.constants
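| Name | Type | Description |
|---|---|---|
| COLOR_NAMES | list | Sorted list of color names |
| COLORS | dict | Color name → RGB array |
| COLOR_TO_IDX | dict | Color name → index (0–5) |
| IDX_TO_COLOR | dict | Index → color name |
| OBJECT_TO_IDX | dict | Object type → index (0–10) |
| IDX_TO_OBJECT | dict | Index → object type |
| STATE_TO_IDX | dict | Door state ("open", "closed", "locked") → index |
| DIR_TO_VEC | list | Direction index → (dx, dy) unit vector |
| TILE_PIXELS | int | Default pixels per tile: 32 |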
Wrappers¶
minigrid.wrappers
All wrappers follow the Gymnasium Wrapper interface — wrap and unwrap freely.
import gymnasium as gym
from minigrid.wrappers import ImgObsWrapper, FullyObsWrapper
env = gym.make("MiniGrid-Empty-8x8-v0")
env = FullyObsWrapper(env)
env = ImgObsWrapper(env)
ImgObsWrapper(env)¶
Strips the observation dict and returns only the "image" array.
Before: obs is {"image": ..., "direction": ..., "mission": ...}
After: obs is uint8 (view_size, view_size, 3)
Use when your policy only reads the image channel.
FullyObsWrapper(env)¶
Replaces the partial 7×7 FOV with the full encoded grid.
After: obs["image"] is uint8 (width, height, 3)
RGBImgPartialObsWrapper(env, tile_size=8)¶
Replaces the encoded image with a rendered RGB agent POV.
After: obs["image"] is uint8 (view_size*tile_size, view_size*tile_size, 3), holding pixel values, not object indices
RGBImgObsWrapper(env, tile_size=8)¶
Replaces the encoded image with a rendered RGB image of the full grid.
After: obs["image"] is uint8 (height*tile_size, width*tile_size, 3)
OneHotPartialObsWrapper(env, tile_size=8)¶
Converts the encoded image to one-hot vectors per cell.
Each cell becomes 11 (object) + 6 (color) + 3 (state) = 20 bits
After: obs["image"] is uint8 (view_size, view_size, 20)
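The 20-bit layout can be illustrated for a single cell. This is a sketch under the assumption that the three one-hot segments are concatenated in the order object, color, state; the function name is invented for the example.

```python
NUM_OBJECTS, NUM_COLORS, NUM_STATES = 11, 6, 3


def one_hot_cell(obj_idx, color_idx, state):
    """Turn one encoded cell (obj_idx, color_idx, state) into 20 bits."""
    bits = [0] * (NUM_OBJECTS + NUM_COLORS + NUM_STATES)
    bits[obj_idx] = 1                             # object segment: 0..10
    bits[NUM_OBJECTS + color_idx] = 1             # color segment: 11..16
    bits[NUM_OBJECTS + NUM_COLORS + state] = 1    # state segment: 17..19
    return bits


# Object index 2, color index 5, state 0 set bits 2, 16, and 17.
cell = one_hot_cell(2, 5, 0)
print([i for i, b in enumerate(cell) if b])
```

Exactly three bits are set per cell, one in each segment.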
FlatObsWrapper(env, maxStrLen=96)¶
Flattens the entire observation (image + one-hot mission string) into a 1-D array.
After: obs is a single float64 1-D Box
DictObservationSpaceWrapper(env, max_words_in_mission=50, word_dict=None)¶
Converts the mission string to a fixed-length array of vocabulary indices.
| Method | Description |
|---|---|
| get_minigrid_words() | (static) Returns the default MiniGrid vocabulary dict |
| string_to_indices(string, offset=1) | Converts a string to a list of word indices |
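The conversion from a mission string to a fixed-length index array might look roughly like the sketch below. The function name, the tiny vocabulary, the padding index 0, and the choice to raise on over-long missions are all assumptions for illustration, not the wrapper's actual behaviour.

```python
def mission_to_indices(mission, word_dict, max_words=50, pad=0):
    """Map each word to its vocabulary index and pad to max_words entries."""
    words = mission.lower().replace(",", " ").split()
    indices = [word_dict[w] for w in words]  # KeyError for unknown words
    if len(indices) > max_words:
        raise ValueError("mission has more words than max_words")
    return indices + [pad] * (max_words - len(indices))


# Hypothetical vocabulary; index 0 is reserved for padding in this sketch.
vocab = {"get": 1, "the": 2, "red": 3, "ball": 4}
print(mission_to_indices("get the red ball", vocab, max_words=6))
```

A fixed length lets the mission be stored in a regular array alongside the image observation.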
ViewSizeWrapper(env, agent_view_size=7)¶
Changes the agent’s FOV size. Must be an odd integer ≥ 3.
ReseedWrapper(env, seeds=(0,), seed_idx=0)¶
Forces the environment to cycle through a fixed list of seeds on each reset(), ignoring any seed passed in. Useful for reproducible evaluation.
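The seed cycling can be pictured with a small helper that hands out the next seed on every call and wraps around at the end of the list. SeedCycler is a hypothetical class written for this sketch; it only models the bookkeeping, not the wrapper itself.

```python
class SeedCycler:
    """Return seeds from a fixed list in order, wrapping around at the end."""

    def __init__(self, seeds=(0,), seed_idx=0):
        self.seeds = list(seeds)
        self.seed_idx = seed_idx

    def next_seed(self):
        seed = self.seeds[self.seed_idx]
        self.seed_idx = (self.seed_idx + 1) % len(self.seeds)
        return seed


# Three resets over two seeds: the third reset reuses the first seed.
cycler = SeedCycler(seeds=(3, 5))
print([cycler.next_seed() for _ in range(3)])
```

With a single seed the environment is regenerated identically on every reset.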
ActionBonus(env)¶
Adds an intrinsic bonus of 1/sqrt(count) for each (position, direction, action) triplet, encouraging exploration of novel state-action pairs.
PositionBonus(env, scale=1)¶
Adds an intrinsic bonus of 1/sqrt(count) * scale based on position visits only.
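Both bonus wrappers rely on the same count-based rule, which can be sketched in a few lines. The CountBonus class is hypothetical; the key would be (position, direction, action) for an ActionBonus-style reward or just the position for a PositionBonus-style one, and incrementing the count before computing the bonus is an assumption.

```python
import math
from collections import defaultdict


class CountBonus:
    """Count-based exploration bonus: scale / sqrt(number of visits)."""

    def __init__(self, scale=1.0):
        self.counts = defaultdict(int)
        self.scale = scale

    def bonus(self, key):
        self.counts[key] += 1  # count the current visit first (assumed)
        return self.scale / math.sqrt(self.counts[key])


tracker = CountBonus()
key = ((2, 3), 0, 2)  # (position, direction, action)
# The bonus shrinks as the same key is visited again: 1, 1/sqrt(2), 1/sqrt(3), 1/2.
print([round(tracker.bonus(key), 3) for _ in range(4)])
```

The decay means a novel state-action pair is worth a full unit of bonus while a frequently repeated one contributes almost nothing.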
NoDeath(env, no_death_types: tuple[str, ...], death_cost: float = -1.0)¶
Prevents the episode from terminating on dangerous tiles (e.g. lava). Instead applies death_cost as a penalty.
from minigrid.wrappers import NoDeath
env = NoDeath(env, no_death_types=("lava",), death_cost=-1.0)
DirectionObsWrapper(env, type="slope")¶
Adds a "goal_direction" key to the observation.
| type | Value |
|---|---|
| "slope" | (goal_y − agent_y) / (goal_x − agent_x) |
| "angle" | arctan(slope) |
SymbolicObsWrapper(env)¶
Fully observable grid where each cell encodes [x, y, object_idx] instead of [obj_idx, color_idx, state].
StochasticActionWrapper(env, prob=0.9, random_action=None)¶
Executes the intended action with probability prob; otherwise substitutes a random action (or random_action if provided). Simulates noisy actuators.
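The action substitution can be sketched as a single function. The name noisy_action, the num_actions argument, and the uniform draw over all actions are assumptions for illustration; only the prob and random_action semantics come from the description above.

```python
import random


def noisy_action(action, num_actions, prob=0.9, random_action=None, rng=None):
    """Return `action` with probability `prob`, otherwise a substitute."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < prob:
        return action  # intended action goes through
    if random_action is not None:
        return random_action  # fixed substitute supplied by the caller
    return rng.randrange(num_actions)  # uniform draw (assumed)


# Over many steps roughly 90% of calls return the intended action.
rng = random.Random(0)
kept = sum(noisy_action(2, 7, prob=0.9, random_action=6, rng=rng) == 2 for _ in range(1000))
print(kept)
```

Passing a seeded rng keeps noisy rollouts reproducible.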