
Observations

Composable observation components, observation modes, and how to inspect them.

Every SO101-Nexus environment builds its observation from a list of observation components. Components are lightweight descriptor classes that tell the environment which data to include. State components contribute slices of a flat vector; camera components add image tensors to a dictionary observation.

Observation Components

State Components

State components produce fixed-size slices of the observation vector.

| Component | Dimensions | Description |
|---|---|---|
| JointPositions | 6 | Current angle of each robot joint |
| EndEffectorPose | 7 | TCP position (3) + quaternion orientation (4) |
| TargetOffset | 3 | Vector from gripper tip to goal position |
| GazeDirection | 3 | Unit vector from gripper toward the target object |
| GraspState | 1 | Binary grasp flag (1.0 = grasping, 0.0 = not) |
| ObjectPose | 7 | Target object position (3) + quaternion orientation (4) |
| ObjectOffset | 3 | Vector from gripper tip to target object |
| TargetPosition | 3 | Absolute goal position (x, y, z) |
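Because each state component contributes a fixed-size slice, the length of the flat state vector is simply the sum of the listed components' sizes. A quick sanity check in plain Python (the dictionary below just mirrors the table; it is not part of the library API):

```python
# Per-component slice sizes, copied from the table above (not a library API).
STATE_DIMS = {
    "JointPositions": 6,
    "EndEffectorPose": 7,
    "TargetOffset": 3,
    "GazeDirection": 3,
    "GraspState": 1,
    "ObjectPose": 7,
    "ObjectOffset": 3,
    "TargetPosition": 3,
}

def state_dim(component_names):
    """Length of the flat state vector for a list of state components."""
    return sum(STATE_DIMS[name] for name in component_names)

# PickLift's default list: 7 + 1 + 7 + 3 = 18
print(state_dim(["EndEffectorPose", "GraspState", "ObjectPose", "ObjectOffset"]))  # 18
```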

Camera Components

Camera components add image tensors to a dict-style observation space. They contribute zero length (size = 0) to the flat state vector.

| Component | Key | Description |
|---|---|---|
| WristCamera | "wrist_camera" | RGB image from the camera mounted on the robot's wrist |
| OverheadCamera | "overhead_camera" | RGB image from a stationary camera above the workspace |

Camera components accept width and height parameters (default 640x480). WristCamera also supports domain randomization parameters for FOV and pitch.

```python
from so101_nexus_core import WristCamera, OverheadCamera

# Default resolution (640x480)
wrist = WristCamera()

# Custom resolution with FOV randomization
wrist = WristCamera(width=224, height=224, fov_deg_range=(60.0, 90.0))

# Overhead camera
overhead = OverheadCamera(width=320, height=240, fov_deg=45.0)
```

Composing Observations

Pass a list of components via the observations parameter on any config. Each task provides sensible defaults when observations is not specified.

```python
from so101_nexus_core import (
    PickConfig, JointPositions, EndEffectorPose, GraspState,
    ObjectPose, ObjectOffset, WristCamera,
)

# State-only observations (default for PickLift)
config = PickConfig(observations=[
    EndEffectorPose(),
    GraspState(),
    ObjectPose(),
    ObjectOffset(),
])

# Add a wrist camera to the observation
config = PickConfig(observations=[
    EndEffectorPose(),
    GraspState(),
    ObjectPose(),
    ObjectOffset(),
    WristCamera(width=224, height=224),
])
```

When the observation list contains only state components, the observation is a flat NumPy array. When it contains one or more camera components, the observation becomes a dictionary with a "state" key (flat vector from all state components) plus one key per camera (e.g. "wrist_camera", "overhead_camera").
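The flat-vs-dict rule can be sketched schematically. The helper below is an illustration of the assembly behavior described above, not the library's actual implementation:

```python
import numpy as np

def assemble_observation(state_parts, camera_images):
    """Schematic of the flat-vs-dict rule (not the library's actual code)."""
    # All state component slices concatenate into one flat vector.
    state = np.concatenate([np.asarray(p, dtype=np.float32) for p in state_parts])
    if not camera_images:
        return state                # state-only: flat NumPy array
    obs = {"state": state}          # mixed: dict with a "state" key...
    obs.update(camera_images)       # ...plus one key per camera
    return obs

obs = assemble_observation(
    [np.zeros(7), np.zeros(1), np.zeros(7), np.zeros(3)],  # PickLift defaults
    {"wrist_camera": np.zeros((224, 224, 3), dtype=np.uint8)},
)
print(obs["state"].shape)         # (18,)
print(obs["wrist_camera"].shape)  # (224, 224, 3)
```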

Default Observations by Task

Each task config auto-populates observations if you don't provide one:

| Task | Default Components | State Dimensions |
|---|---|---|
| PickLift | EndEffectorPose, GraspState, ObjectPose, ObjectOffset | 18 |
| PickAndPlace | EndEffectorPose, GraspState, TargetPosition, ObjectPose, ObjectOffset, TargetOffset | 24 |
| Reach | JointPositions | 6 |
| LookAt | JointPositions | 6 |
| Move | JointPositions | 6 |
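The auto-population rule amounts to "use the task's default list unless the caller supplies one." A minimal sketch of that fallback, using a hypothetical helper (not the library's actual code):

```python
# Default component lists per task, as in the table above.
TASK_DEFAULTS = {
    "PickLift": ["EndEffectorPose", "GraspState", "ObjectPose", "ObjectOffset"],
    "PickAndPlace": ["EndEffectorPose", "GraspState", "TargetPosition",
                     "ObjectPose", "ObjectOffset", "TargetOffset"],
    "Reach": ["JointPositions"],
    "LookAt": ["JointPositions"],
    "Move": ["JointPositions"],
}

def resolve_observations(task, observations=None):
    """Hypothetical helper: fall back to the task's default component list
    when no observations argument is given."""
    if observations is not None:
        return list(observations)
    return list(TASK_DEFAULTS[task])

print(resolve_observations("Reach"))                       # ['JointPositions']
print(resolve_observations("Reach", ["EndEffectorPose"]))  # ['EndEffectorPose']
```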

Observation Modes

The obs_mode config parameter controls the semantic intent of the observation:

obs_mode="state" (default)

The observation contains whatever components are listed in observations. This is useful for state-based reinforcement learning where the policy has access to ground-truth information.

obs_mode="visual"

Designed for vision-based policies. Requires at least one camera component (e.g. WristCamera() or OverheadCamera()) in the observations list. Construction raises an error if no camera component is present.

```python
from so101_nexus_core import PickConfig, JointPositions, WristCamera

config = PickConfig(
    obs_mode="visual",
    observations=[JointPositions(), WristCamera(width=224, height=224)],
)
```
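The camera-presence check is conceptually a one-liner. The sketch below illustrates the assumed validation; the library's actual error type and message may differ:

```python
CAMERA_COMPONENTS = {"WristCamera", "OverheadCamera"}

def check_visual_mode(obs_mode, component_names):
    """Schematic of the assumed check: obs_mode='visual' needs a camera."""
    has_camera = any(name in CAMERA_COMPONENTS for name in component_names)
    if obs_mode == "visual" and not has_camera:
        raise ValueError("obs_mode='visual' requires at least one camera component")

check_visual_mode("visual", ["JointPositions", "WristCamera"])  # passes
try:
    check_visual_mode("visual", ["JointPositions"])             # raises
except ValueError as e:
    print(e)
```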

Inspecting Observations

```python
import gymnasium as gym
import so101_nexus_mujoco

# State-only (default)
env = gym.make("MuJoCoPickLift-v1")
obs, info = env.reset()
print(f"Observation shape: {obs.shape}")  # (18,)
env.close()

# With camera
from so101_nexus_core import (
    PickConfig, EndEffectorPose, GraspState, ObjectPose, ObjectOffset, WristCamera,
)

config = PickConfig(observations=[
    EndEffectorPose(),
    GraspState(),
    ObjectPose(),
    ObjectOffset(),
    WristCamera(width=224, height=224),
])
env = gym.make("MuJoCoPickLift-v1", config=config)
obs, info = env.reset()
print(obs["state"].shape)          # (18,) — state components
print(obs["wrist_camera"].shape)   # (224, 224, 3) — camera image
env.close()
```

Choosing the Right Setup

| Use Case | obs_mode | Components | Why |
|---|---|---|---|
| State-based RL training | "state" | Default (no cameras) | Policy uses privileged state directly |
| Vision-based RL training | "visual" | JointPositions + WristCamera | Policy learns from camera images; no ground-truth state |
| Multi-view vision | "visual" | JointPositions + WristCamera + OverheadCamera | Policy fuses multiple camera views |
| Debugging / visualization | "state" | Default + WristCamera | Full state + camera for analysis |
