Mobile manipulation in dynamic indoor scenes

DREAM: Dynamic Resilient Spatio-Semantic Memory with Hybrid Localization for Mobile Manipulation

Zhijie Yan¹, Shufei Li², Ze Zhang¹, Xin Liu¹, Yuhang Zheng³, Zuoxu Wang¹
¹Beihang University, ²City University of Hong Kong, ³National University of Singapore

DREAM couples online spatio-semantic memory, hybrid localization, task-oriented navigation, and robust manipulation so a mobile robot can work in previously unseen environments while targets move and the scene changes.

DREAM framework executing a dynamic mobile manipulation task
DREAM actively acquires targets, reacquires relocated objects, and updates task-relevant scene memory during long-horizon manipulation.

Abstract

Reliable mobile manipulation in dynamic indoor environments requires a 3D semantic representation that remains consistent with the evolving real world. Most existing systems rely on pre-built maps, assume static environments, or presuppose highly accurate camera poses; when these assumptions break, navigation and manipulation operate on stale information.

DREAM is a mobile manipulation framework that operates in previously unseen indoor environments without any pre-built map. It integrates a lightweight indoor LiDAR-Inertial-Visual SLAM backend with dynamic spatio-semantic memory, Redundancy-Aware Memory Pruning, hybrid localization, task-oriented navigation, and robust grasping and placement strategies.

Highlights

01

No Pre-Built Map

DREAM operates in dynamic, previously unseen indoor environments from a natural-language instruction.

02

Dynamic Memory

Online spatio-semantic memory updates voxelized 3D semantics and prunes stale redundant information.

03

Hybrid Localization

Multi-sensor SLAM and visual loop closure provide temporally consistent poses in dynamic scenes.

04

Real Robot Tasks

The system is evaluated on navigation, pickup, placement, and long-horizon mobile manipulation.

Primary videos

Combination Demos, Extended Behaviors, and SLAM Test

The two combination videos, the additional-results reel, and the SLAM test are the primary videos. Each combination video merges three synchronized perspectives so the full task can be followed at a glance, while the SLAM test shows the localization that underlies memory construction and downstream mobile manipulation.

Main demo 01

Combination View: Scenario 01

A compact view of the robot, localization stream, and server-side state for the first long-horizon manipulation sequence.

Main demo 02

Combination View: Scenario 02

A second complete run that highlights dynamic scene updates, reacquisition, and manipulation under changing target layouts.

More videos

Additional Dynamic Manipulation Results

Extended behaviors and supplementary runs for understanding how DREAM responds across varied navigation and manipulation conditions.

Robust SLAM

SLAM Test in a 100 m x 50 m Scene

Even with the camera, LiDAR, and mobile-base calibration transforms taken only from CAD files, the SLAM system achieves high localization accuracy. This robust localization underpins memory construction and downstream mobile manipulation.
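As an illustration of how CAD-derived extrinsics are typically chained, the sketch below composes rigid transforms to express a LiDAR point in a camera frame. It is a minimal example, not DREAM's calibration code: the helper names, the yaw-only rotations, and all numeric offsets are hypothetical, and the camera transform is assumed to hold for one fixed arm configuration. The convention here is that `T_a_b` maps points expressed in frame `b` into frame `a`.

```python
import math

def make_transform(yaw_rad, translation):
    """Build a 4x4 rigid transform from a yaw angle and an (x, y, z) offset."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b for 4x4 transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(t):
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply(t, point):
    """Apply a 4x4 transform to a 3D point."""
    x, y, z = point
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                 for i in range(3))

# Hypothetical offsets standing in for values read from a CAD model.
T_base_lidar = make_transform(0.0, (0.10, 0.0, 0.40))
T_base_camera = make_transform(math.pi / 2, (0.30, 0.0, 0.60))

# LiDAR-to-camera extrinsic obtained by chaining through the base frame.
T_camera_lidar = compose(invert(T_base_camera), T_base_lidar)
```

Calling `apply(T_camera_lidar, point)` then expresses a LiDAR-frame point in the camera frame. Because every sensor is tied to the base frame, an error in any single CAD offset propagates to all cross-sensor transforms, which is why tolerance to imperfect extrinsics matters for the SLAM backend.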

Sub videos

Inside Each Combination Video

These clips are supporting views for the two combination videos. They isolate the third-person recording, SLAM/localization behavior, and server-side interface so each component in the merged demo is easier to inspect.

01

Scenario 01 Breakdown

Three source views used to explain the first combination demo.

Third-person robot execution
SLAM and localization view
Server-side memory and task state

02

Scenario 02 Breakdown

Three source views used to explain the second combination demo.

Third-person robot execution
SLAM and localization view
Server-side memory and task state

Method

Dynamic Memory, Hybrid Localization, and Task-Oriented Navigation

DREAM is organized around a closed loop across perception, memory, localization, navigation, and manipulation.
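The memory stage of this loop can be pictured with a small voxel map that merges repeated observations and drops entries that have not been re-observed recently. This is a simplified stand-in and not DREAM's Redundancy-Aware Memory Pruning: the class names, the age-based pruning rule, the default voxel size, and the label-overwrite policy are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]

@dataclass
class VoxelEntry:
    label: str       # semantic label currently assigned to the voxel
    last_seen: int   # frame index of the most recent observation
    hits: int = 1    # number of observations merged into this voxel

@dataclass
class VoxelMemory:
    voxel_size: float = 0.05  # hypothetical resolution in metres
    cells: Dict[Voxel, VoxelEntry] = field(default_factory=dict)

    def key(self, point) -> Voxel:
        """Quantize a 3D point to the integer index of its voxel."""
        return tuple(int(math.floor(c / self.voxel_size)) for c in point)

    def observe(self, point, label: str, frame: int) -> None:
        """Merge an observation; a changed label overwrites the old entry."""
        k = self.key(point)
        entry = self.cells.get(k)
        if entry is None or entry.label != label:
            self.cells[k] = VoxelEntry(label, frame)
        else:
            # Repeated observation of the same voxel: update in place
            # instead of storing a duplicate.
            entry.last_seen = frame
            entry.hits += 1

    def prune(self, frame: int, max_age: int) -> int:
        """Remove voxels not re-observed within max_age frames."""
        stale = [k for k, e in self.cells.items()
                 if frame - e.last_seen > max_age]
        for k in stale:
            del self.cells[k]
        return len(stale)
```

In this sketch, redundancy is handled at insertion time (one entry per voxel) and staleness at pruning time, so memory size tracks the currently observed scene rather than the full observation history.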

Hardware platform: AgileX Ranger Mini 3.0 mobile base, UFACTORY xArm6 manipulator, wrist-mounted Intel RealSense D435i RGB-D camera, and Livox MID-360 LiDAR with a built-in IMU.

Overview of the DREAM system architecture
System overview. DREAM builds and updates dynamic spatio-semantic memory while planning actions around task-relevant objects. The robot CAD assets are available in the repository docs directory, including ROBOT.jpg and ROBOT.zip exported from SolidWorks.

Hybrid localization pipeline in DREAM
Hybrid localization reacquires moved targets and keeps robot pose estimates aligned with changing scenes.

Task-oriented navigation in DREAM
Task-oriented navigation prioritizes uncertain or task-relevant regions and selects manipulation-aware docking poses.
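
The two ideas in the navigation caption, ranking regions by uncertainty and task relevance and choosing a docking pose suited to manipulation, can be sketched as follows. This is an illustrative toy, not DREAM's planner: the weighted-sum score, the ring of candidate poses at a fixed standoff, the nearest-candidate rule, and every name and default value are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    uncertainty: float  # 0..1, how poorly the region is currently known
    relevance: float    # 0..1, how strongly it relates to the task

def rank_regions(regions, w_uncertainty=0.5, w_relevance=0.5):
    """Order regions by a weighted sum of uncertainty and task relevance."""
    def score(r):
        return w_uncertainty * r.uncertainty + w_relevance * r.relevance
    return sorted(regions, key=score, reverse=True)

def docking_candidates(target_xy, standoff, n):
    """Sample n poses on a circle around the target, each facing the target."""
    poses = []
    for i in range(n):
        ang = 2.0 * math.pi * i / n
        x = target_xy[0] + standoff * math.cos(ang)
        y = target_xy[1] + standoff * math.sin(ang)
        heading = math.atan2(target_xy[1] - y, target_xy[0] - x)
        poses.append((x, y, heading))
    return poses

def select_docking_pose(target_xy, robot_xy, is_free, standoff=0.6, n=8):
    """Pick the collision-free candidate closest to the robot, or None."""
    free = [p for p in docking_candidates(target_xy, standoff, n)
            if is_free(p[0], p[1])]
    if not free:
        return None
    return min(free, key=lambda p: math.hypot(p[0] - robot_xy[0],
                                              p[1] - robot_xy[1]))
```

Here `is_free` is a caller-supplied occupancy check, for example a lookup into the current map, and `standoff` stands in for the distance at which the arm can comfortably reach the target. A real system would also account for arm reachability and approach direction when scoring candidates.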

Experiments

Real-World Dynamic Mobile Manipulation

DREAM is deployed on a real mobile manipulation robot in four dynamic indoor laboratory environments and is evaluated on navigation, pickup, placement, and long-horizon task success.

Navigation success: 83.8-89.2%
Pickup success: 93.8-94.4%
Place success: 91.7-93.3%
Long-horizon success: 55.0-70.0%

Across four scenes, DREAM improves long-horizon success by 10-20 percentage points over the DynaMem baseline while using less spatio-semantic memory and computation.