
Intel Labs Introduces SPEAR: An Open-Source Photorealistic Simulator for Embodied AI

Mike_Roberts
Employee

Mike Roberts is a research scientist at Intel Labs, where he works on using photorealistic synthetic data for computer vision applications.

 

Highlights:

  • Intel Labs collaborated with the Computer Vision Center in Spain, Kujiale in China, and the Technical University of Munich to develop the Simulator for Photorealistic Embodied AI Research (SPEAR).
  • This highly realistic, open-source simulation platform accelerates the training and validation of embodied AI systems in indoor domains.
  • SPEAR can be downloaded under an open-source MIT license.

 

Interactive simulators are becoming powerful tools for training embodied artificial intelligence (AI) systems, but existing simulators have limited content diversity, physical interactivity, and visual fidelity. To better serve the embodied AI developer community, Intel Labs has collaborated with the Computer Vision Center in Spain, Kujiale in China, and the Technical University of Munich to develop the Simulator for Photorealistic Embodied AI Research (SPEAR). This highly realistic simulation platform helps developers accelerate the training and validation of embodied agents for a growing set of tasks and domains.

With its large collection of photorealistic indoor environments, SPEAR applies to a wide range of household navigation and manipulation tasks. Ultimately, SPEAR aims to drive research and commercial applications in household robotics and manufacturing, including human-robot interaction scenarios and digital twin applications.

To create SPEAR, Intel Labs worked closely with a team of professional artists for over a year to construct a collection of high-quality, handcrafted, interactive environments. Currently, SPEAR features a starter pack of 300 virtual indoor environments with more than 2,500 rooms and 17,000 objects that can be manipulated individually. These interactive training environments use detailed geometry, photorealistic materials, realistic physics, and accurate lighting. New content packs targeting industrial and healthcare domains will be released soon.

 


Figure 1. Scenes may be cluttered with objects that can be manipulated individually. A strong impulse can be applied to all objects at the start of the simulation to create a disordered environment; these messy room configurations could serve as initial states for a cleaning task.

 

By offering larger, more diverse, and more realistic environments, SPEAR helps throughout the development cycle of embodied AI systems and enables the training of robust agents that can operate in the real world, potentially straight from simulation. SPEAR helps to improve accuracy on many embodied AI tasks, especially traversing and rearranging cluttered indoor environments. Ultimately, SPEAR aims to decrease the time to market for household robotics and smart warehouse applications, and to increase the spatial intelligence of embodied agents.

 

Challenges in Training and Validating Embodied AI Systems

In the field of embodied AI, agents learn by interacting with different variables in the physical world. However, capturing and compiling these interactions into training data can be time-consuming, labor-intensive, and potentially dangerous. In response to this challenge, the embodied AI community has developed a variety of interactive simulators, where robots can be trained and validated in simulation before being deployed in the physical world.

While existing simulators have enabled rapid progress on increasingly complex and open-ended real-world tasks such as point-goal and object navigation, object manipulation, and autonomous driving, these simulators have several limitations. Simulators that use artist-created environments typically provide a limited selection of unique scenes, such as a few dozen homes or a few hundred isolated rooms, which can lead to severe overfitting and poor sim-to-real transfer performance. On the other hand, simulators that use scanned 3D environments provide larger collections of scenes, but offer little or no interactivity with objects. In addition, both types of simulators offer limited visual fidelity, either because it is too labor-intensive to author high-resolution art assets, or because of 3D scanning artifacts.

 

Overview of SPEAR

SPEAR was designed based on three main requirements: (1) support a collection of environments that is as large, diverse, and high-quality as possible; (2) provide sufficient physical realism to support realistic interactions with a wide range of household objects; and (3) offer as much photorealism as possible, while still maintaining enough rendering speed to support training complex embodied agent behaviors.

 


Figure 2. SPEAR enables embodied AI developers to train a navigation policy on an OpenBot entirely in simulation.

 

Motivated by these requirements, SPEAR was implemented on top of the Unreal Engine, an industrial-strength game engine whose source code is publicly available. SPEAR environments are implemented as Unreal Engine assets, and SPEAR provides an OpenAI Gym interface for interacting with environments via Python.
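The Gym-style interaction pattern can be sketched as a reset/step loop. The snippet below is a minimal, self-contained illustration of that pattern; `ToyEnv` is a hypothetical stand-in, since the actual SPEAR environment names and observation contents are not shown in this post.

```python
class ToyEnv:
    """Stand-in environment exposing the classic Gym reset/step interface.
    A real SPEAR environment would return photorealistic camera images here."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        return {"camera": [0.0], "wheel_encoders": [0.0, 0.0]}

    def step(self, action):
        """Advance the simulation one tick; return (obs, reward, done, info)."""
        self.t += 1
        obs = {"camera": [float(self.t)], "wheel_encoders": action}
        reward = -1.0                       # illustrative per-step time penalty
        done = self.t >= self.episode_len   # episode ends after a fixed horizon
        return obs, reward, done, {}

# Standard agent-environment loop, as an OpenAI Gym interface implies.
env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = [1.0, 1.0]  # a trained policy would compute this from obs
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # -5.0 after five steps
```

The same loop structure applies regardless of which SPEAR agent or task is loaded; only the contents of the observation dictionary and the action vector change.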

SPEAR currently supports four distinct embodied agents:

  • The OpenBot Agent provides identical image observations to a real-world OpenBot, implements an identical control interface, and has been modeled with accurate geometry and physical parameters. It is well-suited for sim-to-real experiments.
  • The Fetch Agent and LoCoBot Agent have also been modeled using accurate geometry and physical parameters, and each has a physically realistic gripper. These agents are ideal for rearrangement tasks.
  • The Camera Agent can be teleported anywhere, making it useful for collecting static datasets.

 


Figure 3. The LoCoBot Agent is suitable for both navigation and manipulation in simulation. This agent’s realistic gripper makes it ideal for rearrangement tasks.

 

By default, agents return photorealistic egocentric observations from camera sensors, as well as wheel encoder states and joint encoder states. Additionally, agents can optionally return several types of privileged information. First, agents can return a sequence of waypoints representing the shortest path to a goal location, as well as GPS and compass observations that point directly to the goal, both of which can be useful when defining navigation tasks. Second, agents can return pixel-perfect semantic segmentation and depth images, which can be useful when controlling for the effects of imperfect perception in downstream embodied tasks and collecting static datasets.
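The GPS and compass observation described above reduces to simple planar geometry: a distance to the goal and a heading relative to the agent's current facing. The helper below is a hypothetical illustration of that computation (the function name and signature are assumptions, not SPEAR's API).

```python
import math

def goal_compass(agent_xy, agent_yaw, goal_xy):
    """Distance and egocentric heading to a goal, mimicking a GPS/compass
    observation. agent_yaw is the agent's facing direction in radians."""
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    heading = math.atan2(dy, dx) - agent_yaw          # world angle minus facing
    heading = math.atan2(math.sin(heading), math.cos(heading))  # wrap to [-pi, pi]
    return distance, heading

# Agent at the origin facing along +x, goal at (3, 4):
d, h = goal_compass((0.0, 0.0), 0.0, (3.0, 4.0))
```

An agent can steer by turning until the heading is near zero and driving forward until the distance falls below a goal radius.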

SPEAR currently supports two distinct tasks:

  • The Point-Goal Navigation Task randomly selects a goal position in the scene’s reachable space, computes a reward based on the agent’s distance to the goal, and triggers the end of an episode when the agent hits an obstacle or the goal.
  • The Freeform Task is an empty placeholder task that is useful for collecting static datasets.
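A distance-based reward for the point-goal task can be sketched as follows. This is an illustrative reward shaping, not SPEAR's actual reward function; the progress term, terminal bonus, and goal radius are all assumptions.

```python
def point_goal_reward(prev_dist, curr_dist, hit_obstacle, goal_radius=0.2):
    """Sketch of a point-goal reward: reward progress toward the goal and
    end the episode when the agent reaches the goal or hits an obstacle."""
    reached = curr_dist <= goal_radius
    done = reached or hit_obstacle
    reward = prev_dist - curr_dist  # positive when the agent moved closer
    if reached:
        reward += 10.0              # illustrative terminal bonus
    if hit_obstacle:
        reward -= 10.0              # illustrative collision penalty
    return reward, done

# Agent moved from 2.0 m to 1.5 m from the goal without collision:
r1, d1 = point_goal_reward(2.0, 1.5, hit_obstacle=False)
# Agent moved within the goal radius, ending the episode:
r2, d2 = point_goal_reward(0.5, 0.1, hit_obstacle=False)
```

Rewarding the change in distance, rather than the raw distance, keeps the per-step signal bounded and encourages steady progress.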

SPEAR is available under an open-source MIT license, ready for customization on any hardware. For more details, visit the SPEAR GitHub page.

About the Author
Mike Roberts is a research scientist in the Intelligent Systems Lab at Intel. He is interested in using photorealistic synthetic data for computer vision. Previously, he was a research scientist at Apple, where he led the development of the Hypersim dataset. In 2019, he received his Ph.D. from Stanford University, where he was advised by Pat Hanrahan.