Introducing SLAM

In my previous blog on the history (and future) of augmented reality, I introduced SLAM – Simultaneous Localization and Mapping. SLAM is one of a number of key areas that will be made possible or further innovated through Augmented Reality (AR) and Virtual Reality (VR). There are other areas including graphics, haptics, display, connectivity, audio and Artificial Intelligence (AI), but to me SLAM promises to be the most transformative.

But what is it? And why does SLAM promise to be so transformative? Firstly, let me first explain what SLAM is and how it works.

Positioning and orientation

SLAM is a series of complex computations and algorithms which use sensors to construct maps and structures in unknown environments and localize the device position and orientation. Cameras and Inertial Measurement Units (IMUs) are available in 80 percent of smartphones globally, but it’s mainly higher-end smartphones that can actually run SLAM today, which is a smaller percentage of all smartphones overall.

From a technical standpoint, SLAM is essentially a procedure that estimates the position and orientation of a device and builds a map of the environment using a camera input feed and IMUs (accelerometer and gyroscope) readings. Being able to determine the accurate location of a device in an unknown environment brings a lot of opportunities for developers to create new use cases and applications.

How SLAM works

So how does the location determined by SLAM work on today’s smartphones? If you look at the picture below, the smartphone is building information about the environment through gathering data from IMU sensors and processing a video feed from the monocular camera.

 Smartphone SLAM

IMUs provide information about the actual movements of the smartphone, but unfortunately not accurately enough to rely only on their readings. This limitation is overcome by combining it with the camera feed that provides valuable information about the environment. The camera feed is used to detect “2D feature points” within the current frame and then attempts to correlate them with the points found in the previous frame. The points are likely be found in different positions from frame-to-frame, with this being the essence of the algorithm. Based on these relative points and the IMUs readings, the algorithm can work out the 3D position and orientation of the device with an accuracy to the centimeter range.

IMUs work at a higher frequency than the camera, with this helping to compensate for sudden camera movements that would otherwise provide blurry images. The magic of SLAM is that by combining IMUs with the camera feed it achieves an accuracy higher than the ones associated with the camera and IMUs separately.

As a result, the SLAM algorithmic pipeline also provides the 3D position of the 2D feature points that the system has been tracking. This set of 3D points is known as a “point cloud”. The pipeline described above can also provide a “sparse point cloud” which we can then use to extract planes from the environment. Obviously, the devil is in the detail and it is more complicated than what I’ve described, but this is, in essence, how “Sparse SLAM” works.

 Sparse SLAM flowerpot point cloud

Through having an accurate position and orientation of the device provided by SLAM and knowing the camera parameters, it is then possible to render 3D virtual objects on top of the camera feed as “they were there”. This allows for the enhancement of the real environment through virtual objects and features. As shown in the picture below, the virtual flower pot is rendered on top of the real flower pot. When the user moves around the room, the virtual wooden pot remains exactly in the position where it had been originally placed. This could provide a number of handy use cases for retail, for example, consumers could use SLAM to establish whether a wooden pot would be a good purchase through the virtual simulation.

 Virtual flower pot SLAM

"Sparse SLAM" to "Dense SLAM"

The features described above are all part of “Sparse SLAM” but looking to the future it is likely that mapping will evolve to “Dense SLAM”, with the simple planes moving towards more accurate geometrical representations of the real world. In my opinion, this is where the real fun with AR can begin, as developers are able to not only mix but also modify the real-world environment against the virtual. For example, as shown in the image below, users can replace the real wall with a virtual wall made of bricks. There are plenty of other examples of mixing the real world with the virtual, such as cutting out real objects from the environment and replacing them with different shapes of virtual objects. There is also huge potential for gaming innovation – just imagine seeing a zombie smashing down a real wall and coming out if it! Terrifying, but exciting at the same time.

 Dense SLAM flowerpot

Looking even further ahead, the ultimate goal for SLAM is rendering a virtual object in the real environment, so no one can tell what is real and what is virtual. This is something that won’t happen overnight, but is likely to be the long-term future goal for those working in the AR and VR industries.

Both Sparse and Dense SLAM are technically possible today, but Sparse SLAM is far more common. Currently, Sparse SLAM is a method that is present on today’s high-end smartphones based on monocular RGB cameras and IMU sensors. Whereas Dense SLAM can be found in more AR dedicated devices such as HoloLens, which has four cameras for tracking user positions, a Time of Flight sensor and IMU sensors. Dense SLAM is likely to be the future, as this will enable the ultimate AR experience. However, it won’t be enabled in all AR devices overnight, so it will be an interesting journey of adoption from Sparse towards Dense SLAM.

SLAM methods by complexity

Future use cases

A big focus for SLAM will be mobile devices. However, the technology is not just for mobile, as it will transcend across a number of different devices such as cars, robots, smartphones, VR headsets, laptops, etc. All these devices have different variations of inputs needed for SLAM, so there are likely to be a significant amount of different SLAM solutions in the future.

Indeed, SLAM is likely to play a pivotal role in a number of exciting use cases for businesses and consumers, such as navigation, advertising and gaming. In my next blog on SLAM, I will explore these potential use cases in more detail and highlight how it could be a truly transformative technology for users worldwide.

In the meantime, if any developers are working on exciting SLAM solutions, then I would encourage you to get in contact with me at sylwester.bala@arm.com

I’ll be presenting more about SLAM in my Techtalk at Arm TechCon on Thursday 18th October at 3.30pm.  

Register for Arm TechCon 2018

Anonymous
Graphics & Multimedia blog