Mobile inside-out VR Tracking, now readily available on your phone with Unity.

Introduction

VR is all about immersion, and the ability to track the user's position in space is a key element of it. However, to date, this has only been available in desktop and console VR, even though modern smartphones already incorporate the essential technology to make it possible in mobile VR too. This blog explains how to achieve inside-out tracking in mobile VR using only Unity and AR SDKs with today's handsets.

If you have ever tried a room-scale VR console or desktop game, you will understand how badly I wanted to implement mobile inside-out VR tracking. The problem was that there were no SDKs or phones to try it with. At the beginning of the year, I saw the first opportunity at CES, when the news about the new ASUS release supporting Tango and Daydream became public. This ended up being an accidental leak because, as it turned out, the ASUS release was not available until June. Only then could I create my first inside-out mobile VR tracking project in Unity for Daydream using the Tango SDK. When I got it working, it was amazing to walk in the real world and see how my camera in VR also moved around virtual objects. It felt so natural, and it is something you need to experience yourself.

The second chance I had to implement inside-out tracking came when Google released the ARCore SDK. On the same day, Unity released a version supporting it. I was so excited I couldn’t wait! So, that weekend I built my second inside-out mobile VR tracking project in Unity, this time for the Samsung GearVR using the Google ARCore SDK on a Samsung Galaxy S8. This mobile device has an Arm Mali-G71 MP20 GPU capable of delivering high image quality in VR by using 8x MSAA while running consistently at 60 FPS.

This blog is intended to share my experience in developing inside-out mobile VR tracking apps and to make it available to Unity developers. The Unity integration with the ARCore SDK is not yet prepared to do inside-out mobile VR tracking out of the box (or it wasn’t intended to do it), so I hope this blog will save you some time and pain.

I hope you will experience the same satisfaction I had when you implement your own Unity mobile VR project with inside-out tracking. I will explain step by step how to do it with the ARCore SDK. As I implemented it on the Tango SDK first, I have also included a step-by-step guide for that as an additional appendix item.

Mobile inside-out VR tracking using the Google ARCore SDK in Unity

I won’t point out all the steps you need to follow to get Unity working. I assume you have Unity 2017.2.0b9 or later and have the entire environment prepared to build Android apps. You also need a Samsung Galaxy S8. Unfortunately, so far you can try inside-out VR tracking based on Google ARCore only on this phone and on the Google Pixel and Pixel XL.

The first step is to download the Unity package of the Google ARCore SDK for Unity (arcore-unity-sdk-preview.unitypackage) and import it into your project. A simple project will be enough; just a sphere, a cylinder and a cube on a plane.

You will also need to download the Google ARCore service. It is an APK file (arcore-preview.apk), and you need to install it on your device.

At this point you should have a folder in your project called “GoogleARCore” containing a session configuration asset, an example, the prefabs, and the SDK.

Google ARCore folder

Figure 1. The Google ARCore SDK folders after being imported into Unity.

We can now start integrating ARCore in our sample. Drag and drop the ARCore Device prefab, which you will find in the Prefabs folder, into the scene hierarchy. This prefab includes a First-Person Camera. My initial thought was to keep this camera, which automatically converts to the VR camera when you tick the “Virtual Reality Supported” box in Player Settings. I understood later that this was a bad decision, because this is the camera used for AR; that is, the camera used to render the phone camera input together with the virtual objects we add to the “real world scene”. I have identified three big inconveniences so far:

  • You need to manually comment out the line that calls _SetupVideoOverlay() in the SessionComponent script, because unticking the “Enable AR Background” option in the session settings asset (see Fig. 3) stops camera pose tracking from working at all.
  • You can’t apply any scale factor you may need to map the real world to your virtual world. You can’t always use a 1:1 map.
  • After selecting the Single-pass Stereo Rendering option, the left eye rendered correctly but the right eye did not. We need Single-pass Stereo Rendering to reduce the load on the CPU and accommodate the additional load that ARCore tracking brings.

So, we will use our own camera. As we are working on a VR project, place the camera as a child of a game object (GO), so we can change the camera coordinates according to the tracking pose data from the ARCore subsystem. It is important to note here that the ARCore subsystem provides both the camera position and orientation, but I decided to use only the camera position and let the VR subsystem work as expected. The head orientation tracking that the VR subsystem provides is in sync with the timewarp process, and we don’t want to disrupt this sync.

The next step is to configure the ARCore session to use exclusively what we need for tracking. Click on the ARCore Device GO and you will see in the Inspector the scripts attached to it, as in the picture below:

ARCore Device Inspector

Figure 2. The ARCore Device game object and the scripts attached to it.

Double click on Default SessionConfig to open the configuration options and untick the “Plane Finding” and “Point Cloud” options; we don’t need them, and they add a substantial load on the CPU. We need to leave “Enable AR Background” (passthrough mode) ticked, otherwise the AR Session component won’t work and we won’t get any camera pose tracking.

ARCore Session Config

Figure 3. The session settings we need.

The next step is to add our own ARCore controller. Create a new GO named ARCoreController and attach to it the HelloARController.cs script, borrowed from the GoogleARCore/HelloARExample/Scripts folder. I renamed it to ARTrackingController and removed some items we don’t need. My ARCoreController looks like the picture below. I have also attached a script to it that calculates the FPS.

ARCore Controller

Figure 4. The ARCoreController GO.

The Update function of the ARTrackingController script looks like this:

public void Update ()
{
    _QuitOnConnectionErrors();

    // While ARCore is not tracking (lost or not yet initialized), show a message and skip this frame.
    if (Frame.TrackingState != FrameTrackingState.Tracking) {
        trackingStarted = false;
        m_camPoseText.text = "Lost tracking, wait ...";
        const int LOST_TRACKING_SLEEP_TIMEOUT = 15;
        Screen.sleepTimeout = LOST_TRACKING_SLEEP_TIMEOUT;
        return;
    }

    m_camPoseText.text = "";
    Screen.sleepTimeout = SleepTimeout.NeverSleep;

    Vector3 currentARPosition = Frame.Pose.position;
    if (!trackingStarted)
    {
        // First tracked frame (or first one after a loss): reset the reference so the delta is zero.
        trackingStarted = true;
        m_prevARPosePosition = currentARPosition;
    }

    // Movement since the previous frame; remember the current position for the next delta.
    Vector3 deltaPosition = currentARPosition - m_prevARPosePosition;
    m_prevARPosePosition = currentARPosition;

    if (m_CameraParent != null) {
        // Scale the real-world movement to the virtual world and move the camera parent.
        Vector3 scaledTranslation = new Vector3 (m_XZScaleFactor * deltaPosition.x, m_YScaleFactor * deltaPosition.y, m_XZScaleFactor * deltaPosition.z);
        m_CameraParent.transform.Translate (scaledTranslation);
        if (m_showPoseData) {
            // Debug output: ARCore position, FPS, and the resulting virtual camera position.
            m_camPoseText.text = "Pose = " + currentARPosition + "\n" + GetComponent<FPSARCoreScript> ().FPSstring + "\n" + m_CameraParent.transform.position;
        }
    }
}

 

I removed everything except the checks for connection errors and for the right tracking state. I replaced the original class members with the ones below:

public Text m_camPoseText;
public GameObject m_CameraParent;
public float m_XZScaleFactor = 10;
public float m_YScaleFactor = 2;
public bool m_showPoseData = true;
private bool trackingStarted = false;
private Vector3 m_prevARPosePosition;

You then need to populate the public members in the Inspector. The m_camPoseText member is used to show debugging data on screen: errors, loss of tracking, the phone camera position obtained from the Frame, and the virtual camera position after applying the scale factors.

As I mentioned before, you will hardly ever be able to map your real environment one-to-one to the virtual scene, which is why I have introduced a couple of scaling factors for the movement on the XZ plane and along the Y axis (up-down).

The scale factor depends on the virtual size (vSize) we want to walk through and the actual space we can use in the real world. If the average step length is 0.762 m and we know we have room in the real world for only nSteps steps, then a first approximation to the XZ scale factor will be:

scaleFactorXZ = vSize / (nSteps × 0.762 m)
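
As a minimal sketch, the formula above could be wrapped in a small helper. The class name TrackingScale and the method ComputeXZ are hypothetical names chosen for illustration; they are not part of the ARCore SDK or of my project. The helper is plain C# with no Unity dependency, so it can be tried in isolation.

```csharp
using System;

// Hypothetical helper (not part of the ARCore SDK): a first approximation
// to the XZ scale factor described in the text.
public static class TrackingScale
{
    // Average step length in metres, as quoted in the text.
    public const float AverageStepLength = 0.762f;

    // vSize:  virtual distance we want to walk through, in metres.
    // nSteps: number of steps we have room for in the real world.
    public static float ComputeXZ(float vSize, int nSteps)
    {
        if (nSteps <= 0)
        {
            throw new ArgumentOutOfRangeException("nSteps", "At least one real-world step is needed.");
        }
        return vSize / (nSteps * AverageStepLength);
    }
}
```

For example, assuming a virtual room 20 m across and real-world space for only 5 steps, TrackingScale.ComputeXZ(20f, 5) gives roughly 5.25, which could then be assigned to m_XZScaleFactor in the Inspector or from code. It is only a starting value; you will probably want to tune it by walking the scene.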

I kept the _QuitOnConnectionErrors() class method and only changed the message output to use the Text component m_camPoseText.

private void _QuitOnConnectionErrors()
{
    // Show a message and quit if the ARCore session could not be established.
    if (Session.ConnectionState == SessionConnectionState.DeviceNotSupported){
        m_camPoseText.text = "This device does not support ARCore.";
        Application.Quit();
    }
    else if (Session.ConnectionState == SessionConnectionState.UserRejectedNeededPermission){
        m_camPoseText.text = "Camera permission is needed to run this application.";
        Application.Quit();
    }
    else if (Session.ConnectionState == SessionConnectionState.ConnectToServiceFailed){
        m_camPoseText.text = "ARCore encountered a problem connecting. Please start the app again.";
        Application.Quit();
    }
}

After all this work, your hierarchy (besides your geometry) should look like the picture below:

ARCore Component Hierarchy

Figure 5. The needed ARCore game objects as listed in the hierarchy.

Because the camera in my project collides with some chess pieces in a chess room (this is an old demo I use every time I need to show something quick), I have added a CharacterController component to it.

At this point we are almost ready. We just need to set up the player settings. Besides the standard settings we commonly use for Android, Google recommends:

   Other Settings > Multithreaded Rendering: Off

   Other Settings > Minimum API Level: Android 7.0 or higher

   Other Settings > Target API Level: Android 7.0 or 7.1

   XR Settings > ARCore Supported: On

Below you can see a capture of my XR Settings. It is important to set the Single-pass option to reduce the number of draw calls we issue to the driver (almost halved).

The XR Settings

Figure 6. The XR Settings.

If you build your project following the steps described above, you should get mobile VR inside-out tracking working. For my project, the picture below was my rendering result. The first line of text shows the phone camera position in the world supplied by Frame.Pose. The second line shows the FPS, and the third line shows the position of the VR camera in the virtual world.

Although the scene is not very complex in terms of geometry, the chess pieces are rendered with reflections based on local cubemaps, and there are collisions between the camera and the chess pieces and between the chess pieces and the chess room. I am using 8x MSAA to achieve high image quality. On top of that, the ARCore tracking subsystem is running, and with all this the Samsung Galaxy S8 CPU and Arm Mali-G71 MP20 GPU render the scene at a steady 60 FPS.

Screenshot from Samsung Galaxy S8 running VR in developer mode with inside-out tracking

Figure 7. A screenshot from a Samsung Galaxy S8 running VR in developer mode with inside-out tracking.