You may not notice them, but holograms have been part of our daily lives for some time. Because they are incredibly difficult to copy, analog holograms have been used extensively as an anti-counterfeiting measure on credit cards, banknotes, driver's licenses and in many other applications (figure 1).
Figure 1: Holograms to prevent counterfeiting – on a banknote (left) and on a credit card (right).
With increasing computing power and the emergence of new use cases like Augmented Reality (AR), there has been continuous research and development into display applications of digital holography. Many popular sci-fi movies and TV shows have been inspired by the notion of holographic display when showcasing what the future of advanced visualization might look like. Think of Star Wars, Minority Report, Star Trek and many more. But is holographic display really something from the far future? In this article, we show the recent algorithmic and computational advances that allow holographic display to be achieved on mobile processors. First, let us see how classical holograms were created.
Figure 2: Classical hologram creation (left) and replay (right). Images courtesy of VividQ.
Before the computer age, analog holograms were recorded and reproduced in a similar way to music vinyl records. In this process, a laser beam is split in two: one beam illuminates the target object, while the second acts as a reference (figure 2). The resulting interference pattern (a hologram), which encodes the whole (“holos”) phase information about the object, is recorded on a photo-sensitive film at very high resolution. When the film is lit with a laser (figure 2), the diffraction that takes place reproduces a replay field, which to the eye appears as a three-dimensional image. This image is a faithful representation of the recorded object, because it retains the depth, parallax and other properties of the original scene. As a physicist, this is the process I am familiar with from when I studied holography at university some years ago.
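To make the recording step a little more concrete, here is a minimal numerical sketch in Python/NumPy. It assumes an idealized scene (a single point-source object and a tilted plane-wave reference, both hypothetical choices for illustration) and computes the intensity pattern that the photo-sensitive film would record:

```python
import numpy as np

# Sampling grid representing the photo-sensitive film (illustrative values)
wavelength = 633e-9                       # red laser, metres
k = 2 * np.pi / wavelength                # wavenumber
pitch = 1e-6                              # 1 micron sample spacing on the film
n = 1024
x = (np.arange(n) - n / 2) * pitch
X, Y = np.meshgrid(x, x)

# Object wave: spherical wave from a point 5 cm behind the film (toy "object")
z_obj = 0.05
r = np.sqrt(X**2 + Y**2 + z_obj**2)
U_obj = np.exp(1j * k * r) / r

# Reference wave: plane wave arriving at a small angle
theta = np.deg2rad(2.0)
U_ref = np.abs(U_obj).mean() * np.exp(1j * k * np.sin(theta) * X)

# The film only records intensity, but the interference (cross) term embeds the
# object's phase - this is what makes the recording a hologram, not a photograph.
hologram = np.abs(U_obj + U_ref) ** 2
```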
In a previous blog, we described how Arm is collaborating with VividQ, a UK start-up, to enable digital holographic displays in consumer electronics. In Computer-Generated Holography (CGH), the interference patterns are created digitally from a variety of data sources, from game engines to depth-sensing cameras. They are then presented on a micro-display that is the equivalent of the photo-sensitive film in classical holograms, and similarly replays the three-dimensional image when illuminated with laser light. CGH, however, is extremely computationally intensive - it used to take days to produce a single digital holographic image. When I first visited VividQ, not far from Arm’s headquarters in Cambridge, I could not believe my eyes. The holographic display prototype, using VividQ software for CGH, projected in front of me a 3D holographic video of an animated scene built in Unity, in real-time. It was just mind-blowing. Let us see how this was achieved.
Traditionally in CGH, holograms were calculated with Point-Based Compute (PBC). Three-dimensional virtual objects can be represented as point clouds, which carry color and depth information. In PBC, the emission of light from each point on the virtual object to each pixel of the display used to project the hologram is calculated, and all the contributions are summed [1, 2]. The compute power required for this process is therefore huge and scales badly with resolution, requiring roughly O(N⁴) operations, where N is the side length of the display.
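The sketch below is a minimal Python/NumPy illustration of the PBC idea (not VividQ's implementation). It assumes each object point emits a simple spherical wave and encodes the summed field as a phase-only hologram, which is one common choice:

```python
import numpy as np

def pbc_hologram(points, amplitudes, pitch, n, wavelength):
    """Naive point-based CGH: accumulate a spherical wave from every object
    point at every display pixel. Cost is O(num_points * n * n); for a point
    cloud comparable in size to the display this is O(N^4)."""
    k = 2 * np.pi / wavelength
    x = (np.arange(n) - n / 2) * pitch
    X, Y = np.meshgrid(x, x)
    field = np.zeros((n, n), dtype=np.complex128)
    for (px, py, pz), a in zip(points, amplitudes):
        r = np.sqrt((X - px) ** 2 + (Y - py) ** 2 + pz ** 2)
        field += a * np.exp(1j * k * r) / r      # contribution of one point
    return np.angle(field)                        # phase-only hologram

# Toy usage: three points at slightly different depths
pts = [(0.0, 0.0, 0.05), (1e-4, 0.0, 0.06), (0.0, 1e-4, 0.07)]
holo = pbc_hologram(pts, [1.0, 1.0, 1.0], pitch=8e-6, n=512, wavelength=520e-9)
```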
It was only a few years ago that practical solutions to the computational problem of CGH were proposed [3]. The breakthrough was like the change from analog to digital audio processing. In the case of music, the digitization process samples the audio signal at fixed time intervals. Similarly, the new ideas in the field of holography work by sampling depth slices of the 3D virtual object that is to be projected. Figure 3 summarizes the slicing process and the generation of interference patterns (holograms) used for a holographic projection.
Figure 3: CGH pipeline based on FFTs.
During the slicing process, each point of the object's point cloud is classified into depth layers. In the next step, depth layers are resampled and rasterized into depth grids. At this stage, each depth grid contains the points that are at the same depth. In practice, each depth grid can be considered as a 2D image and the points of each depth layer can effectively be treated as pixels. This layered approach allows the CGH problem to be reduced to a Fast Fourier Transform (FFT) problem with O(N² log N) time complexity, where N is the side length of the display. Next, chromatic gridding takes place, where the depth grids are separated into RGB channels. The hologram is then generated by performing the diffraction calculation on the depth grids using FFTs. Each RGB channel requires its own FFT calculation, as the rate of diffraction depends on the color's wavelength. Finally, the RGB channel holograms are combined into a single full-color hologram.
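For readers who like to see the mechanics, here is a minimal single-channel sketch of the layer-plus-FFT approach in Python/NumPy. It uses a textbook angular-spectrum propagator rather than VividQ's proprietary pipeline, and the scene, pixel pitch and wavelength are illustrative assumptions:

```python
import numpy as np

def angular_spectrum_propagate(field, z, wavelength, pitch):
    """Propagate a 2D complex field over distance z with the angular spectrum
    method: one forward FFT, a frequency-domain multiplication, one inverse FFT."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = np.maximum(1.0 / wavelength**2 - FX**2 - FY**2, 0.0)   # drop evanescent waves
    H = np.exp(1j * 2 * np.pi * np.sqrt(arg) * z)
    return np.fft.ifft2(np.fft.fft2(field) * H)

def layered_hologram(layers, depths, wavelength, pitch):
    """Propagate each depth layer to the hologram plane and sum the results.
    The cost per layer is dominated by the FFTs, i.e. O(N^2 log N)."""
    acc = np.zeros_like(layers[0], dtype=np.complex128)
    for layer, z in zip(layers, depths):
        acc += angular_spectrum_propagate(layer.astype(np.complex128), z, wavelength, pitch)
    return np.angle(acc)    # phase-only encoding for the SLM (one common choice)

# Toy usage: two 512x512 depth grids at 10 cm and 12 cm, green channel only
n = 512
layer_near = np.zeros((n, n)); layer_near[200:220, 200:220] = 1.0
layer_far = np.zeros((n, n)); layer_far[300:340, 300:340] = 1.0
holo_green = layered_hologram([layer_near, layer_far], [0.10, 0.12],
                              wavelength=520e-9, pitch=8e-6)
```

A full-color hologram would repeat this per RGB channel with the corresponding laser wavelength and then combine the three results, as described above.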
Since we have now generated the hologram (interference pattern), where do we record or print it so that it can be reconstructed with a laser beam to produce the holographic projection? Digital technologies have evolved in this field, with analog films being replaced by so-called Spatial Light Modulators (SLMs). The SLM can dynamically display the calculated diffraction patterns and, when illuminated with red, green and blue laser diodes, it produces the holographic projection.
Obviously, this is a very simplified version of the hologram generation process, and we are not considering here the technical challenges associated with the display itself. Nevertheless, we can see that the compute power required will increase with the number of depth layers and the resolution of the depth grids. As the number of layers increases, we can expect the weight of the FFT calculations to become more significant. GPUs are great at this kind of parallel computing: FFTs can exploit their arithmetic capabilities to the full, and those capabilities are generally greater on high-end GPUs. This is why GPUs are today the recommended compute unit for the hologram generation process.
During my visit to VividQ, I tried their prototype holographic headset (figure 4). VividQ has patented both its software and reference designs, looking to accelerate the mass adoption of real-time holographic display in AR headsets and smart glasses, automotive heads-up displays, and consumer electronics.
Figure 4: VividQ’s binocular holographic headset prototype with a depth sensor attached.
Today, VividQ's holographic headset prototype is connected to a desktop computer, which runs an app rendering virtual content from a game engine like Unity or Unreal. The color and depth information from the virtual content is sent to the CGH pipeline (figure 3). The set-up also uses information from the depth-sensing camera to achieve realistic virtual-to-real and real-to-virtual occlusions when projecting holograms into the environment (figure 5). The FFT computation happens on the GPU.
Figure 5: VividQ’s 3D holographic projection of two objects at different distances captured on camera.
Trying VividQ's holographic headset was an eye-opening experience, as my knowledge of holography stopped with my university studies, when CGH was still a sci-fi subject. I realized how far holographic technology has come since the 80s. But there were still more surprises in store for me, as you will see in the next section.
Moving CGH from desktop to mobile can seem like an uphill task. If achieving CGH on a desktop with a powerful CPU and GPU has been a challenge, does it even make sense to think about moving hologram generation to mobile? Yet, on reflection, the move to mobile is the natural next step if we want to see wide adoption of real-time holographic displays.
A similar process has already happened in VR. If you have followed developments in this industry, you will have heard of the HTC Vive or Oculus Rift VR headsets. They are tethered to a powerful PC and can render a virtual scene at high resolution and frame rate for each eye. Then, in 2018, Oculus Go was released (the first standalone VR headset), and last year its successor was revealed: the very successful Oculus Quest. Standalone means that the headset contains all the components necessary to provide VR experiences and does not need to be tethered to an external device. The benefits of this step are clear: no more cables, free movement with the headset and lower power consumption, all of this on a mobile SoC. We can get the same benefits from moving CGH to a mobile SoC. This brings the technology in line with AR's likely future, where a compact, low-power holographic display is a must. Here is where Arm's Total Compute strategy will help greatly.
Figure 6: Different elements of a future Total Compute Overall Solution.
To meet increasing requirements in terms of computing power and power consumption, Arm is making a strategic shift through Total Compute: from optimizing individual IPs to taking a system-level view of the entire SoC design (figure 6). This means that the entire system should work together seamlessly to provide maximum performance for compute-intensive workloads packed into a low-power SoC envelope. This new approach analyzes how data and compute are best deployed between the different IP blocks and compute domains. It includes not only the main compute domains - CPU, GPU and NPU - but also the software frameworks and compute libraries that improve performance across them. Meanwhile, new tools like Performance Advisor identify bottlenecks and help achieve the best performance across the whole system.
This approach is particularly beneficial for high-performance calculations such as FFTs, which are the core part of CGH. The latest Mali premium GPUs (Mali-G78 and Mali-G77) and the mainstream Mali-G57 take advantage of the Arm Compute Library (ACL), an optimized collection of low-level functions that includes a highly efficient OpenCL-accelerated implementation of FFT calculations. FFTs work in the complex domain, where we can use FP32 or FP16 floating-point precision. Every improvement in hardware back-end performance directly translates into an increase in Multiply-Accumulate operations per second (MAC/s), and thus into faster FFT calculation. This is particularly important given that FFT calculations take up 60-90% of the total compute required for a holographic display, even when accounting for pre- and post-processing operations.
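To see why FFT throughput dominates, here is a back-of-envelope estimate in Python. It uses the textbook approximation of ~5·N·log2(N) floating-point operations per length-N FFT (a rough model, not a measured figure for ACL), applied to the per-layer, per-channel diffraction step on a 2048x1536 grid:

```python
import math

def fft2_flops(rows, cols):
    """Rough textbook estimate: ~5*N*log2(N) flops for a length-N 1D FFT,
    applied along every row and then every column of a 2D grid."""
    return rows * 5 * cols * math.log2(cols) + cols * 5 * rows * math.log2(rows)

per_fft = fft2_flops(1536, 2048)          # one 2D FFT over the display grid
per_frame = 3 * 2 * per_fft               # e.g. 3 color channels x 2 depth layers
print(f"~{per_fft / 1e9:.2f} GFLOP per 2D FFT, ~{per_frame / 1e9:.1f} GFLOP of FFT work per frame")
```

Even this simplified model puts the FFT work at roughly a third of a GFLOP per transform, so a few layers across three color channels already amount to a few GFLOP per frame.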
At the 2019 Light Field and Holographic Display Summit, Darran Milne, CEO of VividQ, shared the FLOP requirements to generate a single frame of a holographic image for a given number of target layers on a 2048x1536 display, using VividQ's real-time algorithm available at the time (table 1).
Layers | Complexity (per frame)
2      | 7 GFLOP
4      | 12 GFLOP
8      | 22 GFLOP
16     | 42 GFLOP
Table 1: FLOP requirements for generating a single 2048x1536 frame using VividQ’s real-time algorithm.
To put those numbers into perspective, even for a 1280x720 display, traditional Point-Based Compute for CGH would require around 7000 GFLOP per frame. This demonstrates how much more efficient VividQ's layer-based method can be, leveraging FFTs and the relevant Arm libraries. The roughly 1000-fold reduction in computational requirements is also the result of VividQ's proprietary methods beyond FFTs, including depth-level optimization and dynamic layer allocation [4]. Importantly, VividQ's methods optimize not only for computational requirements but also for the high image quality that is necessary for holographic display applications in consumer devices. Algorithms available in the VividQ Software Development Kit (SDK) are optimized for different display types, sizes and bit depths, as well as various image characteristics such as high contrast. Since the user or calling program can request a particular number of output layers, only the compute actually required for the given optical system and input scene is used. Moreover, simple scenes may only contain data at a couple of depths, so only a few layers are needed. This flexibility allows an Arm Mali GPU running VividQ software to deliver holograms for a wide range of applications in real-time.
Let’s have a look in more detail at the compute capabilities of the Arm Mali-G76 GPU. A single Mali-G76 core has 3 execution engines with 8 threads each, and each thread can deliver ~3 FP32 operations (MUL + ADD) per clock cycle, giving 3 x 8 x 3 = 72 FLOP/cycle/core. This means that the 10-core Mali-G76 running at 720 MHz in the Samsung Galaxy S10 would deliver 720 MHz x 72 FLOP/cycle x 10 cores, or ~518 GFLOP/s. For FP16 precision, the figure doubles to ~1.04 TFLOP/s. This is a theoretical maximum; in practice the real figure will be limited by bandwidth and ultimately by power consumption. For compute-heavy algorithms like FFTs, however, it is possible to reach a significant fraction of the theoretical maximum. Even if we count only a single FLOP per thread per cycle, at 60 percent utilization we can still hit over 100 GFLOP/s.
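The same back-of-envelope arithmetic as a short script (these are the figures quoted above, not measurements):

```python
# Theoretical peak throughput of a 10-core Mali-G76 at 720 MHz, using the figures above
engines_per_core = 3
threads_per_engine = 8
fp32_ops_per_thread_cycle = 3      # MUL + ADD as counted in the text
cores = 10
clock_hz = 720e6

flop_per_cycle_per_core = engines_per_core * threads_per_engine * fp32_ops_per_thread_cycle
peak_fp32 = flop_per_cycle_per_core * cores * clock_hz
print(f"FP32 peak: {peak_fp32 / 1e9:.0f} GFLOP/s")         # ~518 GFLOP/s
print(f"FP16 peak: {2 * peak_fp32 / 1e12:.2f} TFLOP/s")    # ~1.04 TFLOP/s

# Conservative case: 1 FLOP per thread per cycle at 60% utilization
conservative = engines_per_core * threads_per_engine * cores * clock_hz * 0.60
print(f"Conservative: {conservative / 1e9:.0f} GFLOP/s")   # ~104 GFLOP/s
```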
As we can see, in principle bandwidth should not be a problem, but in practice it may be. When using a GPU, it is not possible to sustain the processing intensity required for high-resolution, complex use cases without running into thermal throttling and rapid battery drain. Nevertheless, there are also simpler applications, for example the holographic projection of text and icons in Augmented Reality devices, where the number of layers can be kept small while still providing significant advantages over today's AR displays. According to evaluations performed by the Developer Advocacy team at Arm, which has been supporting VividQ, computing a single slice at 720x1280 resolution on a Samsung Galaxy S10 takes 8 ms. A full-color single layer would therefore require 24 ms, and the system would theoretically run at around 40 FPS. This is the first time that CGH has been demonstrated to run in real-time on a mobile GPU. VividQ has recently demonstrated that use case with their concept holographic operating system (figure 7), featuring icons, text, and familiar apps such as social media, which routinely require only 2 depth layers.
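A quick sanity check of that frame budget, again just arithmetic on the quoted numbers:

```python
# Frame-time estimate from the quoted single-slice timing on a Galaxy S10
slice_ms = 8.0            # one depth slice, one color channel, 720x1280
color_channels = 3        # red, green and blue each need their own pass
layers = 1                # single-layer content such as text and icons

frame_ms = slice_ms * color_channels * layers
print(f"Frame time: {frame_ms:.0f} ms, i.e. ~{1000 / frame_ms:.0f} FPS")   # 24 ms, just over 40 FPS
```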
Figure 7: VividQ’s concept holographic operating system as viewed through the headset.
But holographic display is a lot more than FFT calculations. The CPU plays a key role in content generation and other computing tasks. Different parts of the holographic system, such as the display and laser drivers, must work efficiently to avoid bottlenecks and achieve a high apparent resolution of holographic images. Total Compute aims to meet all these requirements as part of an Arm system-wide approach to design that will enable the next wave of digital immersion. At the same time, the team at VividQ continues its algorithmic research to achieve even higher-quality holograms and applications beyond AR wearables. VividQ's proprietary hologram simulation tool (also running on a GPU) allows different optical set-ups to be simulated with a high degree of accuracy. This allows new optical systems to be prototyped rapidly without hardware experimentation, ultimately resulting in a faster journey to good-looking holographic images (figure 8). Happily, the algorithmic variations required do not significantly affect the amount of compute, so these new kinds of displays are still compatible with Arm's Total Compute architectures and can achieve real-time performance on mobile processors.
Figure 8: Holographic images created with VividQ SDK 4.2.0 using the standard generation algorithm (b) and a high black-level procedure under development (c), with replays mathematically simulated and compared against the target image (a).
The latest developments in CGH bring holographic display from sci-fi to reality. The true depth perception delivered by holographic displays presents substantial advantages over today's 3D and AR displays. With recent advances in computation methods, holographic displays can be a viable option for commercial AR applications, from smart glasses to automotive HUDs and new consumer electronics. To achieve its true potential, holographic display has to move away from desktop-based compute to mobile SoCs. The collaboration between Arm and VividQ is aimed at achieving CGH on mobile processors, combining VividQ's deep software expertise in holography with low-power, high-performance Arm IP. The Arm Total Compute approach to immersive computing is committed to improving the whole system, in both performance and power consumption, helping to make future holographic displays possible. The VividQ SDK enables real-time, high-quality CGH across a range of end applications.
[CTAToken URL = "https://www.arm.com/solutions/mobile-computing/ar-vr" target="_blank" text="Learn more about Arm's AR and VR solutions" class ="green"]