I have a question about how to get the maximum calculation capability of NEON. In our video processing application, we should access several frame video. Then if the video is HD…
Matthew Gretton-Dann titled this presentation: Porting & Optimising Code 32-bit to 64-bit
The title is accurate but he does a better job of giving a high level overview of the ARMv7 and ARMv8 architecture differences, C++11 memory models (which become…
Root of Trust implementation – Connected devices with authentication requirements need a root of trust in the system architecture. This is particularly important for devices that can be updated over the air. In a system with TrustZone technology, code…
This is the second in a two-part post from Arm Principal Researcher Dr Nathan Chong on his joint research with Tyler Sorensen (Imperial College London) and Dr John Wickerson (Research Fellow, Imperial College London), and published as a PLDI 2018 Distinguished…
This is the first in a two-part post from Arm Principal Researcher Dr Nathan Chong on his joint research with Tyler Sorensen (Imperial College London) and Dr John Wickerson (Research Fellow, Imperial College London). This research was published as a PLDI…
In my previous posts, I have introduced the concept of memory access ordering and discussed barriers and their implementation in the Linux kernel. I chose to do it in this order because I wanted to start by communicating the underlying concepts before…
Ideally, caches act as some magic make-it-go-faster logic, sitting between your processor core (or cores) and your memory bank. Whilst it can be beneficial to consider specific cache features when writing some performance-critical code, it is usually…