Emerging non-volatile memories (NVM), such as 3D XPoint and STT-MRAM, promise to combine the performance and byte-addressability of DRAM with the density and non-volatility of NAND flash. Such memories could be revolutionary for computer systems. NVM can sit between DRAM and SSD as a fast storage tier, displace DRAM as main memory to reduce cost or provide transformative capacity, or remove the storage tier completely by acting as both memory and storage (Figure 1). At Arm Research, we’re particularly interested in the persistent memory use case, where memory and storage merge into one tier and data no longer needs to be copied between them. This use case poses interesting challenges that are worth addressing, such as ensuring that systems always recover correctly following power or system failures.
Figure 1. Multiple system use-cases for emerging non-volatile memories.
For systems with non-volatile main memory, such as NVDIMMs, failure atomicity guarantees that the system can always recover to a consistent state following a power or system failure. Filesystems achieve failure atomicity for storage with journaling and flushing. Similarly, user applications can achieve it for non-volatile main memory with logging, flushing, and barriers that order these operations. Logging, either undo or redo, preserves atomicity when a failure prevents the last atomic operation from completing. Cache flushing pushes persistent data out of volatile caches to the point of persistence, so that the data isn’t lost when a sudden failure occurs. Barriers prevent unwanted reordering in the memory hierarchy, as caches and memory controllers may reorder memory operations. For example, a barrier ensures that the undo log copy of the data is persisted to persistent memory before the data is mutated in place, which guarantees that the last atomic operation can be rolled back should a failure occur. However, it’s non-trivial to add failure atomicity to user applications using low-level operations such as write logging, cache flushing, and barriers [1].
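To make the ordering concrete, here is a minimal C++ sketch of an undo-logged update. It is an illustration under stated assumptions, not code from the paper: `persist_flush`, `persist_barrier`, `UndoLog`, `atomic_update`, and `recover` are hypothetical names, and the two primitives are stubs that only record a trace. On real hardware they would presumably map to a cache-line writeback (for example `DC CVAP` on Arm or `CLWB` on x86) followed by an ordering barrier.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Trace of persistence operations, so the required ordering can be inspected.
static std::vector<std::string> trace;

// Hypothetical stand-in for a cache-line writeback to the point of persistence.
void persist_flush(const void* addr, const char* what) {
    (void)addr;  // a real implementation would write back the line holding addr
    trace.push_back(std::string("flush ") + what);
}

// Hypothetical stand-in for a barrier that orders earlier flushes before later stores.
void persist_barrier() { trace.push_back("barrier"); }

// Single-entry undo log. Simplification: it stores a raw pointer, which is only
// meaningful within one process run; a real log would store a persistent offset.
struct UndoLog {
    std::uint64_t* target = nullptr;
    std::uint64_t old_value = 0;
    bool valid = false;
};

void atomic_update(UndoLog& log, std::uint64_t* data, std::uint64_t new_value) {
    // 1. Record the old value and persist the log entry first.
    log.target = data;
    log.old_value = *data;
    log.valid = true;
    persist_flush(&log, "log");
    persist_barrier();  // the log must be persistent before the in-place write

    // 2. Mutate the data in place and persist it.
    *data = new_value;
    persist_flush(data, "data");
    persist_barrier();

    // 3. Commit by invalidating the log entry.
    log.valid = false;
    persist_flush(&log, "log");
    persist_barrier();
}

// Run after a failure: roll back an update that did not commit.
void recover(UndoLog& log) {
    if (!log.valid) return;  // nothing was in flight
    *log.target = log.old_value;
    persist_flush(log.target, "data");
    persist_barrier();
    log.valid = false;
    persist_flush(&log, "log");
    persist_barrier();
}
```

Calling `atomic_update(log, &balance, 250)` records the sequence flush log, barrier, flush data, barrier, flush log, barrier. If a failure leaves `log.valid` set, `recover` restores the old value, which is the rollback guarantee described above.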
The paper was presented at PLDI’18 in Philadelphia
Arm Research worked with the University of Michigan to address the programming challenges of persistent memory, specifically to simplify porting legacy applications to persistent memory while limiting the performance degradation. The work resulted in a joint paper titled “Persistency for Synchronization-Free Regions” that was presented at PLDI’18 in Philadelphia. The paper reduces the developer effort of porting legacy applications to persistent memory to a recompilation. No code rewrite is needed for multithreaded C++ code written with lock primitives, because the compiler can provide failure atomicity by detecting critical sections (or synchronization-free regions) and instrumenting them with undo logging. However, the convenience of failure-atomic synchronization-free regions (SFRs) does not come for free: the additional compiler passes emit instrumentation code that adds overhead. To limit this cost, the paper proposes a decoupled-SFR approach that decouples logging from the worker threads by creating a background thread that works alongside each worker thread and handles only logging. Decoupled-SFR performs 65% better than the state-of-the-art ATLAS design when evaluated with workloads such as TPC-C and TATP (Exhibit 1).
Exhibit 1.
From: HP | UoM & Arm
Multithread Support: Yes
Developer Effort: No rewrite. Compiler infers from locks | No rewrite. Compiler infers from C++ sync primitives
Granularity: Coarse (outermost CS) | Coarse (4 KB pages) | Fine (sync-free regions as delimited by sync-ops)
Performance Overhead: <1% to 4x vs DRAM – cache flushing | >2x faster than ATLAS | 65% better than ATLAS
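The granularity row can be illustrated with a small lock-based C++ function. This is a hedged sketch, not an example from the paper: `Accounts` and `transfer` are hypothetical, and the comments assume that a synchronization-free region is the code between two consecutive lock or unlock calls, as "delimited by sync-ops" suggests.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared state protected by two locks.
struct Accounts {
    std::mutex a_lock;
    std::mutex b_lock;
    long a = 1000;
    long b = 0;
};

// Moves `amount` from a to b. Locks are always taken in the same order (a, then b).
void transfer(Accounts& acc, long amount) {
    acc.a_lock.lock();    // sync-op: outermost critical section begins, SFR 1 begins
    acc.a -= amount;      // SFR 1
    acc.b_lock.lock();    // sync-op: SFR 1 ends, SFR 2 begins
    acc.b += amount;      // SFR 2
    acc.b_lock.unlock();  // sync-op: SFR 2 ends, SFR 3 begins (empty here)
    acc.a_lock.unlock();  // sync-op: SFR 3 ends, outermost critical section ends
}
```

Under the coarse, outermost-critical-section granularity, everything between `a_lock.lock()` and `a_lock.unlock()` would form one failure-atomic unit. Under the fine granularity, each region between consecutive sync-ops would be logged as its own unit, and the source stays unchanged because the boundaries are inferred from the lock calls.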
Use the links below to download the full paper, 'Persistency for Synchronization-Free Regions', to watch the talk, or to see the talk slides from PLDI.
Read the full paper Watch the talk Download the talk slides