Persistency for Synchronization-Free Regions

July 2, 2018

3 minute read time.

Emerging non-volatile memories (NVM), such as 3D XP and STT-MRAM, offer the promises of combining the performance and byte-addressability of DRAM with the density and non-volatility of NAND. Such non-volatile memories can be revolutionary for computer systems. NVM can be used to sit between DRAM and SSD as a fast storage tier, displace DRAM as main memory for cost reduction or transformative capacity, or completely remove the storage tier by acting as both memory and storage (Figure 1). At Arm Research, we’re particularly interested in the NVM as persistent memory use case, where memory and storage are merged as one tier and no more data copying is needed between storage and memory, such use case poses interesting challenges that are worth addressing, such as ensuring always correct recovery of systems following power or system failures.

Multiple system use-cases Figure 1. Multiple system use-cases for emerging non-volatile memories.

For systems with non-volatile main memories, i.e., NVDIMM, failure atomicity guarantees systems can always recover to a consistent state following a power or system failure. Such failure atomicity can be achieved with journaling and flushing as with filesystems for storage. Similarly, with non-volatile main memories, failure atomicity can be achieved with user applications using logging, flushing, and barriers that order such operations. Logging, either undo or redo logging, ensures atomicity when a failure interrupts the last atomic operation from completion. Cache flushing ensures volatile caches do not hold persistent data from reaching the point of persistence, so persistent data won’t be lost when a sudden failure occurs. Barriers help prevent potential reordering in the memory hierarchy, as caches and memory controllers may reorder memory operations. For example, a barrier ensures the undo log copy of the data gets persisted onto the persistent memory before the data is mutated in-place, so it’s guaranteed that the last atomic operation can be rewound, should a failure occur. However, it’s non-trivial to add such failure atomicity in user applications with low-level operations such as write logging, cache flushing, and barriers [1].

PLDI 2018

The paper was presented at PLDI’18 in Philadelphia

Programming Challenges with Persistent Memory

Arm Research worked with University of Michigan on addressing the programming challenges with persistent memory, i.e., simplifying persistent programming for porting legacy applications to persistent memory while limiting the performance degradation. The work resulted in a joint paper titled “Persistency for Synchronization-Free Regions” that was presented at PLDI’18 in Philadelphia. The paper reduces developer efforts for porting legacy applications to persistent memory to recompilations only. No code rewrite is needed for multithreaded code written in C++ with lock primitives, as compilers can be instrumented to take care of failure atomicity by detecting critical sections (or synchronization-free regions) and instrumenting with undo logging. However, the convenience of failure-atomic synchronization-free regions (SFR) does not come for free, due to additional compiler passes and instrumentation code emitted in such compiler passes. The paper proposes a decoupled-SFR approach that decouples logging from each worker thread by creating a background thread that takes care of logging only with each worker thread. The decoupled-SFR performs 65% better than state-of-the-art ATLAS design as evaluated with workloads such as TPCC and TATP (Exhibit 1).

SFR compared to other state-of-the-art lock-based persistent programming models

	ATLAS [2]	NV-Threads[3]	SFR[4]
From	HP	HP	UoM & Arm
Multithread Support	Yes	Yes	Yes
Developer Effort	No rewrite. Compiler infers from locks	No rewrite. Compiler infers from locks	No rewrite. Compiler infers from C++ sync primitives
Granularity	Coarse (outermost CS)	Coarse (4 KB pages)	Fine (sync free regions as delimited by sync-ops)
Performance Overhead	<1% to 4x vs DRAM – cache flushing	>2x faster than ATLAS	65% better than ATLAS

Find out more

Use the links below to download the full paper, 'Persistency for Synchronization-Free Regions', to watch the talk, or to see the talk slides from PLDI.

Read the full paper Watch the talk Download the talk slides

References

Marathe, V.J., Seltzer, M., Byan, S. and Harris, T., 2017, July. Persistent memcached: bringing legacy code to byte-addressable persistent memory. In 9th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 17).
Chakrabarti, D.R., Boehm, H.J. and Bhandari, K., 2014. Atlas: Leveraging locks for non-volatile memory consistency. ACM SIGPLAN Notices, 49(10), pp.433-452.
Hsu, T.C.H., Brügner, H., Roy, I., Keeton, K. and Eugster, P., 2017, April. Nvthreads: Practical persistence for multi-threaded applications. In Proceedings of the Twelfth European Conference on Computer Systems (pp. 468-482). ACM.
Gogte, V., Diestelhorst, S., Wang, W., Narayanasamy, S., Chen, P.M. and Wenisch, T.F., 2018, June. Persistency for synchronization-free regions. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 46-61). ACM.

0 comments
0 members are here

Research Articles

HOL4 users' workshop 2025

Hrutvik Kanabar

Tue 10th - Wed 11th June 2025. A workshop to bring together developers/users of the HOL4 interactive theorem prover.
- March 24, 2025
TinyML: Ubiquitous embedded intelligence

Becky Ellis

With Arm’s vast microprocessor ecosystem at its foundation, the world is entering a new era of Tiny ML. Professor Vijay Janapa Reddi walks us through this emerging field.
- November 28, 2024
To the edge and beyond

Becky Ellis

London South Bank University’s Electrical and Electronic Engineering department have been using Arm IP and teaching resources as core elements in their courses and student projects.
- November 5, 2024

Research Articles

Persistency for Synchronization-Free Regions

Programming Challenges with Persistent Memory

SFR compared to other state-of-the-art lock-based persistent programming models

ATLAS [2]

NV-Threads[3]

SFR[4]

Find out more

References

HOL4 users' workshop 2025

TinyML: Ubiquitous embedded intelligence

To the edge and beyond