Scaling GenAI Infrastructure with proteanTecs and Arm’s Neoverse CSS

Marc Meunier
October 2, 2025
7 minute read time.
This blog is co-authored by Ziv Paz, VP of Business Development, proteanTecs and Marc Meunier, Director Hardware Ecosystem, Infrastructure, Arm. 

AI and datacenter systems are being pushed to their limits by soaring complexity, nonstop inference workloads, and rising energy demands. Addressing these pressures requires more than incremental improvements; it calls for collaboration across the ecosystem. That’s why proteanTecs has joined forces with Arm, bringing our real-time monitoring technology into Arm’s Neoverse Compute Subsystems (CSS). The successful integration yields a customer-ready solution designed to improve power efficiency, performance, and reliability at scale.

Challenges Facing Next-Gen AI Infrastructure

The cloud AI landscape is at an inflection point. Explosive growth in model complexity, inference demand, and system scale has strained the very fabric of compute infrastructure. Training runs that once required thousands of GPUs now demand tens of thousands, with costs reaching hundreds of millions of dollars. Inference, once considered “easier,” now drives massive daily workloads that push energy budgets and hardware reliability to the brink.

  • Power efficiency: AI data centers will consume over 90 TWh annually by 2026. Excessive voltage guard bands, designed for worst-case scenarios, drive unnecessary energy waste.
  • Performance at scale: Even small throughput inefficiencies cascade at hyperscale. A 10% gain in throughput can reduce training times by weeks and save millions in infrastructure costs.
  • Reliability and resilience: Silent Data Corruption (SDC) is an invisible risk. A single undetected error can corrupt weights across thousands of GPUs, invalidating billion-dollar training runs.
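
The first two bullets can be put in rough numbers. The sketch below assumes the standard first-order CMOS model, in which dynamic power scales with the square of supply voltage at fixed frequency, and assumes run time is inversely proportional to throughput. The voltages are illustrative placeholders, not figures from proteanTecs or Arm.

```python
def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Relative dynamic power after a supply change at fixed frequency (P ~ C * V^2 * f)."""
    return (v_new / v_old) ** 2


def run_time_ratio(throughput_gain: float) -> float:
    """Relative run time after a fractional throughput gain (time ~ 1 / throughput)."""
    return 1.0 / (1.0 + throughput_gain)


# Hypothetical example: trimming 40 mV of guard band from a 0.80 V supply.
power_ratio = dynamic_power_ratio(0.76, 0.80)
print(f"dynamic power saving: {(1 - power_ratio) * 100:.1f}%")

# The 10% throughput gain from the list above, applied to a fixed-size job.
time_ratio = run_time_ratio(0.10)
print(f"run time saving: {(1 - time_ratio) * 100:.1f}%")
```

Under these assumptions, a 5% voltage reduction cuts dynamic power by a little under 10%, and a 10% throughput gain shortens a fixed job by about 9%. Leakage power and frequency changes are deliberately left out of the model.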

For hyperscalers, the stakes are clear: every watt saved, every percentage of performance reclaimed, and every silent error prevented translates into millions of dollars and competitive advantage.

Meeting these challenges requires more than node upgrades or incremental optimizations. It demands in-situ visibility into how chips behave under real workloads and operating conditions, and the ability to act on that knowledge in real time.

Growth in transistor density versus the PFLOPS required to train AI models from a 2021 baseline. By 2024, AI compute requirements surged by 6847%, while transistor density grew by only 183%. 2025 value is based on the projected PFLOPS required to train GPT-5. Source: Mollick, E. (2024). Scaling: The state of play in AI. One Useful Thing.
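
The percentages in the caption are easier to compare as multipliers. The short calculation below assumes "surged by N%" means an increase of N% over the 2021 baseline (so the multiplier is 1 + N/100); if the source figure intends the percentages as ratios to baseline, the multipliers would each be lower by one.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert 'grew by N%' into a multiplier over the baseline."""
    return 1.0 + percent_increase / 100.0


compute = growth_factor(6847)  # AI training compute requirement, 2021 to 2024
density = growth_factor(183)   # transistor density, 2021 to 2024
gap = compute / density        # how much faster demand grew than density

print(f"compute x{compute:.2f}, density x{density:.2f}, gap x{gap:.1f}")
```

Read this way, compute demand grew roughly 69-fold while density grew under 3-fold, a gap of more than 24x that process scaling alone did not cover.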

Deep Data Needed to Face these Challenges

Current methods for optimizing performance, power, and reliability all share the same blind spot: they don’t see how chips behave under actual workloads in the field. GenAI cloud operators pay for this lack of real-time visibility through higher power draw, lower throughput, and increased risk of failure. Performance tuning relies on static margins. Power controls are triggered by basic telemetry. Reliability checks happen too late, after failure is already underway. None of these approaches adapts to actual stress and environmental conditions during live operation.

That’s the gap.

proteanTecs closes this gap by providing deep data monitoring solutions that give system designers and operators unprecedented visibility into chip health and performance throughout the lifecycle.

The technology delivers a complete monitoring solution spanning silicon to system. At the hardware level, an on-chip HW IP Monitoring System combines lightweight Agents with built-in infrastructure for seamless access, control, and integration, enabling deep visibility from within the silicon. Complementing this are advanced EDA-based integration and implementation tools that ensure high coverage and smooth deployment with no design impact. On top of the hardware, a suite of machine learning–driven software applications runs in the field and in real time, providing predictive monitoring.

By embedding Agents within the silicon, we enable performance improvements, power reduction, and diagnostics throughout the device’s mission.

The on-chip Agents provide parametric measurements in situ and in functional mode to detect timing issues, operational and environmental effects, aging, and application stress. Among the suite of Agents are the Margin Agents, which monitor the timing margins of millions of real paths to support more informed decisions. Margin Agents provide very high coverage of the design’s logic and monitor the real performance-limiting paths that traditional methods often miss. Coverage of these performance-limiting paths (those that set minimum voltage or maximum frequency) holds for all devices in the process distribution, across all operating conditions and functional workloads.
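
As a conceptual illustration only, timing margin can be thought of as the slack left in a clock cycle after the slowest monitored path has settled. The sketch below is not the proteanTecs implementation or data format: the `PathSample` type, the endpoint names, and the delay values are all hypothetical, and setup time and clock skew are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSample:
    """One hypothetical margin reading for a monitored endpoint."""
    endpoint: str
    delay_ns: float  # data arrival time within the clock cycle


def worst_margin(samples: list[PathSample], clock_period_ns: float) -> tuple[str, float]:
    """Return the endpoint with the least slack and its margin in ns."""
    slowest = max(samples, key=lambda s: s.delay_ns)
    return slowest.endpoint, clock_period_ns - slowest.delay_ns


# Made-up readings for three endpoints under a 0.50 ns (2 GHz) clock.
samples = [
    PathSample("alu/result_reg[7]", 0.41),
    PathSample("lsu/addr_reg[3]", 0.46),
    PathSample("fpu/norm_reg[12]", 0.44),
]
endpoint, margin_ns = worst_margin(samples, clock_period_ns=0.50)
print(f"limiting endpoint: {endpoint}, margin: {margin_ns * 1000:.0f} ps")
```

The point of monitoring real paths is visible even in this toy: the limiting endpoint is whichever path is slowest under the current workload and conditions, which a fixed replica circuit placed elsewhere on the die cannot observe directly.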

Unlike canary circuits (right, in yellow), proteanTecs uses on-chip Margin Agents (left, in blue) that monitor true critical paths.

proteanTecs and Arm CSS: Customer-Ready Integration

Now, in collaboration with Arm, we’re bringing these capabilities directly into the heart of next-generation datacenter and AI infrastructure. As part of Arm Total Design, proteanTecs has successfully integrated its monitoring solutions into Arm’s Neoverse Compute Subsystems (CSS). Our Agent integration is now validated and optimized for Neoverse CSS, so mutual customers can integrate it seamlessly into their custom SoCs.

This milestone means:

  • Customer-ready integration: proteanTecs monitoring solutions are now natively available within Neoverse CSS-based custom SoCs.
  • Preferential access: As a member of Arm Total Design, proteanTecs gains early access to Neoverse CSS, enabling deep integration and joint validation.
  • Faster time-to-market: Mutual customers benefit from seamless adoption - cutting integration effort, validation cycles, and deployment risk.

The result: system designers can bring powerful AI/datacenter SoCs to market faster, with embedded visibility, power/performance optimization, and reliability monitoring built-in.

Demonstrating Coverage, Efficiency, and Seamless Integration

The integration of proteanTecs monitoring solutions into Arm’s Neoverse CSS has now been validated in practice, and the results underscore the value of a customer-ready reference design.

In this implementation, at an advanced process node, 200 Margin Agents (MAs) were integrated and implemented in one of the most advanced Arm Neoverse CPU cores. Proprietary algorithms in the proteanTecs EDA tools decide which endpoints each Margin Agent monitors, which ensures that the true performance-limiting paths are monitored.

This strategic monitoring achieved a coverage result of 96.63% (based on proteanTecs proprietary coverage metrics), a level of visibility that allows customers to make confident, data-driven decisions. For more information about proteanTecs’ coverage methodology, customers are encouraged to reach out to our support team.

| Parameter | Baseline | With proteanTecs | Comments |
|---|---|---|---|
| Number of added Margin Agents | 0 | 200 | |
| proteanTecs coverage metric | 0 | 96.63% | |
| proteanTecs logic standard cells % addition | 0 | 0.44% | Negligible |
| Total standard cell area addition % | 0 | 0.14% | Negligible, no block area increase |

Equally important, the addition of monitoring capability had virtually no effect on the design itself. Timing and power measurements remained stable and well within normal run-to-run variation, confirming that the integration does not compromise efficiency. Max timing and power results are shown in the table below.

| Parameter | Baseline | With proteanTecs (200 MA) | Comment |
|---|---|---|---|
| WNS [ns] | -0.047 | -0.053 | Minor impact, run-to-run variation |
| TNS [ns] | -42 | -65.8 | Minor impact, run-to-run variation |
| Total power % addition | 0 | 0.007% | Negligible power increase |
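
For readers who want the timing deltas spelled out, the following lines simply subtract the baseline values from the with-agents values in the table above. WNS is worst negative slack and TNS is total negative slack; the dictionary keys are names chosen for this sketch.

```python
# Values copied from the timing table above (both in nanoseconds).
baseline = {"wns_ns": -0.047, "tns_ns": -42.0}
with_agents = {"wns_ns": -0.053, "tns_ns": -65.8}

# Negative deltas mean slack got slightly worse after adding 200 Margin Agents.
delta = {key: with_agents[key] - baseline[key] for key in baseline}

print(f"WNS change: {delta['wns_ns'] * 1000:.0f} ps")
print(f"TNS change: {delta['tns_ns']:.1f} ns")
```

The worst-path change works out to about 6 ps, which is the figure the authors attribute to normal run-to-run variation.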

No manual timing fixes were applied, so the results reflect the unmodified output of the synthesis and place-and-route tools, which keeps the comparison transparent and reliable.

Taken together, these findings provide customers with a reference implementation that demonstrates how proteanTecs can be embedded seamlessly into high-speed designs at advanced process nodes, without introducing overhead or risk.

proteanTecs’ solution is an open architecture and can work under partner monitoring frameworks. Among the supported frameworks is the Arm System Monitoring Control Framework (SMCF), which enhances monitoring for Arm CSS solutions. You can learn more about proteanTecs’ integration with SMCF here.

Unlocking Efficiency, Performance, and Reliability

proteanTecs’ suite of applications, now enabled for Neoverse CSS, lets datacenter operators optimize at runtime:

  • AVS Pro: Workload- and reliability-aware, real-time power reduction, delivering up to 14% lower power with no performance loss while extending the device’s remaining useful life (RUL) by ~20%. To learn more, read the white paper here.
  • AFS Pro: Workload- and reliability-aware, real-time frequency increase, capturing frequency headroom for up to a 10% performance boost.
  • RTHM: Monitors health in real time, flagging risks before they cascade into SDC or system failures. Read more here.
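
The general idea behind margin-driven adaptive voltage scaling can be sketched as a simple closed loop: lower the supply while measured margin exceeds a target, and raise it when margin gets thin. The code below is a generic illustration under stated assumptions, not the AVS Pro algorithm. The function names, thresholds, step size, voltage limits, and the linear `toy_margin_ps` model are all hypothetical stand-ins for real hardware readouts and tuning.

```python
def avs_step(voltage_mv: int, margin_ps: float, *,
             target_ps: float = 20.0, hysteresis_ps: float = 5.0,
             step_mv: int = 5, v_min_mv: int = 650, v_max_mv: int = 850) -> int:
    """One iteration of a hypothetical margin-driven voltage loop."""
    if margin_ps < target_ps:
        # Margin too thin: raise voltage, since correctness takes priority.
        voltage_mv += step_mv
    elif margin_ps > target_ps + hysteresis_ps:
        # Excess margin: reclaim guard band by lowering voltage.
        voltage_mv -= step_mv
    # Inside the hysteresis band the voltage is held, which avoids oscillation.
    return max(v_min_mv, min(v_max_mv, voltage_mv))


def toy_margin_ps(voltage_mv: int) -> float:
    """Stand-in for a margin readout: margin grows linearly with voltage."""
    return 0.4 * (voltage_mv - 700)


voltage = 800  # start from a conservative, guard-banded setting
for _ in range(20):
    voltage = avs_step(voltage, toy_margin_ps(voltage))
print(f"settled at {voltage} mV with {toy_margin_ps(voltage):.0f} ps margin")
```

In this toy model the loop steps down from the guard-banded starting point and holds once the margin falls inside the hysteresis band. A production loop would also need to account for workload transients, sensor latency, and aging, which is where the workload- and reliability-aware aspects described above come in.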

Proteantecs - Monitoring

By embedding these capabilities into Neoverse CSS-based SoCs, mutual customers gain a powerful edge: the ability to scale AI infrastructure with greater power efficiency, performance, and reliability.

Conclusion: Real-Time Monitoring for Scalable GenAI Chips

As GenAI chips reach unprecedented levels of complexity, chipmakers need visibility into how each chip truly behaves under live workloads.

proteanTecs delivers exactly that, with a new class of in-chip monitoring and applications that dynamically tune each device in real time for optimal efficiency, performance, and RAS (reliability, availability, and serviceability). Now, through successful integration with Arm’s Neoverse Compute Subsystems (CSS) as part of Arm Total Design, proteanTecs’ real-time monitoring solutions are validated, optimized, and customer-ready. This seamless integration enables mutual customers to accelerate time-to-market while benefiting from power reduction, performance improvement, and built-in resilience at hyperscale.

Arm at OCP

To learn more, please visit the Arm booth at the 2025 OCP Global Summit, San Jose, CA, Oct 13-16, 2025.
