Triangles Per Second 2: A Chocolate Teapot of a Graphics Benchmark

Tom Olson
September 11, 2013
10 minute read time.

In the first blog of this series, I claimed that as a GPU performance metric, triangles per second (aka TPS or triangle rate) is a chocolate teapot — i.e. utterly and completely useless — and talked about the difficulty of getting a consistent definition of the metric. (Without such a definition, you can't compare triangle rate numbers from different vendors.)  I also gave some technical background on how GPUs draw triangles. In this installment, I'll talk about how GPU vendors exploit the definition problem to claim enormous triangle rates, while actually telling you nothing about the performance of the GPU. Finally, I'll explain why, even if you were able to measure an honest triangle rate, you wouldn't want to.

How to draw a gazillion triangles per second
OK, so you're a struggling GPU vendor who wants to claim "N gazillion triangles per second". It doesn't matter what N is, as long as it's bigger than the competition is claiming.  How can you do it?  There are a lot of things you can do that aren't actually cheating, but can be counted on to mislead your less sophisticated customers — who will be the only ones asking about triangle rate anyway. The tricks boil down to choosing triangles and drawing conditions that reduce computation and memory traffic to much lower levels than any real application would ever experience. Let's start with the obvious ones:

  • First, use a vertex shader that doesn't do anything, and put all the vertices into screen coordinates before you send them to the GPU. (Unless — as is often the case — your vertex processor can do simple transform and lighting in less time than it takes the setup engine to do triangle setup; in that case you can do the transform and lighting without affecting performance, so go nuts.) Now you can proudly claim "N gazillion transformed and lit triangles per second!" (A minimal code sketch of these tricks follows the list.)
  • Second, if your GPU is good at culling non-visible triangles, arrange to have as many triangles culled as you think you can get away with. (If it isn't, turn off culling and advertise "N gazillion visible triangles per second!") Fifty percent back-facing makes a certain amount of sense (see the previous blog), and in some games fifty percent of triangles actually end up off-screen, so culling 75% of triangles is semi-justifiable. But why stop there? In a general view of a cube, you can see three faces, but there are viewpoints from which you can only see one. Let your conscience be your guide — if you have one.
  • Third, make sure the fragment shader does as little work as possible.  Having it assign all fragments the same color (red is nice) is a good way to do this.
  • Fourth, make sure the fragment shader isn't called very often.  A good way to do this is to draw a lot of triangles that are hidden behind other, opaque triangles.  Most GPUs have special hardware to make sure that those triangles don't generate fragments.  You'll still have to do triangle setup, but you won't have to actually draw anything.
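
To make these four tricks concrete, here is a minimal sketch of what such a "benchmark" might boil down to in C with OpenGL ES 2.0. Everything application-specific in it — the function and variable names, the shader strings, and how the vertex data gets bound — is invented for illustration; only the GL calls themselves are the real API.

```c
#include <GLES2/gl2.h>

/* Illustrative sketch only: a triangle-rate "benchmark" built from the
 * tricks above. Names and constants are made up for the example. */

/* Trick 1: vertices are supplied already in clip space, so the vertex
 * shader does nothing but pass them through. */
static const char *vs_src =
    "attribute vec4 a_position;\n"
    "void main() { gl_Position = a_position; }\n";

/* Trick 3: every fragment gets the same flat colour (red is nice). */
static const char *fs_src =
    "precision mediump float;\n"
    "void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }\n";

/* Assumes 'program' was compiled and linked from vs_src/fs_src, and that a
 * buffer of tiny, mostly back-facing or occluded triangles is already bound
 * to attribute location 0 with glVertexAttribPointer. */
void draw_lots_of_cheap_triangles(GLuint program, GLsizei triangle_count)
{
    glUseProgram(program);

    /* Trick 2: cull as many triangles as you dare. */
    glEnable(GL_CULL_FACE);

    /* Trick 4: with the depth test on, and one big opaque triangle drawn
     * first, early-Z hardware rejects the hidden triangles before any
     * fragments are shaded. */
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);

    /* The only units doing real work are triangle setup and culling. */
    glDrawArrays(GL_TRIANGLES, 0, triangle_count * 3);
}
```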

The result of these shenanigans is that you'll probably be able to cut the workload down to the point where the only parts of the GPU doing significant work are the setup engine and the culling logic. Of course, the program you'll be running won't draw anything a rational person would want to look at, but who cares? It'll produce a really big TPS number.

How to Cheat
The fun doesn't stop with the tricks described above — if you're willing to stop pretending you have any scruples, you can go quite a bit further. 

  • Customers who have a graphics programmer or two on staff may be wise to the "color every fragment red" trick, so they may insist that your fragment shader do something vaguely realistic. So, change your fragment shader to read the fragment color from a texture.  If you make the texture image really, really small (say two pixels square), it will fit entirely into your texture cache, so reading it will be free.  If the customer figures that one out, you can make the texture bigger, but tweak the LOD bias or texture coordinates (this is magic) to get the same effect. Now you can claim "N gazillion transformed, lit, and textured triangles per second!"
  • Who says you have to draw your triangles using a graphics API? A well-known and highly respected GPU vendor once promised me a triangle rate that was three times what any other vendor was quoting. After about a week of email tag with their engineering staff, it became clear that to hit the claimed triangle rate, you would have to throw away their graphics driver and write directly to the hardware registers, and you would only be able to render into a tiny on-chip framebuffer. If you did that, you could draw a truly ridiculous number of triangles per second. (Or so they believed; nobody had ever been insane enough to try it.) The highest rate you could get in real life (i.e. using the OpenGL ES driver) was, you guessed it, about one third of the rate they originally claimed.
  • Finally, here's my very favorite evil triangle rate trick: In the previous blog we talked about indexed drawing, where you give the hardware a list of vertices, and then define triangles in terms of their positions in the list. Suppose you hand the driver vertex list [A,B,C] and ask it to draw the triangles specified by index list [0,1,2,0,1,2,0,1,2,...], where the index list has, say, thirty thousand entries in it. That will draw ten thousand triangles, right? Of course, they all happen to be the same triangle, but you don't have to tell the customer that. Any decently architected GPU will shade each vertex once, and then find the results in its post-transform cache forever after, so the vertex processor won't have to do any work. And if you set the Z-test state correctly, only the first triangle will generate any fragments, so the fragment processor won't do any work either. Your triangle rate will be limited only by your ability to read in the index list and set up the implied triangles. (There are fancier versions of this where you draw a whole screenful of triangles, so that the image and the code look more complicated. This has the advantage that you're less likely to get caught, but as long as you re-draw each triangle many times, you'll get the same effect.) A sketch of this trick follows the list.
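
Here is a hedged sketch of that last trick, again in C with OpenGL ES 2.0. The vertex data, the counts, and the assumption that a trivial pass-through program is already bound with its position attribute at location 0 are all illustrative; only the GL calls are the real API.

```c
#include <GLES2/gl2.h>

/* Illustrative sketch only: "ten thousand triangles" that are all the same
 * triangle, re-indexed over and over. Assumes a trivial pass-through program
 * is already bound, with its position attribute at location 0, and that the
 * depth buffer has been cleared. */
enum { TRIANGLE_COUNT = 10000 };

void draw_one_triangle_ten_thousand_times(void)
{
    /* One triangle's worth of vertices (A, B, C), already in clip space. */
    static const GLfloat verts[] = {
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
         0.0f,  0.5f, 0.0f,
    };

    /* Thirty thousand indices: 0,1,2 repeated ten thousand times. */
    static GLushort indices[TRIANGLE_COUNT * 3];
    for (int i = 0; i < TRIANGLE_COUNT * 3; ++i)
        indices[i] = (GLushort)(i % 3);

    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, verts);

    /* With GL_LESS, only the first copy of the triangle passes the depth
     * test; every later copy is rejected before it does useful fragment
     * work, and the post-transform cache means the three vertices are
     * shaded only once. */
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);

    glDrawElements(GL_TRIANGLES, TRIANGLE_COUNT * 3, GL_UNSIGNED_SHORT, indices);
}
```

Nothing in either sketch is illegal as far as the API is concerned; it just measures a workload that no real application would ever present to the GPU.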


There is an extra-warm spot in Engineering Hell for people who do things like this. Sadly, it is well populated; I'm not making these tricks up, and I didn't think of them myself.

So what can you do?
I hope it's clear by now that the only way to know what a vendor's claimed triangle rate means is to get hold of the program that produced it and read every line very carefully, including any data files it uses. Unfortunately, the code will be complex and the vendor (probably) more evil-minded than you are, so you may miss something. A safer option is to write your own test program and watch the vendor run it. It isn't enough to specify the geometry, or the number of pixels per triangle, or that you want "transformed, lit, and textured" or (even funnier) "real" triangles; you have to specify absolutely everything, at which point you've effectively written the code. Modern graphics APIs, even lean-and-mean ones like OpenGL ES, provide a great deal of flexibility, and anything you don't specify can and will be used against you.

However, I have some good news! This is not a problem you have to solve. In fact, reading vendors' triangle rate code, or writing your own triangle rate benchmark, would be a complete waste of your time. Even if you had a rigorously defined, carefully monitored, unquestionably honest triangle rate for every GPU you're interested in, you wouldn't want it. That's because even an "honest" triangle rate is utterly and completely useless: a chocolate teapot. Here's why:

What are benchmarks for?
Let's remember the goal here. The whole reason for having graphics benchmarks (or any benchmarks) is to give you a way to tell which GPU (or other piece of technology) is going to meet your needs most effectively. You want to know which one will allow you to render the most complex and beautiful games, or run the most sophisticated UI, or excel in whatever other dimension pleases you. If a benchmark doesn't tell you anything about how the GPU will perform on applications you care about, it isn't useful.

As we've seen, how many triangles per second your GPU can render depends on what kind of triangles you ask it to render.  And given the sophistication of modern graphics content, you can't really talk about a "typical" triangle, any more than you can benchmark a CPU by asking how long it takes to run a program.  Real applications draw a wide range of triangle sizes, using a huge range of shaders. About all you can say about those triangles is that very, very few of them look anything like the kind of triangles people draw when they are measuring triangle rate.  So, triangle rate doesn't tell you anything about how fast a GPU will draw the kind of triangles you are actually interested in.

Triangle Rate Considered Irrelevant
You don't have to be a graphics geek to understand this. Here's an analogy: Suppose you decide that it's taking too long to get to work in the morning. You decide to buy a faster car, so you start gathering benchmark data. You discover that, given special tires and a long enough test track, the 993 series Porsche Carrera has a top speed around 170 miles per hour (mph) — respectable, but the McLaren F1 can hit 240 mph, a 40% improvement. So, will the McLaren get you to work faster?


[Photos via Tom Steiner & robad0b]


I hope it's clear that the top speed of the car is completely irrelevant to the question.  That's because the conditions under which you measure top speed have absolutely nothing to do with the conditions under which you drive to work. 

The same is true of triangles per second.  Different GPU vendors quote different peak triangle rates, and the differences between them can be large; but what the numbers have in common is, they are all completely irrelevant to performance on real-world applications. That's because the conditions under which people measure peak triangle rate bear no resemblance whatever to conditions experienced when the GPU is rendering useful content. 

So, you'll have to forgive me if I start to twitch when someone asks me [DRRRRRRRRRRING]

  (Sorry, I have to take this — it's our Marketing director, calling from the UK. I'll only be a second.)

"Hi Ian, what's up?"
  "Tom, I know you hate this question, but we've got a potential customer who wants —"
  "No! Don't say it. Ian, you promised not to ask again. You know it gives me bad thoughts."
  "— who  wants to know how many triangles per second the Mali T604 —"
  "Arrrgh! I mean, ummm... Haha! Yes! I mean no. No can do, Ian! We don't use triangles per second any more. We use chocolate teapots!"
  "Excuse me?"
  "Chocolate teapots per second. CTPS. It's a much better metric than triangles per second. Haha! We tessellate the Utah teapot into 4000 triangles, so 10,000 CTPS is 40 million triangles per second. We use a modified Blinn-Phong shader with a gloss map, and we add a Fresnel term because chocolate is a dielectric. Did you know that? And we also — "
  "Listen, Tom, maybe we should talk about this another time — "
  "Great! Bye!"
  Click


OK. Deep breath. I am a leaf floating on still water; I am a tree in a silent forest. I am calm.

There! All better.  Well, that's enough for today I think.  If you'll excuse me, I have to lie down now. Or perhaps — perhaps I'll just make myself a nice cup of tea.

Silliness aside — I'm kind of proud of the Porsche/McLaren thing, but do you have a better way of explaining the uselessness of triangle rate? Or, do you have other favorite ways to lie, cheat, and steal your way to an enormous triangle rate? Let me know...

  • Hessed Choi, over 11 years ago:
    An interesting article... yes, even though every GPU vendor understands quite well that these performance metrics are useless, everybody keeps claiming higher triangle rates as well as higher fill rates, simply because every other vendor does it!