1. What is the relationship between FP32 operations/clock and thread count? My understanding is that the two should be equal. Is "FP32 operations/clock" a per-shader-core figure? If so, multiplying it by the number of cores does match the thread count for some GPUs, but there are exceptions, such as the T760. Also, if the table is per shader core, the thread count in it should be per core as well, so FP32 operations/clock * NumOfCore = thread count should not hold.
2. The thread count in the second table varies with the number of registers. Does that mean the number of registers used by the entire shader, or by a specific instruction? How do we evaluate the impact of this number on our game? For example, the offline compiler reports 100+ registers for one of our shaders. What impact does that have on the thread count?
3. What is the relationship between thread count, FP32 operations/clock, and the processing capacity of the arithmetic processing unit in the core? Details for two cases follow:
Valhall: There are 2 FMA units in the shader core, each 16 wide, which gives 32 FP32 FMAs per clock. So why is the FP32 operations/clock figure in the datasheet 64?
Midgard: According to the document, the T760 has 2 arithmetic (A) pipelines, each of which can process 4 FP32 values at the same time, which gives 8 FP32 in total. So why is 28 listed in the datasheet, and 34 in the document?
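To make the arithmetic behind the two cases explicit, here is a minimal sketch. The helper name `fp32_per_clock` and its parameters are hypothetical, and `ops_per_lane=2` is only an assumption (counting one fused multiply-add as two FP32 operations) that would reconcile 32 with the published 64; it is not confirmed by the material quoted here. The sketch does not attempt to explain the Midgard figures.

```python
def fp32_per_clock(units, lanes_per_unit, ops_per_lane=1):
    """Per-core FP32 throughput: units * lanes * operations counted per lane."""
    return units * lanes_per_unit * ops_per_lane

# Valhall as described above: 2 FMA units, each 16 wide.
valhall_fma = fp32_per_clock(units=2, lanes_per_unit=16)

# ASSUMPTION: one FMA counted as two operations (a multiply and an add).
valhall_ops = fp32_per_clock(units=2, lanes_per_unit=16, ops_per_lane=2)

# Midgard T760 as described above: 2 A pipelines, 4 FP32 each.
t760_fp32 = fp32_per_clock(units=2, lanes_per_unit=4)

print(valhall_fma, valhall_ops, t760_fp32)
```

Under that assumption the Valhall number works out, but the same counting gives only 16 for the T760, so it does not account for 28 or 34.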
Yes, exactly that.