Hi, I am trying to run Linux OpenCL applications in a Fast Models simulation targeting a Mali GPU. The applications run without errors, but the GPU writes nothing to memory. The GPU Fast Models currently available appear to provide only a register interface and simulate interrupts. Is it possible to have the models also simulate memory accesses to DRAM, regardless of whether the model actually performs the final reads/writes to DRAM? I noticed that all the GPU models except G710 use a shared library similar to https://github.com/ARM-software/nomali-model. Is it possible to extend this shared library to additionally simulate memory accesses? The library contains the following function pointers, which are currently unused (specified in nomali.h):
void (*memwrite)(nomali_handle_t h, void *usr,
nomali_addr_t addr, uint32_t value);
uint32_t (*memread)(nomali_handle_t h, void *usr,
As a second option, would using Generic Graphics Acceleration (GGA) provide this simulation functionality?
Your understanding above is correct, although the libnomali.so model in the Fast Models package has more features than the version on GitHub (in particular, integration with GGA). The simulation of the GPU is split between two parts. The Mali DDK, the Fast Models GPU component, and libnomali.so together provide the control flow and OS integration. GGA provides some actual pixels.
In theory, one might be able to modify the GitHub version to perform some non-functional DRAM accesses using the functions you've highlighted above, but there is no guarantee that the interface is compatible between the two versions, and this would not be a supported use case. You would also need to know enough about the operation of the GPU to know what memory you could safely access and what sort of access patterns to generate.
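To make the idea concrete, here is a minimal, unsupported sketch of what a pair of tracing callbacks with those signatures might look like. Everything beyond the two function-pointer shapes quoted in the question is an assumption: the typedefs are stand-ins for the real ones in nomali.h, `bus_trace_t`, `trace_memwrite` and `trace_memread` are hypothetical names, and the `memread` parameter list is cut off in the quoted snippet, so the single `addr` argument here is a guess. How the callbacks would be registered with the library is deliberately not shown.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in typedefs so the sketch compiles on its own (assumption:
 * the real definitions in nomali.h may differ). */
typedef void *nomali_handle_t;
typedef uint64_t nomali_addr_t;

/* Hypothetical per-instance state, passed in through the `usr` pointer. */
typedef struct {
    uint64_t reads;          /* number of read callbacks seen */
    uint64_t writes;         /* number of write callbacks seen */
    nomali_addr_t last_addr; /* address of the most recent access */
    uint32_t last_value;     /* value of the most recent write */
} bus_trace_t;

/* Records and logs a write; it does not touch any real memory. */
void trace_memwrite(nomali_handle_t h, void *usr,
                    nomali_addr_t addr, uint32_t value)
{
    bus_trace_t *t = usr;
    (void)h; /* handle unused in this sketch */
    assert(t != NULL);
    t->writes++;
    t->last_addr = addr;
    t->last_value = value;
    fprintf(stderr, "GPU write 0x%08" PRIx32 " -> 0x%" PRIx64 "\n",
            value, (uint64_t)addr);
}

/* Records and logs a read; returns 0 because there is no backing store
 * (assumed signature, see the note above the block). */
uint32_t trace_memread(nomali_handle_t h, void *usr, nomali_addr_t addr)
{
    bus_trace_t *t = usr;
    (void)h;
    assert(t != NULL);
    t->reads++;
    t->last_addr = addr;
    fprintf(stderr, "GPU read  <- 0x%" PRIx64 "\n", (uint64_t)addr);
    return 0;
}
```

In a real integration the bodies would forward to whatever bus interface the platform model exposes instead of only logging. Deciding which addresses to touch, and in what pattern, remains the hard part described above.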
GGA does not use regular PVBus accesses for the majority of its work, only for processing the incoming job descriptors and writing the final pixels. Furthermore, it does not currently support OpenCL. For those reasons, it would not be a suitable solution for you.
Can you elaborate a little on what you are doing? There might be another approach. In general, however, I would not expect this to generate a particularly representative set of bus accesses. It would likely always be incomplete, and it would not take into account cache behaviour, memory latency, and so on.