How to evaluate DSU performance


I want to evaluate DSU performance with the configuration I have selected. What is a good way to start? Eventually, I would like to measure how many cycles each application (benchmark applications as well as my own) takes with different configurations BEFORE simulating on RTL or running FPGA emulation.

I am thinking of tools from ARM that can be used to build an SoC out of ARM models.

Is there any tool from ARM that can be used to evaluate performance at the core/system level?

Does ARM provide any C or SystemC models of the DSU/cores they deliver?
At the system level, does ARM provide a CMN-700 model?

Does ARM provide both fast functional models and cycle-accurate models?