Typical instructions per cycle (IPC) on the Cortex-A53 CPU

I am programming a Cortex-A53 bare metal (no OS) to perform some arithmetic operations.

The task continuously generates 2K 32-bit numbers using the CRC32 polynomial, stores them, moves them from/to buffers, and compares different portions of the 32-bit numbers, all within the L1 data cache.
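A minimal C sketch of a workload of this shape, for illustration only. The reflected CRC-32 polynomial constant, the bit-serial LFSR step, the buffer names, and the choice of comparing the low 16 bits are all assumptions; the actual code is not shown here. Two 8 KiB buffers should fit comfortably in a 32 KiB L1 data cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_WORDS  2048u         /* "2K" 32-bit numbers = 8 KiB per buffer */
#define CRC32_POLY 0xEDB88320u   /* reflected CRC-32 polynomial (assumed variant) */

/* Hypothetical buffers; names are illustrative only. */
static uint32_t src_buf[NUM_WORDS];
static uint32_t dst_buf[NUM_WORDS];

/* Advance the state by one bit through a Galois LFSR built on the polynomial. */
static uint32_t crc32_lfsr_step(uint32_t state)
{
    uint32_t mask = 0u - (state & 1u);   /* all ones if the low bit is set */
    return (state >> 1) ^ (CRC32_POLY & mask);
}

/* Fill buf with n pseudo-random words, 32 LFSR steps per word. */
static void generate(uint32_t *buf, size_t n, uint32_t seed)
{
    assert(seed != 0u);                  /* a zero state would stay zero */
    uint32_t s = seed;
    for (size_t i = 0; i < n; i++) {
        for (int b = 0; b < 32; b++) {
            s = crc32_lfsr_step(s);
        }
        buf[i] = s;
    }
}

/* Count positions where the bits selected by mask are equal in a and b. */
static size_t count_field_matches(const uint32_t *a, const uint32_t *b,
                                  size_t n, uint32_t mask)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++) {
        matches += (((a[i] ^ b[i]) & mask) == 0u);
    }
    return matches;
}

/* One pass: generate, move to a second buffer, compare the low halves. */
size_t run_workload_pass(uint32_t seed)
{
    generate(src_buf, NUM_WORDS, seed);
    memcpy(dst_buf, src_buf, sizeof src_buf);
    return count_field_matches(src_buf, dst_buf, NUM_WORDS, 0x0000FFFFu);
}
```

In a loop like this, the serial dependency through `s` in `generate` would be expected to limit how many instructions can issue together, independent of cache behaviour.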

Right now I get an instructions-per-cycle (IPC) figure of 1.05 on Xilinx's Zynq UltraScale device. What would be a reasonable estimate of the IPC for this kind of workload?
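For clarity, IPC here means retired instructions divided by elapsed core cycles over the measured region. The sketch below shows that arithmetic in C. The function names are hypothetical. The cycle-counter read uses the ARMv8 PMU register PMCCNTR_EL0 and is only compiled on AArch64; enabling the PMU and programming an event counter for retired instructions is assumed to be done elsewhere and is not shown.

```c
#include <stdint.h>

/* IPC from two counter deltas taken around the measured region.
 * Returns 0.0 if no cycles elapsed, to avoid dividing by zero. */
double ipc_from_deltas(uint64_t instructions, uint64_t cycles)
{
    if (cycles == 0u) {
        return 0.0;
    }
    return (double)instructions / (double)cycles;
}

#if defined(__aarch64__)
/* Read the PMU cycle counter. Assumes the PMU and this counter have
 * already been enabled, and that the current exception level is allowed
 * to access it; otherwise the read traps. */
static inline uint64_t read_cycle_counter(void)
{
    uint64_t value;
    __asm__ volatile("mrs %0, pmccntr_el0" : "=r"(value));
    return value;
}
#endif
```

For example, 2100 retired instructions over 2000 cycles gives `ipc_from_deltas(2100, 2000)`, which is about 1.05.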

I am wondering whether there is room for improvement beyond 1.05.

I understand the A53 has two instruction decoders. Does that mean the peak IPC is 2?

Thank you.