[font="Arial"]The code was run in a simulation enviroment and the operation of arm core can be observed.The arm core issued a axi transaction with 16(bits) * 4(length) for write access, while a 32(bits) * 8(length) transaction for read access. It is kind of werid. Why not a 32(bits) * 8(length) transaction can be issued to get a maximum throughput? The limitation is because of the cortex-a9 arch?[/font]