Instruction fetch alignment

Hello,

I have a question about the alignment of instruction cache accesses, especially in Cortex-A processors: does the fetch PC have to be aligned to a 16-byte boundary, or can a fetch start anywhere within an instruction cache block? A downside of 16-byte alignment is that a fetch must be split into two requests whenever it crosses a 16-byte boundary, even if it stays within the same 64-byte cache block. For example, to fetch byte_8 through byte_23 (16 bytes) from an instruction cache block, we need two cache accesses: byte_0 to byte_15 in one cycle, then byte_16 to byte_31 in the next cycle. I am not sure what advantage 16-byte alignment offers. Intel does seem to have this 16-byte alignment requirement, as they recommend aligning branch targets to 16-byte boundaries.
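To make the example concrete, here is a minimal sketch of the arithmetic, assuming a fetch unit that can only read 16-byte windows aligned to 16-byte boundaries. The function name `aligned_fetches` and the constant `FETCH_WIDTH` are hypothetical, chosen for illustration. This models only the scheme described in the question, not the actual front end of any Cortex-A or Intel core.

```python
FETCH_WIDTH = 16  # assumed bytes per aligned fetch window (from the question)


def aligned_fetches(start, length, width=FETCH_WIDTH):
    """Return the aligned (first_byte, last_byte) windows needed to cover
    the byte range [start, start + length) within a cache block."""
    if length <= 0:
        return []
    # Round the first requested byte down to its window base.
    first_base = (start // width) * width
    # Window base that contains the last requested byte.
    last_base = ((start + length - 1) // width) * width
    return [(base, base + width - 1)
            for base in range(first_base, last_base + width, width)]


# Unaligned request: byte_8..byte_23 needs two accesses.
print(aligned_fetches(8, 16))   # [(0, 15), (16, 31)]
# Aligned request: byte_16..byte_31 needs one access.
print(aligned_fetches(16, 16))  # [(16, 31)]
```

Under this assumption, any 16-byte request whose start offset is not a multiple of 16 costs two accesses, which is the overhead the question is asking about.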

Best