And am I correct in interpreting the shift operations as data-processing and not permute instructions?
In which stage is the instruction loaded: N0 or N1? VADD loads its sources in N2 and produces its result in N3. Assume that VADD is issued at t = 0 (measured in cycles). Will the source registers be loaded at t = 2 or at t = 1? And when the result is said to be produced in N3, does that mean t = 3 or t = 2?
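To make the two readings concrete, here is a small sketch of the stage-to-cycle arithmetic. It assumes one pipeline stage per cycle and no stalls. The function `stage_to_cycle` and the constants are hypothetical names for illustration only. The sketch does not settle which convention is correct; it just shows what each one implies.

```python
# Hypothetical model: map a pipeline stage label N<k> to an absolute cycle.
# Assumes one stage per cycle, no stalls, and that the issue cycle
# corresponds to stage N<issue_stage> (the open question: N0 or N1).

def stage_to_cycle(stage: int, issue_cycle: int, issue_stage: int) -> int:
    """Cycle in which stage N<stage> executes under the assumptions above."""
    return issue_cycle + (stage - issue_stage)

VADD_SOURCE_STAGE = 2  # sources loaded in N2 (per the timing table)
VADD_RESULT_STAGE = 3  # result produced in N3

for issue_stage in (0, 1):
    src = stage_to_cycle(VADD_SOURCE_STAGE, issue_cycle=0, issue_stage=issue_stage)
    res = stage_to_cycle(VADD_RESULT_STAGE, issue_cycle=0, issue_stage=issue_stage)
    print(f"issued in N{issue_stage}: sources at t = {src}, result at t = {res}")
```

Under the N0 convention this gives sources at t = 2 and result at t = 3; under the N1 convention, sources at t = 1 and result at t = 2.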
From your example, I gather that even though the result is produced in N3, it isn't actually available to another instruction until N4. Is this correct?