Hi, here is my case: I make the game and go to optimize the mathematical calculations, I have been long and hard (with interruptions) writing math on neon (.s extension files). And in the end I got it, but then I decided to check how many times it increases productivity and it turned out that it does not increase, but on the contrary - reduces. That is, I received a situation where the multiplication in the usual way of 60 matrices is recalculated for: Seconds 0.000039 and Neon : Seconds 0.001612 =))
Do not tell me how this is possible at all and where could I make a mistake?
My simple matrix multiple:
//where myMatrix this = float[4][4]
myMatrix operator*(myMatrix mat1, myMatrix mat2){ myMatrix backMatrix; // first row backMatrix.elements[0][0] = mat1.elements[0][0] * mat2.elements[0][0] + mat1.elements[1][0] * mat2.elements[0][1]+ mat1.elements[2][0] * mat2.elements[0][2] + mat1.elements[3][0] * mat2.elements[0][3]; backMatrix.elements[1][0] = mat1.elements[0][0] * mat2.elements[1][0] + mat1.elements[1][0] * mat2.elements[1][1]+ mat1.elements[2][0] * mat2.elements[1][2] + mat1.elements[3][0] * mat2.elements[1][3]; backMatrix.elements[2][0] = mat1.elements[0][0] * mat2.elements[2][0] + mat1.elements[1][0] * mat2.elements[2][1]+ mat1.elements[2][0] * mat2.elements[2][2] + mat1.elements[3][0] * mat2.elements[2][3]; backMatrix.elements[3][0] = mat1.elements[0][0] * mat2.elements[3][0] + mat1.elements[1][0] * mat2.elements[3][1]+ mat1.elements[2][0] * mat2.elements[3][2] + mat1.elements[3][0] * mat2.elements[3][3]; // second's row backMatrix.elements[0][1] = mat1.elements[0][1] * mat2.elements[0][0] + mat1.elements[1][1] * mat2.elements[0][1]+ mat1.elements[2][1] * mat2.elements[0][2] + mat1.elements[3][1] * mat2.elements[0][3]; backMatrix.elements[1][1] = mat1.elements[0][1] * mat2.elements[1][0] + mat1.elements[1][1] * mat2.elements[1][1]+ mat1.elements[2][1] * mat2.elements[1][2] + mat1.elements[3][1] * mat2.elements[1][3]; backMatrix.elements[2][1] = mat1.elements[0][1] * mat2.elements[2][0] + mat1.elements[1][1] * mat2.elements[2][1]+ mat1.elements[2][1] * mat2.elements[2][2] + mat1.elements[3][1] * mat2.elements[2][3]; backMatrix.elements[3][1] = mat1.elements[0][1] * mat2.elements[3][0] + mat1.elements[1][1] * mat2.elements[3][1]+ mat1.elements[2][1] * mat2.elements[3][2] + mat1.elements[3][1] * mat2.elements[3][3]; // third's row backMatrix.elements[0][2] = mat1.elements[0][2] * mat2.elements[0][0] + mat1.elements[1][2] * mat2.elements[0][1]+ mat1.elements[2][2] * mat2.elements[0][2] + mat1.elements[3][2] * mat2.elements[0][3]; backMatrix.elements[1][2] = mat1.elements[0][2] * mat2.elements[1][0] + mat1.elements[1][2] * mat2.elements[1][1]+ mat1.elements[2][2] * mat2.elements[1][2] + mat1.elements[3][2] * mat2.elements[1][3]; backMatrix.elements[2][2] = mat1.elements[0][2] * mat2.elements[2][0] + mat1.elements[1][2] * mat2.elements[2][1]+ mat1.elements[2][2] * mat2.elements[2][2] + mat1.elements[3][2] * mat2.elements[2][3]; backMatrix.elements[3][2] = mat1.elements[0][2] * mat2.elements[3][0] + mat1.elements[1][2] * mat2.elements[3][1]+ mat1.elements[2][2] * mat2.elements[3][2] + mat1.elements[3][2] * mat2.elements[3][3]; // four's row backMatrix.elements[0][3] = mat1.elements[0][3] * mat2.elements[0][0] + mat1.elements[1][3] * mat2.elements[0][1]+ mat1.elements[2][3] * mat2.elements[0][2] + mat1.elements[3][3] * mat2.elements[0][3]; backMatrix.elements[1][3] = mat1.elements[0][3] * mat2.elements[1][0] + mat1.elements[1][3] * mat2.elements[1][1]+ mat1.elements[2][3] * mat2.elements[1][2] + mat1.elements[3][3] * mat2.elements[1][3]; backMatrix.elements[2][3] = mat1.elements[0][3] * mat2.elements[2][0] + mat1.elements[1][3] * mat2.elements[2][1]+ mat1.elements[2][3] * mat2.elements[2][2] + mat1.elements[3][3] * mat2.elements[2][3]; backMatrix.elements[3][3] = mat1.elements[0][3] * mat2.elements[3][0] + mat1.elements[1][3] * mat2.elements[3][1]+ mat1.elements[2][3] * mat2.elements[3][2] + mat1.elements[3][3] * mat2.elements[3][3]; return backMatrix;}and my Neon Matrix // neon matrix float[16]
myMatrixFunctionCode (float* losFloat, const float er[16], const float sef[16]) asm("myMatrixFunction")
//*** .s file .text .syntax unified .balign 4 .global myMatrixFunction .thumb .thumb_func
myMatrixFunction:
vld1.32 {d16-d19}, [r1]! vld1.32 {d20-d23}, [r1]! vld1.32 {d0-d3}, [r2]! vld1.32 {d4-d7}, [r2]! .macro mul_los_matrix store_q, column0_d, column1_d vmul.f32 \store_q, q8, \column0_d[0] @multiple col element 0 by matrix col 0 vmla.f32 \store_q, q9, \column0_d[1] @multiple-acc col element 1 by matrix col 1 vmla.f32 \store_q, q10, \column1_d[0] vmla.f32 \store_q, q11, \column1_d[1] .endm mul_los_matrix q12, d0, d1 @ matrix 0 * matrix 1 col 0 mul_los_matrix q13, d2, d3 @ matrix 0 * matrix 1 col 1 mul_los_matrix q14, d4, d5 @ matrix 0 * matrix 1 col 2 mul_los_matrix q15, d6, d7 @ matrix 0 * matrix 1 col 3 vst1.32 {d24-d27}, [r0]! vst1.32 {d28-d31}, [r0]!
b endend: bx lr.end
Everything works for me - it counts everything correctly and so on.But it does it very slowly.Here is my problem.I will be glad and grateful for any help