Hi, here is my situation: I am making a game and set out to optimize its math. I spent a long time (with interruptions) writing the math routines in NEON assembly (.s files). I finally got them working, but when I measured how much faster they were, it turned out that performance had not improved; on the contrary, it got worse. Multiplying 60 matrices takes 0.000039 seconds the usual way and 0.001612 seconds with NEON =))
Can you tell me how this is even possible and where I might have made a mistake?
My simple matrix multiplication:
// myMatrix holds float elements[4][4]
myMatrix operator*(myMatrix mat1, myMatrix mat2)
{
    myMatrix backMatrix;

    // first row
    backMatrix.elements[0][0] = mat1.elements[0][0] * mat2.elements[0][0] + mat1.elements[1][0] * mat2.elements[0][1]
                              + mat1.elements[2][0] * mat2.elements[0][2] + mat1.elements[3][0] * mat2.elements[0][3];
    backMatrix.elements[1][0] = mat1.elements[0][0] * mat2.elements[1][0] + mat1.elements[1][0] * mat2.elements[1][1]
                              + mat1.elements[2][0] * mat2.elements[1][2] + mat1.elements[3][0] * mat2.elements[1][3];
    backMatrix.elements[2][0] = mat1.elements[0][0] * mat2.elements[2][0] + mat1.elements[1][0] * mat2.elements[2][1]
                              + mat1.elements[2][0] * mat2.elements[2][2] + mat1.elements[3][0] * mat2.elements[2][3];
    backMatrix.elements[3][0] = mat1.elements[0][0] * mat2.elements[3][0] + mat1.elements[1][0] * mat2.elements[3][1]
                              + mat1.elements[2][0] * mat2.elements[3][2] + mat1.elements[3][0] * mat2.elements[3][3];

    // second row
    backMatrix.elements[0][1] = mat1.elements[0][1] * mat2.elements[0][0] + mat1.elements[1][1] * mat2.elements[0][1]
                              + mat1.elements[2][1] * mat2.elements[0][2] + mat1.elements[3][1] * mat2.elements[0][3];
    backMatrix.elements[1][1] = mat1.elements[0][1] * mat2.elements[1][0] + mat1.elements[1][1] * mat2.elements[1][1]
                              + mat1.elements[2][1] * mat2.elements[1][2] + mat1.elements[3][1] * mat2.elements[1][3];
    backMatrix.elements[2][1] = mat1.elements[0][1] * mat2.elements[2][0] + mat1.elements[1][1] * mat2.elements[2][1]
                              + mat1.elements[2][1] * mat2.elements[2][2] + mat1.elements[3][1] * mat2.elements[2][3];
    backMatrix.elements[3][1] = mat1.elements[0][1] * mat2.elements[3][0] + mat1.elements[1][1] * mat2.elements[3][1]
                              + mat1.elements[2][1] * mat2.elements[3][2] + mat1.elements[3][1] * mat2.elements[3][3];

    // third row
    backMatrix.elements[0][2] = mat1.elements[0][2] * mat2.elements[0][0] + mat1.elements[1][2] * mat2.elements[0][1]
                              + mat1.elements[2][2] * mat2.elements[0][2] + mat1.elements[3][2] * mat2.elements[0][3];
    backMatrix.elements[1][2] = mat1.elements[0][2] * mat2.elements[1][0] + mat1.elements[1][2] * mat2.elements[1][1]
                              + mat1.elements[2][2] * mat2.elements[1][2] + mat1.elements[3][2] * mat2.elements[1][3];
    backMatrix.elements[2][2] = mat1.elements[0][2] * mat2.elements[2][0] + mat1.elements[1][2] * mat2.elements[2][1]
                              + mat1.elements[2][2] * mat2.elements[2][2] + mat1.elements[3][2] * mat2.elements[2][3];
    backMatrix.elements[3][2] = mat1.elements[0][2] * mat2.elements[3][0] + mat1.elements[1][2] * mat2.elements[3][1]
                              + mat1.elements[2][2] * mat2.elements[3][2] + mat1.elements[3][2] * mat2.elements[3][3];

    // fourth row
    backMatrix.elements[0][3] = mat1.elements[0][3] * mat2.elements[0][0] + mat1.elements[1][3] * mat2.elements[0][1]
                              + mat1.elements[2][3] * mat2.elements[0][2] + mat1.elements[3][3] * mat2.elements[0][3];
    backMatrix.elements[1][3] = mat1.elements[0][3] * mat2.elements[1][0] + mat1.elements[1][3] * mat2.elements[1][1]
                              + mat1.elements[2][3] * mat2.elements[1][2] + mat1.elements[3][3] * mat2.elements[1][3];
    backMatrix.elements[2][3] = mat1.elements[0][3] * mat2.elements[2][0] + mat1.elements[1][3] * mat2.elements[2][1]
                              + mat1.elements[2][3] * mat2.elements[2][2] + mat1.elements[3][3] * mat2.elements[2][3];
    backMatrix.elements[3][3] = mat1.elements[0][3] * mat2.elements[3][0] + mat1.elements[1][3] * mat2.elements[3][1]
                              + mat1.elements[2][3] * mat2.elements[3][2] + mat1.elements[3][3] * mat2.elements[3][3];

    return backMatrix;
}

And my NEON version:
// NEON matrix: float[16]
void myMatrixFunctionCode(float* losFloat, const float er[16], const float sef[16]) asm("myMatrixFunction");
//*** .s file
    .text
    .syntax unified
    .balign 4
    .global myMatrixFunction
    .thumb
    .thumb_func
myMatrixFunction:
    vld1.32 {d16-d19}, [r1]!
    vld1.32 {d20-d23}, [r1]!
    vld1.32 {d0-d3}, [r2]!
    vld1.32 {d4-d7}, [r2]!

    .macro mul_los_matrix store_q, column0_d, column1_d
    vmul.f32 \store_q, q8, \column0_d[0]    @ multiply col element 0 by matrix col 0
    vmla.f32 \store_q, q9, \column0_d[1]    @ multiply-acc col element 1 by matrix col 1
    vmla.f32 \store_q, q10, \column1_d[0]
    vmla.f32 \store_q, q11, \column1_d[1]
    .endm

    mul_los_matrix q12, d0, d1    @ matrix 0 * matrix 1 col 0
    mul_los_matrix q13, d2, d3    @ matrix 0 * matrix 1 col 1
    mul_los_matrix q14, d4, d5    @ matrix 0 * matrix 1 col 2
    mul_los_matrix q15, d6, d7    @ matrix 0 * matrix 1 col 3

    vst1.32 {d24-d27}, [r0]!
    vst1.32 {d28-d31}, [r0]!
    b end
end:
    bx lr
    .end
Everything works: the NEON routine computes the correct results. But it does so very slowly, and that is my problem. I would be grateful for any help.
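For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two routines could be timed on equal terms. It is not my exact benchmark: `Mat4`, `mulRef` and `secondsPerPass` are hypothetical names made up for this example. `mulRef` is just the loop form of the scalar multiplication above, using the same flat float[16] layout that the NEON routine takes, and the timer chains the results so the compiler cannot drop the work.

```cpp
#include <chrono>

// Hypothetical flat 4x4 matrix: the same 16 floats the NEON routine receives.
struct Mat4 { float m[16]; };

// Loop form of the scalar multiply above:
//   out[i*4 + j] = sum over k of a[k*4 + j] * b[i*4 + k]
// Arguments are passed by reference, so no matrices are copied per call.
inline void mulRef(Mat4& out, const Mat4& a, const Mat4& b) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[k * 4 + j] * b.m[i * 4 + k];
            }
            out.m[i * 4 + j] = sum;
        }
    }
}

// Runs `reps` passes over `count` matrices and returns the average
// number of seconds per pass. Each product feeds the next one, and the
// final value is written to a volatile, so the loop cannot be optimized out.
template <typename Mul>
double secondsPerPass(Mul mul, const Mat4* mats, int count, int reps) {
    Mat4 acc = mats[0];
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (int i = 0; i < count; ++i) {
            Mat4 tmp;
            mul(tmp, acc, mats[i]);
            acc = tmp;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile float sink = acc.m[0];  // keep the result live
    (void)sink;
    return std::chrono::duration<double>(stop - start).count() / reps;
}
```

The NEON routine could be passed in the same way through a small lambda, for example `[](Mat4& o, const Mat4& a, const Mat4& b) { myMatrixFunctionCode(o.m, a.m, b.m); }`. Using many repetitions and an optimized build for both versions should make the two numbers easier to compare than a single run of 60 multiplications.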