I have a hunch about what is happening. The FIR filter function arm_fir_f32() requires the filter coefficients to be in time-reversed order, whereas arm_conv_partial_f32() takes the coefficients in normal order. In a high-pass filter, the coefficients typically alternate in sign. If the coefficient magnitudes are symmetric (as in a linear-phase design) and the filter length is even, then time-flipping the filter is equivalent to scaling it by -1, so the two functions would produce outputs of opposite sign.
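To check the sign-flip claim without the CMSIS library, here is a minimal sketch in plain Python. The tap values and the `convolve` helper are made up for illustration (they are not from your filter or from CMSIS-DSP); the only assumption is that the taps alternate in sign and have symmetric magnitudes.

```python
# Hypothetical taps: symmetric magnitudes with alternating signs.
even_mags = [0.1, 0.3, 0.3, 0.1]        # even length
odd_mags = [0.1, 0.3, 0.5, 0.3, 0.1]    # odd length, for contrast

def alternate(mags):
    """Apply the sign pattern +, -, +, - ... to the magnitudes."""
    return [((-1) ** n) * m for n, m in enumerate(mags)]

def convolve(x, h):
    """Plain direct-form convolution (illustrative helper, not a CMSIS call)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

h_even = alternate(even_mags)
h_odd = alternate(odd_mags)

# Time-reversing the even-length filter negates every tap.
even_flip_is_negation = h_even[::-1] == [-t for t in h_even]

# Time-reversing the odd-length filter leaves it unchanged.
odd_flip_is_identity = h_odd[::-1] == h_odd

# So filtering with the flipped even-length taps inverts the output.
x = [1.0, 2.0, -1.0, 0.5, 3.0]          # arbitrary test input
y_normal = convolve(x, h_even)
y_flipped = convolve(x, h_even[::-1])
output_is_inverted = y_flipped == [-v for v in y_normal]

print(even_flip_is_negation, odd_flip_is_identity, output_is_inverted)
```

If your actual coefficients behave the same way (reversed array equals negated array), that would support the hunch; if they do not, the cause is probably elsewhere.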