LSTM support with Ethos-U NPU

Hello there,

I was recently developing a model that requires some LSTM layers. I found that the Ethos-U65 does indeed support LSTM: https://developer.arm.com/documentation/102023/0000/Programmers-model/Operators-and-performance/Supported-data-types-and-operators.

I found that with most frameworks these layers are hard to quantize, because their performance is poor after post-training quantization (PTQ). Can you suggest the right path for integrating LSTM layers with the NPU? Would dynamic quantization be an option?

Thank you,
Tymo

Will He, 1 hour ago
Hi Tymoteusz,

Thanks for raising this U65-related question in the Arm Community, and sorry that this forum is not monitored very closely.

I suspect the most likely cause is that the generated LSTM model is decomposed rather than fused, which can lead to poor performance after Vela processing and when running on the U65.

Please check the flow below for a fused LSTM:

1. Train a UnidirectionalLSTM in fp32 as usual.

2. Apply PTQ to int8 with a good representative dataset, using this tutorial from Google to generate a fused (i.e. not unrolled) LSTM: https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lit[…]perimental_new_converter/Keras_LSTM_fusion_Codelab.ipynb

3. Convert int8 → 16x8 using TFLM's tools/requantize_flatbuffer.py, since 8x8 might have accuracy issues.

4. Compile with Vela as usual, and validate with the TFLM reference kernels (not TFLite).
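
To give a rough feel for why the 16x8 step (16-bit activations, 8-bit weights) can help a recurrent layer, here is a minimal pure-Python sketch. It is only an illustration under simplifying assumptions: the update rule is a toy tanh recurrence rather than a real LSTM cell, the weights are left unquantized, and `quantize` and `max_state_drift` are hypothetical helper names, not part of TFLM or Vela. It re-quantizes the hidden state at every time step and tracks how far it drifts from a floating-point reference.

```python
import math


def quantize(x, bits, max_abs=1.0):
    """Symmetric quantization: snap x to a signed integer grid and map it back."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    scale = max_abs / qmax              # real-valued size of one integer step
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


def max_state_drift(bits, steps=200):
    """Worst gap between a float state and a state re-quantized every step."""
    h_ref = 0.0   # floating-point reference state
    h_q = 0.0     # state stored at the given bit width
    worst = 0.0
    for t in range(steps):
        x = math.sin(0.1 * t)  # arbitrary smooth input signal (assumption)
        h_ref = math.tanh(0.9 * h_ref + 0.1 * x)
        h_q = quantize(math.tanh(0.9 * h_q + 0.1 * x), bits)
        worst = max(worst, abs(h_q - h_ref))
    return worst


if __name__ == "__main__":
    for bits in (8, 16):
        print(f"{bits}-bit state: worst drift = {max_state_drift(bits):.6f}")
```

In this toy setup the 8-bit state drifts noticeably further from the reference than the 16-bit state, because each step's rounding error is fed back into the next step. Real models will behave differently, so please still validate accuracy on your own data.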

Thanks,

Will