
Cannot make inference with PyArmnn on custom quantized model trained with TF2

I trained a model using TF2 and Keras. The model includes the following layers (from tf.keras.layers):

  • layers.Conv2D
  • layers.BatchNormalization
  • layers.ReLU
  • layers.MaxPool2D
  • layers.AveragePooling2D
  • layers.Dropout
  • layers.Flatten
  • layers.Dense
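
Roughly, the layers are assembled like this (an illustrative sketch only; the actual filter counts, shapes and number of blocks differ):

import tensorflow as tf
from tensorflow.keras import layers

# illustrative architecture only, not the exact model
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, input_shape=(32, 32, 3)),
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.MaxPool2D(),
    layers.Conv2D(64, 3),
    layers.BatchNormalization(),
    layers.ReLU(),
    layers.AveragePooling2D(),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(10),
])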

I first trained the model using the fit() function, then performed quantization-aware training with the TensorFlow Model Optimization API and converted the result to .tflite:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# annotate the model, disabling quantization for the dense layer
annotated_model = tf.keras.models.clone_model(model, clone_function=custom_quantization)

# make the annotated model quantization-aware
q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)

q_aware_model.compile(...)
q_aware_model.fit(...)
q_aware_model.save('q_model.hdf5')

# convert the quantization-aware model to a quantized .tflite file
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(quantized_tflite_model)

I tried to perform inference with PyArmnn on the NPU/CpuAcc backends using this code:

import pyarmnn as ann

# parse the .tflite file into an Arm NN network
parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile(path)

# get the input binding info for the first subgraph
graph_id = 0
input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])
input_tensor_id = input_binding_info[0]
input_tensor_info = input_binding_info[1]

# create the runtime and optimize the network for the preferred backends
options = ann.CreationOptions()
runtime = ann.IRuntime(options)

preferredBackends = [ann.BackendId('VsiNpu'), ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())
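
For completeness, after Optimize() the rest of the run would follow the usual PyArmnn flow, roughly like this (input_data stands in for a NumPy array matching the model's input shape):

# load the optimized network into the runtime and run inference
net_id, _ = runtime.LoadNetwork(opt_network)

input_tensors = ann.make_input_tensors([input_binding_info], [input_data])

output_names = parser.GetSubgraphOutputTensorNames(graph_id)
output_binding_info = parser.GetNetworkOutputBindingInfo(graph_id, output_names[0])
output_tensors = ann.make_output_tensors([output_binding_info])

runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
results = ann.workload_tensors_to_ndarray(output_tensors)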

However, the Optimize() call produced the following warnings:

RuntimeError: WARNING: Layer of type Quantize is not supported on requested backend VsiNpu for input data type Float32 and output data type QAsymmS8 (reason: Npu quantize: output type not supported.
), falling back to the next backend.
WARNING: Layer of type Convolution2d is not supported on requested backend VsiNpu for input data type QAsymmS8 and output data type QAsymmS8 (reason: Npu convolution2d: Uint8UnbiasedConvolution not supported.
Npu convolution2d: input is not a supported type.
Npu convolution2d: output is not a supported type.
Npu convolution2d: weights is not a supported type.
Npu convolution2d: input and weights types mismatched.
), falling back to the next backend.
WARNING: Layer of type Activation is not supported on requested backend VsiNpu for input data type QAsymmS8 and output data type QAsymmS8 (reason: Npu activation: input type not supported.
Npu activation: output type not supported.
), falling back to the next backend.

And also this error:

ERROR: Layer of type Mean is not supported on any preferred backend [VsiNpu CpuAcc CpuRef ]

What is the reason behind this? Is there a fix?

  • Hey! Quick question and response:

    What NPU are you using for this?

    Fundamentally, those layers aren't supported by that hardware; that's the translation of the error. There are two options, both of which are tricky.

    1. Add support to the TF Parser for ArmNN

    2. Use the TFLite delegate, which is newer and which I haven't tried, but you can check this (a rough loading sketch follows the link):

    https://arm-software.github.io/armnn/21.02/delegate.xhtml
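
    From the docs, loading the delegate from Python looks roughly like this; it's an untested sketch, and the library name, path and option keys depend on how the delegate was built:

    import tflite_runtime.interpreter as tflite

    # load the Arm NN external delegate (library path and available backends are build-dependent)
    armnn_delegate = tflite.load_delegate(
        "libarmnnDelegate.so",
        options={"backends": "CpuAcc,CpuRef", "logging-severity": "info"}
    )

    # run the quantized model through the delegate
    interpreter = tflite.Interpreter(
        model_path="converted_model.tflite",
        experimental_delegates=[armnn_delegate]
    )
    interpreter.allocate_tensors()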

    Currently looking into some of your other errors, though, so I'll get back to you.