I trained a model using TF2 and Keras. The model includes the following layers (from tf.keras.layers):
I first trained the model using the fit() function, then performed quantization-aware training with the TensorFlow Model Optimization API and converted the result to .tflite:
# disable quantization for dense layer
annotated_model = tf.keras.models.clone_model(model, clone_function=custom_quantization)
q_aware_model = tfmot.quantization.keras.quantize_apply(annotated_model)
q_aware_model.compile(...)
q_aware_model.fit(...)
q_aware_model.save('q_model.hdf5')
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(quantized_tflite_model)
I tried to perform inference with PyArmnn on the NPU/CpuAcc backends using this code:
parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile(path)
graph_id = 0
input_names = parser.GetSubgraphInputTensorNames(graph_id)
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_names[0])
input_tensor_id = input_binding_info[0]
input_tensor_info = input_binding_info[1]
options = ann.CreationOptions()
runtime = ann.IRuntime(options)
preferredBackends = [ann.BackendId('VsiNpu'), ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, messages = ann.Optimize(network, preferredBackends, runtime.GetDeviceSpec(), ann.OptimizerOptions())
However, I got the following warnings:
RuntimeError: WARNING: Layer of type Quantize is not supported on requested backend VsiNpu for input data type Float32 and output data type QAsymmS8 (reason: Npu quantize: output type not supported.), falling back to the next backend.
WARNING: Layer of type Convolution2d is not supported on requested backend VsiNpu for input data type QAsymmS8 and output data type QAsymmS8 (reason: Npu convolution2d: Uint8UnbiasedConvolution not supported. Npu convolution2d: input is not a supported type. Npu convolution2d: output is not a supported type. Npu convolution2d: weights is not a supported type. Npu convolution2d: input and weights types mismatched.), falling back to the next backend.
WARNING: Layer of type Activation is not supported on requested backend VsiNpu for input data type QAsymmS8 and output data type QAsymmS8 (reason: Npu activation: input type not supported. Npu activation: output type not supported.), falling back to the next backend.
And also this error:
ERROR: Layer of type Mean is not supported on any preferred backend [VsiNpu CpuAcc CpuRef ]
What is the reason behind this? Is there a fix?