mali_offline_compiler question

1. I found a strange problem. I tested the following two kernels; the main function and both kernels are shown below. The test platform is Mali-T864 and GlobalWorkSize = 10000000 (10M). The first kernel takes 20 ms and the second takes 15 ms.

main.cpp

/*
* This confidential and proprietary software may be used only as
* authorised by a licensing agreement from ARM Limited
* (C) COPYRIGHT 2013 ARM Limited
* ALL RIGHTS RESERVED
* The entire notice above must be reproduced on all authorised
* copies and copies may only be made to the extent permitted
* by a licensing agreement from ARM Limited.
*/
#define CL_TARGET_OPENCL_VERSION 120
#include "common.h"
#include "image.h"
#include <stdlib.h>
#include <CL/cl.h>
#include <iostream>
using namespace std;
/**
* \brief Basic integer array addition implemented in OpenCL.

kernel1.cl

__kernel void hello_world_opencl(__global float* restrict inputA,
                                 __global float* restrict inputB,
                                 __global float* restrict output)
{
    /* One work-item per element: two loads and one store. */
    int i = get_global_id(0);
    output[i] = inputA[i] + inputB[i];
}

kernel2.cl

__kernel void hello_world_opencl(__global float* restrict inputA,
                                 __global float* restrict inputB,
                                 __global float* restrict output)
{
    /* Same memory traffic as kernel1, plus one multiply. */
    int i = get_global_id(0);
    output[i] = inputA[i] * 2 + inputB[i];
}

2. I used mali_offline_compiler to profile them, and the report is identical for both kernels, as shown below. How are Instructions Emitted and the Path Cycles values derived? Why is Instructions Emitted twice the Longest Path Cycles? Also, I would expect three L/S operations (two loads and one store), so why does the report show four?

Mali Offline Compiler v7.0.0 (Build c38421)
Copyright 2007-2019 Arm Limited, all rights reserved
Configuration
=============
Hardware: Mali-T860 r2p0
Driver: Midgard r23p0-00rel0
Shader type: OpenCL Kernel (inferred)
Main shader
===========
Work registers: 1
Uniform registers: 2
Stack spilling: False
                        A     L/S   T     Bound
Instructions Emitted:   2.0   4.0   0.0   L/S
Shortest Path Cycles:   1.0   4.0   0.0   L/S
Longest Path Cycles:    1.0   4.0   0.0   L/S
A = Arithmetic, L/S = Load/Store, T = Texture
Shader properties
=================
Uniform computation: False
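
One way to read the Bound column is sketched below. It assumes the column simply names the pipeline with the highest cycle count on that row; this is my interpretation of the report, not something the tool output states. The struct `PipelineCycles`, the function `bound_pipeline`, and the tie-breaking order are hypothetical.

```cpp
#include <string>

// Cycle counts for one row of the report (A, L/S, T columns).
struct PipelineCycles {
    double arithmetic;
    double load_store;
    double texture;
};

// Assumption: the bound is the pipeline with the largest cycle count.
// Ties are resolved in the order L/S, A, T, which is an arbitrary choice here.
std::string bound_pipeline(const PipelineCycles& c) {
    if (c.load_store >= c.arithmetic && c.load_store >= c.texture) {
        return "L/S";
    }
    if (c.arithmetic >= c.texture) {
        return "A";
    }
    return "T";
}
```

For the Longest Path Cycles row above, `bound_pipeline({1.0, 4.0, 0.0})` returns `"L/S"`, which matches the report.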