
Problem with Compiling Pure C with Larger Arrays in Keil (goal: compiling CNN)


* CONTEXT: Hi! I wrote a CNN inference model (MobileNetV3-Small) in bare-metal C and verified its correctness (the outputs match PyTorch). I did this on my local machine in Visual Studio Code.
I am trying to simulate this C program on an Arm Cortex-M4 with FPU, and I am using Keil uVision to compile it. This is part of my thesis project, where I will characterize performance and energy before and after adding a specialized custom hardware unit.

I am using my group's RTL simulation infrastructure to run the compiled program (obtained from Keil) on the Cortex-M4 core and peripherals, and I view the cycle-by-cycle information in a waveform viewer for validation and debugging.

* PROBLEM: I am still learning Keil and the proper way to write code for embedded systems, and I've been having a lot of trouble running even a simple convolution with larger inputs. Whenever the input arrays get even a little large, the SRAM data inputs (as seen on the waveform) become undefined or stop updating, and the code runs into memory faults. The disassembly also shows the program getting stuck on the memory-fault instructions.
Below, I show the problem with a simple array example, followed by a snippet of the actual code I am aiming to run (1 layer of the network). I would really appreciate any guidance on coding guidelines or environment setup I should pay attention to in order to get my target layer working. Thank you so much in advance!
* EXAMPLES:
* Project config, various settings:
* Simple array example:
The code below works for in and out sizes of less than 30. Once the arrays have size 30, the memory write data becomes undefined.
SRAM_DIN looks good (defined values), with size 5 here.
Breaks with size 30: see the red undefined xxx data getting written in SRAM_DIN.


* Example target layer from the network (1 conv layer):
- Note: header file "bneck_config.h" contains definitions:
float ifmap_buf[3072] = {0, 1, 2,..,3071};
const float conv0_kernels [432] = { 5.5238e-01,2.0333e-01, ..};
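For reference, the memory these arrays occupy can be tallied on a host machine. The sketch below is only an estimate under stated assumptions: `float` is 4 bytes, non-const arrays end up in RAM and `const` arrays in read-only memory (typical linker placement, not confirmed for this project), and the count also includes the two buffers declared in the main source file, `ofmap_buf[3072]` and `conv_to_sum_buf[90112]`. The helper names `ram_bytes`, `rodata_bytes`, and `fits_in_sram` are hypothetical and are not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Element counts copied from the post. */
enum {
    IFMAP_LEN       = 3072,   /* float ifmap_buf[3072], initialized, non-const */
    OFMAP_LEN       = 3072,   /* float ofmap_buf[3072]                         */
    CONV_TO_SUM_LEN = 90112,  /* float conv_to_sum_buf[90112]                  */
    KERNELS_LEN     = 432     /* const float conv0_kernels[432]                */
};

/* Assumption: 4-byte float, as on the Cortex-M4 and most hosts. */
static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Bytes of read/write memory taken by the non-const buffers. */
static size_t ram_bytes(void) {
    return (size_t)(IFMAP_LEN + OFMAP_LEN + CONV_TO_SUM_LEN) * sizeof(float);
}

/* Bytes of read-only data taken by the const kernel weights. */
static size_t rodata_bytes(void) {
    return (size_t)KERNELS_LEN * sizeof(float);
}

/* Hypothetical check: do the buffers plus a stack/heap allowance fit in SRAM?
 * Both sizes are supplied by the caller; the post does not state them. */
static int fits_in_sram(size_t sram_bytes, size_t stack_heap_bytes) {
    return ram_bytes() + stack_heap_bytes <= sram_bytes;
}
```

With these counts, `ram_bytes()` comes to 385024 bytes (376 KiB), of which `conv_to_sum_buf` alone accounts for 360448 bytes, and `rodata_bytes()` comes to 1728 bytes. Whether that fits depends on the SRAM size in the RTL model and on the RAM range entered in the Keil target options, neither of which is stated here.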
#include "bneck_config.h"
// #include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

// NOTE: To avoid issues with malloc, all buffers are declared as arrays
// with the maximum size that will be needed.
// ifmap_buf is a float[3072] array in the .h file.
float ofmap_buf[3072];         // 32x32x3: 3 channels of 32x32 each
float conv_to_sum_buf[90112];

void convolution2D(float* channel_input, int inputSize,
                   int kernelSize, float* kernel, int stride,
                   float* channel_output) {
    // Calculate the output size
    int padding = (kernelSize - 1) / 2;
    int outputSize = (inputSize + 2 * padding - kernelSize) / stride + 1;
    for (int i = -padding; i < inputSize - padding; i += stride) {
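The output-size arithmetic at the top of `convolution2D` can be checked on a host before running anything in simulation. The sketch below reuses the two formulas from the listing as written; the helper names `conv_padding`, `conv_output_size`, and `conv_loop_iterations` are hypothetical. It also counts how many times a loop of the form `for (i = -padding; i < inputSize - padding; i += stride)` executes, so the count can be compared against `outputSize`.

```c
#include <assert.h>

/* Padding as computed in the post: (kernelSize - 1) / 2. */
static int conv_padding(int kernelSize) {
    return (kernelSize - 1) / 2;
}

/* Output size along one dimension, same formula as in the post. */
static int conv_output_size(int inputSize, int kernelSize, int stride) {
    assert(inputSize > 0 && kernelSize > 0 && stride > 0);
    int padding = conv_padding(kernelSize);
    return (inputSize + 2 * padding - kernelSize) / stride + 1;
}

/* Number of iterations of the outer loop header shown in the post. */
static int conv_loop_iterations(int inputSize, int kernelSize, int stride) {
    assert(inputSize > 0 && kernelSize > 0 && stride > 0);
    int padding = conv_padding(kernelSize);
    int count = 0;
    for (int i = -padding; i < inputSize - padding; i += stride) {
        count++;
    }
    return count;
}
```

For a 32-wide input and a 3-wide kernel, both helpers give 32 at stride 1 and 16 at stride 2. For odd kernel sizes the loop count and `outputSize` agree; for even kernel sizes they can differ (a 4-wide input with a 2-wide kernel at stride 1 gives an output size of 3 but 4 loop iterations), which matters if the loop index is used to write into the output buffer.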
