Here's a link to a blog post from today about my work on accelerating SQLite with OpenCL on the ARM-based Samsung Chromebook with a Mali T604.
Details & Early Benchmarks of OpenCL accelerated SQLite on ARM Mali | Tom Gall
Comments, questions and suggestions most welcome.
Hi Pete,
Thanks again for your suggestions. I'm still working on the code a bit, but it's looking good. Running this query:
SELECT id, uniformi, normali5 FROM test WHERE uniformi > 60 AND normali5 < 0
with SQLite built with -O2:
CPU sql1 took 43631 microseconds
OpenCL sql1 took 14545 microseconds (3.00x, or 200% better)
OpenCL sql1 (using vectors) took 4114 microseconds (10.6x, or 961% better)
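For anyone checking the arithmetic, here is a minimal sketch of how the "x" and "% better" figures relate to the raw timings. The function names are mine, purely for illustration; the only inputs are the microsecond timings quoted above.

```c
#include <assert.h>

/* Speedup factor: how many times faster the improved run is
   than the baseline (both timings in microseconds). */
static double speedup(double baseline_us, double improved_us) {
    assert(improved_us > 0.0);  /* avoid dividing by zero */
    return baseline_us / improved_us;
}

/* "Percent better" as used here: (speedup - 1) * 100, so a run that is
   twice as fast counts as 100% better. */
static double percent_better(double baseline_us, double improved_us) {
    return (speedup(baseline_us, improved_us) - 1.0) * 100.0;
}
```

Calling speedup(43631, 14545) gives the factor for the first OpenCL run, and speedup(43631, 4114) the factor for the vectorized one.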
The improvement came not only from working with vectors rather than arrays of integers, but also from reducing the number of registers in use. With fewer registers in use, I was able to jump from 64 to 128 work units.
The heart of the OpenCL kernel evolved from:
do {
    if ((data[offset].v > 60) && (data[offset].w < 0)) {
        resultArray[roffset].id = data[offset].id;
        resultArray[roffset].v = data[offset].v;
        resultArray[roffset].w = data[offset].w;
        roffset++;
    }
    offset++;
    endRow--;
} while (endRow);
to
do {
    v1 = vload4(0, data1+offset);
    v2 = vload4(0, data2+offset);
    r = (v1 > 60) && (0 > v2);
    vstore4(r, 0, resultMask+offset);
    offset += 4;
    totalRows--;
} while (totalRows);
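To make the difference between the two forms concrete, here is a rough plain C sketch that emulates both on the CPU. It is not the actual kernel code. The row_t layout, the function names, and the final gather step are assumptions for illustration, since the snippets above don't show how the mask is consumed. The mask lanes are set to -1 for a match because that is what OpenCL vector comparisons produce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row layout, mirroring the id/v/w fields of the scalar kernel. */
typedef struct { int id; int v; int w; } row_t;

/* Scalar form: test each row and copy the matches, as in the first kernel.
   Returns the number of matching rows written to out. */
static size_t filter_scalar(const row_t *data, size_t rows, row_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < rows; i++) {
        if (data[i].v > 60 && data[i].w < 0) {
            out[n++] = data[i];
        }
    }
    return n;
}

/* Mask form: emulate the vload4 / compare / vstore4 loop in plain C.
   The columns live in separate arrays; each mask lane is -1 for a match
   and 0 otherwise. No branch and no copy happens inside the loop. */
static void filter_mask(const int *data1, const int *data2,
                        size_t rows, int *result_mask) {
    assert(rows % 4 == 0);  /* this sketch assumes whole 4-wide groups */
    for (size_t offset = 0; offset < rows; offset += 4) {
        for (size_t lane = 0; lane < 4; lane++) {
            size_t i = offset + lane;
            result_mask[i] = (data1[i] > 60 && 0 > data2[i]) ? -1 : 0;
        }
    }
}

/* Assumed follow-up step: gather the ids of rows whose mask lane is set. */
static size_t gather_ids(const int *ids, const int *result_mask,
                         size_t rows, int *out_ids) {
    size_t n = 0;
    for (size_t i = 0; i < rows; i++) {
        if (result_mask[i] != 0) {
            out_ids[n++] = ids[i];
        }
    }
    return n;
}
```

Running filter_scalar and filter_mask plus gather_ids over the same rows and comparing the resulting ids is one simple way to cross-check the two forms against each other on the CPU.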
Thanks again. If I can knock down one little bug, I anticipate I'll be posting my code tomorrow. I have a curious situation where 2 results (out of 100,000 rows) aren't being matched, and I'm not sure why. What I wouldn't give for an OpenCL debugger!
Hi Tom,
> Thanks again for your suggestions.
No problem - happy to help. Writing optimal GPGPU code generally requires spotting where you can orientate your world 90 degrees and run on the walls - it takes a bit of getting used to.
Sweet =)
Cheers, P