What is the difference between SVM and cl::Buffer?

Hi,

I asked this question on the Khronos forum but got no answer, so I decided to ask it here.

I use the following process with OpenCL on Android.

Working with:

  - Mali-G715-Immortalis MC11 r1p2

  - OpenCL 3.0 v1.r38p1-01eac0.c1a71ccca2acf211eb87c5db5322f569

  - SVM_COARSE_GRAIN_BUFFER supported

  1. I create the platform, device and queue, create all my cl::Buffer objects, and compile all the kernels at the start of my application.

  2. I get a picture from my camera and pass the byte data through JNI (jbyteArray => (uint8_t*)inPtr) to my C++ function.

  3. I take the (uint8_t*)inPtr pointer and use a cl::Buffer to wrap the camera picture data:
    bufferNV21 = cl::Buffer(gContext, CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR , isize*sizeof(cl_uchar), inPtr , NULL);
    This takes less than 1 ms.

  4. I run my NV21toRGB kernel, then do some work on the output buffer.

  5. I use enqueueMapBuffer to map the buffer to a host pointer, buf, in my program's memory, which is then used for CPU processing with pthreads. This takes less than 2 ms.

  6. Then I copy the CPU result back to a GPU buffer with:
    bufferligne = cl::Buffer(gContext, CL_MEM_USE_HOST_PTR, (1024*1024)*sizeof(cl_uchar4), buf, NULL); // replaces enqueueWriteBuffer
    This takes less than 3 ms.

  7. I run some kernels on the bufferligne cl::Buffer.

  8. Then I send the GPU buffer (bufferMMM) back to the Java output bitmap using:
    gQueue.enqueueReadBuffer(bufferMMM, CL_TRUE, 0, osize*sizeof(cl_uchar4), out, 0, &arraySecondEvent); // for OpenCL
    This last part takes between 3 and 5 ms, sometimes less.

So, is it relevant to use SVM with my configuration, and what should I change if I want to use it: step 3, 5, 7 or 8?

And what does SVM do that cl::Buffer does not? I would like to understand why one would use it.

I was able to improve the speed by adding the following to the kernel.cl file:

#pragma OPENCL EXTENSION cl_khr_priority_hints : enable // speeds up the OpenCL queue driver
#pragma OPENCL EXTENSION CL_QUEUE_PRIORITY_HIGH_KHR : enable

and the following to the .cpp file:

// Optional extension support
#define CL_HPP_USE_IL_KHR
#define CL_HPP_USE_CL_SUB_GROUPS_KHR
#define CL_HPP_OPENCL_API_WRAPPER

  • Hi,

    It looks like my question was not clear enough.

    This is the ARM document where I read about using SVM rather than a cl::Buffer:

    developer.arm.com/.../Shared-virtual-memory

  • Can nobody explain why one would use clSVMAlloc rather than clCreateBuffer?

  • As the blog mentions, these are the benefits of using SVM:
    * It has lower overheads.
    * It is easier to use because it is just a pointer to data.
    * If your platform supports coherency, it allows you to use coherent memory.
    * Because the address of the memory is guaranteed to be the same on the host and the device, it allows you to write kernels using dynamic data structures that rely on pointers (e.g. linked lists).
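
    To make the last point concrete, here is a minimal host-side sketch of a pointer-based list built inside a single contiguous block. It is an illustration under stated assumptions, not code from the blog: `Node`, `NodeArena`, `build_list` and `sum_list` are hypothetical names, and the block is ordinary host memory so the sketch runs without a GPU. In a real program the block would come from clSVMAlloc, and the `next` pointers stored in it would stay valid inside a kernel provided the allocation is made available to that kernel (for example via clSetKernelArgSVMPointer) and the struct layout matches on both sides.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// One list node. `next` is a raw pointer, which is only meaningful on the
// device if host and device agree on addresses (the SVM guarantee).
struct Node {
    std::int32_t value;
    Node*        next;
};

// Hypothetical bump allocator over ONE contiguous block. Assumption: in an
// OpenCL program the block would be the result of clSVMAlloc; here the caller
// passes any memory that is suitably aligned for Node.
class NodeArena {
public:
    NodeArena(void* block, std::size_t bytes)
        : base_(static_cast<unsigned char*>(block)),
          capacity_(bytes / sizeof(Node)),
          used_(0) {}

    // Constructs a node in the next free slot; returns nullptr when full.
    Node* make(std::int32_t value, Node* next) {
        if (used_ == capacity_) return nullptr;
        void* slot = base_ + used_ * sizeof(Node);
        ++used_;
        return new (slot) Node{value, next};
    }

private:
    unsigned char* base_;
    std::size_t    capacity_;
    std::size_t    used_;
};

// Builds the list (n-1) -> ... -> 1 -> 0 inside the arena.
// Returns nullptr if the arena runs out of space.
Node* build_list(NodeArena& arena, int n) {
    Node* head = nullptr;
    for (int i = 0; i < n; ++i) {
        Node* node = arena.make(i, head);
        if (node == nullptr) return nullptr;
        head = node;
    }
    return head;
}

// The traversal a kernel would perform given the head pointer:
// follow `next` until it is null and accumulate the values.
std::int64_t sum_list(const Node* head) {
    std::int64_t total = 0;
    for (const Node* p = head; p != nullptr; p = p->next) total += p->value;
    return total;
}
```

    With a plain cl buffer the same list would have to store indices or offsets instead of pointers, because the host address of a node means nothing on the device.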

    With regard to your application, based on the information supplied, you would probably benefit from the use of SVM.
    The fastest way to convert an existing application to use SVM is to allocate memory via clSVMAlloc and pass it as the host_ptr to a cl object along with CL_MEM_USE_HOST_PTR.

    For your example above that would mean allocating "inPtr" via clSVMAlloc. You will most likely see a performance improvement in the map/unmap calls.

    That said, you will probably see the same performance benefit on map/unmap if you use CL_MEM_COPY_HOST_PTR instead of CL_MEM_USE_HOST_PTR.

    Another way you can improve your application is by importing your data. You can use either of these extensions, cl_arm_import_memory or cl_khr_external_memory, to import memory.

    Also, as a general rule, try to avoid switching between the CPU and GPU where possible.
    For instance, can step 5 be moved to the GPU? And in step 8, can't the kernel write directly to "out" (see if you can import "out" into bufferMMM)?

    Hope this helps.

  • Thanks, jhon, for the help.

    I tried it, but step 5 cannot be moved off the CPU. The CPU treats the data as rows, so we can process it sequentially over X and Y in a for loop.

    The GPU cannot treat the data the same way as the CPU. It works on a matrix of work-groups, so the row data has to be reorganized to be processed per work-group. The CPU is faster here; the MediaTek 9200+ is surprisingly good in 64-bit. The CPU is also more usable for computing on arrays, databases and structs. They do not serve the same purpose for me; otherwise I would use only one of them.

    GPUs are faster than CPUs for multitasking. A CPU has 8 or 16 cores where a GPU has thousands (I am joking, but still).

    It is definitely not the same use, because they do not process information the same way.

    It all comes down to how you structure the information for your use.

    I think the GPU could and should become more like a CPU by processing the work-groups in order rather than in random order. Sorry for my VERY BAD ENGLISH WRITING; it is the same in French.

    GPU and CPU do not do the same things.

    The CPU can sum very quickly over big structs.

    But thanks again for the response. I will figure it out, no problem; it just takes time and patience.

  • Hi,

    Let's come back to SVM. You said:

    For your example above that would mean allocating "inPtr" via clSVMAlloc. You will most likely see a performance improvement in the map/unmap calls.

    But "inPtr" is a Java object: we do not know where its memory is allocated, and the same goes for "out". So, coming from Java, I am not sure that SVM can be used at all, and I do not know how to do it if it can.