A kernel call

Device code written with ANSI C keywords plus the CUDA extension keywords is called a kernel. The host code launches it through a kernel call, which simply means launching device code from the host. A kernel call typically creates a large number of blocks and threads to exploit data parallelism on the GPU. Kernel code looks much like an ordinary C function; the difference is that many threads execute it in parallel. The launch syntax is unusual:

kernel<<<number of blocks, number of threads per block, size of shared memory>>>(parameters for kernel)

The call starts with the name of the kernel to launch, which must be defined with the __global__ keyword. Next comes the <<< >>> kernel launch operator, which holds the configuration parameters for the kernel. In the form shown here, it takes up to three comma-separated parameters. The first is the number of blocks to execute, and the second is the number of threads in each block, so the total number of threads started by a kernel launch is the product of these two numbers. The third parameter, which specifies the size in bytes of the dynamically allocated shared memory available to each block, is optional and defaults to zero. In the program for variable addition, the kernel launch looks as follows:

gpuAdd<<<1, 1>>>(1, 4, d_c);

Here, gpuAdd is the name of the kernel to launch, and <<<1, 1>>> requests one block with one thread per block, so only a single thread is started. The three arguments in parentheses are the parameters passed to the kernel. The first two are the constants 1 and 4. The third, d_c, is a pointer to device memory; it points to the location where the kernel stores the result of the addition. Keep in mind that pointers passed as kernel parameters should point only to device memory. A pointer to host memory can crash the program. Once the kernel has finished executing, the result that the device pointer points to can be copied back to host memory for further use. Starting only one thread on the device is not an efficient use of its resources. How must the kernel call change to start multiple threads in parallel? That question is addressed in the next section, "configuring kernel parameters".
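To put the launch in context, the following is a minimal sketch of how a complete variable addition program might look. The kernel body, the host variable h_c, and the error handling are assumptions made for illustration, since this section shows only the launch line. The sketch allocates one int on the device, launches gpuAdd with one block of one thread, and copies the result back to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel definition: adds two values passed by value and
// writes the sum to the device memory location that d_c points to.
__global__ void gpuAdd(int d_a, int d_b, int *d_c) {
    *d_c = d_a + d_b;
}

int main() {
    int h_c = 0;         // host copy of the result (hypothetical name)
    int *d_c = nullptr;  // device pointer passed to the kernel

    // Allocate room for one int in device memory.
    cudaError_t err = cudaMalloc((void **)&d_c, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Kernel call: one block, one thread per block, no shared memory argument.
    gpuAdd<<<1, 1>>>(1, 4, d_c);

    // A launch does not return an error code, so query it separately.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_c);
        return 1;
    }

    // Copy the result back to host memory. This call waits for the
    // kernel to finish before copying.
    err = cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_c);
        return 1;
    }

    printf("1 + 4 = %d\n", h_c);

    cudaFree(d_c);
    return 0;
}
```

Assuming the file is saved as gpu_add.cu (a hypothetical name), it can be compiled with nvcc gpu_add.cu -o gpu_add. Note that d_c is obtained from cudaMalloc, so it satisfies the rule that kernel pointer parameters point to device memory, while h_c lives on the host and is touched only by cudaMemcpy.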
