
A kernel call

Device code written with ANSI C keywords plus the CUDA extension keywords is called a kernel. The host code launches it through a kernel call; in other words, a kernel call starts device code from host code. A kernel call typically generates a large number of blocks and threads to exploit data parallelism on the GPU. Kernel code looks much like an ordinary C function; the difference is that many threads execute it in parallel. The launch syntax is unusual:

kernel<<<number of blocks, number of threads per block, size of shared memory>>>(parameters for kernel)

It starts with the name of the kernel that we want to launch. Make sure that this kernel is defined with the __global__ keyword. Next comes the <<< >>> kernel launch operator, which holds the configuration parameters for the kernel. The three parameters discussed here are separated by commas. The first is the number of blocks you want to execute, and the second is the number of threads in each block, so the total number of threads started by a kernel launch is the product of these two numbers. For example, a launch with 2 blocks of 64 threads each starts 128 threads. The third parameter, which specifies the size in bytes of the shared memory allocated dynamically for each block, is optional. In the program for variable addition, the kernel launch syntax is as follows:

gpuAdd<<<1, 1>>>(1, 4, d_c);

Here, gpuAdd is the name of the kernel that we want to launch, and <<<1, 1>>> indicates one block with one thread per block, which means that only a single thread is started. The three arguments in round brackets are the parameters passed to the kernel. The first two are the constants 1 and 4. The third is d_c, a pointer to device memory; it points to the location in device memory where the kernel stores the result of the addition. Keep in mind that pointers passed as kernel parameters should only point to device memory. If such a pointer refers to ordinary host memory, dereferencing it inside the kernel is an invalid memory access and can crash your program. After kernel execution is completed, the result pointed to by the device pointer can be copied back to host memory for further use.

Starting only one thread on the device is not an optimal use of device resources. Suppose you want to start multiple threads in parallel; what do you have to change in the syntax of the kernel call? This is addressed in the next section, "configuring kernel parameters".
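To tie the pieces of this section together, here is a minimal sketch of a complete program built around the gpuAdd launch. The kernel body, the helper addOnDevice, the host variable h_c, and the error checks are assumptions added for illustration; only the kernel name, the launch configuration, and the d_c pointer come from the text above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel body: the text shows only the call, not the definition.
// __global__ marks the function as a kernel that the host can launch.
__global__ void gpuAdd(int a, int b, int *d_c) {
    *d_c = a + b;  // d_c must point to device memory
}

// Hypothetical helper: adds a and b on the GPU with a single thread and
// writes the result to *h_c on the host. Returns false if a CUDA call fails.
bool addOnDevice(int a, int b, int *h_c) {
    int *d_c = nullptr;

    // Allocate room for one int in device memory.
    if (cudaMalloc((void **)&d_c, sizeof(int)) != cudaSuccess) {
        return false;
    }

    // Kernel call: one block, one thread per block, no dynamic shared memory.
    gpuAdd<<<1, 1>>>(a, b, d_c);

    // cudaGetLastError reports launch failures such as a bad configuration.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // This copy waits for the kernel to finish, then moves the result
        // from device memory to host memory.
        err = cudaMemcpy(h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_c);
    return err == cudaSuccess;
}

int main() {
    int h_c = 0;
    if (!addOnDevice(1, 4, &h_c)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("1 + 4 = %d\n", h_c);
    return 0;
}
```

Assuming the file is saved as gpu_add.cu (a hypothetical name), it can be compiled with nvcc gpu_add.cu -o gpu_add. Note that h_c is never passed to the kernel; only the device pointer d_c is, in line with the rule about kernel pointer parameters.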
