
A kernel call

Device code written in ANSI C with CUDA extension keywords is called a kernel. The host code launches it through a kernel call; in other words, a kernel call is how device code is started from the host. A kernel call typically creates a large number of blocks and threads to exploit data parallelism on the GPU. A kernel looks much like an ordinary C function, except that it is executed by many threads in parallel. The launch syntax is unusual:

kernel<<<number of blocks, number of threads per block, size of shared memory>>>(parameters for kernel)

The call starts with the name of the kernel to launch, which must be defined with the __global__ keyword. Next comes the <<< >>> kernel launch operator, which holds the configuration parameters for the kernel. It can take three comma-separated parameters. The first is the number of blocks to execute, and the second is the number of threads in each block, so the total number of threads started by a kernel launch is the product of these two numbers. The third parameter, which is optional, specifies the size in bytes of the shared memory allocated dynamically for each block. In the program for variable addition, the kernel launch syntax is as follows:

gpuAdd<<<1, 1>>>(1, 4, d_c);

Here, gpuAdd is the name of the kernel to launch, and <<<1,1>>> requests one block with one thread per block, so only a single thread is started. The three arguments in parentheses are the parameters passed to the kernel. The first two are the constants 1 and 4. The third, d_c, is a pointer to device memory; it points to the location where the kernel stores the result of the addition. Keep in mind that any pointer passed to a kernel must point to device memory. A pointer to host memory can crash the program. Once kernel execution has completed, the result that the device pointer points to can be copied back to host memory for further use. Starting only one thread on the device is not an efficient use of device resources. How must the kernel call change if you want to start multiple threads in parallel? This is addressed in the next section, "configuring kernel parameters".
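For reference, the single-thread launch described above can be placed in a complete program along the following lines. This is only a minimal sketch: the body of gpuAdd is assumed from the description (it adds its two value arguments and writes the sum through d_c), and the helper addOnGpu and the host variable h_c are hypothetical names introduced for illustration, not taken from the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: __global__ marks it as device code that can be launched from the host.
// Assumed body: add the two values and store the sum at the device address d_c.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

// Hypothetical helper (not part of the original text): allocates device memory,
// launches the kernel with one block of one thread, and copies the result back.
// Returns true only if every CUDA call succeeded.
bool addOnGpu(int a, int b, int *h_result)
{
    int *d_c = nullptr;

    // The pointer handed to the kernel must refer to device memory.
    if (cudaMalloc((void **)&d_c, sizeof(int)) != cudaSuccess) {
        return false;
    }

    // Kernel call: 1 block x 1 thread per block = 1 thread in total.
    gpuAdd<<<1, 1>>>(a, b, d_c);

    // Check that the launch itself was accepted.
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Copy the result back to the host. A cudaMemcpy on the default stream
    // waits for the preceding kernel to finish before copying.
    ok = ok && (cudaMemcpy(h_result, d_c, sizeof(int),
                           cudaMemcpyDeviceToHost) == cudaSuccess);

    cudaFree(d_c);
    return ok;
}

int main()
{
    int h_c = 0;  // host-side copy of the result

    if (!addOnGpu(1, 4, &h_c)) {
        fprintf(stderr, "A CUDA call failed\n");
        return 1;
    }

    printf("1 + 4 = %d\n", h_c);
    return 0;
}
```

Assuming the file is saved as gpu_add.cu (a hypothetical name), it can be compiled with nvcc gpu_add.cu -o gpu_add and run on a machine with a CUDA-capable GPU. Changing the two values inside <<<1, 1>>> changes how many threads execute the kernel body; with this kernel every thread would write the same sum to the same location.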
