
A kernel call

Device code written in ANSI C with CUDA extension keywords is called a kernel. The host code launches it through a kernel call, which simply means starting device code from the host. A kernel call typically generates a large number of blocks and threads to exploit data parallelism on the GPU. A kernel looks much like an ordinary C function, except that it is executed by many threads in parallel. The syntax of a kernel call is unusual and is as follows:

kernel<<<number of blocks, number of threads per block, size of shared memory>>>(parameters for kernel)

The call starts with the name of the kernel that we want to launch. Make sure that this kernel is defined with the __global__ keyword. Next comes the <<< >>> kernel launch operator, which holds the configuration parameters for the kernel. In the form used here, it takes up to three comma-separated parameters. The first is the number of blocks you want to execute, and the second is the number of threads in each block, so the total number of threads started by a kernel launch is the product of these two numbers. The third parameter, which specifies the size in bytes of the shared memory dynamically allocated to each block, is optional. In the program for variable addition, the kernel launch syntax is as follows:

gpuAdd<<<1, 1>>>(1, 4, d_c);

Here, gpuAdd is the name of the kernel that we want to launch, and <<<1, 1>>> indicates that we want to start one block with one thread per block, which means that we are starting only one thread. The three arguments in round brackets are the parameters passed to the kernel. We pass two constants, 1 and 4, and a third parameter, d_c, which is a pointer to device memory. It points to the location in device memory where the kernel will store the result of the addition. Keep in mind that pointers passed as kernel parameters should point only to device memory. If a pointer refers to host memory instead, the kernel can crash your program. After kernel execution is completed, the result pointed to by the device pointer can be copied back to host memory for further use. Starting only one thread on the device is not an optimal use of device resources. Suppose you want to start multiple threads in parallel; what modification do you have to make to the syntax of the kernel call? This is addressed in the next section and is termed "configuring kernel parameters".
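To tie the pieces of this section together, here is a minimal sketch of how the variable-addition program might look as a whole. Only the kernel name gpuAdd, the pointer d_c, and the launch line come from the text above; the kernel body, the parameter names d_a and d_b, the host variable h_c, and the helpers check and addOnGpu are assumptions made for illustration. It should be read as one possible arrangement rather than the definitive program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: __global__ marks it as device code that is launched from the host.
// The body is an assumed implementation: it stores d_a + d_b at the device
// address d_c.
__global__ void gpuAdd(int d_a, int d_b, int *d_c) {
    *d_c = d_a + d_b;
}

// Hypothetical helper: stop with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: adds a and b on the GPU and returns the sum.
int addOnGpu(int a, int b) {
    int h_c = 0;          // result in host memory
    int *d_c = nullptr;   // result in device memory

    // The pointer handed to the kernel must refer to device memory.
    check(cudaMalloc((void **)&d_c, sizeof(int)), "cudaMalloc");

    // Kernel call: one block with one thread, so a single thread in total.
    gpuAdd<<<1, 1>>>(a, b, d_c);
    check(cudaGetLastError(), "kernel launch");

    // Copy the result back to the host; this blocking copy also waits
    // for the kernel to finish.
    check(cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost),
          "cudaMemcpy");
    check(cudaFree(d_c), "cudaFree");
    return h_c;
}

int main() {
    printf("1 + 4 = %d\n", addOnGpu(1, 4));
    return 0;
}
```

Assuming the file is saved under a name such as gpu_add.cu, it can be compiled with nvcc and run on a machine with a CUDA-capable GPU. The launch error is checked separately with cudaGetLastError because a kernel call returns no status of its own.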
