A kernel call

Device code that is written using ANSI C keywords along with CUDA extension keywords is called a kernel. The host code launches a kernel through a kernel call; in other words, a kernel call is how host code starts device code. A kernel call typically creates a large number of blocks and threads to exploit data parallelism on the GPU. Kernel code looks much like an ordinary C function; the difference is that it is executed by many threads in parallel. Its launch syntax is unusual and looks like this:
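To make "executed by many threads in parallel" concrete, here is a minimal sketch (not from the original program). The kernel `countMe` and the helper `threadsThatRan` are hypothetical names chosen for illustration. Every launched thread runs the same kernel body once, and each one adds 1 to a counter in device memory, so the host reads back how many threads executed the kernel. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A kernel looks like a C function, but it is marked __global__, returns void,
// and its body is executed once by every thread that the launch creates.
__global__ void countMe(int *counter)
{
    atomicAdd(counter, 1);  // each executing thread adds exactly 1
}

// Hypothetical helper: launch countMe from host code and report how many
// threads ran it.
int threadsThatRan(int blocks, int threadsPerBlock)
{
    int h_count = 0;
    int *d_count = nullptr;
    cudaMalloc(&d_count, sizeof(int));      // counter lives in device memory
    cudaMemset(d_count, 0, sizeof(int));    // start the count at zero

    countMe<<<blocks, threadsPerBlock>>>(d_count);  // the kernel call

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return h_count;
}

int main()
{
    printf("threads that ran the kernel: %d\n", threadsThatRan(2, 3));
    return 0;
}
```

With two blocks of three threads each, the counter should end up at six, one increment per thread.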

kernel<<<number of blocks, number of threads per block, size of shared memory>>>(parameters for kernel)

It starts with the name of the kernel to launch, which must be defined with the __global__ keyword. Next comes the <<< >>> kernel launch operator, which holds the configuration parameters for the kernel, separated by commas. The first parameter is the number of blocks to launch, and the second is the number of threads in each block, so the total number of threads started by a kernel launch is the product of these two numbers. The third parameter is optional and gives the amount of shared memory, in bytes, that is dynamically allocated for each block. (A fourth optional parameter selects the stream the kernel runs on; it is not needed here.) In the program for variable addition, the kernel launch looks like this:
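The optional third parameter is easiest to see in a small example. The following is a minimal sketch, assuming hypothetical names `sumThreadIds` and `blockZeroSum`. The kernel declares an `extern __shared__` array whose size is not fixed at compile time; the launch supplies `threads * sizeof(int)` bytes per block as its third configuration parameter. Each thread stores its own index in the shared array, and thread 0 of each block adds the entries up. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// 'extern __shared__' declares dynamically allocated shared memory.
// Its size comes from the third launch parameter, and each block
// gets its own copy of that many bytes.
__global__ void sumThreadIds(int *out)
{
    extern __shared__ int tile[];

    tile[threadIdx.x] = threadIdx.x;   // each thread fills one slot
    __syncthreads();                   // wait until the whole tile is written

    if (threadIdx.x == 0) {
        int sum = 0;
        for (unsigned int i = 0; i < blockDim.x; ++i)
            sum += tile[i];
        out[blockIdx.x] = sum;         // one result per block
    }
}

// Hypothetical helper: launch 'blocks' blocks of 'threads' threads with
// 'threads * sizeof(int)' bytes of dynamic shared memory per block,
// then return block 0's result.
int blockZeroSum(int blocks, int threads)
{
    int h_out = 0;
    int *d_out = nullptr;
    cudaMalloc(&d_out, blocks * sizeof(int));

    size_t sharedBytes = threads * sizeof(int);
    sumThreadIds<<<blocks, threads, sharedBytes>>>(d_out);

    // Copy back only the first block's sum; the blocking copy waits for the kernel.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    // 2 blocks x 4 threads = 8 threads in total; each block sums 0+1+2+3.
    printf("block 0 sum: %d\n", blockZeroSum(2, 4));
    return 0;
}
```

Omitting the third parameter here would leave `tile` with zero bytes, so the shared-memory size must be passed whenever a kernel relies on `extern __shared__` storage.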

gpuAdd<<<1, 1>>>(1, 4, d_c);

Here, gpuAdd is the name of the kernel to launch, and <<<1,1>>> means we start one block with one thread per block, that is, only a single thread. The three arguments in parentheses are the parameters passed to the kernel: the two constants 1 and 4, and d_c, a pointer to device memory that marks the location where the kernel will store the result of the addition. Keep in mind that pointers passed as parameters to a kernel should point only to device memory; if a kernel dereferences a pointer to host memory, it can crash your program. After the kernel finishes, the result pointed to by the device pointer can be copied back to host memory for further use. Launching only one thread on the device makes poor use of its resources. Suppose you want to start multiple threads in parallel; what change do you have to make to the syntax of the kernel call? This is addressed in the next section, "configuring kernel parameters".
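The steps described above (allocate the result on the device, launch one thread, copy the answer back) can be put together as a complete program. This is a sketch, not the original variable-addition program: the `gpuAdd` signature `(int, int, int *)` is an assumption that matches the launch `gpuAdd<<<1, 1>>>(1, 4, d_c)`, and `addOnGpu` is a hypothetical helper. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed signature: two ints passed by value and a pointer into device
// memory where the single thread writes the sum.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

// Hypothetical helper performing the full host -> device -> host round trip.
int addOnGpu(int a, int b)
{
    int h_c = 0;
    int *d_c = nullptr;

    // The kernel may only write through device pointers, so allocate d_c there.
    cudaMalloc(&d_c, sizeof(int));

    // One block with one thread: a single thread computes a + b.
    gpuAdd<<<1, 1>>>(a, b, d_c);

    // Copy the result back to the host; this blocking copy also waits for the kernel.
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    return h_c;
}

int main()
{
    printf("1 + 4 = %d\n", addOnGpu(1, 4));
    return 0;
}
```

Passing `&h_c` (a host address) to the kernel instead of `d_c` is exactly the mistake the paragraph warns about; the host variable is only ever filled in by `cudaMemcpy`.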
