
CUDA program structure

We saw a very simple Hello, CUDA! program earlier that showcased some important concepts related to CUDA programs. A CUDA program is a combination of functions that are executed either on the host CPU or on the GPU device. Functions that do not exhibit parallelism are executed on the CPU, while functions that exhibit data parallelism are executed on the GPU. The compiler segregates these functions during compilation. As seen in the previous chapter, functions meant for execution on the device are defined using the __global__ keyword and compiled by the NVCC compiler, while normal host code is compiled by the standard C compiler. CUDA code is thus essentially ANSI C code with the addition of some keywords needed for exploiting data parallelism.
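To make the host/device split concrete, here is a minimal sketch along the lines of the Hello, CUDA! program from the previous chapter (the kernel name myKernel is illustrative, not from the original text). NVCC compiles the __global__ function for the GPU, while main is ordinary host C code:

```cuda
#include <stdio.h>

// Device code: the __global__ qualifier tells NVCC to compile
// this function for execution on the GPU.
__global__ void myKernel(void)
{
    // An empty kernel; it runs on the device when launched.
}

// Host code: plain C, compiled by the host C compiler.
int main(void)
{
    // Launch the kernel with one block of one thread.
    myKernel<<<1, 1>>>();

    // Wait for the device to finish before the host continues.
    cudaDeviceSynchronize();

    printf("Hello, CUDA!\n");
    return 0;
}
```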

In this section, a simple two-variable addition program is used to explain important concepts related to CUDA programming: kernel calls, passing parameters to kernel functions from host to device, configuring kernel launch parameters, the CUDA APIs needed to exploit data parallelism, and how memory allocation takes place on the host and the device.
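As a preview of those concepts, a sketch of such a two-variable addition program might look like the following (the kernel name gpuAdd and the operand values are illustrative assumptions, not taken from the text above). It shows a kernel call, by-value parameter passing, the <<<1, 1>>> launch configuration, and device memory management with cudaMalloc, cudaMemcpy, and cudaFree:

```cuda
#include <stdio.h>

// Kernel that adds two integers on the device and stores
// the result in device memory pointed to by d_c.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

int main(void)
{
    int h_c;   // result variable in host memory
    int *d_c;  // pointer to the result in device memory

    // Allocate device memory to hold the result.
    cudaMalloc((void **)&d_c, sizeof(int));

    // Launch the kernel with one block of one thread,
    // passing the two operands by value.
    gpuAdd<<<1, 1>>>(1, 4, d_c);

    // Copy the result from device memory back to host memory.
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("1 + 4 = %d\n", h_c);

    // Release the device memory.
    cudaFree(d_c);
    return 0;
}
```

Each of these steps is examined in detail in the discussion that follows.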
