
CUDA program structure

Earlier, we saw a very simple Hello, CUDA! program that showcased some important concepts related to CUDA programs. A CUDA program is a combination of functions that are executed either on the host or on the GPU device. Functions that do not exhibit parallelism are executed on the CPU, while functions that exhibit data parallelism are executed on the GPU. The compiler segregates these functions during compilation: as seen in the previous chapter, functions meant for execution on the device are defined using the __global__ keyword and compiled by the NVCC compiler, while normal host code is compiled by the host C compiler. CUDA code is essentially ANSI C code with the addition of some keywords needed to exploit data parallelism.
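To make this split concrete, here is a minimal sketch (the kernel name is illustrative, not from the book): the function marked __global__ is device code compiled by NVCC, while main is ordinary host code compiled by the host compiler.

```cuda
#include <stdio.h>

// Device code: marked with __global__, compiled by NVCC for the GPU.
__global__ void myKernel(void)
{
    // Runs on the device; this trivial kernel does nothing.
}

// Host code: plain C, compiled by the host compiler.
int main(void)
{
    myKernel<<<1, 1>>>();    // launch 1 block of 1 thread on the GPU
    cudaDeviceSynchronize(); // wait for the device to finish
    printf("Hello, CUDA!\n");
    return 0;
}
```

The triple-angle-bracket syntax `<<<1, 1>>>` is the kernel launch configuration; it is the only syntactic addition here beyond standard C.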

In this section, a simple two-variable addition program is used to explain important concepts related to CUDA programming: kernel calls, passing parameters to kernel functions from host to device, configuring kernel launch parameters, the CUDA APIs needed to exploit data parallelism, and how memory allocation takes place on the host and the device.
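As a preview of those concepts, a two-variable addition might be sketched as follows (the kernel name gpuAdd and the h_/d_ naming for host and device variables are illustrative assumptions, not necessarily the exact code used later):

```cuda
#include <stdio.h>

// Kernel: adds two integers on the device, storing the result in device memory.
__global__ void gpuAdd(int d_a, int d_b, int *d_c)
{
    *d_c = d_a + d_b;
}

int main(void)
{
    int h_c;  // result variable on the host
    int *d_c; // pointer to the result in device memory

    // Allocate device memory to hold the result.
    cudaMalloc((void **)&d_c, sizeof(int));

    // Launch the kernel with 1 block of 1 thread; the two operands are
    // passed by value, the result pointer refers to device memory.
    gpuAdd<<<1, 1>>>(1, 4, d_c);

    // Copy the result back from device memory to host memory.
    cudaMemcpy(&h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("1 + 4 = %d\n", h_c);

    cudaFree(d_c); // release the device allocation
    return 0;
}
```

Note that the host cannot dereference d_c directly; results held in device memory must be brought back explicitly with cudaMemcpy, which is why the allocation and copy steps appear in every program of this shape.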
