
Summary

To summarize, this chapter introduced programming concepts in CUDA C and showed how parallel computing can be done using CUDA. We saw that CUDA programs can run efficiently and in parallel on any NVIDIA GPU, so CUDA is both efficient and scalable. The CUDA API functions that extend ANSI C for parallel data computation were discussed in detail. Using a simple two-variable addition example, we covered how to call device code from host code via a kernel launch, how to configure kernel launch parameters, and how to pass arguments to a kernel. We also saw that CUDA guarantees neither the order in which blocks or threads run nor which block is assigned to which multiprocessor in hardware. Moreover, we discussed vector operations, which exploit the parallel-processing capabilities of the GPU, and observed that performing them on the GPU can improve throughput drastically compared to the CPU. In the last section, common communication patterns used in parallel programming were discussed in detail. We have not yet covered the memory architecture, how threads communicate with one another in CUDA, or what to do when one thread needs data produced by another. In the next chapter, we will therefore discuss memory architecture and thread synchronization in detail.
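As a quick recap of the launch pattern described above, the following is a minimal sketch of a vector-addition kernel in the style covered in this chapter. The kernel name `vecAdd`, the array size `N`, and the launch configuration are illustrative choices, not fixed by the text; error checking is omitted for brevity.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

// Device code: each thread adds one pair of elements.
__global__ void vecAdd(const int *a, const int *b, int *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c[i] = a[i] + b[i];
}

int main(void) {
    int h_a[N], h_b[N], h_c[N];
    for (int i = 0; i < N; i++) { h_a[i] = i; h_b[i] = 2 * i; }

    // Allocate device memory and copy inputs from host to device.
    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, N * sizeof(int));
    cudaMalloc(&d_b, N * sizeof(int));
    cudaMalloc(&d_c, N * sizeof(int));
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Kernel launch: grid and block sizes are the configuration
    // parameters passed inside the <<< >>> brackets.
    int threadsPerBlock = 256;
    int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c);

    // Copy the result back and clean up.
    cudaMemcpy(h_c, d_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_c[10] = %d\n", h_c[10]);  // element-wise sum: 10 + 20
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Note that, as discussed in the chapter, the order in which these blocks execute and their assignment to multiprocessors are decided by the hardware, not the program.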
