
Threads, Synchronization, and Memory

In the last chapter, we saw how to write CUDA programs that leverage the processing capabilities of a GPU by executing multiple threads and blocks in parallel. In all the programs up to this point, the threads were independent of one another, and there was no communication between them. Most real-life applications, however, need threads to communicate with one another. So, in this chapter, we will look in detail at how communication between different threads can be carried out, and explain the synchronization of multiple threads working on the same data. We will examine the hierarchical memory architecture of CUDA and how the different memories can be used to accelerate CUDA programs. The last part of this chapter demonstrates a very useful application of CUDA: the dot product of vectors and matrix multiplication, using all the concepts we have covered earlier.

The following topics will be covered in this chapter:

  • Thread calls
  • CUDA memory architecture
  • Global, local, and cache memory
  • Shared memory and thread synchronization
  • Atomic operations
  • Constant and texture memory
  • Dot product and a matrix multiplication example
