
  • Deep Learning Essentials
  • Wei Di Anurag Bhardwaj Jianing Wei

Deep learning with GPU

As the name suggests, deep learning involves learning deeper representations of data, which requires large amounts of computational power. Such massive computational power is usually beyond the reach of modern-day CPUs. GPUs, on the other hand, lend themselves very nicely to this task. GPUs were originally designed for rendering graphics in real time. The design of a typical GPU allows for a disproportionately larger number of arithmetic logic units (ALUs), which allows them to crunch through a large number of calculations in real time.

GPUs used for general-purpose computation have a highly data-parallel architecture, which means they can process a large number of data points in parallel, leading to higher computational throughput. Each GPU is composed of thousands of cores. Each of these cores consists of a number of functional units that contain a cache and an ALU, among other modules. Each of these functional units executes exactly the same instruction set, thereby allowing for massive data parallelism in GPUs. In the next section, we compare and contrast the design of a GPU with that of a CPU.
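The same-instruction-on-many-data-points execution model described above can be sketched in plain NumPy. This is a CPU-side illustration of the pattern, not GPU code; the function names and array size are chosen for this example. The vectorized form expresses a single instruction over an entire array, which is exactly the shape of work that maps well onto thousands of simple GPU cores:

```python
import numpy as np

# SAXPY (a*x + y) over a large array: every element undergoes the
# identical operation, so the work splits naturally across parallel lanes.
a = 2.0
x = np.arange(10_000, dtype=np.float32)
y = np.ones_like(x)

def saxpy_loop(a, x, y):
    # Scalar-style loop: one element at a time, the way a single
    # sequential core would process the data.
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def saxpy_vectorized(a, x, y):
    # Data-parallel form: one instruction expressed over all elements
    # at once -- the pattern a GPU executes across its cores.
    return a * x + y

result = saxpy_vectorized(a, x, y)
```

Both functions compute the same result; the difference is that the vectorized form exposes the parallelism to the hardware instead of hiding it inside a sequential loop.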

The following table illustrates the differences between the design of a CPU and that of a GPU. As shown, GPUs are designed to execute a large number of threads that are optimized to run identical control logic. Hence, each GPU core is rather simple in design. CPUs, on the other hand, are designed to operate with fewer cores, but those cores are more general purpose. Their basic core design can handle highly complex control logic, which is usually not possible on GPUs. Hence, CPUs can be thought of as commodity processing units, as opposed to GPUs, which are specialized units:

In terms of relative performance, GPUs have much lower latency than CPUs when performing highly data-parallel operations. This is especially true if the GPU has enough device memory to hold all the data required for the peak-load computation. However, in a head-to-head, core-for-core comparison, CPUs have much lower latency, as each CPU core is far more complex and has more advanced state control logic than a GPU core.
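The latency-versus-throughput trade-off can be made concrete with a back-of-the-envelope calculation. The core counts and clock speeds below are hypothetical round numbers chosen for illustration, not measurements of any particular device:

```python
# Hypothetical specs: a CPU with a few fast, complex cores versus a GPU
# with many slow, simple cores.
cpu_cores, cpu_clock_ghz = 8, 3.5
gpu_cores, gpu_clock_ghz = 2048, 1.4

# Crude throughput proxy: cores * clock (ignores SIMD width, memory
# bandwidth, and instruction-level parallelism, which matter in practice).
cpu_throughput = cpu_cores * cpu_clock_ghz   # 28.0
gpu_throughput = gpu_cores * gpu_clock_ghz   # 2867.2

# Each individual CPU core retires an operation sooner (lower per-op
# latency), but the GPU's aggregate throughput is orders of magnitude
# higher when the workload is data parallel.
```

Under these assumed numbers, the GPU's aggregate throughput exceeds the CPU's by roughly 100x, even though any single GPU core is slower than any single CPU core.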

As such, the design of the algorithm has a great bearing on the potential benefit of using a GPU over a CPU. The following table outlines which algorithms are good candidates for a GPU implementation. Erik Smistad and co-authors outline five factors that determine an algorithm's suitability for a GPU: data parallelism, thread count, branch divergence, memory usage, and synchronization.
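Of these five factors, branch divergence is perhaps the least intuitive: when threads executing in lockstep take different branches, a GPU must run both paths serially with some lanes masked off. The sketch below illustrates, in NumPy on the CPU, the difference between a branchy per-element formulation and the branch-free (predicated) formulation that GPUs favor; the values and scaling constants are illustrative:

```python
import numpy as np

vals = np.linspace(-1.0, 1.0, 8, dtype=np.float32)

# Divergent formulation: a per-element if/else. On a GPU, threads in the
# same group taking different branches would serialize both paths.
divergent = np.array(
    [v * 2.0 if v > 0 else v * 0.5 for v in vals],
    dtype=np.float32,
)

# Predicated formulation: both results are computed for every element,
# then selected by a mask -- a single uniform instruction stream, which
# is the GPU-friendly pattern.
predicated = np.where(vals > 0, vals * 2.0, vals * 0.5).astype(np.float32)
```

Both formulations produce identical results; the predicated version simply trades a little redundant arithmetic for a uniform control flow, which is usually a win on data-parallel hardware.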

The table Factors affecting GPU Computing by Dutta-Roy illustrates the impact of each of these factors on the suitability of using a GPU. As shown, any algorithm that fares under the High column is better suited to a GPU than others:

Factors affecting GPU Computing (Source: Dutta Roy et al. https://medium.com/@taposhdr/gpu-s-have-become-the-new-core-for-image-analytics-b8ba8bd8d8f3)