- Hands-On Natural Language Processing with PyTorch 1.x
- Thomas Dop
Enabling PyTorch acceleration using CUDA
One of the main benefits of PyTorch is its ability to enable acceleration through the use of a graphics processing unit (GPU). Deep learning is a computational task that is easily parallelizable, meaning that the calculations can be broken down into smaller tasks and calculated across many smaller processors. This means that instead of needing to execute the task on a single CPU, it is more efficient to perform the calculation on a GPU.
GPUs were originally created to efficiently render graphics, but since deep learning has grown in popularity, GPUs have been frequently used for their ability to perform many calculations simultaneously. While a traditional CPU may consist of four to eight cores, a GPU consists of hundreds of smaller cores. Because calculations can be executed across all of these cores simultaneously, GPUs can dramatically reduce the time taken to perform deep learning tasks.
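A quick, illustrative way to see this difference is to time a large matrix multiplication on each device. The following is a minimal sketch (not from the book; the matrix size is arbitrary, and the explicit synchronize calls are needed because CUDA operations run asynchronously):

import time
import torch

# A large matrix multiplication is easily parallelizable,
# so it benefits greatly from the GPU's many cores
a = torch.randn(4000, 4000)
b = torch.randn(4000, 4000)

start = time.time()
_ = a @ b  # runs on the CPU
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()  # copy both matrices to the GPU
    torch.cuda.synchronize()  # wait for the copies to finish
    start = time.time()
    _ = a_gpu @ b_gpu  # runs across the GPU's many cores
    torch.cuda.synchronize()  # CUDA ops are asynchronous; wait for completion
    print(f"GPU: {time.time() - start:.3f}s")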
Consider a single pass within a neural network. We may take a small batch of data, pass it through our network to obtain our loss, and then backpropagate, adjusting our parameters according to the gradients. If we have many batches of data to process in this way, on a traditional CPU we must wait until batch 1 has completed before we can compute this for batch 2:

Figure 2.7 – One pass in a neural network
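In code, a single such pass might look like the following minimal sketch (the two-layer network, loss function, and optimizer here are illustrative placeholders rather than the book's model):

import torch
import torch.nn as nn

# Illustrative two-layer network, loss function, and optimizer
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

batch_x, batch_y = torch.randn(64, 10), torch.randn(64, 1)

optimizer.zero_grad()  # clear gradients from the previous batch
loss = loss_fn(model(batch_x), batch_y)  # forward pass to obtain the loss
loss.backward()  # backpropagate to compute the gradients
optimizer.step()  # adjust the parameters according to the gradients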
However, on a GPU, we can perform all of these steps simultaneously, meaning there is no requirement for batch 1 to finish before batch 2 can be started. We can calculate the parameter updates for all batches simultaneously and then apply all of the updates in one go (as the results are independent of one another). This parallel approach can vastly speed up the machine learning process:

Figure 2.8 – Parallel approach to perform passes
Compute Unified Device Architecture (CUDA) is the technology, specific to Nvidia GPUs, that enables hardware acceleration in PyTorch. In order to enable CUDA, we must first make sure that the graphics card on our system is CUDA-compatible. A list of CUDA-compatible GPUs can be found here: https://developer.nvidia.com/cuda-gpus. If you have a CUDA-compatible GPU, then CUDA can be installed from this link: https://developer.nvidia.com/cuda-downloads. We will activate it using the following steps; a consolidated sketch of these steps follows the list:
- Firstly, in order to enable CUDA support in PyTorch, you will need a build of PyTorch that was compiled with CUDA support. The pre-built binaries available from https://pytorch.org include CUDA support; alternatively, PyTorch can be built from source, with details available here: https://github.com/pytorch/pytorch#from-source.
- Then, to actually use CUDA within our PyTorch code, we must type the following into our Python code:
cuda = torch.device('cuda')
This creates a device object that refers to our default CUDA device.
- We can then execute operations on this device by explicitly specifying the device argument when creating a tensor:
x = torch.tensor([5., 3.], device=cuda)
Alternatively, we can move a tensor to the GPU by calling its cuda() method:
y = torch.tensor([4., 2.]).cuda()
- We can then run a simple operation to ensure this is working correctly:
x*y
This results in the following output:

Figure 2.9 – Tensor multiplication output using CUDA
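Putting the preceding steps together, a minimal consolidated sketch might look as follows (the fallback to the CPU when no CUDA device is present is an added safeguard, not part of the original steps, and .to(cuda) is used in place of .cuda() so the code still runs on CPU-only machines):

import torch

# Fall back to the CPU when no CUDA-compatible GPU is found
cuda = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([5., 3.], device=cuda)  # created directly on the device
y = torch.tensor([4., 2.]).to(cuda)  # created on the CPU, then moved

print(x * y)  # both operands live on the same device
# tensor([20., 6.], device='cuda:0') when a GPU is available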
The changes in speed will not be noticeable at this stage, as we are only creating tensors, but when we begin training models at scale later in the book, we will see the speed benefits of parallelizing our computations using CUDA: by training our models in parallel, we will be able to reduce the training time considerably.
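As a preview of that pattern, the following sketch moves both a model and its data batches onto the device before each pass (the model and the synthetic batches here are illustrative placeholders):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Illustrative model and synthetic data; real models come later in the book
model = nn.Linear(10, 1).to(device)  # move the model's parameters to the GPU
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(5):
    batch_x = torch.randn(64, 10).to(device)  # move each batch to the GPU
    batch_y = torch.randn(64, 1).to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()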