- Hands-On GPU Programming with Python and CUDA
- Dr. Brian Tuomanen
- 2021-06-10 19:25:40
Querying your GPU with PyCUDA
We begin our foray into the world of GPU programming by writing our own version of deviceQuery in Python. We will concern ourselves primarily with the amount of available memory on the device, the compute capability, the number of multiprocessors, and the total number of CUDA cores.
We will begin by initializing CUDA as follows:
import pycuda.driver as drv
drv.init()
We can now immediately check how many GPU devices we have on our host computer with this line:
print('Detected {} CUDA Capable device(s)'.format(drv.Device.count()))
Let's type this into IPython and see what happens:

Great! So far, I have verified that my laptop does indeed have one GPU in it. Now, let's extract some more interesting information about this GPU (and any other GPU on the system) by adding a few more lines of code that iterate over each device; each one can be accessed individually, by index, with pycuda.driver.Device. The name of the device (for example, GeForce GTX 1050) is given by the name method. We then get the compute capability of the device as a (major, minor) tuple with the compute_capability method, and the total amount of device memory in bytes with the total_memory method.
Here's how we will write it:
for i in range(drv.Device.count()):
    gpu_device = drv.Device(i)
    print('Device {}: {}'.format(i, gpu_device.name()))
    # compute_capability() returns a (major, minor) tuple, for example (6, 1)
    compute_capability = float('%d.%d' % gpu_device.compute_capability())
    print('\t Compute Capability: {}'.format(compute_capability))
    # total_memory() reports bytes; integer-divide to get megabytes
    print('\t Total Memory: {} megabytes'.format(gpu_device.total_memory()//(1024**2)))
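The formatting done in this loop can be tried out without a GPU. The following is a minimal sketch, not part of the original program: describe_device is a hypothetical helper (it does not exist in PyCUDA) that takes plain Python values of the kind returned by name(), compute_capability(), and total_memory(), and the sample values are made up.

```python
def describe_device(index, name, compute_capability, total_memory_bytes):
    """Build the report lines for one device from plain Python values."""
    # The compute capability arrives as a (major, minor) tuple
    major, minor = compute_capability
    return [
        'Device {}: {}'.format(index, name),
        '\t Compute Capability: {}.{}'.format(major, minor),
        # Integer division converts bytes to megabytes (1024**2 bytes each)
        '\t Total Memory: {} megabytes'.format(total_memory_bytes // (1024**2)),
    ]

# Made-up sample values, standing in for a real device query
for line in describe_device(0, 'Example GPU', (6, 1), 2 * 1024**3):
    print(line)
```

Formatting the tuple directly avoids the round trip through float, which matters only if the value is later compared or used as a key.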
Now, we are ready to look at the remaining attributes of our GPU, which PyCUDA gives us as a Python dictionary keyed by pycuda.driver.device_attribute values. Still inside the loop over devices, we use the following lines to convert this into a dictionary indexed by strings that name the attributes:
    device_attributes_tuples = gpu_device.get_attributes().items()
    device_attributes = {}
    for k, v in device_attributes_tuples:
        device_attributes[str(k)] = v
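The same conversion can be written as a dictionary comprehension. The sketch below runs without a GPU under stated assumptions: FakeAttribute is a hypothetical stand-in for PyCUDA's attribute keys (its str() returns the attribute name, as the string-indexed dictionary above relies on), attributes_by_name is an illustrative helper, and the attribute values are made up.

```python
class FakeAttribute(object):
    """Hypothetical stand-in for a PyCUDA attribute key; str() gives its name."""
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


def attributes_by_name(raw_attributes):
    """Return a new dict whose keys are the string form of the original keys."""
    return {str(key): value for key, value in raw_attributes.items()}


# Made-up values standing in for gpu_device.get_attributes()
raw = {FakeAttribute('MULTIPROCESSOR_COUNT'): 5, FakeAttribute('WARP_SIZE'): 32}
device_attributes = attributes_by_name(raw)
print(device_attributes['MULTIPROCESSOR_COUNT'])
```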
We can now determine the number of multiprocessors on our device with the following:
    num_mp = device_attributes['MULTIPROCESSOR_COUNT']
A GPU groups its individual cores into larger units known as Streaming Multiprocessors (SMs); a GPU device has several SMs, each with a particular number of CUDA cores that depends on the compute capability of the device. To be clear: the GPU does not report the number of cores per multiprocessor directly; it is implied by the compute capability. We have to look it up in NVIDIA's technical documentation (see http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities) and then create a lookup table that gives us the number of cores per multiprocessor. We do so as follows, using the compute_capability variable as the key; note that this table covers only compute capabilities 5.0 through 6.2, so a device with any other compute capability raises a KeyError:
    cuda_cores_per_mp = { 5.0 : 128, 5.1 : 128, 5.2 : 128, 6.0 : 64, 6.1 : 128, 6.2 : 128}[compute_capability]
We can now finally determine the total number of cores on our device by multiplying these two numbers:
    print('\t ({}) Multiprocessors, ({}) CUDA Cores / Multiprocessor: {} CUDA Cores'.format(num_mp, cuda_cores_per_mp, num_mp*cuda_cores_per_mp))
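As a variation on the lookup and multiplication above, the table can be keyed by the (major, minor) tuple instead of a float, and can report an unknown compute capability instead of raising KeyError. This is a minimal sketch, not the book's code: CUDA_CORES_PER_MP and total_cuda_cores are illustrative names, the table holds only the values listed in the text, and the multiprocessor count is made up.

```python
# Cores per multiprocessor keyed by (major, minor); values copied from the text
CUDA_CORES_PER_MP = {
    (5, 0): 128, (5, 1): 128, (5, 2): 128,
    (6, 0): 64, (6, 1): 128, (6, 2): 128,
}


def total_cuda_cores(compute_capability, num_mp):
    """Return the total core count, or None if the capability is not in the table."""
    cores_per_mp = CUDA_CORES_PER_MP.get(tuple(compute_capability))
    if cores_per_mp is None:
        return None
    return num_mp * cores_per_mp


# 5 multiprocessors is a made-up count for illustration
print(total_cuda_cores((6, 1), 5))
# A capability that is missing from the table yields None rather than an exception
print(total_cuda_cores((9, 9), 5))
```

Tuple keys avoid relying on exact float equality, and returning None lets the caller decide how to report a device that the table does not cover.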
We can now finish up our program by removing the multiprocessor count, which we have already printed, and then iterating over the remaining keys in our dictionary and printing the corresponding values:
    device_attributes.pop('MULTIPROCESSOR_COUNT')
    for k in device_attributes.keys():
        print('\t {}: {}'.format(k, device_attributes[k]))
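Dictionary iteration order is not guaranteed in older Python versions, so the attributes may print in a different order from run to run. One possible alternative, sketched here with a hypothetical helper named remaining_attribute_lines and made-up attribute values, sorts the keys and skips the already-printed entry without modifying the dictionary.

```python
def remaining_attribute_lines(device_attributes, skip=('MULTIPROCESSOR_COUNT',)):
    """Format every attribute except the skipped ones, sorted by name."""
    return ['\t {}: {}'.format(name, device_attributes[name])
            for name in sorted(device_attributes)
            if name not in skip]


# Made-up attribute values for illustration
sample = {'WARP_SIZE': 32, 'MULTIPROCESSOR_COUNT': 5, 'MAX_THREADS_PER_BLOCK': 1024}
for line in remaining_attribute_lines(sample):
    print(line)
```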
So, we have now completed our first true GPU program of the text! (It is also available at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA/blob/master/3/deviceQuery.py.) Now, we can run it as follows:

We can now take a little pride in having written a program that queries our GPU! Now, let's actually begin to learn to use our GPU, rather than just observe it.