
Querying your GPU with PyCUDA

Now, finally, we will begin our foray into the world of GPU programming by writing our own version of deviceQuery in Python. Here, we will concern ourselves only with the amount of available memory on the device, the compute capability, the number of multiprocessors, and the total number of CUDA cores.

We will begin by initializing CUDA as follows:

import pycuda.driver as drv
drv.init()
Note that we will always have to initialize PyCUDA, either by calling pycuda.driver.init() as above or by importing the PyCUDA autoinit submodule with import pycuda.autoinit!
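As an aside, a minimal sketch of the autoinit route (assuming a working CUDA driver and at least one CUDA-capable GPU on the system) is simply the following; the import initializes the driver and also creates a context on the default device:

import pycuda.autoinit
import pycuda.driver as drv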

We can now immediately check how many GPU devices we have on our host computer with this line:

print 'Detected {} CUDA Capable device(s)'.format(drv.Device.count())

Let's type this into IPython and see what happens:
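Given the format string above, and since this particular laptop has a single GPU, the output will be along the lines of the following:

Detected 1 CUDA Capable device(s)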

Great! So far, I have verified that my laptop does indeed have one GPU in it. Now, let's extract some more interesting information about this GPU (and any other GPU on the system) by adding a few more lines of code that iterate over each device; an individual device can be accessed with pycuda.driver.Device (indexed by number). The name of the device (for example, GeForce GTX 1050) is given by the name function. We then get the compute capability of the device with the compute_capability function and the total amount of device memory with the total_memory function.

Compute capability can be thought of as a version number for each NVIDIA GPU architecture; this will give us some important information about the device that we can't otherwise query, as we will see in a minute.

Here's how we will write it:

for i in range(drv.Device.count()):
    gpu_device = drv.Device(i)
    print 'Device {}: {}'.format( i, gpu_device.name() )
    compute_capability = float( '%d.%d' % gpu_device.compute_capability() )
    print '\t Compute Capability: {}'.format(compute_capability)
    print '\t Total Memory: {} megabytes'.format(gpu_device.total_memory()//(1024**2))
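For instance, on a laptop with a single GeForce GTX 1050, the output of these lines might look something like the following (the exact name, compute capability, and memory figure will, of course, depend on your particular GPU):

Device 0: GeForce GTX 1050
	 Compute Capability: 6.1
	 Total Memory: 2048 megabytes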

Now, we are ready to look at some of the remaining attributes of our GPU, which PyCUDA yields to us in the form of a Python dictionary type. We will use the following lines to convert this into a dictionary that is indexed by strings indicating attributes:

    device_attributes_tuples = gpu_device.get_attributes().iteritems()
    device_attributes = {}

    for k, v in device_attributes_tuples:
        device_attributes[str(k)] = v
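At this point, we can look up any single attribute by its string name. For example, the following lines (purely as an illustration; MAX_THREADS_PER_BLOCK and WARP_SIZE are standard CUDA device attributes) would print the maximum number of threads allowed in a single block and the warp size of the device:

    print '\t Max Threads Per Block: {}'.format(device_attributes['MAX_THREADS_PER_BLOCK'])
    print '\t Warp Size: {}'.format(device_attributes['WARP_SIZE'])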

We can now determine the number of multiprocessors on our device with the following:

    num_mp = device_attributes['MULTIPROCESSOR_COUNT']

A GPU divides its individual cores up into larger units known as Streaming Multiprocessors (SMs); a GPU device will have several SMs, each of which individually has a particular number of CUDA cores, depending on the compute capability of the device. To be clear: the number of cores per multiprocessor is not indicated directly by the GPU; rather, it is given to us implicitly by the compute capability. We will have to look up the number of cores per multiprocessor in NVIDIA's technical documentation (see http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities), and then create a lookup table keyed by compute capability. We do this as follows, using the compute_capability variable to look up the number of cores:

    cuda_cores_per_mp = { 5.0 : 128, 5.1 : 128, 5.2 : 128, 6.0 : 64, 6.1 : 128, 6.2 : 128}[compute_capability]
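Note that this plain dictionary lookup will raise a KeyError if your GPU's compute capability is not one of the listed keys (on a newer architecture, for instance). A more defensive variant, sketched here purely as a suggestion, falls back to a guess of 128 cores per multiprocessor when the compute capability is unknown:

    cuda_cores_per_mp = { 5.0 : 128, 5.1 : 128, 5.2 : 128, 6.0 : 64, 6.1 : 128, 6.2 : 128}.get(compute_capability, 128)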

We can now finally determine the total number of cores on our device by multiplying these two numbers:

    print '\t ({}) Multiprocessors, ({}) CUDA Cores / Multiprocessor: {} CUDA Cores'.format(num_mp, cuda_cores_per_mp, num_mp*cuda_cores_per_mp)
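Continuing with the earlier GTX 1050 example (assuming five multiprocessors with 128 cores each), this line would print something like the following, again only as an illustration:

	 (5) Multiprocessors, (128) CUDA Cores / Multiprocessor: 640 CUDA Cores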

We can now finish up our program by iterating over the remaining keys in our dictionary and printing the corresponding values:

    device_attributes.pop('MULTIPROCESSOR_COUNT')

    for k in device_attributes.keys():
        print '\t {}: {}'.format(k, device_attributes[k])

So, we have now finally completed our first true GPU program of the text! (It is also available at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA/blob/master/3/deviceQuery.py.) We can now run it and see the full set of attributes printed for our own GPU.

We can take a little pride in the fact that we can indeed write a program to query our GPU! Now, let's actually begin to learn to use our GPU, rather than just observe it.
