Hands-On GPU Programming with Python and CUDA
Hands-On GPU Programming with Python and CUDA hits the ground running: you'll start by learning how to apply Amdahl's Law, use a code profiler to identify bottlenecks in your Python code, and set up an appropriate GPU programming environment. You'll then see how to "query" the GPU's features and copy arrays of data to and from the GPU's own memory. As you make your way through the book, you'll launch code directly onto the GPU and write full-blown GPU kernels and device functions in CUDA C. You'll get to grips with profiling GPU code effectively and fully test and debug your code using the Nsight IDE. Next, you'll explore some of the more well-known NVIDIA libraries, such as cuFFT and cuBLAS. With a solid background in place, you will then apply your new-found knowledge to develop your very own GPU-based deep neural network from scratch. You'll then explore advanced topics, such as warp shuffling, dynamic parallelism, and PTX assembly. In the final chapter, you'll see some topics and applications related to GPU programming that you may wish to pursue, including AI, graphics, and blockchain. By the end of this book, you will be able to apply GPU programming to problems related to data science and high-performance computing.
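
To give a flavor of the opening chapters, here is a minimal sketch, assuming a working NVIDIA GPU with the CUDA Toolkit, PyCUDA, and NumPy installed, of two ideas the description mentions: estimating speedup with Amdahl's Law and copying an array to and from GPU memory with PyCUDA's gpuarray class. The parallel fraction and array size are illustrative values, not taken from the book.

```python
# Minimal sketch: Amdahl's Law plus a host<->GPU array round trip with PyCUDA.
# Assumes an NVIDIA GPU, the CUDA Toolkit, PyCUDA, and NumPy are installed.
import numpy as np
import pycuda.autoinit              # initializes a CUDA context on the default device
import pycuda.gpuarray as gpuarray

def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's Law: overall speedup when only a fraction of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_processors)

# Example: a program that is 75% parallelizable tops out near 4x speedup,
# no matter how many cores the GPU offers.
print(amdahl_speedup(0.75, 1024))

# Copy a NumPy array to GPU memory, do a pointwise operation there, copy it back.
host_data = np.float32(np.random.rand(1024))
device_data = gpuarray.to_gpu(host_data)    # host -> device transfer
result = (2.0 * device_data).get()          # pointwise multiply on the GPU, then device -> host
assert np.allclose(result, 2.0 * host_data)
```

This round trip mirrors the "Transferring data to and from the GPU with gpuarray" section listed in the contents below.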
Table of Contents (201 sections)
- Cover Page
- Title Page
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Why GPU Programming?
- Technical requirements
- Parallelization and Amdahl's Law
- Using Amdahl's Law
- The Mandelbrot set
- Profiling your code
- Using the cProfile module
- Summary
- Questions
- Setting Up Your GPU Programming Environment
- Technical requirements
- Ensuring that we have the right hardware
- Checking your hardware (Linux)
- Checking your hardware (Windows)
- Installing the GPU drivers
- Installing the GPU drivers (Linux)
- Installing the GPU drivers (Windows)
- Setting up a C++ programming environment
- Setting up GCC, Eclipse IDE, and graphical dependencies (Linux)
- Setting up Visual Studio (Windows)
- Installing the CUDA Toolkit
- Installing the CUDA Toolkit (Linux)
- Installing the CUDA Toolkit (Windows)
- Setting up our Python environment for GPU programming
- Installing PyCUDA (Linux)
- Creating an environment launch script (Windows)
- Installing PyCUDA (Windows)
- Testing PyCUDA
- Summary
- Questions
- Getting Started with PyCUDA
- Technical requirements
- Querying your GPU
- Querying your GPU with PyCUDA
- Using PyCUDA's gpuarray class
- Transferring data to and from the GPU with gpuarray
- Basic pointwise arithmetic operations with gpuarray
- A speed test
- Using PyCUDA's ElementWiseKernel for performing pointwise computations
- Mandelbrot revisited
- A brief foray into functional programming
- Parallel scan and reduction kernel basics
- Summary
- Questions
- Kernels, Threads, Blocks, and Grids
- Technical requirements
- Kernels
- The PyCUDA SourceModule function
- Threads, blocks, and grids
- Conway's game of life
- Thread synchronization and intercommunication
- Using the __syncthreads() device function
- Using shared memory
- The parallel prefix algorithm
- The naive parallel prefix algorithm
- Inclusive versus exclusive prefix
- A work-efficient parallel prefix algorithm
- Work-efficient parallel prefix (up-sweep phase)
- Work-efficient parallel prefix (down-sweep phase)
- Work-efficient parallel prefix — implementation
- Summary
- Questions
- Streams, Events, Contexts, and Concurrency
- Technical requirements
- CUDA device synchronization
- Using the PyCUDA stream class
- Concurrent Conway's game of life using CUDA streams
- Events
- Events and streams
- Contexts
- Synchronizing the current context
- Manual context creation
- Host-side multiprocessing and multithreading
- Multiple contexts for host-side concurrency
- Summary
- Questions
- Debugging and Profiling Your CUDA Code
- Technical requirements
- Using printf from within CUDA kernels
- Using printf for debugging
- Filling in the gaps with CUDA-C
- Using the Nsight IDE for CUDA-C development and debugging
- Using Nsight with Visual Studio in Windows
- Using Nsight with Eclipse in Linux
- Using Nsight to understand the warp lockstep property in CUDA
- Using the NVIDIA nvprof profiler and Visual Profiler
- Summary
- Questions
- Using the CUDA Libraries with Scikit-CUDA
- Technical requirements
- Installing Scikit-CUDA
- Basic linear algebra with cuBLAS
- Level-1 AXPY with cuBLAS
- Other level-1 cuBLAS functions
- Level-2 GEMV in cuBLAS
- Level-3 GEMM in cuBLAS for measuring GPU performance
- Fast Fourier transforms with cuFFT
- A simple 1D FFT
- Using an FFT for convolution
- Using cuFFT for 2D convolution
- Using cuSolver from Scikit-CUDA
- Singular value decomposition (SVD)
- Using SVD for Principal Component Analysis (PCA)
- Summary
- Questions
- The CUDA Device Function Libraries and Thrust
- Technical requirements
- The cuRAND device function library
- Estimating π with Monte Carlo
- The CUDA Math API
- A brief review of definite integration
- Computing definite integrals with the Monte Carlo method
- Writing some test cases
- The CUDA Thrust library
- Using functors in Thrust
- Summary
- Questions
- Implementation of a Deep Neural Network
- Technical requirements
- Artificial neurons and neural networks
- Implementing a dense layer of artificial neurons
- Implementation of the softmax layer
- Implementation of the cross-entropy loss
- Implementation of a sequential network
- Implementation of inference methods
- Gradient descent
- Conditioning and normalizing data
- The Iris dataset
- Summary
- Questions
- Working with Compiled GPU Code
- Launching compiled code with Ctypes
- The Mandelbrot set revisited (again)
- Compiling the code and interfacing with Ctypes
- Compiling and launching pure PTX code
- Writing wrappers for the CUDA Driver API
- Using the CUDA Driver API
- Summary
- Questions
- Performance Optimization in CUDA
- Dynamic parallelism
- Quicksort with dynamic parallelism
- Vectorized data types and memory access
- Thread-safe atomic operations
- Warp shuffling
- Inline PTX assembly
- Performance-optimized array sum
- Summary
- Questions
- Where to Go from Here
- Furthering your knowledge of CUDA and GPGPU programming
- Multi-GPU systems
- Cluster computing and MPI
- OpenCL and PyOpenCL
- Graphics
- OpenGL
- DirectX 12
- Vulkan
- Machine learning and computer vision
- The basics
- cuDNN
- TensorFlow and Keras
- Chainer
- OpenCV
- Blockchain technology
- Summary
- Questions
- Assessment
- Chapter 1 Why GPU Programming?
- Chapter 2 Setting Up Your GPU Programming Environment
- Chapter 3 Getting Started with PyCUDA
- Chapter 4 Kernels, Threads, Blocks, and Grids
- Chapter 5 Streams, Events, Contexts, and Concurrency
- Chapter 6 Debugging and Profiling Your CUDA Code
- Chapter 7 Using the CUDA Libraries with Scikit-CUDA
- Chapter 8 The CUDA Device Function Libraries and Thrust
- Chapter 9 Implementation of a Deep Neural Network
- Chapter 10 Working with Compiled GPU Code
- Chapter 11 Performance Optimization in CUDA
- Chapter 12 Where to Go from Here
- Other Books You May Enjoy
- Leave a review - let other readers know what you think