Part I The OpenCL 1.1 Language and API
What Is OpenCL, or . . . Why You Need This Book
Our Many-Core Future: Heterogeneous Platforms
Conceptual Foundations of OpenCL
2. HelloWorld: An OpenCL Example
Microsoft Windows and Visual Studio
Choosing an OpenCL Platform and Creating a Context
Choosing a Device and Creating a Command-Queue
Creating and Building a Program Object
Creating Kernel and Memory Objects
3. Platforms, Contexts, and Devices
Writing a Data-Parallel Kernel Using OpenCL C
Reinterpreting Data as Another Type
Relational and Equality Operators
Preprocessor Directives and Macros
5. OpenCL C Built-In Functions
Vector Data Load and Store Functions
Async Copy and Prefetch Functions
Miscellaneous Vector Functions
Image Read and Write Functions
Program and Kernel Object Overview
Creating and Building Programs
Creating Programs from Binaries
Managing and Querying Programs
Creating Kernel Objects and Setting Kernel Arguments
Memory Objects, Buffers, and Sub-Buffers Overview
Creating Buffers and Sub-Buffers
Querying Buffers and Sub-Buffers
Reading, Writing, and Copying Buffers and Sub-Buffers
Mapping Buffers and Sub-Buffers
Image and Sampler Object Overview
OpenCL C Functions for Working with Images
Commands, Queues, and Events Overview
Events Impacting Execution on the Host
10. Interoperability with OpenGL
OpenCL/OpenGL Sharing Overview
Querying for the OpenGL Sharing Extension
Initializing an OpenCL Context for OpenGL Interoperability
Creating OpenCL Buffers from OpenGL Buffers
Creating OpenCL Image Objects from OpenGL Textures
Querying Information about OpenGL Objects
Synchronization between OpenGL and OpenCL
11. Interoperability with Direct3D
Direct3D/OpenCL Sharing Overview
Initializing an OpenCL Context for Direct3D Interoperability
Creating OpenCL Memory Objects from Direct3D Buffers and Textures
Acquiring and Releasing Direct3D Objects in OpenCL
Processing a Direct3D Texture in OpenCL
Processing D3D Vertex Data in OpenCL
Vector Add Example Using the C++ Wrapper API
Choosing an OpenCL Platform and Creating a Context
Choosing a Device and Creating a Command-Queue
Creating and Building a Program Object
Creating Kernel and Memory Objects
Executing the Vector Add Kernel
Mandated Minimum Single-Precision Floating-Point Capabilities
Determining the Profile Supported by a Device in an OpenCL C Program
Part II OpenCL 1.1 Case Studies
Parallelizing the Image Histogram
Additional Optimizations to the Parallel Image Histogram
Computing Histograms with Half-Float or Float Values for Each Channel
15. Sobel Edge Detection Filter
What Is a Sobel Edge Detection Filter?
Implementing the Sobel Filter as an OpenCL Kernel
16. Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm
Leveraging Multiple Compute Devices
17. Cloth Simulation in the Bullet Physics SDK
An Introduction to Cloth Simulation
Executing the Simulation on the CPU
Changes Necessary for Basic GPU Execution
Optimizing for SIMD Computation and Local Memory
18. Simulating the Ocean with Fast Fourier Transform
An Overview of the Ocean Application
An OpenCL Discrete Fourier Transform
Determining the Sub-Transform Size
Determining the Work-Group Size
Determining How Much Local Memory Is Needed
Avoiding Local Memory Bank Conflicts
A Closer Look at the FFT Kernel
A Closer Look at the Transpose Kernel
Sub-Pixel Accuracy with Hardware Linear Interpolation
Application of the Texture Cache
Early Exit and Hardware Scheduling
Efficient Visualization with OpenGL Interop
20. Using OpenCL with PyOpenCL
Running the PyImageFilter2D Example
Context and Command-Queue Creation
Creating and Building a Program
Setting Kernel Arguments and Executing a Kernel
21. Matrix Multiplication with OpenCL
The Basic Matrix Multiplication Algorithm
A Direct Translation into OpenCL
Increasing the Amount of Work per Kernel
Optimizing Memory Movement: Local Memory
Performance Results and Optimizing the Original CPU Code
22. Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector Multiplication (SpMV) Algorithm
Description of This Implementation
Tiled and Packetized Sparse Matrix Representation
Tiled and Packetized Sparse Matrix Design Considerations
Tested Hardware Devices and Results
Additional Areas of Optimization
Querying Platform Information and Devices
Read, Write, and Copy Buffer Objects
Kernel Arguments and Object Queries
Out-of-Order Execution of Kernels and Memory Object Commands
Vector Addressing Equivalencies
Conversions and Type Casting Examples
Preprocessor Directives and Macros
Vector Data Load/Store Functions
Async Copies and Prefetch Functions
Synchronization, Explicit Memory Fence
Miscellaneous Vector Built-In Functions
Image Read and Write Built-In Functions
Query List of Supported Image Formats
Copy between Image, Buffer Objects
Read, Write, Copy Image Objects
OpenCL Device Architecture Diagram
CL Buffer Objects > GL Buffer Objects
CL Image Objects > GL Textures
CL Image Objects > GL Renderbuffers
CL Event Objects > GL Sync Objects
CL Context > GL Context, Sharegroup