Contents

Figures

Tables

Listings

Foreword

Preface

Acknowledgments

About the Authors

Part I The OpenCL 1.1 Language and API

1. An Introduction to OpenCL

What Is OpenCL, or . . . Why You Need This Book

Our Many-Core Future: Heterogeneous Platforms

Software in a Many-Core World

Conceptual Foundations of OpenCL

Platform Model

Execution Model

Memory Model

Programming Models

OpenCL and Graphics

The Contents of OpenCL

Platform API

Runtime API

Kernel Programming Language

OpenCL Summary

The Embedded Profile

Learning OpenCL

2. HelloWorld: An OpenCL Example

Building the Examples

Prerequisites

Mac OS X and Code::Blocks

Microsoft Windows and Visual Studio

Linux and Eclipse

HelloWorld Example

Choosing an OpenCL Platform and Creating a Context

Choosing a Device and Creating a Command-Queue

Creating and Building a Program Object

Creating Kernel and Memory Objects

Executing a Kernel

Checking for Errors in OpenCL

3. Platforms, Contexts, and Devices

OpenCL Platforms

OpenCL Devices

OpenCL Contexts

4. Programming with OpenCL C

Writing a Data-Parallel Kernel Using OpenCL C

Scalar Data Types

The half Data Type

Vector Data Types

Vector Literals

Vector Components

Other Data Types

Derived Types

Implicit Type Conversions

Usual Arithmetic Conversions

Explicit Casts

Explicit Conversions

Reinterpreting Data as Another Type

Vector Operators

Arithmetic Operators

Relational and Equality Operators

Bitwise Operators

Logical Operators

Conditional Operator

Shift Operators

Unary Operators

Assignment Operator

Qualifiers

Function Qualifiers

Kernel Attribute Qualifiers

Address Space Qualifiers

Access Qualifiers

Type Qualifiers

Keywords

Preprocessor Directives and Macros

Pragma Directives

Macros

Restrictions

5. OpenCL C Built-In Functions

Work-Item Functions

Math Functions

Floating-Point Pragmas

Floating-Point Constants

Relative Error as ulps

Integer Functions

Common Functions

Geometric Functions

Relational Functions

Vector Data Load and Store Functions

Synchronization Functions

Async Copy and Prefetch Functions

Atomic Functions

Miscellaneous Vector Functions

Image Read and Write Functions

Reading from an Image

Samplers

Determining the Border Color

Writing to an Image

Querying Image Information

6. Programs and Kernels

Program and Kernel Object Overview

Program Objects

Creating and Building Programs

Program Build Options

Creating Programs from Binaries

Managing and Querying Programs

Kernel Objects

Creating Kernel Objects and Setting Kernel Arguments

Thread Safety

Managing and Querying Kernels

7. Buffers and Sub-Buffers

Memory Objects, Buffers, and Sub-Buffers Overview

Creating Buffers and Sub-Buffers

Querying Buffers and Sub-Buffers

Reading, Writing, and Copying Buffers and Sub-Buffers

Mapping Buffers and Sub-Buffers

8. Images and Samplers

Image and Sampler Object Overview

Creating Image Objects

Image Formats

Querying for Image Support

Creating Sampler Objects

OpenCL C Functions for Working with Images

Transferring Image Objects

9. Events

Commands, Queues, and Events Overview

Events and Command-Queues

Event Objects

Generating Events on the Host

Events Impacting Execution on the Host

Using Events for Profiling

Events Inside Kernels

Events from Outside OpenCL

10. Interoperability with OpenGL

OpenCL/OpenGL Sharing Overview

Querying for the OpenGL Sharing Extension

Initializing an OpenCL Context for OpenGL Interoperability

Creating OpenCL Buffers from OpenGL Buffers

Creating OpenCL Image Objects from OpenGL Textures

Querying Information about OpenGL Objects

Synchronization between OpenGL and OpenCL

11. Interoperability with Direct3D

Direct3D/OpenCL Sharing Overview

Initializing an OpenCL Context for Direct3D Interoperability

Creating OpenCL Memory Objects from Direct3D Buffers and Textures

Acquiring and Releasing Direct3D Objects in OpenCL

Processing a Direct3D Texture in OpenCL

Processing D3D Vertex Data in OpenCL

12. C++ Wrapper API

C++ Wrapper API Overview

C++ Wrapper API Exceptions

Vector Add Example Using the C++ Wrapper API

Choosing an OpenCL Platform and Creating a Context

Choosing a Device and Creating a Command-Queue

Creating and Building a Program Object

Creating Kernel and Memory Objects

Executing the Vector Add Kernel

13. OpenCL Embedded Profile

OpenCL Profile Overview

64-Bit Integers

Images

Built-In Atomic Functions

Mandated Minimum Single-Precision Floating-Point Capabilities

Determining the Profile Supported by a Device in an OpenCL C Program

Part II OpenCL 1.1 Case Studies

14. Image Histogram

Computing an Image Histogram

Parallelizing the Image Histogram

Additional Optimizations to the Parallel Image Histogram

Computing Histograms with Half-Float or Float Values for Each Channel

15. Sobel Edge Detection Filter

What Is a Sobel Edge Detection Filter?

Implementing the Sobel Filter as an OpenCL Kernel

16. Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm

Graph Data Structures

Kernels

Leveraging Multiple Compute Devices

17. Cloth Simulation in the Bullet Physics SDK

An Introduction to Cloth Simulation

Simulating the Soft Body

Executing the Simulation on the CPU

Changes Necessary for Basic GPU Execution

Two-Layered Batching

Optimizing for SIMD Computation and Local Memory

Adding OpenGL Interoperation

18. Simulating the Ocean with Fast Fourier Transform

An Overview of the Ocean Application

Phillips Spectrum Generation

An OpenCL Discrete Fourier Transform

Determining 2D Decomposition

Using Local Memory

Determining the Sub-Transform Size

Determining the Work-Group Size

Obtaining the Twiddle Factors

Determining How Much Local Memory Is Needed

Avoiding Local Memory Bank Conflicts

Using Images

A Closer Look at the FFT Kernel

A Closer Look at the Transpose Kernel

19. Optical Flow

Optical Flow Problem Overview

Sub-Pixel Accuracy with Hardware Linear Interpolation

Application of the Texture Cache

Using Local Memory

Early Exit and Hardware Scheduling

Efficient Visualization with OpenGL Interop

Performance

20. Using OpenCL with PyOpenCL

Introducing PyOpenCL

Running the PyImageFilter2D Example

PyImageFilter2D Code

Context and Command-Queue Creation

Loading to an Image Object

Creating and Building a Program

Setting Kernel Arguments and Executing a Kernel

Reading the Results

21. Matrix Multiplication with OpenCL

The Basic Matrix Multiplication Algorithm

A Direct Translation into OpenCL

Increasing the Amount of Work per Kernel

Optimizing Memory Movement: Local Memory

Performance Results and Optimizing the Original CPU Code

22. Sparse Matrix-Vector Multiplication

Sparse Matrix-Vector Multiplication (SpMV) Algorithm

Description of This Implementation

Tiled and Packetized Sparse Matrix Representation

Header Structure

Tiled and Packetized Sparse Matrix Design Considerations

Optional Team Information

Tested Hardware Devices and Results

Additional Areas of Optimization

A. Summary of OpenCL 1.1

The OpenCL Platform Layer

Contexts

Querying Platform Information and Devices

The OpenCL Runtime

Command-Queues

Buffer Objects

Create Buffer Objects

Read, Write, and Copy Buffer Objects

Map Buffer Objects

Manage Buffer Objects

Query Buffer Objects

Program Objects

Create Program Objects

Build Program Executable

Build Options

Query Program Objects

Unload the OpenCL Compiler

Kernel and Event Objects

Create Kernel Objects

Kernel Arguments and Object Queries

Execute Kernels

Event Objects

Out-of-Order Execution of Kernels and Memory Object Commands

Profiling Operations

Flush and Finish

Supported Data Types

Built-In Scalar Data Types

Built-In Vector Data Types

Other Built-In Data Types

Reserved Data Types

Vector Component Addressing

Vector Components

Vector Addressing Equivalencies

Conversions and Type Casting Examples

Operators

Address Space Qualifiers

Function Qualifiers

Preprocessor Directives and Macros

Specify Type Attributes

Math Constants

Work-Item Built-In Functions

Integer Built-In Functions

Common Built-In Functions

Math Built-In Functions

Geometric Built-In Functions

Relational Built-In Functions

Vector Data Load/Store Functions

Atomic Functions

Async Copies and Prefetch Functions

Synchronization, Explicit Memory Fence

Miscellaneous Vector Built-In Functions

Image Read and Write Built-In Functions

Image Objects

Create Image Objects

Query List of Supported Image Formats

Copy between Image, Buffer Objects

Map and Unmap Image Objects

Read, Write, Copy Image Objects

Query Image Objects

Image Formats

Access Qualifiers

Sampler Objects

Sampler Declaration Fields

OpenCL Device Architecture Diagram

OpenCL/OpenGL Sharing APIs

CL Buffer Objects > GL Buffer Objects

CL Image Objects > GL Textures

CL Image Objects > GL Renderbuffers

Query Information

Share Objects

CL Event Objects > GL Sync Objects

CL Context > GL Context, Sharegroup

OpenCL/Direct3D 10 Sharing APIs

Index
