OpenCL
These notes will introduce OpenCL
ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, April 7, 2011, [Link]
OpenCL
(Open Computing Language)
A standard based upon C for portable parallel applications. Covers task-parallel and data-parallel applications. Focuses on multi-platform support (multiple CPUs, GPUs, and other devices).
Development initiated by Apple.
Developed by the Khronos Group, which also manages OpenGL. OpenCL 1.0, 2008: released with Mac OS X 10.6 (Snow Leopard). OpenCL 1.1: June 2010. Has similarities with CUDA. Implementation available for NVIDIA GPUs.
Wikipedia, OpenCL [Link]
OpenCL Programming Model
Uses a data-parallel programming model, similar to CUDA. The host program launches kernel routines as in CUDA, but allows for just-in-time compilation during host execution. OpenCL work items correspond to CUDA threads. OpenCL work groups correspond to CUDA thread blocks. Work items in the same work group can be synchronized with a barrier, as in CUDA.
Sample OpenCL code to add two vectors
To illustrate OpenCL commands, we will use OpenCL code to add two vectors, A and B, which are transferred to the device (GPU); the result, C, is returned to the host (CPU), similar to CUDA vector addition.
Structure of OpenCL main program
Get information about platform and devices available on system
Select devices to use
Create an OpenCL command queue
Create memory buffers on device
Transfer data from host to device memory buffers
Create kernel program object
Build (compile) kernel in-line (or load precompiled binary)
Create OpenCL kernel object
Set kernel arguments
Execute kernel
Read kernel memory and copy to host memory.
Platform
"The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform." Platforms represented by a cl_platform object, initialized with clGetPlatformID()
[Link]
Simple code for identifying platform
//Platform
cl_platform_id platform;
clGetPlatformIDs(1,          // number of platform entries
                 &platform,  // list of OpenCL platforms found (platform IDs);
                             // in our case just one platform, identified by &platform
                 NULL);      // returns number of OpenCL platforms available; if NULL, ignored
Context
The environment within which the kernels execute and the domain in which synchronization and memory management is defined. The context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more command-queues used to schedule execution of a kernel(s) or operations on memory objects.
The OpenCL Specification version 1.1 [Link]
Code for context

//Context
cl_context_properties props[3];
props[0] = (cl_context_properties) CL_CONTEXT_PLATFORM;
props[1] = (cl_context_properties) platform;
props[2] = (cl_context_properties) 0;

cl_context GPUContext = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

//Context info
size_t ParmDataBytes;
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, 0, NULL, &ParmDataBytes);
cl_device_id* GPUDevices = (cl_device_id*) malloc(ParmDataBytes);
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, ParmDataBytes, GPUDevices, NULL);
Command Queue
An object that holds commands that will be executed on a specific device.
The command-queue is created on a specific device in a context. Commands to a command-queue are queued in-order but may be executed in-order or out-of-order. ...
The OpenCL Specification version 1.1 [Link]
Simple code for creating a command queue
// Create command-queue
cl_command_queue GPUCommandQueue = clCreateCommandQueue(GPUContext, GPUDevices[0], 0, NULL);
Allocating memory on device
Use clCreateBuffer():

cl_mem clCreateBuffer(
        cl_context context,    // OpenCL context, from clCreateContextFromType()
        cl_mem_flags flags,    // bit field to specify type of allocation/usage (CL_MEM_READ_WRITE, ...)
        size_t size,           // number of bytes in buffer memory object
        void *host_ptr,        // ptr to buffer data (may be previously allocated)
        cl_int *errcode_ret)   // returns error code if an error

Returns a memory object.
Sample code for allocating memory on device for source data
// Source data on host, two vectors
int *A, *B;
A = new int[N];
B = new int[N];
for (int i = 0; i < N; i++) {
    A[i] = rand() % 1000;
    B[i] = rand() % 1000;
}

// Allocate GPU memory for source vectors
cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int)*N, A, NULL);
cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int)*N, B, NULL);
Sample code for allocating memory on device for results on GPU
// Allocate GPU memory for output vector
cl_mem GPUOutputVector = clCreateBuffer(GPUContext,CL_MEM_WRITE_ONLY,sizeof(int)*N, NULL,NULL);
Kernel Program
Simple kernel programs might be in the same file as the host code, as in our CUDA examples. In that case the kernel needs to be formed into strings in a character array. If in a separate file, that file can be read into the host program as a character string.
Kernel program

If in the same program as the host, the kernel needs to be strings (it can be a single string):

const char* OpenCLSource[] = {
    "__kernel void vectorAdd (const __global int* a,",   // __kernel qualifier indicates kernel code
    "                         const __global int* b,",
    "                         __global int* c)",         // __global qualifier indicates kernel memory
                                                         // (memory objects allocated from global memory pool)
    "{",
    "    unsigned int gid = get_global_id(0);",          // returns global work-item ID in given dimension (0 here)
    "    c[gid] = a[gid] + b[gid];",
    "}"
};

int main(int argc, char **argv){
    ...
}

Double underscores are optional in OpenCL qualifiers.
Kernel in a separate file
// Load the kernel source code into the array source_str
FILE *fp;
char *source_str;
size_t source_size;

fp = fopen("vector_add_kernel.cl", "r");
if (!fp) {
    fprintf(stderr, "Failed to load kernel.\n");
    exit(1);
}
source_str = (char*) malloc(MAX_SOURCE_SIZE);
source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
fclose(fp);
[Link]
Create kernel program object

This example has the kernel source in the same file as the host code, in the array OpenCLSource[]. clCreateProgramWithSource() can also be used with a separate kernel file read into the host program.

// Create OpenCL program object
cl_program OpenCLProgram = clCreateProgramWithSource(
        GPUContext,
        7,              // number of strings in kernel program array
        OpenCLSource,
        NULL,           // lengths, used if strings are not null-terminated
        NULL);          // used to return error code if error
Build kernel program

// Build the program (OpenCL JIT compilation)
clBuildProgram(OpenCLProgram,   // program object from clCreateProgramWithSource
        0,        // number of devices
        NULL,     // list of devices, if more than one
        NULL,     // build options
        NULL,     // function ptr to notification routine called when build is complete;
                  // if given, clBuildProgram returns immediately, otherwise only when build is complete
        NULL);    // arguments for notification routine
Creating Kernel Objects

// Create a handle to the compiled OpenCL function
cl_kernel OpenCLVectorAdd = clCreateKernel(
        OpenCLProgram,   // built program from clBuildProgram
        "vectorAdd",     // function name with __kernel qualifier
        NULL);           // returns error code
Set Kernel Arguments

// Set kernel arguments
clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem), (void*)&GPUVector1);
clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector2);
clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUOutputVector);

Arguments: kernel object from clCreateKernel(), which argument, size of argument, and pointer to data for the argument (from clCreateBuffer()).
Enqueue a command to execute kernel on device

// Launch the kernel
size_t WorkSize[1] = {N};          // total number of work items
size_t localWorkSize[1] = {256};   // number of work items in a work group

clEnqueueNDRangeKernel(GPUCommandQueue,
        OpenCLVectorAdd,   // kernel object from clCreateKernel()
        1,                 // dimensions of work items
        NULL,              // offset used with work items
        WorkSize,          // array describing number of global work items
        localWorkSize,     // array describing number of work items that make up a work group
        0,                 // number of events to complete before this command
        NULL,              // event wait list
        NULL);             // event
Function to copy from buffer object to host memory
The following function enqueues commands to read from a buffer object to host memory:

cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
        cl_mem buffer,
        cl_bool blocking_read,
        size_t offset,
        size_t cb,
        void *ptr,
        cl_uint num_events_in_wait_list,
        const cl_event *event_wait_list,
        cl_event *event)
The OpenCL Specification version 1.1 [Link]
Function to copy from host memory to buffer object
The following function enqueues commands to write to a buffer object from host memory:

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
        cl_mem buffer,
        cl_bool blocking_write,
        size_t offset,
        size_t cb,
        const void *ptr,
        cl_uint num_events_in_wait_list,
        const cl_event *event_wait_list,
        cl_event *event)
The OpenCL Specification version 1.1 [Link]
Copy data back from kernel
// Copy the output back to CPU memory
int *C;
C = new int[N];

clEnqueueReadBuffer(GPUCommandQueue,   // command queue from clCreateCommandQueue
        GPUOutputVector,   // device buffer from clCreateBuffer
        CL_TRUE,           // read is blocking
        0,                 // byte offset in buffer
        N*sizeof(int),     // size of data to read in bytes
        C,                 // pointer to buffer in host to write data
        0,                 // number of events to complete before this command
        NULL,              // event wait list
        NULL);             // event
Results from GPU
cout << "C[ << 0 << "]: " << A[0] <<"+"<< B[0] <<"=" << C[0]
<< "\n";
cout << "C[ << N-1 << "]: << A[N-1] << "+ << B[N-1] << "=" << C[N-1] << "\n";
C++ here
Clean-up
// Cleanup
free(GPUDevices);
clReleaseKernel(OpenCLVectorAdd);
clReleaseProgram(OpenCLProgram);
clReleaseCommandQueue(GPUCommandQueue);
clReleaseContext(GPUContext);
clReleaseMemObject(GPUVector1);
clReleaseMemObject(GPUVector2);
clReleaseMemObject(GPUOutputVector);
Compiling
Need the OpenCL header: #include <CL/cl.h> (for Mac: #include <OpenCL/opencl.h>) and link to the OpenCL library. Compile the OpenCL host program main.c using gcc in two phases:
gcc -c -I /path-to-include-dir-with-cl.h/ main.c -o main.o
gcc -L /path-to-lib-folder-with-OpenCL-libfile/ -l OpenCL main.o -o host
Ref: [Link]
Make File
(Program called scalarmulocl)
CC = g++
LD = g++ -lm
CFLAGS = -Wall -shared
CDEBUG =
LIBOCL = -L/nfs-home/mmishra2/NVIDIA_GPU_Computing_SDK/OpenCL/common/lib
INCOCL = -I/nfs-home/mmishra2/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc
SRCS = [Link]
OBJS = scalarmulocl.o
EXE = scalarmulocl.a

all: $(EXE)

$(OBJS): $(SRCS)
	$(CC) $(CFLAGS) $(INCOCL) -I/usr/include -c $(SRCS)

$(EXE): $(OBJS)
	$(LD) -L/usr/local/lib $(OBJS) $(LIBOCL) -o $(EXE) -l OpenCL

clean:
	rm -f $(OBJS) *~
	clear

References: [Link]
Submitted by: Manisha Mishra
Compiling and Executing the program
To compile: make
To run: ./scalarmulocl.a

Snapshot: (screenshot of the program output)
Questions
More Information
Chapter 11 of Programming Massively Parallel Processors by D. B. Kirk and W-M W. Hwu, Morgan Kaufmann, 2010
clGetPlatformIDs
Obtain the list of platforms available.
cl_int clGetPlatformIDs(cl_uint num_entries,
        cl_platform_id *platforms,
        cl_uint *num_platforms)
Parameters
num_entries
The number of cl_platform_id entries that can be added to platforms. If platforms is not NULL, num_entries must be greater than zero.
platforms
Returns a list of OpenCL platforms found. The cl_platform_id values returned in platforms can be used to identify a specific OpenCL platform. If the platforms argument is NULL, it is ignored. The number of OpenCL platforms returned is the minimum of the value specified by num_entries and the number of OpenCL platforms available.

num_platforms
Returns the number of OpenCL platforms available. If num_platforms is NULL, this argument is ignored.

[Link]
Includes
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>    // OpenCL header for C
#include <iostream>   // C++ input/output
using namespace std;