
I. Introduction into GPU programming.
II. Exception safe dynamic memory handling in Cuda project.
1. Allocating and deallocating device memory. ots::cuda::memory::Device class.
2. Accessing device memory. ots::cuda::memory::Host class.
3. Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.
4. Accessing memory in __device__ code. ots::cuda::memory::Block class.
5. Handling two dimensional memory blocks. Do not use cudaMallocPitch.
6. Allocation of memory from Host scope.
7. Tagged data. Compiler assisted data verification.
III. Calculation of partial sums in parallel.
IV. Manipulation of piecewise polynomial functions in parallel.
V. Manipulation of localized piecewise polynomial functions in parallel.
Downloads. Index. Contents.

Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.

In the two previous sections we presented a layer for allocating device memory and assigning values to it. Once that is done, we want to use such a block of memory in a Cuda kernel. We cannot pass a Device instance in any form (reference, pointer or value) into a kernel launch, because the class resides in host memory and uses the boost library. For this reason the Device class has a member function "DeviceHandler<dataType> handler() const" that returns a piece of data suitable for crossing the host/device boundary. Instead of writing "MyKernel<<<...>>>(deviceInstance)" we write "MyKernel<<<...>>>(deviceInstance.handler())".
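The hand-off described above can be sketched in host-only C++ so that it compiles without nvcc. This is a minimal illustration, not the library's code: OwnerSketch, HandleSketch and consume are hypothetical stand-ins for Device, DeviceHandler and a kernel, and ordinary heap memory is assumed in place of device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for DeviceHandler: a raw pointer plus a size, nothing else.
template <typename T>
struct HandleSketch {
    T* data;
    std::size_t size;
};

// Hypothetical stand-in for Device: owns the memory and releases it in the
// destructor. Host heap memory is used here instead of device memory.
template <typename T>
class OwnerSketch {
    T* theData;
    std::size_t theSize;
public:
    explicit OwnerSketch(std::size_t n) : theData(new T[n]()), theSize(n) {}
    ~OwnerSketch() { delete[] theData; }
    OwnerSketch(const OwnerSketch&) = delete;
    OwnerSketch& operator=(const OwnerSketch&) = delete;

    // Returns a non-owning view by value; the owner keeps responsibility
    // for the lifetime of the memory.
    HandleSketch<T> handler() const { return HandleSketch<T>{theData, theSize}; }
};

// Stand-in for a kernel: it receives the handle by value, never the owner.
template <typename T>
std::size_t consume(HandleSketch<T> h) { return h.size; }

// The handle is plain data that can be copied byte by byte; the owner is not.
static_assert(std::is_trivially_copyable<HandleSketch<float>>::value,
              "the handle must be plain data");
static_assert(!std::is_trivially_copyable<OwnerSketch<float>>::value,
              "the owner manages a resource and is not plain data");
```

In the Cuda project the role of consume would be played by a __global__ function, and the call would take the form "MyKernel<<<...>>>(deviceInstance.handler())" shown in the text. The handle must not outlive the owner, since it carries no ownership of its own.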

The following is the code for DeviceHandler.

template <typename dataType>
class DeviceHandler
{
public:
    typedef typename Index::type index;
    friend struct Block<dataType>;

private:
    dataType* theData;  // raw pointer to the device memory block
    index theSize;      // number of elements visible through this handler

public:
    DeviceHandler( dataType* data, index size ) : theData(data), theSize(size) {}
    DeviceHandler( const DeviceHandler<dataType>& d ) : theData(d.theData), theSize(d.theSize) {}
    DeviceHandler<dataType>& operator=( const DeviceHandler& d ) { theData=d.theData; theSize=d.theSize; return *this; }

    index size() const { return theSize; }

    // The visible size may only shrink; a request to grow is ignored.
    void setSize( unsigned size ) { if( size>theSize ) return; theSize=size; }
};

Note the complete absence of accessors to the data. DeviceHandler is designed to exist in all three compiler scopes and, thus, cannot have any functionality beyond being created and copied by value. Once a DeviceHandler instance reaches Device-scope code, it should be passed into a constructor of the class "Block<dataType>", which is declared a friend in the DeviceHandler code.
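The friend relationship can be illustrated with a small host-only C++ sketch. HandlerSketch and BlockSketch are hypothetical stand-ins for DeviceHandler and Block; the real Block interface is covered in its own section and may differ. The sketch assumes std::size_t in place of Index::type and uses an ordinary array in place of device memory.

```cpp
#include <cassert>
#include <cstddef>

template <typename T> struct BlockSketch;  // forward declaration for the friend

// Hypothetical stand-in for DeviceHandler: no data accessors, only size().
template <typename T>
class HandlerSketch {
    friend struct BlockSketch<T>;  // only the block may read the raw pointer
    T* theData;
    std::size_t theSize;
public:
    HandlerSketch(T* data, std::size_t size) : theData(data), theSize(size) {}
    std::size_t size() const { return theSize; }
    // Mirrors setSize from the listing: the view may shrink but never grow.
    void setSize(std::size_t size) { if (size > theSize) return; theSize = size; }
};

// Hypothetical stand-in for Block: built from a handler, it provides
// the element access that the handler itself withholds.
template <typename T>
struct BlockSketch {
    T* data;
    std::size_t size;
    explicit BlockSketch(const HandlerSketch<T>& h) : data(h.theData), size(h.theSize) {}
    T& operator[](std::size_t i) { return data[i]; }
};

// Stand-in for device-scope code: wrap the handler in a block, then read.
inline float sumThroughBlock(HandlerSketch<float> h) {
    BlockSketch<float> block(h);
    float total = 0.0f;
    for (std::size_t i = 0; i < block.size; ++i) total += block[i];
    return total;
}
```

Keeping the pointer private means that code holding only a handler cannot dereference it by accident in the wrong scope; access becomes possible only after the handler has been converted into a block.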


Copyright 2007