
I. Introduction into GPU programming.
II. Exception safe dynamic memory handling in Cuda project.
1. Allocating and deallocating device memory. ots::cuda::memory::Device class.
2. Accessing device memory. ots::cuda::memory::Host class.
3. Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.
4. Accessing memory in __device__ code. ots::cuda::memory::Block class.
5. Handling two dimensional memory blocks. Do not use cudaMallocPitch.
6. Allocation of memory from Host scope.
7. Tagged data. Compiler assisted data verification.
III. Calculation of partial sums in parallel.
IV. Manipulation of piecewise polynomial functions in parallel.
V. Manipulation of localized piecewise polynomial functions in parallel.

Allocating and deallocating device memory. ots::cuda::memory::Device class.


The Device class is responsible for allocating and deallocating a block of memory on the device. The following is partial code.

template <typename dataType>
class Device
{
public:
    typedef typename Index::type index;

private:
    class Impl : boost::noncopyable
    {
    private:
        dataType* theData;
        index theSize;

        void init( index size );
        void destroy();

    public:
        Impl() : theData(NULL), theSize(0) {}
        explicit Impl( index size ) : theData(NULL), theSize(0) { init(size); }
        explicit Impl( const Host<dataType>& x ) : theData(NULL), theSize(0) { init(x.size()); copyFrom(x); }
        ~Impl() { destroy(); }

        void copyFrom( const Host<dataType>& x );
    };

    boost::shared_ptr<Impl> theImpl;

public:
    Device() : theImpl(new Impl()) {}
    explicit Device( index size ) : theImpl(new Impl(size)) {}
    explicit Device( const Host<dataType>& x ) : theImpl(new Impl(x)) {}
    Device( const Device& x ) : theImpl(x.theImpl) {}
    Device& operator=( const Device& x ) { theImpl=x.theImpl; return *this; }

    DeviceHandler<dataType> handler() const { return theImpl->handler(); }

    void copyFrom( const Host<dataType>& x ) { theImpl->copyFrom(x); }
};

This is a standard bridge (pimpl) pattern: Device is a lightweight handle meant to be passed by value, while the reference-counted Impl owns the resource. The pointer theData holds a device-side address of a memory block. The function init() contains the call to the Cuda API that allocates the block of memory.

void init( index size )
{
    theSize=0;
    theData=NULL;
    if( size<=0 )
        return;
    void* v=static_cast<void*>(theData);
    deviceAlloc(&v,size*sizeof(dataType));
    theData=static_cast<dataType*>(v);
    theSize=size;
}

Here the function deviceAlloc is declared in a header file and implemented in a .cu file.

void deviceAlloc( void** dst, std::size_t bytesToAlloc )
{
    check( cudaSuccess==cudaMalloc(dst,bytesToAlloc), "cudaMalloc failed" );
    check( cudaSuccess==cudaDeviceSynchronize(), "cudaDeviceSynchronize failed" );
}

The function cudaMalloc is the Cuda API function that does the allocation.

The function "void copyFrom( const Host<dataType>& x )" copies data into the device block. The Host<dataType> class holds host-side data that we might want to bring into device memory.

Copyright 2007