I. Introduction into GPU programming.
 II. Exception safe dynamic memory handling in Cuda project.
 III. Calculation of partial sums in parallel.
 IV. Manipulation of piecewise polynomial functions in parallel.
 V. Manipulation of localized piecewise polynomial functions in parallel.

## Calculation of partial sums in parallel.

et is a given array of numbers (integers or doubles). We would like to calculate an array s.t.

The presented procedure has -proportional processing time under assumption of infinite parallel processing capability.

We proceed as follows. Choose an integer : .

For step 0, we calculate in parallel ( is the thread index):

For step 1, we calculate

We continue, for step , we calculate
Summation in parallel for and .

The last step would be when

We have Similarly, Note that for each is a partial sum. Next, we correct the rest of the numbers.

For step 0 we calculate (on a single thread): Thus, Note that the final contains the final partial sum. Hence, there is no need to increment this position anymore. This fact is reflected in slightly different indexing for the following steps.

For step 1 we calculate (on threads, is the thread index): We adopt the convention The index varies from 0 to except for the boundary where cannot exceed or be equal to (because indexes of are in the same range as indexes of ): Hence, the stated range for follows.

We have thus

For general step we calculate (on threads, is the thread index): Assuming that we get thus

At we have the partial sums.

It remains to describe memory allocation and positioning for the intermediate data and . We place (see the figure ( Partial summation in parallel ))

Summary

(Calculation of partial sums) The array and the parameters , : are input data.

We introduce an operation that transforms the array according to the rule where is chosen to limit the range of so that participates in the sum at most once and the range of (thread index) is

Let

We introduce the operation that transforms the array according to the rule (single threaded calculation):

We introduce the operation that transforms the array according to the rule ( is the thread index): where

We consecutively perform the operations At completion contains the partial sums.