Good performance of a CUDA-based implementation requires grouping similar
operations into threads within the same kernel call. Hence, we need to identify
the subdomains
that include the boundary
and we need to establish a one-to-one mapping between these domains and a
one-dimensional index. To establish such a mapping, we iterate through all
relevant
dimensional
subdomains
For each
we find the complement
as follows. Let
be the set of vertices of the subdomain
.
For each
we find an integer
such
that
or
Then
covers part of the boundary
.
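
The idea of mapping boundary subdomains to a one-dimensional index can be illustrated with a minimal host-side sketch. It rests on assumptions that are not taken from the text: the subdomains are assumed to form a regular Cartesian grid with `extent[k]` cells along dimension `k`, and a subdomain is assumed to touch the boundary when one component of its multi-index is first or last along its dimension. The names `BoundaryIndexMap`, `touches_boundary`, and `build_boundary_index_map` are hypothetical and serve only this illustration; the construction via vertices and complements described here is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the mapping (not from the source text).
struct BoundaryIndexMap {
    // One-dimensional index -> multi-index of the boundary subdomain.
    std::vector<std::vector<int>> multi_index;
    // Row-major cell id -> one-dimensional index, or -1 for interior cells.
    std::vector<int> linear_to_boundary;
};

// Assumption: a subdomain touches the boundary if any component of its
// multi-index is the first or the last position along that dimension.
bool touches_boundary(const std::vector<int>& idx,
                      const std::vector<int>& extent) {
    for (std::size_t k = 0; k < idx.size(); ++k) {
        if (idx[k] == 0 || idx[k] == extent[k] - 1) return true;
    }
    return false;
}

// Iterate over all subdomains once and number the boundary ones consecutively.
BoundaryIndexMap build_boundary_index_map(const std::vector<int>& extent) {
    assert(!extent.empty());
    std::size_t total = 1;
    for (int n : extent) {
        assert(n > 0);
        total *= static_cast<std::size_t>(n);
    }

    BoundaryIndexMap map;
    map.linear_to_boundary.assign(total, -1);

    std::vector<int> idx(extent.size(), 0);
    for (std::size_t cell = 0; cell < total; ++cell) {
        // Decode the row-major cell id (last dimension varies fastest).
        std::size_t rest = cell;
        for (std::size_t k = extent.size(); k-- > 0;) {
            const std::size_t n = static_cast<std::size_t>(extent[k]);
            idx[k] = static_cast<int>(rest % n);
            rest /= n;
        }
        if (touches_boundary(idx, extent)) {
            map.linear_to_boundary[cell] =
                static_cast<int>(map.multi_index.size());
            map.multi_index.push_back(idx);
        }
    }
    return map;
}
```

Under these assumptions, the entries of `multi_index` could be flattened and copied to the device, so that a kernel launched with one thread per entry processes all boundary subdomains with alike operations in a single call.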
There is no need to calculate
for every
.
Indeed,
thus a vertex
of a subdomain
is given
by
Hence
