DUNE PDELab (git)
How to support vectorization in Dune classes. More...
Files | |
file | interface.hh |
User interface of the SIMD abstraction. | |
file | io.hh |
IO interface of the SIMD abstraction. | |
Namespaces | |
namespace | Dune::Simd |
Namespace for vectorization interface functions used by library developers. | |
Basic interface | |
template<class V > | |
using | Dune::Simd::Scalar = typename Overloads::ScalarType< std::decay_t< V > >::type |
Element type of some SIMD type. More... | |
template<class S , class V > | |
using | Dune::Simd::Rebind = typename Overloads::RebindType< std::decay_t< S >, std::decay_t< V > >::type |
Construct SIMD type with different scalar type. More... | |
template<class V > | |
constexpr std::size_t | Dune::Simd::lanes () |
Number of lanes in a SIMD type. More... | |
template<class V > | |
decltype(auto) | Dune::Simd::lane (std::size_t l, V &&v) |
Extract an element of a SIMD type. More... | |
template<class V , class U > | |
constexpr V | Dune::Simd::implCast (U &&u) |
Cast an expression from one implementation to another. More... | |
template<class V , class S > | |
constexpr V | Dune::Simd::broadcast (S s) |
Broadcast a scalar to a vector explicitly. More... | |
template<class M , class V > | |
V | Dune::Simd::cond (M &&mask, const V &ifTrue, const V &ifFalse) |
Like the ?: operator. More... | |
template<class V > | |
V | Dune::Simd::cond (bool mask, const V &ifTrue, const V &ifFalse) |
Like the ?: operator. More... | |
template<class V > | |
auto | Dune::Simd::max (const V &v1, const V &v2) |
The binary maximum value over two simd objects. More... | |
template<class V > | |
auto | Dune::Simd::min (const V &v1, const V &v2) |
The binary minimum value over two simd objects. More... | |
template<class Mask > | |
bool | Dune::Simd::anyTrue (const Mask &mask) |
Whether any entry is true More... | |
template<class Mask > | |
bool | Dune::Simd::allTrue (const Mask &mask) |
Whether all entries are true More... | |
template<class Mask > | |
bool | Dune::Simd::anyFalse (const Mask &mask) |
Whether any entry is false More... | |
template<class Mask > | |
bool | Dune::Simd::allFalse (const Mask &mask) |
Whether all entries are false More... | |
template<class V > | |
Scalar< V > | Dune::Simd::max (const V &v) |
The horizontal maximum value over all lanes. More... | |
template<class V > | |
Scalar< V > | Dune::Simd::min (const V &v) |
The horizontal minimum value over all lanes. More... | |
template<class V > | |
auto | Dune::Simd::mask (const V &v) |
Convert to mask, analogue of bool(s) for scalars. More... | |
template<class V1 , class V2 > | |
auto | Dune::Simd::maskOr (const V1 &v1, const V2 &v2) |
Logic or of masks. More... | |
template<class V1 , class V2 > | |
auto | Dune::Simd::maskAnd (const V1 &v1, const V2 &v2) |
Logic and of masks. More... | |
Syntactic Sugar | |
Templates and functions in this group provide syntactic sugar, they are implemented using the functionality from SimdInterfaceBase, and are not customizable by implementations. | |
template<class V > | |
using | Dune::Simd::Mask = Rebind< bool, V > |
Mask type type of some SIMD type. More... | |
template<class V > | |
std::size_t | Dune::Simd::lanes (const V &) |
Number of lanes in a SIMD type. More... | |
IO interface | |
Templates and functions in this group provide syntactic sugar for IO. They are implemented using the functionality from SimdInterfaceBase, and are not customizable by implementations. | |
template<class V > | |
auto | Dune::Simd::vio (const V &v) |
construct a stream inserter More... | |
template<class V > | |
auto | Dune::Simd::io (const V &v) |
construct a stream inserter More... | |
Detailed Description
How to support vectorization in Dune classes.
This module describes how a Dune library developer can add support for vectorization to library facilities.
Understanding SIMD types
The (idealized) model of a SIMD type V
used in this abstraction layer is that they are fixed-length vectors of some scalar type S
. Operations and operators that take values of type S
as arguments, except for operator,()
, should be overloaded to support values of type V
too. These operations should apply element-wise. If the operation takes more than one argument, it should accept arbitrary combinations of V
and S
. The exception is the combination of S
on the left hand side and V
on the right hand side of one of the assignment operators, which does not make sense.
The result of a boolean operation is a mask type M
, which is a SIMD type with scalar type bool
with the same number of elements as V
. The result of all other operations is again of type V
, or of some type convertible to V
.
This is very similar to std::valarray
, with the main difference being that std::valarray
is dynamic in size, while for this abstraction the size is static.
Type promotion issues
True SIMD types have an issue with type promotion, which means they cannot behave completely analogous to built-in integral types (this is a non-issue with floating point types). Essentially, operations on true SIMD types cannot promote their arguments, because the promoted types typically require more storage than the original types, meaning an argument that was passed in a single vector register would need multiple vector registers after promotion, which would mean greater register pressure. Also, there would be conversion operations required, which (at least on x86) is not typically the case for promotions of the built-in types. Lastly, with larger types the vector units can typically operate on fewer lanes at a time.
Omitting integral promotions has in many cases no negative impact, because many programmers do not really expect them anyway. There are however cases where they matter, and for illustration I want to explain one that crept up during unit testing.
Here is a simplified (and somewhat pseudo-code) version of the test. The test checks the result of unary -
on Vc::Vector<unsigned short>
by comparing the result of unary -
when applied to the complete vector to the result of unary -
when applied to each lane individually.
The test fails in lane 0. On the left side of the ==
, lane(0,
vresult)
is (unsigned short)65535
, which is the same as (unsigned short)-1
, as it should be. On the right side, lane(0, varg)
is (unsigned short)1
. -
promotes its argument, so that becomes (int)1
, and the result of the negation is (int)-1
.
Now the comparison is (unsigned short)65535 == (int)-1
. The comparison operator applies the usual arithmetic conversions to bring both operands to the same type. In this case this boils down to converting the left side to int
via integral promotions and the comparison becomes (int)65535 == (int)-1
. The result is of course false
and the assertion triggers.
The only way to thoroughly prevent this kind of problem is to convert the result of any operation back to the expected type. In the above example, the assertion would need to be written as assert(lane(l,
vresult) == static_cast<unsigned short>(-lane(l, varg)));
. In practice, this should only be a problem with operations on unsigned types where the result may be "negative". Most code in Dune will want to operate on floating point types, where this is a non-issue.
(Of course, this is also a problem for code that operates on untrusted input, but you should not be doing that with Dune anyway).
Still, when writing code using the SIMD abstractions, you should be aware that in the following snippet
the exact types of var1
and var2
may be somewhat surprising.
Limitations of the Abstraction Layer
Since the abstraction layer cannot overload operators of SIMD types (that would be meddling with the domain of the library that provides the SIMD types), nor provide its own constructors, there are severe limitations in what the abstraction layer guarantees. Besides the standard types, the first SIMD library supported is Vc, so that is where most of the limitations stem from; see Restrictions in SIMD Abstraction Implementation for Vc.
The biggest limitations are with masks. In Vc masks support a very restricted set of operations compared to other SIMD types, so in what follows we will distinguish between masks with a very small set of operations and between vectors with a larger set of operations.
Here is a compact table of the limitations as a quick reference, together with suggested workarounds for the constructs that don't work. s
denotes a scalar object/expression (i.e. of type double
or in the case of masks bool
). v
denotes a vector/mask object/expression. sv
means that both scalar and vector arguments are accepted. V
denotes a vector/mask type. @
means any applicable operator that is not otherwise listed.
Notes:
- [1] In Vc, mask-mask
==
and!=
operations exist, but the result is of typebool
, i.e. a scalar. - [2]
++
(either kind) on bools is deprecated by the standard. Our test suite does not check for it on masks, but it was supported by Vc masks at some point. - [3] Contrary to the other operators, the expected result for
(sv1, sv2)
is exactlysv2
, no broadcasting applied. - [4] Try to avoid the use of
operator,
unless both operands are built-in types if possible. Libraries had a tendency to overloadoperator,
to provide for things like container initialization before C++11, and these overloads may still be present in the library you are using and replace the default meaning ofoperator,
.
Support levels:
y
: operation generally works; some instances of the operation may not apply*N*
: operation generally does not work; some instances of the operation may not applyn/a
: operation does not apply (i.e. bitwise operations to floating-point operands,--
(and in the future possibly++
) to boolean operands, assignment operators to scalar left hand sides)
Typedef Documentation
◆ Mask
using Dune::Simd::Mask = typedef Rebind<bool, V> |
Mask type type of some SIMD type.
- Template Parameters
-
V The SIMD (mask or vector) type. const
,volatile
or reference qualifiers are automatically ignored.
The mask type is kind of a SIMD vector of bool
with the same number of lanes as V
. It results from comparison operations between values of type V
. It is only "kind of" a SIMD vector, because the guaranteed supported operations are extremely limited. At the moment only the logical operators &&
, ||
and !
and the "bitwise" operators &
, ^
and |
between masks are supported, and even with those operators you cannot rely on automatic broadcasting of bool
values.
- Note
- In particular, masks do not support comparison. As a workaround you can use
^
instead of!=
and!(m1 ^ m2)
instead ofm1 == m2
. (The reason why comparison is not supported is because in Vc==
and!=
between masks yield a singlebool
result and not a mask.)
This is an alias for Rebind<bool, V>
.
◆ Rebind
using Dune::Simd::Rebind = typedef typename Overloads::RebindType<std::decay_t<S>, std::decay_t<V> >::type |
Construct SIMD type with different scalar type.
- Template Parameters
-
S The new scalar type V The SIMD (mask or vector) type.
The resulting type a SIMD vector of S
with the same number of lanes as V
. const
, volatile
or reference qualifiers in S
and V
are automatically ignored, and the result will have no such qualifiers.
Implementations shall rebind to LoopSIMD<S, lanes<V>()>
if they can't support a particular rebind natively.
Implemented by Overloads::RebindType
.
◆ Scalar
using Dune::Simd::Scalar = typedef typename Overloads::ScalarType<std::decay_t<V> >::type |
Element type of some SIMD type.
- Template Parameters
-
V The SIMD (mask or vector) type. const
,volatile
or reference qualifiers are automatically ignored.
Not all operations that access the element of a vector return (a reference to) the scalar type – some may return proxy objects instead. Use autoCopy()
to make sure you are getting a prvalue of the scalar type.
Implemented by Overloads::ScalarType
.
Function Documentation
◆ allFalse()
Whether all entries are false
Implemented by Overloads::allFalse()
.
References Dune::Simd::allFalse(), and Dune::Simd::mask().
Referenced by Dune::Simd::allFalse().
◆ allTrue()
Whether all entries are true
Implemented by Overloads::allTrue()
.
References Dune::Simd::allTrue(), and Dune::Simd::mask().
Referenced by Dune::Simd::allTrue(), Dune::BiCGSTABSolver< X >::apply(), Dune::RestartedGMResSolver< X, Y, F >::apply(), Dune::RestartedFlexibleGMResSolver< X, Y, F >::apply(), and Dune::IterativeSolver< X, Y >::Iteration< CountType >::step().
◆ anyFalse()
Whether any entry is false
Implemented by Overloads::anyFalse()
.
References Dune::Simd::anyFalse(), and Dune::Simd::mask().
Referenced by Dune::Simd::anyFalse().
◆ anyTrue()
Whether any entry is true
Implemented by Overloads::anyTrue()
.
References Dune::Simd::anyTrue(), and Dune::Simd::mask().
Referenced by Dune::Simd::Overloads::allFalse(), Dune::Simd::Overloads::allTrue(), Dune::Simd::Overloads::anyFalse(), and Dune::Simd::anyTrue().
◆ broadcast()
|
constexpr |
Broadcast a scalar to a vector explicitly.
Implemented by Overloads::broadcast()
This is useful because the syntax for broadcasting can vary wildly between implementations.
- Note
- One of the few functions that explicitly take a template argument (
V
in this case).
References Dune::Simd::broadcast().
Referenced by Dune::Simd::broadcast().
◆ cond() [1/2]
V Dune::Simd::cond | ( | bool | mask, |
const V & | ifTrue, | ||
const V & | ifFalse | ||
) |
Like the ?: operator.
Overload for plain bool masks, accepting any simd type
Implemented by Overloads::cond()
.
References Dune::Simd::mask().
◆ cond() [2/2]
V Dune::Simd::cond | ( | M && | mask, |
const V & | ifTrue, | ||
const V & | ifFalse | ||
) |
Like the ?: operator.
Equivalent to
Implemented by Overloads::cond()
.
References Dune::Simd::cond(), and Dune::Simd::mask().
Referenced by Dune::GradientSolver< X >::apply(), Dune::CGSolver< X >::apply(), Dune::BiCGSTABSolver< X >::apply(), Dune::MINRESSolver< X >::apply(), Dune::RestartedGMResSolver< X, Y, F >::apply(), Dune::Simd::cond(), Dune::Simd::Overloads::max(), and Dune::Simd::Overloads::min().
◆ implCast()
|
constexpr |
Cast an expression from one implementation to another.
Implemented by Overloads::implCast()
Requires the scalar type and the number of lanes to match exactly.
This is particularly useful for masks, which often know the type they were derived from. This can become a problem when doing a conditional operation e.g. on some floating point vector type, but with a mask derived from some index vector type.
- Note
- One of the few functions that explicitly take a template argument (
V
in this case).
References Dune::Simd::implCast().
Referenced by Dune::Simd::implCast().
◆ io()
auto Dune::Simd::io | ( | const V & | v | ) |
construct a stream inserter
- Template Parameters
-
V The SIMD (mask or vector) type.
Construct an object that can be inserted into an output stream. For one-lane vectors, behaves the same as scalar insertion. For multi-lane vectors, behaves as the inserter returned by vio()
: insertion prints the vector values separated by a comma and a space, and surrounded by angular brackets.
Referenced by Dune::BiCGSTABSolver< X >::apply(), Dune::InverseOperator< X, Y >::printOutput(), and Dune::IterativeSolver< X, Y >::Iteration< CountType >::step().
◆ lane()
decltype(auto) Dune::Simd::lane | ( | std::size_t | l, |
V && | v | ||
) |
Extract an element of a SIMD type.
- Parameters
-
l Number of lane to extract v SIMD object to extract from
- Returns
- If
v
is a non-const
lvalue, a referenceScalar<decay_t<V>>&
, or a proxy object through which the element ofv
may be modified. Otherwise,v
is aconst
lvalue or an rvalue, and the result is a prvalue (a temporary) of typeScalar<decay_t<V>>
.
Implemented by Overloads::lane()
.
References Dune::Simd::lane().
Referenced by Dune::Simd::Overloads::implCast(), Dune::Simd::lane(), Dune::Simd::Overloads::max(), and Dune::Simd::Overloads::min().
◆ lanes() [1/2]
|
constexpr |
Number of lanes in a SIMD type.
- Template Parameters
-
V The SIMD (mask or vector) type. const
,volatile
or reference qualifiers are automatically ignored.
Implemented by Overloads::LaneCount
.
Referenced by Dune::Simd::Overloads::implCast(), Dune::Simd::Overloads::max(), and Dune::Simd::Overloads::min().
◆ lanes() [2/2]
std::size_t Dune::Simd::lanes | ( | const V & | ) |
Number of lanes in a SIMD type.
- Template Parameters
-
V The SIMD (mask or vector) type.
The value of the parameter is ignored; the call is simply forwarded to lanes<V>()
.
◆ mask()
auto Dune::Simd::mask | ( | const V & | v | ) |
Convert to mask, analogue of bool(s) for scalars.
Implemented by Overloads::mask()
.
References Dune::Simd::mask().
Referenced by Dune::Simd::allFalse(), Dune::Simd::allTrue(), Dune::Simd::anyFalse(), Dune::Simd::anyTrue(), Dune::Simd::cond(), Dune::Simd::mask(), Dune::Simd::Overloads::maskAnd(), and Dune::Simd::Overloads::maskOr().
◆ maskAnd()
auto Dune::Simd::maskAnd | ( | const V1 & | v1, |
const V2 & | v2 | ||
) |
Logic and of masks.
Implemented by Overloads::maskAnd()
.
References Dune::Simd::maskAnd().
Referenced by Dune::Simd::maskAnd().
◆ maskOr()
auto Dune::Simd::maskOr | ( | const V1 & | v1, |
const V2 & | v2 | ||
) |
Logic or of masks.
Implemented by Overloads::maskOr()
.
References Dune::Simd::maskOr().
Referenced by Dune::Simd::maskOr().
◆ max() [1/2]
Scalar< V > Dune::Simd::max | ( | const V & | v | ) |
The horizontal maximum value over all lanes.
Implemented by Overloads::max()
.
References Dune::Simd::max().
◆ max() [2/2]
auto Dune::Simd::max | ( | const V & | v1, |
const V & | v2 | ||
) |
The binary maximum value over two simd objects.
Implemented by Overloads::max()
.
References Dune::Simd::max().
Referenced by Dune::RestartedGMResSolver< X, Y, F >::apply(), and Dune::Simd::max().
◆ min() [1/2]
Scalar< V > Dune::Simd::min | ( | const V & | v | ) |
The horizontal minimum value over all lanes.
Implemented by Overloads::min()
.
References Dune::Simd::min().
◆ min() [2/2]
auto Dune::Simd::min | ( | const V & | v1, |
const V & | v2 | ||
) |
The binary minimum value over two simd objects.
Implemented by Overloads::min()
.
References Dune::Simd::min().
Referenced by Dune::Simd::min().
◆ vio()
auto Dune::Simd::vio | ( | const V & | v | ) |
construct a stream inserter
- Template Parameters
-
V The SIMD (mask or vector) type.
Construct an object that can be inserted into an output stream. Insertion prints the vector values separated by a comma and a space, and surrounded by angular brackets.