Expression Templates and OpenCL.
In: Proceedings of Parallel Processing and Applied Mathematics (PPAM 2011).
Torun, Poland, to appear.
Uwe Bawidamann and Marco Nehmeier.
[Abstract]
[BibTeX]
In this paper we discuss the interaction of expression templates with
OpenCL devices. We show how the expression tree of expression templates
can be used to generate problem specific OpenCL kernels. In a second approach
we use expression templates to optimize the data transfer between the host and
the device which leads to a measurable performance increase in a domain
specific language approach.
We tested the functionality, correctness and performance for both
implementations in a case study for vector and matrix operations.
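As a rough illustration of the first approach, the sketch below walks an expression tree built by expression templates and emits the source of an element-wise OpenCL kernel as a string. It is not the authors' implementation: the names `Arg`, `BinOp`, `is_expr` and `make_kernel`, as well as the `a0`, `a1`, ... argument naming scheme, are assumptions made for this example, and the generated string is not compiled or run on a device here.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Leaf node: stands for the k-th input vector of the kernel (hypothetical naming).
struct Arg {
    std::size_t index;
    std::string code() const { return "a" + std::to_string(index) + "[i]"; }
};

// Inner node: a binary operation whose operand types encode the tree shape.
template <typename L, typename R>
struct BinOp {
    L lhs;
    R rhs;
    char op;
    std::string code() const {
        return "(" + lhs.code() + " " + op + " " + rhs.code() + ")";
    }
};

// Trait that restricts the overloaded operators to expression nodes.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Arg> : std::true_type {};
template <typename L, typename R> struct is_expr<BinOp<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
BinOp<L, R> operator+(L l, R r) { return {l, r, '+'}; }

template <typename L, typename R,
          typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
BinOp<L, R> operator*(L l, R r) { return {l, r, '*'}; }

// Emits OpenCL C source for one fused element-wise kernel (assumed float vectors).
template <typename E>
std::string make_kernel(const std::string& name, std::size_t n_args, const E& expr) {
    std::string src = "__kernel void " + name + "(__global float* out";
    for (std::size_t k = 0; k < n_args; ++k)
        src += ", __global const float* a" + std::to_string(k);
    src += ") { size_t i = get_global_id(0); out[i] = " + expr.code() + "; }";
    return src;
}
```

With `Arg x{0}, y{1}, z{2};`, the call `make_kernel("fused", 3, x + y * z)` yields a single kernel whose body assigns `(a0[i] + (a1[i] * a2[i]))` to `out[i]`, so no intermediate vector is needed for `y * z`.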
Generative Programming for Automatic Differentiation.
In: Proceedings of the 6th International Conference on Automatic Differentiation (AD 2012).
Fort Collins, CO, USA, to appear.
Marco Nehmeier.
[Abstract]
[BibTeX]
In this paper we present a concept for a C++ implementation of forward
automatic differentiation of an arbitrary order using expression templates
and template meta programming. In contrast to other expression template
implementations, the expression tree in our implementation has only
symbolic characteristics. The run time code is then generated from the
tree structure using template meta programming functions to apply the
rules of symbolic differentiation onto the single operations at compile
time. This generic approach has the advantage that the template meta
programming functions are replaceable which offers the opportunity to
easily generate different specialized algorithms. We tested the functionality,
correctness and performance of a prototype in different case studies for floating
point as well as interval data types and compared it against other
implementations.
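The idea of a purely symbolic expression tree that is differentiated at compile time can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the tree is written out directly as types instead of being built by overloaded operators, only addition and multiplication in one variable are covered, and the names `Var`, `Const`, `Add`, `Mul`, `Diff` and `DiffN` are invented for this example.

```cpp
// Symbolic nodes: the tree exists only as a type; eval() is the generated run time code.
struct Var {
    static constexpr double eval(double x) { return x; }
};
template <int N> struct Const {
    static constexpr double eval(double) { return N; }
};
template <typename L, typename R> struct Add {
    static constexpr double eval(double x) { return L::eval(x) + R::eval(x); }
};
template <typename L, typename R> struct Mul {
    static constexpr double eval(double x) { return L::eval(x) * R::eval(x); }
};

// Meta function applying the rules of symbolic differentiation per node.
template <typename E> struct Diff;
template <> struct Diff<Var> { using type = Const<1>; };
template <int N> struct Diff<Const<N>> { using type = Const<0>; };
template <typename L, typename R> struct Diff<Add<L, R>> {
    using type = Add<typename Diff<L>::type, typename Diff<R>::type>;  // sum rule
};
template <typename L, typename R> struct Diff<Mul<L, R>> {
    using type = Add<Mul<typename Diff<L>::type, R>,
                     Mul<L, typename Diff<R>::type>>;                  // product rule
};

// Derivative of arbitrary order N by repeated application of Diff.
template <typename E, unsigned N> struct DiffN {
    using type = typename DiffN<typename Diff<E>::type, N - 1>::type;
};
template <typename E> struct DiffN<E, 0> { using type = E; };

// Example: f(x) = x*x*x + 2*x
using F   = Add<Mul<Mul<Var, Var>, Var>, Mul<Const<2>, Var>>;
using dF  = Diff<F>::type;      // 3*x*x + 2, left unsimplified
using d2F = DiffN<F, 2>::type;  // 6*x, left unsimplified
```

Because `Diff` is an ordinary meta function, it could be swapped for another rule set without touching the node types, which is the kind of replaceability the abstract refers to.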
Interval arithmetic using expression templates, template meta programming and the upcoming C++ standard.
Computing, 94:215-228, 2012.
10.1007/s00607-011-0176-6
Marco Nehmeier.
[doi]
[Abstract]
[BibTeX]
In this paper we will discuss different realizations for an efficient interval arithmetic implementation using expression templates and template meta programming in C++. We will improve the handling of the rounding mode switches using expression templates and show how the constructed expression trees can be combined with other features like automatic differentiation. For a further improvement of the run time performance we try to move as much functionality as possible to the compile time using template meta programming techniques. In addition, we will illustrate how an interval arithmetic implementation will profit from new features and keywords defined in the upcoming C++ standard.
Parallel Detection of Interval Overlapping.
In:
K. Jónasson (editor):
Applied Parallel and Scientific Computing, pages 127-136.
Springer Berlin / Heidelberg, 2012.
10.1007/978-3-642-28145-7_13
Marco Nehmeier, Stefan Siegel and Jürgen Wolff von Gudenberg.
[doi]
[Abstract]
[BibTeX]
In this paper we define the interval overlapping relation and develop a parallel hardware unit for its realization. As one application we consider the interval comparisons. It is shown that a detailed classification of the interval overlapping relation leads to a reduction of floating-point comparisons in common applications.
Specification of hardware for interval arithmetic.
Computing, 94:243-255, 2012.
10.1007/s00607-012-0185-0
Marco Nehmeier, Stefan Siegel and Jürgen Wolff von Gudenberg.
[doi]
[Abstract]
[BibTeX]
Interval arithmetic, as it is standardized by the IEEE working group P1788, can be implemented by using floating point arithmetic units with directed rounding modes. The easiest way to represent an interval is by its two bounds. Simple formulas for the arithmetic operations can be applied. Our goal is to perform interval operations as fast as their floating point counterparts. Hence, we provide at least two units per operation. We also specify the operation for reverse multiplication (Neumaier in Vienna proposal for interval standardization, 2008) which can be implemented with the division unit. In this paper we do not care about optimization. Our primary intention is to give an easily understandable specification of hardware for interval arithmetic.
A long accumulator like a carry-save adder.
Computing, 94:203-213, 2012.
10.1007/s00607-011-0164-x
Stefan Siegel and Jürgen Wolff von Gudenberg.
[doi]
[Abstract]
[BibTeX]
Long accumulators for the exact summation of floating point numbers or products are well known tools in numerical analysis especially in algorithms verifying the result (C++ Toolbox for Verified Computing, Springer, New York, 1995). An exact dot product is one of the features of the upcoming interval standard (IEEE Interval Standard Working Group, P1788. http://grouper.ieee.org/groups/1788/). Usually an accumulator is realized as a memory block with operations to add floating point numbers and products. Several variants have been proposed to avoid carry rippling: use separate accumulators for positive and negative numbers, initialize the accu with a pattern not equal to zero, or perform a kind of carry look-ahead-technique. All these approaches are described in detail in Kulisch (Computer arithmetic and validity—theory, implementation, and applications. Series: de Gruyter Studies in Mathematics 33, 2008) and Bohlender (Computer arithmetic and self-validating numerical methods, Academic Press, San Diego, 1990). In this paper we propose a long accumulator similar to a carry-save adder. The main idea is to augment the long accumulator with cache information. The cache is used to store the carries or borrows, instead of propagating them through the whole accumulator every time. Due to the cache, operations are kept local in our approach. The full information of the exact result is represented by the accu and the cache. When we want to deliver the result we have to add the contents of the cache into the accumulator. In this paper we present an implementation in software and compare it with other approaches. Furthermore we discuss the advantages of this algorithm for a hardware implementation.
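The carry cache can be pictured with a heavily simplified sketch. It assumes a small fixed-point accumulator made of 32-bit words and only supports adding or subtracting one word at a given position; aligning floating point mantissas and products, which a real long accumulator needs, is left out. The type `CachedAccumulator` and its members are hypothetical names for this illustration and are not taken from the paper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct CachedAccumulator {
    static constexpr std::size_t N = 4;       // number of 32-bit accu words (assumed)
    std::array<std::uint32_t, N> limb{};      // accu, least significant word first
    std::array<std::int64_t, N + 1> cache{};  // cache[k]: pending carries (+) or borrows (-) into word k

    // Add v at word k; a carry is only recorded in the cache, not propagated.
    void add(std::size_t k, std::uint32_t v) {
        std::uint64_t s = std::uint64_t(limb[k]) + v;
        limb[k] = static_cast<std::uint32_t>(s);
        cache[k + 1] += static_cast<std::int64_t>(s >> 32);
    }

    // Subtract v at word k; a borrow is only recorded in the cache.
    void sub(std::size_t k, std::uint32_t v) {
        if (limb[k] < v) cache[k + 1] -= 1;
        limb[k] -= v;  // wraps modulo 2^32
    }

    // Deliver the result: fold the cache into the accu in one pass.
    void resolve() {
        const std::int64_t base = std::int64_t(1) << 32;
        std::int64_t carry = 0;
        for (std::size_t k = 0; k < N; ++k) {
            std::int64_t s = std::int64_t(limb[k]) + cache[k] + carry;
            std::int64_t digit = ((s % base) + base) % base;  // non-negative remainder
            carry = (s - digit) / base;                       // floor division
            limb[k] = static_cast<std::uint32_t>(digit);
            cache[k] = 0;
        }
        cache[N] += carry;  // overflow or sign of the total stays in the top cache slot
    }
};
```

Each `add` or `sub` touches one accu word and one cache entry, so the operation stays local. The exact value is represented by `limb` and `cache` together until `resolve` is called.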
filib++, Expression Templates and the Coming Interval Standard.
Reliable Computing, 15(4):312-320, 2011.
Marco Nehmeier and Jürgen Wolff von Gudenberg.
[doi]
[Abstract]
[BibTeX]
In this paper we investigate how a C++ class library can be improved
by the concept of expression templates. Our first result is a saving of
rounding mode switches which considerably increases the performance.
Our second result deals with handling the discontinuity flag that will
probably be decided to be raised whenever a function is called outside its
domain (loose evaluation). We discuss several alternatives and propose
an expression related flag that can be used in a thread safe manner.
Both results are reviewed with respect to the coming IEEE standard
for interval arithmetic.
Interval Comparisons and Lattice Operations based on the Interval Overlapping Relation.
In: Proceedings of the World Conference on Soft Computing 2011 (WConSC'11).
San Francisco, CA, USA, 2011.
Marco Nehmeier and Jürgen Wolff von Gudenberg.
[doi]
[Abstract]
[BibTeX]
The interval overlapping relation defines 13 different states of the relative position of two intervals. In this paper we show that this relation can be used to define the interval comparisons and lattice operations. The shift from a boolean (binary) comparison to a comprehensive relation (13 different states) enables the user to exploit involved dependencies. We further introduce an object oriented abstract datatype to provide a user-friendly interface.
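A classification into 13 states can be sketched as below. The state names follow the familiar Allen-style naming and the function assumes non-degenerate intervals (lower bound strictly below upper bound); both are assumptions made for this example, and the paper's own definition, in particular its treatment of point intervals, may differ. `Interval`, `Overlap`, `classify` and `disjoint` are illustrative names.

```cpp
struct Interval { double lo, hi; };  // assumed: lo < hi

enum class Overlap {
    Before, Meets, Overlaps, Starts, ContainedBy, Finishes, Equal,
    FinishedBy, Contains, StartedBy, OverlappedBy, MetBy, After
};

// Relative position of a with respect to b.
Overlap classify(Interval a, Interval b) {
    if (a.hi < b.lo)  return Overlap::Before;
    if (a.hi == b.lo) return Overlap::Meets;
    if (b.hi < a.lo)  return Overlap::After;
    if (b.hi == a.lo) return Overlap::MetBy;
    // From here on the interiors of a and b intersect.
    if (a.lo == b.lo) {
        if (a.hi == b.hi) return Overlap::Equal;
        return a.hi < b.hi ? Overlap::Starts : Overlap::StartedBy;
    }
    if (a.hi == b.hi)
        return a.lo > b.lo ? Overlap::Finishes : Overlap::FinishedBy;
    if (a.lo < b.lo)
        return a.hi < b.hi ? Overlap::Overlaps : Overlap::Contains;
    return a.hi < b.hi ? Overlap::ContainedBy : Overlap::OverlappedBy;
}

// A boolean comparison derived from the state, as one example.
bool disjoint(Interval a, Interval b) {
    Overlap s = classify(a, b);
    return s == Overlap::Before || s == Overlap::After;
}
```

Once the state is known, comparisons such as `disjoint` reduce to a test on the enumeration value and need no further floating-point comparisons.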
Solving Decidability Problems with Interval Arithmetic.
Reliable Computing, 15(3):279-289, 2011.
German Tischler and Jürgen Wolff von Gudenberg.
[doi]
[BibTeX]