Contents
Introduction
Sylvester-Type
Matrix Equations
Triangular
Matrix Equations
Usage
Download
Prerequisites
Building
Installation
Invoking
SLICOT/LAPACK
Wrappers
Native
routines
SMP
Parallel Native Routines
Performance
Aspects
Machine Characteristics
Workspace
Matrix-matrix
Multiply
OpenMP
QZ Factorizing
Routine
Other Sylvester-Type Projects from Umeå
Revision
History
Contact
References
Acknowledgements and Copyright
|
|
RECSY is a library for solving
triangular Sylvester-type matrix equations. Its objectives are both speed
and reliability. In order to achieve these goals, RECSY is based on novel
recursive blocked algorithms, which call new high-performance
kernels for solving small-sized leaf problems of the recursion tree [Jonsson
and Kågström, 2002a, 2002b]. In contrast to explicit standard
blocking techniques, our recursive approach leads to an automatic variable
blocking that has the potential of matching the memory hierarchies of
today’s HPC systems. The RECSY library comprises a set of Fortran 90
files, which uses recursion and OpenMP for shared memory parallelism to
solve eight different matrix equations, including continuous-time as well
as discrete-time standard and generalized Sylvester and Lyapunov
equations.
The Sylvester-type matrix
equations considered are a set of linear matrix equations that appear and
are commonly used in different control theory applications.
In the RECSY library, we
differentiate between one-sided and two-sided matrix equations. We use the
notation one-sided matrix equations when the solution is only
involved in matrix products of two matrices, e.g., op(A)X or
Xop(A), where op(A) can be A or
AT. In
two-sided matrix equations, the solution is involved in matrix
products of three matrices, both to the left and to the right, e.g., op(A)Xop(B).
In the table below, we list the
cases of the eight different types of matrix equations
considered together with the acronyms used (one-sided equations in the top
and two-sided equations in the bottom part).
| Name |
Matrix equation |
Acronym |
| Standard Sylvester (CT) |
op(A)X
± Xop(B) = C |
SYCT |
| Standard Lyapunov (CT) |
op(A)X
+ Xop(AT) = C |
LYCT |
| Generalized coupled Sylvester |
( op(A)X
± Yop(B) = C,
op(D)X ± Yop(E) = F
) |
GCSY |
|
|
|
| Standard Sylvester (DT) |
op(A)Xop(B)
± X = C |
SYDT |
| Standard Lyapunov (DT) |
op(A)Xop(AT)
-
X = C |
LYDT |
| Generalized Sylvester |
op(A)Xop(B)
± op(C)Xop(D) = E |
GSYL |
| Generalized Lyapunov (CT) |
op(A)Xop(AT)
-
op(E)Xop(ET) = C |
GLYCT |
| Generalized Lyapunov (DT) |
op(A)Xop(ET)
+ op(E)Xop(AT) = C |
GLYDT |
The RECSY library provides the
solution to various “transpose” variants of these matrix equations with
plus or minus signs. For example, eight different variants of the
continuous-time Sylvester equation, which in condensed form can be
expressed as
op(A)X
± Xop(B)
= scale
C,
where op(A) can be A
or AT and op(B) can be B or BT.
The scalar scale is an output scaling factor (0
£
scale £
1), set to avoid overflow in X during the computations. In general,
the solution overwrites the right hand side (e.g., C
¬
X).
The classical
method of solution of the Sylvester-type matrix equations is based on the
Bartels-Stewart method, which includes three major steps. First, the
matrix (or
matrix pair)
is transformed to a Schur (or
generalized Schur)
form. This leads to a reduced triangular matrix equation, which is solved
using the RECSY library. For example, the coefficient matrices A
and B
in the Sylvester equation
AX - XB = C are in upper triangular or upper
quasi-triangular form. Finally, the solution of the reduced matrix
equation is transformed back to the originally coordinate system.
Reliable and efficient algorithms
for the reduction step can be found in LAPACK for the standard case, and
in [Dackland-Kågström
1999] for the generalized case, where a blocked variant of the QZ method
is presented.
Triangular matrix equations also
appear naturally in estimating the condition numbers of matrix equations
and different eigenspace computations, including block-diagonalization of
matrices and matrix pairs and computation of functions of matrices.
Related applications include the direct reordering of eigenvalues in the
real (generalized) Schur form and the computation of additive
decompositions of a (generalized) transfer function (see [Jonsson-Kågström
2002a, 2002b] for more information and references).
This section describes how to use the library in
your applications.
New version (2009-12-28): The source code including makefiles is available in
.tar.gz format and
in .zip format.
Old version (2003): The source code including makefiles is available in
.tar.gz format and
in .zip format.
Prerequisites
In order to build the library, you need:
- A compiler. Included in the package are makefiles for IBM
XL
Fortran, Portland Group
PGHPF, and Compaq
Visual Fortran. It should not
be difficult to modify the makefiles to match another compiler, though.
If explicit parallelization is required, the compiler must support
OpenMP.
- A good implementation of
BLAS routines, e.g., your
vendor's, or a free implementation like
ATLAS.
- An implementation of
LAPACK routines.
- The
SLICOT library.
- GNU make 3.78 or higher, if you are building under Unix.
Building
- After downloading and unpacking, try make
-f Makefile.xxx librecsy.a, where xxx matches your
compiler. Or, if you are
using Visual Fortran, open the recsy.dsw workspace, select the librecsy
project and choose Build. This will compile all the files necessary for
the native routines and build the
library.
-
However,
if you want to use the SLICOT/LAPACK
wrappers (recommended), you will need to compile for wrapper files
and archive them into the SLICOT/LAPACK libraries. An example how
this can be done is by using make libslicot_lapack_recsy.a.
This will first compile the archive files, then create a huge library
file, containing SLICOT, LAPACK and RECSY, see below. Before doing this,
you may have to modify Makefile and set the variables LIBSLICOT and
LIBLAPACK so they point to your original libraries. These libraries will
not be modified. If you are using Visual Fortran, select the
libslicot_lapack_recsy project and choose Build. Also here, you will
probably need to modify libslicot_lapack_recsy.mak.
There is currently no installation routine. You may copy the library file (libslicot_lapack_recsy.a
or libslicot_lapack_recsy.lib) to an appropriate library location, e.g., /usr/local/lib, in order for the linker to
find it. Then link our program using -lslicot_lapack_recsy,
or, in Visual Fortran, by adding the library to the Object/library modules
in the Link tab in project settings.
There are two ways of using the library. One way is to use the standard
SLICOT/LAPACK routines for solving Sylvester-type equations. This has
three advantages. First, you don't need to modify your already existing
application. Not a line. Plug and play. Second, the SLICOT library
includes routines for solving full equations, i.e., they include
the Schur factorizing and back-transformation steps. These routines benefit from the
faster triangular solvers in this package. Also, they include condition
estimation. Thirdly, LAPACK and SLICOT routines include parameter
checking.
The other way, if you choose to walk on the wild side, is by calling
the recursive subroutines directly. Although they give a rough-cut
interface without parameter checking, this method has its advantages too.
Firstly, it provides some transposition options not provided in SLICOT/LAPACK.
Secondly, it gives you more freedom to choose workspace allocation,
although this is not recommended.
This said, it is recommended that you start out with the SLICOT/LAPACK
routines.
The following routines are overloaded in the RECSY library.
As they are overloaded, performance for routines using these subroutines,
e.g., SB04PD, will improve as well.
| Routine |
Native routine |
SLICOT SB03MX |
RECLYDT |
SLICOT SB03MY |
RECLYCT |
SLICOT SB04PY |
RECSYDT |
SLICOT SG03AX |
RECGLYDT |
SLICOT SG03AY |
RECGLYCT |
LAPACK DTRSYL |
RECSYCT |
LAPACK DTGSYL |
RECGCSY * |
* The RECGCSY routine is only used in
the case where TRANS='N' and
IJOB=0.
For the other cases, the original LAPACK code will be used.
We remark that neither LAPACK nor
SLICOT provide a GSYL solver.
To get the largest performance
gain from overloading RECSY routines, it is important that the SLICOT and
LAPACK libraries are linked with highly optimized BLAS libraries.
By calling the native routines, you obtain access to
all cases available in the RECSY library. Note however, that there is no parameter
checking. Also, these routines only solve quasi-triangular problems.
Directly calling the native routines requires explicit interfaces to be
declared in your (Fortran 90) source file, since some arguments are
optional. This is done by including the file RECSY_DECL.f90 (.F in Unix).
For example:
SUBROUTINE MYSUB(A)
DOUBLE PRECISION A(10,10)
INCLUDE 'RECSY_DECL.f90'
CALL RECSYCT(...)
END SUBROUTINE
The following native sequential routines are defined in the library.
Some parameters are declared OPTIONAL, they are shown in
italics.
RECGCSY( UPLOSIGN, SCALE, M, N, A, LDA, B, LDB, C, LDC,
D, LDD, E, LDE, F, LDF, INFO, MACHINE )Solves the generalized coupled triangular Sylvester
equation. A is upper quasitriangular M´M,
D is upper triangular M´M,
B is is upper quasitriangular N´N,
E is upper triangular N´N,
C and F are rectangular M´N.
UPLOSIGN |
GCSY equation |
| 0 |
( AX + YB = scale
C,
DX + YE = scale F ),
(C, F)¬
(X,
Y) |
| 1 |
( AX -
YB =
scale C, DX -
YE =
scale F ),
(C, F)¬
(X,
Y) |
| 2 |
( AX + YBT = scale
C,
DX + YET = scale F
),
(C, F)¬
(X,
Y) |
| 3 |
( AX -
YBT = scale C, DX
- YET = scale F
),
(C, F)¬
(X,
Y) |
| 4 |
( ATX + YB
= scale C, DTX + YE
= scale F ),
(C, F)¬
(X,
Y) |
| 5 |
( ATX -
YB = scale C, DTX -
YE = scale F ),
(C, F)¬
(X,
Y) |
| 6 |
( ATX + YBT =
scale C, DTX + YET =
scale F ),
(C, F)¬
(X,
Y) |
| 7 |
( ATX -
YBT = scale C, DTX -
YET = scale F ),
(C, F)¬
(X,
Y) |
RECGSYL( UPLOSIGN, SCALE, M, N, A, LDA,
B, LDB, C, LDC, D, LDD, E, LDE, INFO, MACHINE, WORKSPACE,
WKSIZE )
Solves the triangular generalized Sylvester equation.
A is upper quasitriangular M´M,
C is upper triangular M´M,
B is is upper quasitriangular N´N,
D is upper triangular N´N,
C and F are rectangular M´N.
UPLOSIGN |
GSYL equation |
| 0 |
AXB + CXD = scale
E,
E
¬
X |
| 1 |
AXB -
CXD =
scale E,
E
¬
X |
| 2 |
AXBT + CXDT = scale
E,
E
¬
X |
| 3 |
AXBT
- CXDT = scale E,
E
¬
X |
| 4 |
ATXB + CTXD
= scale E,
E
¬
X |
| 5 |
ATXB -
CTXD = scale E,
E
¬
X |
| 6 |
ATXBT +
CTXDT =
scale E,
E
¬
X |
| 7 |
ATXBT -
CTXDT = scale
E,
E
¬
X |
| 10 |
AXBT + CXDT = scale
E,
E
¬
X, B is is upper triangular N´N,
D is upper quasitriangular N´N
(used by RECGLYCT) |
| 12 |
ATXB + CTXD
= scale E,
E
¬
X, B is is upper triangular N´N,
D is upper quasitriangular N´N
(used by RECGLYCT) |
RECGLYDT( UPLO, SCALE, M, A, LDA, E,
LDE, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
Solves the triangular generalized discrete-time Lyapunov equation.
A is upper quasitriangular M´M,
E is upper triangular M´M,
C is symmetric M´M, only
upper part of C is accessed.
UPLO |
GLYDT equation |
| 0 |
AXAT -
EXET = scale C,
C
¬
X |
| 1 |
ATXA -
ETXE = scale C,
C
¬
X |
RECGLYCT( UPLO, SCALE, M, A, LDA, E,
LDE, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
Solves the triangular generalized continuous-time Lyapunov equation.
A is upper quasitriangular M´M,
E is upper triangular M´M,
C is symmetric M´M, only
upper part of C is accessed.
UPLO |
GLYCT equation |
| 0 |
AXET + AXET = scale
C,
C
¬
X |
| 1 |
ATXE + ATXE = scale
C,
C
¬
X |
Parameter types and meaning:
| Parameter |
Intent |
Type |
Meaning |
UPLOSIGN, UPLO |
IN |
INTEGER |
Defines different options (see above). |
A, B, D |
IN |
DOUBLE
PRECISION(*) |
Coefficient matrices (see above). |
C, E |
IN, INOUT |
DOUBLE
PRECISION(*) |
Coefficient/right hand side matrices (see above). |
F |
INOUT |
DOUBLE
PRECISION(*) |
Right hand side matrix (see above). |
SCALE |
OUT |
DOUBLE PRECISION |
Scaling factor, to avoid overflow in the computed solution. |
LDA, LDB, LDC,
LDD, LDE, LDF |
IN |
INTEGER |
Leading dimension of matrices A to F. |
INFO |
OUT |
INTEGER |
INFO=-100:
Out of memory
INFO=0: No error
INFO>0: The equation is (nearly) singular to working
precision; perturbed values were used to solve the equation. |
MACHINE |
IN |
DOUBLE
PRECISION(10) |
Optional. Specify only if you want
different parameters than the default parameters, which are set by RECSY_MACHINE. |
WORKSPACE |
INOUT |
DOUBLE
PRECISION(*) |
Optional. Specify only
if you provide your own workspace. |
WKSIZE |
INOUT |
INTEGER |
Optional.
Specify only if you provide your own workspace, or if you want to know
the required workspace size. |
In addition to the routines listed above, a set of
parallelized routines for OpenMP platforms are provided. The signatures
for these routines mimic the sequential routines, except for the extension
"_P" to the subroutine name and an extra parameter
PROCS. PROCS, of type INTEGER,
specifies the number of processors to use (³1).
Due to the nature of the recursion, this parallelization works best when
PROCS = 2d. However, other values may work
well, also.
Regardless if the sequential or parallel routines
are used, SMP optimized BLAS routines (DGEMM) should be used.
RECSYCT_P( PROCS, UPLOSIGN, SCALE, M,
N, A, LDA, B, LDB, C, LDC, INFO, MACHINE )
RECLYCT_P( PROCS, UPLO, SCALE, M, A, LDA, C, LDC,
INFO, MACHINE )
RECGCSY_P( PROCS, UPLOSIGN, SCALE, M, N, A, LDA, B, LDB, C, LDC,
D, LDD, E, LDE, F, LDF, INFO, MACHINE )
RECSYDT_P( PROCS, UPLOSIGN, SCALE, M,
N, A, LDA, B, LDB, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
RECLYDT_P( PROCS, UPLO, SCALE, M, A,
LDA, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
RECGSYL_P( PROCS, UPLOSIGN, SCALE, M,
N, A, LDA, B, LDB, C, LDC, D, LDD, E, LDE, INFO, MACHINE, WORKSPACE,
WKSIZE )
RECGLYDT_P( PROCS, UPLO, SCALE, M, A,
LDA, E, LDE, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
RECGLYCT_P( PROCS, UPLO, SCALE, M, A,
LDA, E, LDE, C, LDC, INFO, MACHINE, WORKSPACE, WKSIZE )
Like all high-performance routines, there are
tunable parameters. However, the number is not that great, since recursion
takes care of most of the blocking. All machine-characteristic parameters
are given in the vector MACHINE(0:9), which is initialized by the subroutine
REC_MACHINE. The elements of the vector are:
| Element |
Definition (default value) |
| 0 (first) |
Smallest number (DLAMCH('S')/DLAMCH('P')
is default value) |
| 1 |
Largest number (1D0/MACHINE(0)
is default value) |
| 2 |
Machine precision (DLAMCH('P')
is default value) |
| 3 |
Threshold for choosing rectangular matrix-matrix multiply
instead of triangular matrix-matrix multiply. (40 is default value).
For updates of the form C = C + A×B
where A or B is triangular and all matrix
dimensions are smaller than MACHINE(3), the built in
matrix-matrix multiply routine or DGEMM is used, see
below. For larger problems, A×B is
formed by a call to DTRMM using a temporary workspace,
and C is then updated. |
| 4 |
Threshold for choosing built in matrix-matrix multiply routine (40
is default value).
For matrix products where matrix dimensions are smaller than
MACHINE(4), the built-in matrix-matrix multiply routine
is used. For larger problems, BLAS DGEMM is used. |
| 5 |
Threshold for sequential routine
(3D6 is default value).
For problems which are smaller than MACHINE(5), no more
parallel splits are done. Instead, the (remaining) problem is solved
by sequential solver. The size of the problem is the number of flops
required to solve the problem. |
| 6 |
Reserved |
| 7 |
Reserved |
| 8 |
Reserved |
| 9 |
Reserved |
Since the routines are implemented in Fortran 90,
the subroutines can dynamically allocate their own workspace. If, for
performance or memory management reasons, you would like to provide your
own buffer, specify WORKSPACE and set WKSIZE to
the size of your buffer. If it is
large enough, it will be used. If it is too small, the routine will
allocate its own buffer anyway. A special case is WKSIZE=-1.
Then WKSIZE is overwritten with the minimum workspace
necessary to solve the problem, but no problem is solved and the matrices
are never referenced. Required workspace differs between the routines, but
no routine requires more than M×N+4
elements. The number may change in future releases.
Following the ideas from GEMM-based
level 3 BLAS (Kågström, Ling and
Van Loan, 1999), the recursive algorithms rely highly on a fast
matrix-matrix multiply (MM) routine. For small problems, it is important
that the MM routine is light-weight. The authors have found that some MM
implementations do not perform optimally on small problems, because of
parameter checking and/or unnecessary data copying. Because of this, a
very lightweight MM implementation which features 4´4
outer loop unrolling is provided, showing good performance on different
types of processors. This MM is intended for matrices which fit in level 1
cache. However, if your
DGEMM
routine performs well for small problems as well, you might want to lower
MACHINE(4),
possibly to zero.
In order to get the explicit parallelization from
the _P subroutines, a compiler which supports OpenMP is necessary. If your
system has more than two processors, your compiler should support nested
OpenMP directives. That is, all OpenMP compilers should support nested
directives according to the specifications, but some compilers do not
parallelize inner directives. Also, the overhead for OpenMP parallelism
varies between systems. For systems whose OpenMP overhead is large, higher
performance might be obtained by increasing
MACHINE(5).
Whilst the RECSY library reduces
the time for solving the triangular equations, it is only one part of
solving the complete dense matrix equation problem. Looking at only the
flop counts, the factorization part should take substantially
longer time. Previously, this was not the case, as most of the Sylvester
solvers were not based on true blocked algorithms. With the RECSY library,
the pressure is once again on the Schur factorization routines. One
significant improvement is to use the blocked algorithms for the
generalized Schur decomposition described in
Dackland and Kågström, 1999.
Hopefully, there will not be much need for digging into the internals
of the library. Yet, if there would be, you might want to know a bit of
the internals of the subroutines:
- The
SMP parallel subroutines call the sequential routine whenever the
problem is too small, or when there is only one processor assigned to
the work. Notice that
MACHINE(5)
cannot be too small for the parallel routines to work.
- The
sequential routines for solving Sylvester equations call a kernel solver
(ends with
_TR0
-
_TR7)
for problems where M, N
£
4. If M or N is greater than 4, a recursive split is done.
If M
£
2N, the N dimension is split. If N
£
2M, the M dimension is split. Otherwise, both dimensions
are split. For the SMP parallel subroutines, the split of both
dimensions leads to a recursion tree where two of the four branches are
done in parallel.
- The
kernel solver (
_TRx)
solves several different small matrix equations (leafs in the recursion
tree). This is done by constructing the Kronecker product representation
of each matrix equation and solving the resulting linear system Zx
= c using an optimized kernel with partial pivoting and overflow
guarding. If the kernel solver detects a (near) singularity or risk of
overflow, the kernel aborts.
- If
the kernel solver aborts, the recursive subroutine backtracks and we
construct a new Kronecker product representation now leading to a matrix
Z of size 4´4
- 32´32,
which is solved using complete pivoting.
-
The sequential routines for solving Lyapunov equations do recursion down
to M
£
2. When this limit is reached, the small problem is solved using partial
pivoting or complete pivoting for very ill-conditioned cases.
-
The SMP parallel subroutines for solving Lyapunov equations do not
utilize OpenMP in themselves, but they call SMP parallel routines for
solving Sylvester equations.
For more details we refer to
Jonsson and Kågström 2001a, 2001b, 2002a, 2002b. The
routine and call hierarchy is shown in this
PDF file.
Currently, new efficient
algorithms for ScaLAPACK-type Sylvester-type solvers and new ScaLAPACK-type
generalized Schur factorization routines are being developed. For more
information contact Bo Kågström (bokg@cs.umu.se).
2003-02-12: First release, 0.01alpha. Web page
released.
The authors are glad to receive any comments, bug
reports, or suggestions regarding the RECSY library. E-mail address:
isak@cs.umu.se.
A small subset of relevant references applicable to the library. For
further references, see Jonsson and
Kågström,
2002a and 2002b.
- Bartels, R. H.
and Stewart, G. W. 1972. Algorithm 432: Solution of the equation
AX+XB=C, Commun. ACM 15, 9, 820–826.
- Dackland, K. and Kågström, B.
1999. Blocked algorithms and software for reduction of a regular
matrix pair to generalized Schur form. ACM Trans. Math. Softw. 25,
4, 425–454.
- Jonsson, I. and Kågströöm, B.
2001a. Recursive blocked algorithms for solving triangular matrix
equations — Part I: One-sided and coupled Sylvester-type equations.
SLICOT Working Note 2001-4.
- Jonsson, I. and Kågström, B.
2001b. Recursive blocked algorithms for solving triangular matrix
equations — Part II: Two-sided and generalized Sylvester and Lyapunov
equations. SLICOT Working Note 2001-5.
- Jonsson, I. and Kågström, B.
2002a. Recursive blocked algorithms for solving triangular systems — Part
I: One-sided and coupled Sylvester-type matrix equations. ACM Trans.
Math. Softw. 28, 4, (Dec.), 392-415.
- Jonsson, I. and Kågström, B.
2002b. Recursive blocked algorithms for solving triangular systems — Part
II: Two-sided and generalized Sylvester and Lyapunov matrix equations.
ACM Trans. Math. Softw. 28, 4 (Dec.), 416-435.
- Kressner, D.,
Mehrmann, V., and Penzl, T.
1999a. CTLEX — A collection of benchmark examples for continuous-time
Lyapunov equations. SLICOT Working Note 1999–6.
- Kressner, D.,
Mehrmann, V., and Penzl, T.
1999b. DTLEX — A collection of benchmark examples for discrete-time
Lyapunov equations. SLICOT Working Note 1999–7.
- Penzl, T. 1998.
Numerical solution of generalized Lyapunov equations, Adv. Comp.
Math. 8, 33–48.
- SLICOT. Library and the Numerics in Control Network (NICONET) Web
site:
http://www.win.tue.nl/niconet/index.html.
Authors are Isak Jonsson and Bo Kågström, Dept of
Computing Science and HPC2N, Umeå University, SE-901 87 Umeå, Sweden.
Financial support for this project has been provided by
the Swedish Research Council under grants TFR 98-604, VR
621-2001-3284 and the Swedish Foundation for Strategic Research
under grant A3 02:128.
The library is freely available for academic (non-commercial) use, and is
provided on an "as is" basis. Provided in this package are parts of the SLICOT and
LAPACK libraries, which themselves may impose other terms. (SLICOT terms
are found here. LAPACK terms
are found here.)
Any use of the RECSY library should be acknowledged by citing the papers
Jonsson, I. and Kågström, B.
2002a and 2002b above. |