Computations on matrices


Showing 1 - 20 of 85 Results

Article
June 2022
Batch QR Factorization on GPUs: Design, Optimization, and Tuning
Computational Science – ICCS 2022, Jun 2022, pp 60–74. https://doi.org/10.1007/978-3-031-08751-6_5
Abstract
QR factorization of dense matrices is a ubiquitous tool in high performance computing (HPC). From solving linear systems and least squares problems to eigenvalue problems, and singular value decompositions, the impact of a high performance QR ...
Total Citations: 0

research-article
July 2021
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 4, Jul 2021, pp 344–369. https://doi.org/10.1177/10943420211003313

The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine ...
Total Citations: 6

research-article
Public Access
June 2021
Published By ACM
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
ACM Transactions on Mathematical Software (TOMS), Volume 47, Issue 3, September 2021, Article No.: 21, pp 1–23. https://doi.org/10.1145/3431921

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a ...
Total Citations: 11; Total Downloads: 433; Last 12 Months: 270; Last 6 weeks: 33

research-article
November 2020
MAGMA templates for scalable linear algebra on emerging architectures
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 34, Issue 6, Nov 2020, pp 645–658. https://doi.org/10.1177/1094342020938421

With the acquisition and widespread use of more resources that rely on accelerator/wide vector–based computing, there has been a strong demand for science and engineering applications to take advantage of these latest assets. This, however, has been ...
Total Citations: 1

research-article
Public Access
March 2020
Published By ACM
Load-balancing Sparse Matrix Vector Product Kernels on GPUs
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, March 2020, Article No.: 2, pp 1–26. https://doi.org/10.1145/3380930

Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that ...
Total Citations: 18; Total Downloads: 1,370; Last 12 Months: 455; Last 6 weeks: 52

research-article
Public Access
November 2019
Published By ACM
SLATE: design of a modern distributed and accelerated linear algebra library
SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2019, Article No.: 26, pp 1–18. https://doi.org/10.1145/3295500.3356223

The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to provide fundamental dense linear algebra capabilities for current and upcoming distributed high-performance systems, both accelerated CPU-GPU based and CPU based. ...
Total Citations: 35; Total Downloads: 1,207; Last 12 Months: 348; Last 6 weeks: 46

research-article
September 2019
Distributed-memory lattice H-matrix factorization
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 33, Issue 5, Sep 2019, pp 1046–1063. https://doi.org/10.1177/1094342019861139

We parallelize the LU factorization of a hierarchical low-rank matrix (H-matrix) on a distributed-memory computer. This is much more difficult than the H-matrix-vector multiplication due to the dataflow of the factorization, and it is much harder than the ...
Total Citations: 2

Article
August 2019
Linear Systems Solvers for Distributed-Memory Machines with GPU Accelerators
Euro-Par 2019: Parallel Processing, Aug 2019, pp 495–506. https://doi.org/10.1007/978-3-030-29400-7_35
Abstract
This work presents two implementations of linear solvers for distributed-memory machines with GPU accelerators—one based on the Cholesky factorization and one based on the LU factorization with partial pivoting. The routines are developed as part ...
Total Citations: 0

research-article
Public Access
June 2019
Published By ACM
Least squares solvers for distributed-memory machines with GPU accelerators
ICS '19: Proceedings of the ACM International Conference on Supercomputing, June 2019, pp 117–126. https://doi.org/10.1145/3330345.3330356

This work presents an implementation of a linear least squares solver for distributed-memory machines with GPU accelerators, developed as part of the Software for Linear Algebra Targeting Exascale (SLATE) package. From the algorithmic standpoint, the ...
Total Citations: 2; Total Downloads: 285; Last 12 Months: 32; Last 6 weeks: 3

Article
June 2018
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques
Computational Science – ICCS 2018, Jun 2018, pp 586–600. https://doi.org/10.1007/978-3-319-93698-7_45
Abstract
As parallel computers approach exascale, power efficiency in high-performance computing (HPC) systems is of increasing concern. Exploiting both the hardware features and algorithms is an effective solution to achieve power efficiency, and to ...
Total Citations: 2

research-article
May 2018
Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs
Parallel Computing (PACO), Volume 74, Issue C, May 2018, pp 3–18. https://doi.org/10.1016/j.parco.2017.10.004
Highlights

Accelerates all three phases of the singular value decomposition using a GPU.
...
Abstract
The increasing gap between memory bandwidth and computation speed motivates the choice of algorithms to take full advantage of today’s high performance computers. For dense matrices, the classic algorithm for the singular value ...
Total Citations: 3

research-article
Public Access
November 2015
Published By ACM
Adaptive precision solvers for sparse linear systems
E2SC '15: Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, November 2015, Article No.: 2, pp 1–10. https://doi.org/10.1145/2834800.2834802

We formulate an implementation of a Jacobi iterative solver for sparse linear systems that iterates the distinct components of the solution with different precision in terms of mantissa length. Starting with very low accuracy, and using an inexpensive ...
Total Citations: 7; Total Downloads: 310; Last 12 Months: 36; Last 6 weeks: 1

research-article
Public Access
November 2015
Published By ACM
Mixed-precision block gram Schmidt orthogonalization
ScalA '15: Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2015, Article No.: 2, pp 1–8. https://doi.org/10.1145/2832080.2832082

The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly to the condition number of the input matrix. However, when the desired ...
Total Citations: 5; Total Downloads: 279; Last 12 Months: 40; Last 6 weeks: 3

research-article
Public Access
November 2015
Published By ACM
Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs
SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2015, Article No.: 60, pp 1–11. https://doi.org/10.1145/2807591.2807613

A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been ...
Total Citations: 5; Total Downloads: 332; Last 12 Months: 26; Last 6 weeks: 1

research-article
Public Access
February 2015
Published By ACM
Optimization for performance and energy for batched matrix computations on GPUs
GPGPU-8: Proceedings of the 8th Workshop on General Purpose Processing using GPUs, February 2015, pp 59–69. https://doi.org/10.1145/2716282.2716288

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size independent problems. Many applications already need this functionality, ...
Total Citations: 6; Total Downloads: 362; Last 12 Months: 44; Last 6 weeks: 7

abstract
Public Access
January 2015
Published By ACM
Towards batched linear solvers on accelerated hardware platforms
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, January 2015, pp 261–262. https://doi.org/10.1145/2688500.2688534

As hardware evolves, an increasingly effective approach to develop energy efficient, high-performance solvers, is to design them to work on many small and independent problems. Indeed, many applications already need this functionality, especially for ...
Also Published in:
ACM SIGPLAN Notices: Volume 50 Issue 8, August 2015
Total Citations: 13; Total Downloads: 405; Last 12 Months: 33; Last 6 weeks: 0

research-article
May 2014
Published By ACM
clMAGMA: high performance dense linear algebra with OpenCL
IWOCL '14: Proceedings of the International Workshop on OpenCL 2013 & 2014, May 2014, Article No.: 1, pp 1–9. https://doi.org/10.1145/2664666.2664667

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms in OpenCL. In particular, these are linear system solvers and eigenvalue problem solvers. Further, we give an overview of the clMAGMA library, ...
Total Citations: 9; Total Downloads: 186; Last 12 Months: 8; Last 6 weeks: 3

research-article
January 2014
Communication-Avoiding Symmetric-Indefinite Factorization
SIAM Journal on Matrix Analysis and Applications (SIMAX), Volume 35, Issue 4, 2014, pp 1364–1406. https://doi.org/10.1137/130929060

We describe and analyze a novel symmetric triangular factorization algorithm. The algorithm is essentially a block version of Aasen's triangular tridiagonalization. It factors a dense symmetric matrix $A$ as the product $A=PLTL^{T}P^{T},$ where $P$ is a ...
Total Citations: 0

research-article
June 2013
Published By ACM
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication
ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing, June 2013, pp 223–232. https://doi.org/10.1145/2464996.2465438

The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe a successful methodology on ...
Total Citations: 3; Total Downloads: 218; Last 12 Months: 0; Last 6 weeks: 0

research-article
February 2013
Published By ACM
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms
ACM Transactions on Mathematical Software (TOMS), Volume 39, Issue 2, February 2013, Article No.: 9, pp 1–10. https://doi.org/10.1145/2427023.2427026

Four routines called DPOTF3i, i = a,b,c,d, are presented. DPOTF3i are a novel type of level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and LAPACK routine DPOTRF. Performance of routines DPOTF3i are still increasing when the ...
Total Citations: 1; Total Downloads: 248; Last 12 Months: 4; Last 6 weeks: 2
