Author: Dongarra, Jack : Search

131 Results for: Author: Dongarra, Jack

Searched The ACM Full-Text Collection (691,749 records)

Showing 1 - 20 of 131 Results

research-article
November 2022
Results Reproduced / v1.1
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022, Article No.: 26, pp 1–14

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are ...
Total Citations: 0; Total Downloads: 36 (last 12 months: 36, last 6 weeks: 12)
Supplementary Material: SC22_Presentation_Ahmad.mp4

research-article
November 2022
Reshaping geostatistical modeling and prediction for extreme-scale environmental applications
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022, Article No.: 2, pp 1–12

We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit ...
Total Citations: 0; Total Downloads: 71 (last 12 months: 71, last 6 weeks: 20)
Supplementary Material: reshaping_geostatistical_modeling_and_prediction_for_extreme-scale_environmental_applications.mp4 (1080p).mp4

research-article
Public Access
June 2021
Published By ACM
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
ACM Transactions on Mathematical Software (TOMS), Volume 47, Issue 3, September 2021, Article No.: 21, pp 1–23, https://doi.org/10.1145/3431921

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a ...
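As a rough illustration of the batching idea in this abstract (many independent operations on small matrices, grouped and processed by a single routine), here is a minimal pure-Python sketch. The names `gemm` and `batched_gemm` and their argument order are hypothetical and are not the Batched BLAS API defined in the article; a real implementation would hand the whole batch to optimized, typically parallel or GPU, kernels instead of looping in Python.

```python
from typing import List

Matrix = List[List[float]]


def gemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Return alpha * A @ B + beta * C for small dense row-major matrices."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def batched_gemm(alpha: float, a_batch: List[Matrix], b_batch: List[Matrix],
                 beta: float, c_batch: List[Matrix]) -> List[Matrix]:
    """Hypothetical batched interface: one call applies the same GEMM
    (same alpha and beta) to every independent triple (A_i, B_i, C_i).

    The problems share no data, so a real library is free to run them
    concurrently; this sketch simply processes them in order.
    """
    if not (len(a_batch) == len(b_batch) == len(c_batch)):
        raise ValueError("all batch arrays must have the same length")
    return [gemm(alpha, a, b, beta, c)
            for a, b, c in zip(a_batch, b_batch, c_batch)]


if __name__ == "__main__":
    # Two independent 2x2 problems handled by a single batched call.
    a_batch = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 0.0], [0.0, 2.0]]]
    b_batch = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
    c_batch = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    print(batched_gemm(1.0, a_batch, b_batch, 1.0, c_batch))
```

The per-problem arithmetic is unchanged by batching; the point of the single call is that a library can, in principle, schedule the whole group at once rather than paying call and launch overhead for each tiny matrix.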
Total Citations: 11; Total Downloads: 433 (last 12 months: 270, last 6 weeks: 33)

research-article
October 2020
Published By ACM
Using Advanced Vector Extensions AVX-512 for MPI Reductions
EuroMPI/USA '20: Proceedings of the 27th European MPI Users' Group Meeting, September 2020, pp 1–10, https://doi.org/10.1145/3416315.3416316

As the scale of high-performance computing (HPC) systems continues to grow, researchers have devoted themselves to exploring increasing levels of parallelism to achieve optimal performance. The modern CPU’s design, including its features of hierarchical ...
Total Citations: 3; Total Downloads: 132 (last 12 months: 36, last 6 weeks: 3)

research-article
June 2020
Published By ACM
Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications
PASC '20: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2020, Article No.: 2, pp 1–11, https://doi.org/10.1145/3394277.3401846

Climate and weather can be predicted statistically via geospatial Maximum Likelihood Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE-based iterative optimization procedure requires the solving of large-scale ...
Total Citations: 17; Total Downloads: 373 (last 12 months: 61, last 6 weeks: 5)

research-article
Public Access
March 2020
Published By ACM
Load-balancing Sparse Matrix Vector Product Kernels on GPUs
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, March 2020, Article No.: 2, pp 1–26, https://doi.org/10.1145/3380930

Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that ...
Total Citations: 18; Total Downloads: 1,370 (last 12 months: 455, last 6 weeks: 52)

research-article
Public Access
November 2019
Published By ACM
SLATE: design of a modern distributed and accelerated linear algebra library
SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2019, Article No.: 26, pp 1–18, https://doi.org/10.1145/3295500.3356223

The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to provide fundamental dense linear algebra capabilities for current and upcoming distributed high-performance systems, both accelerated CPU-GPU based and CPU based. ...
Total Citations: 35; Total Downloads: 1,207 (last 12 months: 348, last 6 weeks: 46)

research-article
Public Access
August 2019
Published By ACM
Massively Parallel Automated Software Tuning
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing, August 2019, Article No.: 92, pp 1–10, https://doi.org/10.1145/3337821.3337908

This article presents an implementation of a distributed autotuning engine developed as part of the Bench-testing OpeN Software Autotuning Infrastructure project. The system is geared towards performance optimization of computational kernels for ...
Total Citations: 4; Total Downloads: 311 (last 12 months: 49, last 6 weeks: 3)

research-article
July 2019
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2018, Article No.: 47, pp 1–11, https://doi.org/10.1109/SC.2018.00050

Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) ...
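The mixed-precision iterative refinement named in this title can be sketched as follows: factor the matrix once in low precision, then repeatedly compute the residual in double precision and correct the solution using the cheap low-precision factors. This is a minimal pure-Python illustration under stated assumptions, not the tensor-core solver from the paper: low precision is emulated by rounding to IEEE single precision (float32) via `struct` rather than FP16, the LU factorization uses no pivoting (so the matrix is assumed to be diagonally dominant), and all function names are hypothetical.

```python
import struct
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def to_low(x: float) -> float:
    """Round a double to the nearest float32 (stand-in for low precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_low(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU factorization without pivoting, rounding each entry to float32."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = sum(lower[i][k] * upper[k][j] for k in range(i))
            upper[i][j] = to_low(a[i][j] - s)
        lower[i][i] = 1.0
        for j in range(i + 1, n):
            s = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = to_low((a[j][i] - s) / upper[i][i])
    return lower, upper


def solve_low(lower: Matrix, upper: Matrix, rhs: Vector) -> Vector:
    """Forward and back substitution with the low-precision factors."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = to_low(rhs[i] - sum(lower[i][k] * y[k] for k in range(i)))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(upper[i][k] * x[k] for k in range(i + 1, n))
        x[i] = to_low((y[i] - s) / upper[i][i])
    return x


def refine(a: Matrix, b: Vector, tol: float = 1e-12,
           max_iter: int = 20) -> Tuple[Vector, int]:
    """Mixed-precision iterative refinement; returns (x, corrections applied)."""
    n = len(b)
    lower, upper = lu_low(a)        # O(n^3) work, done once in low precision
    x = solve_low(lower, upper, b)  # initial low-precision solution
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        # Residual r = b - A x computed in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        d = solve_low(lower, upper, r)       # correction from the cheap factors
        x = [x[i] + d[i] for i in range(n)]  # update kept in double precision
    return x, max_iter


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [6.0, 12.0, 14.0]  # exact solution is [1, 2, 3]
    x, iters = refine(a, b)
    print(x, iters)
```

The expensive O(n^3) factorization happens only once, in low precision; each refinement step costs O(n^2) work, which is why the scheme pays off when low-precision arithmetic is much faster than double precision and the matrix is not too ill-conditioned.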
Total Citations: 17; Total Downloads: 114 (last 12 months: 32, last 6 weeks: 1)

research-article
Public Access
June 2019
Published By ACM
Least squares solvers for distributed-memory machines with GPU accelerators
ICS '19: Proceedings of the ACM International Conference on Supercomputing, June 2019, pp 117–126, https://doi.org/10.1145/3330345.3330356

This work presents an implementation of a linear least squares solver for distributed-memory machines with GPU accelerators, developed as part of the Software for Linear Algebra Targeting Exascale (SLATE) package. From the algorithmic standpoint, the ...
Total Citations: 2; Total Downloads: 285 (last 12 months: 32, last 6 weeks: 3)

research-article
Public Access
June 2019
Published By ACM
Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software
PASC '19: Proceedings of the Platform for Advanced Scientific Computing Conference, June 2019, Article No.: 9, pp 1–11, https://doi.org/10.1145/3324989.3325719

We present an automated performance evaluation framework that enables an automated workflow for testing and performance evaluation of software libraries. Integrating this component into an ecosystem enables sustainable software development, as a ...
Total Citations: 4; Total Downloads: 424 (last 12 months: 95, last 6 weeks: 9)

research-article
Open Access
May 2019
Published By ACM
PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP
ACM Transactions on Mathematical Software (TOMS), Volume 45, Issue 2, June 2019, Article No.: 16, pp 1–35, https://doi.org/10.1145/3264491

The recent version of the Parallel Linear Algebra Software for Multicore Architectures (PLASMA) library is based on tasks with dependencies from the OpenMP standard. The main functionality of the library is presented. Extensive benchmarks are targeted ...
Total Citations: 18; Total Downloads: 3,138 (last 12 months: 700, last 6 weeks: 69)

research-article
November 2018
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2018, Article No.: 47, pp 1–11

Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) ...
Total Citations: 0; Total Downloads: 662 (last 12 months: 28, last 6 weeks: 4)

research-article
June 2018
Published By ACM
ADAPT: an event-based adaptive collective communication framework
HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, June 2018, pp 118–130, https://doi.org/10.1145/3208040.3208054

The increase in scale and heterogeneity of high-performance computing (HPC) systems predisposes the performance of Message Passing Interface (MPI) collective communications to be susceptible to noise, and to adapt to a complex mix of hardware ...
Total Citations: 12; Total Downloads: 361 (last 12 months: 48, last 6 weeks: 3)

research-article
Public Access
November 2017
Published By ACM
Investigating half precision arithmetic to accelerate dense linear system solvers
ScalA '17: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017, Article No.: 10, pp 1–8, https://doi.org/10.1145/3148226.3148237

The use of low-precision arithmetic in mixed-precision computing methods has been a powerful tool to accelerate numerous scientific computing applications. Artificial intelligence (AI) in particular has pushed this to current extremes, making use of ...
Total Citations: 35; Total Downloads: 728 (last 12 months: 135, last 6 weeks: 16)

research-article
November 2017
Published By ACM
Dynamic task discovery in PaRSEC: a data-flow task-based runtime
ScalA '17: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017, Article No.: 6, pp 1–8, https://doi.org/10.1145/3148226.3148233

Successfully exploiting distributed collections of heterogeneous many-core architectures with complex memory hierarchies through a portable programming model is a challenge for application developers. The literature is not short of proposals addressing ...
Total Citations: 40; Total Downloads: 294 (last 12 months: 79, last 6 weeks: 15)

research-article
Public Access
November 2017
Published By ACM
Flexible batched sparse matrix-vector product on GPUs
ScalA '17: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017, Article No.: 3, pp 1–8, https://doi.org/10.1145/3148226.3148230

We propose a variety of batched routines for concurrently processing a large collection of small-size, independent sparse matrix-vector products (SpMV) on graphics processing units (GPUs). These batched SpMV kernels are designed to be flexible in order ...
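To make the batched SpMV idea concrete, the following minimal pure-Python sketch computes y_k = A_k x_k for a list of small, independent sparse matrices stored in CSR (compressed sparse row) format. CSR is an assumption here and not necessarily the storage format used in the paper; `CsrMatrix` and `batched_spmv_csr` are hypothetical names. On a GPU, each matrix (or each row) could be mapped to its own group of threads; this loop only shows the data layout and the arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CsrMatrix:
    """Small sparse matrix in compressed sparse row (CSR) format."""
    n_rows: int
    row_ptr: List[int]   # length n_rows + 1; row i spans row_ptr[i]:row_ptr[i + 1]
    col_idx: List[int]   # column index of each stored nonzero
    values: List[float]  # value of each stored nonzero


def spmv_csr(a: CsrMatrix, x: List[float]) -> List[float]:
    """Compute y = A x for a single CSR matrix."""
    y = [0.0] * a.n_rows
    for i in range(a.n_rows):
        acc = 0.0
        for p in range(a.row_ptr[i], a.row_ptr[i + 1]):
            acc += a.values[p] * x[a.col_idx[p]]
        y[i] = acc
    return y


def batched_spmv_csr(mats: List[CsrMatrix],
                     xs: List[List[float]]) -> List[List[float]]:
    """Hypothetical batched interface: y_k = A_k x_k for every k.

    The products are independent, so a GPU implementation could process
    them concurrently; this sketch evaluates them one after another.
    """
    if len(mats) != len(xs):
        raise ValueError("need one input vector per matrix")
    return [spmv_csr(a, x) for a, x in zip(mats, xs)]


if __name__ == "__main__":
    # A_0 = [[2, 0], [1, 3]] and A_1 = [[0, 5], [0, 0]], both 2x2.
    a0 = CsrMatrix(2, [0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0])
    a1 = CsrMatrix(2, [0, 1, 1], [1], [5.0])
    print(batched_spmv_csr([a0, a1], [[1.0, 1.0], [2.0, 4.0]]))
```

Because only the nonzeros are stored, the cost of each product is proportional to its number of nonzeros; matrices in one batch may have different sizes and sparsity patterns, which is what makes balancing the work across them nontrivial on a GPU.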
Total Citations: 4; Total Downloads: 285 (last 12 months: 32, last 6 weeks: 3)

proceeding
November 2017
Published By ACM
ScalA '17: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems
Novel scalable scientific algorithms are needed in order to enable key science applications to exploit the computational power of large-scale systems. This is especially true for the current tier of leading petascale machines and the road to exascale ...
Total Citations: 109; Total Downloads: 2,284 (last 12 months: 359, last 6 weeks: 43)

research-article
Public Access
June 2017
Published By ACM
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs
ICS '17: Proceedings of the International Conference on Supercomputing, June 2017, Article No.: 5, pp 1–10, https://doi.org/10.1145/3079079.3079103

This paper presents a software framework for solving large numbers of relatively small matrix problems using GPUs. Our approach combines novel and existing HPC techniques to methodically apply performance analysis, kernel design, low-level optimizations,...
Total Citations: 12; Total Downloads: 510 (last 12 months: 69, last 6 weeks: 7)

research-article
Public Access
February 2017
Published By ACM
High-performance Cholesky factorization for GPU-only execution
GPGPU-10: Proceedings of the General Purpose GPUs, February 2017, pp 42–52, https://doi.org/10.1145/3038228.3038237

We present our performance analysis, algorithm designs, and the optimizations needed for the development of high-performance GPU-only algorithms, and in particular, for the dense Cholesky factorization. In contrast to currently promoted designs that ...
Total Citations: 9; Total Downloads: 815 (last 12 months: 202, last 6 weeks: 33)