Author: Abdelfattah, Ahmad : Search

6 Results for: Author: Abdelfattah, Ahmad

Searched The ACM Full-Text Collection (691,749 records)

Showing 1 - 6 of 6 Results

research-article
November 2022
Results Reproduced / v1.1
Artifacts Evaluated & Functional / v1.1
Artifacts Available / v1.1
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022, Article No.: 26, pp 1–14

Many scientific applications rely on sparse direct solvers for their numerical robustness. However, performance optimization for these solvers remains a challenging task, especially on GPUs. This is due to workloads of small dense matrices that are ...
Metrics
Total Citations: 0
Total Downloads: 36
Last 12 Months: 36
Last 6 weeks: 12
Supplementary Material
SC22_Presentation_Ahmad.mp4
research-article
Public Access
June 2021
Published By ACM
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
ACM Transactions on Mathematical Software (TOMS), Volume 47, Issue 3, September 2021, Article No.: 21, pp 1–23, https://doi.org/10.1145/3431921

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a ...
Metrics
Total Citations: 11
Total Downloads: 433
Last 12 Months: 270
Last 6 weeks: 33
research-article
Public Access
August 2019
Published By ACM
Massively Parallel Automated Software Tuning
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing, August 2019, Article No.: 92, pp 1–10, https://doi.org/10.1145/3337821.3337908

This article presents an implementation of a distributed autotuning engine developed as part of the Bench-testing OpenN Software Autotuning Infrastructure project. The system is geared towards performance optimization of computational kernels for ...
Metrics
Total Citations: 4
Total Downloads: 311
Last 12 Months: 49
Last 6 weeks: 3
research-article
Public Access
June 2017
Published By ACM
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs
ICS '17: Proceedings of the International Conference on Supercomputing, June 2017, Article No.: 5, pp 1–10, https://doi.org/10.1145/3079079.3079103

This paper presents a software framework for solving large numbers of relatively small matrix problems using GPUs. Our approach combines novel and existing HPC techniques to methodically apply performance analysis, kernel design, low-level optimizations,...
Metrics
Total Citations: 12
Total Downloads: 510
Last 12 Months: 69
Last 6 weeks: 7
research-article
May 2016
Published By ACM
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
ACM Transactions on Mathematical Software (TOMS), Volume 42, Issue 3, June 2016, Article No.: 18, pp 1–31, https://doi.org/10.1145/2818311

KBLAS is an open-source, high-performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory accesses, ...
Metrics
Total Citations: 24
Total Downloads: 335
Last 12 Months: 27
Last 6 weeks: 2
research-article
November 2014
Pipelining computational stages of the tomographic reconstructor for multi-object adaptive optics on a multi-GPU system
SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2014, pp 262–273, https://doi.org/10.1109/SC.2014.27

The European Extremely Large Telescope project (E-ELT) is one of Europe's highest priorities in ground-based astronomy. ELTs are built on top of a variety of highly sensitive and critical astronomical instruments. In particular, a new instrument called ...
Metrics
Total Citations: 5
Total Downloads: 140
Last 12 Months: 2
Last 6 weeks: 0