Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively ...
The training of deep neural network models on large data remains a difficult problem, despite progress towards scalable techniques. In particular, there is a mismatch between the random but predetermined order in which AI flows select training samples ...
Serverless computing presents an attractive model for general distributed computing as it focuses on abstracting the infrastructure required to execute an application. This workshop investigates the intersection between high performance computing and ...
Modern scientific instruments, such as detectors at synchrotron light sources, generate data at such high rates that online processing is needed for data reduction, feature detection, experiment steering, and other purposes. The same high data rates ...
It is our pleasure to welcome you to the second workshop on High Performance Serverless Computing (HiPS2022). Recent years have seen growing adoption of serverless computing as a model for computing in the cloud, as well as a model for remote and ...
In an in-situ workflow, multiple components such as simulation and analysis applications are coupled with streaming data transfers. The multiplicity of possible configurations necessitates an auto-tuner for workflow optimization. Existing auto-tuning ...
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. ...
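For context, the core K-FAC approximation (a sketch of the standard published formulation, not material from this paper): for a fully connected layer l with input activations a_{l-1} and pre-activation gradients g_l, the Fisher block is approximated by a Kronecker product,

    F_l \approx A_{l-1} \otimes G_l, \qquad
    A_{l-1} = \mathbb{E}\big[a_{l-1} a_{l-1}^{\top}\big], \qquad
    G_l = \mathbb{E}\big[g_l g_l^{\top}\big],

which lets the preconditioned update be applied without ever forming F_l:

    W_l \leftarrow W_l - \eta\, G_l^{-1} \big(\nabla_{W_l} \mathcal{L}\big)\, A_{l-1}^{-1}.

Storing and inverting the per-layer factors A_{l-1} and G_l is what drives the larger memory footprint noted above.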
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In ...
We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply ...
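The extractor-selection idea can be sketched as a simple dispatch loop (hypothetical names throughout; this is not Xtract's actual API):

    import mimetypes

    # Stand-in extractors (hypothetical): each returns a metadata record.
    def extract_tabular(path):
        return {"extractor": "tabular", "path": path}

    def extract_text(path):
        return {"extractor": "text", "path": path}

    EXTRACTORS = {"text/csv": extract_tabular, "text/plain": extract_text}

    def dispatch(paths):
        # Probe each file's type cheaply, then apply the matching extractor.
        for path in paths:
            mime, _ = mimetypes.guess_type(path)
            extractor = EXTRACTORS.get(mime)
            if extractor is not None:
                yield extractor(path)

    print(list(dispatch(["results.csv", "notes.txt", "image.bin"])))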
It is our pleasure to welcome you to the first workshop on High Performance Serverless Computing (HiPS2021). Serverless computing is poised to become not only the face of cloud computing in the commercial world, but also a model for remote and ...
Recent advances in networking technology and serverless architectures have enabled automated distribution of compute workloads at the function level. As heterogeneity and physical distribution of computing resources increase, so too does the need to ...
Ptychography is an advanced high-resolution X-ray imaging technique that can generate extremely large datasets. Ptychographic reconstruction transforms reciprocal space experimental data to high-resolution 2D real-space images. GPUs have been used ...
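A standard forward model clarifies what reconstruction inverts (a textbook formulation, not specific to this paper): with probe P, object O, and scan positions r_j, the detector measures far-field intensities

    I_j(q) = \big|\,\mathcal{F}\big[P(r - r_j)\, O(r)\big]\,\big|^2,

and ptychographic reconstruction recovers the real-space object O (and typically the probe P) from the set of I_j, a phase-retrieval problem whose scale motivates the heavy use of GPUs.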
Atomistic-scale simulations are prominent scientific applications that require the repetitive execution of a computationally expensive routine to calculate a system's potential energy. Prior work shows that these expensive routines can be replaced with ...
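A toy sketch of that replacement (illustrative only; the routine, descriptor, and surrogate here are stand-ins, not from the paper):

    import numpy as np

    def expensive_energy(r):
        # Stand-in for a costly potential-energy routine (Lennard-Jones pair term).
        return 4.0 * (r ** -12 - r ** -6)

    # Fit a cheap surrogate on sampled evaluations of the expensive routine.
    r_train = np.linspace(0.9, 2.5, 200)
    coeffs = np.polyfit(r_train, expensive_energy(r_train), deg=9)
    surrogate = np.poly1d(coeffs)

    # Inside the simulation loop, call the surrogate instead of the expensive routine.
    r = 1.12
    print(expensive_energy(r), surrogate(r))  # surrogate gives a rough approximation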
Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and ...
Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python ...
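To illustrate the annotation model, a minimal Parsl program using the library's bundled local-threads configuration (a generic sketch, not an example from the paper):

    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config

    parsl.load(config)  # run apps on local threads; swap config to target clusters or clouds

    @python_app
    def double(x):
        # Each call returns a future; Parsl schedules it on the loaded executor.
        return 2 * x

    futures = [double(i) for i in range(4)]
    print([f.result() for f in futures])  # blocks until done: [0, 2, 4, 6]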
In-situ parallel workflows couple multiple component applications via streaming data transfer to avoid data exchange via shared file systems. Such workflows are challenging to configure for optimal performance due to the huge space of possible ...
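The configuration problem can be sketched as a search loop over that space (hypothetical names; a toy illustration, not the paper's tuner):

    import random

    # Toy configuration space for a coupled simulation/analysis workflow.
    SPACE = {"buffer_mb": [64, 128, 256], "writers": [1, 2, 4], "transport": ["tcp", "rdma"]}

    def measure(config):
        # Stand-in for running the workflow with `config` and timing its throughput.
        return random.random()

    # Random search: sample candidate configurations and keep the best-measured one.
    candidates = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(20)]
    best = max(candidates, key=measure)
    print(best)

Real auto-tuners use smarter search strategies than random sampling, but the structure is the same: propose, measure, keep the best.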
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) method was recently proposed as an approximation ...
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high-quality 3D volumetric images from 2D X-ray images; however, ...
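The abstract is truncated before naming an algorithm; as one representative iterative scheme (an illustration, not necessarily the method used here), the SIRT update is

    x^{(k+1)} = x^{(k)} + \lambda\, C A^{\top} R \left(b - A x^{(k)}\right),

where A is the system (projection) matrix, b the measured projection data, R and C diagonal row- and column-sum normalizations of A, and \lambda a relaxation parameter. Each iteration costs one forward projection and one backprojection, which is why such methods are computationally demanding.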
The traditional model of having simulations write data to disk for offline analysis can be prohibitively expensive on computers with limited storage capacity or I/O bandwidth. In situ data analysis has emerged as a necessary paradigm to address this ...
Persistent identifiers (PIDs) are essential for making data Findable, Accessible, Interoperable, and Reusable, or FAIR. While the advantages of PIDs for data publication and citation are well understood, and Digital Object Identifiers (DOIs) are ...