Cloud systems are becoming increasingly powerful and complex. It is highly challenging to identify anomalous execution behaviors and pinpoint problems by examining the overwhelming intermediate results/states in complex application workflows. Domain ...
With the development of computer technology, large amounts of data are stored and analyzed, which provides a new perspective for analyzing social and economic problems and assisting scientific decision-making. Tourism is the main source of revenues for ...
Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation ...
Scientific data publications may include interactive data applications designed by scientists to explore a scientific problem. Defined as knowledge systems, their development is complex when data are aggregated from multiple sources over time. ...
Across various scientific domains, digital publications of technical documents, often in the form of conference or journal article submissions, are the first accessible instances of new human knowledge in their respective fields. Synthesizing and curating ...
With the emergence of data-driven analysis and data science, the need for high-performance computing resources and cyberinfrastructure (CI) has spread across almost all domain fields in academia. While CI providers have continued success ...
Big data analytics pipelines have become popular for large-volume data processing. Apache Zeppelin provides an integrated environment for data ingestion, data discovery, data analytics, data visualization, and collaboration, with an extended framework which ...
Although R has become an analytic platform for many scientific domains, high performance has rarely been a trait of R. The inefficiency can come from the R programming specification itself or the interpreter environment implementation. Profiling and ...
Transportation agencies often own extensive networks of monocular traffic cameras, which are typically used for traffic monitoring. However, the information captured by such cameras can also be of great value for transportation planning and operations ...
Increasingly, digital libraries and archives need, and are using, cyberinfrastructure and machine learning to meet curation, data management, and researchers' needs. This workshop focuses on facilitating adoption and integration between these spaces. It ...
Research computing centers provide a wide variety of services including large-scale computing resources, data storage, high-speed interconnect and scientific software repositories to facilitate continuous competitive research. Efficient management of ...
The value of research data not only resides in its content, but in how it is made available to users. Research data is often presented interactively through a web application, the design of which is often the result of years of work by researchers. ...
XALT is a job-monitoring tool to collect accurate, detailed, and continuous job level and link-time data on all MPI jobs running on a computing cluster. Due to its usefulness and complementariness to other system logs, XALT has been deployed on Stampede ...
A dataset from the field of High Performance Computing (HPC) was curated with a focus on facilitating its reuse and appealing to a broader audience beyond HPC specialists. At an early stage in the research project, the curators gathered requirements ...
Recent advances in cytometry instrumentation are enabling the generation of "big data" at the single cell level for the identification of cell-based biomarkers, which will fundamentally change the current paradigm of diagnosis and personalized treatment ...
We present a case of archival analysis using a combination of data mining methods. The team of researchers, composed of archivists and computer scientists, used a collection of declassified Department of State Cables as a case study. The methods ...
An emerging need in information retrieval is to identify a set of documents conforming to an abstract description. This task presents two major challenges to existing methods of document retrieval and classification. First, similarity based on overall ...
High-resolution display environments consisting of many individual displays arrayed to form a single visible surface are commonly used to present large scale data. Using these displays often involves a control paradigm where interactions become ...
We develop and evaluate a version of the excluded middle vantage point forest in support of range searches and load balancing for parallel queries. The algorithm is evaluated using a benchmark suite that includes real-world biological sequence ...
This paper describes the use of relational database management system (RDBMS) and treemap visualization to represent and analyze a group of personal digital collections created in the context of work and with no external metadata. We evaluated the ...