Globus automation services: Research process automation across the space–time continuum

Published: 09 March 2023

Abstract

Research process automation (the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources) has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, called flows, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: (1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; (2) a simple specification and an extensible asynchronous action provider API for defining and executing a wide variety of actions and flows involving heterogeneous resources; (3) an event-driven execution model for automating execution of flows in response to arbitrary events; and (4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.
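The flow specification mentioned in point (2) builds on the Amazon States Language. As a hedged illustration (not taken from the paper), a hypothetical two-state transfer-then-analyze flow might be sketched as follows; the action provider URLs, endpoint fields, and state names are placeholder assumptions, not real deployments:

```python
# Hypothetical two-state flow in the Amazon States Language style used by
# Globus Flows: transfer data, then invoke a compute action. All action
# provider URLs and input fields here are illustrative placeholders.
flow_definition = {
    "Comment": "Transfer raw data, then run analysis",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://actions.example.org/transfer",  # placeholder
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_endpoint",
                "destination_endpoint_id.$": "$.input.dest_endpoint",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.dest_path",
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "RunAnalysis",
        },
        "RunAnalysis": {
            "Type": "Action",
            "ActionUrl": "https://actions.example.org/analyze",  # placeholder
            "Parameters": {"data_path.$": "$.input.dest_path"},
            "ResultPath": "$.AnalysisResult",
            "End": True,
        },
    },
}

def linked_states(defn):
    """Follow Next pointers from StartAt and return the visited state names."""
    visited, state = [], defn["StartAt"]
    while state is not None:
        visited.append(state)
        state = defn["States"][state].get("Next")
    return visited

print(linked_states(flow_definition))  # ['TransferData', 'RunAnalysis']
```

Because each state names an asynchronous action provider by URL, a cloud-hosted engine can persist the flow's position between states, which is what allows a single flow to span seconds to weeks despite sporadic failures.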

Highlights

The research process automation problem, fundamental to modern science, is defined.

A research process automation approach based on cloud-hosted services is proposed.

New Globus automation services are described that implement this approach.

Benchmark results and experiences at large scientific instruments are reported.
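The event-driven execution model highlighted above follows the trigger-action pattern common at instrument facilities: an event (such as a detector writing a file) starts a flow. A minimal, self-contained sketch of that pattern is below; `start_flow` is a hypothetical stand-in for invoking a Globus flow, and a production trigger would use filesystem events rather than polling:

```python
import os
import tempfile

def poll_once(directory, seen, start_flow):
    """Fire start_flow once for each file not yet seen in directory.

    Sketch of the trigger half of trigger-action automation: the callback
    stands in for submitting a flow run with the new file as input.
    """
    for name in sorted(os.listdir(directory)):
        if name not in seen:
            seen.add(name)
            start_flow({"input": {"path": os.path.join(directory, name)}})

started = []  # records each simulated flow invocation
with tempfile.TemporaryDirectory() as d:
    seen = set()
    poll_once(d, seen, started.append)              # empty directory: no trigger
    open(os.path.join(d, "scan-001.h5"), "w").close()  # instrument writes a file
    poll_once(d, seen, started.append)              # triggers one flow run

print(len(started))  # 1
```

Keeping the `seen` set outside the poll loop is what makes the trigger idempotent: repeated polls of an unchanged directory start no additional flows.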

