ABSTRACT
Modern distributed systems can benefit from large-scale, heterogeneous computing infrastructures. However, the complexity and dynamic nature of these environments call for self-adaptation abilities, because static configurations can rarely guarantee both efficient resource usage and acceptable service levels.
In this talk, we discuss a hierarchical auto-scaling approach for distributed applications, where application-level managers steer the overall process by supervising component-level adaptation managers. Following a bottom-up approach, we first discuss how to exploit model-free and model-based reinforcement learning to compute auto-scaling policies for each component. Then, we show how Bayesian optimization can be used to automatically configure the lower-level auto-scalers based on application-level objectives. As a case study, we consider distributed data stream processing applications, which process high-volume data flows in near real-time and cope with varying and unpredictable workloads.
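To make the component-level step concrete, the following is a minimal sketch of model-free reinforcement learning (tabular Q-learning) that learns a scaling policy for a single component. It is not the method presented in the talk. The state space (replica count and a three-level workload), the cost function, the workload dynamics, and all names (`cost`, `step`, `train`, `policy`) are illustrative assumptions.

```python
import random

# Hypothetical toy model (assumption): one component with 1..MAX_REPLICAS
# replicas and a workload level in {0, 1, 2} (low, medium, high).
MAX_REPLICAS = 5
LOADS = (0, 1, 2)
ACTIONS = (-1, 0, 1)  # remove a replica, do nothing, add a replica


def cost(replicas, load):
    """Assumed cost: resource usage plus a penalty when under-provisioned."""
    demand = 2 * load + 1  # replicas needed for this load (assumption)
    sla_penalty = 10.0 if replicas < demand else 0.0
    return replicas + sla_penalty


def step(replicas, load, action, rng):
    """Apply a scaling action, then let the workload drift randomly."""
    replicas = min(MAX_REPLICAS, max(1, replicas + action))
    load = min(2, max(0, load + rng.choice((-1, 0, 1))))
    return replicas, load, cost(replicas, load)


def train(episodes=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table of expected discounted costs, one entry per state-action."""
    rng = random.Random(seed)
    q = {(r, l): [0.0] * len(ACTIONS)
         for r in range(1, MAX_REPLICAS + 1) for l in LOADS}
    state = (1, 0)
    for _ in range(episodes):
        # Epsilon-greedy: explore occasionally, otherwise take the cheapest action.
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))
        else:
            a = min(range(len(ACTIONS)), key=lambda i: q[state][i])
        r, l, c = step(state[0], state[1], ACTIONS[a], rng)
        nxt = (r, l)
        # Q-learning update, written for cost minimization (min instead of max).
        target = c + gamma * min(q[nxt])
        q[state][a] += alpha * (target - q[state][a])
        state = nxt
    return q


def policy(q, replicas, load):
    """Greedy scaling decision for a given state under the learned Q-table."""
    best = min(range(len(ACTIONS)), key=lambda i: q[(replicas, load)][i])
    return ACTIONS[best]


if __name__ == "__main__":
    q_table = train()
    for load in LOADS:
        print("load", load, "->", [policy(q_table, r, load)
                                   for r in range(1, MAX_REPLICAS + 1)])
```

In a hierarchical setting such as the one described above, quantities like the under-provisioning penalty in `cost` are the kind of lower-level parameter that an application-level manager could tune, for instance with Bayesian optimization. That mapping is an assumption of this sketch rather than a detail stated in the abstract.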
Using Reinforcement Learning to Control Auto-Scaling of Distributed Applications