Abstract
Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively controlling what data to release is difficult, the few data-sharing consortia that exist are often built around data-sharing agreements resulting from long and tedious one-off negotiations.
We introduce Data Station, a data escrow designed to enable the formation of data-sharing consortia. Data owners share data with the escrow knowing it will not be released without their consent. Data users delegate their computation to the escrow. The data escrow relies on delegated computation to execute queries without releasing the data first. Data Station leverages hardware enclaves to generate trust among participants, and exploits the centralization of data and computation to generate an audit log.
We evaluate Data Station on machine learning and data-sharing applications while running on an untrusted intermediary. In addition to important qualitative advantages, we show that Data Station: i) outperforms federated learning baselines in accuracy and runtime for the machine learning application; ii) is orders of magnitude faster than alternative secure data-sharing frameworks; and iii) introduces small overhead on the critical path.
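The escrow workflow described above (owners deposit data, users delegate computation, and every request is logged) can be illustrated with a minimal in-memory sketch. This is not the Data Station implementation or its API: the class `ToyEscrow`, its methods, and the example dataset and function names are all hypothetical, and the sketch leaves out hardware enclaves, encryption, and any real isolation of the delegated code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class ToyEscrow:
    """Hypothetical in-memory escrow; an illustration, not the Data Station API."""

    _data: Dict[str, Any] = field(default_factory=dict)
    _functions: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    # Each policy entry is (user, function name, dataset name).
    _policy: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Each log entry is (user, function name, dataset name, allowed?).
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def deposit(self, dataset: str, payload: Any) -> None:
        """A data owner shares data with the escrow."""
        self._data[dataset] = payload

    def register_function(self, name: str, fn: Callable[[Any], Any]) -> None:
        """Register a computation that users may ask the escrow to run."""
        self._functions[name] = fn

    def consent(self, user: str, function: str, dataset: str) -> None:
        """The owner allows `user` to run `function` over `dataset`."""
        self._policy.add((user, function, dataset))

    def run(self, user: str, function: str, dataset: str) -> Any:
        """Run a delegated computation; only its result leaves the escrow."""
        allowed = (user, function, dataset) in self._policy
        # Log every request, including denied ones.
        self.audit_log.append((user, function, dataset, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not run {function} on {dataset}")
        return self._functions[function](self._data[dataset])


def mean_age(rows: List[Dict[str, int]]) -> float:
    """Example aggregate that reveals only a summary statistic."""
    return sum(r["age"] for r in rows) / len(rows)


escrow = ToyEscrow()
escrow.deposit("patients", [{"age": 30}, {"age": 50}])
escrow.register_function("mean_age", mean_age)
escrow.consent("alice", "mean_age", "patients")
result = escrow.run("alice", "mean_age", "patients")  # 40.0
```

In this sketch the raw rows never leave the escrow: a user receives only the function's return value, and a request that the owner has not consented to raises `PermissionError` while still being recorded in `audit_log`.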