| 研究生: |
陳幼恩 Chen, Yu-En |
|---|---|
| 論文名稱: |
基於多雲架構之多方數據主成份分析隱私保護技術 Privacy Protection Technology for Principal Component Analysis of Multi-Party Data Based on Multi-Cloud Architecture |
| 指導教授: |
左瑞麟
Tso, Ray-Lin |
| 口試委員: |
陳昱圻
Chen, Yu-Chi 楊明豪 Yang, Ming-Hour 蔡東佐 Tsai, Tung-Tso 王紹睿 Wang, Shao-Jui |
| 學位類別: |
碩士
Master |
| 系所名稱: |
資訊學院 - 資訊安全碩士學位學程 Master Program in Information Security |
| 論文出版年: | 2025 |
| 畢業學年度: | 114 |
| 語文別: | 英文 |
| 論文頁數: | 68 |
| 中文關鍵詞: | 隱私保護 、秘密共享 、主成分分析 、多方計算 |
| 外文關鍵詞: | Privacy-preserving, Secret Sharing, Principal component analysis, Multi party Computation |
| 相關次數: | 點閱:496 下載:9 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
隨著隱私保護需求在資料分析與機器學習領域的重要性日益提升,多方資料環境下的安全協同計算已成為關鍵課題。為此,本研究提出一項基於多雲架構之多方數據主成分分析(PCA)隱私保護技術,能於無需集中明文資料的情況下,完成分散式的特徵萃取任務。本技術整合門限式秘密共享、混淆機制與協同計算協議,有效避免任何單一雲端節點接觸原始資料或中間統計資訊。本方法以保護協方差矩陣為核心,結合多方資料提供者與多雲節點,實現可擴展且具安全保證的主成分特徵計算流程。各方僅需共享經過秘密分割的統計量,即可在不暴露任何原始內容的前提下,完成特徵值與特徵向量之協同運算。最終所輸出的主成分資訊可供資料消費者用於後續模型訓練與分析任務,兼顧隱私保護與實用性。本技術設計強調對多方數據場景的適應性,並在安全性、通訊與計算開銷之間取得良好平衡,具備高度擴展潛力,適用於醫療、金融與工業等需嚴格隱私保護之跨域應用場景。
As the demand for privacy protection in data analytics and machine-learning continues to grow, secure collaborative computation over multi-party data has become a critical research topic. To address this challenge, we propose a privacy-preserving principal component analysis (PCA) technique based on a multi-cloud architecture, which performs distributed feature extraction without aggregating plaintext data. The proposed technique combines threshold secret sharing, obfuscation mechanisms, and collaborative-computing protocols, thereby preventing any single cloud node from accessing raw data or intermediate statistical information. Focusing on the protection of the covariance matrix, the method links multiple data providers with multiple cloud nodes to create a scalable, provably secure workflow for computing principal components. Each party shares only secret-split statis tics, enabling the joint calculation of eigenvalues and eigenvectors without revealing any original content. The resulting principal-component information can be delivered to data consumers for downstream model training and analysis, ensuring both privacy and usability. Designed for multi-party data scenarios, the architecture strikes an effective balance among security, communication, and computational overhead. It offers strong scalability and is well-suited to cross-domain applications—such as healthcare, finance, and industrial settings—where stringent privacy protection is essential.
1 Introduction 1
1.1 Motivation 2
1.2 Contributions 4
1.3 Organization 5
2 Related Works 7
2.1 Homomorphic Encryption-Based Approaches 7
2.2 Secret Sharing and MPC-Based Approaches 8
2.3 Secret Sharing-Based Dual-Cloud Approaches 8
3 Preliminaries 9
3.1 Secret Sharing 9
3.2 Beaver Triples 11
3.3 Similar Matrix. 16
3.4 Inverse of Square Root 18
3.5 Principle Components Analysis 19
4 System Architecture 23
4.1 System Model 23
4.2 Threat Model 25
5 Proposed Protocols 27
5.1 Beaver Triple 28
5.2 Covariance Matrix 30
5.3 Obfuscated Covariance Matrix 31
5.4 Provider Selection and Distribution 33
5.5 Inverse Square Root 34
5.6 Eigenvalues and Eigenvectors 35
5.7 Dimensionality Reduction 37
5.8 MSSPCA 38
6 Security Analysis 43
6.1 Security of Individual Protocols 44
7 Experiments and Results 55
7.1 Experimental Setup 55
7.2 Correctness Evaluation 55
7.3 Efficiency Evaluation 58
7.4 Scalability Evaluation 59
7.5 Parameter Settings in Secure Computation 60
7.6 Experimental Summary 61
8 Conclusion 63
References 65
[1] Daniel E. O’Leary. “Artificial Intelligence and Big Data.” In: IEEE Intelligent Sys- tems 28.2 (2013), pp. 96–99 (cit. p. 1).
[2] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, et al. “Practical Secure Aggrega- tion for Privacy-Preserving Machine Learning.” In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS ’17. Dallas, Texas, USA: Association for Computing Machinery, 2017, pp. 1175–1191 (cit. p. 1).
[3] Hoon Wei Lim, Geong Sen Poh, Jia Xu, and Varsha Chittawar. “PrivateLink : Privacy- Preserving Integration and Sharing of Datasets.” In: IEEE Transactions on Informa- tion Forensics and Security 15 (2020), pp. 564–577 (cit. p. 1).
[4] Arvind Narayanan and Vitaly Shmatikov. “Robust De-anonymization of Large Sparse Datasets.” In: 2008 IEEE Symposium on Security and Privacy (sp 2008). 2008, pp. 111– 125 (cit. p. 1).
[5] Wayne Jansen and Timothy Grance. SP 800-144. Guidelines on Security and Privacy in Public Cloud Computing. Tech. rep. Gaithersburg, MD, USA: National Institute of Standards and Technology (NIST), 2011 (cit. p. 1).
[6] Johannes Heurix, Peter Zimmermann, Thomas Neubauer, and Stefan Fenz. “A tax- onomy for privacy enhancing technologies.” In: Computers & Security 53 (2015),
pp. 1–17 (cit. p. 1).
[7] Inayat Ali, Eraj Khan, and Sonia Sabir. “Privacy-preserving data aggregation in resource- constrained sensor nodes in Internet of Things: A review.” In: Future Computing and Informatics Journal 3.1 (2018), pp. 41–50 (cit. p. 1).
[8] Adi Shamir. “How to share a secret.” In: Commun. ACM 22.11 (Nov. 1979), pp. 612– 613 (cit. p. 1).
[9] Nileshkumar Kakade and Utpalkumar Patel. “Secure Secret Sharing Using Homo- morphic Encryption.” In: 2020 11th International Conference on Computing, Com- munication and Networking Technologies (ICCCNT). 2020, pp. 1–7 (cit. p. 2).
[10] Saloni Kwatra and Vicenç Torra. “Data reconstruction attack against principal com- ponent analysis.” In: International Symposium on Security and Privacy in Social Net- works and Big Data. Springer. 2023, pp. 79–92 (cit. p. 2).
[11] Valeria Nikolaenko, Stratis Ioannidis, Udi Weinsberg, et al. “Privacy-preserving ma- trix factorization.” In: Proceedings of the 2013 ACM SIGSAC Conference on Com- puter & Communications Security. CCS ’13. Berlin, Germany: Association for Com- puting Machinery, 2013, pp. 801–812 (cit. p. 2).
[12] Samanvaya Panda. “Principal component analysis using CKKS homomorphic scheme.” In: Cyber Security Cryptography and Machine Learning: 5th International Sympo- sium, CSCML 2021, Be’er Sheva, Israel, July 8–9, 2021, Proceedings 5. Springer. 2021, pp. 52–70 (cit. pp. 3, 7).
[13] Xiaoyu Fan, Guosai Wang, Kun Chen, Xu He, and Wei Xu. “Ppca: Privacy-preserving principal component analysis using secure multiparty computation (mpc).” In: arXiv preprint arXiv:2105.07612 (2021) (cit. pp. 3, 8, 56–59).
[14] Ian T Jolliffe. Principal component analysis for special types of data. Springer, 2002 (cit. p. 7).
[15] Xinbo Liu, Yaping Lin, Qin Liu, and Xin Yao. “A privacy-preserving principal com- ponent analysis outsourcing framework.” In: 2018 17th IEEE International Confer- ence On Trust, Security And Privacy In Computing And Communications/12th IEEE
International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE. 2018, pp. 1354–1359 (cit. p. 7).
[16] Xirong Ma, Chuan Ma, Yali Jiang, and Chunpeng Ge. “Improved privacy-preserving PCA using optimized homomorphic matrix multiplication.” In: Computers & Secu- rity 138 (2024), p. 103658 (cit. p. 8).
[17] Zhihua Xia, Qi Gu, Lizhi Xiong, and Wenhao Zhou. “Privacy-preserving image re- trieval based on additive secret sharing.” In: International Journal of Autonomous and Adaptive Communications Systems 17.2 (2024), pp. 99–126 (cit. p. 8).
[18] Yingting Liu, Chaochao Chen, Longfei Zheng, et al. “Privacy preserving pca for multiparty modeling.” In: arXiv preprint arXiv:2002.02091 (2020) (cit. p. 8).
[19] Hsin-Chan Yen. “Privacy Preserving Principal Components Analysis: A Dual-Cloud Construction Based on Secret Sharing for Protecting Covariance Matrix.” Available from the National Digital Library of Theses and Dissertations in Taiwan. Master’s Thesis. National Taiwan University of Science and Technology, 2024 (cit. pp. 3, 8, 60).
[20] Donald Beaver. “Efficient multiparty protocols using circuit randomization.” In: Ad- vances in Cryptology—CRYPTO’91: Proceedings 11. Springer. 1992, pp. 420–432
(cit. p. 11).
[21] Michel Journée, Yurii Nesterov, Peter Richtárik, and Rodolphe Sepulchre. “General- ized power method for sparse principal component analysis.” In: Journal of Machine Learning Research 11 (2010) (cit. p. 20).
[22] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge university press, 2012 (cit. p. 17).
[23] Josef Stoer, Roland Bulirsch, R Bartels, Walter Gautschi, and Christoph Witzgall. Introduction to numerical analysis. Vol. 1993. Springer, 1980 (cit. p. 18).
[24] RP Kanwal and KC Liu. “A Taylor expansion approach for solving integral equa- tions.” In: International Journal of Mathematical Education in Science and Technol- ogy 20.3 (1989), pp. 411–414 (cit. p. 18).
[25] Louis W Ehrlich. “A modified Newton method for polynomials.” In: Communica- tions of the ACM (1967), pp. 107–108 (cit. p. 19).
[26] Ian T Jolliffe. Principal component analysis for special types of data. Springer, 2002.
[27] Payman Mohassel and Yupeng Zhang. “Secureml: A system for scalable privacy- preserving machine learning.” In: 2017 IEEE symposium on security and privacy (SP). IEEE. 2017, pp. 19–38.
[28] R. Canetti. “Universally composable security: a new paradigm for cryptographic pro- tocols.” In: Proceedings 42nd IEEE Symposium on Foundations of Computer Sci-ence.
2001, pp. 136–145 (cit. p. 43).
[29] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. “Modeling wine preferences by data mining from physicochemical properties.” In: Decision Support Systems 47.4 (2009), pp. 547–553 (cit. p. 55).
[30] Lingjun Meng, Peter van der Putten, and Haiyang Wang. “A comprehensive bench- mark of the artificial immune recognition system (AIRS).” In: International Con- ference on Advanced Data Mining and Applications. Springer, 2005, pp. 66–76 (cit. p. 55).
[31] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, et al. “Scikit-learn: Ma- chine learning in
Python.” In: the Journal of machine Learning research 12 (2011), pp. 2825–2830 (cit. p. 55).