
Graduate student: Wu, Ming-Lun (吳明倫)
Thesis title: Two-Level Machine Learning for Network Enabled Devices Identification (以兩層式機器學習進行連網設備識別)
Advisor: Hu, Yuh-Jong (胡毓忠)
Oral defense committee: Huang, Sun-Jen (黃世禎); Chang, Jia-Ming (張家銘)
Degree: Master's
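Department: College of Science, Department of Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: Chinese
Pages: 46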
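Keywords (translated from Chinese): Internet of Things, network-enabled devices, information security, two-level machine learning, semi-supervised learning, network scan data, support vector machine, random forest, binary classifier
Keywords (English): Network Enabled Devices, Two-level Machine Learning, Censys, Network Scan Data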
DOI URL: http://doi.org/10.6814/NCCU201900635

    With the rapid development of Internet of Things (IoT) technology, the number of network-enabled devices on the Internet has grown explosively, and the services they provide have become more diverse, making daily life more convenient. However, poor product design and a lack of security protection have led to a steady stream of incidents in which attackers exploit device vulnerabilities, so home and corporate networks full of such devices now face serious security threats. Identifying network-enabled devices, and thereby learning how many potentially risky devices are connected to a target network, is the first step in securing it. This study explores applying two-level machine learning to network-enabled device data, which is large in volume and hierarchically structured, and compares it with the commonly used single-level machine learning. It also incorporates the concept of semi-supervised learning to explore whether devices classified as unknown can be handled automatically.
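    To make the two-level idea concrete, the sketch below trains a first-level model on coarse device categories and, for each category, a separate second-level model on finer labels (for example, vendor). It is a minimal, dependency-free illustration under stated assumptions, not the thesis's method: the thesis trains SVM and Random Forest binary classifiers on Censys data, whereas here a hypothetical nearest-centroid classifier with a distance-based reject option stands in for them. The class and function names, the two-dimensional features, and the category and vendor names are all invented. A sample rejected at the first level is reported as unknown.

```python
import math
from collections import defaultdict

UNKNOWN = "unknown"  # label returned when a sample is too far from every known class


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


class NearestCentroid:
    """Hypothetical stand-in for an SVM / Random Forest classifier.

    Predicts the class whose centroid is closest, or UNKNOWN if even the
    closest centroid is farther away than reject_radius.
    """

    def __init__(self, reject_radius):
        self.reject_radius = reject_radius
        self.centroids = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {label: centroid(pts) for label, pts in groups.items()}
        return self

    def predict_one(self, x):
        label, d = min(
            ((lbl, math.dist(x, c)) for lbl, c in self.centroids.items()),
            key=lambda pair: pair[1],
        )
        return label if d <= self.reject_radius else UNKNOWN


class TwoLevelClassifier:
    """Level 1 predicts a coarse category; level 2 uses a per-category model."""

    def __init__(self, reject_radius):
        self.reject_radius = reject_radius
        self.level1 = NearestCentroid(reject_radius)
        self.level2 = {}

    def fit(self, X, coarse, fine):
        self.level1.fit(X, coarse)
        # Group training data by coarse label so each level-2 model
        # only sees samples from its own category.
        by_coarse = defaultdict(lambda: ([], []))
        for x, c, f in zip(X, coarse, fine):
            by_coarse[c][0].append(x)
            by_coarse[c][1].append(f)
        self.level2 = {
            c: NearestCentroid(self.reject_radius).fit(xs, fs)
            for c, (xs, fs) in by_coarse.items()
        }
        return self

    def predict_one(self, x):
        coarse = self.level1.predict_one(x)
        if coarse == UNKNOWN:
            return (UNKNOWN, UNKNOWN)  # rejected at level 1: left for later handling
        return (coarse, self.level2[coarse].predict_one(x))


# Invented toy training data: 2-D features, device type, vendor.
X = [[0, 0], [0, 1], [2, 0], [2, 1], [10, 10], [10, 11], [12, 10], [12, 11]]
COARSE = ["camera"] * 4 + ["router"] * 4
FINE = ["vendorA", "vendorA", "vendorB", "vendorB",
        "vendorC", "vendorC", "vendorD", "vendorD"]

model = TwoLevelClassifier(reject_radius=3.0).fit(X, COARSE, FINE)
print(model.predict_one([0.2, 0.4]))    # ('camera', 'vendorA')
print(model.predict_one([11.9, 10.6]))  # ('router', 'vendorD')
print(model.predict_one([50.0, 50.0]))  # ('unknown', 'unknown')
```

    Because each second-level model is trained only on samples from its own category, it separates vendors within a category without competing against classes from other categories, which is one commonly cited motivation for a hierarchical design over a single flat classifier.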

    This study uses the Censys network scan dataset to train binary classifiers with two classification algorithms, Support Vector Machine (SVM) and Random Forest, and then uses them to classify network-enabled device data. Following the concept of semi-supervised learning, it also searches for the best parameters of a density-based clustering algorithm for handling devices classified as unknown. Finally, a series of simulation experiments verifies and compares, for this application, the two classification algorithms as well as single-level versus two-level machine learning, and the study reports quantitative and qualitative observations on the results.
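    Devices rejected as unknown could then be grouped with density-based clustering. The sketch below is a minimal, dependency-free DBSCAN (the density-based algorithm of Ester et al.) plus a grid search over its `eps` and `min_pts` parameters. The abstract does not say how candidate parameters are scored, so the scoring here is an assumption: the clustering is compared with a small labeled subset using the adjusted Rand index (ARI), in the spirit of semi-supervised learning. All function names, the toy points, and the labels are hypothetical.

```python
import math
from collections import Counter
from math import comb

NOISE = -1  # DBSCAN label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id (0, 1, ...) or NOISE per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


def adjusted_rand_index(pred, truth):
    """ARI between two labelings; NOISE is treated as an ordinary label."""
    total = comb(len(pred), 2)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(pred, truth)).values())
    sum_a = sum(comb(v, 2) for v in Counter(pred).values())
    sum_b = sum(comb(v, 2) for v in Counter(truth).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings are one cluster, or both all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def best_params(points, labeled, eps_grid, min_pts_grid):
    """Grid-search (eps, min_pts) by ARI on the labeled subset {index: label}."""
    idx = sorted(labeled)
    truth = [labeled[i] for i in idx]
    best = None
    for eps in eps_grid:
        for min_pts in min_pts_grid:
            pred = dbscan(points, eps, min_pts)
            score = adjusted_rand_index([pred[i] for i in idx], truth)
            if best is None or score > best[0]:
                best = (score, eps, min_pts)
    return best


# Invented "unknown" devices (2-D features) and a few known labels.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
LABELED = {0: "A", 1: "A", 4: "B", 5: "B"}

print(best_params(POINTS, LABELED, [0.5, 1.5, 10.0], [2, 3]))  # (1.0, 1.5, 2)
```

    ARI is a natural fit for this kind of scoring because DBSCAN's cluster ids are arbitrary integers; ARI compares how points are grouped rather than what the groups are called, and it is adjusted so that a chance-level grouping scores about zero.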

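    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Research Objectives
    1.3 Research Contributions
    Chapter 2 Research Background
    2.1 Two-Level Machine Learning
    2.2 Handling Unknown Data
    2.3 Network-Enabled Device Identification
    2.4 Network Scan Data
    Chapter 3 Related Work
    3.1 Case Studies of Two-Level Machine Learning
    3.2 Case Studies of Network-Enabled Device Identification
    3.3 Case Studies of Semi-Supervised Learning
    Chapter 4 Design of the Two-Level Machine Learning Process
    4.1 Data Preprocessing Stage
    4.2 Modeling Approach
    4.3 Simulation Experiment Design
    Chapter 5 Implementation and Comparison
    5.1 Network Scan Data Processing Pipeline
    5.2 Two-Level Machine Learning Process
    5.3 Simulation Experiments
    Chapter 6 Conclusions and Future Work
    6.1 Conclusions
    6.2 Future Work
    References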

