

Graduate student: Ko, Pai-Yi (柯百翼)
Thesis title: Label Heterogeneity in Binary Classification (二元分類的同類別異質性)
Advisor: Chou, Pei-Ting (周珮婷)
Committee members: Hsiao, Wei-Cheng (蕭維政); Lin, Yi-Ling (林怡伶)
Degree: Master
Department: College of Commerce - Department of Statistics
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Number of pages: 51
Keywords: Binary Classification, Multiclass Classification, Label Tree, Pseudo Likelihood Classifier, Label Heterogeneity
DOI URL: http://doi.org/10.6814/NCCU202000962
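  • In machine learning, binary classification is the most common data type. Such data may carry a latent problem of within-class heterogeneity, which leads classifiers to misclassify. To let the model distinguish differences among observations more finely and to improve prediction accuracy, this study applies hierarchical clustering based on Ward's minimum variance method separately to each of the two classes and redefines the resulting clusters as new sub-classes. After the original binary dataset has been converted into a multiclass dataset, the study uses a Label Embedding Tree together with the Pseudo Likelihood classifier to produce multiclass predictions, then maps the predicted sub-classes back to the original binary classes. The results show that predictions obtained under this framework are not inferior to those of other well-known binary classifiers. Moreover, the accuracy of the proposed method stays within a narrow range, whereas the accuracy of the other binary classifiers changes sharply when the feature set changes. The proposed method therefore addresses within-class heterogeneity to some extent, improves prediction accuracy, and yields stable prediction accuracy.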


    Binary classification is one of the most common problems in machine learning research. However, noisy labels are one of the potential difficulties in binary classification. This study aims to address this common challenge by using sub-label information derived from the original labels. Hierarchical clustering is first used to build a hierarchy of sub-label clusters, which identifies the heterogeneity within the original labels and is used to improve classification accuracy. A Label tree and the Pseudo Likelihood classifier are then used for classification. The findings show that the performance of the Label tree and Pseudo Likelihood classifier is not inferior to that of other well-known binary classification models. The classification results also remain stable across different feature subsets, unlike those of the other classifiers. We believe the proposed method addresses the heterogeneity problem that exists in the original labels.
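The sub-label idea in the abstract can be illustrated with a minimal sketch. It clusters each original class separately using Ward's minimum variance criterion, treats every cluster as a sub-label, predicts a sub-label, and maps that back to the original binary label. Several parts are assumptions made only for illustration: the function names, the toy data, and the choice of two clusters per class are hypothetical, and a nearest-centroid rule stands in for the Label tree and Pseudo Likelihood classifier, whose details are not given in this record.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Naive agglomerative clustering: merge the cheapest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

def fit_sublabels(X, y, k=2):
    """Cluster each class on its own; return (centroid, sub_label, label) triples."""
    model = []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        found = ward_clusters(members, min(k, len(members)))
        for idx, cluster in enumerate(found):
            model.append((centroid(cluster), f"{label}_{idx}", label))
    return model

def predict(model, x):
    """Stand-in classifier (assumption): nearest sub-label centroid,
    then map the predicted sub-label back to its original binary label."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], x))
    _, sub_label, label = min(model, key=sq_dist)
    return sub_label, label

# Hypothetical toy data: each class consists of two well-separated groups,
# so a single prototype per class would represent it poorly.
X = [(0, 0), (0.5, 0.2), (5, 5), (5.2, 4.8),
     (0, 5), (0.3, 5.1), (5, 0), (4.9, 0.2)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_sublabels(X, y, k=2)
for query in [(4.8, 5.1), (0.2, 4.7)]:
    sub_label, label = predict(model, query)
    print(query, "-> sub-label", sub_label, "-> original label", label)
```

The pairwise search makes this version suitable only for small examples. For real data, a library implementation of Ward linkage would be the more practical choice.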

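    Chapter 1 Introduction
    Section 1 Research Background and Motivation
    Section 2 Research Objectives
    Chapter 2 Literature Review
    Chapter 3 Research Methods
    Section 1 Classification Models
    Section 2 Variable Selection
    Chapter 4 Research Process and Results
    Section 1 Data Description
    Section 2 Research Process and Results
    Chapter 5 Conclusions and Suggestions
    Section 1 Conclusions
    Section 2 Future Research Directions and Suggestions
    Chapter 6 References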

