Graduate student: Gu, Cheng-Hung (古政弘)
Thesis title: Machine Learning Methods in Classification or Prediction: Some Comparison and Applications (機器學習方法於分類或預測問題之比較與應用)
Advisor: Chang, Yu-Wei (張育瑋)
Oral defense committee: Chou, Pei-Ting (周珮婷); Chien, Li-Hsin (簡立欣)
Degree: Master
Department: College of Commerce - Department of Statistics
Year of publication: 2024
Academic year of graduation: 112
Language: Chinese
Number of pages: 60
Chinese keywords: 分類迴歸樹, 貝氏可加性迴歸樹, 隨機森林
English keywords: Classification and Regression Tree, Bayesian Additive Regression Trees, Random Forest
  • In recent years, many new machine learning methods have been proposed. Depending on whether the response variable is continuous or categorical, these methods can be applied to prediction or classification problems. This study examines the prediction or classification accuracy of several machine learning methods, with a particular focus on interpretable machine learning, because in practical data analysis users are often also interested in explaining the relationship between the independent and dependent variables. We consider seven machine learning or statistical methods: classification and regression tree (CART), Bayesian additive regression trees (BART), random forest, multivariate adaptive regression splines (MARS), generalized additive model (GAM), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA). We apply each method to two real datasets, fit the models on the training set, and compare the prediction or classification performance of the methods on the test set.

    Chapter 1 Introduction

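    Chapter 2 Models
    2.1 CART
    2.1.1 Regression Trees
    2.1.2 Classification Trees
    2.2 BART Model
    2.2.1 Specification of the Prior Distribution
    2.2.2 Statistical Inference from the Posterior Distribution
    2.3 Random Forest
    2.4 MARS
    2.5 GAM
    2.6 LDA & QDA
    2.7 Software Packages Used for Modeling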

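    Chapter 3 Data Analysis with Categorical Response Variables
    3.1 Evaluation Metrics for Classification Models
    3.2 ILPD Data (Indian Liver Patient Dataset)
    3.2.1 Data Description and Descriptive Statistics
    3.2.2 Modeling and Prediction for the ILPD Data
    3.3 Fertility Data (Fertility Dataset)
    3.3.1 Data Description and Descriptive Statistics
    3.3.2 Modeling and Prediction for the Fertility Data
    3.4 Summary for Datasets with Categorical Response Variables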

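    Chapter 4 Data Analysis with Continuous Response Variables
    4.1 Evaluation Metrics for Prediction Models
    4.2 Auto MPG Data
    4.2.1 Data Description and Descriptive Statistics
    4.2.2 Modeling and Prediction for the Auto MPG Data
    4.3 Real Estate Valuation Data
    4.3.1 Data Description and Descriptive Statistics
    4.3.2 Modeling and Prediction for the Real Estate Valuation Data
    4.4 Summary for Datasets with Continuous Response Variables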

    Chapter 5 Conclusion

    References


    Full-text release date: 2029/07/21