| Graduate student: | 吳志龍 Wu, Zhi-Long |
|---|---|
| Thesis title: | 基於隨機森林模型下P2P網路借貸違約預測 The Prediction of Default in P2P Lending Based on Random Forest Model |
| Advisor: | 廖四郎 Liao, Szu-Lang |
| Oral defense committee: | 廖四郎 Liao, Szu-Lang; 張興華 Chang, Hsing-Hua; 黃星華 Huang, Hsing-Hua; 王昭文 Wang, Chou-Wen; 張惠龍 Chang, Hui-Lung |
| Degree: | Master |
| Department: | College of Commerce - Department of Money and Banking |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 |
| Language: | Chinese |
| Number of pages: | 41 |
| Chinese keywords: | P2P借貸, 隨機森林模型, Logistic回歸模型, 個人信用風險評估 |
| English keywords: | P2P lending, random forest, logistic regression, personal credit risk evaluation |
| DOI URL: | http://doi.org/10.6814/NCCU201900322 |
This study evaluates and predicts personal credit risk in P2P lending using a traditional logistic regression model and a machine-learning random forest model. The data come from LendingClub's public 2018 dataset. We first select the personal credit risk factors relevant to P2P lending, then train the logistic regression and random forest models on the selected variables, and finally use a test set to check each model's ability to predict personal credit risk. The results show that the random forest model reaches its highest prediction accuracy with 800 decision trees and 3 features per tree. Compared with the logistic regression model, the random forest model achieves higher precision. We also compare the overall performance of the two models, and the results indicate that the random forest model outperforms the logistic regression model.
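The abstract reports both accuracy and precision, and the thesis devotes a section to the confusion matrix, so a small illustration of how these measures come out of confusion-matrix counts may help. The sketch below is not the thesis's code: the function names `confusion_counts` and `metrics` are hypothetical, the labels are made up for illustration (they are not LendingClub data), and treating "default" as the positive class is an assumption.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Guard against division by zero when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Guard against division by zero when there are no actual positives.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only (assumption): 1 = default, 0 = fully paid.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

result = metrics(y_true, y_pred)
print(result)  # {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5}

# A classifier that never predicts default reaches the same accuracy here
# while catching no defaults at all.
always_paid = metrics(y_true, [0] * len(y_true))
print(always_paid)  # {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
```

In this toy example the second classifier matches the first on accuracy but has zero recall, which is one reason a default-prediction study on imbalanced data would look beyond accuracy alone.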
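Chapter 1 Introduction
Section 1 Research Motivation
Section 2 Research Objectives
Section 3 Research Framework
Chapter 2 Literature Review
Section 1 Introduction to P2P
Section 2 Personal Credit Assessment Factors
Section 3 Default Prediction with the Logistic Regression Model
Section 4 Decision Trees and the Random Forest Model
Chapter 3 Research Methods
Section 1 Random Forest Model
1. Introduction to Decision Tree Algorithms
2. Feature Selection
3. The CART Algorithm
4. Decision Tree Pruning
5. Drawbacks of Decision Trees
6. Ensemble Learning
7. Random Forest Model
Section 2 Logistic Regression Model
Chapter 4 Empirical Results
Section 1 Data Source
Section 2 Variable Selection
Section 3 Data Processing
1. Handling Missing Values
2. Quantification
3. Correlation Test
4. Data Splitting
Section 4 Confusion Matrix
Section 5 Random Forest Model Results
1. Choosing the Number of Decision Trees
2. Choosing the Number of Features
3. Random Forest Model Prediction Results
Section 6 Logistic Regression Model Prediction Results
Section 7 Handling the Imbalance Problem and Comparing Results
Section 8 Model Performance Evaluation
Chapter 5 Research Results and Future Outlook
Section 1 Research Results
Section 2 Future Outlook
References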