Student: SHE, XIN-LING (佘欣玲)
Thesis title: Ensemble learning for customer retention prediction in online retailing (Chinese title: 整體學習應用於線上零售的回購預測)
Advisors: Chuang, Hao-Chun (莊皓鈞); Chou, Yen-Chun (周彥君)
Oral defense committee: Hsu, Chia-Lin (許嘉霖)
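Degree: Master's
Department: Department of Management Information Systems, College of Commerce
Year of publication: 2019
Academic year of graduation: 107 (Republic of China calendar, i.e., 2018-2019)
Language: Chinese
Pages: 38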
Keywords (Chinese): ensemble learning, retail industry, repurchase prediction
Keywords (English): Ensemble learning, Online retailers, Customer retention
DOI URL: http://doi.org/10.6814/NCCU201900463
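    Chinese abstract (English translation):
    Repurchasing plays an important role in customer relationship management. To curb over-marketing and excessive communication costs, consumer repurchase has become key to improving the operating performance of online retailers. This study addresses four questions about repurchase. First, how can members' purchasing behaviors and characteristics be constructed from records of transactions, returns, and cancellations? Second, how can two ensemble learning algorithms, XGBoost and LightGBM, be applied to predict consumer repurchase, and which of them predicts better? Third, by combining ensemble learning with a Bayesian network, which purchasing behaviors can be shown to influence repurchase? Finally, how should the model results be evaluated from the retailer's perspective, so as to provide a complete method for analyzing customer repurchase?
    Whereas prior studies predicted repurchase from only a few feature variables, this study performs in-depth feature engineering and constructs 167 variables in total, giving a more complete picture of purchasing behaviors and characteristics. It also reports prediction results for both XGBoost and LightGBM, with model accuracy reaching up to 90%, and analyzes and compares each model in depth. Going further, it combines ensemble learning with a Bayesian network to examine the relationship between important features and repurchase. This not only helps retailers understand which purchasing characteristics affect customers' repurchase behavior but also uses the model predictions to supply a list of potential repurchasers. Finally, a cost-benefit evaluation of the model predictions gives retailers a profit-oriented basis for decisions. This helps them avoid putting consumers off with over-marketing, lowers the cost of communicating with members, helps them understand customer needs, and improves operating performance.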
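This record does not list the thesis's 167 variables. As an illustration only, the following Python sketch shows one common way to turn transaction, return, and cancellation records into per-customer features. It uses RFM-style recency, frequency, and monetary value plus return and cancellation counts, and it labels each customer by whether they purchase again in a later window. The `Record` type, the `kind` labels, the cutoff/label-window split, and every feature name are hypothetical assumptions, not the author's actual feature set.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class Record:
    customer_id: str
    day: date
    amount: float
    kind: str  # hypothetical labels: "purchase", "return" or "cancel"


def build_features(records: Iterable[Record], cutoff: date, label_end: date) -> Dict[str, dict]:
    """Hypothetical feature builder: aggregate records up to `cutoff` per customer,
    and label a customer 1 if they purchase again in the window (cutoff, label_end]."""
    history = defaultdict(list)
    repurchased = set()
    for r in records:
        if r.day <= cutoff:
            history[r.customer_id].append(r)
        elif r.day <= label_end and r.kind == "purchase":
            repurchased.add(r.customer_id)

    rows = {}
    for cid, recs in history.items():
        purchases = [r for r in recs if r.kind == "purchase"]
        returns = [r for r in recs if r.kind == "return"]
        cancels = [r for r in recs if r.kind == "cancel"]
        if not purchases:
            continue  # assumption: only customers with a pre-cutoff purchase are scored
        spend = sum(r.amount for r in purchases)
        rows[cid] = {
            "recency_days": (cutoff - max(r.day for r in purchases)).days,
            "frequency": len(purchases),
            "monetary": spend,
            "avg_order_value": spend / len(purchases),
            "return_count": len(returns),
            "return_rate": len(returns) / len(purchases),  # returns per purchase (assumed definition)
            "cancel_count": len(cancels),  # cancellations are counted, not subtracted from purchases
            "returned_amount": sum(r.amount for r in returns),
            "label": int(cid in repurchased),
        }
    return rows


records = [
    Record("A", date(2018, 1, 5), 1200.0, "purchase"),
    Record("A", date(2018, 2, 1), 300.0, "purchase"),
    Record("A", date(2018, 2, 3), 300.0, "return"),
    Record("B", date(2018, 1, 20), 800.0, "purchase"),
    Record("B", date(2018, 1, 21), 800.0, "cancel"),
    Record("A", date(2018, 4, 10), 500.0, "purchase"),  # falls in the label window
]
rows = build_features(records, cutoff=date(2018, 3, 31), label_end=date(2018, 6, 30))
print(rows["A"]["recency_days"], rows["A"]["label"])  # 58 1
```

Each row could then be fed to a gradient-boosting classifier such as XGBoost or LightGBM. A fixed cutoff keeps the features strictly earlier than the label window, which avoids leaking future behavior into the predictors.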


    English abstract:
    Customer retention plays an important role in customer relationship management. To reduce the cost of communicating with customers and to avoid over-marketing, understanding customer retention has become key to online retail operations. This research addresses four questions about customer retention. First, how can online retailers construct features describing customer behaviors and characteristics from records of transactions, returns, and cancellations? Second, how can the cutting-edge ensemble learning algorithms XGBoost and LightGBM be used to predict customer retention, and which of them performs better? Third, how can knowledge extracted from ensemble learning be combined with a Bayesian network to build causal diagrams of how customer characteristics drive customer retention? Finally, how can the results of predictive models be evaluated from a business perspective through a cost-benefit analysis of customer retention analytics?
    Whereas past research predicted customer retention from far fewer features, this research performs fairly comprehensive feature engineering that yields a total of 167 variables describing customer characteristics. In addition, we show that both XGBoost and LightGBM achieve prediction accuracy of up to 90%. Furthermore, this study integrates ensemble learning with a Bayesian network to explore the relationship between important features and customer retention. Doing so helps retailers understand which characteristics affect customer retention, and the model predictions also provide a list of customers likely to repurchase. Finally, this study conducts a cost-benefit analysis of the model predictions to help online retailers make profit-oriented decisions in digital marketing.
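The record does not state the cost-benefit formula used in the thesis. The following is a minimal Python sketch that rests on three simplifying assumptions. Every contact costs a fixed `contact_cost`. Every contacted customer who actually repurchases contributes a fixed `margin`. Customers who are not contacted contribute nothing, so the sketch ignores whether a customer would have bought anyway, as uplift modeling would require. Under these assumptions, profit depends on true and false positives rather than on accuracy alone. All function names and figures are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 means 'repurchases'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def targeted_profit(y_true, y_pred, margin: float, contact_cost: float) -> float:
    """Hypothetical profit when only customers predicted to repurchase are contacted."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp * margin - (tp + fp) * contact_cost


def mass_profit(y_true, margin: float, contact_cost: float) -> float:
    """Hypothetical baseline: contact every customer under the same assumptions."""
    return sum(y_true) * margin - len(y_true) * contact_cost


# Made-up labels and predictions for ten customers.
y_true = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))          # (3, 1, 1, 5)
print(accuracy(y_true, y_pred))                  # 0.8
print(targeted_profit(y_true, y_pred, 100, 30))  # 180
print(mass_profit(y_true, 100, 30))              # 100
```

With these made-up figures, the targeted list earns more than contacting everyone even though it misses one repurchaser, because it avoids six unnecessary contacts. With a lower contact cost of 10, the ranking flips (260 versus 300). For this reason a profit-oriented evaluation can favor a different model or threshold than accuracy does.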

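    Chapter 1 Introduction
    Section 1 Research Background
    Section 2 Research Objectives
    Chapter 2 Literature Review
    Section 1 Repurchase in Retailing
    Section 2 Ensemble Learning
    Section 3 Introduction to and Comparison of XGBoost and LightGBM
    Chapter 3 Data Processing and Feature Engineering
    Section 1 Data Preprocessing
    Section 2 Description of Feature Variables
    Section 3 Basic Descriptive Statistics
    Chapter 4 Research Results
    Section 1 Repurchase Prediction
    Section 2 Bayesian Network
    Section 3 Profit Performance Analysis
    Chapter 5 Conclusion
    References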
