Graduate student: 唐思琪
Tang, Szu-Chi
Thesis title: 基於增強學習的直播電商推薦系統
Reinforcement learning based live streaming e-commerce recommender system
Advisor: 林怡伶
Lin, Yi-Ling
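Oral defense committee member: 蕭舜文
Hsiao, Shun-Wen
Degree: Master
Department: College of Commerce - Department of Management Information Systems
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 66
Keywords (Chinese): Live streaming e-commerce, Recommender system, Reinforcement learning, Exploration-exploitation trade-off, Neural network
Keywords (English): User context, Exploitation-exploration trade-off, Gated Recurrent Unit, Variational Autoencoder, Bayesian neural networks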
DOI URL: http://doi.org/10.6814/NCCU202201098
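  • In recent years, live streaming e-commerce has attracted growing attention. Unlike traditional e-commerce and one-way broadcast TV shopping, live streaming e-commerce places greater emphasis on real-time interaction. Because starting a live stream costs little, streamers go live frequently and new products are rolled out continually, which creates a complex and rapidly changing environment; a recommender system can help consumers make decisions quickly amid information overload. Previous research on recommender systems has focused on optimizing accuracy, which not only gives rise to the echo chamber effect but also, by always recommending similar products, leads to customer dissatisfaction and churn in the long run. To strike a better balance between accurate recommendation and exploring new preferences, we treat this problem as a multi-armed bandit problem with user context. For the new business scenario of live streaming e-commerce, this study proposes a recommender system based on reinforcement learning. It uses static customer features and temporal customer features to identify the relationships among customers, streamers, and products. We use a type of recurrent neural network, the Gated Recurrent Unit, to capture how customers' preferences change over time. Our live streaming e-commerce recommender system uses a Variational Autoencoder to blur customer features and a Bayesian neural network to introduce uncertainty into product recommendation, thereby controlling the balance between exploring and exploiting customer preferences. To the best of our knowledge, we are the first to propose a neural network-based contextual bandit algorithm for the recommendation problem on live streaming e-commerce platforms. We compared our approach with classic multi-armed bandit algorithms, preliminarily validated our theory through experiments on real-world data, and demonstrated its potential application to practical business problems.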


    In recent years, live stream e-commerce shopping has received extensive attention from e-commerce businesses and streaming platforms. Unlike traditional TV shopping and online shopping, live stream shopping platforms roll out new products continuously while users and streamers interact in real time. Such a dynamic environment forms a complex user context. The recommender system plays a crucial role in helping users seek information and make decisions under information overload. Previous recommender systems mainly focus on optimizing accuracy, which results in the filter bubble problem and high churn rates in the long run. To balance the exploration-exploitation (EE) trade-off under a dynamic and fast-changing recommendation context, this research formulates the problem as a contextual bandit problem. This study provides a reinforcement learning (RL)-based solution for a new business scenario (i.e., live stream e-commerce) that addresses the three relationships among customers, streamers, and products in both static and temporal user contexts. We use a Gated Recurrent Unit (GRU) to model changes in users' preferences for streamers and products while maintaining their long-term engagement. By encoding uncertainty in neural networks, with a Variational Autoencoder (VAE) for user modeling and a Bayesian Neural Network (BNN) for product recommendation, the proposed Live E-commerce Recommender System (LERS) can control the balance of the EE trade-off. To the best of our knowledge, our study is the first to propose a neural network-based contextual bandit algorithm for the recommendation problem on live streaming e-commerce platforms. We compared our algorithm with classic multi-armed bandit algorithms including UCB1, LinUCB, Exp3, and NeuralUCB. Preliminary experimental results on real-world data corroborate our theory and shed light on potential applications of our algorithm to real-world business problems.
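
To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of UCB1, one of the classic baselines named in the abstract. It is not the thesis's LERS model: it ignores user context entirely, and the function name `ucb1`, the simulated click probabilities, and the round count are illustrative assumptions rather than values from the thesis.

```python
import math
import random


def ucb1(click_probs, rounds, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit.

    click_probs are hypothetical per-product click rates (an assumption
    for illustration). Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(click_probs)
    counts = [0] * n_arms   # how often each arm (product) was recommended
    sums = [0.0] * n_arms   # cumulative reward (clicks) observed per arm
    total = 0.0
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1     # initialisation: recommend every arm once
        else:
            # empirical mean (exploitation) plus confidence bonus (exploration)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # simulated user feedback: 1.0 for a click, 0.0 otherwise
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


counts, total = ucb1([0.05, 0.10, 0.50], rounds=5000)
print(counts, total)
```

The confidence bonus shrinks as an arm is pulled more often, so rarely recommended products keep getting occasional trials while the best-performing product receives most of the traffic. A contextual method such as LinUCB or NeuralUCB would additionally condition the estimate on user features.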

    Acknowledgements
    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    2.1 Live Streaming E-commerce
    2.2 Recommender Systems
    2.3 Live Streaming Recommender System
    2.4 Contextual Multi-armed Bandit Methods
    2.5 Uncertainty Modeling
    3 The Proposed Framework
    3.1 Problem Definition
    3.2 Framework Overview
    3.3 Gated Recurrent Unit Networks in Temporal Context Model
    3.4 Variational Autoencoder for Blurry Context
    3.5 Bayesian Neural Networks for Exploring Product Recommendation
    3.6 Training Procedure
    4 Experiments
    4.1 Datasets
    4.2 Implementation Environment
    4.3 Customer Context Features
    4.3.1 Static Context Features
    4.3.2 Customer-Product Context Features
    4.3.3 Customer-Streamer Context Features
    4.4 Temporal Context Modeling
    4.4.1 RNN-based Models for Temporal Context
    4.4.2 Identify the Appropriate Sequence Length of Temporal Context
    4.5 Full Context Analysis
    4.6 Dimension Reduction Analysis
    4.7 Product Recommendation Analysis
    4.7.1 Evaluation Metrics
    4.7.2 Experiment Dataset
    4.7.3 Recommendation Context for Product Recommendation
    4.7.4 Temporal Context for Product Recommendation
    4.7.5 End-to-End Live E-commerce Recommender System
    4.8 Algorithm Comparison Experiments
    4.8.1 Experiments Settings
    4.8.2 Normal Dataset
    4.8.3 Active Dataset
    4.8.4 Repeat Dataset
    5 Discussion
    5.1 Offline Environment
    5.2 Feature Enrichment
    5.3 Context Engineering
    5.4 Neural Network
    6 Conclusion
    References


    Full text public release date: 2027/07/26