| Graduate student: | 洪浚皓 Hung, Chun-Hao |
|---|---|
| Thesis title: | 統計分析與資料視覺化在電影利潤預測上之研究 Applications of Statistical Analysis and Data Visualization to MovieLens Data for Profit Prediction |
| Advisor: | 張源俊 Chang, Yuan-Chin |
| Committee members: | 張源俊 Chang, Yuan-Chin; 鄭宗記 Cheng, Tsung-Chi; 陳瑞彬 Chen, Ray-Bing |
| Degree: | Master |
| Department: | College of Commerce - Department of Statistics |
| Year of publication: | 2020 |
| Academic year of graduation: | 108 |
| Language: | English |
| Number of pages: | 58 |
| Chinese keywords: | MovieLens 資料集、機器學習、推薦系統、探索性資料分析 |
| English keywords: | MovieLens Dataset, Machine Learning, Recommendation System, Exploratory Data Analysis |
| DOI URL: | http://doi.org/10.6814/NCCU202001674 |
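As film has become an important part of entertainment culture, the film industry has grown enormous and hard to predict. Sound and picture were first synchronized in film in 1927, and Bambi (1942) achieved great success as an animated film during World War II. Over the following 70 years, advances in technology and filming techniques drove extremely rapid growth in the industry. Today, a film must go through a great deal of effort and many procedures before it reaches the public. If we can accurately predict the profit of a work, we can better persuade production companies to invest the large sums needed to make good films. In this thesis, we use data exploration and data visualization to examine trends in film genres, and then propose a method for predicting a film's profit before that effort is invested. Beyond the main goal of predicting profit, we also propose a method for building a recommendation system, adapted from an idea in a blog post.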
Watching films is an important part of entertainment culture, and the film industry has become increasingly complex and unpredictable. After sound was successfully synchronized with film frames in 1927 [10], Bambi (1942) marked major progress in animated filmmaking during World War II. Since then, with advances in technology and filming techniques, the movie industry has grown rapidly over the following 70 years. Today, bringing a work to audiences requires many processes and considerable effort. A better prediction of a work's possible profit may therefore encourage production companies to invest in such movies. In this thesis, we discuss trends in genres and other information through exploratory data analysis and data visualization, and then propose a method for predicting the potential profit of a movie before more resources are invested. Besides this main goal of predicting movie profits, we also discuss, as a potential future study, how to build a novel recommendation system by modifying the ideas of a blog post.
1 Introduction
2 Introduction of MovieLens dataset
2.1 MovieLens 20M Dataset
2.2 The Calibrated Data
3 EDA on Rating Data
4 Recommendation System
5 Trend of Genres
5.1 Genre Trend
5.2 Genre Similarity Matrix
6 Tag Analysis
7 Predict Movie Profits
7.1 Scraping Dataset
7.2 EDA and Data Cleaning
7.3 Building Model and Prediction
8 Conclusion and Future Studies
9 Reference
[1] James Baglama and Lothar Reichel. “Augmented implicitly restarted Lanczos bidiagonalization methods”. In: SIAM Journal on Scientific Computing 27.1 (2005), pp. 19–42.
[2] Posts on Data Science Diarist. Building a Recommendation System with Beer Data. https://www.r-bloggers.com/building-a-recommendation-system-with-beer-data/. Accessed: 2020-05-20.
[3] Timothy A Davis and Yifan Hu. “The University of Florida sparse matrix collection”. In: ACM Transactions on Mathematical Software (TOMS) 38.1 (2011), pp. 1–25.
[4] IMDb. Året gjennom Børfjord (1991). https://www.imdb.com/title/tt0103301/. Accessed: 2020-05-20.
[5] IMDb. Babylon 5. https://www.imdb.com/title/tt0105946/. Accessed: 2020-05-20.
[6] IMDb. Bicicleta, cullera, poma (2010). https://www.imdb.com/title/tt1710542/. Accessed: 2020-05-20.
[7] IMDb. Brazil: In the Shadow of the Stadiums. https://www.imdb.com/title/tt3778744/. Accessed: 2020-05-20.
[8] IMDb. Cialo (original title). https://www.imdb.com/title/tt4358230/. Accessed: 2020-05-20.
[9] IMDb. Das Millionenspiel (1970). https://www.imdb.com/title/tt0066079/. Accessed: 2020-05-20.
[10] IMDb. Don Juan Trivia. https://www.imdb.com/title/tt0016804/trivia. Accessed: 2020-06-16.
[11] IMDb. Im Schmerz geboren. https://www.imdb.com/title/tt3096440/. Accessed: 2020-05-20.
[12] IMDb. In Our Garden (2002). https://www.imdb.com/title/tt0495225/. Accessed: 2020-05-20.
[13] IMDb. Michael Laudrup - en fodboldspiller (1993). https://www.imdb.com/title/tt0378357/. Accessed: 2020-05-20.
[14] IMDb. Moving Alan (2003). https://www.imdb.com/title/tt0310741/. Accessed: 2020-05-20.
[15] IMDb. My Own Man (2014). https://www.imdb.com/title/tt3356434/. Accessed: 2020-05-20.
[16] IMDb. National Theatre Live: Frankenstein (2011). https://www.imdb.com/title/tt1795369/. Accessed: 2020-05-20.
[17] IMDb. P’tit Quinquin. https://www.imdb.com/title/tt3053694/. Accessed: 2020-05-20.
[18] IMDb. Polskie gówno (2014). https://www.imdb.com/title/tt4438688/. Accessed: 2020-05-20.
[19] IMDb. Slaying the Badger. https://www.imdb.com/title/tt3793686/. Accessed: 2020-05-20.
[20] IMDb. Star Trek Beyond (original title). https://www.imdb.com/title/tt2660888/. Accessed: 2020-05-20.
[21] IMDb. Star Trek IV: The Voyage Home (original title). https://www.imdb.com/title/tt0092007/. Accessed: 2020-05-20.
[22] IMDb. Stephen Fry in America. https://www.imdb.com/title/tt1307789/. Accessed: 2020-05-20.
[23] IMDb. The Court-Martial of Jackie Robinson (1990). https://www.imdb.com/title/tt0099311/. Accessed: 2020-05-20.
[24] IMDb. The Dark Knight Trivia. https://www.imdb.com/title/tt0468569/trivia. Accessed: 2020-06-30.
[25] IMDb. Third Reich: The Rise Fall. https://www.imdb.com/title/tt1855924/. Accessed: 2020-05-20.
[26] IMDb. Two: The Story of Roman Nyro (2013). https://www.imdb.com/title/tt2740874/. Accessed: 2020-05-20.
[27] Guolin Ke et al. “LightGBM: A highly efficient gradient boosting decision tree”. In: Advances in neural information processing systems. 2017, pp. 3146–3154.
[28] Sven Kosub. “A note on the triangle inequality for the Jaccard distance”. In: Pattern Recognition Letters 120 (2019), pp. 36–38.
[29] MovieLens. Star Trek Beyond. https://movielens.org/movies/135569. Accessed: 2020-05-20.
[30] MovieLens. Star Trek IV: The Voyage Home. https://movielens.org/movies/1376. Accessed: 2020-05-20.
[31] Scott L Phillips. Beyond sound: the college and career guide in music technology. Oxford University Press on Demand, 2013.
[32] Wikipedia. MovieLens. https://en.wikipedia.org/wiki/MovieLens. Accessed: 2020-05-20.