| Graduate Student: | 林佩欣 Lin, Pei-Hsin |
|---|---|
| Thesis Title: | 基於大型語言模型的即時新聞檢索與生成系統開發 (Building a Real-Time News Retrieval-Augmented Generation System with Large Language Models) |
| Advisor: | 蔡炎龍 Tsai, Yen-Lung |
| Committee Members: | 呂欣澤 Lu, Hsin-Tse; 洪智傑 Hung, Chih-Chieh |
| Degree: | Master (碩士) |
| Department: | 創新國際學院 - Master's Program in Global Communication and Innovation Technology (全球傳播與創新科技碩士學位學程) |
| Year of Publication: | 2025 |
| Academic Year of Graduation: | 113 (ROC calendar) |
| Language: | English |
| Number of Pages: | 94 |
| Keywords (Chinese): | 檢索增強生成(RAG)、大型語言模型(LLM)、新聞檢索、生成式人工智慧、開源模型 |
| Keywords (English): | Retrieval-Augmented Generation, Large Language Models, News Retrieval, Generative AI, Open-Source Models |
Large language models (LLMs) possess strong query-understanding and text-generation capabilities, which effectively address the semantic challenges of traditional retrieval systems. However, their heavy reliance on pre-trained knowledge makes them prone to outdated or fabricated information, a problem that is especially pronounced in real-time news retrieval.
Compared with other data types, news data is highly time-sensitive, semantically redundant, and fragmented, which further complicates both retrieval and generation. This study therefore takes news data as its foundation and implements a retrieval-augmented generation system, framed around the diverse needs of news professionals using an internal query system, in order to examine whether RAG can effectively reduce hallucination and improve factual accuracy.
The study first crawled news articles from ETtoday as the knowledge source and designed experiments comparing four model configurations: Llama3-8B, Llama3-8B with RAG, GPT-4, and GPT-4 with RAG. The test tasks covered factual question answering, event synthesis, and news summarization, and the generated content was scored on correctness and completeness. The results show that introducing RAG significantly improves the factual accuracy of model responses and effectively reduces hallucination.
On the system side, development covered data crawling, preprocessing, document chunking, and vector-database design, culminating in an interactive interface built with Gradio. The study also emphasized mechanisms for tracking retrieval quality and incorporating feedback to ensure the reliability of the final generated answers.
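The retrieval pipeline the abstract describes (chunk articles, embed the chunks, store the vectors, retrieve top-k passages per query) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the thesis's implementation: a hashing bag-of-words embedder replaces the optimized sentence embeddings, and brute-force cosine search replaces FAISS; all class and function names are illustrative.

```python
import numpy as np

def embed(text, dim=64):
    """Toy bag-of-words hashing embedder (stand-in for the sentence
    embeddings the thesis stores in FAISS), L2-normalized so that a
    dot product equals cosine similarity."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk_article(article, size=200):
    """Fixed-size character chunking of a news article."""
    return [article[i:i + size] for i in range(0, len(article), size)]

class NewsIndex:
    """In-memory vector index over news chunks (FAISS replaces this at scale)."""
    def __init__(self):
        self.chunks, self.vectors = [], []

    def add(self, article):
        for c in chunk_article(article):
            self.chunks.append(c)
            self.vectors.append(embed(c))

    def retrieve(self, query, k=2):
        # Cosine similarity of the query against every stored chunk.
        sims = np.stack(self.vectors) @ embed(query)
        return [self.chunks[i] for i in np.argsort(-sims)[:k]]
```

At query time the top-k chunks are prepended to the LLM prompt, which is how RAG grounds the answer in freshly retrieved news rather than in pre-trained knowledge.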
Large language models (LLMs) excel at query understanding and text generation but are prone to producing outdated or hallucinated content, especially in real-time news retrieval. News data poses distinctive challenges, including high semantic redundancy, temporal sensitivity, and fragmented information.
This study develops a retrieval-augmented generation (RAG) system tailored to internal news-query scenarios and evaluates its effectiveness using ETtoday news data. Four model setups—Llama3-8B, Llama3-8B with RAG, GPT-4, and GPT-4 with RAG—were tested on tasks such as factual question answering, event synthesis, and news summarization.
The evaluation focused on factual accuracy and completeness. The results show that RAG significantly improves output reliability and reduces hallucination. The system implementation covers data crawling, preprocessing, text chunking, and vector-database construction, with an interactive frontend built in Gradio.
Throughout the experiments, emphasis was placed on monitoring retrieval quality and incorporating feedback checks to ensure the reliability of the final outputs.
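The grounding and feedback-checking steps mentioned above can be sketched as follows. Both functions and the lexical-overlap heuristic are hypothetical illustrations of the general idea, not the thesis's actual prompt template or quality check.

```python
def build_prompt(question, passages):
    """Ground the LLM answer in retrieved news passages (the RAG step
    that curbs hallucination from stale pre-trained knowledge)."""
    context = "\n---\n".join(passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def retrieval_looks_relevant(question, passages, min_overlap=1):
    """Crude retrieval-quality check: require lexical overlap between
    the question and at least `min_overlap` retrieved passages before
    generation proceeds."""
    q_tokens = set(question.lower().split())
    hits = sum(1 for p in passages if q_tokens & set(p.lower().split()))
    return hits >= min_overlap
```

A query whose retrieved passages fail such a check can be re-retrieved or flagged instead of being answered, which is one simple way to operationalize the feedback loop the abstract describes.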
1. Introduction 1
1.1 Research Objectives 2
1.2 Research Framework 3
2. Theoretical Background 4
2.1 Traditional Information Retrieval Methods 4
2.1.1 Keyword-Based Retrieval 4
2.1.2 TF-IDF 5
2.1.3 BM25 6
2.2 Deep Learning for Information Retrieval 7
2.2.1 Recurrent Neural Networks (RNN) 7
2.2.2 Transformer 9
2.3 Large Language Models (LLM) 12
Advantages and Limitations 13
Common Solutions 14
2.4 Generative Artificial Intelligence (AI) 15
Advantages and Limitations 16
Applications of Generative AI in News Information Retrieval 17
2.5 Retrieval-Augmented Generation (RAG) 18
Applications 18
Limitations and Challenges 19
2.6 AI Agents 20
3. Methodology 22
3.1 System Structure 22
3.2 Data Preprocessing 23
Dataset 23
Text Segmentation 24
3.3 Creating the Vector Database 26
Optimization of Sentence Embeddings 26
Storing Vectorized Data with FAISS 27
3.4 Response Generation 29
Llama3-8B 29
Llama3-8B + RAG 30
GPT-4 30
GPT-4 + RAG 30
System Prompt 31
4. Experiment Results 33
Validation Question Design 33
Experiment Results and Demo 37
Key Factors Influencing Evaluation Differences 42
Recommended Task Types for Each Model 43
Key Findings 44
Summary of Model Performance 45
5. Conclusions 46
Limitations 47
Adaptive Retrieval Techniques 47
Future Outlook 49
6. References 50
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., . . . Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922
Chase, H. (2022). LangChain: Building applications with LLMs through composability [Computer software]. GitHub. https://github.com/langchain-ai/langchain
Gharge, S., & Chavan, M. (2017). An integrated approach for malicious tweets detection using NLP. 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT). https://doi.org/10.1109/icicct.2017.7975235
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2672–2680.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://dl.acm.org/citation.cfm?id=3086952
Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020, February 10). REALM: Retrieval-Augmented Language Model Pre-Training. arXiv.org. https://arxiv.org/abs/2002.08909
Hearst, M. A. (2009). Search user interfaces. http://ci.nii.ac.jp/ncid/BA91702558
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent neural nets
and problem solutions. International Journal of Uncertainty Fuzziness and Knowledge-Based Systems, 06(02), 107–116. https://doi.org/10.1142/s0218488598000094
Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020). The curious case of neural text degeneration. arXiv (Cornell University). https://arxiv.org/pdf/1904.09751.pdf
Izacard, G., & Grave, E. (2020, July 2). Leveraging Passage Retrieval with
Generative Models for Open Domain Question Answering. arXiv.org. https://arxiv.org/abs/2007.01282
Johnson, J., Douze, M., & Jégou, H. (2017, February 28). Billion-scale similarity search with GPUs. arXiv.org. https://arxiv.org/abs/1702.08734
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense passage retrieval for open-domain question answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.550
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
https://doi.org/10.1038/nature14539
Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020, May 2). On faithfulness and factuality in abstractive summarization. arXiv.org. https://arxiv.org/abs/2005.00661
Manning, C. D., Raghavan, P., & Schütze, H. (2009). Introduction to information retrieval.
Choice Reviews Online, 46(05), 46–2715. https://doi.org/10.5860/choice.46-2715
Mitra, B., & Craswell, N. (2018). An introduction to neural information retrieval. Foundations and Trends® in Information Retrieval, 13(1), 1–126. https://doi.org/10.1561/1500000061
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., & Khudanpur, S. (2010). Recurrent neural network based language model. Proceedings of Interspeech 2010. https://doi.org/10.21437/interspeech.2010-343
Masterman, T., Besen, S., Sawtell, M., & Chao, A. (2024). The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey. arXiv. https://arxiv.org/abs/2404.11584
Nogueira, R., & Cho, K. (2019, January 13). Passage Re-ranking with BERT. arXiv.org. https://arxiv.org/abs/1901.04085
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135. https://doi.org/10.1561/1500000011
Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and
beyond. Foundations and Trends® in Information Retrieval, 3(4), 333–389. https://doi.org/10.1561/1500000019
Sawhney, R., Joshi, H., Gandhi, S., & Shah, R. R. (2020). A time-aware transformer based model for suicide ideation detection on social media. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 7685–7697. https://doi.org/10.18653/v1/2020.emnlp-main.619
Shuster, K., Poff, S., Chen, M., Kiela, D., & Weston, J. (2021). Retrieval augmentation reduces hallucination in conversation. Findings of the Association for Computational Linguistics: EMNLP 2021, 3784–3803. https://aclanthology.org/2021.findings-emnlp.320/
Trabelsi, M., Chen, Z., Davison, B. D., & Heflin, J. (2021). Neural ranking models for
document retrieval. Information Retrieval, 24(6), 400–444. https://doi.org/10.1007/s10791-021-09398-0
Thorne, J., & Vlachos, A. (2018). Automated fact checking: Task formulations, methods and future directions. Proceedings of the 27th International Conference on Computational Linguistics (COLING), 3346–3359. https://arxiv.org/pdf/1806.07687.pdf
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N.,
Kaiser, L., & Polosukhin, I. (2017). Attention is All you Need. Advances in neural information processing systems, 30.
https://doi.org/10.48550/arXiv.1706.03762
Yasunaga, M., Ren, H., Bosselut, A., Liang, P., & Leskovec, J. (2021). QA-GNN: Reasoning with language models and knowledge graphs for question answering. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. https://doi.org/10.18653/v1/2021.naacl-main.45
Zhang, Y., Ni, A., Mao, Z., Wu, C. H., Zhu, C., Deb, B., Awadallah, A. H., Radev, D., & Zhang, R. (2021, October 16). SUMM^N: a Multi-Stage summarization framework for long input dialogues and documents. arXiv.org. https://arxiv.org/abs/2110.10150
Zhang, T., Ladhak, F., Durmus, E., Liang, P., McKeown, K., & Hashimoto, T. B. (2023, January 31). Benchmarking large language models for news summarization. arXiv.org. https://arxiv.org/abs/2301.13848
The full text of this thesis has not been authorized for public release.