| Graduate Student: | 林佩昀 Lin, Pei-Yun |
|---|---|
| Thesis Title: | ScoreRAG: A Retrieval-Augmented Generation Framework with Consistency-Relevance Scoring and Structured Summarization for News Generation |
| Advisor: | 蔡炎龍 Tsai, Yen-Lung |
| Committee Members: | 陳天進 Chen, Ten-Ging; 張宜武 Chang, Yi-Wu |
| Degree: | Master |
| Department: | College of Science, Department of Mathematical Sciences |
| Year of Publication: | 2025 |
| Academic Year: | 113 |
| Language: | English |
| Pages: | 54 |
| Keywords: | Retrieval-Augmented Generation, News Generation, Large Language Models, Semantic Reranking, Graded Summarization, Natural Language Processing |
| Views / Downloads: | Views: 47, Downloads: 15 |
This research introduces ScoreRAG, an approach to enhance the quality of automated news generation. Despite advancements in Natural Language Processing and large language models, current news generation methods often struggle with hallucinations, factual inconsistencies, and lack of domain-specific expertise when producing news articles. ScoreRAG addresses these challenges through a multi-stage framework combining retrieval-augmented generation, consistency relevance evaluation, and structured summarization. The system first retrieves relevant news documents from a vector database, maps them to complete news items, and assigns consistency relevance scores based on large language model evaluations. These documents are then reranked according to relevance, with low-quality items filtered out. The framework proceeds to generate graded summaries based on relevance scores, which guide the large language model in producing complete news articles following professional journalistic standards. Through this methodical approach, ScoreRAG aims to significantly improve the accuracy, coherence, informativeness, and professionalism of generated news articles while maintaining stability and consistency throughout the generation process. The code and demo are available at: https://github.com/peiyun2260/ScoreRAG
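The abstract describes a staged pipeline: retrieve candidate documents, map them to full articles, score their relevance with an LLM, rerank and filter, produce graded summaries, and feed the result to the generator. The sketch below illustrates that control flow only; every function name is hypothetical, and the vector-database lookup and LLM scoring/summarization calls (not specified here) are replaced by naive keyword-overlap stubs.

```python
# Illustrative sketch of the ScoreRAG pipeline stages, with stub
# functions standing in for the vector database and the LLM.
from dataclasses import dataclass

@dataclass
class NewsDoc:
    doc_id: str
    text: str
    score: float = 0.0  # consistency-relevance score, filled in later

def retrieve(query: str, store: dict[str, str], k: int = 5) -> list[NewsDoc]:
    """Stages 1-2: retrieve candidates and map ids to full articles.
    A real system would query a vector database; this stub ranks by
    naive keyword overlap."""
    ranked = sorted(store.items(),
                    key=lambda kv: -sum(w in kv[1] for w in query.split()))
    return [NewsDoc(doc_id, text) for doc_id, text in ranked[:k]]

def score_relevance(query: str, doc: NewsDoc) -> float:
    """Stage 3: in ScoreRAG this is an LLM evaluation returning a
    consistency-relevance score; stubbed as word overlap in [0, 1]."""
    q = set(query.split())
    return len(q & set(doc.text.split())) / max(len(q), 1)

def rerank_and_filter(query: str, docs: list[NewsDoc],
                      threshold: float = 0.3) -> list[NewsDoc]:
    """Stage 4: rerank by score and drop low-relevance documents."""
    for d in docs:
        d.score = score_relevance(query, d)
    return sorted((d for d in docs if d.score >= threshold),
                  key=lambda d: -d.score)

def graded_summary(doc: NewsDoc) -> str:
    """Stage 5: higher-scored documents receive longer summaries
    (here a character budget; the thesis uses LLM summarization)."""
    budget = int(40 * doc.score) + 20
    return doc.text[:budget]

def build_generation_prompt(query: str, store: dict[str, str]) -> str:
    """Stage 6: assemble graded summaries into the prompt that a real
    system would pass, with the system prompt, to the LLM."""
    docs = rerank_and_filter(query, retrieve(query, store))
    context = "\n".join(f"[{d.score:.2f}] {graded_summary(d)}" for d in docs)
    return f"Write a news article about: {query}\nContext:\n{context}"
```

In the actual framework, `score_relevance` and `graded_summary` would each be LLM calls and the returned prompt would be sent to the model for final generation; the stubs only make the stage ordering and the score-driven filtering concrete.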
Chinese Abstract i
Abstract ii
Contents iii
List of Tables vi
List of Figures vii
1 Introduction 1
1.1 Research Background 1
2 Literature Review 3
2.1 Overview of Natural Language Processing 3
2.2 Transformer Architecture 4
2.2.1 Input Embedding and Positional Encoding 5
2.2.2 Self-Attention Mechanism 7
2.2.3 Multi-Head Self-Attention Mechanism 9
2.2.4 Position-Wise Fully Connected Feed-Forward Network 11
2.2.5 Residual Connections and Normalization 11
2.2.6 Masked Multi-Head Attention Mechanism 13
2.2.7 The Final Linear and Softmax Layer 14
2.3 Decoding Strategies 15
2.3.1 Temperature Sampling 15
2.3.2 Top-k Sampling and Top-p Sampling 16
2.4 Enhancing LLMs into AI Agents 17
2.4.1 Retrieval-Augmented Generation 17
2.4.2 Rerank 18
2.4.3 AI Planning 19
3 Methods 22
3.1 Data Preprocessing 22
3.1.1 Text Embeddings 23
3.2 System Architecture 25
3.2.1 RAG-based News Retrieval 25
3.2.2 Mapping News from Database 26
3.2.3 Consistency Scoring and Reranking 27
3.2.4 Score-based Summarization Generation 27
3.2.5 Guided News Generation 28
4 Experiments 30
4.1 Experiment Setup 30
4.1.1 Compared Methods 30
4.1.2 Evaluation Strategy 30
4.2 Results 32
4.2.1 LLM Evaluation 32
4.2.2 Expert Evaluation 33
4.2.3 Analysis 35
5 Conclusion 36
6 Future Work 37
Bibliography 39
Appendix A Detailed generated news from different prompts 42
Appendix B Clean text function in data preprocessing 44
Appendix C Embedding configuration for text preprocessing 45
Appendix D Retriever function for news embedding 46
Appendix E The function of evaluating consistency score 47
Appendix F The function of generating graded summaries 49
Appendix G The function of guided news generation 51
Appendix H Frontend interface of ScoreRAG 52
Appendix I The evaluation criteria of ScoreRAG 54