
Author: 葉柏皓 (Yeh, Bo-Hao)
Thesis title: 應用自動化提示工程與RAG機制於問答系統之優化
Application of Automated Prompt Engineering and the RAG Mechanism to the Optimization of Question Answering Systems
Advisor: 陳恭 (Chen, Kung)
Committee members: 黃瀚萱 (Huang, Han-Hsuan); 莊豐源 (Chung, Feng-Yuan); 陳恭 (Chen, Kung)
Degree: Master
Department: College of Commerce - Department of Management Information Systems
Year of publication: 2025
Graduating academic year: 113
Language: Chinese
Pages: 55
Chinese keywords: 生成式AI, RAG, 自動化提示工程, PE2, BERT Score
English keywords: Generative AI, RAG, Automated Prompt Engineering, PE2, BERT Score
  • Generative AI has risen rapidly in recent years, and applications that combine it with RAG (Retrieval-Augmented Generation) question answering systems have drawn wide attention across many industries. However, the answer quality of the large language model (LLM) at the core of a QA system directly determines system performance. The traditional approach of fine-tuning the LLM typically demands substantial hardware resources and specialized expertise, which hinders adoption. This study therefore takes the automated prompt engineering method PE2 (Prompt Engineering a Prompt Engineer) as its base framework, adapts it to the actual application scenario, and integrates it into a generative AI RAG question answering system. By automatically adjusting and optimizing the query, the method effectively improves LLM answer quality without any additional fine-tuning of the model, while lowering the resource cost and technical barrier of building such a system.
    Experimental results show that the proposed method effectively improves LLM answer quality and raises the semantic relevance metric (BERT Score). In addition, this study designs an objective set of query evaluation criteria, replacing the previous reliance on subjective human judgment in the absence of a unified objective metric, thereby improving the consistency and reliability of prompt evaluation. Finally, the study outlines future research directions focused on further strengthening the stability and accuracy of generative AI question answering systems, so that with continued optimization and extension they can better handle diverse and complex application scenarios and deliver greater practical value.
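The retrieve–answer–evaluate–rewrite cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the thesis's actual implementation: the retriever, the answerer, the critique step, and the scoring function are all hypothetical stand-ins for the real LLM, vector store, PE2 meta-prompt, and BERT Score components.

```python
# Minimal sketch of an automated query-optimization loop for a RAG QA system.
# Every component here is a toy stand-in for the real one.

def retrieve(query, knowledge_base, top_k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(knowledge_base, key=overlap, reverse=True)[:top_k]

def answer(query, context):
    """Stand-in for an LLM generating an answer from retrieved context."""
    return f"Based on {len(context)} passages: answer to '{query}'"

def critique_and_rewrite(query, score):
    """Stand-in for a PE2-style meta-prompt that inspects the weak answer
    and proposes a more specific rewritten query."""
    return query + " (clarified)"

def score_answer(answer_text, reference):
    """Stand-in for a semantic-similarity metric such as BERT Score."""
    a = set(answer_text.lower().split())
    r = set(reference.lower().split())
    return len(a & r) / max(len(a | r), 1)

def optimize_query(query, knowledge_base, reference, rounds=3, threshold=0.8):
    """Iteratively rewrite the query until the answer scores well enough."""
    best_query, best_score = query, 0.0
    for _ in range(rounds):
        context = retrieve(best_query, knowledge_base)
        ans = answer(best_query, context)
        s = score_answer(ans, reference)
        if s > best_score:
            best_score = s
        if best_score >= threshold:
            break  # good enough; stop rewriting
        best_query = critique_and_rewrite(best_query, s)
    return best_query, best_score
```

The point of the loop is that only the query is modified between rounds; the underlying model is never fine-tuned, which is what keeps the resource cost low.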


    Generative AI has rapidly emerged in recent years, with RAG (Retrieval-Augmented Generation) QA systems receiving growing attention across industries. As the core of these systems, large language models (LLMs) determine overall performance through the quality of their responses. However, improving LLMs through fine-tuning requires substantial hardware resources and expertise, limiting its adoption. This study adopts the automated prompt engineering method PE2 (Prompt Engineering a Prompt Engineer) as its framework, tailoring it to real-world scenarios and integrating it into a generative AI-based RAG QA system. By automatically adjusting and optimizing queries, our method improves response quality without additional fine-tuning, reducing technical and resource costs.
    Experiments show that the proposed approach effectively improves answer quality and semantic relevance (BERT Score). Additionally, we design an objective query evaluation standard to replace subjective human judgment and enhance the consistency of prompt evaluation. Finally, this study proposes future directions for improving the robustness and precision of generative AI QA systems, aiming to enhance their adaptability to diverse and complex application scenarios.
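The semantic-relevance evaluation mentioned above follows the precision/recall/F1 structure of BERT Score. The sketch below illustrates that structure using exact-token overlap in place of BERT embeddings; this is a deliberate simplification (the real metric matches tokens by contextual-embedding cosine similarity, e.g. via the `bert-score` package), so only the P/R/F1 scaffolding, not the matching quality, carries over.

```python
# Simplified illustration of the BERTScore precision/recall/F1 structure.
# Real BERTScore matches candidate and reference tokens by cosine similarity
# of contextual embeddings; here exact string matching stands in for that.

def simple_score(candidate: str, reference: str) -> tuple[float, float, float]:
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Precision: fraction of candidate tokens that find a match in the reference.
    p = sum(1 for t in cand if t in ref) / len(cand) if cand else 0.0
    # Recall: fraction of reference tokens that find a match in the candidate.
    r = sum(1 for t in ref if t in cand) / len(ref) if ref else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, `simple_score("rag retrieves relevant documents", "rag retrieves documents")` yields precision 0.75 (three of four candidate tokens matched), recall 1.0, and F1 ≈ 0.857.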

    Chapter 1  Introduction 1
    1.1 Research Background and Motivation 1
    1.2 Research Objectives 2
    1.3 Thesis Organization 2

    Chapter 2  Literature Review 4
    2.1 Retrieval-Augmented Generation 4
     2.1.1 Technical Background and Development 4
     2.1.2 Operating Mechanism and System Architecture 5
     2.1.3 Application Scenarios and Potential 6
    2.2 Prompt Engineering 7
     2.2.1 Technical Background and Definition 7
     2.2.2 Core Principles and Techniques 8
     2.2.3 Typical Application Scenarios 9
     2.2.4 Challenges 9
    2.3 Automated Prompt Engineering 10
     2.3.1 Automatic Prompt Engineer 11
     2.3.2 Automatic Prompt Optimization 12
     2.3.3 Prompt Engineering a Prompt Engineer 12
     2.3.4 Comparison of Automated Prompt Engineering Frameworks 14

    Chapter 3  Research Methods 15
    3.1 PE2 Method Analysis and Validation 15
     3.1.1 Test Data 15
     3.1.2 Model Selection 17
     3.1.3 PE2 Workflow 18
     3.1.4 Validation Results 18
    3.2 Integrating PE2 into the RAG QA System 19
     3.2.1 System Design Rationale 19
     3.2.2 System Architecture Design 19
     3.2.3 Knowledge Base Design 21
     3.2.4 RAG Method Design 22
     3.2.5 Design of the PE2-RAG Integration 23

    Chapter 4  System Implementation 26
    4.1 Experimental Environment and Development Tools 26
    4.2 Code Design 28
     4.2.1 Knowledge Base Construction 28
     4.2.2 RAG Method Implementation 30
     4.2.3 PE2-RAG Integration Implementation 32
    4.3 System Evaluation and Limitations 45
     4.3.1 System Evaluation 45
     4.3.2 System Limitations 50

    Chapter 5  Conclusion and Future Work 52
    5.1 Conclusion 52
    5.2 Future Work 53

    References 54

    Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
    Chase, H. (2022). LangChain [Software]. https://github.com/langchain-ai/langchain
    Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., ... & Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
    Gupta, S., Ranjan, R., & Singh, S. N. (2024). A comprehensive survey of retrieval-augmented generation (RAG): Evolution, current landscape and future directions. arXiv preprint arXiv:2410.12837.
    Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). Retrieval augmented language model pre-training. In International Conference on Machine Learning (pp. 3929-3938). PMLR.
    Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., ... & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 1-38.
    Karpukhin, V., Oguz, B., Min, S., Lewis, P. S., Wu, L., Edunov, S., ... & Yih, W. T. (2020). Dense passage retrieval for open-domain question answering. In EMNLP (1) (pp. 6769-6781).
    Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
    Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
    PromptEngineering.org. (2024). What is prompt engineering? Retrieved May 10, 2025, from https://promptengineering.org/what-is-prompt-engineering/
    Pryzant, R., Iter, D., Li, J., Lee, Y. T., Zhu, C., & Zeng, M. (2023). Automatic prompt optimization with "gradient descent" and beam search. arXiv preprint arXiv:2305.03495.
    Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.
    Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927.
    Schulhoff, S., Ilie, M., Balepur, N., Kahadze, K., Liu, A., Si, C., ... & Resnik, P. (2024). The prompt report: A systematic survey of prompting techniques. arXiv preprint arXiv:2406.06608.
    Suzgun, M., Scales, N., Schärli, N., Gehrmann, S., Tay, Y., Chung, H. W., ... & Wei, J. (2022). Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261.
    Vatsal, S., & Dubey, H. (2024). A survey of prompt engineering methods in large language models for different NLP tasks. arXiv preprint arXiv:2407.12994.
    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., ... & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824-24837.
    Ye, Q., Axmed, M., Pryzant, R., & Khani, F. (2023). Prompt engineering a prompt engineer. arXiv preprint arXiv:2311.05661.
    Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.
    Zheng, L., Chiang, W. L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., ... & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.

    Full-text release date: 2030/07/27