| Graduate Student: | 羅永富 Lo, Yung-Fu |
|---|---|
| Thesis Title: | 語言模型潛在空間中的語義概念注入之探討 (Semantic Concept Injection in the Latent Space of Language Models) |
| Advisor: | 蕭舜文 Hsiao, Shun-Wen |
| Committee Members: | 陳孟彰 Chen, Meng-Chang; 黃思皓 Huang, Szu-Hao; 黃意婷 Huang, Yi-Ting; 郁方 Yu, Fang |
| Degree: | 碩士 Master |
| Department: | 商學院 - 資訊管理學系 (College of Commerce - Department of Management Information System) |
| Year of Publication: | 2025 |
| Graduation Academic Year: | 113 |
| Language: | English |
| Number of Pages: | 55 |
| Keywords (Chinese): | 自然語言處理、潛在空間、WordNet、低秩適應 (LoRA)、語義概念注入 |
| Keywords (English): | NLP, Latent space, WordNet, Low-Rank Adaptation (LoRA), Semantic concept injection |
Recent advances in transformer-based language models have revolutionized natural language processing by generating high-dimensional embeddings that capture complex semantic relationships. However, these embeddings often lack alignment with intuitive human conceptual structures. This study proposes a principled framework for semantic concept injection in the latent space of language models by leveraging external structured knowledge from WordNet. We employ parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA) to inject ontological semantic constraints, aligning the cosine similarities of embeddings with WordNet's hierarchical Wu-Palmer similarity metric. Two injection strategies are explored: Unified LoRA for Composite Semantic Concepts (ULCSC) and Disentangled LoRA for Individual Semantic Concepts (DLISC). Experimental evaluations across several semantically related downstream tasks, including concept classification, multi-label concept classification, and zero-shot concept classification, show substantial performance improvements. In a 13-class concept classification task, our DLISC method achieves perfect scores (Accuracy: 1.0000, F1 Macro: 1.0000, F1 Weighted: 1.0000), far surpassing the baseline BERT model (Accuracy: 0.7778, F1 Macro: 0.5051, F1 Weighted: 0.7390). In the multi-label and zero-shot concept classification scenarios, our methods likewise consistently outperform the baselines, underscoring their effectiveness in handling polysemy and enriching semantic meaning. The results show improved interpretability and semantic structure in the latent space while maintaining parameter efficiency, validating the effectiveness of semantic concept injection from external knowledge bases for improving NLP applications.
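The alignment objective described in the abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the thesis's actual training code: it computes Wu-Palmer (WUP) similarity, wup(a, b) = 2·depth(LCS(a, b)) / (depth(a) + depth(b)), over a small hypothetical taxonomy standing in for WordNet, plus an MSE-style penalty of the kind that would push embedding cosine similarities toward the WUP targets (all names here are assumptions for illustration).

```python
# Toy taxonomy standing in for WordNet's hypernym hierarchy (hypothetical).
PARENT = {  # child -> parent; "entity" is the root
    "animal": "entity", "artifact": "entity",
    "dog": "animal", "cat": "animal", "car": "artifact",
}

def path_to_root(node: str) -> list:
    """Hypernym path from a node up to the root, e.g. ['dog', 'animal', 'entity']."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node: str) -> int:
    """Depth of a node in the hierarchy; the root has depth 1."""
    return len(path_to_root(node))

def wup(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Lowest common subsumer: the deepest shared ancestor of a and b.
    lcs = max((n for n in path_to_root(b) if n in ancestors_a), key=depth)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def alignment_loss(cos_sim: float, wup_sim: float) -> float:
    """MSE-style penalty pulling an embedding pair's cosine similarity
    toward its WUP target, in the spirit of the injection objective."""
    return (cos_sim - wup_sim) ** 2
```

Under this toy hierarchy, siblings such as "dog" and "cat" (sharing "animal") score higher than cross-branch pairs such as "dog" and "car" (sharing only the root), which is exactly the hierarchical structure the injected embeddings are trained to reflect.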
Abstract (Chinese) i
Abstract ii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
2 Related Work 10
2.1 Representation Learning 10
2.1.1 Word2Vec 10
2.1.2 GloVe 11
2.1.3 FastText 11
2.1.4 ELMo 11
2.1.5 BERT 11
2.1.6 RoBERTa 12
2.1.7 GPT 12
2.2 Ontologies and Semantic Hierarchies 13
2.3 Knowledge Base Injection 15
2.4 Parameter-Efficient Fine-Tuning (PEFT) 17
3 Proposed Method 19
3.1 Overview 19
3.1.1 Concept-Based Word Pair Similarity Calculation 20
3.1.2 Semantic Concept Injection 20
3.1.3 Loss Function 20
3.2 Concept-Based Word Pair Similarity Calculation 21
3.2.1 Extracting Words Under a Specified Concept 21
3.2.2 Generating Word Pairs and Computing WUP Similarity 23
3.3 Semantic Concept Injection 25
3.3.1 Unified LoRA for Composite Semantic Concepts (ULCSC) 26
3.3.2 Disentangled LoRA for Individual Semantic Concepts (DLISC) 27
3.4 Loss Function 29
4 Experiments 31
4.1 Data Set 31
4.1.1 Datasets for Semantic Space Analysis 31
4.1.2 Datasets for Downstream Task Evaluation 32
4.2 Evaluation Metrics 34
4.2.1 Semantic Space Structure Evaluation 34
4.2.2 Downstream Task Evaluation 35
4.3 Semantic Space Analysis 37
4.4 Concept Classification 42
4.5 Multi-Label Concept Classification 44
4.6 Zero-Shot Concept Classification 45
5 Conclusion 47
References 49
A LoRA Hyperparameter Ablation Study 54
B Training Data Sampling 55
Abeysiriwardana, M., & Sumanathilaka, D. (2024). A survey on lexical ambiguity detection and word sense disambiguation. In Proceedings of the 20th IEEE International Colloquium on Signal Processing and its Applications (CSPA) (pp. 1–6).
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information (L. Lee, M. Johnson, & K. Toutanova, Eds.). Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051
Bystrov, D. (2024). Information retrieval multi-agent system established on the metaphysics lexical database. In Information Systems and Technological Advances for Sustainable Development (pp. 1–6). Springer. https://doi.org/10.1007/978-3-031-75329-9_1
Chandrasekaran, D., & Mago, V. (2021). Evolution of semantic similarity—a survey. ACM Computing Surveys, 54(2). https://doi.org/10.1145/3440755
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Guarino, N., Oberle, D., & Staab, S. (2009). What is an ontology? In S. Staab & R. Studer (Eds.), Handbook on Ontologies (pp. 1–17). Springer. https://doi.org/10.1007/978-3-540-92673-3_0
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (pp. 2790–2799, Vol. 97). PMLR. https://proceedings.mlr.press/v97/houlsby19a.html
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022). OpenReview. https://openreview.net/forum?id=nZeVKeeFYf9
Lester, B., Al-Rfou, R., & Constant, N. (2021). The power of scale for parameter-efficient prompt tuning. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3045–3059). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.243
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.703
Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 4582–4597). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.353
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2020). RoBERTa: A robustly optimized BERT pretraining approach. In Proceedings of the 8th International Conference on Learning Representations (ICLR 2020). OpenReview. https://openreview.net/forum?id=SyxS0T4tvS
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–244. https://doi.org/10.1093/ijl/3.4.235
Munroe, R. (2010, May). Color name survey results. https://blog.xkcd.com/2010/05/03/color-survey-results/
Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250. https://doi.org/10.1016/j.artint.2012.07.001
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In M. Walker, H. Ji, & A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 2227–2237). Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training (Technical report). OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982–3992). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410
Sinha, K., Jia, R., Hupkes, D., Pineau, J., Williams, A., & Kiela, D. (2021). Masked language modeling and the distributional hypothesis: Order word matters pre-training for little. In M.-F. Moens, X. Huang, L. Specia, & S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 2888–2913). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.230
Song, Z., Yan, B., Liu, Y., Fang, M., Li, M., Yan, R., & Chen, X. (2025). Injecting domain-specific knowledge into large language models: A comprehensive survey. arXiv preprint arXiv:2502.10708.
Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (pp. 4444–4451). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11164
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605. http://jmlr.org/papers/v9/vandermaaten08a.html
Wu, Z., & Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 133–138). Association for Computational Linguistics. https://doi.org/10.3115/981732.981751
Full-text release date: 2030/08/10