| Graduate student: | 莊昊耘 Chuang, Hao-Yun |
|---|---|
| Thesis title: | 政治判斷的評價研究:以臺灣2024年總統大選網路評論為例的語料庫評價分析及其在大型語言模型應用之探討 (Evaluations of Political Judgements: A Corpus-Based Appraisal Analysis of Online Comments on the Taiwan 2024 Presidential Election and Its Utility for LLM Implementation) |
| Advisor: | 張瑜芸 Chang, Yu-Yun |
| Committee members: | 謝舒凱 Hsieh, Shu-Kai; 郭岳鑫 Kuo, Yueh-Hsin |
| Degree: | Master |
| Department: | College of Foreign Languages, Graduate Institute of Linguistics |
| Year of publication: | 2025 |
| Academic year of graduation: | 114 |
| Language: | Chinese |
| Pages: | 76 |
| Keywords (Chinese): | 評價框架, 大型語言模型, 政治語料分析, 提示工程, 評價語言 |
| Keywords (English): | Appraisal framework, Large Language Model, Political Discourse Analysis, Prompt Engineering, Evaluative language |
This study is grounded in the Appraisal framework of Systemic Functional Linguistics (SFL) and investigates explicit evaluative language directed at political parties on Taiwan's PTT Gossiping board, exploring prompting techniques for Large Language Models (LLMs) to automate classification. The research focuses on the five subcategories of Judgement (normality, capacity, tenacity, propriety, and veracity), annotating and analyzing comment corpora related to Taiwan's two major political parties, the Democratic Progressive Party (DPP) and the Kuomintang (KMT). Methodologically, the study first constructs a manually annotated dataset, then applies zero-shot and few-shot prompting to evaluate how accurately GPT models identify Judgement subcategories, followed by an error analysis. An interactive visualization dashboard was also developed, combining trend visualizations and word clouds to help users observe evaluative patterns across time periods, parties, and subcategories. Results show that the few-shot approach outperforms the zero-shot approach overall. Trend visualization and word cloud analysis reveal that negative Propriety evaluations targeting the DPP rose between June and August, closely tied to Taiwan's 2023 #MeToo movement, whereas the KMT drew concentrated negative Capacity evaluations associated with delays in its nomination process. The findings suggest that LLMs hold promise for assisting political discourse analysis, though they remain limited in resolving ambiguity and recognizing metaphor.
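The zero-shot versus few-shot contrast described in the abstract can be sketched in Python. This is a minimal illustration, not the thesis's actual prompts (those appear in Appendix A): the instruction wording, the example comments, and their labels below are all hypothetical, and the function only assembles the prompt string that would be sent to a GPT model.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompt construction for
# classifying comments into Judgement subcategories of the Appraisal framework.

JUDGEMENT_LABELS = ["normality", "capacity", "tenacity", "propriety", "veracity"]

def build_prompt(comment, examples=None):
    """Build a classification prompt; passing `examples` switches to few-shot mode."""
    blocks = [
        "Classify the evaluative comment into one Judgement subcategory "
        "of the Appraisal framework: " + ", ".join(JUDGEMENT_LABELS) + "."
    ]
    # Few-shot mode: prepend labelled demonstrations before the query.
    for text, label in (examples or []):
        blocks.append(f"Comment: {text}\nLabel: {label}")
    # The query comment, with the label left for the model to complete.
    blocks.append(f"Comment: {comment}\nLabel:")
    return "\n\n".join(blocks)

# Zero-shot: instruction plus the query only.
zero = build_prompt("這個政黨根本沒有執政能力")
# Few-shot: two hypothetical labelled demonstrations precede the query.
few = build_prompt(
    "這個政黨根本沒有執政能力",
    examples=[("他們說到做到", "tenacity"), ("這種作法太不道德", "propriety")],
)
```

Either string would then be submitted to the model, whose completion is parsed as the predicted subcategory; the few-shot variant differs only in the demonstrations inserted between the instruction and the query.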
1 Introduction 6
1.1 Research Background 6
1.2 Research Gap 8
1.3 Research Objectives and Questions 11
1.4 Organization of the Study 12
2 Literature Review 13
2.1 Framework of Evaluative Language 13
2.2 Judgement in Political Discourse 16
2.3 Large Language Model and Prompting Strategies 18
2.3.1 Large Language Model 18
2.3.2 Few-shot Prompting 19
2.4 Automated Detection of Evaluative Language under the Appraisal Framework: A Judgement-centered Review 20
2.4.1 Traditional Machine Learning Approaches 20
2.4.2 Deep Learning Approaches 21
2.4.2.1 BERT-based Models 21
2.4.2.2 Prompt Engineering and Large Language Models 22
3 Methodology 24
3.1 Data Source 24
3.2 Data Collection and Preprocessing 25
3.3 Data Annotation 27
3.3.1 Political Party as the Target Entity 27
3.3.2 Annotation of Judgement Subcategories 30
3.3.3 Polarity and Directness 30
3.3.4 Inter-coder Agreement 31
3.3.5 Statistical Analysis 32
3.3.6 Prompt Engineering 32
3.4 Experimental Setup 34
3.5 Evaluation 35
3.6 Dashboard Implementation 36
3.6.1 Trend Visualization Design 36
3.6.2 Word Cloud Design 38
4 Results 41
4.1 Annotation Result 41
4.1.1 Judgement Evaluations 41
4.1.2 Statistical Results of Political Party and Subcategory 44
4.2 Model Training Results 49
5 Discussions 51
5.1 GPT Prompt Setting: Error Analysis 51
5.1.1 Zero-shot: Misclassification 51
5.1.2 Few-shot: Misclassification 53
5.1.3 Summary of Error Analysis 56
5.2 Evaluation Targeting DPP: Propriety 57
5.3 Evaluation Targeting KMT: Capacity 60
6 Conclusions 64
References 67
Appendix A 73
A.1 Zero-shot Prompt 73
A.2 Few-shot Prompt 73
Aroyehun, S. T., & Gelbukh, A. (2020, December). Automatically predicting judgement dimensions of human behaviour. In M. Kim, D. Beck, & M. Mistica (Eds.), Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association (pp. 131–134). Australasian Language Technology Association. https://aclanthology.org/2020.alta-1.18/
Benamara, F., Taboada, M., & Mathieu, Y. (2017). Evaluative language beyond bags of words: Linguistic insights and computational applications. Computational Linguistics, 43, 201–264. https://doi.org/10.1162/COLI_a_00278
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. CoRR, abs/2005.14165. https://arxiv.org/abs/2005.14165
Whitelaw, C., Garg, N., & Argamon, S. (2005). Using appraisal taxonomies for sentiment analysis. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM '05). ACM.
Cavasso, L., & Taboada, M. (2021). A corpus analysis of online news comments using the appraisal framework. Journal of Corpora and Discourse Studies, 4(1). https://doi.org/10.18573/jcads.61
Chang, C., Hsiao, Y., & Chiu, Y.-J. (2025). Dynamic public perceptions of and media influences on military threats to Taiwan: A method triangulation approach. Communication Research, 0(0), 00936502251339692. https://doi.org/10.1177/00936502251339692
Claassen, C., & Magalhães, P. C. (2022). Effective government and evaluations of democracy. Comparative Political Studies, 55(5), 869–894. https://doi.org/10.1177/00104140211036042
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186). https://doi.org/10.18653/v1/N19-1423
Du Bois, J. W. (2007). The stance triangle. In R. Englebretson (Ed.), Stancetaking in discourse: Subjectivity, evaluation, interaction (pp. 139–182). John Benjamins. https://doi.org/10.1075/pbns.164.07du
Gabrielova, E. V., & Maksimenko, O. I. (2021). Implicit vs explicit evaluation: How English-speaking Twitter users discuss migration problems. Russian Journal of Linguistics, 25(1), 105–124. https://journals.rudn.ru/linguistics/article/view/26000
Gastil, J. (2014). Beyond endorsements and partisan cues: Giving voters viable alternatives to unreliable cognitive shortcuts. The Good Society, 23(2), 145–159. Retrieved September 2, 2025, from http://www.jstor.org/stable/10.5325/goodsociety.23.2.0145
Halliday, M., & Matthiessen, C. (1994). An introduction to functional grammar (2nd ed.). Arnold.
Hansson, S., Page, R., & Fuoli, M. (2022). Discursive strategies of blaming: The language of judgment and political protest online. Social Media and Society, 8. https://doi.org/10.1177/20563051221138753
Haselmayer, M., & Jenny, M. (2017). Sentiment analysis of political communication: Combining a dictionary approach with crowdcoding. Quality and Quantity, 51, 2623–2646. https://doi.org/10.1007/s11135-016-0412-4
Hommerberg, C., & Don, A. (2015). Appraisal and the language of wine appreciation: A critical discussion of the potential of the appraisal framework as a tool to analyse specialised genres. Functions of Language, 22. https://doi.org/10.1075/fol.22.2.01hom
Imamovic, M., Deilen, S., Glynn, D., & Lapshinova-Koltunski, E. (2024, March). Using ChatGPT for annotation of attitude within the appraisal theory: Lessons learned. In S. Henning & M. Stede (Eds.), Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII) (pp. 112–123). Association for Computational Linguistics. https://aclanthology.org/2024.law-1.11/
Klingelhöfer, T., & Müller, J. (2024). When do voters perceive intra-party conflict? A democratic life cycle perspective. European Political Science Review, 16(2), 207–224. https://doi.org/10.1017/S1755773923000243
Kmainasi, M. B., Khan, R., Shahroor, A. E., Bendou, B., Hasanain, M., & Alam, F. (2024). Native vs non-native language prompting: A comparative analysis. https://arxiv.org/abs/2409.07054
Kölln, A.-K. (2024). When do citizens consider political parties legitimate? British Journal of Political Science, 54(1), 110–128. https://doi.org/10.1017/S0007123423000364
Kushin, M. J., & Yamamoto, M. (2010). Did social media really matter? College students’ use of online media and political decision making in the 2008 election. Mass Communication and Society, 13(5), 608–630. https://doi.org/10.1080/15205436.2010.516863
Lakoff, G., & Johnson, M. (1980). Conceptual metaphor in everyday language. The Journal of Philosophy, 77(8), 453–486. https://doi.org/10.2307/2025464
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. Retrieved October 1, 2025, from http://www.jstor.org/stable/2529310
Li, L., Dreyfus, S., & Don, A. (2025). Text & Talk, 45(5), 633–655. https://doi.org/10.1515/text-2023-0177
Li, M., Suk, J., Zhang, Y., Pevehouse, J. C., Sun, Y., Kwon, H., Lian, R., Wang, R., Dong, X., & Shah, D. V. (2024). Platform affordances, discursive opportunities, and social media activism: A cross-platform analysis of #MeToo on Twitter, Facebook, and Reddit, 2017–2020. New Media & Society, 0(0), 14614448241285562. https://doi.org/10.1177/14614448241285562
Littlemore, J., & Tagg, C. (2016). Metonymy and text messaging: A framework for understanding creative uses of metonymy. Applied Linguistics, 39(4), 481–507. https://doi.org/10.1093/applin/amw018
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. CoRR, abs/2107.13586. https://arxiv.org/abs/2107.13586
Markert, K., & Nissim, M. (2007). Metonymic proper names: A corpus-based account. In A. Stefanowitsch & S. T. Gries (Eds.), Corpus-based approaches to metaphor and metonymy (pp. 152–174). De Gruyter Mouton. https://doi.org/10.1515/9783110199895.152
Marsh, M., & Tilley, J. (2010). The attribution of credit and blame to governments and its impact on vote choice. British Journal of Political Science, 40(1), 115–134. https://doi.org/10.1017/S0007123409990275
Martin, J. R., & White, P. R. R. (2005). The language of evaluation: Appraisal in English. Palgrave Macmillan. https://doi.org/10.1057/9780230511910
Mendes, K., Hollingshead, W., Nau, C., Zhang, J., & Quan-Haase, A. (2023). The evolution of #metoo: A comparative analysis of vernacular practices over time and across languages. Social Media + Society, 9(3), 20563051231196692. https://doi.org/10.1177/20563051231196692
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the role of demonstrations: What makes in-context learning work? https://arxiv.org/abs/2202.12837
Mollá, D. (2020, December). Overview of the 2020 ALTA shared task: Assess human behaviour. In M. Kim, D. Beck, & M. Mistica (Eds.), Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association (pp. 127–130). Australasian Language Technology Association. https://aclanthology.org/2020.alta-1.17/
Nguyen, X.-P., Aljunied, S. M., Joty, S., & Bing, L. (2024). Democratizing LLMs for low-resource languages by leveraging their English dominant abilities with linguistically-diverse prompts. https://arxiv.org/abs/2306.11372
O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 19, 1609406919899220. https://doi.org/10.1177/1609406919899220
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. https://arxiv.org/abs/2203.02155
Paszenda, J., & Góralczyk, I. (2018). Metonymic motivations behind paragonic uses of proper names in political discourse: A cognitive linguistic approach. Linguistica Silesiana, 39. https://doi.org/10.24425/linsi.2018.124578
Rawnsley, M.-Y., & Sullivan, J. (2016). Taiwanese media reform. Journal of the British Association for Chinese Studies, 6.
Suler, J. (2004). The online disinhibition effect. CyberPsychology & Behavior, 7(3), 321–326. https://doi.org/10.1089/1094931041291295
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and efficient foundation language models. https://arxiv.org/abs/2302.13971
Tran, G. H., & Ngo, X. M. (2018). News comments on Facebook: A systemic functional linguistic analysis of moves and appraisal language in reader-reader interaction. Journal of World Languages, 5, 46–80. https://doi.org/10.1080/21698252.2018.1504856
Vitak, J., Zube, P., Smock, A., Ellison, N., & Lampe, C. (2011). It's complicated: Facebook users' political participation in the 2008 election. Cyberpsychology, Behavior, and Social Networking, 14, 107–114. https://doi.org/10.1089/cyber.2009.0226
von Sikorski, C. (2018). Political scandals as a democratic challenge| the aftermath of political scandals: A meta-analysis. International Journal of Communication, 12(0), 25. https://ijoc.org/index.php/ijoc/article/view/7100
Walter, A. S., & Redlawsk, D. P. (2019). Voters' partisan responses to politicians' immoral behavior. Political Psychology, 40(5), 1075–1097. https://doi.org/10.1111/pops.12582
Weaver, R. K. (2018). The nays have it: How rampant blame generating distorts American policy and politics. Political Science Quarterly, 133(2), 259–289. https://doi.org/10.1002/polq.12771
Yang, J. (2024). How new media mobilized resources and political opportunities for social movements: A case study based on the Sunflower Movement in Taiwan. International Journal of Social Science and Education Research, 7(2), 190–194. https://doi.org/10.6918/IJOSSER.202402_7(2).0025
Zappavigna, M. (2017). Evaluation. In C. R. Hoffmann & W. Bublitz (Eds.), Pragmatics of social media. De Gruyter Mouton. https://doi.org/10.1515/9783110431070-016
Zhang, J. (2024). A sentiment analysis framework integrating systemic functional grammar and appraisal theory. Journal of Artificial Intelligence Practice, 7. https://doi.org/10.23977/jaip.2024.070308
林意仁. (2010). 由 PTT Gossiping 看板看「網路公眾」 [The "online public" as seen through the PTT Gossiping board]. 文化研究月報 [Cultural Studies Monthly], (108), 52–70. https://doi.org/10.7012/CSM.201009.0052
Full text available from: 2027/12/23