| 研究生: | 李博逸 Li, Bo-Yi |
|---|---|
| 論文名稱: | 基於領域知識蒸餾之大型語言模型與專案開發代理人的設計與實踐:以教育場域為例 Design and Implementation of a Large Language Model and Project Development Agent Based on Domain Knowledge Distillation: A Case Study in the Educational Field |
| 指導教授: | 楊亨利 Yang, Heng-Li、王貞淑 Wang, Chen-Shu |
| 口試委員: | 翁頌舜 Weng, Sung-Shun、林湘霖 Lin, Shiang-Lin、張欣綠 Chang, Hsin-Lu、楊亨利 Yang, Heng-Li、王貞淑 Wang, Chen-Shu |
| 學位類別: | 博士 Doctor |
| 系所名稱: | 商學院 - 資訊管理學系 Department of Management Information Systems |
| 論文出版年: | 2025 |
| 畢業學年度: | 114 |
| 語文別: | 中文 |
| 論文頁數: | 232 |
| 中文關鍵詞: | 大型語言模型、專案開發代理人(AI Agent)、知識蒸餾、教育中人工智慧、自我調節學習 |
| 外文關鍵詞: | Large Language Model, Project Development Agent (AI Agent), Knowledge Distillation, Artificial Intelligence in Education, Self-Regulated Learning |
隨著大型語言模型技術的快速發展,高等教育的教與學正被重塑:教師可用其協助備課、回饋與分析,學生則以對話方式獲得解題、除錯與創作支援。然而,近期跨國調查顯示,雖然大多數學生頻繁使用生成式AI,但在高等教育機構仍缺乏完善指導方針與倫理規範的情況下,學生普遍缺乏足夠的AI素養與學術誠信意識。此外,人工智慧的發展已逐漸從單純的問答工具轉向嵌入工作流的代理人。在此脈絡下,如何將人工智慧轉化為一種可教導、可練習且可評量的關鍵素養,並建構能支援長週期專案學習的系統化鷹架,已成為彌合上述治理落差、提升教學品質,並確保技術應用有效對齊教育目標的關鍵課題。
儘管人工智慧在教育應用具高度潛力,但專業課程導入仍面臨三項關鍵缺口:通用模型缺乏課程對齊之領域知識而易產生幻覺或淺薄回應、欠缺能結合自我調節學習與專案導向學習的主動式代理人以支撐長週期任務、以及缺乏同時兼顧高階思維、協作歷程與學習成效的統整性評估架構。基於此,本研究以高等教育資訊科技專題課程為場域並依倫理規範完成知情同意,提出可複製的「PDCA知識蒸餾-主動代理-統整評估」框架,透過三項互補實驗驗證:實驗一以PDCA為核心進行知識蒸餾與開源模型領域微調並由專家從流暢性、連貫性、正確性與有用性評估模型品質;實驗二以準實驗比較領域模型與通用模型(ChatGPT)對學生高階思維與學習成效之差異;實驗三以實驗設計建構基於領域模型之專案開發代理人,檢驗其在長週期課程中對高階思維、專案任務與時程管理及整體學習表現之影響。
在模型建置與基礎成效方面,研究結果顯示,透過PDCA循環進行知識蒸餾與微調之領域模型,在專家評估(實驗一)的正確性與有用性構面上顯著優於通用模型,且在流暢性上維持穩定水準。在真實教學場域的驗證(實驗二)中,使用領域模型之學生在批判性思考與自我調節學習的表現上顯著優於使用通用模型之學生,且其問題解決能力在課程後期呈現顯著差異。在進階的人機協作與代理人效益(實驗三)方面,實驗結果指出,具備規劃與追蹤功能的專案開發代理人能顯著提升學生的專案時程管理能力,顯示主動式的節點提醒與進度追蹤能有效輔助長週期任務的執行;惟代理人組在專案成績與創造力表現上之增益不顯著,顯示深層認知的提升或原創性的突破仍有賴互補的人機協作機制。
在理論貢獻上,本研究以階梯式設計累積雙軌證據:先以量化指標與專家評估檢核模型品質,再以準實驗驗證課室成效,並以對照設計檢測代理人對歷程治理與專案表現之影響,形成可在真實高教場域再現的研究程序。基於三階段證據鏈,本研究提出三項貢獻:第一,建構具實證效能的教育領域知識模型架構,將異質資料治理、PDCA迭代式知識蒸餾與多維度對齊評估整合為可驗證循環。第二,闡明「領域知識對齊×任務適配」所形成之知識邊界機制,指出其能降低發散雜訊並釋放學生認知資源,促進高階思維發展。第三,建構以五大核心特性為基礎、並以六項人機協作設計準則為核心的教育代理人理論框架,界定代理人由流程治理走向品質治理的必要條件,補足通用代理架構在教育治理面向之不足。
在實務應用上,本研究連結技術開發與教學現場,提出可複製的系統與全歷程實施指引。針對系統開發者,本研究提供App Inventor課程語域之資料轉化、PDCA微調與混合雲部署流程,以對話介面串接後端服務,提升在教室情境的可用性與可部署性。針對Agent設計者,實驗結果顯示僅具規劃與追蹤功能的代理人可強化時程管理,但難以穩定帶動專案品質與創造力,據此建議將Rubric與品質檢核邏輯內嵌為代理人內在機制,以補足流程效率與品質達標之落差。針對AI教育者,本研究彙整並優化「課前把關-課中鷹架-專案治理-評量稽核」四階段指引:封閉式評量維持界線以掌握個體基礎,開放式專案允許AI介入但須提交可回溯的證據鏈,以支援稽核與反思。研究限制包括單一課程與單一學期之場域、非隨機分派與資源限制,且代理人尚難直接評測作品品質,後續需跨課程與跨場域驗證其可移植性。
With the rapid advancement of large language model (LLM) technologies, teaching and learning in higher education are being reshaped: instructors can leverage LLMs for lesson preparation, feedback, and analysis, while students can engage in dialogic interactions to obtain support for problem solving, debugging, and creation. However, recent cross-national surveys indicate a persistent governance gap: although most students use generative AI frequently, many higher education institutions still lack comprehensive guidelines and ethical regulations, leaving students insufficiently equipped with AI literacy and academic integrity awareness. Meanwhile, AI has been evolving from standalone question-answering tools toward agents embedded in workflows. Within this context, a central challenge is to transform AI into a teachable, practicable, and assessable core competency, and to build systematic scaffolding that can support long-cycle project-based learning. Addressing this challenge is critical for mitigating the governance gap, improving instructional quality, and ensuring that technological adoption aligns with educational objectives.
Despite the considerable potential of AI in education, implementation in professional courses continues to face three key gaps: (1) general-purpose models lack curriculum-aligned domain knowledge and are therefore prone to hallucinations or superficial responses; (2) there is a shortage of proactive agents that integrate self-regulated learning (SRL) and project-based learning (PBL) to support long-cycle tasks; and (3) an integrative evaluation framework that simultaneously captures higher-order thinking, collaborative processes, and learning outcomes remains underdeveloped. Accordingly, this study was conducted in an information-technology capstone course in higher education with informed consent obtained under ethical procedures. We propose a replicable framework—“PDCA-based knowledge distillation, proactive agency, and integrative evaluation”—and validate it through three complementary experiments: Experiment 1 applies a PDCA (Plan–Do–Check–Act)-centered distillation and domain fine-tuning process using an open-source model and evaluates model quality via expert review across fluency, coherence, correctness, and usefulness; Experiment 2 adopts a quasi-experimental design to compare a domain model with a general-purpose model (ChatGPT) in terms of students’ higher-order thinking and learning achievement; and Experiment 3 employs an experimental design to build a project-development agent grounded in the domain model and to examine its effects on higher-order thinking, project task and schedule management, and overall learning performance in a long-cycle course.
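The abstract describes the Experiment 1 pipeline only at the workflow level. As a minimal sketch of how a Plan–Do–Check–Act loop can curate a curriculum-aligned corpus for fine-tuning a student model (all function names, the score threshold, and data structures here are illustrative assumptions, not the thesis's actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class QAPair:
    question: str
    answer: str
    scores: dict = field(default_factory=dict)  # fluency/coherence/correctness/usefulness

def plan(course_topics):
    """Plan: derive target questions from curriculum topics (template is hypothetical)."""
    return [f"How do I use the {t} component in App Inventor?" for t in course_topics]

def do(questions, teacher_generate):
    """Do: have the teacher model draft answers (the distillation step)."""
    return [QAPair(q, teacher_generate(q)) for q in questions]

def check(pairs, rate):
    """Check: score each pair on the four evaluation dimensions used in Experiment 1."""
    for p in pairs:
        p.scores = rate(p)
    return pairs

def act(pairs, threshold=4.0):
    """Act: keep high-quality pairs; send the weakest questions into the next cycle."""
    keep = [p for p in pairs if min(p.scores.values()) >= threshold]
    redo = [p.question for p in pairs if min(p.scores.values()) < threshold]
    return keep, redo

def pdca_distill(course_topics, teacher_generate, rate, max_cycles=3):
    """Iterate PDCA until every question yields an acceptable QA pair (or cycles run out)."""
    questions, dataset = plan(course_topics), []
    for _ in range(max_cycles):
        if not questions:
            break
        keep, questions = act(check(do(questions, teacher_generate), rate))
        dataset.extend(keep)
    return dataset  # fine-tuning corpus for the open-source student model
```

In practice `teacher_generate` and `rate` would wrap an LLM call and the expert/automatic rating procedure; here they are left as injected callables so the cycle itself stays testable.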
Regarding model construction and baseline effectiveness, results show that the domain model produced through PDCA-driven knowledge distillation and fine-tuning significantly outperforms the general-purpose model on correctness and usefulness in expert evaluations (Experiment 1), while maintaining stable fluency. In authentic classroom validation (Experiment 2), students using the domain model perform significantly better than those using the general-purpose model in critical thinking and SRL, and their problem-solving ability exhibits significant differences in the later stages of the course. For advanced human-AI collaboration and agent effectiveness (Experiment 3), findings indicate that the project-development agent with planning and tracking functions significantly improves students’ schedule management capability, suggesting that proactive milestone prompting and progress monitoring can effectively support the execution of long-cycle tasks. However, the agent group shows no significant gains in project grades or creativity, implying that improvements in deep cognition and breakthroughs in originality still require complementary human-AI collaboration mechanisms.
In terms of theoretical contributions, this study accumulates dual-track evidence through a staged design: model quality is first verified through quantitative indicators and expert evaluation, classroom effects are then validated via a quasi-experiment, and agent impacts on process governance and project performance are finally examined through a controlled comparison, yielding a replicable research procedure in authentic higher education settings. Based on this three-stage evidence chain, the study makes three contributions. First, it constructs an empirically grounded educational domain-knowledge model architecture that integrates heterogeneous data governance, PDCA-based iterative knowledge distillation, and multi-dimensional alignment evaluation into a verifiable cycle. Second, it explicates a knowledge-boundary mechanism formed by “domain alignment × task fit,” arguing that it reduces divergent noise and releases learners’ cognitive resources to facilitate higher-order thinking development. Third, it advances a theoretical framework for educational agents grounded in five core characteristics and operationalized through six human–AI collaboration design principles, specifying the necessary conditions for agents to move from process governance to quality governance and addressing limitations of general-purpose agent architectures in educational governance.
In practical terms, this study bridges technical development and instructional practice by delivering replicable systems and end-to-end implementation guidance. For system developers, we provide procedures for data transformation in the App Inventor course discourse, PDCA-based fine-tuning, and hybrid-cloud deployment, connecting a conversational interface to backend services to improve classroom usability and deployability. For agent designers, results indicate that agents equipped only with planning and tracking functions can strengthen schedule management but may not reliably improve project quality or creativity; accordingly, we recommend embedding rubrics and quality-checking logic as internal mechanisms to close the gap between process efficiency and quality attainment. For AI educators, we refine a four-phase guideline—pre-class validation, in-class scaffolding, project governance, and assessment auditing—maintaining clear boundaries for closed-book assessments to capture individual foundations while allowing AI involvement in open-ended projects contingent on submission of a traceable evidence chain to support auditing and reflection. Limitations include the single-course, single-semester setting, potential confounds due to non-random assignment and resource constraints, and the agent’s current inability to directly evaluate artifact quality; future work should conduct cross-course and cross-context validation to assess transferability.
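The planning and proactive schedule-tracking behaviour credited above with the schedule-management gains can be illustrated with a deliberately simplified sketch (the weekly granularity, reminder horizon, and all names and dates are assumptions for illustration, not the deployed agent):

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Milestone:
    name: str
    due: date
    done: bool = False

def plan_schedule(start: date, tasks):
    """Planning step: spread project tasks over weekly milestones from the start date."""
    return [Milestone(name, start + timedelta(weeks=i + 1)) for i, name in enumerate(tasks)]

def proactive_reminders(milestones, today: date, horizon_days: int = 7):
    """Tracking step: surface overdue work and upcoming deadlines without being asked."""
    msgs = []
    for m in milestones:
        if m.done:
            continue
        if m.due < today:
            msgs.append(f"Overdue: '{m.name}' was due {m.due.isoformat()}.")
        elif (m.due - today).days <= horizon_days:
            msgs.append(f"Upcoming: '{m.name}' is due {m.due.isoformat()}.")
    return msgs
```

Embedding rubric and quality-check logic, as recommended above, would extend this loop beyond date arithmetic to inspecting the artifacts attached to each milestone.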
謝辭 I
摘要 III
Abstract V
目錄 IX
表目錄 XIV
圖目錄 XVII
第一章 緒論(Introduction) 1
第一節 研究背景動機(Research Background and Motivations) 1
第二節 研究目的 (Research Objectives) 6
第三節 論文架構(Research Process and Thesis Structure) 9
第二章 文獻探討 (Literature Review) 11
第一節 大型語言模型(Large Language Model) 11
一、 簡介 11
二、 模型預訓練與微調 12
三、 模型評估 15
第二節 知識蒸餾(Knowledge Distillation) 18
第三節 自我調節學習(Self-Regulated Learning) 20
第四節 大型語言模型在教育領域的應用(Applications of Large Language Models in the Educational Domain) 22
第五節 提示工程(Prompt Engineering) 30
第六節 教育領域的人工智慧代理(AI Agent in the Educational Domain) 34
第三章 研究設計與教學方法(Research Design and Teaching Methods) 42
第一節 研究流程(Research Process) 43
第二節 教學工具說明(Description of Teaching Tools) 45
第三節 人工智慧工具之設計與建置(Design and Development of Artificial Intelligence Tools) 48
一、 基於PDCA之知識蒸餾的領域模型(Domain LLM) 48
二、 專案開發代理人(Project-Development Agent) 57
第四節 教案與課程設計(Lesson Plans and Curriculum Design) 65
第四章 實驗設計(Experimental Design) 71
第一節 實驗流程(Experimental Procedure) 71
第二節 生成式AI導入課程專案之前導實驗(Pilot Study) 73
第三節 實驗一:系統評估-專家評估(Experiment I: System Evaluation - Expert Assessment) 76
第四節 實驗二:領域模型之教學設計評估(Experiment II: Evaluation of Domain LLM in Instructional Design) 81
一、 實驗設計與對象 83
二、 參考模型 84
三、 評量工具與問卷內容 89
四、 實驗流程 91
第五節 實驗三:專案開發代理人之教學設計評估(Experiment III: Evaluation of Project-Development Agent in Instructional Design) 98
一、 實驗設計與對象 100
二、 參考模型 101
三、 評量工具與問卷內容 106
四、 實驗流程 107
第五章 實驗結果(Experimental Results) 112
第一節 實驗一結果:系統評估-專家評估(Results of Experiment I: System Evaluation - Expert Assessment) 112
第二節 實驗二結果:領域模型之教學設計評估(Results of Experiment II: Evaluation of Domain LLM in Instructional Design) 115
一、 前測分析 116
二、 高階思維能力分析(H1、H2、H3) 116
三、 學習成效與交互作用分析(H4、H5、H6) 118
四、 實驗二之假說驗證結果彙總 122
第三節 實驗三結果:專案開發代理人之教學設計評估(Results of Experiment III: Evaluation of Project-Development Agent in Instructional Design) 124
一、 前測分析 124
二、 高階思維能力分析(H1、H2、H3) 125
三、 專案管理能力分析(H4、H5) 128
四、 學習成效與交互作用分析(H6、H7、H8) 128
五、 實驗三之假說驗證結果彙總 132
六、 其他分析結果 135
第六章 討論(Discussion) 138
第一節 實驗一(Experiment I) 138
第二節 實驗二(Experiment II) 139
模型運作機制差異探討 140
高階思維能力之探討 141
學習成效之探討 144
教學情境之適配性探討 146
第三節 實驗三(Experiment III) 147
模型運作機制差異探討 148
高階思維能力之探討 150
專案管理能力之探討 155
學習成效之探討 158
教學情境之適配性探討 165
第四節 未來執行的建議(Recommendations for Future Implementation) 167
系統面開發建議(人機協作設計原則) 170
老師面人機協作(課堂運用建議與注意事項) 171
學生面人機協作(使用準則與合作策略) 172
第七章 結論(Conclusion) 174
第一節 理論貢獻(Theoretical Contributions) 175
第二節 實務貢獻(Practical Contributions) 179
第三節 研究限制(Research Limitations) 180
第四節 未來研究方向(Future Research Directions) 182
參考文獻(References) 183
附錄(Appendix) 202
A. 資料處理之提示 202
A1角色扮演 202
A2問答生成1 203
A3問答生成2 203
B. 知識蒸餾之提示 205
B1計畫 205
B2執行 205
B3檢查 205
B4行動 206
C. 教學增能提示 208
C1概念認知 208
C2常規任務 209
C3專案建構 210
D. AI使用自評問卷題項 211
E. AI互動學習工作單 213
F. 人工智慧工具情境說明 216
F1、領域模型情境說明 216
F2、專案開發代理人情境說明 218
G. 使用情境之具體實例 222
G1、領域模型使用情境案例說明-課堂-增能提示與工作單 224
G2、代理人使用情境案例說明-課堂-增能提示與工作單 226
G3、代理人使用情境案例說明-專案-團隊啟動分工與時程規劃(第 7-9 週) 228
G4、代理人使用情境案例說明-專案-核心設計與實作 230
G5、代理人使用情境案例說明-專案-回饋整合與報告 232
1. 蔡育融(2013)。應用 App Inventor 於高中程式設計教學之個案研究(碩士論文)。國立臺灣師範大學。臺灣博碩士論文知識加值系統。https://hdl.handle.net/11296/nhqjf7
2. Akçapınar, G., Altun, A., & Aşkar, P. (2019). Using learning analytics to develop early-warning system for at-risk students. International Journal of Educational Technology in Higher Education, 16(1), 1-20. https://doi.org/10.1186/s41239-019-0172-z
3. Alkhatlan, A., & Kalita, J. (2018). Intelligent tutoring systems: A comprehensive historical survey with recent developments. arXiv preprint arXiv:1812.09628. https://doi.org/10.48550/arXiv.1812.09628
4. Alsafari, B., Atwell, E., Walker, A., & Callaghan, M. (2024). Towards effective teaching assistants: From intent-based chatbots to LLM-powered teaching assistants. Natural Language Processing Journal, 100101. https://doi.org/10.1016/j.nlp.2024.100101
5. Al-Sharafi, M. A., Al-Emran, M., Iranmanesh, M., Al-Qaysi, N., Iahad, N. A., & Arpaci, I. (2023). Understanding the impact of knowledge management factors on the sustainable use of AI-based chatbots for educational purposes using a hybrid SEM-ANN approach. Interactive Learning Environments, 31(10), 7491-7510. https://doi.org/10.1080/10494820.2022.2075014
6. Anisuzzaman, D. M., Malins, J. G., Friedman, P. A., & Attia, Z. I. (2025). Fine-tuning large language models for specialized use cases. Mayo Clinic Proceedings: Digital Health, 3(1), 100184. https://doi.org/10.1016/j.mcpdig.2024.11.005
7. Apoki, U. C., Hussein, A. M. A., Al-Chalabi, H. K. M., Badica, C., & Mocanu, M. L. (2022). The role of pedagogical agents in personalised adaptive learning: A review. Sustainability, 14(11), 6442. https://doi.org/10.3390/su14116442
8. Athaluri, S. A., Manthena, S. V., Kesapragada, V. K. M., Yarlagadda, V., Dave, T., & Duddumpudi, R. T. S. (2023). Exploring the boundaries of reality: Investigating the phenomenon of artificial intelligence hallucination in scientific writing through ChatGPT references. Cureus, 15(4), e37432. https://doi.org/10.7759/cureus.37432
9. Barcaui, A., & Monat, A. (2023). Who is better in project planning? Generative artificial intelligence or project managers?. Project Leadership and Society, 4, 100101. https://doi.org/10.1016/j.plas.2023.100101
10. Barros, C., Gkatzia, D., & Lloret, E. (2017, December). Improving the naturalness and expressivity of language generation for Spanish. In Proceedings of the 10th International Conference on Natural Language Generation (pp. 41-50). Santiago de Compostela, Spain: Association for Computational Linguistics. https://doi.org/10.18653/v1/W17-3505
11. Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, Ö., & Mariman, R. (2024). Generative AI can harm learning. The Wharton School Research Paper. https://dx.doi.org/10.2139/ssrn.4895486
12. Belz, A., Mille, S., & Howcroft, D. M. (2020). Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and reproducibility testing. Association for Computational Linguistics (ACL). https://aclanthology.org/2020.inlg-1.24
13. Booth, G. J., Hauert, T., Mynes, M., Hodgson, J., Slama, E., Goldman, A., & Moore, J. (2024). Fine-tuning large language models to enhance programmatic assessment in graduate medical education. The Journal of Education in Perioperative Medicine: JEPM, 26(3), E729. https://doi.org/10.46374/VolXXVI_Issue3_Moore
14. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901. https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
15. Brusilovsky, P., & Peylo, C. (2003). Adaptive and intelligent web-based educational systems. International journal of artificial intelligence in education, 13(2-4), 159-172. https://doi.org/10.3233/IRG-2003-13(2-4)02
16. Buçinca, Z., Malaya, M. B., & Gajos, K. Z. (2021). To trust or to think: cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-computer Interaction, 5(CSCW1), 1-21. https://doi.org/10.1145/3449287
17. Cai, Q., Lin, Y., & Yu, Z. (2023). Factors influencing learner attitudes towards ChatGPT-assisted language learning in higher education. International Journal of Human-Computer Interaction, 1-15. https://doi.org/10.1080/10447318.2023.2261725
18. Cain, W. (2024). Prompting Change: Exploring Prompt Engineering in Large Language Model AI and Its Potential to Transform Education. TechTrends, 68(1), 47-57. https://doi.org/10.1007/s11528-023-00896-0
19. Carpenter, D., Min, W., Lee, S., Ozogul, G., Zheng, X., & Lester, J. (2024, June). Assessing Student Explanations with Large Language Models Using Fine-Tuning and Few-Shot Learning. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024) (pp. 403-413). https://aclanthology.org/2024.bea-1.33
20. Cartwright, C., & Yinger, M. (2007, May). Project management competency development framework. In PMI Global Congress, Budapest, Hungary.
21. Castleman, B., & Turkcan, M. K. (2024). Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors. In Proceedings of the 17th International Conference on Educational Data Mining (pp. 649-657). https://educationaldatamining.org/edm2024/proceedings/2024.EDM-posters.68/
22. Celikyilmaz, A., Clark, E., & Gao, J. (2020). Evaluation of text generation: A survey. arXiv preprint arXiv:2006.14799. https://doi.org/10.48550/arXiv.2006.14799
23. Chang, J. L., Hung, H. T., & Yang, Y. T. C. (2024b). Effects of an annotation-supported Socratic questioning approach on students' argumentative writing performance and critical thinking skills in flipped language classrooms. Journal of Computer Assisted Learning, 40(1), 37-48. https://doi.org/10.1111/jcal.12856
24. Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., ... & Xie, X. (2024a). A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3), 1-45. https://doi.org/10.1145/3641289
25. Chaudhry, I. S., Sarwary, S. A. M., El Refae, G. A., & Chabchoub, H. (2023). Time to revisit existing student’s performance evaluation approach in higher education sector in a new era of ChatGPT-a case study. Cogent Education, 10(1), 2210461. https://doi.org/10.1080/2331186X.2023.2210461
26. Chiang, W.-L., Zheng, L., Sheng, Y., Angelopoulos, A. N., Li, T., Li, D., Zhu, B., Zhang, H., Jordan, M., Gonzalez, J. E., & Stoica, I. (2024). Chatbot Arena: An open platform for evaluating LLMs by human preference. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, & F. Berkenkamp (Eds.), Proceedings of the 41st International Conference on Machine Learning (PMLR), 235, pp. 8359-8388. https://proceedings.mlr.press/v235/chiang24b.html
27. Chiu, C. F. (2020). Facilitating K-12 teachers in creating apps by visual programming and project-based learning. International Journal of Emerging Technologies in Learning (iJET), 15(1), 103-118. https://doi.org/10.3991/ijet.v15i01.11013
28. Chocarro, R., Cortinas, M., & Marcos-Matás, G. (2023). Teachers’ attitudes towards chatbots in education: a technology acceptance model approach considering the effect of social language, bot proactiveness, and users’ characteristics. Educational Studies, 49(2), 295-313. https://doi.org/10.1080/03055698.2020.1850426
29. Chu, Z., Wang, S., Xie, J., Zhu, T., Yan, Y., Ye, J., ... & Wen, Q. (2025). LLM agents for education: Advances and applications. arXiv preprint arXiv:2503.11733. https://doi.org/10.48550/arXiv.2503.11733
30. Córdova-Esparza, D. M. (2025). Ai-powered educational agents: Opportunities, innovations, and ethical challenges. Information, 16(6), 469. https://doi.org/10.3390/info16060469
31. Cui, Y., Firdousi, S. F., Afzal, A., Awais, M., & Akram, Z. (2022). The influence of big data analytic capabilities building and education on business model innovation. Frontiers in Psychology, 13, 999944. https://doi.org/10.3389/fpsyg.2022.999944
32. Dahl, M., Magesh, V., Suzgun, M., & Ho, D. E. (2024). Hallucinating law: Legal mistakes with large language models are pervasive. Law, regulation, and policy. https://nacmnet.org/wp-content/uploads/Stanford-HAI-Dahl-et-al.-Hallucinating-Law-Legal-Mistakes-with-Large-Language-Models-are-Pervasive-2024JAN11-6pp.pdf
33. Dai, C. P., & Ke, F. (2022). Educational applications of artificial intelligence in simulation-based learning: A systematic mapping review. Computers and Education: Artificial Intelligence, 3, 100087. https://doi.org/10.1016/j.caeai.2022.100087
34. Davis, R., Eppler, M., Ayo-Ajibola, O., Loh-Doyle, J. C., Nabhani, J., Samplaski, M., ... & Cacciamani, G. E. (2023). Evaluating the effectiveness of artificial intelligence-powered large language models application in disseminating appropriate and readable health information in urology. The Journal of urology, 210(4), 688-694. https://doi.org/10.1097/JU.0000000000003615
35. Dawson, J. Q., Allen, M., Campbell, A., & Valair, A. (2018, February). Designing an introductory programming course to improve non-majors' experiences. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education (pp. 26-31). https://doi.org/10.1145/3159450.3159548
36. Dawson, P. (2017). Assessment rubrics: towards clearer and more replicable design, research and practice. Assessment & Evaluation in Higher Education, 42(3), 347-360. https://doi.org/10.1080/02602938.2015.1111294
37. de Araujo, A., Papadopoulos, P. M., McKenney, S., & de Jong, T. (2025). Investigating the impact of a collaborative conversational agent on dialogue productivity and knowledge acquisition. International Journal of Artificial Intelligence in Education, 1-27. https://doi.org/10.1007/s40593-025-00469-7
38. Demartini, C. G., Sciascia, L., Bosso, A., & Manuri, F. (2024). Artificial Intelligence Bringing Improvements to Adaptive Learning in Education: A Case Study. Sustainability, 16(3), 1347. https://doi.org/10.3390/su16031347
39. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. https://doi.org/10.48550/arXiv.1810.04805
40. Dieker, L., Hines, R., Wilkins, I., Hughes, C., Scott, K. H., Smith, S., ... & Shah, S. (2024). Using an Artificial Intelligence (AI) Agent to Support Teacher Instruction and Student Learning. Journal of Special Education Preparation, 4(2), 78-88. https://doi.org/10.33043/d8xb94q7
41. Diziol, D., Walker, E., Rummel, N., & Koedinger, K. R. (2010). Using intelligent tutor technology to implement adaptive support for student collaboration. Educational Psychology Review, 22(1), 89-102. https://doi.org/10.1007/s10648-009-9116-9
42. Doo, M. Y., & Bonk, C. J. (2020). The effects of self‐efficacy, self‐regulation and social presence on learning engagement in a large university class using flipped Learning. Journal of Computer Assisted Learning, 36(6), 997-1010. https://doi.org/10.1111/jcal.12455
43. Doo, M. Y., Bonk, C., & Heo, H. (2020). A meta-analysis of scaffolding effects in online learning in higher education. International Review of Research in Open and Distributed Learning, 21(3), 60-80. https://doi.org/10.19173/irrodl.v21i3.4638
44. Doshi, A. R., & Hauser, O. P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science advances, 10(28), eadn5290. https://doi.org/10.1126/sciadv.adn5290
45. Du, X., Du, M., Zhou, Z., & Bai, Y. (2025). Facilitator or hindrance? The impact of AI on university students' higher-order thinking skills in complex problem solving. International Journal of Educational Technology in Higher Education, 22(1), 39, 1-26. https://doi.org/10.1186/s41239-025-00534-0
46. Duffy, T. M., & Kirkley, J. R. (Eds.). (2003). Learner-centered theory and practice in distance education: Cases from higher education. Routledge. https://doi.org/10.4324/9781410609489
47. European Union (2024). REGULATION (EU) 2024/2847 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL. Regulation (eu). http://hctinsight.com/webzine/webzine/202501/file/ce/ce5.pdf
48. Filgueiras, F. (2024). Artificial intelligence and education governance. Education, Citizenship and Social Justice, 19(3), 349-361. https://doi.org/10.1177/17461979231160674
49. Frank, L., Herth, F., Stuwe, P., Klaiber, M., Gerschner, F., & Theissler, A. (2024, May). Leveraging GenAI for an Intelligent Tutoring System for R: A Quantitative Evaluation of Large Language Models. In 2024 IEEE Global Engineering Education Conference (EDUCON) (pp. 1-9). IEEE. https://doi.org/10.1109/EDUCON60312.2024.10578933
50. Franklin, S., & Graesser, A. (1996, August). Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents. In International workshop on agent theories, architectures, and languages (pp. 21-35). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/BFb0013570
51. Freeman, J. (2025). Student generative ai survey 2025. Higher Education Policy Institute: London, UK. https://www.hepi.ac.uk/wp-content/uploads/2025/02/HEPI-Kortext-Student-Generative-AI-Survey-2025.pdf
52. Gao, L., Lu, J., Shao, Z., Lin, Z., Yue, S., Ieong, C., ... & Chen, S. (2024b). Fine-tuned large language model for visualization system: A study on self-regulated learning in education. IEEE Transactions on Visualization and Computer Graphics, 31(1), 514-524. https://doi.org/10.1109/TVCG.2024.3456145
53. Gao, M., Hu, X., Ruan, J., Pu, X., & Wan, X. (2024a). LLM-based NLG evaluation: Current status and challenges. arXiv preprint arXiv:2402.01383. https://doi.org/10.48550/arXiv.2402.01383
54. Garg, A., & Rajendran, R. (2024, June). Analyzing the Role of Generative AI in Fostering Self-directed Learning Through Structured Prompt Engineering. In International Conference on Intelligent Tutoring Systems (pp. 232-243). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-63028-6_18
55. Giray, L. (2023). Prompt engineering with ChatGPT: a guide for academic writers. Annals of biomedical engineering, 51(12), 2629-2633. https://doi.org/10.1007/s10439-023-03272-4
56. Golchin, K., & Roudsari, A. (2011). Study of the effects of clinical decision support system's incorrect advice and clinical case difficulty on users' decision making accuracy. International Perspectives in Health Informatics, 164, 13-16. PMID: 21335681.
57. Gong, X., Yu, S., Xu, J., Qiao, A., & Han, H. (2024). The effect of PDCA cycle strategy on pupils’ tangible programming skills and reflective thinking. Education and Information Technologies, 29(5), 6383-6405. https://doi.org/10.1007/s10639-023-12037-4
58. Gopalan, M., Rosinger, K., & Ahn, J. B. (2020). Use of quasi-experimental research designs in education research: Growth, promise, and challenges. Review of Research in Education, 44(1), 218-243.
59. Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2021). Knowledge distillation: A survey. International Journal of Computer Vision, 129(6), 1789-1819. https://doi.org/10.1007/s11263-021-01453-z
60. Guilherme, A. (2019). AI and education: the importance of teacher and student relations. AI & society, 34(1), 47-54. https://doi.org/10.1007/s00146-017-0693-8
61. Guo, Y., & Lee, D. (2023). Leveraging chatgpt for enhancing critical thinking skills. Journal of Chemical Education, 100(12), 4876-4883. https://pubs.acs.org/doi/10.1021/acs.jchemed.3c00505
62. Hadi, M. U., Qureshi, R., Shah, A., Irfan, M., Zafar, A., Shaikh, M. B., ... & Mirjalili, S. (2023). Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints, 1(3), 1-26. https://doi.org/10.36227/techrxiv.23589741.v4
63. Hair, J., Hollingsworth, C. L., Randolph, A. B., & Chong, A. Y. L. (2017). An updated and expanded assessment of PLS-SEM in information systems research. Industrial Management & Data Systems, 117(3), 442-458. https://doi.org/10.1108/IMDS-04-2016-0130
65. Hall, C. C., Ariss, L., & Todorov, A. (2007). The illusion of knowledge: When more information reduces accuracy and increases confidence. Organizational Behavior and Human Decision Processes, 103(2), 277-290. https://doi.org/10.1016/j.obhdp.2007.01.003
66. Han, Z., Gao, C., Liu, J., & Zhang, S. Q. (2024). Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608. https://doi.org/10.48550/arXiv.2403.14608
67. Heston, T. F., & Khun, C. (2023). Prompt engineering in medical education. International Medical Education, 2(3), 198-205. https://doi.org/10.3390/ime2030019
68. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. https://doi.org/10.48550/arXiv.1503.02531
69. Hooshyar, D., Pedaste, M., Saks, K., Leijen, Ä., Bardone, E., & Wang, M. (2020). Open learner models in supporting self-regulated learning in higher education: A systematic literature review. Computers & education, 154, 103878. https://doi.org/10.1016/j.compedu.2020.103878
70. Howcroft, D. M., Belz, A., Clinciu, M., Gkatzia, D., Hasan, S. A., Mahamood, S., ... & Rieser, V. (2020, December). Twenty years of confusion in human evaluation: NLG needs evaluation sheets and standardised definitions. In 13th International Conference on Natural Language Generation 2020 (pp. 169-182). Association for Computational Linguistics. https://aclanthology.org/2020.inlg-1.23
71. Huang, J., Gu, S. S., Hou, L., Wu, Y., Wang, X., Yu, H., & Han, J. (2022). Large language models can self-improve. arXiv preprint arXiv:2210.11610. https://arxiv.org/abs/2210.11610
72. Huang, K., Zhang, J., Bao, X., Wang, X., & Liu, Y. (2025). Comprehensive Fine-Tuning Large Language Models of Code for Automated Program Repair. IEEE Transactions on Software Engineering, 51(4), 904-928. https://doi.org/10.1109/TSE.2025.3532759
73. Hwang, G. J., Lai, C. L., Liang, J. C., Chu, H. C., & Tsai, C. C. (2018). A long-term experiment to investigate the relationships between high school students’ perceptions of mobile learning and peer interaction and higher-order thinking tendencies. Educational Technology Research and Development, 66, 75-93. https://doi.org/10.1007/s11423-017-9540-3
74. Jang, J., Kim, S., Ye, S., Kim, D., Logeswaran, L., Lee, M., ... & Seo, M. (2023, July). Exploring the benefits of training expert language models over instruction tuning. In International Conference on Machine Learning (pp. 14702-14729). PMLR. https://proceedings.mlr.press/v202/jang23a.html
75. Jing, Y., Wang, H., Chen, X., & Wang, C. (2024). What factors will affect the effectiveness of using ChatGPT to solve programming problems? A quasi-experimental study. Humanities and Social Sciences Communications, 11(1), 1-12. https://doi.org/10.1057/s41599-024-02751-w
76. Jöreskog, K.G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36(4), 409-426. https://doi.org/10.1007/BF02291366
77. Kapoor, S., Stroebl, B., Siegel, Z. S., Nadgir, N., & Narayanan, A. (2024). AI agents that matter. arXiv preprint arXiv:2407.01502. https://doi.org/10.48550/arXiv.2407.01502
78. Karamthulla, M. J., Muthusubramanian, M., Tadimarri, A., & Tillu, R. (2024). Navigating the Future: AI-Driven Project Management in the Digital Era. International Journal for Multidisciplinary Research, 6(2), 1-11. https://doi.org/10.17613/7zxw-hw37
79. Kim, D. Y., Ravi, P., Williams, R., & Yoo, D. (2024, June). App Planner: Utilizing Generative AI in K-12 Mobile App Development Education. In Proceedings of the 23rd Annual ACM Interaction Design and Children Conference (pp. 770-775). https://doi.org/10.1145/3628516.3659392
80. Kirova, V. D., Ku, C. S., Laracy, J. R., & Marlowe, T. J. (2024, March). Software engineering education must adapt and evolve for an llm environment. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (pp. 666-672). https://doi.org/10.1145/3626252.3630927
81. Knoth, N., Tolzin, A., Janson, A., & Leimeister, J. M. (2024). AI literacy and its implications for prompt engineering strategies. Computers and Education: Artificial Intelligence, 100225. https://doi.org/10.1016/j.caeai.2024.100225
82. Köhler, C., Hartig, J., & Naumann, A. (2021). Detecting instruction effects: Deciding between covariance analytical and change-score approach. Educational Psychology Review, 33(3), 1191-1211. https://doi.org/10.1007/s10648-020-09590-6
83. Krathwohl, D. R. (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41(4), 212-218. https://www.jstor.org/stable/1477405
84. Kwak, Y., & Pardos, Z. A. (2024). Bridging large language model disparities: Skill tagging of multilingual educational content. British Journal of Educational Technology, 55(5), 2039-2057. https://doi.org/10.1111/bjet.13465
85. Lamb, R., & Firestone, J. (2022). The moderating role of creativity and the effect of virtual reality on stress and cognitive demand during preservice teacher learning. Computers & Education: X Reality, 1, 100003. https://doi.org/10.1016/j.cexr.2022.100003
86. Lamsiyah, S., El Mahdaouy, A., Nourbakhsh, A., & Schommer, C. (2024, July). Fine-tuning a large language model with reinforcement learning for educational question generation. In International Conference on Artificial Intelligence in Education (pp. 424-438). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-64302-6_30
87. Laun, M., & Wolff, F. (2025). Chatbots in education: hype or help? A meta-analysis. Learning and Individual Differences, 119, 102646. https://doi.org/10.1016/j.lindif.2025.102646
88. Lee, H. Y., Chen, P. H., Wang, W. S., Huang, Y. M., & Wu, T. T. (2024). Empowering ChatGPT with guidance mechanism in blended learning: effect of self-regulated learning, higher-order thinking skills, and knowledge construction. International Journal of Educational Technology in Higher Education, 21(1), 16. https://doi.org/10.1186/s41239-024-00447-4
89. Lee, U., Jung, H., Jeon, Y., Sohn, Y., Hwang, W., Moon, J., & Kim, H. (2023). Few-shot is enough: Exploring ChatGPT prompt engineering method for automatic question generation in English education. Education and Information Technologies, 29, 11483-11515. https://doi.org/10.1007/s10639-023-12249-8
90. Létourneau, A., Deslandes Martineau, M., Charland, P., Karran, J. A., Boasen, J., & Léger, P. M. (2025). A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education. npj Science of Learning, 10(1), 29, 1-13. https://doi.org/10.1038/s41539-025-00320-7
91. Li, G., Zhi, C., Chen, J., Han, J., & Deng, S. (2024b, October). Exploring parameter-efficient fine-tuning of large language model on automated program repair. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (pp. 719-731). https://doi.org/10.1145/3691620.3695066
92. Li, J., Sangalay, A., Cheng, C., Tian, Y., & Yang, J. (2024a, April). Fine tuning large language model for secure code generation. In Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (pp. 86-90). https://doi.org/10.1145/3650105.3652299
93. Li, S. (2024). Using Generative AI for Teaching Game Development Using Unity (Doctoral dissertation, ResearchSpace@Auckland). https://hdl.handle.net/2292/69793
94. Lin, C. Y. (2004, July). Rouge: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81). https://aclanthology.org/W04-1013.pdf
95. Lin, Y. T., & Chen, Y. N. (2023). Taiwan LLM: Bridging the linguistic divide with a culturally aligned language model. arXiv preprint arXiv:2311.17487. https://doi.org/10.48550/arXiv.2311.17487
96. Ling, C., Zhao, X., Lu, J., Deng, C., Zheng, C., Wang, J., ... & Zhao, L. (2023). Domain specialization as the key to make large language models disruptive: A comprehensive survey. arXiv preprint arXiv:2305.18703. https://doi.org/10.48550/arXiv.2305.18703
97. Liu, G., & Ma, C. (2024). Measuring EFL learners’ use of ChatGPT in informal digital learning of English based on the technology acceptance model. Innovation in Language Learning and Teaching, 18(2), 125-138. https://doi.org/10.1080/17501229.2023.2240316
98. Liu, Y., Tao, S., Zhao, X., Zhu, M., Ma, W., Zhu, J., ... & Jiang, Y. (2024, May). CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) (pp. 5184-5197). IEEE. https://doi.org/10.1109/ICDE60146.2024.00390
99. Liu, Z., Yu, P., Liu, J., Pi, Z., & Cui, W. (2023). How do students' self‐regulation skills affect learning satisfaction and continuous intention within desktop‐based virtual reality? A structural equation modelling approach. British Journal of Educational Technology, 54(3), 667-685. https://doi.org/10.1111/bjet.13278
100. Loksa, D., Margulieux, L., Becker, B. A., Craig, M., Denny, P., Pettit, R., & Prather, J. (2022). Metacognition and self-regulation in programming education: Theories and exemplars of use. ACM Transactions on Computing Education (TOCE), 22(4), 1-31. https://doi.org/10.1145/3487050
101. Lu, R. S., Lin, C. C., & Tsao, H. Y. (2024). Empowering large language models to leverage domain-specific knowledge in e-learning. Applied Sciences, 14(12), 5264. https://doi.org/10.3390/app14125264
102. Lu, W., Luu, R. K., & Buehler, M. J. (2025). Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities. npj Computational Materials, 11(1), 84. https://doi.org/10.1038/s41524-025-01564-y
103. Lyu, B., Li, C., Li, H., Oh, H., Song, Y., Zhu, W., & Xing, W. (2025). The role of teachable agents’ personality traits on student-AI interactions and math learning. Computers & Education, 234, 105314. https://doi.org/10.1016/j.compedu.2025.105314
104. Masrurah, E. (2025). The Effect of Problem Oriented Project Based Learning (POPBL) Model Assisted by Artificial Intelligence (AI) on Creative Thinking Skills and Collaboration Skills of MA Students. BIOEDUKASI: Jurnal Biologi dan Pembelajarannya, 23(2), 143-155. https://doi.org/10.19184/bioedu.v23i2.53695
105. Merelo, J. J., Castillo, P. A., Mora, A. M., Barranco, F., Abbas, N., Guillén, A., & Tsivitanidou, O. (2024). Chatbots and messaging platforms in the classroom: an analysis from the teacher’s perspective. Education and Information Technologies, 29(2), 1903-1938. https://doi.org/10.1007/s10639-023-11703-x
106. MIT App Inventor. (2025). About MIT App Inventor. Retrieved from https://appinventor.mit.edu
107. Molina, I. V., Montalvo, A., Ochoa, B., Denny, P., & Porter, L. (2024). Leveraging LLM tutoring systems for non-native English speakers in introductory CS courses. arXiv preprint arXiv:2411.02725. https://doi.org/10.48550/arXiv.2411.02725
108. Naamati-Schneider, L. (2024). Enhancing AI competence in health management: students’ experiences with ChatGPT as a learning Tool. BMC Medical Education, 24(1), 598. https://doi.org/10.1186/s12909-024-05595-9
109. Ng, D. T. K., Tan, C. W., & Leung, J. K. L. (2024). Empowering student self‐regulated learning and science education through ChatGPT: A pioneering pilot study. British Journal of Educational Technology, 55(4), 1281-1289. https://doi.org/10.1111/bjet.13454
110. Nouira, A., Cheniti-Belcadhi, L., & Braham, R. (2018). An enhanced xAPI data model supporting assessment analytics. Procedia Computer Science, 126, 566-575. https://doi.org/10.1016/j.procs.2018.07.291
111. Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms. Computers and Education: Artificial Intelligence, 2, 100020. https://doi.org/10.1016/j.caeai.2021.100020
112. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744. https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf
113. Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002, July). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311-318). https://aclanthology.org/P02-1040.pdf
114. Papke-Shields, K. E., Beise, C., & Quan, J. (2010). Do project managers practice what they preach, and does it matter to project success? International Journal of Project Management, 28(7), 650-662. https://doi.org/10.1016/j.ijproman.2009.11.002
115. Paramasivam, S., Saad, N. H., Han, F. P., Thing, G. T., Sharmilla, Z., & Krishnan, T. N. H. (2023, June). Applying the PDCA continuous improvement cycle on STEM education among secondary students: An experimental study. In AIP Conference Proceedings (Vol. 2571, No. 1, pp. 1-11). AIP Publishing. https://doi.org/10.1063/5.0117511
116. Park, J. S., O'Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023, October). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (pp. 1-22). https://doi.org/10.1145/3586183.3606763
117. Peng, B., Li, C., He, P., Galley, M., & Gao, J. (2023). Instruction tuning with GPT-4. arXiv preprint arXiv:2304.03277. https://doi.org/10.48550/arXiv.2304.03277
118. Pintrich, P. R. (2000). The role of goal orientation in self-regulated learning. In Handbook of self-regulation (pp. 451-502). Academic Press. https://doi.org/10.1016/B978-012109890-2/50043-3
119. Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82(1), 33-40. https://doi.org/10.1037/0022-0663.82.1.33
120. Porter, Z., Calinescu, R., Lim, E., Hodge, V., Ryan, P., Burton, S., ... & Zou, J. (2025). INSYTE: a classification framework for traditional to agentic AI systems. ACM Transactions on Autonomous and Adaptive Systems, 20(3), 1-39. https://doi.org/10.1145/3760424
121. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. https://www.mikecaptain.com/resources/pdf/GPT-1.pdf
122. Ray, P. P. (2023). ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems, 3, 121-154. https://doi.org/10.1016/j.iotcps.2023.04.003
123. Rezaei, D., & Mohseni, F. (2024). MIT App Inventor: A Tool for Enhancing Technological Pedagogical Content Knowledge. Interdisciplinary Journal of Virtual Learning in Medical Sciences, 15(1), 107-115. https://doi.org/10.30476/IJVLMS.2024.101903.1295
124. Sabado, W. B. (2024). Education 4.0: Using Web-based Massachusetts Institute of Technology (MIT) App Inventor 2 in Android Application Development. International Journal of Computing Sciences Research, 8, 2766-2780. https://doi.org/10.25147/ijcsr.2017.001.1.188
125. Segbenya, M., Senyametor, F., Aheto, S. P. K., Agormedah, E. K., Nkrumah, K., & Kaedebi-Donkor, R. (2024). Modelling the influence of antecedents of artificial intelligence on academic productivity in higher education: a mixed method approach. Cogent Education, 11(1), 2387943. https://doi.org/10.1080/2331186X.2024.2387943
126. Shankar, S., Zamfirescu-Pereira, J. D., Hartmann, B., Parameswaran, A. G., & Arawjo, I. (2024). Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences. arXiv preprint arXiv:2404.12272. https://doi.org/10.48550/arXiv.2404.12272
127. Shao, J., Chen, Y., Wei, X., Li, X., & Li, Y. (2023). Effects of regulated learning scaffolding on regulation strategies and academic performance: A meta-analysis. Frontiers in Psychology, 14, 1110086. https://doi.org/10.3389/fpsyg.2023.1110086
128. Shi, Y., Yuan, T., Bell, R., & Wang, J. (2020). Investigating the relationship between creativity and entrepreneurial intention: the moderating role of creativity in the theory of planned behavior. Frontiers in Psychology, 11, 1209. https://doi.org/10.3389/fpsyg.2020.01209
129. Shimorina, A., & Belz, A. (2021). The human evaluation datasheet 1.0: A template for recording details of human evaluation experiments in NLP. arXiv preprint arXiv:2103.09710. https://doi.org/10.48550/arXiv.2103.09710
130. Shneiderman, B. (2020). Human-centered artificial intelligence: Reliable, safe & trustworthy. International Journal of Human-Computer Interaction, 36(6), 495-504. https://doi.org/10.1080/10447318.2020.1741118
131. Singh, A., Guan, Z., & Rieh, S. Y. (2025). Enhancing Critical Thinking in Generative AI Search with Metacognitive Prompts. arXiv preprint arXiv:2505.24014. https://doi.org/10.48550/arXiv.2505.24014
132. Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., ... & Natarajan, V. (2023). Large language models encode clinical knowledge. Nature, 620(7972), 172-180. https://doi.org/10.1038/s41586-023-06291-2
133. Sottana, A., Liang, B., Zou, K., & Yuan, Z. (2023). Evaluation metrics in the era of GPT-4: reliably evaluating large language models on sequence to sequence tasks. arXiv preprint arXiv:2310.13800. https://doi.org/10.48550/arXiv.2310.13800
134. Spreckelsen, C., & Jünger, J. (2017). Repeated testing improves achievement in a blended learning approach for risk competence training of medical students: results of a randomized controlled trial. BMC Medical Education, 17(1), 177. https://doi.org/10.1186/s12909-017-1016-y
135. Starke, S. D., & Baber, C. (2020). The effect of known decision support reliability on outcome quality and visual information foraging in joint decision making. Applied Ergonomics, 86, 103102. https://doi.org/10.1016/j.apergo.2020.103102
136. Strzelecki, A. (2023). To use or not to use ChatGPT in higher education? A study of students’ acceptance and use of technology. Interactive Learning Environments, 1-14. https://doi.org/10.1080/10494820.2023.2209881
137. Sun, Y., Sheng, D., Zhou, Z., & Wu, Y. (2024). AI hallucination: towards a comprehensive classification of distorted information in artificial intelligence-generated content. Humanities and Social Sciences Communications, 11(1), 1-14. https://doi.org/10.1057/s41599-024-03811-x
138. Susnjak, T., Hwang, P., Reyes, N., Barczak, A. L., McIntosh, T., & Ranathunga, S. (2025). Automating research synthesis with domain-specific large language model fine-tuning. ACM Transactions on Knowledge Discovery from Data, 19(3), 1-39. https://doi.org/10.1145/3715964
139. Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer. https://doi.org/10.1007/978-1-4419-8126-4
140. Tassoti, S. (2024). Assessment of students use of generative artificial intelligence: Prompting strategies and prompt engineering in chemistry education. Journal of Chemical Education, 101(6), 2475-2482. https://doi.org/10.1021/acs.jchemed.4c00212
141. Team, G., Anil, R., Borgeaud, S., Wu, Y., Alayrac, J. B., Yu, J., ... & Ahn, J. (2023). Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805. https://doi.org/10.48550/arXiv.2312.11805
142. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021, July). Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning (pp. 10347-10357). PMLR. https://proceedings.mlr.press/v139/touvron21a
143. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., ... & Lample, G. (2023a). Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971. https://doi.org/10.48550/arXiv.2302.13971
144. Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., ... & Scialom, T. (2023b). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288. https://doi.org/10.48550/arXiv.2307.09288
145. Tsai, M. J., Liang, J. C., Lee, S. W. Y., & Hsu, C. Y. (2022). Structural validation for the developmental model of computational thinking. Journal of Educational Computing Research, 60(1), 56-73. https://doi.org/10.1177/07356331211017794
146. Twabu, K. (2025). Enhancing the cognitive load theory and multimedia learning framework with AI insight. Discover Education, 4(1), 160. https://doi.org/10.1007/s44217-025-00592-6
147. Van de Cruys, T. (2020, July). Automatic poetry generation from prosaic text. In Proceedings of the 58th annual meeting of the association for computational linguistics (pp. 2471-2480). https://doi.org/10.18653/v1/2020.acl-main.223
148. Van Der Lee, C., Gatt, A., Van Miltenburg, E., Wubben, S., & Krahmer, E. (2019). Best practices for the human evaluation of automatically generated text. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 355-368). https://doi.org/10.18653/v1/W19-8643
149. Vandewaetere, M., Desmet, P., & Clarebout, G. (2011). The contribution of learner characteristics in the development of computer-based adaptive learning environments. Computers in Human Behavior, 27(1), 118-130. https://doi.org/10.1016/j.chb.2010.07.038
150. Velásquez-Henao, J. D., Franco-Cardona, C. J., & Cadavid-Higuita, L. (2023). Prompt Engineering: a methodology for optimizing interactions with AI-Language Models in the field of engineering. Dyna, 90(230), 9-17. https://doi.org/10.15446/dyna.v90n230.111700
151. Veletsianos, G., & Russell, G. S. (2013). Pedagogical agents. In Handbook of research on educational communications and technology (pp. 759-769). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-3185-5_61
152. Walter, Y. (2024). Embracing the future of Artificial Intelligence in the classroom: the relevance of AI literacy, prompt engineering, and critical thinking in modern education. International Journal of Educational Technology in Higher Education, 21(1), 15. https://doi.org/10.1186/s41239-024-00448-3
153. Wang, H., Wang, C., Chen, Z., Liu, F., Bao, C., & Xu, X. (2025b). Impact of AI-agent-supported collaborative learning on the learning outcomes of university programming courses. Education and Information Technologies, 1-33. https://doi.org/10.1007/s10639-025-13487-8
154. Wang, J., & Fan, W. (2025). The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis. Humanities and Social Sciences Communications, 12(1), 1-21. https://doi.org/10.1057/s41599-025-04787-y
155. Wang, M., Wang, M., Xu, X., Yang, L., Cai, D., & Yin, M. (2023a). Unleashing ChatGPT's power: A case study on optimizing information retrieval in flipped classrooms via prompt engineering. IEEE Transactions on Learning Technologies, 17, 629-641. https://doi.org/10.1109/TLT.2023.3324714
156. Wang, S., Xu, T., Li, H., Zhang, C., Liang, J., Tang, J., ... & Wen, Q. (2024a). Large language models for education: A survey and outlook. arXiv preprint arXiv:2403.18105. https://doi.org/10.48550/arXiv.2403.18105
157. Wang, T., Zhou, N., & Chen, Z. (2024b). Enhancing computer programming education with LLMs: A study on effective prompt engineering for Python code generation. arXiv preprint arXiv:2407.05437. https://doi.org/10.48550/arXiv.2407.05437
158. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., ... & Zhou, D. (2022a). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171. https://doi.org/10.48550/arXiv.2203.11171
159. Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022b). Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560. https://doi.org/10.48550/arXiv.2212.10560
160. Wang, Y., Yu, Z., Wang, Z., Yu, Z., & Wang, J. (2025a). Multi-Examiner: A knowledge graph-driven system for generating comprehensive IT questions with higher-order thinking. Applied Sciences, 15(10), 5719. https://doi.org/10.3390/app15105719
161. Wang, Z. M., Peng, Z., Que, H., Liu, J., Zhou, W., Wu, Y., ... & Peng, J. (2023b). RoleLLM: Benchmarking, eliciting, and enhancing role-playing abilities of large language models. arXiv preprint arXiv:2310.00746. https://doi.org/10.48550/arXiv.2310.00746
162. Wei, L. (2023). Artificial intelligence in language instruction: impact on English learning achievement, L2 motivation, and self-regulated learning. Frontiers in Psychology, 14, 1261955. https://doi.org/10.3389/fpsyg.2023.1261955
163. Weng, L. (2023, June). LLM-powered autonomous agents. Lil'Log. Retrieved from https://lilianweng.github.io/posts/2023-06-23-agent/
164. Wilcox, R. R. (2011). Introduction to robust estimation and hypothesis testing (3rd ed.). Academic Press. https://doi.org/10.1016/C2010-0-67044-1
165. Wixom, B. H., & Watson, H. J. (2001). An empirical investigation of the factors affecting data warehousing success. MIS Quarterly, 25(1), 17-41. https://doi.org/10.2307/3250957
166. Xie, Y., Xia, W., & Qiu, Y. (2024, June). Construction and Implementation of Generative AI-Based Human-Machine Collaborative Classroom Teaching Model in Universities. In International Conference on Blended Learning (pp. 102-116). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-97-4442-8_8
167. Xu, J., Zhan, Z., & Liang, S. (2024, June). The Role of AI Agents in the Reconfiguration of Interdisciplinary Educational Design: An Examination of Applied Practices and Outcomes. In Proceedings of the 2024 9th International Conference on Distance Education and Learning (pp. 1-9). https://doi.org/10.1145/3675812.3675833
168. Yang, H., Yue, S., & He, Y. (2023). Auto-GPT for online decision making: Benchmarks and additional opinions. arXiv preprint arXiv:2306.02224. https://doi.org/10.48550/arXiv.2306.02224
169. Yang, Q., Chen, J., Sun, Y., Wang, Y., & Tan, T. (2025). Fine-tuning medical language models for enhanced long-contextual understanding and domain expertise. Quantitative Imaging in Medicine and Surgery, 15(6), 5450. https://doi.org/10.21037/qims-2024-2655
170. Yu, L., Chen, S., & Recker, M. (2021). Structural relationships between self-regulated learning, teachers’ credibility, information and communications technology literacy and academic performance in blended learning. Australasian Journal of Educational Technology, 37(4), 33-50. https://doi.org/10.14742/ajet.5783
171. Yu, Q., Bing, L., Zhang, Q., Lam, W., & Si, L. (2020). Review-based question generation with adaptive instance transfer and augmentation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 280-290). https://doi.org/10.18653/v1/2020.acl-main.26
172. Zahroh, M., Kristanto, A., & Dewi, U. (2025). How Can AI-Enhanced Case-based Learning Improve Problem-Solving in Cyberbullying Education?: A Literature Review. Jurnal Teknologi Pendidikan: Jurnal Penelitian dan Pengembangan Pembelajaran, 10(2), 200-213. https://doi.org/10.33394/jtp.v10i2.14704
173. Zainudin, M., & Istiyono, E. (2019). Scientific approach to promote response fluency viewed from social intelligence: Is it effective?. European Journal of Educational Research, 8(3), 801-808. https://doi.org/10.12973/eu-jer.8.3.801
174. Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students' cognitive abilities: a systematic review. Smart Learning Environments, 11(1), 28. https://doi.org/10.1186/s40561-024-00316-7
175. Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675. https://doi.org/10.48550/arXiv.1904.09675
176. Zhou, M., & Peng, S. (2025). The Usage of AI in Teaching and Students’ Creativity: The Mediating Role of Learning Engagement and the Moderating Role of AI Literacy. Behavioral Sciences, 15(5), 587. https://doi.org/10.3390/bs15050587
177. Zhu, Y., Zhang, J. H., Au, W., & Yates, G. (2020). University students' online learning attitudes and continuous intention to undertake online courses: A self-regulated learning perspective. Educational Technology Research and Development, 68, 1485-1519. https://doi.org/10.1007/s11423-020-09753-w
178. Zimmerman, B. J. (1989). A social cognitive view of self-regulated academic learning. Journal of Educational Psychology, 81(3), 329. https://psycnet.apa.org/doi/10.1037/0022-0663.81.3.329
179. Zimmerman, B. J., & Moylan, A. R. (2009). Self-regulation: Where metacognition and motivation intersect. In Handbook of metacognition in education (pp. 299-315). Routledge. https://doi.org/10.4324/9780203876428