| Author: | 陳冠群 Chen, Kuan-Chun |
|---|---|
| Title: | 中文裁判書之要旨擷取:以最高法院裁判書為例 (Automatic Extraction of Gist of Chinese Judgments of the Supreme Court) |
| Advisor: | 劉昭麟 Liu, Chao-Lin |
| Committee: | 洪振洲 Hung, Jen-Jou; 王昱鈞 Wang, Yu-Chun; 劉昭麟 Liu, Chao-Lin |
| Degree: | 碩士 (Master) |
| Department: | College of Science, Department of Computer Science |
| Year of Publication: | 2018 |
| Graduation Academic Year: | 106 |
| Language: | Chinese |
| Pages: | 69 |
| Keywords (Chinese): | 法資訊學、自動摘要、自然語言處理 |
| Keywords (English): | Legal informatics, Automatic summarization, Natural language processing |
| DOI URL: | http://doi.org/10.6814/THE.NCCU.CS.003.2018.B02 |
Court judgments are important reference materials for legal practitioners and researchers dealing with legal issues: they record the opinions that courts held on specific legal questions in previous cases. However, besides the court opinions of high reference value, judgments also contain information that is hard to apply to other cases, so reading them often takes considerable time and effort.

At present, some judgments come with a court-prepared gist that excerpts the parts of the judgment most valuable as references. Because manual preparation is inefficient, only a small number of judgments have court-prepared gists, and most of these are judgments of the Supreme Court; judgments of the lower courts almost never have gists. Automatically extracting the gist from a judgment would alleviate the inefficiency of manual preparation.

The objective of this study is to apply machine learning techniques to automatically extract the gist from judgments. We propose both a method for extracting the gist and a method for evaluating the extraction results.

For gist extraction, this study casts the task as a sequence labeling problem. We build classification models with machine learning techniques such as deep learning and gradient boosting, label each sentence of the judgment reasons, and extract the gist from the reasons. We also use different features, classification models, and training methods to improve the extraction results.

For evaluation, this study compares the automatically extracted gists with the gists prepared by the court and computes metrics such as precision, recall, and F1 score.

For the experiment design, we use Supreme Court judgments as the experimental corpus and run feature-related experiments and extraction-model-related experiments. The feature-related experiments observe how adding each feature affects the extraction results; the model-related experiments observe how different machine learning methods and training methods affect the results.

In the feature-related experiments, every feature used in this study was shown to improve the extraction results. In the model-related experiments, the random forest baseline achieved an F1 score of 0.56, while the deep learning and gradient boosting methods achieved F1 scores of 0.91 and 0.85, respectively. Combining multiple models with an ensemble method further raised the F1 score to 0.93.
Court judgments are important references for legal practitioners and researchers when handling legal issues: they record the opinions that courts held on specific legal questions in previous cases. However, besides these valuable opinions, judgments also contain much information that is less applicable to other cases, so reading them often takes considerable time and effort.

At present, some judgments come with a gist prepared by the court, which excerpts the parts of the judgment most valuable as references. Manual preparation of gists is inefficient, however, so only a few judgments have court-prepared gists, and most of them are judgments of the Supreme Court; judgments of the lower courts almost never have gists. Automatically extracting the gist from judgments would alleviate the inefficiency of manual preparation.
The objective of this study is to apply machine learning to extract the gist of judgments. We propose both an approach to extracting the gist and a method for evaluating the extraction results.

For gist extraction, this study casts the task as a sequence labeling problem. We build classifiers with machine learning techniques such as deep learning and gradient boosting, use them to label each sentence of the judgment reasons, and extract the gist from the labeled sentences. We also experiment with different features, classifiers, and training methods to improve the extraction results.
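The sequence-labeling formulation above can be sketched as follows. This is a hypothetical illustration, not the thesis's implementation: `difflib.SequenceMatcher` stands in for the LCS/Dice-based sentence matching the thesis uses to derive gold labels, and the sample sentences and the 0.8 threshold are made up. Each sentence of the judgment reasons receives a binary gist/non-gist label, which is the supervision signal a sentence classifier would be trained on.

```python
from difflib import SequenceMatcher

def label_sentences(reason_sentences, gist_sentences, threshold=0.8):
    """Label each sentence of the judgment reasons as gist (1) or
    non-gist (0) by its best similarity to any court-made gist sentence.
    SequenceMatcher.ratio() is a stand-in for the LCS/Dice matching
    described in the thesis; 0.8 is an illustrative threshold."""
    labels = []
    for sent in reason_sentences:
        best = max(
            (SequenceMatcher(None, sent, g).ratio() for g in gist_sentences),
            default=0.0,
        )
        labels.append(1 if best >= threshold else 0)
    return labels

# Toy example: only the middle sentence states a legal opinion
# that also appears in the court-made gist.
reasons = [
    "上訴人主張原判決適用法規顯有錯誤",
    "刑法第五十五條之想像競合犯以一行為觸犯數罪名為要件",
    "原審未詳加調查即遽為判決",
]
gist = ["刑法第五十五條之想像競合犯,以一行為觸犯數罪名為要件"]
print(label_sentences(reasons, gist))  # → [0, 1, 0]
```

The resulting (sentence, label) pairs are exactly the training instances a sequence-labeling or per-sentence classification model consumes.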
For evaluation, this study compares the automatically extracted gists with the gists prepared by the court and computes metrics such as precision, recall, and F1 score.
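Treating each sentence's gist/non-gist decision as a binary prediction, the evaluation metrics can be computed as below. The toy gold and predicted label lists are invented for illustration.

```python
def prf1(gold, pred):
    """Sentence-level precision, recall, and F1 for the gist class,
    where gold and pred are parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [1, 0, 1, 1, 0, 0]  # court-made gist labels
pred = [1, 0, 0, 1, 1, 0]  # automatically extracted labels
p, r, f = prf1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.67 0.67 0.67
```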
We run feature-related experiments and model-related experiments on a corpus of Supreme Court judgments. The feature-related experiments observe how the extraction results change as individual features are added; the model-related experiments observe the effects of different classifiers and training methods.

In the feature-related experiments, we observed that every feature proposed in this study improved the gist extraction results.
In the model-related experiments, the random forest baseline achieved an F1 score of 0.56, while our deep learning model and gradient boosting model achieved F1 scores of 0.91 and 0.85, respectively. Combining multiple classifiers with an ensemble method further raised the F1 score to 0.93.
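One common way to combine the per-sentence predictions of several classifiers is a majority vote; the abstract does not specify the thesis's exact ensemble scheme, so this is a minimal sketch under that assumption, with invented prediction lists.

```python
def majority_vote(predictions):
    """Combine per-sentence 0/1 predictions from several classifiers
    by majority vote (ties resolved toward the gist class)."""
    n = len(predictions)
    return [1 if sum(votes) * 2 >= n else 0 for votes in zip(*predictions)]

# Hypothetical per-sentence outputs of three classifiers.
deep = [1, 0, 1, 0]
boost = [1, 1, 1, 0]
forest = [0, 0, 1, 0]
print(majority_vote([deep, boost, forest]))  # → [1, 0, 1, 0]
```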
1 Introduction
1.1 Background and Motivation
1.2 Research Objectives
1.3 Main Contributions
1.4 Thesis Organization
2 Literature Review
2.1 Applying Natural Language Processing Techniques to Chinese Court Judgments
2.1.1 Judgment Retrieval Systems
2.1.2 Case Classification and Clustering
2.1.3 Judgment Factor Analysis and Outcome Prediction
2.2 Machine-Learning-Based Automatic Summarization
3 Corpus Sources and System Architecture
3.1 Corpus Sources
3.1.1 Judgments of the Supreme Court
3.1.2 Judgments of All Court Levels from the Judicial Yuan
3.2 System Architecture
4 Corpus Preprocessing
4.1 Raw Data Parsing
4.1.1 Extracting Text Blocks from HTML
4.1.2 Paragraph Segmentation of Judgment Full Texts
4.1.3 Removing Gist Annotations
4.1.4 Sentence Segmentation of Gists and Judgment Reasons
4.1.5 Conversion to and Storage in JSON
4.2 Word Segmentation and Part-of-Speech Tagging
4.2.1 Automatic Word Segmentation System
4.2.2 Word Segmentation Results
4.2.3 Part-of-Speech Tagging Results
4.3 Aligning Gists with Judgment Reasons
4.3.1 Computing Sentence Correspondences
4.3.2 Improving the Sentence Similarity Test
5 Feature Extraction
5.1 Basic Features
5.2 Judgment Features
5.2.1 Judgment Type
5.2.2 Judgment Nature
5.2.3 Case Docket Category
5.2.4 Judgment Date
5.3 Sentence Label Features
5.3.1 Rule-Based Sentence Labeling
5.3.2 Statute Name Labeling
5.4 Lexical Features
5.4.1 Word Embedding Models
5.4.2 Feature Extraction Method
5.5 Part-of-Speech Features
5.6 Sentence-Initial Word Features
5.7 Feature Types
6 Gist Extraction Models
6.1 Building the Gist Extraction Models
6.2 Deep Learning Models
6.2.1 Fully-Connected Neural Networks
6.2.2 Recurrent Neural Networks
6.2.3 Hybrid Models
6.3 Gradient Boosting Models
6.4 Two-Stage Learning
6.5 Semi-Supervised Learning
6.6 Ensemble Learning
7 Experiment Design and Result Analysis
7.1 Experimental Corpus
7.2 Evaluation Method
7.3 Experimental Parameters
7.4 Experiments on Basic and Judgment Features
7.4.1 Experiment Design
7.4.2 Result Analysis
7.5 Experiments on Sentence Label Features
7.5.1 Experiment Design
7.5.2 Result Analysis
7.6 Experiments on Lexical Features
7.6.1 Experiment Design
7.6.2 Result Analysis
7.7 Experiments on Part-of-Speech Features
7.7.1 Experiment Design
7.7.2 Result Analysis
7.8 Experiments on Sentence-Initial Word Features
7.8.1 Experiment Design
7.8.2 Result Analysis
7.9 Experiments on Classification Models
7.9.1 Experiment Design
7.9.2 Result Analysis
7.10 Experiments on Two-Stage Learning
7.10.1 Experiment Design
7.10.2 Result Analysis
7.11 Experiments on Semi-Supervised Learning
7.11.1 Experiment Design
7.11.2 Result Analysis
7.12 Experiments on Ensemble Learning
7.12.1 Experiment Design
7.12.2 Result Analysis
8 Conclusion and Future Work
8.1 Conclusion
8.2 Future Work
References
Appendix