Author: 林銘凱 (Lin, Ming-Kai)
Thesis title: Few-Shot Learning for Chinese Font Style Generation (基於少樣本學習之中文字型風格生成研究)
Advisor: 蔡炎龍
Oral defense committee: 陳天進, 張宜武
Degree: Master
Department: College of Science, Department of Mathematical Sciences
Year of publication: 2026
Academic year of graduation: 114 (ROC calendar)
Language: English
Number of pages: 43
Keywords: Few-Shot Learning, Meta-Learning, Chinese Font Generation, CAVIA, U-Net
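  • Chinese font generation is a highly challenging task in computer vision. Chinese glyphs are structurally complex and have many strokes, and converting between font styles involves more than pixel-level style transfer: stroke skeletons must be extended and the spatial layout adjusted dynamically. Traditional methods therefore struggle to generate a complete font library automatically from only a handful of reference samples. To overcome this few-shot bottleneck, this study proposes a generation framework that combines meta-learning with convolutional neural networks.

    The study adopts the CAVIA (Fast Context Adaptation via Meta-Learning) algorithm. Its core architecture introduces a high-dimensional context vector. In the inner loop, the model updates only this vector for the specific font task, while the remaining core meta-parameters are left to the outer loop for global updates. This substantially improves how quickly and efficiently the model adapts to fonts it has not seen. The generator is based on U-Net, whose symmetric encoder-decoder structure and skip connections fuse the low-level spatial features of the base font (標楷體, KaiTi) with its high-level global structure. The study further inserts affine transformations into the skip connections: the mapped context vector dynamically modulates the scaling, translation, and local spatial deformation of the feature maps, so that stroke geometry is restyled precisely while the relative spatial layout of the characters is preserved.

    For the loss, the study builds a multi-objective joint loss that combines mean absolute error (MAE), Dice loss, binary cross-entropy (BCE), structural similarity (SSIM), and a Laplacian sharpness loss, in order to optimize the edge sharpness and visual fidelity of the generated glyphs. The experiments train on a dataset of 470 Chinese target fonts and generate fonts under a strict 5-shot setting (only five reference characters). The results show that the model still generalizes well when tested on entirely new fonts that took no part in training. Bicubic interpolation in post-processing then expands the low-resolution output into high-resolution glyph images with clear stroke skeletons and rich font style.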
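    The inner-loop and outer-loop split described in the abstract can be illustrated with a deliberately tiny sketch. It is not the thesis's model: the "network" is a one-parameter linear function, the context is a single scalar rather than a high-dimensional vector, each toy task stands in for one font, and the outer update uses a first-order approximation that treats the adapted context as a constant. All names, learning rates, and data here are hypothetical.

```python
SUPPORT_X = [-1.0, -0.5, 0.0, 0.5, 1.0]   # 5-shot support inputs (hypothetical)
QUERY_X = [-0.75, -0.25, 0.25, 0.75]      # held-out query inputs (hypothetical)

def make_task(offset, slope=2.0):
    """One toy task: the slope is shared by all tasks, the offset is task-specific."""
    support = [(x, slope * x + offset) for x in SUPPORT_X]
    query = [(x, slope * x + offset) for x in QUERY_X]
    return support, query

def predict(w, phi, x):
    """Shared meta-parameter w plus task context phi (a scalar in this sketch)."""
    return w * x + phi

def mse(w, phi, data):
    return sum((predict(w, phi, x) - y) ** 2 for x, y in data) / len(data)

def adapt_context(w, support, inner_lr=0.4, steps=5):
    """Inner loop: reset phi to 0, then update ONLY phi on the support set."""
    phi = 0.0
    for _ in range(steps):
        grad_phi = sum(2.0 * (predict(w, phi, x) - y) for x, y in support) / len(support)
        phi -= inner_lr * grad_phi
    return phi

def meta_train(tasks, outer_lr=0.5, epochs=30):
    """Outer loop: update the shared parameter w on query data after adaptation.

    Assumption: first-order update, i.e. phi is held fixed in the outer gradient.
    """
    w = 0.0
    for _ in range(epochs):
        for support, query in tasks:
            phi = adapt_context(w, support)
            grad_w = sum(2.0 * (predict(w, phi, x) - y) * x for x, y in query) / len(query)
            w -= outer_lr * grad_w
    return w

train_tasks = [make_task(offset) for offset in (-1.0, 0.5, 2.0)]
w = meta_train(train_tasks)

# Adapt to an unseen task using only its five support points.
support, query = make_task(3.0)
phi = adapt_context(w, support)
print(f"w={w:.4f} phi={phi:.4f} "
      f"query MSE before={mse(w, 0.0, query):.4f} after={mse(w, phi, query):.6f}")
```

    The point of the design is visible even at this scale: the shared parameter is never touched at adaptation time, so adapting to a new task only requires a few gradient steps on a small context variable.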


    Chinese font generation is a difficult task in computer vision. Because Chinese characters have complex structures and many strokes, changing a font's style means changing stroke shapes and positions, not just pixel values. Traditional methods therefore cannot automatically generate a full font library from only a few reference characters. To address this few-shot problem, this thesis proposes a generation framework that combines meta-learning with convolutional neural networks. Specifically, we adopt the CAVIA (Fast Context Adaptation via Meta-Learning) algorithm, whose main idea is to use high-dimensional context parameters. During inner-loop optimization, the model updates only these context parameters for each font task; the other core parameters are updated globally in the outer loop. This design lets the model adapt to new font tasks faster and more efficiently. For the main network, we use a U-Net with an encoder, a decoder, and skip connections, which combines the fine low-level spatial details and the high-level structural features of the baseline font (KaiTi). We also add affine transformations, driven by the context vector, to the skip connections. This lets the network control spatial changes at the pixel level, changing stroke shapes while keeping the correct character layout. To guide training, we build a joint loss function that combines Mean Absolute Error (MAE), Dice loss, Binary Cross-Entropy (BCE), the Structural Similarity Index (SSIM), and a Laplacian sharpness loss. Our experiments use a dataset of 470 Chinese fonts under a strict 5-shot setting. They show that the model generalizes well to new fonts that were not in the training set. Finally, we use bicubic interpolation to upscale the raw outputs into high-resolution character images with sharp strokes and clear styles.
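    As a rough illustration of how such a joint loss can be assembled, the sketch below sums weighted MAE, Dice, and BCE terms plus an edge term on small grayscale images stored as nested lists. Several things here are assumptions rather than the thesis's definitions: the weights are hypothetical, the SSIM term is omitted for brevity, and the Laplacian term is taken to be the MAE between 4-neighbour Laplacian responses of the prediction and the target, which is only one plausible reading of "Laplacian sharpness loss".

```python
import math

def _pairs(pred, target):
    """Iterate over corresponding pixels of two equally sized 2-D lists."""
    for pred_row, target_row in zip(pred, target):
        yield from zip(pred_row, target_row)

def mae(pred, target):
    vals = [abs(p - t) for p, t in _pairs(pred, target)]
    return sum(vals) / len(vals)

def dice_loss(pred, target, eps=1e-6):
    inter = sum(p * t for p, t in _pairs(pred, target))
    total = sum(p + t for p, t in _pairs(pred, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def bce(pred, target, eps=1e-7):
    vals = []
    for p, t in _pairs(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        vals.append(-(t * math.log(p) + (1.0 - t) * math.log(1.0 - p)))
    return sum(vals) / len(vals)

def laplacian(img):
    """4-neighbour Laplacian evaluated on interior pixels only."""
    h, w = len(img), len(img[0])
    return [[img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]
             - 4.0 * img[i][j]
             for j in range(1, w - 1)] for i in range(1, h - 1)]

def joint_loss(pred, target, weights=(1.0, 1.0, 1.0, 0.5)):
    """Weighted sum of the terms; the default weights are hypothetical."""
    w_mae, w_dice, w_bce, w_lap = weights
    return (w_mae * mae(pred, target)
            + w_dice * dice_loss(pred, target)
            + w_bce * bce(pred, target)
            + w_lap * mae(laplacian(pred), laplacian(target)))

# A 4x4 vertical stroke as the target, and a uniformly gray (blurry) prediction.
target = [[0.0, 1.0, 1.0, 0.0] for _ in range(4)]
blurry = [[0.5] * 4 for _ in range(4)]
print(joint_loss(target, target), joint_loss(blurry, target))
```

    A perfect prediction drives every term close to zero, while the gray prediction is penalized by all four terms, including the edge term, because it has no stroke boundaries at all.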

    Acknowledgements
    Chinese Abstract
    Abstract
    Chapter 1 Deep-learning
    1.1 Neural Networks
    1.2 Activation Functions
    1.3 Loss Functions
    1.4 Gradient Descent and Backpropagation
    1.5 Overfitting and Regularization
    Chapter 2 Convolutional Neural Networks (CNNs)
    2.1 Convolutional Layer
    2.1.1 Convolution Operation
    2.1.2 Stride and Padding
    2.1.3 Example
    2.2 Pooling Layer
    2.3 Fully Connected Layer
    2.4 Softmax Layer
    Chapter 3 Meta-Learning
    3.1 General Representation
    3.2 Convolutional Siamese Neural Network
    3.3 Matching Networks
    3.4 Relation Network
    3.5 MAML
    3.6 Contextual Adaptation via Meta-Learning
    3.6.1 Algorithm
    3.6.2 Mathematical Formulation
    3.6.3 Advantages
    Chapter 4 U-Net
    4.1 Overview of Autoencoders
    4.2 Structure of an Autoencoder
    4.3 U-Net
    4.3.1 Skip Connections
    Chapter 5 Experiments
    5.1 Fonts Generation Problem
    5.2 Font Dataset Preparation
    5.3 Generation Model
    5.4 Result
    5.5 Conclusion and Future Work
    5.5.1 Conclusion
    5.5.2 Future Work
    Bibliography


    Full-text release date: 2031/06/30