| Author: | Liao, Chun-Jung (廖均融) |
|---|---|
| Title: | Road Lane and River Bank Line Detection by Diffusion-based Data Augmentation (基於擴散模型資料增強的道路車道與河道邊線檢測) |
| Advisor: | Peng, Yan-Tsung (彭彥璁) |
| Committee: | Peng, Yan-Tsung (彭彥璁); Chi, Ming-Te (紀明德); Huang, Shih-Chia (黃士嘉) |
| Degree: | Master |
| Department: | College of Informatics, Department of Computer Science |
| Year of publication: | 2026 |
| Academic year of graduation: | 114 |
| Language: | English |
| Pages: | 49 |
| Keywords: | Data augmentation, Diffusion model, Edge detection, Environmental monitoring |
In recent years, diffusion models such as Stable Diffusion have advanced markedly, and image generation has come to play a crucial role in deep learning vision tasks, especially data augmentation.
This study explores the potential applications of diffusion models, using edge detection on aerial images of roads and rivers as its case study. By further exploring diffusion models for data augmentation, new aerial images are generated from existing aerial data while preserving the statistical characteristics of that data. This provides a richer and more diverse set of training samples for learning-based edge detection models, thereby improving the accuracy of automated remote sensing analysis for environmental monitoring and land surveying tasks targeting roads and rivers in natural terrestrial environments.
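The augmentation idea summarized above, and elaborated in the thesis's Sections 3.2.1 ("Centerline Modeling via Cubic Bezier Curves") and 3.2.3 ("Geometric Neighborhood Expansion (Jittering)"), can be illustrated with a minimal sketch: sample a road or river centerline from a cubic Bézier curve, then perturb the control points to obtain geometric variants that a diffusion model could texture into new training images. This is not the thesis's implementation; the function names, jitter scale, and control points below are illustrative assumptions.

```python
import random

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample n points along a cubic Bezier curve given four 2-D control points."""
    pts = []
    for i in range(n):
        t = i / (n - 1)
        u = 1.0 - t
        # Standard cubic Bernstein basis: u^3, 3u^2 t, 3u t^2, t^3
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        pts.append((x, y))
    return pts

def jitter_controls(controls, scale=10.0, rng=None):
    """Perturb control points uniformly to expand the geometric neighborhood
    of an annotated centerline (illustrative jitter scale)."""
    rng = rng or random.Random(0)
    return [(x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
            for (x, y) in controls]

# One annotated centerline and one jittered variant of its geometry
controls = [(0, 0), (100, 50), (200, -50), (300, 0)]
centerline = cubic_bezier(*controls)
variant = cubic_bezier(*jitter_controls(controls))
```

In the thesis's pipeline, each such variant would serve as the geometric condition for a ControlNet-guided Stable Diffusion model, which synthesizes a plausible aerial texture around the curve; the curve itself then doubles as the edge-detection label for the generated image.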
1 Introduction 1
1.1 Motivation and Challenges 1
1.2 Contributions 3
2 Related Work 4
2.1 Edge Detection 4
2.2 Data Augmentation 6
2.3 Conditional Image Generation 9
2.3.1 ControlNet Architecture 9
2.3.2 Integration with Stable Diffusion 11
2.4 Transductive Learning and Active Adaptation 12
2.4.1 Transductive Inference and Test-Time Adaptation (TTA) 12
2.4.2 Active Learning and Diversity Sampling 13
2.4.3 Generative Domain Adaptation and Instance-Specific Optimization 13
3 Approach 15
3.1 Overview: Geometry-Driven Generative Adaptation Framework 15
3.2 Geometric Prior Modeling 16
3.2.1 Centerline Modeling via Cubic Bezier Curves 17
3.2.2 Boundary Construction and Perspective Projection 18
3.2.3 Geometric Neighborhood Expansion (Jittering) 18
3.3 Generative Texture Synthesis 19
3.3.1 Geometry-Conditioned Texture Mapping 19
3.3.2 Stochastic Diversity via Generative Priors 20
3.4 Transductive Test-Time Adaptation 21
3.4.1 Instance-Specific Neighborhood Expansion 21
3.4.2 Batch-Wise Adaptation Strategy 22
4 Experimental Results 24
4.1 Datasets 24
4.1.1 Dataset Construction and Data Collection 24
4.1.2 Annotation Methodology 25
4.1.3 Comparison with Existing Datasets 25
4.1.4 Distinct Contributions 26
4.2 Experimental Settings 27
4.2.1 Adaptation Protocols 27
4.2.2 Comparison Baselines 27
4.3 Experimental Results 28
4.3.1 Quantitative Results 28
4.3.2 Analysis of Results 30
4.3.3 Qualitative Results 31
4.4 Impact of Synthetic Data Volume 33
4.5 Ablation Studies 34
4.6 In-depth Analysis of River Environments 36
4.6.1 Impact of Synthetic River Data Volume 37
4.6.2 The Necessity of Mixed Retraining 37
4.6.3 Fine-grained Analysis on Extreme Topography (Big Jitter Subset) 39
4.7 Limitations and Future Work 40
5 More qualitative results 41
6 Conclusion 48
References 49