
Graduate Student: 程品潔 (Cheng, Pin-Chieh)
Thesis Title: 應用生成式資料擴增提升魚眼鏡頭物件偵測模型效能
Enhancing Fisheye Lens Object Detection Using Generative Data Augmentation
Advisor: 廖文宏 (Liao, Wen-Hung)
Committee Members: 廖文宏 (Liao, Wen-Hung), 紀明德 (Chi, Ming-Te), 劉遠楨 (Liu, Yuan-Chen)
Degree: Master
Department: College of Informatics - Executive Master Program of Computer Science
Year of Publication: 2024
Academic Year of Graduation: 112
Language: Chinese
Pages: 71
Keywords: Fisheye Camera, Fisheye Correction, Object Detection, Diffusion Model, Generative Data Augmentation


    Smart cities aim to leverage innovative technologies to enhance urban operational efficiency, safety, and quality of life. Advanced surveillance systems and object detection technologies are crucial components of smart cities, aiding in the management and optimization of public spaces. Overhead fisheye lenses, with their ultra-wide field of view, are well-suited for large-scale surveillance but present significant image distortion challenges. Furthermore, due to privacy protection requirements and the diversity of scenes, acquiring sufficient and diverse public images is extremely difficult, hindering the development of related research.

    To address these issues, this study focuses on libraries, a common and important public venue, and tackles two major challenges in object detection with overhead fisheye images: data scarcity and fisheye lens distortion. By using text-to-image generative models to augment the training data and combining this augmentation with a distortion correction method based on preset camera intrinsic parameters, we successfully improved object detection accuracy.
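The exact correction formulas are given in the thesis body (Section 3.2), not in this abstract. As a rough illustration of parameter-based correction, the sketch below (an assumption for illustration, not the thesis's actual implementation) maps a pixel of the corrected pinhole image back to its source location in a fisheye image under the common equidistant projection model, given an assumed focal length and principal point:

```python
import math

def fisheye_source_pixel(u, v, cx, cy, f):
    """Map a pixel (u, v) of the corrected (pinhole) image back to the
    fisheye image, assuming an equidistant fisheye model r_fish = f * theta
    and a pinhole model r_pin = f * tan(theta), sharing center (cx, cy)."""
    dx, dy = u - cx, v - cy
    r_pin = math.hypot(dx, dy)
    if r_pin == 0:
        return (float(cx), float(cy))   # the optical center is a fixed point
    theta = math.atan2(r_pin, f)        # incidence angle of the pinhole ray
    r_fish = f * theta                  # equidistant model: radius grows linearly with angle
    scale = r_fish / r_pin              # < 1 away from center, so pixels pull inward
    return (cx + dx * scale, cy + dy * scale)
```

This backward mapping is the building block of a remap table: each output pixel of the corrected image samples the fisheye image at the returned coordinates. In practice, a library such as OpenCV's `cv2.fisheye` module performs this given calibrated intrinsics.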

    Experimental results show that training with images produced by generative AI models, while strategically and incrementally increasing the number of synthetic instances, significantly enhances the model's detection performance, with particularly notable gains on small objects after correcting the dataset. Compared to the YOLOv8 baseline, our fine-tuned model with fisheye correction improved the overall mAP(0.5) from 0.246 to 0.688 and mAP(0.5-0.95) from 0.122 to 0.518; for a specific small-object category (beverages), mAP(0.5) increased from 0.507 to 0.795 and mAP(0.5-0.95) from 0.268 to 0.586. Additionally, mixing an appropriate proportion of synthetic data with real data during training not only improves the robustness of the training process but also helps further optimize model performance. These findings confirm the potential of our approach for object detection with overhead fisheye lenses.
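The abstract does not specify how the synthetic-to-real mixing was implemented; the helper below is a minimal sketch (function name and interface are hypothetical, not from the thesis) of building a training list in which a target fraction of the samples is synthetic, keeping all real images and subsampling the synthetic pool:

```python
import random

def mix_training_set(real_paths, synth_paths, synth_ratio, seed=0):
    """Build a shuffled training list where roughly `synth_ratio` of the
    samples are synthetic. All real images are kept; the synthetic pool
    is subsampled to hit the target proportion."""
    assert 0.0 <= synth_ratio < 1.0
    rng = random.Random(seed)
    # n_synth / (n_real + n_synth) = synth_ratio  =>  n_synth = n_real * r / (1 - r)
    n_synth = round(len(real_paths) * synth_ratio / (1.0 - synth_ratio))
    n_synth = min(n_synth, len(synth_paths))
    mixed = list(real_paths) + rng.sample(list(synth_paths), n_synth)
    rng.shuffle(mixed)
    return mixed
```

For example, with 100 real images and `synth_ratio=0.2`, the helper adds 25 synthetic images so that one fifth of the 125-image training set is synthetic.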

    Chapter 1 Introduction 1
    1.1 Research Background and Motivation 1
    1.2 Research Objectives and Contributions 2
    1.3 Thesis Organization 3

    Chapter 2 Related Work and Technical Background 5
    2.1 Object Detection Datasets for Overhead Fisheye Cameras 5
    2.1.1 Public Overhead Fisheye Datasets 5
    2.1.2 Data Augmentation 6
    2.2 Image Generation Models 7
    2.2.1 GAN-Based Generative Models 7
    2.2.2 Diffusion-Based Generative Models 8
    2.2.3 Domain Adaptation 10
    2.3 Fisheye Image Distortion Correction 10
    2.3.1 Camera-Parameter-Based Fisheye Distortion Correction 11
    2.3.2 CNN-Based Fisheye Distortion Correction 12
    2.4 Object Detection with Fisheye Cameras 13
    2.4.1 Object Detection on Raw Fisheye Images 13
    2.4.2 Distortion-Adaptive Object Detection 14

    Chapter 3 Methodology 16
    3.1 Training Data 17
    3.1.1 Real Data: CEPDOF and MW-18Mar 17
    3.1.2 Synthetic Data: Images Generated with Midjourney and DALL·E 19
    3.2 Correction Methods for Fisheye Datasets 23
    3.2.1 Distortion Correction for Overhead Fisheye Images 23
    3.2.2 Adjusting Bounding Boxes after Correction 30
    3.3 Model Architecture 31
    3.3.1 YOLOv8 Architecture 31
    3.3.2 YOLOv8x Pretrained Model 33
    3.4 Performance Evaluation Metrics 34
    3.4.1 Evaluation Metrics for Object Detection Models 34
    3.4.2 Detection Metrics for YOLO Models 36
    3.5 Experimental Method 37
    3.5.1 Experimental Procedure 37
    3.5.2 Key Strategies for Fine-Tuning with Generative AI Images 40
    3.5.3 Experimental Environment and Training Parameters 41

    Chapter 4 Experiments and Results 43
    4.1 Experiment 1: Comparing Generative AI Tools on a Small Training Set 43
    4.2 Experiment 2: Training with Uncorrected Synthetic Data 45
    4.2.1 Experiment 2-1: Training on Synthetic Data, Validating on Synthetic Images 46
    4.2.2 Experiment 2-2: Training on Synthetic Data, Validating on Real Images 47
    4.3 Experiment 3: Training on Corrected Synthetic Data, Validating on Real Images 48
    4.4 Experiment 4: Training on Mixed Synthetic and Real Data, Validating on Real Images 53
    4.5 Analysis of Failed Detection Cases 58
    4.6 Summary of Experimental Results 61

    Chapter 5 Conclusions and Future Work 63
    5.1 Conclusions 63
    5.2 Future Research Directions 63

    References 65

    Appendix 69
    Image-to-Image Testing with Midjourney 69
    Image-to-Image Testing with DALL·E 3 69

