Design of a Multi-view Surveillance Video Retrieval and Management System Based on Pedestrian Attributes

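Author (graduate student):	Wang, Huai-Yi (王懷憶)
Thesis title:	Design of a Multi-view Surveillance Video Retrieval and Management System Based on Pedestrian Attributes (基於人物屬性特徵之多視角監控影片檢索管理系統設計)
Advisor:	Liao, Chun-Feng (廖峻鋒)
Oral defense committee:	Sun, Shi-Sheng (孫士勝); Lu, Jing-Hu (陸敬互)
Degree:	Master
Department:	College of Informatics (資訊學院), Executive Master Program of Computer Science
Year of publication:	2025
Academic year of graduation:	113 (ROC calendar)
Language:	Chinese
Number of pages:	58
Keywords (Chinese):	Surveillance video retrieval (監控影片檢索), pedestrian attribute recognition (人物屬性識別), YOLOv8n, PP-Human, weighted cosine similarity (加權餘弦相似度)
Keywords (English):	Surveillance Video Retrieval and Management, Pedestrian Attribute Recognition, YOLOv8n, PP-Human, Weighted Cosine Similarity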

In recent years, advances in surveillance camera technology and rapid progress in artificial intelligence models have made intelligent surveillance systems an important tool for urban safety and site management. Such cameras not only record scenes in real time but also recognize content automatically, producing structured data such as person attributes, object categories, detected behaviors, and scene semantics. Most existing surveillance systems, however, still restrict retrieval to a timeline and a camera selection, so this data goes largely unused. Finding a specific target in a large video archive therefore remains inefficient: operators must review footage manually, clip by clip, which is time-consuming and prone to misjudgment.
To address this problem, this study proposes a video retrieval and management method built around attribute tag indexing. The method combines object detection with pedestrian attribute recognition to annotate the attributes of each person in the surveillance footage and converts those annotations into structured index data, so that videos can be retrieved by content rather than only by time and camera. The study also designs and implements a complete system architecture. The backend module receives and processes the attribute data produced by the AI models and analyzes semantic relationships across multiple videos; the frontend presents retrieval results and a map of inter-video relationships visually through an interactive interface.
Empirical evaluation and case tests show significant improvements in video search efficiency and management effectiveness with attribute tag indexing. Compared with traditional search, users locate target segments faster and more accurately, spend less time browsing and less manual effort, and find the system more intuitive and usable. The results are expected to serve as a reference for data management in future intelligent surveillance systems and to extend the value of video data in public safety, traffic, commercial, and other application domains.
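The abstract's idea of retrieving footage by attribute tags, together with the "weighted cosine similarity" keyword, can be illustrated with a small sketch. This record does not give the thesis's actual attribute vocabulary, weights, index schema, or similarity formula, so everything below is an assumption made for illustration: the names `ATTRIBUTES`, `WEIGHTS`, `INDEX`, `encode`, and `search` are hypothetical, and the similarity uses one common definition of weighted cosine similarity.

```python
import math

# Hypothetical attribute vocabulary and per-attribute weights (illustrative only;
# the thesis's real attribute set and weights are not stated in this record).
ATTRIBUTES = ["male", "backpack", "hat", "long_sleeve", "trousers"]
WEIGHTS = [1.0, 2.0, 2.0, 1.0, 1.0]


def encode(tags):
    """Turn a set of attribute tags into a 0/1 vector over ATTRIBUTES."""
    return [1.0 if name in tags else 0.0 for name in ATTRIBUTES]


def weighted_cosine(a, b, w):
    """One common form: sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b)))."""
    dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag set matches nothing
    return dot / (norm_a * norm_b)


# Toy structured index: one record per detected person (made-up data).
INDEX = [
    {"video": "cam01.mp4", "time_s": 12.0, "tags": {"male", "backpack", "trousers"}},
    {"video": "cam02.mp4", "time_s": 47.5, "tags": {"hat", "long_sleeve"}},
    {"video": "cam03.mp4", "time_s": 3.2, "tags": {"male", "backpack", "hat", "trousers"}},
]


def search(query_tags, top_k=2):
    """Rank index records by weighted cosine similarity to the query tags."""
    q = encode(query_tags)
    scored = [(weighted_cosine(q, encode(r["tags"]), WEIGHTS), r) for r in INDEX]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, record in search({"male", "backpack"}):
        print(f'{record["video"]} @ {record["time_s"]}s  score={score:.3f}')
```

In this toy setup, a query for a man with a backpack ranks the record from `cam01.mp4` above the one from `cam03.mp4`, because the latter carries extra attributes that the query did not ask for. Giving distinctive attributes larger weights is one plausible design choice; how the thesis actually sets its weights is not described in this record.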

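Abstract 2
Acknowledgments 5
Table of Contents 6
List of Tables 9
Chapter 1 Introduction 10
Section 1 Research Background and Motivation 10
Section 2 Research Objectives and Questions 11
Section 3 Expected Contributions and Research Process 12
Chapter 2 Literature Review 13
Section 1 Overview of Image Analysis 13
Section 2 Object Recognition 13
Section 3 Pedestrian Attribute Recognition 17
Section 4 Mainstream PAR Models and Datasets 18
Section 5 Retrieval Systems for Surveillance Video 20
Chapter 3 System Design 21
Section 1 System Architecture 23
Section 2 Data Flow and Module Interaction 27
Chapter 4 System Implementation 30
Section 1 Designing and Implementing the Image Analysis Module with YOLO and Paddle 30
Section 2 Designing a PostgreSQL Database to Store Structured Data 33
Section 3 Building the Backend Service and Developing APIs with FastAPI 33
Section 4 Implementing the Frontend Interface to Present Search Results 36
Chapter 5 System Evaluation 39
Section 1 Definition of Search Time and Efficiency Test Results 39
Section 2 Interaction Scenarios 40
Section 3 Retrieval Requirements and Prediction Performance 42
Section 4 Usability Testing 43
Section 5 Research Questions and Discussion 49
Chapter 6 Conclusion 52
References 53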

Full-text release date: 2030/08/06