| Graduate student: | 張哲維 Chang, Che-Wei |
|---|---|
| Thesis title: | 以多重視覺特徵建立影像檢索 Image Retrieval Based on Multiple Visual Features |
| Advisor: | 羅崇銘 Lo, Chung-Ming |
| Oral defense committee: | 林于翔 Lin, Yu-Shiang; 陸行 Luh, Hsing |
| Degree: | 碩士 Master |
| Department: | 文學院 - 圖書資訊與檔案學研究所 Graduate Institute of Library, Information and Archival Studies |
| Year of publication: | 2026 |
| Academic year of graduation: | 114 |
| Language: | Chinese |
| Number of pages: | 65 |
| Chinese keywords: | 電子商務、商品分類、多重特徵、影像檢索、深度學習 |
| English keywords: | E-commerce, Product classification, Multiple features, Image retrieval, Deep learning |
With the rapid growth of e-commerce, traditional text-based product classification and retrieval methods face significant challenges. Existing classification systems classify imprecisely because product types are so varied, and their category structures are hard to update promptly. Text-based retrieval also suffers from results of insufficient relevance, differences in expression across languages and cultures, and the semantic gap between computer systems and human cognition. These problems degrade the consumer shopping experience and lower platform conversion rates, and they are especially pronounced in a globalized e-commerce environment, where text retrieval struggles to serve cross-cultural and cross-lingual shopping needs.

This research proposes a product image retrieval method based on multiple visual features. It uses a Vision Transformer as the backbone network and a three-level classification architecture covering line complexity, color features, and product category. The dataset is the Amazon Product Dataset 2023, which after curation contains 294,061 product images across 18 major product categories. For feature extraction, images are clustered by line complexity using multiscale entropy and by color using the CIELAB color space, and these features are then integrated for product category classification. The model is trained in stages, progressively freezing feature layers that have already been trained so that it learns visual features at each level effectively. In the retrieval phase, the system matches features by cosine similarity and evaluates retrieval performance with mean Average Precision (mAP).
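The retrieval and evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the two-dimensional toy vectors, and the labels are hypothetical, and it assumes the whole gallery is ranked with no top-k cutoff and that a gallery item counts as relevant when it shares the query's product category. The abstract does not specify these details.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Gallery indices sorted by decreasing cosine similarity to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

def average_precision(ranked_labels, query_label):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:  # assumed relevance rule: same category
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """Average the per-query AP over all queries."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        order = rank_gallery(q, gallery)
        aps.append(average_precision([gallery_labels[i] for i in order], q_label))
    return sum(aps) / len(aps)

# Hypothetical toy data standing in for learned image features.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery_labels = ["shoe", "bag", "shoe"]
queries = [[1.0, 0.1], [0.1, 1.0]]
query_labels = ["shoe", "shoe"]

print(mean_average_precision(queries, query_labels, gallery, gallery_labels))
```

In the toy data the second query scores lower because a "bag" item ranks first, which shows how mAP penalizes relevant items that appear late in the ranking. In practice the vectors would come from the trained network's feature layer rather than being written by hand.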
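Acknowledgments
Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1 Introduction
Section 1 E-commerce
Section 2 Product Classification and Retrieval
Chapter 2 Literature Review
Chapter 3 Materials and Methods
Section 1 Product Image Dataset
Section 2 Multiple Visual Feature Classification Network
1. Feature Extraction Backbone Network
2. Multiple Visual Classification
Section 3 Image Retrieval Method
Chapter 4 Results
Section 1 Classification Performance Analysis
Section 2 Retrieval Performance Analysis
Chapter 5 Conclusion and Discussion
Chapter 6 Future Directions
References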
Full-text release date: 2031/01/15