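Graduate student: Huang, Yu-Chun (黃郁君)
Thesis title: Mining Frequent Spatial Co-relation Patterns (探勘空間相關樣式之研究)
Advisor: Shan, Man-Kwan (沈錳坤)
Degree: Master
Department: College of Science - Department of Computer Science
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 64
Keywords: data mining, spatial co-relation pattern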
  • With the rapid growth of data, many kinds of databases are used across a wide range of applications. Spatial data mining is one example: it is the process of discovering interesting, previously unknown, but potentially useful patterns and spatial relations in large spatial databases.
    In this thesis, we explore the problem of spatial sequential pattern mining. We address two topics: spatial co-relation patterns and approximate spatial co-relation patterns. For spatial co-relation patterns, we propose an Apriori-based method and a depth-first-based method. For approximate spatial co-relation patterns, we propose two algorithms, AP-mine and AS-mine. In AP-mine, we introduce a data structure, the AP-tree, to mine approximate spatial co-relation patterns efficiently. Finally, we perform experiments to evaluate our algorithms.
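The abstract names an Apriori-based strategy but this record does not reproduce the algorithm. As a rough illustration of the general level-wise Apriori idea only, the sketch below mines frequent sets of object labels from a handful of toy "pictures". It is not the thesis's method, which works on 2D strings and spatial relations; the function name `apriori`, the toy data, and the support threshold are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset.

    Generic level-wise Apriori over plain label sets (illustrative only;
    spatial relations between objects are ignored here).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single label.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the data once per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: each set lists the object labels in one picture.
pictures = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = apriori(pictures, min_support=3)
```

With a minimum support of 3, every single label and every pair is frequent in this toy data, while the triple {A, B, C} appears in only two pictures and is therefore discarded at level 3.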

    CHAPTER 1 Introduction
    1.1 Overview
    1.2 Paper Organization
    CHAPTER 2 Review of Literature
    2.1 Location Prediction
    2.2 Spatial Outliers
    2.3 Spatial Co-location Rules
    2.3.1 Problem Definition and Basic Concepts
    2.3.2 Modeling the Co-location Rules
    2.3.3 Mining Co-location Rules
    2.4 Sequential Pattern Mining
    2.4.1 Basic Concepts
    2.4.2 Mining Closed Sequential Patterns
    2.4.3 Mining Periodic Patterns
    CHAPTER 3 Mining Spatial Co-relation Patterns
    3.1 Problem Definition
    3.2 The Apriori-based Strategy
    3.3 The Depth-first Strategy
    CHAPTER 4 Mining Approximate Spatial Co-relation Patterns
    4.1 Problem Definition
    4.2 AP-mine: Mining Frequent Approximate Patterns
    4.2.1 Construction of AP-tree
    4.2.2 Mining Frequent Approximate Patterns from AP-tree
    4.3 AS-mine: Mining Approximate Spatial Co-relation Patterns
    CHAPTER 5 Performance and Evaluation
    5.1 Generation of Synthetic Data
    5.2 Mining Spatial Co-relation Patterns
    5.3 Mining Approximate Spatial Co-relation Patterns
    5.3.1 AP-mine
    5.3.2 AS-mine
    CHAPTER 6 Conclusions
    References

    List of Figures

    FIG. 2.1. SPATIAL DATASETS TO EXPLAIN DIFFERENT MODELS TO DISCOVER CO-LOCATION PATTERNS
    FIG. 2.2. SPATIAL DATASETS
    FIG. 3.1. A SYMBOLIC PICTURE
    FIG. 3.2. PICTURE MATCHING EXAMPLE
    FIG. 3.3. SYMBOLIC PICTURES OF TABLE 3.1
    FIG. 3.4. GENERATING CANDIDATE SPATIAL PATTERNS
    FIG. 3.5. SYMBOLIC PICTURES
    FIG. 3.6. GENERATION OF CANDIDATE LENGTH-2 SPATIAL PATTERNS
    FIG. 3.7. GENERATION OF CANDIDATE LENGTH-3 SPATIAL PATTERNS
    FIG. 3.8. THE APRIORI-BASED ALGORITHM FOR MINING SPATIAL CO-RELATION PATTERNS
    FIG. 3.9. GENERATION THE SET OF CANDIDATE AND FREQUENT SPATIAL PATTERNS
    FIG. 3.10. THE DEPTH-FIRST BASED ALGORITHM FOR MINING SPATIAL CO-RELATION PATTERNS
    FIG. 3.11. GENERATING CANDIDATE BRANCHES OF A NODE
    FIG. 3.12. COUNTING FREQUENT EXTENSIONS OF A NODE
    FIG. 3.13. THE LEXICOGRAPHIC TREE OF SPATIAL CO-RELATION RULES
    FIG. 4.1. THE AP-MINE ALGORITHM FOR MINING FREQUENT APPROXIMATE PATTERNS
    FIG. 4.2. THE AP-TREE CONSTRUCTED OF APPROXIMATE PATTERNS IN TABLE 4.3
    FIG. 4.3. THE CONSTRUCTION OF AP-TREE
    FIG. 4.4. MAP ALGORITHM
    FIG. 4.5. THE CONDITIONAL AP-TREE BUILT FOR A, AP|A
    FIG. 4.6. THE CONDITIONAL AP-TREES AP|AB AND AP|A G
    FIG. 4.7. AS-MINE ALGORITHM
    FIG. 4.8. THE AP-TREE CONSTRUCTED OF APPROXIMATE 2D STRINGS IN THE X-DIRECTION OF TABLE 4.4
    FIG. 4.9. THE CONDITIONAL AP-TREES AP|A IN THE X-DIRECTION
    FIG. 4.10. THE CONDITIONAL AP-TREES AP|B IN THE X-DIRECTION
    FIG. 4.11. THE AP-TREE CONSTRUCTED OF APPROXIMATE 2D STRINGS IN THE Y-DIRECTION OF TABLE 4.4
    FIG. 4.12. THE CONDITIONAL AP-TREES AP|B IN THE Y-DIRECTION
    FIG. 5.1. EXECUTION TIMES FOR USING APRIORI-BASED STRATEGY
    FIG. 5.2. EXECUTION TIMES FOR USING DEPTH-FIRST BASED STRATEGY
    FIG. 5.3. COMPARISON OF EXECUTION TIMES OF SIX SYNTHETIC DATA
    FIG. 5.4. EXECUTION TIME OF THREE SYNTHETIC DATA FOR DIFFERENT MINIMUM SUPPORT
    FIG. 5.5. EXECUTION TIMES FOR DIFFERENT VARIETY OF OBJECTS IN THE DATABASE
    FIG. 5.6. EXECUTION TIMES FOR DIFFERENT AVERAGE LENGTHS OF STRINGS
    FIG. 5.7. EXECUTION TIMES FOR DIFFERENT MINIMUM SUPPORTS IN THE DATABASE

    List of Tables

    TABLE 2.1. CO-LOCATION MINER ALGORITHM ILLUSTRATION ON SPATIAL DATASETS IN FIG. 2.2
    TABLE 3.1. A 2D STRING DATABASE SDB2D
    TABLE 4.1. A STRING DATABASE SDB OF EXAMPLE 4.1
    TABLE 4.2. A 2D STRING DATABASE SDB2D OF EXAMPLE 4.2
    TABLE 4.3. A STRING DATABASE SDB
    TABLE 4.4. APPROXIMATE STRINGS OF SDB
    TABLE 4.5. A 2D STRING DATABASE SDB2D OF EXAMPLE 4.5
    TABLE 4.6. APPROXIMATE STRINGS OF SDB2D
    TABLE 5.1. PARAMETERS DEFINITION
    TABLE 5.2. PARAMETER SETTINGS OF THE SYNTHETIC DATA

