| Graduate student: | 龔怡寧 Kung, Yi-Ning |
|---|---|
| Thesis title: | A Schema and Ontology-Assisted Heterogeneous Information Integration Study 運用綱要和本體論以協助異質資訊整合之研究 |
| Advisor: | 諶家蘭 Seng, Jia-Lang |
| Degree: | Master |
| Department: | College of Commerce - Department of Management Information System |
| Year of publication: | 2004 |
| Academic year of graduation: | 92 |
| Language: | English |
| Number of pages: | 86 |
| Chinese keywords: | 異質資訊整合 、延伸性標記語言 、本體論 、結構互動性和語意互動性 |
| English keywords: | Heterogeneous Information Integration, XML, Ontology, Syntactic and Semantic Interoperability |
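As dependence on information technology and on the Internet and intranets continues to deepen, heterogeneous information integration has become a ubiquitous and important issue in e-business. Accessing heterogeneous information sources separately, without integration, can lead to confusion in the information retrieved, and in an e-business environment it is also not cost-effective for decision support, management, and analysis. Traditional research on heterogeneous information integration usually creates a common data model to handle heterogeneity. The eXtensible Markup Language (XML) has become the standard document format for exchanging information on the Web, which makes XML a good candidate for the common data model in integration work. However, XML can handle only structural heterogeneity, not semantic heterogeneity. Ontologies are regarded as an important and natural tool for representing the ambiguous semantics and relationships of the real world, so this study also incorporates ontologies in order to achieve semantic interoperability in heterogeneous information integration.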
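In this thesis, we propose a generic-construct-oriented, non-ad-hoc mapping method for generating the global schema, enabling heterogeneous information integration that is web-based rather than traditional. We also propose a more intelligent query method over heterogeneous information sources. It applies the global-as-view (GAV) approach together with ontology concepts, and it can improve both the structural and the semantic interoperability of the underlying heterogeneous information sources. We verify the feasibility of the proposed integration method by implementing a prototype system.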
Heterogeneous information integration (HII) has become a ubiquitous and critically important research issue in e-business (EB) as dependence on the Internet, intranets, and information technology (IT) increases. Accessing heterogeneous information sources separately, without integration, can leave the requested information in disarray, and it is not cost-effective in EB settings. A common way to deal with heterogeneity in traditional HII is to create a common data model. The eXtensible Markup Language (XML) has become the standard document format for exchanging information on the Web, which makes it a good candidate for the common data model. However, XML deals only with structural heterogeneity; it can barely handle semantic heterogeneity. Ontologies are regarded as an important and natural means of representing the implicit semantics and relationships of the real world, and in this research they are used to help achieve semantic interoperability in HII.
In this thesis, we provide a generic-construct-oriented, non-ad-hoc method for generating the global schema, enabling a web-based alternative to traditional HII. We also provide a more intelligent query method over multiple heterogeneous information sources: it applies the global-as-view (GAV) approach together with an ontology to enhance both the structural and the semantic interoperability of the underlying sources. We construct a prototype that implements the method to demonstrate its validity and feasibility.
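As a rough illustration of the global-as-view idea summarized above, the following Python sketch defines one global element as a view over two differently structured sources and uses a small synonym table in place of an ontology. It is a toy under stated assumptions, not the thesis's method or prototype: the source data, field names, synonym table, and the functions `resolve_concept` and `query_global` are all hypothetical, and plain dictionaries stand in for the XML schemas, wrappers, and query languages that a real mediator would use.

```python
"""Toy global-as-view (GAV) mediator with ontology-style term resolution.

Everything here is illustrative: the data, field names, and synonym
table are invented for this sketch and do not come from the thesis.
"""

# Hypothetical "ontology": maps user or source terms to one canonical concept.
ONTOLOGY_SYNONYMS = {
    "customer": "customer",
    "client": "customer",
    "buyer": "customer",
}

# Two hypothetical local sources that name the same facts differently.
RELATIONAL_SOURCE = [{"cust_name": "Alice", "cust_city": "Taipei"}]
OBJECT_SOURCE = [{"fullName": "Bob", "location": "Tainan"}]

# GAV mapping: each global element is defined as a view over the sources.
# Each entry pairs a source with a map from global fields to local fields.
GAV_MAPPINGS = {
    "customer": [
        (RELATIONAL_SOURCE, {"name": "cust_name", "city": "cust_city"}),
        (OBJECT_SOURCE, {"name": "fullName", "city": "location"}),
    ],
}


def resolve_concept(term):
    """Return the canonical concept for a term (semantic step)."""
    key = term.strip().lower()
    if key not in ONTOLOGY_SYNONYMS:
        raise KeyError(f"no concept known for term: {term!r}")
    return ONTOLOGY_SYNONYMS[key]


def query_global(term, city=None):
    """Answer a query on the global schema by unfolding it over each source."""
    concept = resolve_concept(term)
    results = []
    for rows, field_map in GAV_MAPPINGS[concept]:
        for row in rows:
            # Structural step: rename local fields to global field names.
            record = {g: row[local] for g, local in field_map.items()}
            if city is None or record["city"] == city:
                results.append(record)
    return results


if __name__ == "__main__":
    print(query_global("Client"))
    print(query_global("buyer", city="Tainan"))
```

In this sketch the field renaming plays the role of structural interoperability and the synonym lookup plays the role of semantic interoperability; the query is written once against the global names and unfolded over every mapped source.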
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1. RESEARCH MOTIVATION
1.2. RESEARCH ISSUE
1.3. RESEARCH OBJECTIVE
1.4. RESEARCH FLOW
1.5. RESEARCH ORGANIZATION
CHAPTER 2 LITERATURE REVIEW
2.1. DISTRIBUTED INFORMATION INTEGRATION
2.1.1. TSIMMIS
2.1.2. Information Manifold
2.1.3. DISCO
2.1.4. Garlic & Clio
2.1.5. YAT
2.2. XML-BASED INFORMATION INTEGRATION
2.2.1. MIX
2.2.2. Agora
2.3. ANALYSIS AND COMPARISON
2.4. ONTOLOGY
2.4.1. Definition of Ontology
2.4.2. Ontology Representation Languages
2.4.3. Factors of Using Ontologies in Information Integration System
CHAPTER 3 RESEARCH METHOD
3.1. RESEARCH METHOD
3.2. RESEARCH STRUCTURE
3.3. INFORMATION INTEGRATION METHOD IN RESEARCH STRUCTURE
3.3.1. The Creation of Global Schema
3.3.2. The Creation of Ontology
3.3.3. Mapping Global Schema to Local Data Sources
3.4. QUERY RESOLUTION IN RESEARCH STRUCTURE
CHAPTER 4 RESEARCH PROTOTYPE
4.1. PROTOTYPE SYSTEM ARCHITECTURE
4.2. PROTOTYPE SYSTEM PLATFORM
4.3. PROTOTYPE SYSTEM DESIGN
4.4. PROTOTYPE SYSTEM PRESENTATION
CHAPTER 5 RESEARCH DISCUSSIONS AND LIMITATIONS
5.1. RESEARCH IMPLICATIONS
5.2. RESEARCH LIMITATIONS
CHAPTER 6 CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS
6.1. SUMMARY
6.2. FUTURE RESEARCH DIRECTIONS
REFERENCES
A SIMPLE EXAMPLE IMPLEMENTATION
LIST OF FIGURES
FIGURE 1-1: RESEARCH FLOW
FIGURE 2-1: TSIMMIS ARCHITECTURE
FIGURE 2-2: ARCHITECTURE OF THE INFORMATION MANIFOLD
FIGURE 2-3: DISCO ARCHITECTURE
FIGURE 2-4: GARLIC ARCHITECTURE
FIGURE 2-5: CLIO’S LOGICAL ARCHITECTURE
FIGURE 2-6: YAT TRANSLATION SCENARIO
FIGURE 2-7: MIX ARCHITECTURE
FIGURE 2-8: GENERAL ARCHITECTURE OF THE AGORA DATA INTEGRATION SYSTEM
FIGURE 3-1: RESEARCH STRUCTURE
FIGURE 3-2: COMPONENTS IN RESEARCH STRUCTURE
FIGURE 3-3: THE GLOBAL INTEGRATION PROCESS
FIGURE 3-4: TRANSFORM RELATIONAL DATA MODEL INTO XML DATA MODEL
FIGURE 3-5: REWRITE RELATIONAL SCHEMA INTO W3C XML SCHEMA ACCORDING TO THE GENERIC CONSTRUCTS CORRESPONDENCE
FIGURE 3-6: AN EXAMPLE OF TRANSFORMING OBJECT DATA MODEL TO XML DATA MODEL
FIGURE 3-7: AN EXAMPLE OF TRANSFORMING OBJECT DATABASE SCHEMA TO XML SCHEMA
FIGURE 3-8: AN INTEGRATED SCHEMA IN W3C XML SCHEMA FOR THE EXAMPLE
FIGURE 3-9: A FRAGMENT OF THE EXAMPLE OF THE MAPPING BETWEEN GLOBAL SCHEMA AND SOURCE SCHEMA
FIGURE 3-10: QUERY PROCESSING IN RESEARCH STRUCTURE
FIGURE 4-1: THE PROTOTYPE SYSTEM ARCHITECTURE
FIGURE 4-2: DEMONSTRATION OF THE CREATION OF THE ONTOLOGY BY MEANS OF PROTÉGÉ 2.0
FIGURE 4-3: PROTOTYPE SYSTEM FUNCTIONS
FIGURE 4-4: QUERY INTERFACE OF THE PROTOTYPE SYSTEM
FIGURE 4-5: USERS FORMULATE THE XQUERY EXPRESSION OF THEIR OWN QUERIES ACCORDING TO THE GLOBAL SCHEMA
FIGURE 4-6: THE REFORMULATED QUERY
FIGURE 4-7: THE QUERY PLAN GENERATED BY THE PROTOTYPE SYSTEM
FIGURE 4-8: THE DECOMPOSED SUB-QUERIES AND THE TRANSLATED QUERY GENERATED BY WRAPPERS
FIGURE 4-9: THE DECOMPOSED SUB-QUERIES AND THE TRANSLATED QUERY GENERATED BY WRAPPERS (CONTINUED)
FIGURE 4-10: QUERY-PROCESSING COMPLETE
FIGURE 4-11: THE QUERY RESULT IN XML DOCUMENT
LIST OF TABLES
TABLE 2-1: COMPARISON OF INFORMATION INTEGRATION METHODS
TABLE 3-1: CORRESPONDENCES BETWEEN RELATIONAL SCHEMA CONSTRUCTS AND W3C XML SCHEMA CONSTRUCTS
TABLE 3-2: CORRESPONDENCES BETWEEN OBJECT DATABASE SCHEMA CONSTRUCTS AND W3C XML SCHEMA CONSTRUCTS
TABLE 3-3: CAUSES FOR STRUCTURAL HETEROGENEITY
TABLE 3-4: CAUSES FOR SEMANTIC HETEROGENEITY
TABLE 3-5: COMPARISON BETWEEN GAV AND LAV
TABLE 3-6: THE CORRESPONDENCES BETWEEN XQUERY EXPRESSION AND SQL EXPRESSION
TABLE 3-7: THE CORRESPONDENCES BETWEEN XQUERY EXPRESSION AND OQL EXPRESSION
References
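Chinese References
梁定澎 (1997). 資訊管理研究方法概論 [An introduction to research methods in information management]. 資訊管理學報 [Journal of Information Management], 4(1), 1-6.
王瑞娟 (2002). 資料交換與查詢在XML文件與關連資料庫之間 [Data exchange and querying between XML documents and relational databases]. Master's thesis, Graduate Institute of Management Information Systems, National Chengchi University.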
English References
Baru, C. K., Gupta, A., Ludäscher, B., Marciano, R., Papakonstantinou, Y., Velikhov, P., & Chu, V. (1999). XML-Based Information Mediation with MIX. Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD1999), 597-599.
Baru, C. K., Ludäscher, B., Papakonstantinou, Y., Velikhov, P., & Vianu, V. (1998). Features and Requirements for an XML View Definition Language: Lessons from XML Information Mediation. Position paper, W3C Query Language Workshop (QL’98).
Carey, M., Haas, L. M., Schwarz, P. M., Arya, M., Cody, W. F., Fagin, R., Flickner, M., Luniewski, A. W., Niblack, W., Petkovic, D., Thomas, J., Williams, J. H., & Wimmers, E. L. (1995). Towards Heterogeneous Multimedia Information Systems: The Garlic Approach. 5th International Workshop on Research Issues in Data Engineering-Distributed Object Management (RIDE-DOM’95), 124-131.
Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K., Papakonstantinou, Y., Ullman, J., & Widom, J. (1994). The TSIMMIS Project: Integration of Heterogeneous Information Sources. Proceedings of the 10th Meeting of the Information Processing Society of Japan (IPSJ), 7-18.
Chu, Yu-Chi. (2001). Integrating Heterogeneous Information Sources through Ontology-Driven Model and Data Quality Analysis. Doctoral Dissertation, Department of Electronic Engineering, National Taiwan University of Science and Technology.
Cluet, S., Delobel, C., Siméon, J., & Smaga, K. (1998). Your Mediators Need Data Conversion. Proceedings of the ACM SIGMOD Conference on Management of Data.
Cui, Z., Jones, D., & O’Brien, P. (2001). Issues in Ontology-based Information Integration. Paper in Joint Session with IJCAI-01 Workshop on Ontologies & Information Sharing.
Decker, S., Melnik, S., Harmelen, F. V., Fensel, D., Klein, M., Broekstra, J., Erdmann, M., & Horrocks, I. (2000). The Semantic Web: The Roles of XML and RDF. IEEE Internet Computing, 4(5), 63-74.
Ding, Y., Fensel, D., Klein, M., & Omelayenko, B. (2002). The semantic web: yet another hip? Data & Knowledge Engineering, 41(2-3), 205-227.
Elmasri, R., & Navathe, S. B. (2004). Fundamentals of Database Systems. (4th ed.). Addison-Wesley.
Erdmann, M., & Decker, S. (2000). Ontology-aware XML-Queries. Submission for WebDB 2000.
Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Vassalos, V., & Widom, J. (1997). The TSIMMIS Approach to Mediation: Data Models and Languages. Journal of Intelligent Information Systems, 8(2), 117-132.
Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199-220.
Haas, L. M., Miller, R. J., Niswonger, B., Roth, M. T., Schwarz, P. M., & Wimmers, E. L. (1997). Transforming Heterogeneous Data with Database Middleware: Beyond Integration. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering.
Jhingran, A. D., Mattos, N., & Pirahesh, H. (2002). Information integration: A research agenda. IBM Systems Journal, 41(4), 555-562.
Josifovski, V., Schwarz, P., Haas, L., & Lin, E. (2002). Garlic: A New Flavor of Federated Query Processing for DB2. Proceedings of the 2002 ACM SIGMOD international conference on Management of data, 524-532.
Kashyap, V., & Sheth, A. (1996). Semantic and schematic similarities between database objects: a context-based approach. The VLDB Journal, 5, 276-304.
Kirk, T., Levy, A., Sagiv, Y., & Srivastava, D. (1995). The Information Manifold. Proceedings of the AAAI Spring Symposium on Information Gathering.
Kuo, W. (2003). A Generic Construct based Transformation Model between UML Data Model and XML. Master's thesis, Department of Management Information System, National Chengchi University.
Levy, A. Y. (2000). Logic-Based Techniques in Data Integration. Logic Based Artificial Intelligence.
Levy, A. Y., Rajaraman, A., & Ordille, J. J. (1996). Querying heterogeneous information sources using source descriptions. Proceedings of the Twenty-second International Conference on Very Large Databases, 251-262.
Mena, E., Illarramendi, A., Kashyap, V., & Sheth, A. P. (2000). OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies. Distributed and Parallel Databases, 8(2), 223-271.
Manolescu, I., Florescu, D., & Kossmann, D. (2001). Answering XML Queries over Heterogeneous Data Sources. Proceedings of the 27th VLDB Conference.
Manolescu, I., Florescu, D., Kossmann, D., Xhumari, F., & Olteanu, D. (2000). Agora: Living with XML and Relational. Proceedings of the 26th VLDB Conference.
Miller, R. J., Hernández, M. A., Haas, L. M., Yan, L., Ho, C. T. H., Fagin, R., & Popa, L. (2001). The Clio project: managing heterogeneity. ACM SIGMOD Record, 30(1), 78-83.
Parent, C., & Spaccapietra, S. (1998). Issues and Approaches of Database Integration. Communications of the ACM, 41(5), 166-178.
Rahm, E., & Bernstein, P. A. (2001). A survey of approaches to automatic schema matching. The VLDB Journal, 10, 334-350.
Roddick, J. F. (1995). A Survey of Schema Versioning Issues for Database Systems. Information and Software Technology, 37(7), 383-393.
Roth, M. T., Arya, M., Haas, L., Carey, M., Cody, W., Fagin, R., Schwarz, P., Thomas, J., & Wimmers, E. (1996). The Garlic Project. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 557.
Sugumaran, V., & Storey, V. C. (2002). Ontologies for conceptual modeling: their creation, use, and management. Data & Knowledge Engineering, 42(3), 251-271.
Tomasic, A., Amouroux, R., Bonnet, P., Kapitskaia, O., Naacke, H., & Raschid, L. (1997). The Distributed Information Search Component (Disco) and the World Wide Web. ACM SIGMOD.
Tomasic, A., Raschid, L., & Valduriez, P. (1998). Scaling Access to Distributed Heterogeneous Data Sources with DISCO. IEEE Transactions on Knowledge and Data Engineering.
Uschold, M., & Grüninger, M. (1996). Ontologies: principles, methods and applications. Knowledge Engineering Review, 11(2), 93-136.
Vdovjak, R., & Houben, G. (2001). RDF-Based Architecture for Semantic Integration of Heterogeneous Information Sources. Proceedings of the Workshop on Information Integration on the Web 2001, 51-57.
Visser, U., Stuckenschmidt, H., & Wache, H. (2003). Ontology-based Information Integration. IJCAI-Tutorial SP5. http://www.cs.vu.nl/~heiner/IJCAI-03/Tutorial (Date Accessed: January 7, 2004)
Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., & Hübner, S. (2001). Ontology-Based Integration of Information-A Survey of Existing Approaches. Proceedings of the IJCAI-01 Workshop: Ontologies and Information Sharing.
Wiederhold, G. (1993). Intelligent Integration of Information. ACM SIGMOD Conference on Management of data, 434-437.
Internet References
TSIMMIS: http://www-db.stanford.edu/tsimmis/tsimmis.html
DISCO: http://www-caravel.inria.fr/Eprototype_Disco.html
Garlic: http://www.almaden.ibm.com/cs/garlic/
MIX: http://www.npaci.edu/DICE/mix-system.html
Agora: http://www-rocq.inria.fr/~manolesc/AGORA/index.html
OBSERVER: http://sol1.cps.unizar.es:5080/OBSERVER/
ONTOBROKER: http://ontobroker.aifb.uni-karlsruhe.de/index_ob.html
HERA: http://wwwis.win.tue.nl/~hera/
W3C: http://www.w3.org
XQuery: http://www.w3.org/XML/Query
XML Schema: http://www.w3.org/XML/Schema
RDF: http://www.w3.org/RDF/
OWL: http://www.w3.org/2001/sw/WebOnt/
Jena: http://jena.sourceforge.net/
Protégé: http://protege.stanford.edu
APA Style Essentials: http://www.vanguard.edu/faculty/ddegelman/index.cfm?doc_id=796#title
http://www-ksl.stanford.edu/kst/what-is-an-ontology.html