跳到主要內容

簡易檢索 / 詳目顯示

研究生: 鄧雅文
Teng, Ya Wen
論文名稱: 針對複合式競賽挑選最佳球員組合的方法
Selecting the best group of players for a composite competition
指導教授: 陳良弼
Chen, Arbee L. P.
學位類別: 碩士
Master
系所名稱: 理學院 - 資訊科學系
論文出版年: 2010
畢業學年度: 98
語文別: 英文
論文頁數: 48
中文關鍵詞: Top-k查詢不確定資料Best-kGROUP查詢
外文關鍵詞: Top-k query, uncertain data, Best-kGROUP query, skyline kGROUP
相關次數: 點閱:84下載:20
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 在資料庫的處理中,top-k查詢幫助使用者從龐大的資料中萃取出具有價值的物件,它將資料庫中的物件依照給分公式給分後,選擇出分數最高的前k個回傳給使用者。然而在多數的情況下,一個物件也許不只有一個分數,要如何在多個分數中仍然選擇出整體最高分的前k個物件,便成為一個新的問題。在本研究中,我們將這樣的物件用不確定資料來表示,而每個物件的不確定性則是其帶有機率的分數以表示此分數出現的可能性,並提出一個新的問題:Best-kGROUP查詢。在此我們將情況模擬為一個複合式競賽,其中有多個子項目,每個項目的參賽人數各異,且最多需要k個人參賽;我們希望能針對此複合式競賽挑選出最佳的k個球員組合。當我們定義一個較佳的組合為其在較多項目居首位的機率比另一組合高,而最佳的組合則是沒有比它更佳的組合。為了加快挑選的速度,我們利用動態規劃的方式與篩選的演算法,將不可能的組合先剔除;所剩的組合則是具有天際線特質的組合,在這些天際線組合中,我們可以輕易的找出最佳的組合。此外,在實驗中,對於在所有球員中挑選最佳的組合,Best-kGROUP查詢也有非常優異的表現。


    In a large database, top-k query is an important mechanism to retrieve the most valuable information for the users. It ranks data objects with a ranking function and reports the k objects with the highest scores. However, when an object has multiple scores, how to rank objects without information loss becomes challenging. In this paper, we model the object with multiple scores as an uncertain data object and the uncertainty of the object as a distribution of the scores, and consider a novel problem named Best-kGROUP query. Imagine the following scenario. Assume there is a composite competition consisting of several games each of which requires a distinct number of players. Suppose the largest number is k, and we want to select the best group of k players from all the players for the competition. A group x is considered better than another group y if x has higher aggregated probability to be the top ones in more games than y. In order to speed up the selection process, the groups worse than another group definitely should first be discarded. We identify these groups using a dynamic programming based approach and a filtering algorithm. The remaining groups with the property that none of them have higher aggregated probability to be the top ones for all games against the other groups are called skyline groups. From these skyline groups, we can easily compare them to select the best group for the composite competition. The experiments show that our approach outperforms the other approaches in selecting the best group to defeat the other groups in the composite competitions.

    1 INTRODUCTION 1
    2 RELATED WORK 7
    2.1 Top-k Queries 7
    2.2 Uncertain Data 8
    2.2.1 Uncertain Top-k Queries 8
    2.2.2 Uncertain Top-k Queries on Data Streams 14
    2.2.3 Uncertain Nearest Neighbor Queries 14
    2.3 Skyline 15
    3 METHODOLOGY 17
    3.1 Problem Definition 17
    3.2 The Basic Algorithms 20
    3.3 The Heuristic Approaches 28
    4 EXPERIMENTS 29
    4.1 Experiment Setup 29
    4.2 On Execution Time 32
    4.3 On Accuracy 34
    4.4 On Performance of the Composite Competition 35
    5 CONCLUSIONS AND FUTURE WORK 37
    6 REFERENCES 40

    [1] I. F. Ilyas, G. Beskales, and M. A. Soliman. 2008. A Survey of Top-k Query Processing Techniques in Relational Database System. ACM Computing Surveys, Vol. 40, No. 4, pp. 11:1-11:58.
    [2] C. C. Aggarwal and P. S. Yu. 2009. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 609-623.
    [3] M. Soliman, I. Ilyas, and K. Chang. 2007. Top-k Query Processing in Uncertain Databases. In Proceedings of the 23rd International Conference on Data Engineering (ICDE), pp. 896-905.
    [4] M. Hua, J. Pei, W. Zhang, and X. Lin. 2008. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 1403-1405.
    [5] T. Ge, S. Zdonik, and S. Madden. 2009. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In Proceedings of the 35th International Conference on Management of Data (SIGMOD), pp. 375-388.
    [6] C. Jin, K. Yi, L. Chen, J. X. Yu, and X. Lin. 2008. Sliding-Window Top-k Queries on Uncertain Streams. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 301-312.
    [7] G. Beskales, M. A. Soliman, and I. F. Ilyas. 2008. Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases. In Proceedings of the VLDB Endowment, Vol. 1, No. 1, pp. 326-339.
    [8] M. L. Yiu and N. Mamoulis. 2007. Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data. In Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 483-494.
    [9] C.-Y. Chan, H. V. Jagadish, K.-L. Tan, A. K. H. Tung, and Z. Zhang. 2006. Finding k-Dominant Skylines in High Dimensional Space. In Proceedings of the 32nd International Conference on Management of Data (SIGMOD), pp. 503-514.

    QR CODE
    :::