| Graduate Student: | 吳秉勳 (David Wu) |
|---|---|
| Thesis Title: | Detection of Outliers with Data Transformation (變數轉換之離群值偵測) |
| Advisor: | 鄭宗記 |
| Degree: | Master |
| Department: | Department of Statistics, College of Commerce |
| Year of Publication: | 2001 |
| Academic Year of Graduation: | 89 (ROC calendar) |
| Language: | English |
| Number of Pages: | 89 |
| Keywords: | Breakdown Point, Least Median Squares (LMS) Estimator, Masking Effect, Minimum Volume Ellipsoid (MVE) Estimator, Mahalanobis Distance, Score Statistic, Stalactite Plot, Forward Search Algorithm |
In regression analysis, detecting outliers becomes very difficult when the data contain many of them. In this situation, classical residual analysis cannot correctly determine whether they are present, a phenomenon known as the masking effect. To avoid this effect, we use the least median squares (LMS) robust regression estimator, which attains the maximal breakdown point of 50%, to correctly identify these clustered outliers. In this thesis, the algorithm used to compute the LMS estimator is the forward search algorithm. The results show that the robust regression estimates obtained from this algorithm detect the clustered outliers in the data quickly and efficiently; further results show that only 100 forward searches from randomly selected starting subsets are needed to obtain approximately robust estimates and to identify those clustered outliers correctly. Finally, we use the stalactite plot to display all detected outliers.
With multivariate data, the Mahalanobis distance suffers from the same masking effect. This problem can likewise be resolved by adopting another highly robust estimator, the minimum volume ellipsoid (MVE) estimator, which also attains the maximal breakdown point of 50%. Here, too, we compute this estimator with the forward search and use the stalactite plot to display all detected outliers.
The second part of this thesis uses variable transformation to make the residuals of the regression data approximately normal and to strengthen the homogeneity of their variance, so as to facilitate subsequent analysis. As the forward search proceeds, we monitor the score statistic and other related diagnostic statistics. The results show that these statistics jointly provide rich information for selecting the transformation parameter, and the progress of the forward search also reveals the influence of certain outliers on that selection.
Detecting regression outliers is not trivial when there are many of them: classical residual-based diagnostics sometimes fail to reveal them, a phenomenon known as the masking effect. To avoid this, we detect these multiple outliers with a highly robust regression estimator, the least median squares (LMS) estimator, which attains the maximal breakdown point of 50%. The algorithm used to compute the LMS estimator is the forward search algorithm. The estimator found by the forward search is shown to lead to rapid detection of multiple outliers; moreover, 100 repeats of a simple forward search from random starting subsets are shown to provide parameter estimates robust enough to reveal them. Finally, the detected outliers are exhibited in the stalactite plot, which displays a remarkably stable pattern.
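The forward search for the LMS fit can be illustrated with a short sketch. The Python fragment below is only an illustration of the idea, not the implementation used in the thesis: it repeats the search from random starting subsets (100 by default, as in the abstract), refits ordinary least squares as the subset grows by the observations closest to the current fit, and keeps the coefficients with the smallest median squared residual. The name `lms_forward_search` and its defaults are hypothetical.

```python
import numpy as np

def lms_forward_search(X, y, n_starts=100, rng=None):
    """Illustrative forward search for an approximate LMS regression fit.

    From each random starting subset of p + 1 observations, least squares
    is refitted as the subset grows one observation at a time (always the
    observations currently closest to the fit).  The coefficients whose
    median squared residual is smallest over all steps and starts are
    returned as the approximate LMS estimate.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        for m in range(p + 1, n + 1):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            resid2 = (y - X @ beta) ** 2
            crit = np.median(resid2)                 # the LMS objective
            if crit < best_crit:
                best_beta, best_crit = beta.copy(), crit
            if m < n:                                # grow the subset by one
                subset = np.argsort(resid2)[:m + 1]
    return best_beta, best_crit
```

Observations with large residuals from the returned fit are the candidate multiple outliers that a stalactite plot would then display across subset sizes.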
For multivariate data, the Mahalanobis distance suffers from the same masking effect, which can be remedied by another highly robust estimator, the minimum volume ellipsoid (MVE) estimator. It also attains the maximal breakdown point of 50% and can likewise be computed with the forward search algorithm. The detected outliers are then displayed in the stalactite plot.
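A comparable sketch, again only illustrative and not the thesis's exact procedure, shows how a forward search can yield robust Mahalanobis distances in the spirit of the MVE: a small random subset is grown by the observations closest to its mean and covariance, and the distances at the half-sample size flag masked multivariate outliers. The function name `forward_mahalanobis` is hypothetical, and the MVE volume criterion itself is not reproduced.

```python
import numpy as np

def forward_mahalanobis(X, rng=None):
    """Illustrative forward search for robust Mahalanobis distances.

    A small random subset is grown one observation at a time; at each step
    the subset mean and covariance give Mahalanobis distances for every
    observation, and the next subset consists of the closest points.  The
    distances computed when the subset reaches the half-sample size used
    by the MVE are returned, so masked multivariate outliers appear as
    large distances.  Assumes n is comfortably larger than p.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    h = (n + p + 1) // 2                        # half-sample size of the MVE
    subset = rng.choice(n, size=p + 1, replace=False)
    for m in range(p + 1, h + 1):
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False)
        diff = X - mu
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(S), diff)
        subset = np.argsort(d2)[:m + 1]         # grow the subset by one
    return d2                                   # squared distances at subset size h
```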
The second part of this dissertation transforms the regression data so that approximate normality and homogeneity of variance of the residuals are achieved. During the forward search we monitor the score statistic together with other diagnostic plots; jointly they provide a wealth of information about the choice of transformation parameter and about the effect of individual observations on this statistic.
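As a rough illustration of the monitored quantity, the sketch below follows the standard constructed-variable (added-variable) formulation of the approximate score test for the Box-Cox transformation: the normalized transform z(λ) is regressed on the design matrix augmented with the constructed variable w(λ), and the t-statistic of w(λ), with its sign reversed, approximates the score test for the transformation parameter λ. This is an assumption about the form used; the derivation in Chapter Three may differ in detail, and `approx_score_statistic` is a hypothetical name. Recomputing this statistic on the current subset at each step of the forward search shows how individual observations affect the choice of λ.

```python
import numpy as np

def approx_score_statistic(X, y, lam=1.0):
    """Illustrative approximate score test for a Box-Cox transformation.

    The normalized Box-Cox transform z(lam) is regressed on X augmented
    with the constructed variable w(lam); the t-statistic of w(lam), with
    its sign reversed, approximates the score test of the hypothesis that
    lam is the correct transformation parameter.  X is assumed to contain
    an intercept column and y to be strictly positive.
    """
    n = len(y)
    g = np.exp(np.mean(np.log(y)))                       # geometric mean of y
    if abs(lam) > 1e-8:
        z = (y ** lam - 1.0) / (lam * g ** (lam - 1.0))
        w = y ** lam * np.log(y) / (lam * g ** (lam - 1.0)) - z * (1.0 / lam + np.log(g))
    else:                                                # the log-transformation limit
        z = g * np.log(y)
        w = g * np.log(y) * (0.5 * np.log(y) - np.log(g))
    A = np.column_stack([X, w])                          # added-variable design matrix
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    s2 = resid @ resid / (n - A.shape[1])                # residual variance estimate
    cov_ww = s2 * np.linalg.pinv(A.T @ A)[-1, -1]
    return -beta[-1] / np.sqrt(cov_ww)                   # approximate score statistic
```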
Cover Page
Certification
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter One Introduction
1.1 Research Motivation
1.2 Research Purposes
1.3 Dissertation Structure
1.4 Literature Review
Chapter Two Forward Search Theory
2.1 Outliers, LMS and MVE Estimators
2.1.1 Leverage Points and Outliers
2.1.2 Least Median Squares (LMS) Estimator
2.1.3 Minimum Volume Ellipsoid (MVE) Estimator
2.2 The Motivation of the Forward Search
2.3 Introduction to the Forward Search Algorithm
2.3.1 General Principles
2.3.2 The Forward Search in Search of the LMS Estimator
2.3.3 The Forward Search in Search of the MVE Estimator
2.4 Stalactite Plots
2.5 Examples
2.5.1 Rousseeuw Data
2.5.2 Hawkins-Bradu-Kass Data
Chapter Three Data Transformations
3.1 Importance of Normality
3.2 Transformations in Regression
3.3 Score Statistic for Transformation
3.3.1 Added Variable Plot
3.3.2 The Derivation of Score Statistic by Added variable
3.4 Examples
3.4.1 Stack Loss Data
Chapter Four Empirical Data Analysis
4.1 Data Illustration and Outlier Detection
4.2 Data Transformation to Improve the Model
Chapter Five Conclusions and Suggestions
5.1 Research Discoveries
5.2 Significance of the Forward Search Algorithm
5.3 Future Study
Appendix
Appendix A Datasets
Appendix B Terminologies
Appendix C Future Study
References