| Author: | 許斯淵 Hsu, Szu-Yuan |
|---|---|
| Thesis title: | Effects of Collinearity on Suppression and Collapsibility in Multiple Linear Regression (迴歸分析中共線性於Suppression與Collapsibility之效果探討) |
| Advisor: | 江振東 |
| Oral examination committee: | 江振東, 薛慧敏, 王鴻龍, 陳珍信, 張源俊 |
| Degree: | Doctoral |
| Department: | College of Commerce, Department of Statistics |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar) |
| Language: | English |
| Number of pages: | 51 |
| Keywords: | Collinearity, Correlation coefficient, Regression coefficient, R-square (coefficient of determination), t-statistics |
| DOI URL: | http://doi.org/10.6814/NCCU201900647 |
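Linear regression is a frequently used statistical method for exploring the relationship between a continuous response variable and one or more explanatory variables. When an additional explanatory variable is added to a model, researchers usually focus on regression results such as the behavior of the estimated regression coefficients and their t-statistics, and the increase in the coefficient of determination (R-square). These results, however, depend on the collinearity between the newly added explanatory variable and the explanatory variables already in the model. This study investigates how collinearity affects the behavior of the estimated regression coefficients, their t-statistics, and the coefficient of determination. We find that, when an additional explanatory variable is added to the model, the regression results of the new model can be fully interpreted through three correlation coefficients and the R-square of the original model. This information can therefore be used to anticipate the regression results under the new model. In addition, we treat the added explanatory variable as the explanatory variable of interest and the variables already in the model as covariates. Using a similar approach, we investigate the effect of collinearity on collapsibility. Collapsibility means that the relationship between the explanatory variable of interest and the response variable is unaffected by whether the covariates are included in the model. Overall, we find that collinearity in a linear regression model does not necessarily have adverse effects on the regression results. Therefore, when collinearity exists among the explanatory variables, the decision to remove a variable from the model should be made with care.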
Linear regression is a statistical method that allows researchers to summarize and study the relationship between a response and one or more predictor variables. When adding a predictor to a model, we are most interested in its estimated regression coefficient, the corresponding t-statistic, and the resulting increase in R-square. One apparent issue that might affect these results is the collinearity between the added predictor and those already in the model. In this study, we investigate the behavior patterns of the estimated regression coefficient, the corresponding t-statistic, and R-square as the collinearity varies. We argue that all of the above-mentioned statistics are functions of three correlation coefficients and an R-square. We also provide summary tables that can be used to anticipate the behavior of these statistics. In addition, we treat the added predictor as the predictor of interest and the predictors already in the model as covariates. This lets us apply similar techniques to the impact of collinearity on collapsibility, that is, whether the relationship between the response and the predictor of interest remains the same if the covariates are dropped from the model. Overall, we found that collinearity in a linear regression model does not necessarily have the ill effects commonly assumed. We urge researchers to think twice before dropping a collinear predictor from further model consideration.
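The claim that the coefficient, its t-statistic and R-square are determined by a few correlations can be checked in the simplest setting. The sketch below does not reproduce the thesis's general-case formulas. It uses the textbook two-predictor case with standardized coefficients. Here the original model y ~ x1 has R-square equal to r_y1², so everything depends only on the three pairwise correlations r_y1, r_y2 and r_12, plus the sample size n for the t-statistic.

We assume that the thesis's x̂ is the least-squares fit of x on the predictors already in the model, as its notation suggests. If so, $r_{x,\hat{x}}$ reduces to |r_12| here; that reading is an assumption. The function names `pearson` and `two_predictor_stats` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def two_predictor_stats(r_y1, r_y2, r_12, n):
    """OLS fit of y ~ x1 + x2 (with intercept), x2 being the added predictor,
    computed only from the three pairwise correlations and n.

    Returns (b1, b2, r2, t2): standardized slopes of x1 and x2, the R-square
    of the two-predictor model, and the t-statistic of x2.
    The original one-predictor model y ~ x1 has R-square r_y1**2.
    """
    # Determinant of the correlation matrix of (y, x1, x2). It equals
    # (1 - r_12**2) * (1 - R**2), so requiring it to be positive rules out
    # an invalid correlation triple as well as a perfect fit.
    det = 1 - r_y1**2 - r_y2**2 - r_12**2 + 2 * r_y1 * r_y2 * r_12
    if abs(r_12) >= 1 or det <= 0:
        raise ValueError("correlations are invalid or degenerate")
    if n <= 3:
        raise ValueError("need n > 3 for residual degrees of freedom")
    denom = 1 - r_12**2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    r2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / denom
    # t-statistic of x2 on n - 3 residual degrees of freedom; it is the same
    # whether or not the variables are standardized.
    t2 = b2 * math.sqrt((n - 3) * denom / (1 - r2))
    return b1, b2, r2, t2


# Classical suppression: x2 is uncorrelated with y but correlated with x1.
b1, b2, r2, t2 = two_predictor_stats(r_y1=0.5, r_y2=0.0, r_12=0.6, n=100)
print(round(r2, 3))  # 0.391, larger than r_y1**2 + r_y2**2 = 0.25
```

In the example, x2 has zero correlation with y, yet it receives b2 ≈ -0.47 and raises R-square from 0.25 to about 0.39. This is a case where a collinear predictor helps rather than hurts. The raw (unstandardized) coefficient of x2 is b2 multiplied by s_y / s_x2; R-square and the t-statistic do not change.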
Contents
1. Introduction
2. Effects of collinearity on suppression and enhancement in two-predictor case
3. Effects of collinearity on suppression and enhancement in general cases
3.1 Working formulas of $b$, $\mathrm{se}(b)$, $t$, and R-square
3.2 Behavior pattern of $b$ as a function of $r_{x,\hat{x}}$
3.3 Behavior pattern of $t$ as a function of $r_{x,\hat{x}}$
3.4 Behavior pattern of $R_{yU}^2$ as a function of $r_{x,\hat{x}}$
4. Effects of collinearity on collapsibility in multiple linear regression
4.1 Working formulas of $\hat{d}=\hat{\beta}^*-\hat{\beta}$, $\mathrm{se}(\hat{d})$ and $t(\hat{d})$
4.2 Behavior pattern of $\hat{d}$ as a function of $r_{x,\hat{x}}$
4.3 Behavior pattern of $t(\hat{d})$ as a function of $r_{x,\hat{x}}$
4.4 Relationship between suppression and collapsibility
5. Illustrating examples
6. Conclusions and discussions
References
Appendix
A.1 Derivations of estimated regression coefficients and R-squares
A.2 Working formulas of $R_{yU}^2$ and $b$ when $X=x$
A.3 Working formulas of $R_{yU}^2$ and $b$ when $Z_1=z_1$ and $X=x$
A.4 Situations where $t^2>t_0^2$
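Section 4 studies the gap $\hat{d}=\hat{\beta}^*-\hat{\beta}$ between two estimates of the effect of the predictor of interest. Its general-case formulas are not reproduced here. As a minimal sketch, the code below covers only the two-predictor standardized case, with x2 as the predictor of interest and x1 as the covariate.

It assumes that $\hat{\beta}^*$ is the coefficient with the covariate dropped and $\hat{\beta}$ the coefficient with the covariate kept; that assignment is our assumption. The comment about d = r_12 · b1 states the standard omitted-variable identity, not a formula quoted from the thesis. The helper name `collapsibility_gap` is hypothetical.

```python
def collapsibility_gap(r_y1, r_y2, r_12):
    """Two-predictor, standardized case: x2 is the predictor of interest and
    x1 the covariate (hypothetical helper, not the thesis's notation).

    Returns (beta_reduced, beta_full, d) with d = beta_reduced - beta_full,
    where beta_reduced is x2's slope with x1 dropped (assumed to play the
    role of beta*) and beta_full is x2's slope with x1 kept in the model.
    """
    det = 1 - r_y1**2 - r_y2**2 - r_12**2 + 2 * r_y1 * r_y2 * r_12
    if abs(r_12) >= 1 or det <= 0:
        raise ValueError("correlations are invalid or degenerate")
    beta_reduced = r_y2  # standardized slope of y on x2 alone
    beta_full = (r_y2 - r_y1 * r_12) / (1 - r_12**2)
    d = beta_reduced - beta_full
    # Algebraically d == r_12 * b1, where b1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
    # is the covariate's full-model slope. The gap vanishes exactly when
    # r_12 == 0 or b1 == 0, so a large r_12 alone does not break collapsibility.
    return beta_reduced, beta_full, d


# Strong collinearity (r_12 = 0.9), but b1 = 0 because r_y1 = r_y2 * r_12:
print(collapsibility_gap(r_y1=0.45, r_y2=0.5, r_12=0.9))  # d is 0 up to rounding
# Mild collinearity (r_12 = 0.2), but b1 != 0: d = 0.2 * 0.5625 = 0.1125
print(collapsibility_gap(r_y1=0.6, r_y2=0.3, r_12=0.2))
```

In this special case, whether the coefficient collapses depends on the product of the collinearity and the covariate's own partial effect, not on the collinearity alone.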