| Graduate student: | 李晏 Lee, Yen |
|---|---|
| Thesis title: | 非常態間斷隨機變數的產生 Generation of non-normal approximated discrete random variables |
| Advisor: | 鄭中平 |
| Degree: | Master |
| Department: | College of Science - Department of Psychology |
| Year of publication: | 2010 |
| Academic year of graduation: | 98 (ROC calendar) |
| Language: | English |
| Pages: | 103 |
| Keywords (Chinese): | 最大資訊熵、非常態分配、間斷變數(間斷分配)、強韌性研究 |
| Keywords (English): | Maximum Entropy, Non-normality, Discrete Variables (Distributions), Robustness |
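When data are analyzed with parametric tests, the normality assumption usually has to be met, yet data obtained in practice are rarely normal. Robustness research, which studies how violating the normality assumption affects test statistics, is therefore an important topic in applied statistics. Such research commonly uses the Monte Carlo method to generate non-normal data for further study. Although several methods for generating non-normal continuous data have been proposed, the data in psychological research are mostly discrete. Generating non-normal discrete data poses two difficulties: it is hard to produce a discrete distribution with specified parameters, and infinitely many discrete distributions share the same parameters. To address both difficulties, this study proposes using a maximum entropy procedure to estimate a univariate discrete distribution that satisfies the specified parameters, and using that distribution to generate the corresponding univariate discrete data. The discrete maximum entropy distribution estimated by this method is the distribution that occurs most often among those satisfying the specified parameters; it is also smooth and assigns zero probability to a point only when necessary. This study presents the maximum entropy methods with 4 specified parameters (mean, variance, skewness, and kurtosis) and with 2 specified parameters (skewness and kurtosis), together with the corresponding R packages, and uses the R packages to examine and evaluate the two methods. The results show that, when the error between each specified and estimated parameter is required not to exceed .001, both methods can estimate distributions for the feasible combinations of specified parameters, which indicates that both can accurately produce discrete distributions with specified parameters. Given the number of points and the specified parameters, the R packages produce the discrete distribution; given a specified number of samples and sample size, they can also draw samples from it. This makes both methods easier to use in Monte Carlo robustness research on discrete data.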
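As a reading aid, the optimization problem behind a four-parameter maximum entropy procedure of this kind can be written out as below. The notation ($K$ points $x_i$ with probabilities $p_i$, multipliers $\lambda_k$) is assumed here and is not taken from the thesis, and kurtosis is assumed to mean excess kurtosis $\gamma_2$, which the thesis may define differently.

$$
\max_{p_1,\dots,p_K}\; -\sum_{i=1}^{K} p_i \ln p_i
\quad\text{subject to}\quad
\sum_{i=1}^{K} p_i = 1,\qquad
\sum_{i=1}^{K} p_i z_i^{k} = m_k \;\; (k = 1,\dots,4),
$$

$$
z_i = \frac{x_i - \mu}{\sigma},\qquad
(m_1, m_2, m_3, m_4) = (0,\; 1,\; \gamma_1,\; \gamma_2 + 3).
$$

Because every constraint is linear in the $p_i$, the Lagrangian conditions give a solution of exponential form,

$$
p_i = \frac{\exp\!\big(\sum_{k=1}^{4} \lambda_k z_i^{k}\big)}{\sum_{j=1}^{K} \exp\!\big(\sum_{k=1}^{4} \lambda_k z_j^{k}\big)},
$$

so estimating the distribution reduces to finding the four multipliers $\lambda_1,\dots,\lambda_4$ that reproduce the specified moments.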
Robustness research on the normality assumption that uses the Monte Carlo method needs a procedure for simulating non-normal data. Several procedures for simulating non-normal continuous data have been proposed, but the data met most often in practice are discrete data from ordered categorical variables (e.g., Likert-type scales). Simulating discrete data raises two difficulties: estimating the discrete probability distribution precisely, and choosing one among the infinitely many discrete probability distributions that satisfy the same constraints. This research therefore proposes the Maximum Entropy Procedure (MEP), which simulates the univariate discrete maximum entropy distribution with the specified parameters. Among the distributions with the specified parameters, this distribution can be realized in the greatest number of ways, is the least likely to assign zero probability to a point, and is the smoothest. These characteristics make the MEP a reasonable choice for simulating univariate discrete data with specified parameters. The MEP-4 (constraints on mean, variance, skewness, and kurtosis), the MEP-2 (constraints on skewness and kurtosis), and the corresponding R packages, which estimate univariate discrete distributions with the specified parameters, are presented, evaluated, and discussed in this research. The results show that the MEP-4 and the MEP-2 can estimate discrete probability distributions precisely for feasible combinations of specified parameters, with every difference between specified and estimated parameters smaller than .001. The R packages make it easy to estimate discrete probability distributions with specified parameters and to generate data from these distributions with a specified number of samples and sample size. The MEP-4 and the MEP-2 can therefore be applied easily through the corresponding R packages to generate discrete data with specified parameters, which makes them useful for Monte Carlo robustness research.
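To make the four-parameter idea concrete, here is a minimal Python sketch. It is not the thesis's R package, whose interface is not shown in this record. The function names `maxent_discrete` and `moments`, the 7-point scale, and the starting distribution `p0` are all hypothetical, and kurtosis is assumed to be excess kurtosis. The sketch fits the maximum entropy probabilities by minimizing the convex dual of the constrained problem with SciPy's BFGS optimizer, then draws Monte Carlo samples from the fitted distribution.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def maxent_discrete(points, mean, var, skew, kurt_excess):
    """Maximum entropy probabilities on `points` matching four moments."""
    x = np.asarray(points, dtype=float)
    z = (x - mean) / np.sqrt(var)                 # standardized support
    feats = np.vstack([z, z**2, z**3, z**4])      # shape (4, n_points)
    # Standardized moment targets; kurtosis is assumed to be excess kurtosis.
    target = np.array([0.0, 1.0, skew, kurt_excess + 3.0])

    def dual(lam):
        # Convex dual objective; its gradient is the gap between
        # the model moments and the target moments.
        logits = lam @ feats
        log_norm = logsumexp(logits)
        p = np.exp(logits - log_norm)
        return log_norm - lam @ target, feats @ p - target

    res = minimize(dual, np.zeros(4), jac=True, method="BFGS",
                   options={"gtol": 1e-8})
    logits = res.x @ feats
    return np.exp(logits - logsumexp(logits))


def moments(points, p):
    """Mean, variance, skewness, and excess kurtosis of a discrete distribution."""
    x = np.asarray(points, dtype=float)
    mean = p @ x
    var = p @ (x - mean) ** 2
    z = (x - mean) / np.sqrt(var)
    return mean, var, p @ z**3, p @ z**4 - 3.0


# Hypothetical 7-point scale. The targets are taken from an arbitrary
# strictly positive distribution so that they are guaranteed to be feasible.
points = np.arange(1, 8)
p0 = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.10])
target = moments(points, p0)

p = maxent_discrete(points, *target)

# Monte Carlo step: 1000 samples of size 50 drawn from the fitted distribution.
rng = np.random.default_rng(0)
samples = rng.choice(points, size=(1000, 50), p=p)
```

Taking the targets from an existing distribution sidesteps the feasibility question that the thesis studies: an arbitrary combination of mean, variance, skewness, and kurtosis may not be attainable on a given set of points, and in that case the optimizer in this sketch would not converge to a valid answer.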
Chapter 1 Introduction
Section 1 The Importance of Generation of Non-Normal Data
Section 2 Previous Researches on Simulating Non-Normal Continuous Data
Section 3 Previous Researches on Simulating Non-Normal Approximated Discrete Data
Section 4 Two Difficulties of Previous Procedure on Simulating Non-Normal Approximated Discrete Data
Section 5 The Research Purpose
Chapter 2 The Characteristics of the Maximum Entropy Procedure (MEP)
Section 1 The Definition of the Maximum Entropy Procedure
Section 2 The Rationale of Choosing the Maximum Entropy Distributions
Section 3 The Solutions of the Maximum Entropy Procedure
Section 4 The Maximum Entropy Procedures Proposed in this Research
Chapter 3 The Maximum Entropy Procedure with 4 Parameters (MEP-4)
Section 1 The Solution of the MEP-4
Section 2 The Details of the R Package for the MEP-4
Chapter 4 Evaluation of the Maximum Entropy Procedures with 4 Parameters
Section 1 Research Design of the Study 1
Section 2 Results of the Study 1
Section 3 Research Design of the Study 2
Section 4 Results of the Study 2
Chapter 5 The Maximum Entropy Procedure with 2 Parameters (MEP-2)
Section 1 The Solution of the MEP-2
Section 2 The Details of the R Package for the MEP-2
Chapter 6 Evaluation of the Maximum Entropy Procedures with 2 Parameters
Section 1 Research Design of the Study 3
Section 2 Results of the Study 3
Chapter 7 General Discussion and Conclusion
Section 1 The Range of the Possible Generated Parameter Space of the MEP-4 and the MEP-2
Section 2 The Shape of the Generated Discrete Probability Distributions of the MEP-4 and the MEP-2
Section 3 The Meaning of Zero Skewness and Zero Kurtosis of the Discrete Probability Distributions
Section 4 The Uniqueness of the Solutions of the MEP-4 and the MEP-2
Section 5 The Maximum Entropy Distributions with Prior Probability Distributions
Section 6 Conclusion
Reference