Graduate student: 張椀荑 (Chang, Wan-Yi)
Thesis title: Predictors-Assisted Graphical Models with Error-Prone Variables (具有測量誤差及輔助解釋變數下的圖模型)
Advisor: 陳立榜 (Chen, Li-Pang)
Oral defense committee: 周珮婷 (Chou, Pei-Ting); 張欣民 (Chang, Hsing-Ming)
Degree: Master
Department: College of Commerce - Department of Statistics
Publication year: 2025
Graduation academic year: 113
Language: English
Number of pages: 52
Chinese keywords: 圖形結構, 測量誤差, 多重反應, 變數選擇
English keywords: Graphical structure, Measurement errors, Multiple response, Variable selection
  • In this study, we consider two sets of multivariate variables, denoted Y and X, and aim to build a regression model that characterizes the relationship between them. Although multivariate regression is an intuitive strategy that has been studied extensively, several crucial issues in the data have not been fully addressed, including measurement error in both Y and X, selection of the informative variables in X, and detection of the network structure in Y. Moreover, most existing methods are restricted to linear models and continuous Y. To tackle these challenges and provide reliable estimators, we first propose a general nonparametric approach to model Y and X, in which Y is allowed to be a random vector with a mixed distribution. We combine the random forest method with regression calibration to handle measurement error and to detect the informative variables in X and the network structure in Y. In the special case where Y and X are described by a multivariate linear model, we develop two error-correction strategies that address measurement error, variable selection, and network detection simultaneously. Finally, numerical results from simulation studies and a real data analysis verify the validity of the proposed methods.
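
The abstract names regression calibration as the device for handling measurement error. The following is a minimal sketch of that idea only, not the thesis's method: it assumes a classical additive error model W = X + U with a known error covariance `Sigma_u`, and it substitutes ordinary least squares for the random forest step to keep the example short. The function names `regression_calibration` and `ols`, and all simulation settings, are hypothetical choices made for illustration.

```python
import numpy as np

def regression_calibration(W, Sigma_u):
    """Best linear approximation of E[X | W] under W = X + U (assumed model)."""
    mu_w = W.mean(axis=0)
    Sigma_w = np.cov(W, rowvar=False)
    Sigma_x = Sigma_w - Sigma_u              # moment estimate of Cov(X)
    Lam = Sigma_x @ np.linalg.inv(Sigma_w)   # reliability matrix
    return mu_w + (W - mu_w) @ Lam.T

def ols(A, y):
    """Least-squares slopes of y on A (intercept fitted, then dropped)."""
    A1 = np.column_stack([np.ones(len(A)), A])
    return np.linalg.lstsq(A1, y, rcond=None)[0][1:]

# Simulated data: true covariates X, error-prone surrogates W.
rng = np.random.default_rng(0)
n, p = 20000, 3
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
Sigma_u = 0.5 * np.eye(p)                    # assumed known error covariance
W = X + rng.multivariate_normal(np.zeros(p), Sigma_u, size=n)
y = X @ beta + rng.normal(scale=0.1, size=n)

beta_naive = ols(W, y)                                # ignores measurement error
beta_rc = ols(regression_calibration(W, Sigma_u), y)  # uses calibrated covariates
print("naive:", beta_naive)
print("calibrated:", beta_rc)
```

In this setup the naive fit shrinks the slopes toward zero, while the fit on the calibrated covariates lands much closer to `beta`. In practice the error covariance is rarely known and would have to be estimated, for example from replicate measurements.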

    Abstract (Chinese)
    Abstract
    Table of Contents
    Tables
    Figures
    Chapter 1 Introduction
    Chapter 2 Notation and Models
    2.1 Data Structure and Regression Models
    2.2 Measurement Error Models
    Chapter 3 Estimation for Nonlinear Models
    Chapter 4 Estimation for Linear Regression Models
    4.1 Estimation via Gaussian Maximum Likelihood
    4.2 Estimation via Conditional Likelihood Function
    4.3 Model Averaging
    Chapter 5 PAGE: R Package Implementation
    5.1 NP_Gragh
    5.2 Joint_Gaussian
    5.3 Cond_Gaussian
    Chapter 6 Numerical Studies
    6.1 Simulation Setup
    6.1.1 Data Generation Under Nonlinear Model (1)
    6.1.2 Data Generation Under Linear Model (2)
    6.2 Simulation Results
    6.3 Simulation of Model Misspecification
    Chapter 7 Real Data Analysis
    Chapter 8 Summary
    References

    Breiman, L. and Friedman, J. H. (1997). Predicting multivariate responses in multiple linear regression. Journal of the Royal Statistical Society: Series B, 59, 3–54.
    Buckley, J. J., Feuring, T., and Hayashi, Y. (1999). Multivariate nonlinear fuzzy regression: An evolutionary algorithm approach. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 7, 83–98.
    Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models. Chapman and Hall/CRC Press, New York.
    Chen, L.-P. (2023). De-noising boosting methods for variable selection and estimation subject to error-prone variables. Statistics and Computing, 33:38.
    Chen, L.-P. (2024). Estimation of graphical models: An overview of selected topics. International Statistical Review, 92(2), 194–245.
    Chen, L.-P. and Tsao, H.-S. (2024). GUEST: An R package for handling estimation of graphical structure and multi-classification for error-prone gene expression data. Bioinformatics, 40(12), btae731.
    Chen, L.-P. and Yi, G. Y. (2022). De-noising analysis of noisy data under mixed graphical models. Electronic Journal of Statistics, 16, 3861-3909.
    Chun, H. and Keleş, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1), 3–25.
    Cui, J. and Yi, G. Y. (2024). Variable selection in multivariate regression models with measurement error in covariates. Journal of Multivariate Analysis, 202, 105299.
    Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 9(3), 432-441.
    Gan, L., Narisetty, N. N., and Liang, F. (2022). Bayesian estimation of Gaussian conditional random fields. Statistica Sinica, 32, 131-152.
    Hastie, T., Tibshirani, R., and Friedman, J. H. (2008). The Elements of Statistical Learning. Springer, New York.
    He, D., Zhou, Y., and Zou, H. (2021). On sure screening with multiple responses. Statistica Sinica, 31, 1749–1777.
    Holmes, C. C. and Mallick, B. K. (2003). Generalized nonlinear modeling with multivariate free-knot regression splines. Journal of the American Statistical Association, 98(462), 352–368.
    Huang, X. and Zhang, H. (2013). Variable selection in linear measurement error models via penalized score functions. Journal of Statistical Planning and Inference, 143(12), 2101-2111.
    Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thomson, C. M., Simon, I., Zeitlinger, J., Jennings, E. G., Murray, H. L., Gordon, D. B., Ren, B., Wyrick, J. J., Tagne, J. B., Volkert, T. L., Fraenkel, E., Gifford, D. K., and Young, R. A. (2002). Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298(5594), 799-804.
    Lee, W. and Liu, Y. (2012). Simultaneous multiple response regression and inverse covariance matrix estimation via penalized Gaussian maximum likelihood. Journal of Multivariate Analysis, 111, 241-255.
    Li, R., Zhong, W., and Zhu, P. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.
    Liang, H. and Li, R. (2009). Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association, 104(485), 234–248.
    Liu, H. and Zhang, X. (2023). Frequentist model averaging for undirected Gaussian graphical models. Biometrics, 79(3), 2050–2062.
    Negi, A. and Negi, D. S. (2022). Difference-in-differences with a misclassified treatment. arXiv:2208.02412
    Niu, Y., Guha, N., De, D., Bhadra, A., Baladandayuthapani, V., and Mallick, B. K. (2021). Bayesian variable selection in multivariate nonlinear regression with graph structures. arXiv:2010.14638v2
    Rencher, A. C. (2002). Methods of Multivariate Analysis. John Wiley & Sons, New York.
    Rothman, A., Levina, E. and Zhu, J. (2010). Sparse multivariate regression with covariance estimation. Journal of Computational and Graphical Statistics, 19, 947-962.
    Shedden, K. and Cooper, S. (2002). Analysis of cell-cycle gene expression in Saccharomyces cerevisiae using microarrays and multiple synchronization methods. Nucleic Acids Research, 30(13), 2920-2929.
    Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9(12), 3273-3297.
    Su, Q., Liao, X., Chen, C., and Carin, L. (2016). Nonlinear statistical learning with truncated Gaussian graphical models. arXiv:1606.09006
    Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
    Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
    Timm, N. H. (2002). Applied Multivariate Analysis. Springer, New York.
    Wang, J. (2015). Joint estimation of sparse multivariate regression and conditional graphical models. Statistica Sinica, 25, 831-851.
    Xu, H. and Zhang, C. (2024). Nonlinear multivariate function-on-function regression with variable selection. arXiv:2406.19021
    Yin, J. and Li, H. (2011). A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Annals of Applied Statistics, 5, 2630-2650.
    Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429.
    Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

    Full text public release date: 2030/07/07