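| Graduate student: | 呂泓廷 |
|---|---|
| Thesis title: | RNA序列實驗中檢測差異表現基因之統計方法 (Testing for differentially expressed genes with RNA-Seq data) |
| Advisor: | 薛慧敏 |
| Degree: | Master |
| Department: | College of Commerce - Department of Statistics |
| Year of publication: | 2013 |
| Graduation academic year: | 102 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 27 |
| Keywords: | negative binomial distribution, overdispersion, maximum pseudo-likelihood estimation, significance test for differentially expressed genes, RNA-Seq |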
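In recent years, RNA-Seq (RNA sequencing) experiments, which grew out of next-generation sequencing technology, have drawn increasing attention as their cost has fallen. These experiments use high-throughput sequencing to study transcriptomes and measure gene expression as count data. To account for the overdispersion in such data, this study assumes a negative binomial distribution and estimates each gene's mean expression level by maximum pseudo-likelihood estimation. To identify genes that are differentially expressed between two groups of subjects with different phenotypes, we use a Wald test statistic based on these estimators to test the significance of the association between each gene and the phenotype. We validate the proposed method through statistical simulation and then apply it to a real example dataset.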
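As a rough illustration of the kind of test the abstract describes, the following sketch runs a Wald test for one gene under the negative binomial variance model Var(Y) = mu + phi * mu^2. It is not the thesis's procedure: the dispersion `phi` is estimated here with a simple method-of-moments formula averaged over the two groups, standing in for the maximum pseudo-likelihood estimator, and the function names and sample counts are hypothetical.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments estimate of phi in Var = mu + phi * mu^2, floored at 0.

    Assumption: this simple estimator replaces the maximum
    pseudo-likelihood estimator used in the thesis.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max((variance(counts) - mu) / mu ** 2, 0.0)


def wald_test(group_a, group_b):
    """Wald test of H0: log(mu_a) = log(mu_b) for a single gene.

    Returns (z, two_sided_p). The standard error comes from the delta
    method: Var(log(ybar)) is approximately (1 / mu + phi) / n.
    """
    mu_a, mu_b = mean(group_a), mean(group_b)
    if mu_a == 0 or mu_b == 0:
        raise ValueError("both groups need a positive mean count")
    # Common dispersion: plain average of the two per-group estimates.
    phi = 0.5 * (moment_dispersion(group_a) + moment_dispersion(group_b))
    se = math.sqrt((1 / mu_a + phi) / len(group_a)
                   + (1 / mu_b + phi) / len(group_b))
    z = (math.log(mu_b) - math.log(mu_a)) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical read counts for one gene in two phenotype groups.
z, p = wald_test([10, 12, 9, 11, 13, 10], [30, 45, 28, 60, 35, 40])
print(f"z = {z:.2f}, p = {p:.3g}")
```

In a full analysis this test would be repeated for every gene, and the resulting p-values would need a multiple-testing adjustment before genes are declared differentially expressed.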
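Chapter 1. Introduction
Chapter 2. Methods
Section 1. Sequence Data
Section 2. Parameter Estimation
Section 3. Hypothesis Testing
Chapter 3. Simulation Study and Discussion
Section 1. Simulation Design
Section 2. Simulation Results
Chapter 4. Empirical Analysis
Chapter 5. Conclusion
References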
1. Auer, P. L. and Doerge, R. W. (2011) A Two-stage Poisson Model for Testing RNA-Seq Data, Statistical Applications in Genetics and Molecular Biology, 10, 1-26.
2. Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing, J. Roy. Statist. Soc. Ser. B, 57, 289-300.
3. Hall, J. M., Lee, M. K., Newman, B., Morrow, J. E., Anderson, L. A., Huey, B. and King, M. C. (1990) Linkage of Early-Onset Familial Breast Cancer to Chromosome 17q21, Science, 250, 1684-1689.
4. Li, J., Witten, D. M., Johnstone, I. M. and Tibshirani, R. (2012) Normalization, Testing, and False Discovery Rate Estimation for RNA-sequencing Data, Biostatistics, 13, 523-538.
5. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008) RNA-seq: an Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays, Genome Res., 18, 1509-1517.
6. Nakashima, E. (1997) Some Methods for Estimation in a Negative-Binomial Model, Ann. Inst. Statist. Math., 49, 101-105.
7. Pao, W., Miller, V., Zakowski, M., Doherty, J., Politi, K., Sarkaria, I., Singh, B., Heelan, R., Rusch, V., Fulton, L., Mardis, E., Kupfer, D., Wilson, R., Kris, M. and Varmus, H. (2004) EGF Receptor Gene Mutations are Common in Lung Cancers from Never Smokers and are Associated with Sensitivity of Tumors to Gefitinib and Erlotinib, Proceedings of the National Academy of Sciences of the United States of America, 101, 13306-13311.
8. Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010) edgeR: a Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data, Bioinformatics, 26, 139-140.
9. Robinson, M. D. and Smyth, G. K. (2007) Moderated Statistical Tests for Assessing Differences in Tag Abundance, Bioinformatics, 23, 2881-2887.
10. Robinson, M. D. and Smyth, G. K. (2008) Small-sample Estimation of Negative Binomial Dispersion, with Applications to SAGE Data, Biostatistics, 9, 321-332.
11. Storey, J. D. (2003) The Positive False Discovery Rate: a Bayesian Interpretation and the q-value, Annals of Statistics, 31, 2013-2035.
12. ’t Hoen, P. A. C., Ariyurek, Y., Thygesen, H. H., Vreugdenhil, E., Vossen, R. H., De Menezes, R. X., Boer, J. M., Van Ommen, G. J. and Den Dunnen, J. T. (2008) Deep Sequencing-Based Expression Analysis Shows Major Advances in Robustness, Resolution and Inter-lab Portability over Five Microarray Platforms, Nucleic Acids Research, 36, e141.
13. Wang, Z., Gerstein, M. and Snyder, M. (2009) RNA-Seq: a Revolutionary Tool for Transcriptomics, Nat. Rev. Genet., 10, 57-63.