| Graduate student | 吳小萍 Wu, Hsiao-Ping |
|---|---|
| Thesis title | A Simulation Study on High Density Oligonucleotide Microarray Data With Discussion of Normalization Methods (模擬高密度寡聚核甘酸微陣列矩陣資料及正規化方法之探討) |
| Advisors | 郭訓志 Kuo, Hsun-Chih; 蔡紋琦 Tsai, Wen-Chi |
| Degree | Master |
| Department | College of Commerce, Department of Statistics |
| Year of publication | 2006 |
| Academic year of graduation | 94 (ROC calendar) |
| Language | English |
| Number of pages | 59 |
| Keywords | microarray, normalization |
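Microarray chips are now widely used in biomedical research across many fields. In this thesis, we are mainly interested in the normalization of oligonucleotide microarray data. To compare different normalization methods, we set out to simulate data that more closely resemble real oligonucleotide microarray data. The simulation is based mainly on Li and Wong's model, with the model parameters set through a hierarchical approach. Finally, to judge the quality of the normalization methods, we simulated 100 data sets and compared the methods using four criteria. The simulation results indicate that our proposed new method (LOESS to Average) generally performs better than the other normalization methods.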
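To make the simulation idea concrete, here is a minimal Python sketch of Li and Wong's multiplicative model for a single probe set, y_ij = PM_ij - MM_ij = theta_i * phi_j + eps_ij (array i, probe j, with the constraint sum_j phi_j^2 = J), as given in Li and Wong (2001a). The hierarchical layer is an assumption for illustration only: the gene-level mean, the log-normal probe affinities, and the noise level `sigma` are hypothetical values, not the settings used in this thesis. The helper `fit_li_wong` is a simplified alternating least-squares estimate of theta and phi with no outlier detection, so it is not dChip's full procedure.

```python
import math
import random


def simulate_gene(n_arrays, n_probes, rng, sigma=10.0):
    """Simulate PM - MM differences for one probe set under Li and Wong's
    model y_ij = theta_i * phi_j + eps_ij (i = array, j = probe).

    Hypothetical hierarchy, chosen only for illustration:
      gene-level log expression  mu      ~ Normal(6, 1.5)
      array-level expression     theta_i = exp(mu + Normal(0, 0.2))
      probe affinities           phi_j   ~ LogNormal(0, 0.5), then rescaled
      noise                      eps_ij  ~ Normal(0, sigma)
    """
    mu = rng.gauss(6.0, 1.5)
    theta = [math.exp(mu + rng.gauss(0.0, 0.2)) for _ in range(n_arrays)]
    phi = [rng.lognormvariate(0.0, 0.5) for _ in range(n_probes)]
    # Identifiability constraint of the model: sum_j phi_j^2 = J.
    scale = math.sqrt(n_probes / sum(p * p for p in phi))
    phi = [p * scale for p in phi]
    y = [[theta[i] * phi[j] + rng.gauss(0.0, sigma) for j in range(n_probes)]
         for i in range(n_arrays)]
    return theta, phi, y


def fit_li_wong(y, n_iter=50):
    """Alternating least squares for theta and phi (no outlier detection)."""
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start from equal probe affinities
    for _ in range(n_iter):
        # theta_i = sum_j y_ij phi_j / sum_j phi_j^2
        ss_phi = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
                 for i in range(n_arrays)]
        # phi_j = sum_i y_ij theta_i / sum_i theta_i^2
        ss_theta = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        # Re-impose sum_j phi_j^2 = J after each update.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final expression indexes given the constrained phi (sum phi^2 = J).
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / n_probes
             for i in range(n_arrays)]
    return theta, phi


if __name__ == "__main__":
    rng = random.Random(2006)
    true_theta, true_phi, y = simulate_gene(n_arrays=10, n_probes=16, rng=rng)
    est_theta, est_phi = fit_li_wong(y)
    for t, e in zip(true_theta, est_theta):
        print(f"true theta {t:9.2f}   estimated {e:9.2f}")
```

With `sigma=0.0` the simulated data are exactly rank one and the fit recovers theta and phi exactly; a full simulation study would repeat `simulate_gene` over many genes and build many data sets (the thesis uses 100).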
Microarray technology is now widely used in many areas of biomedical research. In this thesis, we are interested in the normalization of oligonucleotide microarray data. We aimed to simulate more realistic oligonucleotide microarray data in order to compare different normalization methods. The data simulation was based on Li and Wong's model with a hierarchical setup for the parameters. To compare the normalization methods, 100 data sets were simulated, and the performance of ten normalization methods was assessed using four comparison criteria. The simulation results suggest that our newly proposed normalization method, LOESS to Average, generally performs better than the other normalization methods.
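Quantile normalization (Bolstad et al., 2003), one of the methods compared in the thesis, forces every array to share the same empirical intensity distribution: sort each array, average the sorted values across arrays rank by rank, and put those averages back in each array's original order. The sketch below is a plain-Python illustration under simple assumptions (one list of raw intensities per array, ties broken by position rather than averaged); it is not the thesis's code or the implementation in any particular package.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe intensities).

    Ties are broken by position, not averaged, to keep the sketch short.
    """
    n_arrays = len(arrays)
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / n_arrays for k in range(n)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # k-th smallest gets k-th reference value
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step every array contains the same set of values, so any distribution-based summary, such as the interquartile range, is identical across arrays.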
Acknowledgements
Abstract
Chinese Abstract
1 Introduction
2 Literature Review
2.1 Affymetrix Gene Chip Technologies
2.2 Li and Wong's Model
2.3 DNA-Chip (dChip)
2.3.1 Invariant Normalization
2.4 Robust Multi-Array Average (RMA)
2.4.1 Background Correction in RMA
2.4.2 Quantile Normalization in RMA
2.4.3 Summarization in RMA: Median Polish
2.5 Microarray Analysis Suite Software (MAS 5.0)
2.5.1 Background Correction in MAS 5.0
2.5.2 The Ideal Mismatch Value (IM)
2.5.3 The Adjusted Log-Transformed PM Intensities
2.5.4 One Step Tukey Biweight Algorithm
2.5.5 Scaling Normalization
2.6 Comparisons of Normalization Methods
3 Methodology
3.1 Scaling Method
3.2 Median Centered
3.3 Hybrid Scaling-Median Centered Methods
3.4 Z* Scores
3.5 Quantile Normalization
3.6 Cyclic LOESS
3.7 New Proposed Normalization Method: LOESS to Average
4 Real Data
4.1 Real Data
4.2 The Perfect Match (PM) Value
4.3 The Mismatch (MM) Value
4.4 The Theta (θ)
4.5 The Phi (Φ)
5 Simulation
5.1 Common Simulation Setting
5.2 Simulation Settings for Differentially Expressed Genes
5.3 Simulated Data
6 Comparisons of Normalization Methods
6.1 Interquartile Range (IQR)
6.2 Diff-statistics
6.3 Mean Standard Deviation (MSD)
6.3.1 Overall MSD
6.3.2 Diff-MSD
6.4 Ratio
7 Discussion and Future Work
7.1 Discussion of Comparison Criteria
7.2 Summary of Comparisons for Normalization Methods
7.3 Discussion of Simulation Settings
7.3.1 Simulation Setting 1
7.3.2 Simulation Setting 2 and Setting 3
7.4 Future Work
References
Appendix
[1] Affymetrix (2002), Statistical algorithms description document, Technical report, Affymetrix.
[2] B. M. Bolstad, R. A. Irizarry, M. Astrand and T. P. Speed (2003), A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics, 19(2), 185-193.
[3] R. A. Irizarry, B. Hobbs, F. Collin, Y. D. Beazer-Barclay, K. J. Antonellis, U. Scherf and T. P. Speed (2003), Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics, 4(2), 249-264.
[4] C. Li and W. H. Wong (2001a), Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection, Proceedings of the National Academy of Sciences USA, 98, 31-36.
[5] C. Li and W. H. Wong (2001b), Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application, Genome Biology, 2(8), research0032.1-0032.11.
[6] R. A. Irizarry, B. M. Bolstad, F. Collin, L. M. Cope, B. Hobbs and T. P. Speed (2003), Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Research, 31(4), e15.
[7] B. Bolstad (2001), Probe level quantile normalization of high density oligonucleotide array data, Division of Biostatistics.
[8] B. Bolstad (2002), Comparing the effects of background, normalization and summarization on gene expression estimates.
[9] Affymetrix (2001), GeneChip arrays provide optimal sensitivity and specificity for microarray expression analysis, Affymetrix.
[10] B. M. Bolstad (2004), Low-level analysis of high-density oligonucleotide array data: background, normalization and summarization.
[11] D. Holder, R. F. Raubertas, V. Bill Pikounis, V. Svetnik and K. Soper, Statistical analysis of high density oligonucleotide arrays: a safer approach, Merck Research Laboratories, WP37C-305, West Point, PA 19486.
[12] F. Naef, D. A. Lim, N. Patil and M. O. Magnasco (2001), From features to expression: high-density oligonucleotide array analysis revisited, Tech Report, 1, 1-9.
[13] R. Sasik, E. Calvo and J. Corbeil (2002), Statistical analysis of high-density oligonucleotide arrays: a multiplicative noise model, Bioinformatics, 18(12), 1633-1640.
[14] dChip User's Manual (2005), http://biosun1.harvard.edu/complab/dchip
[15] 薛慧芬 (2005), The research of normalization methods for high density oligonucleotide array, Thesis, National Chengchi University.
[16] S. Dudoit, Y. H. Yang, M. J. Callow and T. P. Speed (2000), Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments.
The full text of this thesis is not authorized for public release.