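Graduate student: 王健源
Wang, Chien-yuan
Thesis title: 兩階段特徵選取法在蛋白質質譜儀資料之應用
A Two-Stage Approach of Feature Selection on Proteomic Spectra Data
Advisors: 張源俊
郭訓志
Degree: Master
Department: College of Commerce - Department of Statistics
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 52
Chinese keywords: 特徵選取 (feature selection), 基因演算法 (genetic algorithm), 表面增強雷射脫附游離/飛行時間質譜 (surface-enhanced laser desorption/ionization time-of-flight mass spectrometry), 支援向量機 (support vector machine)
English keywords: Feature Selection, Genetic Algorithm (GA), SELDI, Support Vector Machines (SVM)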
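  • Through "early detection, early treatment", cancer mortality can be reduced. Finding biomarkers associated with cancerous changes, so that the disease can be detected and treated early, is therefore an important task. This study analyzes real protein mass spectrometry data from normal individuals and prostate cancer patients, obtained from protein-chip experiments using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). SELDI-TOF MS effectively retains the protein features of a biological sample. Without appropriate pre-processing steps to remove experimental noise, a single mass spectrum may contain hundreds or even thousands of feature variables. To speed up the search for possible protein biomarkers, we consider only the feature variables that can distinguish cancer patients from normal individuals.
    A genetic algorithm is a global optimization search mechanism modeled on biological genetic evolution; it can search effectively for possible optimal solutions in high-dimensional spaces. In this study, we use a GA-like algorithm (GAL) to select protein features that distinguish cancer patients from normal individuals. In addition, we propose two two-stage GA-like algorithms (TSGAL) in an attempt to remedy the shortcomings of the GA-like algorithm.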


    Early detection and diagnosis can effectively reduce cancer mortality. The discovery of biomarkers for the early detection and diagnosis of cancer is thus an important task. In this study, a real proteomic spectra data set from prostate cancer patients and normal individuals was analyzed. The data were collected from a Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry (SELDI-TOF MS) experiment. SELDI-TOF MS technology captures the protein features in a biological sample. Without suitable pre-processing steps to remove experimental noise, a single mass spectrum could consist of hundreds or even thousands of peaks. To narrow down the search for possible protein biomarkers, only those features that can distinguish cancer patients from normal individuals are selected.
    A Genetic Algorithm (GA) is a global optimization procedure modeled on the genetic evolution of biological organisms. GAs have been shown to be effective in searching complex high-dimensional spaces. In this study, we consider a GA-Like algorithm (GAL) for feature selection on proteomic spectra data, with the aim of classifying prostate cancer patients and normal individuals. In addition, we propose two types of Two-Stage GAL algorithm (TSGAL) to improve on the GAL.
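
To make the idea of GA-based feature selection concrete, the following is a minimal sketch of a textbook genetic algorithm that searches over binary feature masks. It is not the thesis's GAL or TSGAL, whose details are not given in this record. Everything in it is an assumption made for illustration: the synthetic data standing in for peak intensities, the nearest-centroid classifier used in place of the SVM, the per-feature penalty in the fitness function, and all names and parameter values (`make_data`, `loo_accuracy`, `fitness`, `run_ga`, `penalty`, `p_mut`).

```python
import random


def make_data(rng, n_per_class=20, n_features=30, informative=(3, 11, 17)):
    """Synthetic stand-in for peak intensities (assumption, not the thesis data).

    Only the `informative` columns differ between the two classes.
    """
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 2.5 * label  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on selected features.

    The nearest-centroid rule is a dependency-free stand-in for an SVM.
    Requires at least two samples per class.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    totals = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for c, j in enumerate(cols):
            totals[label][c] += row[j]
    correct = 0
    for row, label in zip(X, y):
        dist = {}
        for cls in (0, 1):
            held = 1 if cls == label else 0  # exclude the held-out sample
            n_cls = counts[cls] - held
            d = 0.0
            for c, j in enumerate(cols):
                centroid = (totals[cls][c] - held * row[j]) / n_cls
                d += (row[j] - centroid) ** 2
            dist[cls] = d
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == label
    return correct / len(X)


def fitness(X, y, mask, penalty=0.005):
    """Accuracy minus a small cost per selected feature (hypothetical choice)."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def run_ga(X, y, rng, pop_size=30, generations=25, p_mut=0.03):
    """Generic GA: tournament selection, one-point crossover, bit-flip mutation,
    elitism, and termination after a fixed number of generations."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(X, y, m), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])
        new_pop = [scored[0][1][:]]  # elitism: carry the best mask over unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            new_pop.append(child)
        pop = new_pop
    best_score, best = max(((fitness(X, y, m), m) for m in pop),
                           key=lambda t: t[0])
    history.append(best_score)
    return best, history


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best, history = run_ga(X, y, rng)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
    print("best fitness per generation:", [round(h, 3) for h in history])
```

Each chromosome is a 0/1 mask over the candidate features, so the search space has 2^p points for p features. Because the best mask is copied unchanged into the next generation, the best fitness per generation never decreases. With real spectra, the stand-in classifier would be replaced by a cross-validated SVM, at a higher cost per fitness evaluation.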

    1 Introduction
    2 Literature Review
    3 Descriptions of Data
    3.1 SELDI
    3.2 SELDI-TOF MS spectra of the prostate cancer
    3.3 Preprocessing of the Raw Spectra
    3.3.1 Baseline Subtraction
    3.3.2 Normalization
    3.3.3 Peak detection
    3.3.4 Peak alignment
    4 Methodologies
    4.1 SVM Classifier
    4.2 Genetic Algorithm (GA)
    4.2.1 Chromosome
    4.2.2 Fitness Function
    4.2.3 GA operators
    4.2.4 Termination
    4.3 GA-Like algorithm (GAL)
    4.4 Two-stage GAL algorithm (TSGAL)
    5 Data Analysis
    5.1 GA-Like for Feature Selection
    5.2 TSGAL for Feature Selection
    5.3 Comparisons between GAL and TSGALs
    6 Results and Discussion
    Reference
    Appendices

