Graduate student: 王神鐸
Armando Serrato
Thesis title: Metagenomic Hi-C Contact Map Network Analysis and Prediction of Recovered Metagenome Assembled Genome Quality
Advisor: 張家銘
Chang, Jia-Ming
Oral defense committee: 蘇家玉
張詠淳
Degree: Master
Department: College of Informatics - Department of Computer Science
Year of publication: 2024
Academic year of graduation: 113
Language: English
Number of pages: 39
Keywords (Chinese): Hi-C contact maps, metagenomics, metagenome-assembled genomes, network-related metrics, machine learning, biological data science
Keywords (English): Hi-C Contact Maps, Metagenomics, Metagenome-Assembled Genomes, Network Theory, Machine Learning, Bioinformatics
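  • In recent years, metagenomics has used Hi-C sequencing data to recover metagenome-assembled genomes (MAGs) from complex microbial communities. This study further validates a previously proposed hypothesis that MAG quality can be predicted from network-related metrics. We analyze metagenomic Hi-C contact maps in depth, extract additional network properties, and integrate biological information from the clustered genomes to improve predictive performance. Using these combined network and biological features in machine learning models not only improves MAG quality prediction but also provides insight into microbial community dynamics.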


    Recent advances in metagenomics leverage Hi-C sequencing data to recover metagenome-assembled genomes (MAGs) from complex microbial communities. This research advances MAG quality prediction by building on the earlier hypothesis that network-based metrics can predict MAG quality. A deeper analysis of metagenomic Hi-C contact maps extracts additional network properties and integrates biological information from the clustered genomes, improving predictive performance. Used together as features in machine learning models, these network and biological properties improve MAG quality prediction and offer insights into microbial community dynamics.
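    To make the idea of network-based features concrete, the following is a minimal sketch, not the thesis's actual pipeline. It assumes a Hi-C contact map given as contig-pair counts and a binning given as lists of contigs per bin; the function names (`build_adjacency`, `local_clustering`, `bin_features`), the toy data, and the three features chosen here are all hypothetical illustrations, since this record does not list the exact properties the thesis extracts.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(contacts):
    """Turn {(contig_a, contig_b): hi_c_count} into an undirected adjacency map."""
    adj = defaultdict(set)
    for (a, b), count in contacts.items():
        if a != b and count > 0:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)


def bin_features(contacts, bins):
    """Compute illustrative per-bin network features (assumed, not from the thesis)."""
    adj = build_adjacency(contacts)
    features = {}
    for bin_id, contigs in bins.items():
        members = set(contigs)
        n = len(members)
        if n == 0:
            continue  # nothing to summarise for an empty bin
        internal = sum(1 for u, v in combinations(sorted(members), 2) if v in adj[u])
        possible = n * (n - 1) / 2
        features[bin_id] = {
            "mean_degree": sum(len(adj[c]) for c in members) / n,
            "mean_clustering": sum(local_clustering(adj, c) for c in members) / n,
            "internal_density": internal / possible if possible else 0.0,
        }
    return features


# Toy contact map: contig pairs with Hi-C read-pair counts (made-up numbers).
contacts = {
    ("c1", "c2"): 12,
    ("c2", "c3"): 7,
    ("c1", "c3"): 5,
    ("c3", "c4"): 1,
    ("c4", "c5"): 9,
}
# Toy binning: which contigs were clustered into which bin (made-up assignment).
bins = {"bin_A": ["c1", "c2", "c3"], "bin_B": ["c4", "c5"]}

features = bin_features(contacts, bins)
for bin_id, values in features.items():
    print(bin_id, values)
```

    Each bin's feature dictionary could then be combined with biological properties and passed to a classifier or regressor that predicts a quality label; the choice of model and label is left open here because the record does not specify it.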

    1. Introduction
    1.1. Metagenomic Hi-C
    1.2. Metagenome Assembled Genome Quality Assessment
    1.3. Previous work
    1.4. Experiment Design
    2. Methods
    2.1. Dataset and Genome Binning
    2.2. Network Analysis
    2.3. Statistical Significance Testing
    2.4. Quality assessment prediction
    3. Results
    3.1. Dataset Variations
    3.2. Small-World Properties Analysis
    3.3. Degree Properties Analysis
    4. Influence and Connectivity Analysis
    4.1. CheckM Features
    4.2. Statistical Significance Analysis of Network and Biological Properties
    4.3. Feature Generation and Quality Prediction
    4.4. Feature Importance
    4.5. Prediction Across Datasets
    5. Discussion
    6. Future Work
    7. References

    Sait M, Hugenholtz P, Janssen PH. Cultivation of globally distributed soil bacteria from phylogenetic lineages previously only detected in cultivation-independent surveys. Environ Microbiol. 2002; 4(11):654–66.
    Hugenholtz et al. (2008) Metagenomics. Nature, 455, 481–483.
    Burton et al. (2014) Species-Level Deconvolution of Metagenome Assemblies with Hi-C–Based Contact Probability Maps. G3: GENES, GENOMES, GENETICS, 4, 7.
    Lieberman-Aiden et al. (2009) Comprehensive mapping of long range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293.
    DeMaere et al. (2019) bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biology, 20, 46.
    Cheng et al. (2020) Bin3C_SLM: Deconvoluting metagenomic assemblies via Hi-C connect networks.
    Stalder, T., Press, M.O., Sullivan, S. et al. Linking the resistome and plasmidome to the microbiome. ISME J 13, 2437–2446 (2019). https://doi.org/10.1038/s41396-019-0446-4
    Du, Y., Sun, F. HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps. Genome Biol 23, 63 (2022). https://doi.org/10.1186/s13059-022-02626-w
    Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043–1055.
    Yuting Hsu (2022) The Network Analysis of the metagenomic Hi-C contact map and its downstream metagenome assembly
    Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–2
    Waltman L, Eck NJ van. A smart local moving algorithm for large-scale modularity-based community detection. European Phys J B. 2013;86:471.
    Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Statistical Mech Theory Exp. 2008;2008:P10008.
    Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc National Acad Sci. 2008;105:1118–23.
    Ke Zhang, Chenxi Wang, Liping Sun, Jie Zheng, Prediction of gene co-expression from chromatin contacts with graph attention network, Bioinformatics, Volume 38, Issue 19, October 2022, Pages 4457–4465, https://doi.org/10.1093/bioinformatics/btac535
    Ernest YB, Daniel AA. A Review of the Logistic Regression Model with Emphasis on Medical Research. J Data Analysis Information Process. 2019;07:190–207.
    Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119
    Gao, W., Lin, W., Li, Q. et al. Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder. Nat Protoc 19, 2803–2830 (2024). https://doi.org/10.1038/s41596-024-00999-9
    Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 4768–4777.
    Manchanda, N., Portwood, J.L., Woodhouse, M.R. et al. GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics 21, 193 (2020). https://doi.org/10.1186/s12864-020-6568-2
    Hunt, M., Kikuchi, T., Sanders, M. et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol 14, R47 (2013). https://doi.org/10.1186/gb-2013-14-5-r47
