| Graduate student: | 吳映函 Wu, Yin-Han |
|---|---|
| Thesis title: | HiCSeg:針對不同樣本和物種的互動式基因體分割 (HiCSeg: an interactive genome segmentation cross samples and species) |
| Advisor: | 張家銘 Chang, Jia-Ming |
| Oral defense committee: | 蘇家玉, 陳世淯 |
| Degree: | Master |
| Department: | College of Science, Department of Computer Science |
| Publication year: | 2021 |
| Academic year of graduation: | 109 |
| Language: | Chinese |
| Number of pages: | 45 |
| Keywords (Chinese): | 基因體分割, Hi-C, ChIP-Seq |
| Keywords (English): | Genome segmentation, Hi-C, ChIP-Seq |
| DOI URL: | http://doi.org/10.6814/NCCU202101389 |
Genome-wide chromosomal contacts measured by Hi-C can be used to investigate the higher-order organization of chromosomes, such as compartments and topologically associating domains (TADs). Principal component analysis (PCA) of mammalian Hi-C maps reveals two compartments, A and B. A TAD or a compartment can be regarded as a segmentation of the genome. Genome segmentation is generally used to compress data and to sort out the different modifications found in different cell types. We compared PCA results at various resolutions to identify their differences, then introduced ChIP-Seq data for further analysis. We also introduced two other clustering methods, Louvain and Leiden. Their results can be compared with those of PCA and can also be used to compute the correlation between networks. Furthermore, the genome can be segmented on the basis of integrated ChIP-Seq and Hi-C information, combined either by adding the two or by network fusion.
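The PCA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `ab_compartments` and the toy matrix `toy` are hypothetical, the input is assumed to be an already normalized, symmetric intra-chromosomal contact matrix, and the distance (observed/expected) correction used in real pipelines is omitted. The sign of a principal component is arbitrary, so the labels "A" and "B" below are placeholders; in practice the sign is oriented with an external signal such as gene density.

```python
import numpy as np


def ab_compartments(contact: np.ndarray) -> np.ndarray:
    """Split bins into two groups by the sign of the first principal component.

    Assumes `contact` is a symmetric, normalized contact matrix (bins x bins).
    """
    # Pearson correlation between the contact profiles (rows) of every bin pair.
    corr = np.corrcoef(contact)
    # The correlation matrix is symmetric, so eigh applies;
    # it returns eigenvalues in ascending order.
    _, eigenvectors = np.linalg.eigh(corr)
    pc1 = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue
    # The sign is arbitrary: "A"/"B" here only distinguish the two groups.
    return np.where(pc1 >= 0, "A", "B")


# Hypothetical toy data: bins 0-2 contact each other often, as do bins 3-5.
toy = np.array([
    [10, 8, 9, 1, 2, 1],
    [8, 10, 8, 2, 1, 1],
    [9, 8, 10, 1, 1, 2],
    [1, 2, 1, 10, 9, 8],
    [2, 1, 1, 9, 10, 8],
    [1, 1, 2, 8, 8, 10],
], dtype=float)

labels = ab_compartments(toy)
print(labels)
```

On this toy matrix the first three bins fall into one group and the last three into the other; which group is printed as "A" depends on the eigenvector sign returned by the solver.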
Introduction
High-throughput Chromatin Conformation Capture (Hi-C)
Chromatin immunoprecipitation sequence (ChIP-Seq)
ChromHMM
Similarity Network Fusion (SNF)
Integration of different types of data
Methods
Overview
Data Sets
Hi-C
ChIP-Seq
Hi-C Contact Matrix preprocessing
Knight-Ruiz (KR) normalization processing
ChIP-Seq binning
Correlation Matrix
Principal Component Analysis (PCA)
Network transform
Hi-C and ChIP-Seq Network fusion
Network clustering
Louvain method
Leiden method
Results
A/B compartment reproducibility
KR normalization effect
The resolution influence the runtime
The explanation proportion of PCA
Hi-C network cluster
Louvain cluster of Hi-C network
Cluster Hi-C data by Leiden
The consistency between network cluster and PCA decomposition
ChIP-Seq data
Cluster ChIP-Seq data by Louvain
Cluster ChIP-Seq network by Leiden
Further analysis of the A/B compartment
Correlation of Hi-C clusters and ChIP-Seq clusters
Combination of ChIP-Seq and Hi-C
Discussion and Conclusion
References
Full-text release date: 2026/08/17