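Graduate student: 楊鎮遠
Yang, Jhen-Yuan
Thesis title: 深度學習應用在偵測拓撲結構域
Topology Association Domain Identification using Deep Learning
Advisor: 張家銘
Chang, Jia-Ming
Oral defense committee: 陳鯨太
Chen, Ching-Tai
蘇家玉
Emily, Chia-Yu Su
Degree: Master
Department: College of Science - Department of Computer Science
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 38
Keywords (Chinese): 拓撲關聯域, TAD, Hi-C, 染色體組織, 深度學習
Keywords (English): Topology Association Domain, TAD, Hi-C, Chromosome organization, Deep learning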
DOI URL: http://doi.org/10.6814/NCCU201901133
    ● Background: In recent years, increasing evidence has indicated that three-dimensional chromosome structure plays an important role in genomic function. A Topologically Associating Domain (TAD), a self-interacting region, has been shown to be a structural unit of the chromosome. However, identifying TADs in high-throughput chromosome conformation capture (Hi-C) maps remains a computational challenge.
    ● Results: We propose a new problem, TAD classification, in place of the original TAD identification. Specifically, we treat a Hi-C map as an image, so that TAD classification becomes an image classification problem, which we solve with two deep learning models: a convolutional neural network and a residual neural network. In addition, we designed a way to generate non-TAD data for the binary classification problem. In validation across species and cell types, the deep learning models perform well, with AUC > 0.80.
    ● Conclusions: TADs have been shown to be conserved during evolution. Interestingly, our results confirm that the TAD classification model is usable across species. From an image classification point of view, this indicates that human and mouse TADs share common patterns. Our approach could be a new way to test for variation or conservation of TADs among Hi-C maps. For example, the TADs of two Hi-C maps can be considered conserved if the classification models trained on the two maps are exchangeable.
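
    The classification setup summarized in the abstract can be pictured with a small Python sketch. It is a toy illustration under stated assumptions, not the thesis's pipeline: the block-diagonal contact map, the half-length shift used to produce non-TAD windows, and the `corner_score` function standing in for a trained convolutional or residual network are all hypothetical choices introduced here, since the abstract does not describe the actual non-TAD generation procedure or model details. Only the AUC computation (the pairwise ranking form) is standard.

```python
def make_toy_map(n_bins, tads, intra=10.0, inter=1.0):
    """Synthetic contact map: dense diagonal blocks stand in for TADs.

    Assumes the intervals in `tads` cover every bin exactly once.
    """
    domain_of = [None] * n_bins
    for idx, (start, end) in enumerate(tads):
        for b in range(start, end):
            domain_of[b] = idx
    return [[intra if domain_of[i] == domain_of[j] else inter
             for j in range(n_bins)] for i in range(n_bins)]


def crop_window(contact_map, start, end):
    """Square sub-matrix covering bins [start, end): the 'image' of an interval."""
    return [row[start:end] for row in contact_map[start:end]]


def shifted_negatives(tads, n_bins):
    """Hypothetical non-TAD generator (an assumption, not the thesis's method):
    slide each TAD by half its length so the window straddles a boundary."""
    negatives = []
    for start, end in tads:
        half = (end - start) // 2
        if half >= 1 and end + half <= n_bins:
            negatives.append((start + half, end + half))
    return negatives


def corner_score(window):
    """Stand-in for a trained classifier: mean contact in the top-right quadrant."""
    size = len(window)
    mid = size // 2
    values = [window[i][j] for i in range(mid) for j in range(mid, size)]
    return sum(values) / len(values)


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data: four TADs tiling a 60-bin map.
tads = [(0, 10), (10, 25), (25, 40), (40, 60)]
n_bins = 60
hic = make_toy_map(n_bins, tads)
negatives = shifted_negatives(tads, n_bins)

pos_scores = [corner_score(crop_window(hic, s, e)) for s, e in tads]
neg_scores = [corner_score(crop_window(hic, s, e)) for s, e in negatives]
toy_auc = auc(pos_scores, neg_scores)
print(f"toy AUC = {toy_auc:.2f}")
```

    The toy classes separate perfectly by construction, so the resulting AUC says nothing about real Hi-C data. To mimic the cross-species test described in the conclusions, one would replace `corner_score` with a model trained on one map and compute `auc` on windows from the other map, then repeat with the roles swapped.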

    List of Figures
    List of Tables
    Abstract
    Keywords
    1. Introduction
    1.1 Overview chromosome conformation capture
    1.2 High-throughput chromosome conformation capture
    1.3 Topologically Associating Domains
    1.4 CTCF
    1.5 Deep learning algorithm
    1.6 Fully Convolutional Neural Network
    1.7 Residual Neural Network
    1.8 Squeeze-and-Excitation Net
    1.9 Deep learning with Hi-C
    2. Methods
    2.1 Data preparation
    2.2 non-TAD generation
    2.3 Deep learning models
    2.3.1 Model architectures
    2.4 Evaluation
    2.4.1 Experimental designs
    2.4.2 Metrics
    2.5 TAD caller by Dynamic programming
    3. Results
    3.1 Five-cross validation in species-specific dataset
    3.2 Prediction error analysis
    3.3 Data preprocessing
    3.4 Evaluate model
    4. Discussion
    5. Conclusion
    6. References

