
Student: Kao, Wen-Tsung (高文聰)
Thesis title: Analysis of Voice Styles Using i-Vector Features (基於i-Vector特徵之聲音風格分析)
Advisor: Liao, Wen-Hung (廖文宏)
Oral examination committee: Liao, Wen-Hung (廖文宏); Liao, Chun-Feng (廖峻鋒); Hua, Kai-Lung (花凱龍)
Degree: Master
Department: College of Science, Executive Master Program of Computer Science
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar, 2017-2018)
Language: Chinese
Pages: 56
Keywords: Sound style, Machine learning, Pattern recognition, i-Vector, ALIZE
DOI URL: http://doi.org/10.6814/THE.NCCU.EMCS.007.2018.B02
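  • Voice styles are often described with a handful of common adjectives, yet they are hard to define precisely. This thesis approaches voice styles from the standpoint of speaker recognition, using i-Vector, a feature vector widely used in speech and speaker recognition, together with a support vector machine (SVM) for classification. To test whether i-Vector can describe voice style, we first ran a series of validation experiments: basic speaker recognition, the minimum input duration, the effect of white noise on speaker verification, dependence on speech content, the audio sampling rate, and recordings of voice actors using different tones. After confirming that the features are relevant, we selected eight voice-style types common in daily life, classified them, and checked whether the results were consistent. The results show that a speaker recognition system can also effectively identify voice-style types.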
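The abstract lists a white-noise test but does not say how the noise was generated or at which levels. As an illustration only, the sketch below assumes additive white Gaussian noise scaled so that the noisy copy reaches a chosen signal-to-noise ratio (SNR) in dB; the function names, the seed, the 16 kHz rate, and the 440 Hz test tone are hypothetical and not taken from the thesis.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus white Gaussian noise at exactly `snr_db` decibels.

    Hypothetical helper: the noise is rescaled by its measured power so the
    achieved SNR matches the target exactly, not just in expectation.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]

    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)

    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10**(SNR_dB / 10).
    target_noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate in Hz
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    for snr in (20, 10, 0):
        noisy = add_white_noise(tone, snr)
        print(f"target {snr} dB -> measured {measured_snr_db(tone, noisy):.2f} dB")
```

Rescaling the generated noise by its measured power, rather than trusting the Gaussian variance, makes the achieved SNR exact, which keeps noise conditions comparable across clips of different length.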


    Many adjectives are used to describe voice characteristics, yet it is challenging to define sound styles precisely with quantitative measures. In this thesis, we tackle the sound style classification problem using techniques designed for speaker recognition. Specifically, we employ i-Vector, a widely adopted feature in speaker identification, together with a support vector machine (SVM) for style classification. To verify the reliability of i-Vector, we conducted a series of experiments covering basic speaker recognition, minimum voice duration, noise sensitivity, context dependency, sensitivity to different sampling rates, and style classification of samples from voice actors. The results indicate that i-Vector can indeed be used to classify sound styles that are commonly perceived in daily life.
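The abstract pairs i-Vector features with an SVM but does not give the kernel or the multi-class scheme. A minimal, dependency-free sketch under those gaps: a linear SVM trained with the Pegasos stochastic subgradient method, wrapped one-vs-rest over style labels. This is a swapped-in technique for illustration, not the thesis's setup; LIBSVM, for example, solves the SVM dual and handles multiple classes one-vs-one. The function names, the regularization `lam`, the step count, and the toy 3-D vectors are all hypothetical; real i-vectors typically have a few hundred dimensions.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(xs, ys, lam=0.1, steps=20000, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos stochastic subgradient steps.

    A constant 1.0 feature is appended so the last weight acts as a bias term
    (Pegasos then also regularizes the bias, a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(data))
        x, y = data[i], ys[i]
        eta = 1.0 / (lam * t)
        violated = y * _dot(w, x) < 1.0  # hinge loss is active for this sample
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 regularizer
        if violated:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def score(w, x):
    """Signed distance-like score of feature vector `x` under weights `w`."""
    return _dot(w, list(x) + [1.0])


def train_one_vs_rest(xs, labels, **kwargs):
    """One binary SVM per style label: that label (+1) against all others (-1)."""
    models = {}
    for style in sorted(set(labels)):
        ys = [1 if lab == style else -1 for lab in labels]
        models[style] = train_linear_svm(xs, ys, **kwargs)
    return models


def predict(models, x):
    """Return the style whose binary classifier gives the highest score."""
    return max(models, key=lambda style: score(models[style], x))


if __name__ == "__main__":
    # Toy 3-D "i-vectors" standing in for real, much longer ones.
    train_x = [(3.0, 0.1, 0.0), (2.8, -0.2, 0.1), (0.1, 3.1, 0.0),
               (-0.2, 2.9, 0.2), (0.0, 0.1, 3.0), (0.2, -0.1, 2.8)]
    train_y = ["style_a", "style_a", "style_b", "style_b", "style_c", "style_c"]
    models = train_one_vs_rest(train_x, train_y)
    print(predict(models, (2.5, 0.3, -0.1)))
```

A linear kernel keeps the sketch short and is a reasonable baseline for length-normalized i-vectors; with eight styles, one-vs-rest trains eight binary models, whereas one-vs-one would train 28.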

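    Chapter 1 Introduction
    1.1 Research Motivation
    1.2 Thesis Organization
    Chapter 2 Background and Related Work
    2.1 Audio Features
    2.1.1 Mel-Frequency Cepstral Coefficients
    2.2 Speaker Models
    2.2.1 Gaussian Mixture Model
    2.2.2 Universal Background Model
    2.2.3 Joint Factor Analysis
    2.2.4 i-Vector
    2.3 Machine Learning
    2.3.1 Deep Learning
    2.3.2 Support Vector Machine
    2.4 Summary
    Chapter 3 Research Methods
    3.1 Tools
    3.1.1 ALIZE Toolkit
    3.1.2 LIBSVM
    3.2 Preliminary Studies
    3.2.1 Data Preprocessing
    3.2.2 Basic Verification of i-Vector Functionality
    3.2.3 Minimum Data Length Test
    3.2.4 Effect of White Noise on Speaker Recognition
    3.2.5 Discontinuous Speech Content Test
    3.2.6 Audio Sampling Rate Test
    3.2.7 Effect of Voice Actors Using Different Tones on Voice Style
    3.3 Research Framework
    3.3.1 Style Definitions
    3.3.2 Data Sources
    3.4 Objectives
    Chapter 4 Research Process and Analysis of Results
    4.1 Collecting Training Data
    4.2 Preprocessing of Training Data
    4.2.1 i-Vector Normalization
    4.2.2 SVM Training and Test Results
    4.2.3 Analysis of Mispredicted Samples
    4.3 Applications of Voice Style Analysis
    4.3.1 Voice Style Recognition on Telephone Recordings
    4.3.2 Analysis of Predicted Styles for Telephone Recordings
    Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Research Directions
    References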
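The outline includes i-Vector normalization (Section 4.2.1), but the record does not say which scheme was used. One common choice for i-vectors is length normalization: subtract the training-set mean and scale each vector to unit Euclidean norm (a fuller recipe also whitens the vectors before scaling; that step is omitted here). A minimal sketch, assuming vectors are plain Python lists; the function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (estimate on training i-vectors)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def length_normalize(vectors, mean):
    """Center each vector on `mean`, then scale it to unit Euclidean (L2) norm."""
    normalized = []
    for v in vectors:
        centered = [vi - mi for vi, mi in zip(v, mean)]
        norm = math.sqrt(sum(c * c for c in centered))
        # A vector equal to the mean has no direction; leave it at the origin.
        normalized.append([c / norm for c in centered] if norm > 0.0 else centered)
    return normalized


if __name__ == "__main__":
    train = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 2.0]]
    mu = mean_vector(train)  # fit on training data only
    print(length_normalize(train, mu))
    print(length_normalize([[2.0, 3.0, 5.0]], mu))  # reuse the training mean
```

The mean is estimated on training i-vectors only and then reused for test i-vectors, so both sets land in the same normalized space before SVM training or scoring.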

