Text-Independent Speaker Recognition System Using Noisy Data Set

Project: Other project

Project details

Description

Speaker recognition (SR) has attracted growing attention in recent years. Speaker recognition, which covers two applications, speaker identification and speaker verification, is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. This technique makes it possible to use a speaker's voice to verify their identity and control access to services such as voice dialling, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers [1].

Speaker verification (SV) is the process of determining whether a speaker is who they claim to be. It performs a one-to-one comparison (also called a binary decision [2]) between the features of an input voice and those of the claimed voice registered in the system. Speaker identification (SI) is the process of finding the identity of an unknown speaker by comparing their voice with the voices of the registered speakers in the database. It is a one-to-many comparison [2].

The basic structure of an SI system has three main components: front-end processing, speaker modelling, and pattern matching. Front-end processing highlights the relevant features and removes the irrelevant ones. M speaker models are scored in parallel and the most likely one is selected.

In practice, speaker recognition systems can be divided into text-dependent and text-independent systems. In text-dependent SR systems, speakers may only say specific sentences or words that are known to the system. In contrast, text-independent SR systems can process freely spoken speech. Compared with text-dependent SR systems, text-independent SR systems are more flexible but more complicated.
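The one-to-one and one-to-many comparisons described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of cosine similarity as the score, the threshold value, and the three-dimensional toy vectors standing in for real speaker models. It is not the project's method.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(features, models):
    """One-to-many: score all M enrolled models, return the best-scoring speaker."""
    return max(models, key=lambda name: cosine_score(features, models[name]))

def verify(features, claimed, models, threshold=0.8):
    """One-to-one: accept the claimed identity only if its score reaches the threshold."""
    return cosine_score(features, models[claimed]) >= threshold

# Hypothetical enrolled speakers (toy vectors, not real voice features).
models = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
probe = [0.9, 0.1, 0.2]  # features of the input voice

identified = identify(probe, models)
accepted_alice = verify(probe, "alice", models)
accepted_bob = verify(probe, "bob", models)
```

Identification always returns one of the enrolled speakers, whereas verification yields a binary accept or reject decision that depends on the chosen threshold.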
As can be seen from the figure above, the first stage in speaker recognition is feature extraction. Features can be calculated in the time domain, the frequency domain, or both [3-4]. Features derived from the speech spectrum have proven to be the most effective in automatic systems [1]. The most widespread feature parameters used in research are the Short-Term Real Cepstrum (STRC) [5], the Mel-Frequency Cepstral Coefficients (MFCC) [6], and the Linear Prediction Cepstral Coefficients (LPCC) [7].

The second stage of speaker recognition is the modelling of the speakers. The most frequently used approaches are Vector Quantization Modelling (VQM) [8], Neural Network Modelling (NNM) [9], and Hidden Markov Modelling (HMM) [10]. The main advantage of parametric modelling techniques such as Gaussian Mixture Modelling (GMM) is that the restriction on the structure of the speaker models allows efficient use of the available training data in estimating the models. On the other hand, the major advantage of non-parametric techniques such as VQM and NNM is that they place no restriction on the underlying speaker model.

This proposed project is intended to build text-independent speaker recognition systems (verification and identification). Standard data sets are to be used in this project [13, 14]. The system is to be trained and tested with different standard sets of speaker recognition data.
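As a rough illustration of the vector quantization modelling mentioned above, the sketch below trains one codebook per speaker with a basic k-means (Lloyd) loop and identifies a test utterance by the lowest average distortion. The function names, the codebook size, and the two-dimensional toy vectors (standing in for per-frame features such as MFCCs) are all assumptions for illustration; the sketch also assumes each speaker has at least `k` training vectors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k, iterations=10):
    """Build a k-entry codebook with Lloyd's algorithm.

    Initial codewords are evenly spaced training vectors, so the
    result is deterministic. Assumes len(vectors) >= k.
    """
    step = max(1, len(vectors) // k)
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

# Hypothetical training frames for two speakers (toy 2-D features).
speaker_a = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
speaker_b = [[5.0, 5.1], [5.2, 4.9], [6.0, 6.2], [6.1, 5.9]]
codebooks = {"a": train_codebook(speaker_a, 2), "b": train_codebook(speaker_b, 2)}

# Frames from an unknown utterance; the lowest distortion wins.
test_frames = [[0.1, 0.0], [1.0, 0.9]]
best = min(codebooks, key=lambda s: distortion(test_frames, codebooks[s]))
```

Because the codebook is built directly from the training vectors, this approach imposes no distributional form on the speaker model, which is the non-parametric property the text refers to.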

Acronym: TTotP
Status: Not started
