
Friday, March 29, 2019

Speaker Recognition System Pattern Classification

A Study on Speaker Recognition System and Pattern Classification Techniques
Dr. E. Chandra, K. Manikandan, M. S. Kalaivani

Abstract: Speaker recognition is the process of identifying a person through his/her voice signals or speech waves. Pattern classification plays a vital role in speaker recognition; it is the process of grouping patterns which share the same set of properties. This paper deals with the speaker recognition system and gives an overview of the pattern classification techniques DTW, GMM and SVM.

Keywords: Speaker Recognition System, Dynamic Time Warping (DTW), Gaussian Mixture Model (GMM), Support Vector Machine (SVM).

INTRODUCTION

Speaker recognition is the process of identifying a person through his/her voice signals [1] or speech waves. It can be classified into two categories, speaker identification and speaker verification. In the speaker identification task, a speech utterance of an unknown speaker is compared with the models of a set of valid users, and the best match is used to identify the speaker. In speaker verification, the unknown speaker first claims an identity, and the claimed model is then used for verification; if the match score is above a predefined threshold, the identity claim is accepted. The speech used for these tasks can be either text dependent or text independent. In a text-dependent application the system has prior knowledge of the text to be spoken, and the user speaks the same predefined text. In a text-independent application, the system has no prior knowledge of the text to be spoken.

Pattern classification plays a vital role in speaker recognition. The term pattern defines the objects of interest; in this paper the sequences of acoustic vectors extracted from the input speech are taken as patterns. Pattern classification is the process of grouping patterns which share the same set of properties, and its result decides whether a speaker is accepted or rejected. Several research efforts have been made in pattern classification. Most of the existing work is based on generative models, such as Dynamic Time Warping (DTW) [3], Hidden Markov Models (HMM), Vector Quantization (VQ) [4], Gaussian Mixture Models (GMM) [5] and so forth. A generative model describes how the observed data are randomly generated from hidden parameters; because of this, generative models cannot provide a mechanism that directly optimizes discrimination.

The Support Vector Machine (SVM) was introduced as an alternative classifier for speaker verification [6]. In machine learning, SVM is a relatively new tool used for hard classification problems in several fields of application, and it is able to deal with samples of high dimensionality. In speaker verification a binary decision is needed; since SVM is a discriminative binary classifier, it can classify a complete utterance in a single step.

This paper is organized as follows: Section 2 describes the speaker recognition system; Section 3 presents pattern classification and an overview of the DTW, GMM and SVM techniques; Section 4 concludes the paper.
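As a minimal illustration of the two decision rules described in this introduction, the short Python sketch below assumes that match scores between a test utterance and each enrolled speaker model have already been computed; the speaker names, score values and threshold are hypothetical placeholders, not values from this paper.

# Sketch of the two tasks: 1:N identification and 1:1 verification.
# Assumes a higher score means a better match between utterance and model.

def identify_speaker(scores):
    """Identification: pick the enrolled speaker with the best match score."""
    return max(scores, key=scores.get)

def verify_speaker(scores, claimed_id, threshold):
    """Verification: accept the claim if the claimed model's score
    is above a predefined threshold."""
    return scores[claimed_id] >= threshold

# Hypothetical example values, for illustration only.
scores = {"alice": 0.82, "bob": 0.35, "carol": 0.44}
print(identify_speaker(scores))              # 'alice'
print(verify_speaker(scores, "alice", 0.6))  # True: identity claim accepted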
SPEAKER RECOGNITION SYSTEM

Speaker recognition is categorized into verification and identification, so the speaker recognition system consists of two stages: speaker verification and speaker identification. Speaker verification is a 1:1 match, where the voice print is matched against a single template, whereas speaker identification is a 1:N match, where the input speech is matched against more than one template. Speaker verification consists of five steps: 1. input data acquisition, 2. feature extraction, 3. pattern matching, 4. decision making, 5. generation of speaker models.

Fig 1. Speaker recognition system

In the first step a speech sample is acquired in a controlled manner from the user. The speaker recognition system processes the speech signal and extracts the speaker-discriminative information, which forms a speaker model. At the time of verification, a sample voice print is acquired from the user; the system extracts the features from the input speech and compares them with the predefined model. This process is called pattern matching.

DC Offset Removal and Silence Removal

Speech data are discrete-time speech signals that carry a redundant constant offset called the DC offset [8]. The value of the DC offset affects the information extracted from the speech signal. Silence frames are audio frames of background noise with a low energy level, and silence removal is the process of skipping the silence periods in the speech. The energy of each speech frame is calculated using equation (1):

E(i) = Σ_{m=1}^{M} x_i(m)^2,  i = 1, 2, ..., N   (1)

where M is the number of samples in a speech frame and N is the total number of speech frames. The threshold level is determined using equation (2):

Threshold = Emin + 0.1 (Emax - Emin)   (2)

where Emax and Emin are the greatest and lowest energy values of the N segments.

Fig 2. Speech signal before Silence Removal
Fig 3. Speech signal after Silence Removal

Pre-emphasis

This technique is used to enhance the high frequencies of the speech signal. Its aim is to spectrally flatten the speech signal, that is, to increase the relative energy of its high-frequency spectrum. The following two factors decide the need for pre-emphasis: 1. speech signals generally contain more speaker-specific information in the higher frequencies [9]; 2. the energy of the speech signal decreases as the frequency increases. Pre-emphasis therefore makes the feature extraction process take all aspects of the voice signal into account. Pre-emphasis is applied as a first-order Finite Impulse Response filter, defined as

H(z) = 1 - 0.95 z^-1   (3)

The figures below show a speech signal before and after pre-emphasizing.

Fig 4. Speech signal before Pre-emphasizing
Fig 5. Speech signal after Pre-emphasizing
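As a rough sketch of the preprocessing chain described above, the following Python/NumPy code removes the DC offset, drops low-energy frames using equations (1) and (2), and applies the pre-emphasis filter of equation (3); the frame length and the random stand-in signal are illustrative assumptions, not values from the paper.

import numpy as np

def preprocess(signal, frame_len=256):
    # DC offset removal: subtract the constant offset (the signal mean).
    signal = signal - np.mean(signal)

    # Silence removal: frame the signal, compute per-frame energy (equation (1))
    # and keep only frames whose energy exceeds the threshold of equation (2).
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)                            # equation (1)
    threshold = energy.min() + 0.1 * (energy.max() - energy.min())  # equation (2)
    voiced = frames[energy > threshold].reshape(-1)

    # Pre-emphasis: first-order FIR filter H(z) = 1 - 0.95 z^-1 (equation (3)).
    return np.append(voiced[0], voiced[1:] - 0.95 * voiced[:-1])

# Hypothetical usage with a random stand-in for one second of 16 kHz speech.
speech = np.random.randn(16000) + 0.3   # added constant simulates a DC offset
clean = preprocess(speech)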
Windowing and Feature Extraction

The windowing technique is used to minimize the signal discontinuities at the beginning and end of each frame. It smooths the signal and makes the frame more suitable for spectral analysis. The following equation is used in the windowing technique:

y1(n) = x(n) w(n),  0 ≤ n ≤ N-1   (4)

where N is the number of samples in each frame. The equation for the Hamming window is

w(n) = 0.54 - 0.46 cos(2πn / (N-1)),  0 ≤ n ≤ N-1   (5)

There is a large variability in the speech signals taken for processing; to reduce this variability, a feature extraction technique is needed. MFCC has been widely used as the feature extraction technique for automatic speaker recognition; Davis and Mermelstein reported in 1980 that Mel-Frequency Cepstral Coefficients (MFCC) provide better performance than other features [10].

Fig 6. Feature Extraction

The MFCC technique divides the input signal into short frames and applies the windowing technique to discard the discontinuities at the edges of the frames. In the Fast Fourier Transform (FFT) phase it converts the signal into the frequency domain, and a Mel-scale filter bank is then applied to the resulting frames. After that, the logarithm of the signal is passed to an inverse DFT function, converting the signal back into the time domain.

PATTERN CLASSIFICATION

Pattern classification involves computing a match score in the speaker recognition system. The term match score refers to the similarity of the input feature vectors to some model. Speaker models are built from the features extracted from the speech signal: based on the extracted features, a model of the voice is generated and stored in the speaker recognition system. To validate a user, the matching algorithm compares the input voice signal with the model of the claimed user. In this paper three major pattern classification techniques are compared: DTW, GMM and SVM.

Dynamic Time Warping

This well-known algorithm is used in many areas. It is currently applied in speech recognition, sign language and gesture recognition, handwriting and online signature matching, data mining and time-series clustering, surveillance, protein sequence alignment, chemical engineering, and music and signal processing. A DTW-based template-matching approach to speaker verification was proposed by Sadaoki Furui in 1981 [11]. The algorithm measures the similarity between two sequences which may vary in time and speed, and finds an optimal match between the two given sequences. During training, the average of two patterns is taken to form a new template, and this process is repeated until all the training utterances have been combined into a single template. The technique matches a test input, given as a sequence of multi-dimensional feature vectors T = t1, t2, ..., tI, against a reference template R = r1, r2, ..., rJ, finding the warping function w(i) as shown in the figure below. In a speaker recognition system every input speech is compared with the utterances in the database; for each comparison the distance measure is calculated, where a lower distance indicates a higher similarity.

Fig 7. Dynamic Time Warping
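The dynamic-programming recursion behind DTW can be sketched as follows. This Python/NumPy example compares two feature sequences of unequal length (for instance two MFCC sequences) using a Euclidean local distance; the step pattern and the random example arrays are illustrative assumptions rather than details from the paper.

import numpy as np

def dtw_distance(T, R):
    # T: test sequence (I x d), R: reference template (J x d).
    I, J = len(T), len(R)
    # Local distance d(i, j): Euclidean distance between feature vectors.
    d = np.linalg.norm(T[:, None, :] - R[None, :, :], axis=2)

    # Accumulated cost matrix with the classic step pattern.
    D = np.full((I + 1, J + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j],      # insertion
                                            D[i, j - 1],      # deletion
                                            D[i - 1, j - 1])  # match
    # A lower accumulated distance indicates a higher similarity.
    return D[I, J]

# Hypothetical usage with two random "MFCC-like" sequences of unequal length.
test = np.random.randn(40, 13)        # 40 frames, 13 coefficients
reference = np.random.randn(55, 13)   # 55 frames, 13 coefficients
print(dtw_distance(test, reference))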
Gaussian Mixture Model

The Gaussian mixture model is the most commonly used classifier in speaker recognition systems. It is a type of density model which comprises a number of component functions; these functions are combined to provide a multimodal density. The model is often used for data clustering and is trained with an iterative algorithm that converges to a local optimum. In this method the distribution of the feature vectors x is modelled explicitly as a mixture of M Gaussians, where mu_i and Sigma_i represent the mean and covariance of the i-th mixture component and x1, x2, ..., xn are the training data. The task is parameter estimation: finding the parameters which best match the distribution of the training feature vectors of the input speech. The well-known method for this is maximum likelihood estimation, which finds the model parameters that maximize the likelihood of the GMM. At test time, the speaker whose model gains the maximum score on the test data is selected as the identified speaker.

Support Vector Machine

The Support Vector Machine was proposed in the 1990s and is one of the best machine learning algorithms. It is used in many pattern classification problems, such as image recognition, speech recognition, text categorization, face detection and fraud detection. The basic idea of the support vector machine is to find the optimal linear decision surface based on the concept of structural risk minimization. It is a binary classification method; the decision surface is a weighted combination of elements of the training dataset. These elements are called support vectors, and they define the boundary between the two classes. In a binary problem the two classes are labelled +1 and -1, and the size of the margin between them should be maximized to obtain the best separating boundary.

The example below explains pattern classification using SVM. In Fig 3(a) two different kinds of patterns are taken for processing, and a line is drawn to separate them. In Fig 3(b) the patterns, presented in a two-dimensional space, are separated by a single line. In the corresponding one-dimensional representation of Fig 3(c), a point can be used to separate the patterns. A plane that separates these patterns in three-dimensional space, shown in Fig 3(d), is called a separating hyperplane. The next task is to select, from the set of such planes, the plane whose margin is maximum. The plane with the maximum margin, i.e. the maximum perpendicular distance from the marginal line, is known as the optimal hyperplane or maximum-margin hyperplane, as shown in Fig 3(f). The patterns that lie on the edges of the margin are called support vectors. While classifying the patterns there may exist some errors in the representation, as shown in Fig 3(g); such errors are handled by a soft margin, and they can sometimes be ignored up to some threshold value. Patterns that can be separated easily by a line or a plane are called linearly separable patterns. Non-linearly separable patterns (Fig 3(j), (k), (l)) are difficult to classify; these patterns are classified by using kernel functions. In order to classify non-linearly separable patterns, the original data are mapped into a higher-dimensional space using a kernel function.
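As a rough, library-based illustration of how these two classifiers are typically applied (not the authors' implementation), the Python sketch below trains one GMM per enrolled speaker and identifies a test utterance by its average log-likelihood, then trains a binary SVM to separate target-speaker frames from impostor frames. The use of scikit-learn and the random stand-in features are assumptions introduced here.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in MFCC-like training features for two enrolled speakers.
feats = {"spk1": rng.normal(0.0, 1.0, (500, 13)),
         "spk2": rng.normal(1.5, 1.0, (500, 13))}

# GMM: one mixture model per speaker, scored by maximum likelihood.
gmms = {spk: GaussianMixture(n_components=8, covariance_type="diag").fit(x)
        for spk, x in feats.items()}
test = rng.normal(1.5, 1.0, (80, 13))                         # unknown test utterance
scores = {spk: gmm.score(test) for spk, gmm in gmms.items()}  # mean log-likelihood
print("identified as:", max(scores, key=scores.get))

# SVM: discriminative binary classifier (+1 target speaker, -1 impostor).
X = np.vstack([feats["spk1"], feats["spk2"]])
y = np.array([1] * 500 + [-1] * 500)
svm = SVC(kernel="rbf").fit(X, y)
decision = svm.decision_function(test).mean()                 # pooled frame decision
print("claim accepted" if decision > 0 else "claim rejected")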
CONCLUSION

In this paper we have explained the speaker recognition system and discussed three major pattern classification techniques: Dynamic Time Warping, the Gaussian mixture model and the Support Vector Machine. SVM works efficiently on fixed-length vectors, and the input data should be normalized before SVM training for better performance. In future work we plan to implement these techniques in a speaker recognition system and evaluate their performance; the performance of the models will also be evaluated while increasing the amount of training data.

REFERENCES

[1] Campbell, J. P., "Speaker Recognition: A Tutorial," Proc. of the IEEE, vol. 85, no. 9, 1997, pp. 1437-1462.
[2] Furui, S., "Recent advances in speaker recognition," Pattern Recognition Letters, vol. 18, no. 9, 1997, pp. 859-872.
[3] Sakoe, H. and Chiba, S., "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, Feb 1978, pp. 43-49.
[4] Lubkin, J. and Cauwenberghs, G., "VLSI Implementation of Fuzzy Adaptive Resonance and Learning Vector Quantization," Int. J. Analog Integrated Circuits and Signal Processing, vol. 30, no. 2, 2002, pp. 149-157.
[5] Reynolds, D. A. and Rose, R. C., "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Trans. Speech Audio Process., vol. 3, 1995, pp. 72-83.
[6] Solera, U. R., Martin-Iglesias, D., Gallardo-Antolin, A., Pelaez-Moreno, C. and Diaz-de-Maria, F., "Robust ASR using Support Vector Machines," Speech Communication, vol. 49, no. 4, 2007.
[7] Temko, A., Monte, E. and Nadeu, C., "Comparison of Sequence Discriminant Support Vector Machines for Acoustic Event Classification," ICASSP 2006 Proceedings, IEEE International Conference, vol. 5, 14-19 May 2006.
[8] Shang, S., Mirabbasi, S. and Saleh, R., "A technique for DC offset removal and carrier phase error compensation in integrated wireless receivers," Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), vol. 1, 25-28 May 2003, pp. I-173 - I-176.
[9] Vergin, R. and O'Shaughnessy, D., "Pre-emphasis and speech recognition," Canadian Conference on Electrical and Computer Engineering, vol. 2, 5-8 Sep 1995.
[10] Davis, S. B. and Mermelstein, P., "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-28, no. 4, 1980.
[11] Furui, S., "Cepstral analysis technique for automatic speaker verification," IEEE Trans. ASSP, vol. 29, 1981, pp. 254-272.

BIOGRAPHIES

Dr. E. Chandra received her B.Sc. from Bharathiar University, Coimbatore in 1992 and her M.Sc. from Avinashilingam University, Coimbatore in 1994. She obtained her M.Phil. in the area of Neural Networks from Bharathiar University in 1999 and her PhD in the area of speech recognition systems from Alagappa University, Karaikudi in 2007. She has a total of 15 years of experience in teaching, including 6 months in industry. Presently she is working as Director, Department of Computer Applications, at D. J. Academy for Managerial Excellence, Coimbatore. She has published more than 30 research papers in National and International Journals and Conferences in India and abroad and has guided more than 20 M.Phil. research scholars; currently 3 M.Phil. scholars and 8 PhD scholars are working under her guidance. She has delivered lectures at various colleges and is a Board of Studies member of various institutions. Her research interests lie in the areas of Data Mining, Artificial Intelligence, Neural Networks, Speech Recognition Systems, Fuzzy Logic and Machine Learning Techniques. She is an active life member of CSI and of the Society of Statistics and Computer Applications, and is currently a Management Committee member of the CSI Coimbatore Chapter.

K. Manikandan received his B.Sc. from Bharathidasan University, Tiruchirappalli in 1998 and his MCA from Bharathidasan University, Tiruchirappalli in 2001. He received his M.Phil. in the area of soft computing from Bharathiar University, Coimbatore in 2004. He has 12 years of experience in teaching. Currently he is working as an Assistant Professor, Department of Computer Science, PSG College of Arts and Science, Coimbatore, and is pursuing a PhD at Bharathiar University, Coimbatore. He has presented research papers in National and International Conferences and published a paper in an International Journal. His research interest is Soft Computing. He is a life member of IAENG and has guided more than 4 M.Phil. research scholars; currently 3 M.Phil. scholars are working under his guidance. He has delivered lectures at various colleges.

M. S. Kalaivani received her BCA from P.S.G. College of Arts and Science, Coimbatore in 2005 and her MCA from the National Institute of Technology, Tiruchirappalli in 2008. She has 4 years of working experience in the software industry. Presently she is working as a Research Scholar, Department of Computer Science, P.S.G. College of Arts and Science, Coimbatore. Her research interests are Machine Learning and Fuzzy Logic.
