List of Articles by Subject: Speech Processing


    • Open Access Article

      1 - Language Model Adaptation Using Dirichlet Class Language Model Based on Part-of-Speech
Ali Hatami, Ahmad Akbari, Babak Nasersharif
      Language modeling has applications in a wide variety of domains, and its performance depends on adaptation to a particular style of data. Accordingly, adaptation methods endeavour to exploit syntactic and semantic characteristics of the language. Previous adaptation methods, such as the Dirichlet class language model (DCLM) family, extract the class of history words; because they lack syntactic information, they are not well suited to morphologically rich languages such as Farsi. In this paper, we propose using syntactic information, namely part-of-speech (POS) tags, in the DCLM and combining it with a language model from the n-gram family. In our work, word clustering is based on the POS of the previous words and on the history words in the DCLM. The performance of the language models is evaluated on the BijanKhan corpus using a hidden Markov model based ASR system. The results show that using POS information along with history words and their classes improves the language model's performance and decreases perplexity on our corpus. By exploiting POS information along with the DCLM, the word error rate of the ASR system decreases by 1.2% relative to the DCLM alone.
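As a rough illustration of the perplexity criterion used to compare the adapted models above, the following sketch computes perplexity from per-word probabilities. All names and numbers here are illustrative toy values, not results from the paper:

```python
import math

def perplexity(probs):
    """Perplexity = exp(-mean log-probability) over a test sequence."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# A model that assigns higher probability to the test data has lower
# perplexity, i.e. it is better adapted to that style of data.
uniform = [0.25] * 4            # uninformed model over a 4-word vocabulary
adapted = [0.5, 0.4, 0.6, 0.5]  # hypothetical better-adapted model
assert perplexity(adapted) < perplexity(uniform)
```

For the uniform model the perplexity equals the vocabulary size (4 here), which is the intuition behind perplexity as an "effective branching factor".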
    • Open Access Article

      2 - Instance Based Sparse Classifier Fusion for Speaker Verification
Mohammad Hasheminejad, Hassan Farsi
      This paper focuses on the problem of ensemble classification for text-independent speaker verification. Ensemble classification is an efficient way to improve the performance of a classification system, as it leverages a set of expert classifiers. A speaker verification system receives an input utterance and an identity claim, then verifies the claim by means of a matching score that measures the resemblance between the input utterance and the pre-enrolled target speakers. Since a speech signal carries a variety of information, state-of-the-art speaker verification systems use a set of complementary classifiers to reach a reliable decision. Such a system receives several scores as input and takes a binary decision: accept or reject the claimed identity. Most recent studies on classifier fusion for speaker verification use a weighted linear combination of the base classifiers, with the weights estimated by logistic regression; further research has added various regularization terms to the logistic regression formulation. However, this type of ensemble classification overlooks two issues: the correlation among the base classifiers, and the superiority of some base classifiers for particular test instances. We address both problems with an instance-based method for classifier ensemble selection and weight determination. Extensive experiments on the NIST 2004 speaker recognition evaluation (SRE) corpus, measured by EER, minDCF and minCLLR, show the effectiveness of the proposed method.
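The weighted linear score fusion with logistic regression that the abstract describes as the common baseline can be sketched as follows. The training routine, scores, and toy trials are all illustrative, not the paper's instance-based method or the NIST SRE data:

```python
import math

def fuse(weights, bias, scores):
    """Fused score = sigmoid(w . s + b); above 0.5 means 'accept'."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, lr=0.5, epochs=200):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(epochs):
        for scores, y in zip(data, labels):
            err = fuse(w, b, scores) - y        # prediction error
            b -= lr * err
            w = [wi - lr * err * si for wi, si in zip(w, scores)]
    return w, b

# Two base classifiers: target trials score high, impostor trials low.
data = [[2.0, 1.5], [1.8, 2.2], [-1.5, -2.0], [-2.2, -1.0]]
labels = [1, 1, 0, 0]
w, b = train(data, labels)
assert fuse(w, b, [2.0, 2.0]) > 0.5    # target trial accepted
assert fuse(w, b, [-2.0, -2.0]) < 0.5  # impostor trial rejected
```

The paper's contribution departs from this baseline by choosing the subset of base classifiers and their weights per test instance rather than globally.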
    • Open Access Article

      3 - Speech Emotion Recognition Based on Fusion Method
Sara Motamed, Saeed Setayeshi, Azam Rabiee, Arash Sharifi
      Emotional speech is one of the quickest and most natural channels in human interaction, which has led researchers to develop speech emotion recognition as a fast and efficient technique for communication between man and machine. This paper introduces a new classification method that applies a multi-constraints partitioning approach to emotional speech signals. To classify emotional speech, feature vectors are extracted using Mel-frequency cepstral coefficients (MFCC), autocorrelation function coefficients (ACFC), and a combination of these two models. The study examines how the number of features and the fusion method affect the emotional speech recognition rate. The proposed model is compared with an MLP-based recognizer. Results reveal that the proposed algorithm has a strong capability to identify human emotion.
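The feature-combination idea above (MFCC plus ACFC per frame) can be sketched minimally: compute a short autocorrelation vector for a frame and concatenate it with another feature vector. The `fuse_features` name, the stand-in MFCC vector, and all sizes are illustrative assumptions, not the paper's exact pipeline:

```python
def acf(frame, lags):
    """Autocorrelation at lags 1..lags, normalized by frame energy."""
    n = len(frame)
    energy = sum(x * x for x in frame) or 1.0
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) / energy
            for k in range(1, lags + 1)]

def fuse_features(mfcc_vec, frame, lags=4):
    """Fusion by concatenation: [MFCC | ACF] for one frame."""
    return list(mfcc_vec) + acf(frame, lags)

frame = [1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0, 0.5]
fake_mfcc = [0.1, 0.2, 0.3]              # stand-in for real MFCCs
feat = fuse_features(fake_mfcc, frame)
assert len(feat) == 3 + 4                # fused feature dimensionality
```

Concatenation is the simplest fusion scheme; the abstract's point is precisely that the choice of fusion method and feature count changes the recognition rate.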
    • Open Access Article

      4 - Long-Term Spectral Pseudo-Entropy (LTSPE): A New Robust Feature for Speech Activity Detection
Mohammad Rasoul Kahrizi, Seyed Jahanshah Kabudian
      Speech activity detection systems are audio classifiers used to recognize, detect, or mark the parts of an audio signal that contain human speech. Applications include speech enhancement, noise cancellation, speaker identification, and reducing the size of audio signals for communication and storage, among many others. Here, a novel robust feature named Long-Term Spectral Pseudo-Entropy (LTSPE) is proposed for speech detection; its purpose is to improve performance in combination with other features while offering high accuracy and acceptable performance on its own. To this end, the proposed method is compared with other recent and well-known methods under two conditions: with a well-known speech enhancement algorithm applied to improve the quality of the audio signals, and without it. The experiments use the MUSAN dataset, which includes a large number of audio signals in the form of music, speech and noise, together with several well-known machine learning methods. Accuracy and error are measured by the F-score and the Equal Error Rate (EER), respectively. Experimental results on MUSAN show that combining the proposed LTSPE feature with other features improves detector performance; moreover, LTSPE achieves higher accuracy and lower error than comparable features.
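The intuition behind a spectral-entropy-style feature is that noise-like frames have flat spectra (high entropy) while speech frames have peaky spectra (low entropy), and averaging over many frames gives a long-term version. The sketch below follows that generic intuition; the exact LTSPE definition is in the paper, and these function names and toy spectra are illustrative:

```python
import math

def spectral_entropy(spectrum):
    """Entropy of the spectrum treated as a probability distribution."""
    total = sum(spectrum) or 1.0
    probs = [s / total for s in spectrum]
    return -sum(p * math.log(p) for p in probs if p > 0)

def long_term_entropy(frames):
    """Entropy of the magnitude spectrum averaged over several frames."""
    n = len(frames)
    avg = [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]
    return spectral_entropy(avg)

flat = [1.0] * 8                                    # noise-like: energy spread evenly
peaky = [8.0, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]   # speech-like: dominant peak
assert spectral_entropy(flat) > spectral_entropy(peaky)
```

A detector can then threshold this value (or feed it to a classifier, as the paper does alongside other features) to separate speech from non-speech regions.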
    • Open Access Article

      5 - A New VAD Algorithm using Sparse Representation in Spectro-Temporal Domain
Mohadese Eshaghi, Farbod Razzazi, Alireza Behrad
      This paper proposes two algorithms for voice activity detection (VAD) based on sparse representation in the spectro-temporal domain. The first algorithm operates in a two-dimensional STRF (spectro-temporal response field) space using sparse representation; dictionaries with different atom sizes and two dictionary learning methods were investigated, and the algorithm achieved good results at high signal-to-noise ratios (SNRs). The second, more elaborate algorithm detects speech using sparse representation in the four-dimensional STRF space. Because of the large volume of this space, it was divided into cubes, and a dictionary was learned for each cube separately using the non-negative matrix factorization (NMF) algorithm. Simulation results illustrate the effectiveness of the new VAD algorithms: performance reached 90.11% and 91.75% at -5 dB SNR in white noise and car noise respectively, outperforming most state-of-the-art VAD algorithms.
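The NMF dictionary learning step mentioned above can be sketched with the classic multiplicative updates: factor a non-negative matrix V (features x frames) into a dictionary W and activations H. Matrix sizes and the toy data here are illustrative, not the STRF cube dimensions from the paper:

```python
import random

def nmf(V, rank, iters=300):
    """Minimal NMF via Lee-Seung multiplicative updates (Frobenius loss)."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[random.random() + 0.1 for _ in range(n)] for _ in range(rank)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def T(A):
        return [list(row) for row in zip(*A)]

    for _ in range(iters):
        WH = matmul(W, H)
        # H <- H * (W^T V) / (W^T W H)   (elementwise)
        num, den = matmul(T(W), V), matmul(T(W), WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + 1e-9)
              for j in range(n)] for i in range(rank)]
        WH = matmul(W, H)
        # W <- W * (V H^T) / (W H H^T)   (elementwise)
        num, den = matmul(V, T(H)), matmul(WH, T(H))
        W = [[W[i][j] * num[i][j] / (den[i][j] + 1e-9)
              for j in range(rank)] for i in range(m)]
    return W, H

V = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0]]  # exactly rank-1 toy data
W, H = nmf(V, rank=1)
err = sum((V[i][j] - W[i][0] * H[0][j]) ** 2
          for i in range(2) for j in range(3))
assert err < 1e-2   # the factorization reconstructs V closely
```

In the paper's setting, one such dictionary W is learned per STRF cube, and the sparse activations H over those dictionaries drive the speech/non-speech decision.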