Journal: AMA_B


Speaker Identification Using Auditory Modelling and Vector Quantization

Konstantina Iliadi | Stefan Bleeck^*

Institute of Sound and Vibration Research, University of Southampton, SO17 1BJ, UK

Corresponding Author Email: bleeck@soton.ac.uk

Received: 21 September 2017

Accepted: 25 September 2017

OPEN ACCESS

Abstract:

This paper presents an experimental evaluation of different features for speaker identification (SID). The features are tested on speech data from the EUROM1 database in a text-independent, closed-set speaker identification task. The main objective is to present a novel parameterization of speech based on an auditory model, the Auditory Image Model (AIM). The model provides features of the speech signal, whose utility is assessed in the context of speaker identification. To explore which features are most informative for predicting a speaker’s identity, the auditory image is cut into rectangles. A novel enrolment strategy then specifies the regions of the image that contain the features that discriminate a speaker. The resulting speaker-specific feature representation is assessed in noisy conditions that simulate a real-world environment, and its performance is compared with that of MFCC features within a Vector Quantization (VQ) classification system. The identification accuracy results suggest that the new parameterization outperforms conventional MFCCs, especially at low SNRs.

Keywords:

Auditory image model, Speaker identification, Feature extraction
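
The closed-set VQ decision rule mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are synthetic stand-ins rather than AIM or MFCC features, plain k-means is assumed for codebook training, white Gaussian noise is assumed for the SNR mixing, and all function names (`train_codebook`, `identify`, `add_noise`, `toy_frames`) are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, size, iters=20, seed=0):
    """Build one speaker's codebook with plain k-means (an assumption)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(frames, size)]
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for v in frames:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in frames) / len(frames)

def identify(frames, codebooks):
    """Closed-set decision: the enrolled speaker with the lowest distortion."""
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB."""
    power = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in signal]

def toy_frames(centre, n, rng, spread=0.3):
    """Synthetic feature frames scattered around a speaker-specific centre."""
    return [[c + rng.gauss(0.0, spread) for c in centre] for _ in range(n)]

rng = random.Random(1)
centres = {"spk_a": [0.0, 0.0, 0.0], "spk_b": [3.0, 3.0, 3.0]}

# Enrolment: one codebook per speaker, trained on that speaker's frames.
codebooks = {spk: train_codebook(toy_frames(c, 200, rng), size=4)
             for spk, c in centres.items()}

# Test: unseen frames from spk_b are assigned to the closest codebook.
decision = identify(toy_frames(centres["spk_b"], 50, rng), codebooks)
print(decision)
```

In a real system the toy frames would be replaced by per-frame feature vectors, and `add_noise` would be applied to the test waveform before feature extraction to probe robustness at a chosen SNR.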

1. Introduction

2. Methodology

3. Speaker Identification in Quiet Conditions

4. Noise-Robust Speaker Identification

5. Conclusion

Acknowledgements

