A Psychoacoustic Model and a Filter Bank Design Using Optimization for Speech Compression

Talbi Mourad, Med S. Bouhlel

Laboratoire des Semi-Conducteurs, Nanostructures et Technologie Avancée, Center of Researches and Technologies of Energy of Borj Cedria, Tunis 952050, Tunisia

Sciences Electroniques, Technologie de l'Information et Télécommunications (SETIT), Sfax BP 11693029, Tunisia

Corresponding Author Email: talbi1969@yahoo.fr

Page: 80-87 | DOI: https://doi.org/10.18280/ama_b.610205

Received: 7 April 2018 | Accepted: 8 June 2018 | Published: 30 June 2018

OPEN ACCESS

Abstract: 

In this paper, we propose a new speech compression technique that combines a psychoacoustic model with a general approach to filter bank design using optimization. The technique is inspired by an audio compression technique that uses a psychoacoustic model and a Modified Discrete Cosine Transform (MDCT) filter bank of 32 filters. In the proposed approach, the MDCT filter bank of 32 filters is replaced by a uniform/non-uniform filter bank designed using optimization. The two techniques are applied to different speech signals and compared by computing the number of bits before and after compression. The simulation results, obtained by computing the compressed file sizes and the Compression Ratios (CR), show that the proposed technique outperforms the MDCT-based one. In terms of perceptual speech quality, the output speech signals of the proposed compression system are of good quality, as supported by the computed SNR (Signal-to-Noise Ratio), PSNR (Peak Signal-to-Noise Ratio), NRMSE (Normalized Root Mean Square Error), and PESQ (Perceptual Evaluation of Speech Quality) values. We also compare the proposed technique with a previous speech compression technique based on the Discrete Wavelet Transform (DWT) and integrating a Voice Activity Detection (VAD) module. This comparison is likewise based on SNR, PSNR, NRMSE, PESQ, and CR, and the results show that the proposed technique outperforms this DWT- and VAD-based technique.
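The objective measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the definitions below are commonly used ones and are assumptions here: SNR as signal energy over error energy, PSNR as squared peak amplitude over mean squared error, NRMSE normalized by the spread of the original signal around its mean, and CR as bits before compression divided by bits after. All function names are hypothetical, and PESQ is omitted because it requires a full perceptual model.

```python
import math


def _error(original, decoded):
    """Sample-wise reconstruction error; both signals must have equal length."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    return [o - d for o, d in zip(original, decoded)]


def snr_db(original, decoded):
    """Assumed definition: 10*log10(signal energy / error energy), in dB."""
    err = _error(original, decoded)
    signal_energy = sum(o * o for o in original)
    noise_energy = sum(e * e for e in err)
    # Raises ZeroDivisionError when the two signals are identical.
    return 10.0 * math.log10(signal_energy / noise_energy)


def psnr_db(original, decoded):
    """Assumed definition: 10*log10(peak^2 / mean squared error), in dB."""
    err = _error(original, decoded)
    mse = sum(e * e for e in err) / len(err)
    peak = max(abs(o) for o in original)
    return 10.0 * math.log10(peak * peak / mse)


def nrmse(original, decoded):
    """Assumed definition: error norm over the spread of the original signal."""
    err = _error(original, decoded)
    mean = sum(original) / len(original)
    spread = sum((o - mean) ** 2 for o in original)
    return math.sqrt(sum(e * e for e in err) / spread)


def compression_ratio(bits_before, bits_after):
    """Assumed definition: bits before compression / bits after compression."""
    return bits_before / bits_after
```

Under these assumed definitions, higher SNR, PSNR, and CR values and a lower NRMSE indicate a better result; other papers may normalize these measures differently.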

Keywords: 

speech compression, psychoacoustic model, Filter Bank, optimization, bits before/bits after compression

1. Introduction
2. The Proposed Technique
3. Results and Discussion
4. Conclusion
Acknowledgment