Title:
Scalable Speech Compression

Presenter:
Jason Lukasiak

Date:
November 29, 2002

Abstract:
Due to both the digitisation of most communication channels and an ever-increasing demand for mobile communication services, the amount of traffic generated by digitised speech signals continues to grow rapidly. To accommodate this increased traffic load within the finite bandwidth available for speech communication channels, it is necessary to develop speech compression algorithms that can dynamically scale to traffic and user demands. These scalable compression algorithms must be capable of dynamically altering the bit rate required for transmission, whilst smoothly and gradually varying the subjective quality of the synthesized speech as the bit rate changes. To further increase the throughput of the communication channel, the scalable algorithm should operate in the lower range of bit rates currently used for speech compression (i.e. 2-8 kbps).

We propose a number of scalable speech coding techniques that lead to the development of a single coding algorithm capable of scalable operation. Firstly, via a thorough review of current literature, the characteristics of existing speech compression algorithms that limit scalable operation between bit rates of 2 and 8 kbps are identified. The major limiting characteristics are a distinct barrier at 4 kbps, below which parametric coders dominate and above which waveform coders dominate, and the large delay requirements of current low rate coding algorithms.

A method that exploits the simultaneous masking property of the human ear in a linear predictive filter is proposed. The proposed method modifies the linear predictive filter so that it removes more of the perceptually important information from the input signal than a standard linear predictive filter does. This characteristic is shown to improve the subjective speech quality of low-rate, linear-prediction-based speech coders.
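
For orientation, the standard linear predictive analysis that the proposed method modifies can be sketched as follows. This is a minimal sketch of the conventional autocorrelation method with the Levinson-Durbin recursion, not the masking-based filter described in the talk; the function names `autocorrelation` and `levinson_durbin` are illustrative choices.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations with the Levinson-Durbin recursion.

    Returns (a, err), where the predictor is
        s_hat[n] = sum(a[k] * s[n - 1 - k] for k in range(order))
    and err is the energy of the prediction residual.
    """
    a = []          # predictor coefficients found so far
    err = r[0]      # residual energy of the order-0 predictor
    for m in range(order):
        # Reflection coefficient for stepping from order m to order m + 1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err
        # Update the existing coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        err *= (1.0 - k_m * k_m)
    return a, err


# Illustrative input (an assumption): a decaying exponential 0.9**n,
# which a first-order predictor with coefficient 0.9 models almost exactly.
frame = [0.9 ** n for n in range(200)]
r = autocorrelation(frame, 2)
a, err = levinson_durbin(r, 2)
```

On this test signal the first coefficient comes out close to 0.9 and the second close to zero, so the residual carries almost none of the signal's predictable structure. A perceptually motivated filter of the kind described above would change how that residual is shaped, which this sketch does not attempt.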

To enable the pitch cycle redundancies of the speech signal to be exploited in the coding algorithm without introducing excessive algorithmic delay, a novel low delay method for segmenting the speech into non-overlapped pitch length subframes is proposed. This method requires only a single frame of speech and locates the pitch pulses by selecting the pulse locations in a closed loop function. The proposed segmentation is shown to produce a much more accurate pitch track in transient sections of the speech signal than the pitch track produced by traditional autocorrelation based pitch detectors. Also, as the pitch length subframes are not overlapped, the segmentation supports closed loop analysis by synthesis modeling of the signal. A number of low delay decomposition techniques are proposed, which decompose the speech into perceptually different components and allow scalable reconstruction of the speech signal. The preferred technique performs the decomposition in a closed loop function, which allows quantisation errors to be accounted for in the decomposition process.
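
The traditional autocorrelation based pitch detector used as the point of comparison above can be sketched as follows. This is a minimal illustration of that baseline only, not the proposed closed loop segmentation; the 8 kHz sampling rate, the 60-400 Hz search range and the name `autocorr_pitch` are assumptions made for the example.

```python
import math


def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the pitch period in samples for one frame.

    Picks the lag, within the allowed pitch range, that maximises the
    normalised correlation between the frame and a shifted copy of itself.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        head = frame[:len(frame) - lag]   # unshifted segment
        tail = frame[lag:]                # segment shifted by `lag`
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        score = num / den if den > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Illustrative input (an assumption): 30 ms of a 100 Hz sine at 8 kHz.
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(240)]
period = autocorr_pitch(frame, fs)
pitch_hz = fs / period
```

A detector of this kind returns one period per analysis window, so it tends to average over changes within the window. That is consistent with the abstract's observation that such detectors track pitch less accurately in transient sections.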

The proposed scalable techniques are combined to produce a scalable algorithm that operates at a range of bit rates from 2 to 8 kbps. The proposed algorithm produces synthesized speech whose subjective quality varies in a perceptually meaningful manner as the operating bit rate is varied. A key feature of the proposed algorithm is the ability to merge from a time asynchronous parametric coder at low rates to a time synchronous waveform coder at higher bit rates. The coder also requires only a single frame of algorithmic delay (30 ms) for operation. Subjective results presented indicate that the scalable coder produces subjective speech quality that is comparable with that achieved by fixed rate standardized coders at each of the tested bit rates.
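
To give a sense of scale, the bit budget implied by these figures can be worked out directly. The sketch below assumes that the frame duration equals the quoted 30 ms algorithmic delay, that 1 kbps means 1000 bits per second, and that no bits are spent on overhead; `bits_per_frame` is an illustrative name.

```python
def bits_per_frame(bit_rate_bps, frame_ms=30):
    """Bits available to encode one frame at the given bit rate."""
    return bit_rate_bps * frame_ms // 1000


for rate in (2000, 4000, 8000):
    print(rate, "bps ->", bits_per_frame(rate), "bits per 30 ms frame")
```

Under those assumptions the coder has 60 bits per frame at 2 kbps, 120 at 4 kbps and 240 at 8 kbps, so scaling across the full range means describing each frame with anywhere from a quarter to all of the largest budget.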

TITR,
University of Wollongong