Audio Engineering Society Singapore Section

PALA '96 - AES Seminar
"Application of Digital Signal Processing (DSP)
in Professional Audio"
Wednesday, 10 July 1996

reported by: Dr. Roland K C Tan
                  Secretary (Term 1995/96)

The AES Singapore Section organised a public forum on the "Applications of Digital Signal Processing (DSP) in Professional Audio" during the recent Pro Audio & Light Asia '96 (PALA '96) exhibition held at the Bay Suite, World Trade Centre, Singapore. It was a comprehensive 3-hour discussion session lasting from 2.00 pm to 5.00 pm on Wednesday, July 10, 1996. 

The panel featured 2 international and 4 regional speakers: David Robinson, Vice-President of the AES International Region and Senior Vice-President of Dolby Laboratories, Inc., USA, who also served as forum moderator; Dr. Roland Tan, Research Fellow in Audio Processing at Nanyang Technological University, Singapore; Craig Todd, Senior Member of the Technical Staff of Dolby Laboratories, Inc., USA; Stephen Low, Lecturer from the Electrical Engineering Department at Ngee Ann Polytechnic, Singapore; Eric Wong, Consulting Engineer of CCW Acoustics, Singapore; and David Burgess, Managing Director, Digital Audio Systems Pacific Ltd, Hong Kong.

It was a highly successful event, judging from the overwhelming response the organisers received from the 127 guests and 10 members who attended. The participants comprised audio R&D scientists and engineers, audio consultants, studio recording artistes, sales & marketing personnel, and academics. They represented local and foreign industries, government bodies, universities & polytechnics, and even the service industries. The organising committee was very pleased with the turnout and outcome of the event. Through this forum, it was hoped that the acronym "DSP" would no longer be technical jargon understood only by a few confined to laboratories or lecture rooms, but rather a common buzzword for everybody involved in the business of audio.

Pictures

(picture) At the Singapore Section's exhibition booth (left to right): Dr. Roland Tan, Stephen Low, Eric Wong, David Robinson and Richt Teo. Photograph by Ms. Leena Goh.

(picture) Mr. David Robinson, Senior Vice-President (Technology), Dolby Labs (USA), and Vice-President, AES International Region 1995/96, acting as the moderator. Photograph by Ms. Leena Goh.

(picture) More than 130 members and guests attended the AES Singapore Public Forum on the Application of Digital Signal Processing (DSP) in Professional Audio. Photograph by Dr. Roland Tan.

Opening Address by David Robinson ... 

Robinson kicked off the session with a brief history of the Audio Engineering Society (AES). Founded in New York in 1948 with 150 members, the AES has a current membership of 11,300 in more than 40 countries world-wide. The aims of the society are to increase and spread scientific and educational knowledge in the field of audio engineering, and to maintain high professional standards among its members. Through its annual International Conventions, Regional Conventions, and Sections' activities, members have many opportunities to exchange ideas. The Singapore Section was formed officially in December 1995 with 20 members and now has a total of 39 members.

The main objective of the forum was to generate interest in DSP among the participants without boring them with very complex mathematical formulas. The forum was divided into three main parts. The first was a revision session for those familiar with the subject, which also served as a good introduction for those new to the field. It was also intended to help the audience better understand and appreciate the many advantages that DSP technologies can offer over older analogue techniques.

The second part of the forum discussed examples where DSP technologies have been applied successfully in pro-audio systems. The final part was a 20-minute question-and-answer session that gave participants the opportunity to exchange ideas.

DSP Primer by Dr. Roland Tan ...
During the first 45 minutes, Tan presented an overview of some fundamental DSP theories and their applications. The aim of digitising an analogue audio system is to increase the quality of recorded sound. Disappointingly, however, "Perfect Sound Forever" simply does not exist: the noise and distortion of analogue systems have been replaced by the quantization noise and distortion of digital systems.

The psychoacoustic behaviour of the human auditory system was described. From the Fletcher-Munson curves shown in Figure 1, any sound below the 0-phon curve, or Minimum Audible Field (MAF) curve, is not audible. This is the absolute threshold of hearing in a quiet environment. The curves also show that the ear is less sensitive to sounds below 1 kHz or above 5 kHz, and most sensitive in the mid-band region between 1 kHz and 5 kHz. The most intense sound we can hear without damaging our ears is about 120 dB. The plots also indicate that sounds are detectable over a dynamic range of more than 100 dB. The current 16-bit digital audio format is therefore theoretically not sufficient, since it can only achieve a dynamic range of around 96 dB.
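The 96 dB figure follows from the ratio between full scale and one quantization step, which is roughly 6.02 dB per bit. A minimal sketch of that arithmetic (the function name is illustrative and not from the talk):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    # 16 bits gives about 96.3 dB, short of the 100+ dB range of hearing
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```

On this measure a 20-bit word would give roughly 120 dB, which is one reason higher resolutions are used for mastering.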

Simultaneous masking is a frequency-domain phenomenon whereby a low-level signal can be made inaudible by a simultaneously occurring stronger signal if the two are close enough to each other in frequency. This produces a masking threshold below which a low-level signal will not be audible, as shown in Figure 2. This implies that audio signals below the masking threshold can be ignored during coding. Besides simultaneous masking, temporal masking effects such as pre- and post-masking also play an important role in human auditory perception. These occur when two sounds appear within a small interval of time. In fact, depending on the individual sound pressure levels, the stronger sound (masker) may mask the weaker one (maskee) even though the maskee precedes the masker!

Theoretically, sampling an audio signal in accordance with the Nyquist sampling theorem introduces no degradation. The theorem simply states that the sampling frequency must be greater than twice the highest frequency present in the analogue audio signal to be sampled. Aliasing distortion is avoided in this case, as shown in Figure 3. It is the subsequent process of quantization that causes a reduction in signal quality.
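To see what happens when the Nyquist condition is violated, the sketch below samples a 30 kHz cosine at 44.1 kHz and shows that its samples are indistinguishable from those of a 14.1 kHz cosine. The frequencies and helper names are assumptions chosen for illustration.

```python
import math

FS = 44100.0  # sampling frequency in Hz (assumed for the example)

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a sampled tone of f_hz appears after sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

def sampled_cosine(f_hz: float, count: int) -> list:
    return [math.cos(2.0 * math.pi * f_hz * n / FS) for n in range(count)]

too_high = sampled_cosine(30000.0, 32)                    # above FS / 2
alias = sampled_cosine(alias_frequency(30000.0, FS), 32)  # 14100 Hz
max_diff = max(abs(a - b) for a, b in zip(too_high, alias))
print(alias_frequency(30000.0, FS), max_diff)
```

This is why an anti-aliasing low-pass filter is placed ahead of the sampler: once the samples are taken, the two tones can no longer be told apart.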

Quantization distortion can be overcome by the use of additive dither noise, as shown in Figure 4. Dither is a low-level white noise signal added to the input before quantization so that the total quantization error behaves like white noise. Digital dither can also be added to a digital signal prior to a re-quantization operation that reduces the number of bits used to represent the signal.

This circumstance arises when an audio signal quantized at 20-bit resolution for high-quality digital mastering must be reduced to 16 bits for storage on CD. Flat broadband dither noise is not subjectively optimum for audio signals. As shown in Figure 5, noise shaping has been used to weight the noise spectrum to match the ear's sensitivity at different frequencies so as to minimise the audibility of the dither noise.
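The effect of dither can be illustrated with a signal smaller than one quantization step. In this minimal sketch a constant level of 0.3 of a step is lost entirely by plain rounding, but survives on average once dither is added before quantization. Triangular (TPDF) dither, the seed, and the sample count are assumptions made for the example; noise shaping is not included.

```python
import random

def quantize(x: float, step: float = 1.0) -> float:
    """Round to the nearest multiple of the quantization step."""
    return step * round(x / step)

def tpdf_dither(rng: random.Random, step: float = 1.0) -> float:
    """Triangular dither spanning +/- one step (sum of two uniform values)."""
    return (rng.random() - 0.5) * step + (rng.random() - 0.5) * step

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # input level, as a fraction of one step
count = 20000

plain_mean = sum(quantize(level) for _ in range(count)) / count
dithered_mean = sum(quantize(level + tpdf_dither(rng)) for _ in range(count)) / count

# Without dither the signal vanishes; with dither its average is preserved
# and the error turns into noise rather than signal-dependent distortion.
print(plain_mean, round(dithered_mean, 2))
```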

On spectral analysis, the Discrete Fourier Transform (DFT) and the Fast Fourier Transform (FFT) algorithms were discussed. Since audio signals do not remain "stationary", and since we do not have infinite time in which to process a signal, short-time analysis must be employed. In short-time analysis, the audio waveform is segmented into short frames of between 10 ms and 30 ms. Each frame is then analysed individually. A frame is basically a block of audio samples within a "window", and is zero outside it. An overview of Finite Impulse Response (FIR) digital filters, shown in Figure 6, and Infinite Impulse Response (IIR) digital filters, shown in Figure 7, was also given, and their respective pros and cons were compared.
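Short-time analysis as described above can be sketched as follows: the signal is cut into 20 ms frames, one frame is multiplied by a window, and its DFT is taken. The 8 kHz sampling rate, the Hann window, and the 1 kHz test tone are assumptions for the example, and a direct DFT is used instead of an FFT to keep the code short.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for the example
FRAME_MS = 20        # frame duration, within the 10 ms to 30 ms range

def split_into_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def hann_window(length):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2, computed directly in O(N^2) time."""
    size = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(frame)))
            for k in range(size // 2 + 1)]

frame_len = SAMPLE_RATE * FRAME_MS // 1000                 # 160 samples
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]                 # 100 ms of a 1 kHz tone
frames = split_into_frames(tone, frame_len)

windowed = [w * x for w, x in zip(hann_window(frame_len), frames[0])]
magnitudes = dft_magnitudes(windowed)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / frame_len
print(len(frames), "frames, spectral peak at", peak_hz, "Hz")
```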

Final implementation of the DSP algorithms can be performed on either programmable DSP chips or hardwired devices such as ASICs and FPGAs. For hardwired DSPs, the computational architecture on which the algorithms will execute is not fixed in advance and is defined by the designer. Programmable DSPs, on the other hand, execute the algorithms as DSP code on a device with a fixed architecture. The main advantage of programmable DSP chips is that they give a fast time-to-market and do not need huge investments in design tools, provided, of course, that production volumes are low and that the application does not have excessive performance requirements. For high-volume applications, dedicated hardwired DSPs are preferred. Moreover, these give the designer complete freedom to make the device as powerful as desired, which is not the case for off-the-shelf programmable DSP chips. Fixed-point and floating-point programmable DSP chips were also compared.

Before concluding, Tan gave an overview of five examples where DSP has been applied in audio: standard low bit-rate codecs (PASC, ATRAC, MPEG, APT-X100, etc.), digital loudspeaker equalisers, digital crossover filters in loudspeaker systems, studio digital audio effects, and Eureka-147 digital audio broadcasting (DAB). DAB is illustrated in Figure 8. Over the next 10 years, DSP technology is expected to progress further and become more readily available, bringing advances to pro-audio systems that are hard to predict today.

Dolby's AC-3 Coder by Craig Todd ... 
During the next 45 minutes, Todd discussed Dolby's AC-3 low bit-rate multi-channel audio coding. Two coding strategies were defined: lossless coding and lossy coding. In lossless coding such as predictive coding, natural redundancy in the signal is removed and the required bit-rate depends only on the signal complexity. Predictive coders such as LPC and ADPCM use past sample values of the audio signal to predict the next sample. In contrast, lossy coding is perceptually based and takes the psychoacoustic characteristics of human hearing into account. Bit-rate is reduced by coding only those aspects of the signal which are of perceptual importance. Signal components which are perceptually irrelevant are discarded.
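The idea of removing redundancy by prediction can be sketched with the simplest possible predictor, which guesses that each sample equals the previous one and transmits only the prediction error. This first-order scheme is an assumption chosen for illustration; it is much simpler than the LPC and ADPCM coders mentioned in the talk.

```python
import math

def encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    residuals = []
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals

def decode(residuals):
    """Rebuild the samples by accumulating the prediction errors."""
    previous = 0
    samples = []
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples

# A slowly varying integer test signal (assumed for the example)
samples = [round(1000 * math.sin(2.0 * math.pi * n / 50)) for n in range(100)]
residuals = encode(samples)

print(max(abs(s) for s in samples), max(abs(r) for r in residuals))
print(decode(residuals) == samples)   # reconstruction is exact
```

Because the residuals span a much smaller range than the samples, they can be stored with fewer bits, yet the decoder recovers the original exactly.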

Dolby's AC-3 is basically a 5.1 multi-channel perceptual audio coder that supports sampling rates of 32, 44.1, and 48 kHz. It is a multi-frequency band coder, shown in Figure 9, that considers psychoacoustic masking in both the time and frequency domains. It uses a transform coding strategy whereby a block of input samples is linearly transformed by a discrete transform algorithm into a set of transform coefficients. The high-resolution transform filterbank makes use of the time-domain aliasing cancellation (TDAC) technique, based on an alternate application of the modified discrete sine transform (MDST) and the modified discrete cosine transform (MDCT). Masking is exploited during periods of high bit demand in order to reduce the overall bit rate.
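Time-domain aliasing cancellation can be demonstrated with a small MDCT filterbank: each block of 2N windowed samples yields only N coefficients, and the aliasing this introduces cancels when overlapping blocks are added back together. The sketch below uses the MDCT alone with a sine window and N = 8; these are assumptions made for illustration and are not the window or block length of AC-3.

```python
import math

def sine_window(half):
    """Window of length 2*half satisfying w[n]^2 + w[n+half]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * half)) for n in range(2 * half)]

def mdct(block, window):
    """2N windowed samples in, N coefficients out."""
    half = len(block) // 2
    shift = 0.5 + half / 2.0
    return [sum(window[n] * block[n] *
                math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs, window):
    """N coefficients in, 2N windowed (still aliased) samples out."""
    half = len(coeffs)
    shift = 0.5 + half / 2.0
    return [(2.0 / half) * window[n] *
            sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for k in range(half))
            for n in range(2 * half)]

def analyse_and_synthesise(signal, half):
    """Transform 50%-overlapping blocks, then overlap-add the inverse blocks."""
    window = sine_window(half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    while len(padded) % half:
        padded.append(0.0)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = padded[start:start + 2 * half]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            output[start + n] += value   # aliasing cancels in the overlap
    return output[half:half + len(signal)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
rebuilt = analyse_and_synthesise(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(max_error)
```

No quantization is applied here, so the output matches the input to within rounding error; in a real coder the coefficients would be quantized according to the bit allocation.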

The assignment of bits to individual bands is based on a psychoacoustic model of human hearing, as shown in Figure 10. A hybrid backward/forward adaptive bit allocation algorithm is used, which reduces the bit rate needed to transmit bit allocation side information. Here, the bit allocation calculations are done in both the encoder and the decoder. Certain parameters can be transmitted to the decoder so that the encoder can change the psychoacoustic model or bit allocation routine used in the decoder. Of course, the final bit allocations in the encoder and decoder still remain the same.
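The general principle of perceptual bit allocation can be sketched with a simple greedy loop: each extra bit lowers quantization noise in a band by about 6 dB, so the next bit always goes to the band whose noise is currently furthest above its masking threshold. This is a generic illustration under assumed signal-to-mask ratios and is not the hybrid backward/forward algorithm used in AC-3.

```python
def allocate_bits(signal_to_mask_db, total_bits, db_per_bit=6.02):
    """Greedily hand out bits to the band with the worst noise-to-mask ratio."""
    bits = [0] * len(signal_to_mask_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio of each band given the bits it already has
        noise_to_mask = [smr - db_per_bit * b
                         for smr, b in zip(signal_to_mask_db, bits)]
        worst = max(range(len(bits)), key=noise_to_mask.__getitem__)
        bits[worst] += 1
    return bits

# Hypothetical signal-to-mask ratios (dB) for four bands; the third band
# lies below the masking threshold and should receive no bits.
band_smr_db = [30.0, 12.0, -5.0, 20.0]
allocation = allocate_bits(band_smr_db, total_bits=8)
print(allocation)   # [4, 1, 0, 3]
```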

There are several important factors that must be addressed in coder design in order to make a system genuinely useful to consumers. These consumer features played a large part in the design of the AC-3 coder. In downmixing, for example, an AC-3 decoder can produce the downmix needed by each listener, from multi-channel sound to mono, stereo, or even Dolby Surround format. Loudness uniformity is another important factor, so that undesirable level variations between different programmes are eliminated. AC-3 uses the dialogue level as the metric for establishing loudness uniformity. An indication of the dialogue level is included in the bit stream, allowing levels to be normalised at the decoder. A means of controlling the dynamic range is also necessary, since all coding systems can easily deliver extreme amounts of dynamic range. In addition, the dynamic range needed varies between listeners, for example between home theatre enthusiasts and portable TV users.
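Loudness normalisation from a transmitted dialogue level amounts to a simple gain calculation at the decoder. In the sketch below each programme carries its dialogue level in dB relative to full scale, and the decoder applies whatever gain brings that level to a common reference. The reference level of -31 dB and the programme values are assumptions for the example, not figures from the talk.

```python
def normalisation_gain_db(dialogue_level_db, reference_level_db=-31.0):
    """Gain in dB that moves the programme's dialogue level to the reference."""
    return reference_level_db - dialogue_level_db

def db_to_linear(gain_db):
    return 10.0 ** (gain_db / 20.0)

def normalise(samples, dialogue_level_db, reference_level_db=-31.0):
    """Scale decoded samples so that dialogue plays at the reference level."""
    gain = db_to_linear(normalisation_gain_db(dialogue_level_db, reference_level_db))
    return [gain * sample for sample in samples]

# Two hypothetical programmes with different dialogue levels
film_gain_db = normalisation_gain_db(-27.0)     # quiet film dialogue
advert_gain_db = normalisation_gain_db(-15.0)   # loud commercial
print(film_gain_db, advert_gain_db)             # -4.0 -16.0
```

The louder programme is attenuated more, so dialogue is reproduced at the same level when the listener switches between the two.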

Coders such as the MPEG Layer II and Dolby AC-3 codecs are becoming popular in the market because they offer a good balance of performance and cost. New coders, such as the MPEG-NBC codec, are still being developed. Todd concluded by listing the four most important points to consider when choosing a coder: performance, cost, practical consumer features, and inter-operability.

Loudspeaker Non-Linear Error Correction by Stephen Low ...
In the 20-minute session that followed, Low introduced the effects of non-linearity in loudspeaker systems and discussed how these errors can be corrected using DSP technologies. Non-linearity shows up as harmonic distortion products generated at the output, in addition to the fundamental component, when a single-tone input signal is used as a test. In addition, the magnitude of the distortion products bears no direct relation to the magnitude of the input signal.

Using the moving-coil loudspeaker shown in Figure 12 as an illustration, three main contributions to its non-linearities can be identified. Firstly, non-linearities arise in the motor and magnet system, caused by variations in the Bl product with cone displacement, changes in the voice coil inductance and induced emf with cone excursion, and heating effects. Secondly, there are mechanical non-linearities due to the suspension stiffness of the loudspeaker spider and outer rim; mechanical clipping, compression, and hysteresis effects of the voice coil; the cone material; and the design of the closed-box system. The third contribution comes from the Doppler distortion effect and non-ideal room acoustical conditions.

In loudspeaker error correction using DSPs, the non-linearities of the mechanical system and of the motor and magnet system are measured. A digital "mirror" filter is then created to pre-distort the input signal, in such a manner that real-time changes in the non-linear behaviour of the loudspeaker can be tracked and fed back to change the properties of the filter. As shown in Figure 13, an artificial neural network (ANN) algorithm can be used to follow the non-linear behaviour of the loudspeaker and generate the "mirror" filter corrections. The combination of the pre-distorted input signal with the response of the actual non-linear loudspeaker then produces a linear output without the undesirable distortions.
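The "mirror" filter idea can be shown in its simplest form with a static, memoryless loudspeaker model. In the sketch below the loudspeaker is assumed to compress large signals according to a cubic law, and the pre-distorter numerically inverts that law so that the cascade is linear. The model, its coefficient, and the bisection search are assumptions for illustration; the adaptive ANN tracking described in the talk is not included.

```python
def loudspeaker(drive, compression=0.2):
    """Hypothetical memoryless model: output compresses as the drive grows."""
    return drive - compression * drive ** 3

def predistort(target, compression=0.2, iterations=60):
    """Find the drive in [-1, 1] whose loudspeaker output equals the target.

    The model is monotonic on this interval, so bisection converges.
    Targets must lie within the model's output range of about +/-0.8.
    """
    low, high = -1.0, 1.0
    for _ in range(iterations):
        middle = (low + high) / 2.0
        if loudspeaker(middle, compression) < target:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0

targets = [-0.7, -0.3, 0.0, 0.3, 0.7]
uncorrected = [loudspeaker(t) for t in targets]
corrected = [loudspeaker(predistort(t)) for t in targets]
worst_error = max(abs(c - t) for c, t in zip(corrected, targets))
print(uncorrected)
print(corrected, worst_error)
```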

DSP in Sound Reinforcement by Eric Wong and David Burgess ...
During the next 30 minutes, Wong and Burgess talked about small-scale and large-scale applications of DSP in sound reinforcement. Using DSP instead of analogue technology in sound reinforcement takes up less rack space, allows multiple configurations and settings, and provides better security. However, it is generally difficult, if not impossible, to swap equipment during a live performance. It is also difficult to check whether the settings have been tampered with and whether each function in the DSP is working as desired.

A small-scale application, as shown in Figure 14, simply has equipment in a one-rack unit that provides the following functions: equaliser, signal delay, crossover, compressor/limiter, multiple configurations, and remote control. As shown in Figure 15, a large-scale application is basically very similar except that it has a higher number of input and output connections. Systems can also be linked in a network environment, with recorded material stored on a common database.

A future voice announcement control system (ACS), shown in Figure 16, installed at an airport for example, would be interconnected with the flight information display system (FIDS) and other communication systems. This is illustrated in Figure 17. The user-defined programmable software incorporated provides greater control and flexibility in the use of DSP power. Dynamic noise sensing devices can also be found in large-scale applications, automatically adjusting the level of the output speakers according to the environmental noise level.

Finally ... 
The AES Singapore Section would like to take this opportunity to thank its main sponsor, Mr. Anthony Chan, Regional General Manager of IIR Exhibitions Pte Ltd, Singapore. Thanks also go to Ms. Darell Lee, Marketing Manager; Ms. Tan Seok Hoon and Ms. Rosalind Ng, Senior Project Managers; and Ms. Lyn Chew, Project Co-ordinator, for extending their kind invitation to the section and for their excellent support.


Copyright 1996 AES Singapore Section