
SUB: MULTIMEDIA TECHNOLOGY Class: III BSc., (CS&IT)

Unit – 3
1) Acoustics, Nature of Sound Waves, Fundamental Characteristics of Sound
2) Microphone, Amplifier, Loudspeaker, Audio Mixer, Digital Audio, Synthesizers
3) MIDI, Basics of Staff Notation, Sound Card, Audio Transmission
4) Audio File Formats & CODECs, Audio Recording Systems, Audio and Multimedia
5) Voice Recognition and Response, Audio Processing Software

ACOUSTICS
Sound is a form of energy similar to heat and light. Sound is generated by
vibrating objects and can flow through a material medium from one place to another.
During generation, the kinetic energy of the vibrating body is converted into sound energy.

Acoustics is the branch of science dealing with the study of sound; it is
concerned with the generation, transmission, and reception of sound waves.
There are several branches:

i) Aero-acoustics: Concerned with how gas flows produce sound; it has applications
in aeronautics.
ii) Architectural acoustics: Concerned with the study of sound in buildings like halls and
auditoriums.
iii) Bioacoustics: Concerned with sounds made by animals like elephants,
whales, bats, etc.
iv) Bio-medical acoustics: Concerned with the study of sound in medicine. Ex: ultrasonography.
v) Psycho-acoustics: Concerned with the hearing, perception, and localization of
sound by human beings.
vi) Physical acoustics: Concerned with the interaction of sound with materials and fluids.
Ex: how shock waves travel through the solid and liquid portions of the earth's
crust.
vii) Speech communication: Concerned with the production, analysis, transmission,
and recognition of speech.
viii) Ultrasonics: Concerned with the study of high-frequency sounds beyond the human
hearing range.
ix) Musical acoustics: Concerned with the study of sound in relation to musical
instruments.
NATURE OF SOUND WAVES
A sound wave is an alternation in pressure, particle displacement, or particle velocity
propagated in an elastic material. As sound energy propagates through the material
medium, it sets up alternate regions of compression and rarefaction by shifting the
particles of the medium, e.g., the molecules of air.

There are two types of waves: 1) Longitudinal 2) Mechanical

1) Longitudinal: the direction of propagation of the wave is the same as the direction
along which the particles of the medium oscillate.

2) Mechanical: the medium is compressed and expanded like a spring. On compression,
the frequency of sound increases and it appears high-pitched to our ears, while on
expansion the frequency decreases, making it appear more dull and flat.

FUNDAMENTAL CHARACTERISTICS OF SOUND

Sounds that we hear can be broadly classified into three categories: speech,
music, and environmental sounds. Speech is anything uttered by a human being,
generated by the human voice box. Music originates from musical instruments
like the guitar, flute, and violin. A sound wave is associated with the following four
physical characteristics: amplitude, frequency, waveform, and speed.

i) Amplitude: The amplitude of a wave is the maximum displacement of a particle in
the path of the wave, i.e., the peak height of the wave. Subsequently, we will use the
term "amplitude" to mean particle displacement.

ii) Frequency: This measures the number of vibrations of a particle in the path of a
wave in one second. The physical manifestation of the frequency of a sound wave is
the pitch of the sound.


iii) Waveform: It indicates the actual shape of the wave when represented pictorially.
The shape can be sinusoidal, square, triangular, etc. The physical manifestation of
waveform is the quality or timbre of the sound.

iv) Speed: A sound wave is also characterized by its speed. The speed of sound
depends on the medium through which the sound travels and the temperature of the
medium, but not on the pressure. The speed is about 340 m/s in air and 1500 m/s in
water.

COMPONENTS OF AN AUDIO SYSTEM (Microphone, Amplifier, Loudspeaker,
and Audio Mixer)

The microphone converts environmental sound into electrical form, i.e., it converts
sound energy into electrical energy. Once converted, the electrical signals can be
recorded onto a magnetic material such as audiotape in an audio recording system.
The speaker functions just opposite to the microphone, i.e., it converts electrical
energy back into sound energy. Sometimes the speaker can also amplify the
electrical signals before converting them, producing a louder sound.

1) MICROPHONE
A microphone records sound by converting the acoustic energy into electrical
energy.
Based on the constructional features,
microphones may be of two types:
 Moving coil type
 Condenser type
Based on the functional features,
microphones may be divided into three types:
1. Omni-directional
2. Bi-directional
3. Uni-directional


i) Moving Coil Type: A moving coil microphone consists of a thin metallic or rubber
sheet called a diaphragm, which is flexible and free to vibrate. An attached coil of
wire is mounted close to and touching the diaphragm, and a magnet produces a
magnetic field that surrounds the coil. As sound impinges on the diaphragm, causing
it to vibrate, it also causes movement of the coil within the magnetic field, which
induces an electrical signal.

ii) Condenser Type: In this type, the diaphragm is actually one plate of a capacitor.
Sound incident on the diaphragm moves the plate, thereby changing the capacitance
and generating a change in the voltage V, since C = Q/V, where Q is the charge on
the capacitor and V is the potential difference between the plates.

 An omni-directional microphone is equally sensitive to sounds coming from all
directions.
 A bi-directional microphone is sensitive to sounds coming from two directions:
the front and the rear.
 A uni-directional microphone is designed to record sound from a single source,
e.g., a single individual speaking.

A polar plot of a microphone is a graph plotting the output level of the
microphone against the angle at which the incident sound was produced. For a
uni-directional microphone the polar plot is heart-shaped, due to which such
microphones are also known as cardioid microphones.
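The heart shape can be computed from the standard cardioid pattern, output proportional to (1 + cos(theta))/2, which is full at the front (0 degrees) and zero at the rear (180 degrees). This formula is a textbook convention, not given in the text above; a minimal sketch:

import numpy as np

# Cardioid polar response: full output at 0 deg (front),
# zero output at 180 deg (rear).
angles = np.radians([0, 45, 90, 135, 180])
level = (1 + np.cos(angles)) / 2
for a, g in zip(angles, level):
    print(f"{np.degrees(a):5.0f} deg -> {g:.2f}")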


2) AMPLIFIER
The amplifier takes power from a power supply and uses the energy to produce
an output electrical signal which has the same shape as the input signal but is larger
in amplitude.
Amplifier circuits are designated as A, B, AB, and C for analogue designs and D
and E for digital designs. Five classes are described here:
i) Class-A:
Class-A amplifiers use 100% of the input cycle for generating the
output. They are not very efficient (a theoretical maximum of 50% efficiency is
obtained) and are usually used for small signal levels.
ii) Class-B:
Class-B amplifiers only use half of the input cycle for amplification.
Though they produce a large amount of distortion, these amplifiers are more efficient
than class-A because the amplifying element is switched off during half of the cycle.
iii) Class-C:
Class-C amplifiers use less than half of the input cycle for amplification.
Though they produce a huge amount of distortion, they are the most efficient. One way
of reducing distortion is to introduce negative feedback.
iv) Class-D:
Class-D digital amplifiers use a series of transistors as switches. The
input signal is sampled and converted to digital pulses using an ADC.
v) Class-E:
Class-E digital amplifiers use Pulse Width Modulation (PWM) to
produce output pulses whose widths are proportional to the desired amplitudes.
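As an illustration of the PWM idea just described, a minimal Python sketch (the carrier resolution and the test tone are illustrative assumptions, not part of the text):

import numpy as np

def pwm_encode(samples, carrier_steps=16):
    # Each input sample in [0, 1] is compared against a rising ramp;
    # the output stays 1 while the sample exceeds the ramp, so the
    # pulse width is proportional to the sample's amplitude.
    ramp = np.linspace(0.0, 1.0, carrier_steps, endpoint=False)
    return (samples[:, None] > ramp[None, :]).astype(np.uint8).ravel()

# A 1 kHz test tone, rescaled from [-1, 1] to [0, 1] before encoding.
t = np.linspace(0, 0.002, 100)
tone = 0.5 * (1.0 + np.sin(2 * np.pi * 1000 * t))
pulses = pwm_encode(tone)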


3) LOUDSPEAKER
A loudspeaker is a device that converts electrical energy back to acoustic energy
and therefore functions just opposite to a microphone. A loudspeaker is divided into
smaller units, each of which is tuned for a small frequency range through the use of
filters known as crossover circuits; the appropriate frequencies are filtered from the
input signal and fed to the corresponding units (a crossover filtering sketch follows
the list below).

These units are the following:


 Woofers handle low frequencies, ranging from 20 Hz to 400 Hz. Such low
frequency sounds are known as bass.
 Mid-range speakers are designed to handle middle frequency ranges
between 400 Hz and 4 kHz.
 Tweeters are designed to handle high frequency ranges between 4 kHz and
20 kHz. Such high-frequency sounds are known as treble.
 Modern speaker systems often include a subwoofer for handling the very
low frequencies below 100Hz.
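A minimal digital sketch of splitting a signal into the three bands listed above using Butterworth filters (scipy, the filter order, and the digital approach are illustrative assumptions; the crossover circuits described in the text are analog filters):

import numpy as np
from scipy.signal import butter, sosfilt

fs = 44100  # sampling rate in Hz

# Band edges follow the woofer / mid-range / tweeter ranges above.
low_sos = butter(4, 400, btype="lowpass", fs=fs, output="sos")
mid_sos = butter(4, [400, 4000], btype="bandpass", fs=fs, output="sos")
high_sos = butter(4, 4000, btype="highpass", fs=fs, output="sos")

signal = np.random.randn(fs)  # one second of test noise
low, mid, high = (sosfilt(s, signal) for s in (low_sos, mid_sos, high_sos))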

4) AUDIO MIXER
In professional studios, multiple microphones may be used to record multiple
tracks of sound at a time, e.g., recording the performance of an orchestra. Controls
are also provided for adjusting the overall volume and tempo of the audio, as well
as for providing special effects like chorus, echo, reverb, and panning.

DIGITAL AUDIO
An analog quantity is converted to digital form through the processes of sampling,
quantization, and code word generation. By the Nyquist sampling theorem, the
sampling frequency F needs to be at least twice the highest input frequency f: F = 2f.
Since human hearing ranges from 20 Hz to 20 kHz, high-quality audio is sampled in
the 44 to 48 kHz range, typically at 44.1 kHz.

If f is the sampling frequency (samples/second), b the bit depth (bits/sample), and c
the number of channels, then the data rate D of the digitized audio signal, i.e., the
amount of data flowing per second in bits/second, is D = f.b.c. If T is the duration of
the audio in seconds, the total number of samples is N = f.T and the file size S of the
audio in bits is S = D.T = f.b.c.T.

Example: 30 seconds of audio sampled at 44.1 kHz with 16 bits per sample in stereo
gives S = 44100 x 16 x 2 x 30 = 42,336,000 bits. Converting to bytes gives
(42336000/8) = 5,292,000 bytes, which is equal to (5292000/1024), approximately 5168 KB.
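A minimal sketch of the file-size calculation above:

def audio_file_size(f, b, c, t):
    # f: sampling frequency (Hz), b: bit depth (bits/sample),
    # c: number of channels, t: duration (seconds).
    bits = f * b * c * t           # S = f.b.c.T
    return bits, bits / 8 / 1024   # bits -> bytes -> KB

bits, kb = audio_file_size(44100, 16, 2, 30)
print(bits, round(kb))  # 42336000 bits, about 5168 KB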
Aliasing - It is a consequence of violating the sampling theorem: the highest audio
frequency in a sampling system must be less than or equal to the Nyquist frequency
(half the sampling frequency).
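A small demonstration of aliasing (the tone frequencies are illustrative): a 30 kHz tone sampled at 44.1 kHz produces exactly the same samples as a 14.1 kHz tone, since 44100 - 30000 = 14100.

import numpy as np

F = 44100                  # sampling frequency in Hz
n = np.arange(64)          # sample indices
orig = np.cos(2 * np.pi * 30000 * n / F)   # above Nyquist (22.05 kHz)
alias = np.cos(2 * np.pi * 14100 * n / F)  # the alias inside the band
print(np.allclose(orig, alias))  # True: the tones are indistinguishable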
Streaming Audio - It is used for playing audio files over the internet without waiting
for a complete download. The music begins to play as soon as a buffer memory on the
receiving device fills up, while the remaining portion of the audio continues to be
downloaded in the background.
SYNTHESIZERS
Synthesizers are electronic instruments that allow us to generate digital samples
of the sounds of various instruments synthetically. There are many types; some of
them are:
1) FM Synthesizer
FM synthesizers generate sound by combining elementary sinusoidal tones to
build up a note having the desired waveform.

2) Wavetable Synthesizer
It produces sound by retrieving high-quality digital recordings of actual
instruments from memory and playing them on demand. Modern synthesizers are
generally of the wavetable type.

3) Polyphony
It refers to the ability to play more than one note at a time. Polyphony is
generally measured or specified as a number of notes or voices. Most of the early
music synthesizers were monophonic, meaning that they could only play one note at a
time.
4) Multi-Timbral
A synthesizer is said to be multi-timbral if it is capable of producing two or
more different instrument sounds simultaneously.

Musical Instrument Digital Interface (MIDI)

The Musical Instrument Digital Interface is a protocol, or set of rules, for
connecting digital synthesizers to a PC.
MIDI Hardware - MIDI makes use of a special five-conductor cable to connect the
synthesizer ports. The adapter has on one side the familiar 25-pin PC serial connector.


MIDI Connections
The MIDI data stream is usually originated by a MIDI controller or a MIDI
sequencer. A MIDI controller is a device that is played as an instrument, like a
keyboard.
a) The instructions might be temporarily stored in another device called a
MIDI sequencer, which allows MIDI data sequences to be captured, stored, edited,
combined, and replayed.
b) Each sound module can be configured to play a specific part of the music,
e.g., a specific instrument sound like drums or piano.


The figure depicts a PC-based MIDI system, where the music composition is done
using software instead of a keyboard.
MIDI Messages
MIDI-based instructions are called messages. MIDI messages constitute an
entire music description language in binary form. The single physical MIDI channel
is divided into 16 logical channels by the inclusion of a 4-bit channel number. The
messages carry information on which instruments to play on which channel and how
to play them. Each message consists of two or three bytes: the first is the status byte,
which contains the operation to be performed and the channel number to be affected.
The remaining bytes, called data bytes, provide additional parameters on how to
perform the indicated operation. MIDI messages are classified into two types:
1) Channel Messages 2) System Messages
1) Channel Messages
Channel messages apply to a specific channel, and the channel number is
included in the status byte of these messages. They may be further classified as either
channel voice messages or channel mode messages.
i) Channel Voice Messages
These carry musical performance data, and these messages comprise most of
the traffic in a typical MIDI data stream. The messages in this category include the
following (a sketch of how such messages are assembled in bytes follows the list):
a) Note On: In a MIDI system, the activation of a particular note and the release of the
same note are considered two separate events.
b) Note Off: When the key is released, the keyboard instrument or controller will send
a Note Off message, which also includes data bytes for the key number and for the
velocity with which the key was released.
c) After Touch: Some MIDI keyboard instruments have the ability to sense the
amount of pressure being applied to the keys while they are depressed.
d) Pitch Bend: The pitch bend change message is normally sent from a keyboard
instrument in response to changes in the position of the pitch bend wheel.
e) Program Change: It tells the synthesizer which patch number (sound) should be
used for a particular MIDI channel.
f) Control Change: It is used to control a wide variety of functions in a synthesizer.
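A minimal sketch assembling Note On and Note Off messages from the status byte and two data bytes described above (0x9 and 0x8 are the standard MIDI opcode nibbles for these two events):

def note_on(channel, key, velocity):
    # Status byte = opcode nibble 0x9 plus the 4-bit channel number,
    # followed by two data bytes: key number and velocity.
    return bytes([0x90 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

def note_off(channel, key, velocity):
    # Same layout with opcode nibble 0x8.
    return bytes([0x80 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

# Middle C (key 60) struck on channel 0 at velocity 100, then released.
print(note_on(0, 60, 100).hex())   # 903c64
print(note_off(0, 60, 64).hex())   # 803c40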


ii) Channel Mode Messages
These affect the way a synthesizer responds to MIDI data. For example,
controller number 121 is used to reset all controllers.
2) System Messages
System messages are not channel specific, and no channel number is indicated
in their status bytes. They are classified as system common messages and system
real-time messages.
System Common Messages - These include the song select message, which can store
and recall a number of different songs, and the song position pointer, which is used to
start playback of a song at some point other than the beginning.
System Real-time Messages - These are used to synchronize all MIDI clock-based
equipment within a system, such as sequencers and drum machines, and include the
timing clock message.
MIDI File Format:
The MIDI specifications made provisions to save synthesized audio in a
separate file format called MIDI, having the extension .MID.
General MIDI Specifications
General MIDI defines a standard patch map that should be used by all conforming
instruments, for example:
PC# Family
1-8 Piano
9-16 Chromatic Percussion
17-24 Organ
25-32 Guitar
BASICS OF STAFF NOTATION

A staff is simply five horizontal parallel lines on which we write music. It does
nothing by itself.
Fig: Staff

A note is an open dot, or a letter O turned on its side. It can have a tail and/or flag
depending on its value. A note represents a single tone, and a series of notes represents
a melody which we play on an instrument. Notes are written on a staff to record a
melody on paper.


Clefs provide the reference point needed to know what the notes on the staff mean.
There are three types: G, F, and C. The G clef is the most common and is also known
as the treble clef. The G clef wraps around the G line, the second line from the bottom
of the staff.
Fig: Clef

There are 12 notes in western music. Seven are labelled using the first seven
letters of the alphabet. The remaining five notes are labelled by modifying one of these
letters by adding a sharp (#) or flat (b) to it. The twelve notes of the western scale are
C, C#/Db, D, D#/Eb, E, F, F#/Gb, G, G#/Ab, A, A#/Bb, and B. After that they
keep repeating (C, C#/Db, etc.).
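The notes repeat in octaves; as a side note (equal temperament and the A4 = 440 Hz tuning reference are standard conventions, not stated in the text), each semitone step multiplies the frequency by the twelfth root of 2:

NOTES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
         "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def note_frequency(name, octave, a4=440.0):
    # Count semitones away from the A4 = 440 Hz reference, then
    # scale by 2**(semitones/12) for 12-tone equal temperament.
    semitones = (octave - 4) * 12 + NOTES.index(name) - NOTES.index("A")
    return a4 * 2 ** (semitones / 12)

print(round(note_frequency("C", 4), 2))  # middle C, about 261.63 Hz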

Sharp (#) refers to the key directly next to and above; flat (b) refers to the key directly
next to and below. For example, the black key between C and D is called C-sharp (C#)
and/or D-flat (Db). Writing black keys on the staff is easy: simply put the flat (b) or
sharp (#) before the note. A note that is neither sharp nor flat is called natural. When
referring to note A, it is assumed that you mean A natural.
Music is made up of short and long notes. We measure the length of a note by
counting. The combination of notes into patterns is called rhythm. Silence, or rest, is
also a part of music. The time signature denotes how many beats per measure and
what note gets one beat. It is composed of two numbers written in fraction form: the
top number indicates how many beats per measure, and the bottom tells us what type
of note gets one beat.

There are two things to note:

1) the repetition of the notes on the keyboard, and
2) the long names for the little black keys.
Each note on the keyboard has its own place on the staff. The G note lies on the
G line, the A note lies in the space above, and F lies in the space below. For Middle C
we draw a small line called a leger line to extend the staff and draw the note on it. The
lines of the staff from bottom to top are E, G, B, D, F, and the spaces of the staff are
F, A, C, E.
We have introduced 7 tones of
the western twelve-tone scale.
The other 5 tones, the black
keys, as seen before, have long
names because of their position
on the keyboard. A black key
lies between two white keys; it
gets two names because we have
a choice of what to call it.

Fig: Notes on the staff

The time signature is placed after the key signature. Notes written together on the
same beat are to be played, or sung, together. The key signature is a localized place
for all the sharps or flats, depending on the scale, appearing in the score. Bars denote
measures of appropriate length as determined by the time signature.

SOUND CARD
The sound card is an expansion board in a multimedia PC that interfaces with
the CPU via slots on the motherboard. A well-known example is the SoundBlaster
from Creative Labs.


Basic Components of a sound card:

i) Memory Bank - This denotes the local memory of the sound card for storing audio
data during digitization and playback of sound files.
ii) DSP - A multipurpose Digital Signal Processor is the main controller in the sound
card.
iii) DAC/ADC - The digital-to-analog and analog-to-digital converters for digitizing
analog sound and reconverting digital sound.
iv) Synthesizer Chip - A MIDI synthesizer chip is necessary to play MIDI
sequences recorded onto the disk or input from an external synthesizer.
v) CD Interface - This is the internal connection between the CD drive of the PC and
the sound card.
vi) ISA Connector - Interface for exchanging audio data between the CPU and the
sound card.
I/O Ports: A typical sound card contains the following I/O ports:

Line Out - Output port for attaching speakers for playback of sound files.
Line In - Input port for feeding audio data to the sound card, e.g., through a
microphone connected to it.

MIDI - MIDI port for interfacing with an external synthesizer; present in some cards
like the SoundBlaster 16.

Color Coding - PC 97, PC 98, PC 99, and PC 2001 are a series of hardware
system specifications and recommendations compiled by Microsoft. They specify:
 Microphone input - pink
 Line In - light blue
 Line Out - lime green
 Connector type - 3.5 mm TRS

Handling Audio Files

WAV Files - The signals go to an ADC chip that converts the analog signal to digital
data. The DSP then sends the data to the PC's main processor, which in turn sends the
data to the hard disk to be stored.
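To illustrate this digitize-and-store path, a minimal sketch that writes an uncompressed WAV file with Python's standard wave module (the 440 Hz test tone and the parameter choices are illustrative):

import math
import struct
import wave

RATE, BITS = 44100, 16        # digitization parameters

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(BITS // 8)
    wav.setframerate(RATE)
    for n in range(RATE):     # one second of audio
        # A 440 Hz sine scaled to the 16-bit signed range.
        sample = int(32767 * math.sin(2 * math.pi * 440 * n / RATE))
        wav.writeframes(struct.pack("<h", sample))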

MIDI Files - To handle MIDI files, the sound card requires a synthesizer chip that
can recognize MIDI instructions and produce the corresponding sounds. MIDI files
contain instructions rather than sampled audio and can be created using appropriate
software, e.g., Midisoft Recording Session.

AUDIO TRANSMISSION
To convey digital audio in real time between different hardware devices, there
must be both a data communications channel and common clock synchronization. In
addition, the interconnection requires an audio format recognized by both the
transmitting and receiving devices. Some important interfaces are:
i) AES/EBU Standard: The Audio Engineering Society/European Broadcasting
Union standard carries digital audio signals between devices and components. It was
published in 1992 and subsequently revised a number of times. It is the same as part 4
of the IEC (International Electrotechnical Commission) standard 60958 and is
officially known as AES3. It specifies the format for serial digital transmission of two
channels of periodically sampled and uniformly quantized audio signals on a single
twisted-pair wire.


The transmission rate is such that samples of audio data, one from each channel, are
transmitted in time-division multiplex in one sample period. Three types of connectors
are defined:
Type I: Balanced 3-conductor, 110-ohm twisted-pair cabling with an XLR
connector.
Type II: Unbalanced 2-conductor, 75-ohm coaxial cable with an RCA connector.
Type III: Optical fiber with a TOSLINK connector.
NOTE: The most common connector is the 3-pin XLR.
ii) S/PDIF Standard: The Sony/Philips Digital Interconnect Format is a standard for
transmission of digital audio signals between devices and components. It was
developed from the AES/EBU standard used in DAT systems. Data is sent as a
stream of 192 words for each channel alternately, each word composed of 32 bits.
A set of words for each sample in each channel is called a data frame, identified by a
code in the first four bits of each word. The word format provides support for SCMS
(Serial Copy Management System) to control copying. This standard is used to
transmit compressed audio data from a DVD player to a home theater system
supporting Dolby Digital or DTS surround sound formats.
iii) TRS Connectors: The most common audio connector is the phone audio jack,
also known as the TRS (Tip-Ring-Sleeve) connector. It is available in three sizes:
2.5 mm, 3.5 mm, and 6.5 mm. The 6.5 mm size was used for manual telephone
exchanges. The 3.5 mm miniature and 2.5 mm sub-miniature jacks were designed for
audio output from transistor radios. All three sizes are available in 2-conductor
(mono) and 3-conductor (stereo) versions; the stereo version carries the left and right
channel audio data, and the third conductor is ground.
Male (plug) and female (socket) jacks are used as headphone and earphone jacks
of audio equipment, microphone inputs on cassette recorders, line-in and line-out
ports of PC sound cards, audio outputs for electric guitars and keyboards, I/O ports of
external audio amplifiers, and on portable devices like walkmans. In a TRS jack, the
Tip carries the left channel signal, the Ring carries the right channel signal, and the
Sleeve is the ground conductor.


iv) RCA Connectors: These are used mainly for home applications and were
developed by the Radio Corporation of America (RCA). The male connector consists
of a central male pin surrounded by a metal ring and is found at cable ends. The
female (jack) connector, found on devices, consists of a central hole with a ring of
metal around it. The jack also has a small plastic ring which is color-coded for the
signal type: yellow - composite video, red - right audio channel, and white or black -
left audio channel.
v) XLR Connectors: These are mainly used in professional audio/video recording and
transmission applications. The connector was originally manufactured by Cannon; the
X-series connector, with the later addition of a Latch and a Rubber gasket, gave the
name XLR. Some types are:
1) XLR3 - three-pin connector used for high-quality microphones. An XLR3M
(male) connector is used for output and an XLR3F (female) connector for input:
pin 1 - chassis ground, pin 2 - normal polarity voltage (hot), and pin 3 - reverse
polarity voltage (cold).

2) XLR4 - four-pin connector used with intercom systems.

3) XLR5 - five-pin connector used for lighting control equipment.

vi) TOSLINK Connectors:

This is a fiber-optic connection system for digital audio signals in consumer audio
systems, originally developed by Toshiba for connecting their CD players to audio
equipment. The audio stream carried over the connection complies with the S/PDIF
standard. This connector is used to connect DVD player output to Dolby Digital/DTS
decoders.
Note: The maximum length for reliable transmission is 10 m, the jack is 3.5 mm in
diameter, and the cable bandwidth is 10 MHz.
AUDIO FILE FORMATS AND CODECs
1) WAV: (Waveform) - This is a Microsoft and IBM audio file format for storing
audio on PCs. It is a variant of the RIFF bitstream format, related to the IFF/AIFF
formats. It is used for uncompressed 8-, 12-, and 16-bit audio files, both mono and
multi-channel. The typical sampling rate is 44.1 kHz, and the format can also hold
audio compressed using lossless CODECs.
2) AIFF: (Audio Interchange File Format) - It is used for storing audio data on PCs
and was co-developed by Apple, based on Electronic Arts' Interchange File Format
(IFF). The audio in an AIFF file is uncompressed, so it is much larger than files that
use lossless or lossy compression. The types of chunks found in AIFF are the common,
sound data, marker, instrument, comment, name, author, and copyright chunks. The
following structures are related to AIFF:
a) IFF: (Interchange File Format) - It is used to facilitate data transfer between
software programs of different vendors. It is built up from chunks; each chunk begins
with a TypeID or OSType followed by a 32-bit integer that specifies the size of the
following data. Chunks hold different data types such as text, numerical, or raw data.
b) RIFF: (Resource Interchange File Format) - It is used by Microsoft in Windows
for multimedia files, in contrast to IFF, which was designed for the 68k processor
used in the Apple Macintosh. Microsoft file formats such as WAV and AVI use RIFF
as their core design. The optional INFO chunk allows RIFF files to be tagged with
information in a number of categories such as copyright (ICOP), comments (ICMT),
and artist (IART).
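A minimal sketch of walking the chunk structure described above, applied to a RIFF/WAV file (the file name is illustrative):

import struct

def list_riff_chunks(path):
    # A RIFF file starts with b'RIFF', a 32-bit little-endian size,
    # and a form type (b'WAVE' for WAV); each following chunk starts
    # with a 4-byte ID and a 32-bit little-endian size.
    with open(path, "rb") as f:
        riff, size, form = struct.unpack("<4sI4s", f.read(12))
        print(riff, size, form)
        while len(header := f.read(8)) == 8:
            cid, csize = struct.unpack("<4sI", header)
            print(cid, csize)
            f.seek(csize + (csize & 1), 1)  # chunks are word-aligned

list_riff_chunks("tone.wav")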
c) NIFF: (Notation Interchange File Format) - It is a musical notation format based
on the Microsoft RIFF structure. It is designed for exchanging musical information
between different music editing and typesetting programs; OCR software can scan a
musical score sheet and convert the data to a NIFF file.
d) RMI: (RIFF-based MIDI) - It was introduced by Microsoft as a standard MIDI file
enclosed in a RIFF wrapper. It was later embraced by the MMA (MIDI Manufacturers
Association) as an extended MIDI format serving as a container for MIDI and DLS
files.
3) MID: (MIDI) - These files contain instructions on how to play a piece of music on
digital musical instruments, as published by the MMA (MIDI Manufacturers
Association).
4) DLS: (Downloadable Sounds) - It allows game developers and composers to add
their own custom sounds to the GM sound set stored in a sound card's ROM; the
sounds are automatically downloaded from disk/CD-ROM into system RAM for
MIDI music. At the same time, it enables the wavetable synthesizers in computer
sound cards to deliver improved audio.
5) XMF: (Extensible Music Format) - It is a container format that can hold one or
more MIDI files, DLS files, WAV files, and other digital audio files, creating a
collection of all the resources needed to present a musical piece.


6) RMF: (Rich Music Format) - It was developed by Beatnik Inc.; Beatnik is a
software-based, high-performance music and audio playback system whose audio
engine has a 64-voice software synthesizer conforming to the MIDI specifications.
7) MOD: (Module) - These files represent computer-generated music in terms of note
numbers, track numbers, instrument numbers, etc. A MOD file also stores and
specifies how musical patterns should be played; early versions supported 15
instruments, later versions 31.
8) AU: (Audio) - It was developed by Sun Microsystems and is commonly used with
the Java platform. It consists of a header of six 32-bit words that defines metadata
about the actual audio data.
9) VOC: (Creative Voice) - It is a file format used with Creative Sound Blaster sound
cards to encode speech recordings. It supports 16-bit stereo, markers for looping, and
synchronization for multimedia applications, with compression using LPC. Later
versions used CELP.
10) SHN: (Shorten) - It applies lossless compression to CD-quality audio files, like
FLAC, Monkey's Audio, and TTA. It is optimized for compressing 44.1 kHz 16-bit
stereo PCM audio data.
11) FLAC: (Free Lossless Audio Codec) - It achieves lossless compression rates of
30-70%, can be streamed, and has a fast decode time that is independent of the
compression level.
12) APE: (Monkey's Audio) - It is a lossless audio format that achieves compression
rates slightly better than FLAC and better than SHN. Here both encoding and
decoding are slower.
13) TTA: (The True Audio) - It is a free, simple, real-time lossless audio format
supporting multichannel 8-, 16-, and 24-bit data from uncompressed input files. It uses
a real-time encoding/decoding algorithm allowing fast operation and minimal system
requirements. The TTA file header contains a unique format identifier, which is
followed by two metadata blocks.
14) WV: (WavPack) - It is a free, open-source lossless audio compression format that
allows users to compress both 16- and 24-bit audio files in the .WAV format.
15) OFR: (OptimFROG) - It is a proprietary lossless audio CODEC optimized for
high compression ratios, ranging from 25% to 70% of the original audio file size.
OFR DualStream is a lossy CODEC that aims to fill the gap between perceptual and
lossless coding.
16) MP3: (MPEG-1 Layer III) - It is a lossy compression format which removes
portions of an audio file not perceivable by human beings in order to reduce the file
size.
17) WMA: (Windows Media Audio) - It is a proprietary compressed audio file
format developed by Microsoft, built on the Advanced Systems Format (ASF) core
structure. Its DRM features include elliptic curve cryptography key exchange, the
DES block cipher, and the SHA-1 hashing function.
18) RA: (RealAudio) - It was developed by RealNetworks and designed to conform
to low bandwidths. It is used as a streaming audio format, using the Real-Time
Streaming Protocol (RTSP) and a proprietary protocol, Real Data Transport (RDT),
to send the actual audio data. The latest version supports lossless compression.
19) OGG: (Ogg Vorbis) - It is a free and open audio compression project for creating
multimedia and signal processing formats. It has replaced MP3 as the de-facto
standard audio CODEC in many newer video game titles, which employ Ogg Vorbis.
20) AAC: (Advanced Audio Coding) - It is part of the MPEG-2 audio standard, with
improvements over MP3 to provide better sound quality at sampling frequencies of
8-96 kHz. It was subsequently updated in MPEG-4 (Audio) Part 3 with Perceptual
Noise Substitution (PNS) and a Long Term Predictor (LTP).
21) OMG, OMA, AA3: These are used to store audio on MiniDiscs and other Sony-
branded audio players using the ATRAC (Adaptive Transform Acoustic Coding)
audio compression algorithm. The ATRAC3 LP2 mode uses a 132 kbps data rate; the
ATRAC3 LP4 mode reduces the data rate to 66 kbps by using joint stereo and a
lowpass filter at around 13.5 kHz.
22) MPC: (Musepack) - It is an open-source derivative of the MP2 (MPEG-1 Layer 2)
format that uses Huffman coding, noise substitution techniques, and a variable bit rate
between 3 kbps and 1.3 Mbps. It uses the APEv2 metadata container and is also a
streamable format.
23) SPX: (Speex) - It is a patent-free audio compression format designed for speech
coding in VoIP applications, which need to handle lost packets. CELP is the encoding
technique, with three sampling rates: 8, 16, and 32 kHz. It supports Variable Bit Rate
(VBR) for specific sounds to improve sound quality; Voice Activity Detection (VAD),
which allows the encoder to detect the presence of speech versus silence and adjust bit
rates; and DTX (discontinuous transmission), which allows the encoder to start and
stop transmission asynchronously.
24) M4A, M4P: (MPEG-4 Audio, Part 14) - It is a container file format developed by
Apple Computer that uses the Apple Lossless Audio Codec (ALAC), also known as
the Apple Lossless Encoder (ALE), for lossless encoding of digital music; the data is
stored within an MP4 container with the filename extension .M4A.

25) AMR: (Adaptive Multi-Rate) - It is a patented file format for speech coding,
adopted by 3GPP as the standard speech CODEC for mobile phones. It supports a
sampling frequency of 8 kHz at 13-bit depth and is partitioned into 20 ms audio frames,
each containing 160 samples. It uses 8 different CODECs having different bit rates,
ranging from 4.75 kbps to 12.2 kbps.

AUDIO RECORDING SYSTEM

Dolby Laboratories was founded by Ray Dolby in 1965 in England; the
company later moved to the United States in 1976. The first product he made was a
simple compander called Dolby Type-A Noise Reduction, used in analog
magnetic tape recording. The Dolby noise reduction (NR) types are:

 Dolby A-type NR
 Dolby B-type NR
 Dolby C-type NR
 Dolby S-type NR

Dolby A-type NR
Introduced in 1965, it was Dolby Laboratories' first innovation. It was
originally intended for use by professional recording studios to make quieter master-
tape recordings.
Dolby B-type NR
It is the original system designed for consumer tapes and is included nowadays
in most tape-recording systems.
Dolby C-type NR
It acts on the same basic principle but improves on the B-type by providing
twice the noise reduction.
Dolby S-type NR
It is the highest-performance Dolby system for analog cassette recording; it is
derived from Dolby SR and shares several of its advanced features.

a) Cinema Systems

Dolby Digital (AC-3)
It is the surround sound technology that delivers high-quality digital audio for
up to 5.1 discrete channels. The five speaker channels produce a directional and more
realistic effect, and the low-frequency effects channel can often be felt as well as heard.

Dolby Digital Surround EX
It adds a third surround channel to the Dolby Digital format. The third channel is
reproduced by rear-wall surround speakers, while the left and right surround
channels are reproduced by speakers on the side walls.

Dolby SR
It is Dolby's next advancement in analog film sound. This technology delivered
a significantly improved dynamic range over Dolby Stereo and is still included today
on nearly all 35 mm film prints.

Dolby Stereo
Dolby Stereo is the original Dolby multi-channel film sound format that
revolutionized the movie experience. This analog optical technology was developed
for 35 mm prints and encodes four sound channels: Left, Center, Right, and Surround
(for ambient sound and special effects).

b) Encoding/Decoding

Dolby AC-2
Dolby AC-2 is an adaptive-transform-based algorithm that combines
professional audio quality with a low bit rate, substantially reducing the data
capacity required in applications such as satellite and terrestrial links.

Dolby AC-3
It refers to the surround sound technology that delivers high-quality digital audio for
up to 5.1 discrete channels. The five speaker channels produce a directional and more
realistic effect, and the low-frequency effects channel can often be felt as well as heard.
Dolby Digital EX
It refers to a surround sound format that introduces a center rear channel to the
5.1 playback format of Dolby Digital.
Dolby Digital Live
It is a real-time encoding technology that brings surround sound to
interactive audio, such as video games and PCs.
Dolby Digital Plus
It is a highly sophisticated and versatile audio CODEC based on Dolby Digital
and designed specifically to adapt to the changing demands of future audio and video
delivery.
Dolby E
It is a professional audio coding system developed to assist the conversion of
broadcast and other channel facilities to multi-channel audio. Among other benefits,
Dolby E encoded audio can be edited and decoded many times without any degradation.
Meridian Lossless Packing
Meridian Lossless Packing is a true lossless coding system defined for DVD-
Audio that compacts PCM data with bit-for-bit accuracy, unlike lossy perceptual
coding such as Dolby Digital.

c) Matrix

Dolby Pro Logic - It is the foundation of the multi-channel home theatre experience.
This technology decodes source audio encoded in two-channel Dolby Surround for
four-channel playback.

Dolby Pro Logic II - It is a sophisticated matrix decoding technology that expands
any two-channel source audio, such as CDs and stereo-encoded video cassettes and
video games, to five-channel full-bandwidth playback, resulting in a surround
experience.

Dolby Pro Logic IIx - It is a state-of-the-art matrix decoding technology that expands
native two-channel and 5.1-channel source audio to 6.1- or 7.1-channel playback,
resulting in a seamless, wrap-around sound field.

Dolby Surround - Dolby Surround is the original Dolby multi-channel film sound
format that revolutionized movie soundtracks. This technology encodes four channels
of audio onto just two audio tracks for media such as TV broadcasts.

d) Virtual

Dolby Headphone - Dolby Headphone is a revolutionary signal-processing
technology that delivers multi-channel surround sound over any pair of headphones
for a richer, more spacious, and less fatiguing listening experience.

Dolby Virtual Speaker - Dolby Virtual Speaker is a highly advanced signal-
processing technology that delivers 5.1-channel surround sound from just two
speakers.

DIGITAL THEATRE SYSTEM (DTS)

It is a multi-channel surround sound format used for both commercial and
consumer grade applications. DTS is used for in-movie sound both on film and on
DVD. In 1991, Dolby Labs created a competing CODEC named Dolby Digital, a
5.1-channel system supporting five primary speakers and a sub-woofer, referred to as
the LFE (Low Frequency Effects) channel. Some variants of DTS are:

1) DTS-DS (Digital Surround): It is the standard for providing 5.1 channels of
discrete digital audio for consumer electronics and software. On film, information in
the form of a modified time code is optically imaged onto the print; a DTS processor
in the projection booth uses this time code to synchronize the projected image with the
soundtrack audio, which is recorded in compressed form on standard CD-ROMs.

2) DTS-ES (Extended Surround): It is a digital audio format delivering 6.1 channels
of discrete audio, fully backwards compatible with DTS decoders. Star Wars:
Episode I was the first film to use the extra channel, routed to an array of speakers
along the back wall of the cinema; a decoder is required for the center surround
channel information to be heard.


3) DTS Neo-6: It provides up to six channels of matrix decoding from stereo matrix
material. Such sound systems offer discrete multi-channel sound, provide optimum
decoding of extended surround matrix soundtracks, and can also generate a center
surround channel from 5.1 material.

4) DTS 96/24: It offers an unprecedented level of audio quality for multi-channel
sound on DVD video and is fully backward-compatible with all DTS decoders. 96
refers to the 96 kHz sampling rate and 24 refers to the 24-bit word length. (By
comparison, the stereo CD is a 16-bit medium with a sampling rate of 44.1 kHz.)

AUDIO PROCESSING SOFTWARE:

It allows the user to open, edit, manipulate, transform, and save digital audio
files in various formats. Commercial software such as Sound Forge XP or CoolEdit
presents a pictorial view of the audio waveform and then allows editing to be done by
selecting specific points or ranges on the waveform. Some of the features of audio
editing software are:
1) Displaying Audio Content: An audio editor allows the user to open an existing
sound file and view the audio waveform with time along the horizontal axis and
amplitude along the vertical axis. The measurement units for time are samples and
seconds; for amplitude, decibels and percentage. This enables quick location of a
specific portion of the audio without actually listening to it.

2) Playing Audio Content: Playback is initiated by pressing the play button in the
toolbar. Other functions such as stop, pause, rewind, go to start, go to end, and loop
playing can also be executed. Editors also allow positioning the playback head at a
specific point in the file by specifying the time in hh:mm:ss format; the sound is heard
on the speakers.
3) Pasting and Mixing: A portion of the audio waveform can be selected, copied, and
pasted either into the same file or a different one. Pasting increases the duration of the
original file; hence the file size changes as new data is inserted. A mix function allows
one sound to be mixed with another so that both of them are heard simultaneously,
while the total duration of the sound remains unchanged: new samples are not added,
but existing samples are changed (see the sketch below).
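A minimal sketch of the mix operation just described (sample-wise averaging and the placeholder clips are illustrative assumptions, not from the text):

import numpy as np

def mix(a, b):
    # Mix two equal-rate clips sample-wise; the shorter clip is
    # zero-padded. Averaging keeps the result within the dynamic
    # range: existing samples change, but the duration does not
    # grow beyond the longer clip.
    n = max(len(a), len(b))
    out = np.zeros(n)
    out[:len(a)] += a
    out[:len(b)] += b
    return out / 2

voice = np.random.randn(44100)   # 1 s placeholder clip
music = np.random.randn(88200)   # 2 s placeholder clip
mixed = mix(voice, music)        # 2 s; both are heard together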
4) Changing Digitization Parameters: It provides functions to change the audio
digitization parameters, such as the sample rate (2 to 96 kHz), the bit depth (8 to 16
bits), and the number of channels, from one channel (mono) to two channels (stereo).
5) Cut and Trim: The cut function enables the user to select a portion of the audio file
and discard that portion; the file duration is shortened in this case. The trim function
allows one to select a portion of a file and discard the remaining portion; the file
duration becomes the duration of the selected portion.
6) Zooming: It implies displaying a magnified view of the sound waveform without
actually changing the data stored in the file. When an audio waveform is displayed
pictorially, there is a mapping between the number of samples and the number of
pixels, expressed as a zoom ratio. We can zoom in and zoom out along the time axis
as well as the amplitude axis. Ex: a ratio of 1:64 means 64 samples of the sound are
represented by 1 pixel in the pictorial representation.
7) Amplitude Normalization: If the maximum amplitude level in an audio waveform
falls below the upper ceiling of the dynamic range, the normalize function can be
used to raise the amplitude levels so that they just cover the entire dynamic range
without any clipping.
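A minimal sketch of peak normalization as described, assuming floating-point samples with a dynamic range of -1.0 to +1.0:

import numpy as np

def normalize(samples, ceiling=1.0):
    # Scale so the largest-magnitude sample just reaches the
    # ceiling of the dynamic range, guaranteeing no clipping.
    peak = np.max(np.abs(samples))
    return samples * (ceiling / peak) if peak > 0 else samples

quiet = 0.2 * np.sin(np.linspace(0, 2 * np.pi, 100))
loud = normalize(quiet)   # peaks now reach +/-1.0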
8) Recording: It is of two types: external and internal recording.
a) External recording - It is done by using an external device connected to the
computer for recording purposes, such as voice and music recording.
 Voice recording - A microphone is connected to the input port of the
sound card using a cable and connector.
 Music recording - It is done by connecting the output port of an
external playback device to the sound card.


b) Internal recording - It means playing a sound file on a computer and recording
the sound on the same computer. During recording, the user needs to specify the
digitization parameters.
 For music, audio CD quality requires a sampling rate of 44.1 kHz, a bit depth of
16 bits, and stereo channels.
 For speech, recording can be at a much lower rate, such as 11 kHz, using an
8-bit mono format.
9) Noise Removal: This function uses noise gates to remove noise from the silent
portions of an audio signal. Parameters like attack time and release time control how
fast the noise gate closes to block noise and opens to allow desirable sound to pass
through. (A simplified sketch follows.)
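A deliberately simplified sketch of a noise gate (a hard gate without the attack/release envelopes mentioned above, which real gates use to open and close gradually):

import numpy as np

def noise_gate(samples, threshold=0.05):
    # Zero any sample whose magnitude falls below the threshold,
    # silencing low-level background noise between louder passages.
    out = samples.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

signal = 0.5 * np.sin(np.linspace(0, 20, 1000))
signal += 0.02 * np.random.randn(1000)   # low-level background noise
gated = noise_gate(signal)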
10) Filters: Filtering changes the nature of a sound clip in some pre-determined way
to give special effects, e.g., echo, reverb, chorus, etc. For making the changes, a filter
algorithm is used; the user may also be allowed to change the waveform manually by
dragging different points of the wave.
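As one concrete example of such a filter, a minimal single-tap echo (the delay and decay values are illustrative):

import numpy as np

def echo(samples, rate=44100, delay=0.25, decay=0.5):
    # Add a delayed, attenuated copy of the signal to itself:
    # delay is in seconds, decay scales the repeated copy.
    d = int(delay * rate)
    out = np.concatenate([samples, np.zeros(d)])
    out[d:] += decay * samples
    return out

dry = np.random.randn(44100)
wet = echo(dry)   # original plus one echo 0.25 s later at half volume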
AUDIO AND MULTIMEDIA:
a) Types of Audio in a Presentation: Several types of audio output - speech, music,
and sound effects - can be incorporated into multimedia.
i) Speech - It is an important element of human communication and can be used
effectively to transmit information. There are two types:
1) Digitized speech: It provides high-quality, natural speech but requires
significant disk storage capacity.
2) Synthesized speech: It is not as storage-intensive but may not sound as
natural as human speech.
ii) Music: It is an important component of human communication and is used to set a
tone or mood, provide connections or transitions, add interest and excitement, and
evoke emotion; together with speech and sound effects it can greatly enhance
on-screen presentations of text and visuals.
iii) Sound Effects: These are used to enhance or augment the presentation of
information or instruction. There are two types: natural sounds, which occur
commonly around us, and synthetic sounds, which are produced electronically or
artificially.
There are two general categories of sound effects:
 Ambient sounds: These are the background sounds which communicate
the context of the scene to the listener.
 Special sounds: These are uniquely identifiable sounds such as a telephone
ring or the slam of a door.


b) Reasons for Using Audio: Audio helps ensure that the goals and objectives of a
presentation are met. Redundancy is the key to better communication: the receiver or
learner gets the message through two sensory channels. The capacity of the human
information processing system is limited, and hence information loss and errors do
occur; redundancy is the key to minimizing these factors.
Two conditions influence the effectiveness of audio:
 Information presented in each mode should be congruent and not contradictory.
Identical presentations of words in sound and text should be avoided.
 Audio is connected to motivation. A multimedia developer needs to
stimulate the user's motivation and curiosity to involve them in all parts of the
program. Audio mainly acts as a learning mode to attract listeners.
c) Suggestions for Using Audio Content:
i) Speech: To produce high-quality recorded speech, a script should be written and
professionally recorded. High-quality microphones and a professional narrator with
clear diction and the ability to correctly pronounce the words are plus points. In
general, an effective narrator should:
 Vary intonation to motivate, explain, emphasize, etc.
 Use a conversational tone.
 Be amiable, sincere, and straightforward.
 Avoid a lecturing tone.
To develop narrative scripts for integrating speech, multimedia developers should:
 Write the way people speak.
 Use language that the audience can understand, writing in clear, straight-
forward, short sentences.
 Avoid slang and very informal language.
 Interpret what the user is seeing rather than just describing it.
 Adhere to time limit requirements.
 Synchronize narration with visuals wherever possible.
ii) Music: A variety of pre-recorded music is available in digital form on CDs. In
addition, music can be recorded and edited by connecting synthesizers, keyboards,
and other musical instruments to computers using MIDI. Music can be used to
establish mood, signal a turn of events, provide transitions and continuity, accompany
titles and introductory information, and emphasize important points together with
visual information. The following are rough guidelines for music:
 Choose a music style that conveys the mood you wish to create.

 Use different styles of music and instrumentation to suggest time periods,
cultures, locations, and a sense of place.
 Music should be at a lower volume when used simultaneously with narration
so that the latter can be heard clearly.
iii) Sound Effects: These provide information about the following:
 Physical events: the click of the keys of a keyboard, a telephone ring, or
the breaking of a glass.
 Dynamic changes: as we pour liquid into a glass, we can hear when it is
full.
 Spatial relations: judging the distance and direction of a person walking
by the sound of the footsteps.
Three significant considerations should govern the use of sound
effects:
1) They must be clear and easily identifiable.
2) They should not overwhelm the primary message.
3) They should be appropriate to the intended audience.
As with music, sound effect libraries are available which can serve
as sources of specialized sounds. These can be divided into categories:
transportation, backgrounds, military, household, machinery, animals,
environmental, etc.
VOICE RECOGNITION AND RESPONSE:
Voice recognition and response give the computer the capability of
understanding a user's voice and responding verbally, through the I/O devices of a
multimedia system using a sound card. An ideal voice recognition and response
system would fully understand normal spoken language for command and data input
and would be able to formulate responses in a natural-sounding voice, as seen in
science fiction movies such as Star Trek.
Voice recognition products are categorized as supporting either continuous or
discrete recognition, as handling either a small or a large vocabulary, and as being
either speaker-dependent or speaker-independent.
Continuous recognition: Normal human speech is continuous, with an unlimited
vocabulary, and is speaker-independent.
Discrete recognition: In products that utilize discrete recognition, the speaker must
pause briefly between words and phrases, the delay normally being in the range of a
few milliseconds.


In a voice-dictation system, this would mean that all the words would be
recognized and processed, whereas in a voice-command system only the command
words are picked out of a continuous stream of speech, the other words being ignored.
Because of this difficulty, discrete-word systems are much easier to implement.
Small vocabulary implies less than 1000 words or phrases, while large
vocabulary means a larger number of words or phrases. The processing time as well
as the resources required for recognition can grow in a non-linear fashion as the
vocabulary size increases.
Large vocabulary systems base their recognition on elements smaller than
a word, such as the syllable or phoneme. Phonemes are the smallest distinguishable
sounds in the dialect of a language. Each phoneme represents a family of
sounds, since the actual pronunciation of a specific phoneme varies according
to the surrounding phonemes, an effect called co-articulation. Recognition by
phonemes appears to be the right direction for tackling problems related to large
vocabularies, since there are only about 40 phonemes in the English language,
which comprises over 40,000 words.
Speaker-Dependent Systems:
1. Need to be trained to recognize each user's voice patterns.
2. Commercial systems can recognize about 30,000 words and 250 phrases
recited by the speaker.
Speaker-Independent Systems:
1. A general voice model is sufficient, given all the differences in accent,
pitch, etc.
2. Recognize and respond to hundreds of words on a single Digital Signal
Processing (DSP) chip.
A recent development is the speaker-adaptive system, which constantly
updates its word models based on the actual speaking patterns employed during
use, allowing an operator to start using the system sooner.
Voice response is divided into synthesized and digitized speech. In the
latter case, words or phrases are spoken, recorded, indexed, and saved. These
speech fragments are then pieced together by the application to form the
spoken response. Although the vocabulary is limited, the quality is high, since it
is actual recorded human speech.
Synthesized speech systems allow a much larger vocabulary, but
their quality is limited by that of the synthesis hardware/software; the voice
produced may sound mechanical. Speech recognition is nowadays
implemented by using statistical probability models like Hidden Markov Models
(HMM).

