by Steven G. Estrella, Ph.D.
Today many intrepid people are taking the plunge into multimedia. Multimedia development environments such as Flash and Director can help you create interactive content that communicates with clarity and style. Creating sound for use in these environments requires a minimal understanding of the science behind sound and a knowledge of some of the jargon involved in multimedia.
If a tree falls in the forest and no living creature is there to hear it,
does it make a sound? The answer is no. Sound is a perceptual phenomenon only.
When a tree falls, a person speaks, or a violin string vibrates, the
surrounding air is disturbed, causing changes in air pressure that are called
sound waves. When sound waves arrive at our ears they cause small bones in our
ears to vibrate. These vibrations then cause nerve impulses to be sent to the
brain where they are interpreted as sound.
Figure 1 - Sound Waves
A vibrating string on a violin causes disturbances in the surrounding air.
These sound waves cause our ears to send nerve impulses to the brain
which interprets the disturbance as sound.
Sound waves can be transduced (converted to another form) using a microphone. A microphone is similar to the human ear in that it has a diaphragm which vibrates in response to changes in air pressure. The movements of the diaphragm within an electromagnetic field cause changes in electrical voltage. These voltage changes can be directed to a tape recorder which alters the magnetic particles on the tape to correspond to the voltage changes. A "picture" of the sound then exists on the tape. When you press play on the tape recorder, the "picture" is read back as a series of voltage changes which are then sent to a speaker. The voltage changes cause an electromagnet within the speaker to push and pull on a diaphragm. The movement of the diaphragm then causes air pressure changes which our ears interpret as the original sound. This process is known as analog recording because the picture of the sound on the tape is analogous to the original changes in air pressure caused by the sound event.
Figure 2 - analog recording
When sound waves strike a microphone,
they are converted to an electrical signal
which is then recorded onto magnetic tape.
Usually we represent sound visually as a waveform. The height is called the amplitude and corresponds to volume. The time taken by one cycle is called the period, and the distance the sound wave travels during one cycle is called the wavelength. The number of cycles per second is called frequency and is interpreted by our ears as pitch. Frequency is measured in hertz (Hz) or kilohertz (kHz).
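The relationship between these terms can be sketched in a few lines of Python. This is only an illustration: the function names and the 440 Hz example tone are my own assumptions, not part of any audio library.

```python
import math

def sine_wave(frequency_hz, amplitude, duration_s, points_per_second):
    """Return (time, value) pairs tracing a sine wave."""
    count = round(duration_s * points_per_second)
    return [
        (n / points_per_second,
         amplitude * math.sin(2 * math.pi * frequency_hz * n / points_per_second))
        for n in range(count)
    ]

def period_seconds(frequency_hz):
    """The period is the time taken by one cycle: 1 / frequency."""
    return 1.0 / frequency_hz

# Hypothetical example: a 440 Hz tone traced for one hundredth of a second.
wave = sine_wave(frequency_hz=440.0, amplitude=1.0,
                 duration_s=0.01, points_per_second=44_000)
print(len(wave))              # number of points traced
print(period_seconds(440.0))  # seconds per cycle
```

Raising `frequency_hz` packs more cycles into the same span of time (higher pitch), while raising `amplitude` makes the wave taller (louder).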

The waveform above is a simple sine wave. Typical sounds are more complex in appearance. Here is a waveform of a short spoken phrase. Note the frequent changes in wavelength, amplitude, and frequency.

Digital recording differs from analog recording in that the "picture" of the sound is created by measuring the voltage changes coming from the microphone and assigning numbers to each measurement. The term "sampling" is used to describe the process of measuring an electrical signal's voltage thousands of times per second at a given level of precision (resolution). The number of measurements per second is called the "sampling rate" and is expressed as kilohertz (kHz). A rate of 11,000 measurements per second is thus designated as 11 kHz. Sampling rates range from 5 kHz to 48 kHz and beyond with higher rates being used for the best quality recordings. Harry Nyquist (1889-1976), a Swedish-born U.S. communications engineer, discovered that the frequency range of a digitized sound is limited to one-half the sampling rate. Since humans can hear frequencies in a range of 20 hertz to about 20 kilohertz, it is necessary to sample at more than 40 kilohertz to capture the full range of frequencies perceptible to the human ear.
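Nyquist's rule is simple enough to express as arithmetic. The sketch below uses function names of my own choosing and treats the nominal rates as round numbers.

```python
def nyquist_limit_hz(sampling_rate_hz):
    """Highest frequency a given sampling rate can capture (half the rate)."""
    return sampling_rate_hz / 2

def minimum_sampling_rate_hz(highest_frequency_hz):
    """Sampling must exceed this rate to capture the given frequency."""
    return 2 * highest_frequency_hz

for rate in (11_000, 22_000, 44_000, 48_000):
    print(rate, "Hz sampling captures up to", nyquist_limit_hz(rate), "Hz")

# The upper limit of human hearing, about 20 kHz, needs more than this rate:
print(minimum_sampling_rate_hz(20_000))
```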
The number of measurements per second, however, is only part of the picture. The degree of precision within each measurement is also important. This is known as "sampling resolution". Sampling resolution is used to divide the total range of the electrical voltage into discrete parts. Common sampling resolutions in use today are 8-bit and 16-bit. Sampling at 8 bits divides the voltage into 256 parts (2 to the 8th power). Sampling at 16 bits divides the voltage into 65,536 parts (2 to the 16th power). Using a higher sampling resolution creates cleaner recordings with less background noise. Higher sampling resolutions also capture a wider dynamic range. For example, an 8-bit digitizer will only capture a dynamic range of about 48 decibels (dB). Any portion of the sound that exceeds this range will be clipped and the resulting sample will sound distorted. 16-bit digitizers, however, capture up to 96 dB. The dynamic range of the human ear extends to 120 dB.
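The 48 and 96 decibel figures can be checked with a common rule of thumb: dynamic range is roughly 20 times the base-10 logarithm of the number of levels, or about 6 decibels per bit. The sketch below assumes that approximation; the function names are illustrative.

```python
import math

def quantization_levels(bits):
    """Number of discrete parts the voltage range is divided into."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Approximate dynamic range, assuming 20 * log10(levels)."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (8, 16):
    print(bits, "bits:", quantization_levels(bits), "levels,",
          round(dynamic_range_db(bits)), "decibels")
```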
Quantization is the term that describes the process of measuring the amplitude of a sound and rounding off the measurements according to the sampling resolution. For example, an 8-bit sound digitizer will assign integer values of between 0 and 255 for the amplitude of each sample. The result is that the original smooth waveform is reconstructed as a staircase shape with only 256 discrete levels of amplitude and noise is introduced into the signal. 16-bit digitizers, on the other hand, assign amplitude values on a scale of 0 to 65,535. At that level of precision, the reconstructed waveform is almost identical to the original and almost no noise is introduced.
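Here is a rough sketch of quantization in Python. It assumes the incoming signal has been scaled to the range -1.0 to 1.0; the function names are my own, and a real digitizer does this work in hardware.

```python
import math

def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, value))  # anything out of range is clipped
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def reconstruct(code, bits):
    """Turn an integer code back into a value in [-1.0, 1.0]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0

def max_rounding_error(bits, points=1000):
    """Largest gap between one cycle of a sine wave and its quantized copy."""
    worst = 0.0
    for n in range(points):
        original = math.sin(2 * math.pi * n / points)
        restored = reconstruct(quantize(original, bits), bits)
        worst = max(worst, abs(original - restored))
    return worst

print(max_rounding_error(8))   # the staircase is visibly coarse
print(max_rounding_error(16))  # far smaller error
```

The rounding error is the noise that quantization introduces, and it shrinks as the number of bits grows.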

All of these measurements are made by an analog-to-digital converter. The
measurements can then be stored as binary numbers in a file on a computer's
hard disk. To play back the sound, the computer sends the information in the
file to a digital-to-analog converter which reproduces the original electrical
signal. That signal is then sent to a speaker which produces the sound as
described earlier.
Maximum precision per measurement combined with maximum sampling rates
produces the highest quality recordings. To describe a digital recording of a
sound, therefore, one can speak of the sampling rate and resolution. For
example, sound recorded at a sampling rate of 22 kHz with 8-bit resolution is
considered to be of a quality similar to that of a telephone call. Sound
recorded at 44 kHz and 16 bits is considered the minimum quality for compact
disc recordings because it captures the full range of human hearing. In
multimedia production work, 11 kHz, 8-bit sound is sometimes acceptable for
speech recordings and 22 kHz, 8-bit resolution or 11 kHz, 16-bit resolution is
often considered acceptable for music. For the highest-level multimedia work,
however, nothing short of 44 kHz, 16-bit sound is acceptable.

When sound waves strike a microphone, they are converted to an electrical
signal which is measured many thousand times per second by an analog-to-digital
converter chip. The measurements are stored in the computer as binary
numbers.
The higher the quality of sound, the more space it takes to store the sound.
A compact disc can store about 74 minutes of stereo sound at 44 kHz, 16-bit.
If you reduce the quality to 22 kHz, 8-bit stereo sound, however, you can store
approximately 300 minutes of audio on the same disc. In other words, one minute
of stereo sound takes 10 megabytes of storage at 44 kHz, 16-bit quality, and
only 2.5 megabytes of storage at 22 kHz, 8-bit quality. When producing sound
for multimedia, therefore, one must consider not only sound quality, but also
how the sound will be distributed. If your multimedia program will be
distributed on CD then you may have enough storage space to justify using the
best quality. If the program will be distributed on disk or through the
internet, however, you would consider using lower quality sound to avoid having
to distribute many disks or subject your users to long download times. Demonstration
1 below has about 3 megabytes of embedded audio. Please click the link below
only if you have a high-speed connection.
Demonstration 1 - Digital Audio Sampling Rates and Resolutions
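The storage figures quoted above follow from simple multiplication: sampling rate times bytes per sample times number of channels times 60 seconds. The sketch below assumes the nominal "44 kHz" and "22 kHz" rates stand for 44,100 and 22,050 samples per second, and that a megabyte is 1,048,576 bytes.

```python
MEGABYTE = 1024 * 1024

def bytes_per_minute(sampling_rate_hz, bits, channels):
    """Uncompressed storage needed for one minute of audio."""
    return sampling_rate_hz * (bits // 8) * channels * 60

cd_quality = bytes_per_minute(44_100, 16, channels=2)
low_quality = bytes_per_minute(22_050, 8, channels=2)

print(round(cd_quality / MEGABYTE, 1))   # about 10 megabytes
print(round(low_quality / MEGABYTE, 1))  # about 2.5 megabytes
print(74 * cd_quality // low_quality)    # minutes of low-quality audio per disc
```

Halving the sampling rate and halving the resolution each cut the storage in half, which is why the lower-quality setting fits about four times as many minutes on the same disc.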
When sound is digitally recorded to a hard disk, a file format is assigned by the recording software. Today's disk-based sound file formats allow you to record music of any length and quality. You are only limited by the amount of available storage space on your hard drive. Disk-based sound file formats are ideal for longer and/or higher-quality samples. AIFF (Audio Interchange File Format) is one of the most commonly used disk-based file formats on Macintosh, Windows, and even Unix computers. WAV is another popular file format, especially on Windows. If you use the internet frequently you may have encountered sound files in AU format. The AU file format is an older format that originated on computers running the UNIX operating system; AU files found on the Web typically contain 8-bit samples. Sound editing software can convert among these and many other file formats.
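To make the idea of a sound file format concrete, here is a minimal sketch that stores one second of a sine tone as a WAV file using the `wave` module from Python's standard library. The function name, the 440 Hz tone, and the 22,050 Hz rate are illustrative assumptions. The file is built in memory; writing the returned bytes to disk would produce a playable .wav file.

```python
import io
import math
import struct
import wave

def tone_as_wav_bytes(frequency_hz=440.0, seconds=1.0, rate=22_050):
    """Return the bytes of a mono, 16-bit WAV file containing a sine tone."""
    frame_count = round(seconds * rate)
    frames = bytearray()
    for n in range(frame_count):
        sample = math.sin(2 * math.pi * frequency_hz * n / rate)
        # 16-bit WAV samples are signed, little-endian integers.
        frames += struct.pack("<h", round(sample * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit resolution
        wav_file.setframerate(rate)   # sampling rate
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()

data = tone_as_wav_bytes()
print(len(data))  # header plus two bytes per sample
```

Notice that the file header records the same facts discussed earlier: the number of channels, the sampling resolution, and the sampling rate.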
Uncompressed audio and video files are far too large for practical transmission on the Internet. A single minute of stereo audio at CD quality, for example, takes up about 10 megabytes of data. When compressed, however, the same audio often takes up only 1 megabyte or less. The best compression technologies today, such as MP3 for audio and MPEG-4 for video, create very small media files that are almost as good as the original uncompressed files. Compression technologies for audio and video reduce file size by removing data from the file according to a set of rules called a codec (COmpressor/DECompressor). When the visitor receives the smaller file, the media player application (RealOne, QuickTime, or Windows Media Player) attempts to reconstruct the original file using the same set of rules.
One way to understand compression-decompression is to imagine you need to send someone a series of important years from the 20th century, such as the following years associated with major wars. A very simple codec might remove the first two digits of each set of four. The years could then be transmitted in a file half the size of the original.
| Uncompressed Data | Compressed Data |
| 1917194519501961 | 17455061 |
When the visitor receives this file, our simple codec inserts 19 before each set of two numbers to reconstruct the original file. This simple example would result in a lossless compression because the final reconstructed file would be identical to the original. Unfortunately, audio and video compression is more complex and the final reconstructed file is not identical to the original. This is known as lossy compression. Today's compression technologies, however, are so refined that the loss of quality is often minimal and the quality is more than acceptable for most people. One great compressor for voiceover narrations is the Qualcomm PureVoice Compressor but MP3 does a good job on voice-only files as well. Two of the best codecs for music today are MP3 and QDesign. Demonstration 2 below has about 3 megabytes of embedded audio. Please click the link below only if you have a high-speed connection.
Demonstration 2 - Audio Comparison of CODECs
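The toy date codec described above can be written out in a few lines. This is only a sketch of that imaginary codec, with function names of my own choosing; real audio codecs such as MP3 work very differently.

```python
def compress(data, century="19"):
    """Drop the first two digits from each four-digit year."""
    years = [data[i:i + 4] for i in range(0, len(data), 4)]
    if any(not year.startswith(century) for year in years):
        raise ValueError("this toy codec only handles years in one century")
    return "".join(year[2:] for year in years)

def decompress(data, century="19"):
    """Put the century back in front of each two-digit pair."""
    return "".join(century + data[i:i + 2] for i in range(0, len(data), 2))

original = "1917194519501961"
smaller = compress(original)
print(smaller)                          # 17455061
print(decompress(smaller) == original)  # True: nothing was lost
```

Because the rules let us rebuild the original exactly, this codec is lossless. It only works because both sides agree in advance that every year begins with 19.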
The Musical Instrument Digital Interface (MIDI) is a hardware and software
standard that, among other things, allows users to record a complete
description of a lengthy musical performance using only a small amount of disk
space. Standard MIDI Files can be played back using the sound synthesis
hardware of a Mac or PC. Using MIDI, Beethoven's Fifth Symphony uses about 1.3
megabytes of storage and can fit on one floppy disk. Using a digital audio file
format like AIFF, the same symphony uses over 300 megabytes of hard disk
storage. One problem with MIDI is that the quality of the actual sound you hear
will vary depending on the quality of your computer's sound hardware. For
educational applications, however, MIDI-generated sound can be used to
demonstrate musical ideas quite effectively. Another problem with MIDI in the
past was the lack of a standard sound set. A MIDI file designed to be played
with piano and flute sounds might be realized with organ and clarinet on
another person's computer. This problem was partially solved by the advent of
the General MIDI standard which created a standard set of 128 sounds. Virtually
all MIDI files today are distributed in General MIDI format. Still it was left
to the owner of each computer to be sure their sound hardware could play the
General MIDI sounds. Apple Computer solved the problem in 1996 by including a
bank of General MIDI instruments in its QuickTime software. As a result, you
can open any MIDI file in QuickTime Player Pro, save it as a QuickTime movie
and embed it in a Web page as seen below.
MIDI file of J. S. Bach's Invention No. 4 converted to QuickTime movie
You are welcome to download the original MIDI file, bachinv4.mid, for use in your music sequencing or music notation applications. Web sites can be used to exchange MIDI files, collaborate on MIDI sequences, and engage in group compositions. To learn more about MIDI, see Dr. Estrella's Incredibly Abridged Guide to MIDI at http://www.stevenestrella.com/midi/default.html.
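To see why a MIDI description is so compact, compare the bytes needed to describe one note with the bytes needed to store one second of digital audio. This is a rough sketch: the helper names are illustrative, and the count ignores the timing data and file headers that a real Standard MIDI File also carries.

```python
def note_on(note, velocity, channel=0):
    """Build a 3-byte MIDI note-on message (status byte 0x90 plus channel)."""
    return bytes([0x90 | channel, note, velocity])

def note_off(note, channel=0):
    """Build a 3-byte MIDI note-off message (status byte 0x80 plus channel)."""
    return bytes([0x80 | channel, note, 0])

def audio_bytes(seconds, sampling_rate_hz=44_100, bits=16, channels=2):
    """Uncompressed storage for the same span of recorded sound."""
    return round(seconds * sampling_rate_hz) * (bits // 8) * channels

middle_c = note_on(60, 100) + note_off(60)  # start and stop one note
print(len(middle_c))     # 6
print(audio_bytes(1.0))  # 176400
```

MIDI stores instructions for playing the note rather than the sound itself, which is also why the result depends on the sound hardware that carries out those instructions.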
One of Apple Computer's most brilliant innovations is the continuing
development of QuickTime. QuickTime began as a set of system extensions to
Macintosh System 7 to allow users to play digitized video in a small window on
the screen. Today QuickTime is a comprehensive multimedia tool for storing
video, animations, and sound in a variety of formats. It is also a
cross-platform tool, meaning that QuickTime movies can be viewed and heard
using computers running Mac OS or Windows.
So what does Apple's QuickTime technology have to offer musicians? The answer
is plenty. The free version of QuickTime, available from Apple's web site at www.apple.com, comes with QuickTime Player to
play back QuickTime content. Content creators must purchase the "Pro" version
of QuickTime for $30. The "Pro" version comes with QuickTime Player Pro which
can convert standard MIDI files into QuickTime movies that can be played back
by any Macintosh computer or any PC with a sound card and Windows 98 or later.
QuickTime MIDI movies use just a little more disk storage space than the MIDI
files on which they are based. The actual sound is produced by a software
synthesizer that QuickTime installs on your computer's hard disk.
QuickTime Player Pro can be used to convert audio from compact discs into
QuickTime movies that can be used in multimedia presentations. QuickTime Player
can be used to add sound and text tracks to digital video. Using a video
recorder, Apple's free iMovie software, and a Mac equipped with video input,
you could record a movie demonstrating instrumental techniques and then use
QuickTime Player to add a voiceover narrative. You could also add a descriptive
voice narrative to a QuickTime MIDI movie containing a full performance of a
complex work. QuickTime comes with several software CODECs
(compressor/decompressor) to reduce file size while retaining quality. For
music, the QDesign Music Compressor is excellent. For speech, the Qualcomm
PureVoice Compressor is a good choice. For video, the Sorenson compressor does
an impressive job of reducing file size for the visual portion of the video.
When used in combination with the QDesign or Qualcomm audio compressors, file
size can be made manageable for transmission over the internet. A "Fast Start"
feature is also available to allow the movie to begin playing while still
downloading to the user's computer. QuickTime also allows for streaming live
content.
QuickTime movies can be loaded onto any Web server and included in Web pages by using the appropriate EMBED tag contained within an OBJECT tag. This method of embedding QuickTime movies is compatible with all major browsers but the most recent method of embedding is even better. See http://www.makepages.com/freepages/makequicktime.html for more details.
<object classid="clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B" width="200" height="20"
codebase="http://www.apple.com/qtactivex/qtplugin.cab">
<param name="SRC" value="doodle1622.mov">
<param name="AUTOPLAY" value="false">
<param name="CONTROLLER" value="true">
<embed src="doodle1622.mov" width="200" height="20" autoplay="false" controller="true" pluginspage="http://www.apple.com/quicktime/download/">
</embed>
</object>

Figure 4 - QuickTime
Apple Computer's QuickTime software can be used to create movies with any
combination of video, audio, MIDI data, text, and animations.
QuickTime is perhaps the best choice out there today for delivering high-quality audio and video presentations that are linear in nature. By that I mean presentations that are meant to be experienced from beginning to end. That's one reason why so many movie trailers are delivered in QuickTime format. If your project requires high-level interactivity, however, Flash is a better choice.
Flash began life as a silent animation tool but soon added audio and, more recently, video to its arsenal of tools. Flash also has a sophisticated programming language to allow designers to create highly interactive animations with synchronized audio and graphics.
Like Apple with QuickTime, Macromedia publishes its file format known as SWF (pronounced "swiff") as an open standard so other software makers can also offer products to allow you to create SWF files. That's one reason why Flash is so popular on the Web. Another reason is that the Flash plugin comes pre-installed on all major Web browsers today. As a result, designers can create Flash files and be assured of a sizable audience. Flash uses MP3 compression to create very small file sizes for audio and animation files.
You can learn more about Web Audio, QuickTime, and Flash in the relevant sections of this Web site http://www.makepages.com.