Low-level audio on the iPhone: a brief introduction

Sound on the iPhone

If you want to do high-level (i.e. "easy") audio on the iPhone, I suggest looking at the Celestial framework. It has what you need to load and play media files. You can also check out GraphicsServices...it has a method that will let you play a sound simply by specifying its path.

But if you want to do low-level stuff, i.e. get down to the individual samples, then you're probably going to need AudioQueue. What is AudioQueue? It's like a brief vacation in hell...well, at least the way it's implemented on the iPhone. An official SDK is supposedly going to be released next week. Perhaps it will shed light on this...but then again, documentation for AudioQueue is already up on Apple's website (for OS X 10.5), and it's still a bitter journey. Not that it's inherently bad...if you merely look at the API, it doesn't seem so tough. But the way it's implemented on the iPhone, it's hardly ideal for 3rd party developers, especially those trying to do input and output at the same time. Do one thing wrong, or in the wrong thread, and the AudioQueue will stop...do another, and it will lose routing, or something else will go haywire. Again, this may be due to my ignorance...I'm figuring it out as I go. [A more technical definition: AudioQueue is a group of C routines used for input and output of low-level audio.]

Who needs AudioQueue? If you simply want to play WAV files, then you probably don't need the low-level stuff...stick with Celestial. If you want to synthesize sounds, or analyze sounds via a Fast Fourier Transform, then you will probably have to use AudioQueue.

To use AudioQueue, you will need the AudioQueue.h header...if you are using the toolchain, then you will probably need to find the header yourself. Search for it on the Internet. It may be available in the Xcode 3.0 download from Apple.

Low-level sound in a nutshell
Here is a very quick intro to low-level audio that isn't iPhone-specific. If you are new to all of this, it's probably best to search the Internet to learn more about audio formats, samples, etc. Moreover, I recommend downloading audio programs like Audacity (freeware/open source) or GoldWave (shareware) to see what all this stuff means...use these programs to experiment.

Sound is all in your mind...what is actually out there in the real world is varying waves of pressure. As the air pressure rises and falls (perhaps 1000 times/sec), your eardrums move back and forth, and this is translated into sound by your brain. Imagine your eardrum as being horizontal...when the air pressure changes, it moves up and down. If you were to graph the location of the eardrum as time went by, you would end up with one of those waves you see in Audacity:

(You should open up a sound in Audacity, and zoom in until the line looks curvy and smooth.)

This is an "analog" wave. However, on a computer, you are going to have to be able to express this wave in terms of 0s and 1s...i.e. discrete measurements. So typically, an analog-to-digital converter measures the y-value (i.e. the height) of the wave, and repeats that measurement at a fixed time interval. This is called "sampling", and each measurement is called a "sample". So instead of a smooth wave, you now end up with a digitized wave.

This is an image from Goldwave, which shows what a digitized sound wave actually looks like. It's not as smooth as a real one.

The rate at which the wave is measured is called the "sample rate", and is expressed in units of Hertz. (1 Hertz means once per second.) A sample rate of 8000 Hz means the wave is being measured 8000 times every second, with each measurement being 0.125 milliseconds after the previous one. This sounds like a lot, but really, 8000 Hz isn't great in terms of quality. The higher the sample rate, the "less choppy" the wave. (See the image above.)
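To make the arithmetic concrete, here is a small sketch in C. The function names are my own, made up for illustration...they are not part of any Apple API. It just converts a sample rate into the gap between samples, and counts how many samples fit into a given stretch of time on one channel.

```c
#include <stddef.h>

/* Time between two consecutive samples, in milliseconds.
   (Hypothetical helper, not part of AudioQueue.) */
double sample_period_ms(double sample_rate_hz) {
    return 1000.0 / sample_rate_hz;
}

/* Number of samples taken on ONE channel during duration_ms milliseconds.
   Any fractional sample at the end is dropped. */
size_t samples_in_duration(double sample_rate_hz, double duration_ms) {
    return (size_t)(sample_rate_hz * duration_ms / 1000.0);
}
```

At 8000 Hz, sample_period_ms gives the 0.125 milliseconds mentioned above. At 44100 Hz (the rate used on audio CDs), the gap shrinks to roughly 0.023 milliseconds, which is why the wave looks less choppy.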

Also affecting quality is the number of bits used to store each y-value. For example, a byte (aka 8 bits) represents a value from -128 to 127, which is only 256 possible levels. (This is called a "signed byte" because a negative sign is allowed...an "unsigned byte" would have values from 0 to 255.) If we increase the sample size from 8 bits to 16 bits (i.e. 2 bytes), each sample can take one of 65,536 values (-32768 to 32767), so we'll be able to express the y-value much more precisely.
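Here is a rough sketch of what the sample size buys you. It assumes the wave's height has already been scaled to lie between -1.0 and 1.0 (my assumption, for illustration), and the function names are hypothetical. I scale by 127 and 32767 so the range is symmetric, which means the most negative values (-128 and -32768) go unused...other conventions exist.

```c
#include <stdint.h>

/* Clamp y into the range [-1.0, 1.0]. */
static double clamp_unit(double y) {
    if (y > 1.0)  return 1.0;
    if (y < -1.0) return -1.0;
    return y;
}

/* Store a height in a signed byte: only 255 distinct levels are used here. */
int8_t quantize_8bit(double y) {
    return (int8_t)(clamp_unit(y) * 127.0);     /* the cast truncates toward zero */
}

/* Store the same height in 16 bits: 65535 distinct levels are used here. */
int16_t quantize_16bit(double y) {
    return (int16_t)(clamp_unit(y) * 32767.0);  /* the cast truncates toward zero */
}
```

Two slightly different heights, say 0.500 and 0.503, both land on 63 in the 8-bit version, so the difference between them is lost. The 16-bit version keeps them apart (16383 versus 16481).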

The number of "channels" is how many simultaneous sound waves you have..."mono" means 1 channel, while "stereo" usually means 2 channels. Surround sound uses multiple channels.

The term "sample frame", often just "frame", represents a set of samples (across all the channels) that were recorded at the same time. Mono audio only has one channel, so a frame will only contain a single sample. With two channels, a frame will contain two samples. (Those samples will have occurred simultaneously.) If you record two channels (stereo) at a sample rate of 8000 Hz for exactly 1 second...you will end up with a total of 8000 frames, but actually 16000 samples. (Note that sample rate usually refers to the number of samples per second on each channel...in other words, the number of frames per second.)
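The frame/sample bookkeeping is easy to get wrong, so here is a sketch of it in C. The AudioFormat struct and the function names are made up for this example (they are not the real AudioQueue types). The last function also assumes the samples are "interleaved", meaning each frame's samples sit next to each other in the array...that is a common layout, but it is an assumption on my part.

```c
#include <stddef.h>

/* Hypothetical description of an audio format, for illustration only. */
typedef struct {
    double   sample_rate_hz;    /* frames per second */
    unsigned channels;          /* 1 = mono, 2 = stereo */
    unsigned bytes_per_sample;  /* 1 for 8-bit samples, 2 for 16-bit samples */
} AudioFormat;

/* One frame per tick of the sample rate, no matter how many channels. */
size_t frame_count(const AudioFormat *f, double seconds) {
    return (size_t)(f->sample_rate_hz * seconds);
}

/* Every frame holds one sample per channel. */
size_t sample_count(const AudioFormat *f, double seconds) {
    return frame_count(f, seconds) * f->channels;
}

/* Total storage needed for the recording. */
size_t byte_count(const AudioFormat *f, double seconds) {
    return sample_count(f, seconds) * f->bytes_per_sample;
}

/* Array position of one sample, ASSUMING interleaved storage
   (frame 0: left, right; frame 1: left, right; and so on). */
size_t interleaved_index(const AudioFormat *f, size_t frame, unsigned channel) {
    return frame * f->channels + channel;
}
```

For 1 second of 16-bit stereo at 8000 Hz, this gives 8000 frames, 16000 samples, and 32000 bytes.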

What AudioQueue does
AudioQueue will allow you to take an array, with each member of the array representing a sample, and feed that array into it to produce sound. It also works in reverse: you can capture the samples coming from the iPhone's microphone. Funiculus Musical Instrument Tuner does exactly this...it reads in samples, and continually analyzes them for pitch.
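As a taste of the output side, here is a sketch of the kind of routine that fills such an array. It is plain C and does not touch AudioQueue at all...the ToneState struct and fill_square_wave are hypothetical names of mine. I use a square wave rather than a sine so the example needs no math library. The state is kept between calls because the array gets refilled over and over, and the wave has to pick up where it left off.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator state; not part of AudioQueue. */
typedef struct {
    double  sample_rate_hz;  /* samples per second (mono) */
    double  freq_hz;         /* pitch of the tone */
    double  phase;           /* position within the current cycle, in [0, 1) */
    int16_t amplitude;       /* height of the wave, as a 16-bit sample value */
} ToneState;

/* Fill `samples` with `count` 16-bit mono samples of a square wave,
   continuing from wherever the previous call stopped. */
void fill_square_wave(ToneState *s, int16_t *samples, size_t count) {
    double step = s->freq_hz / s->sample_rate_hz;  /* cycles advanced per sample */
    for (size_t i = 0; i < count; i++) {
        /* High for the first half of each cycle, low for the second half. */
        samples[i] = (s->phase < 0.5) ? s->amplitude : (int16_t)-s->amplitude;
        s->phase += step;
        if (s->phase >= 1.0) {
            s->phase -= 1.0;  /* wrap around to the start of the next cycle */
        }
    }
}
```

In a real AudioQueue output callback, my understanding is that you would write these samples into the buffer's mAudioData, set mAudioDataByteSize to the number of bytes written, and hand the buffer back with AudioQueueEnqueueBuffer. Treat that as a rough outline rather than working iPhone code.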

What AudioQueue doesn't do
Work. At least not all the time unless certain arbitrary conditions are perfect...again, this could be due to my own ignorance.

To be continued...
