Pitch Detection in iOS 4.x

Pitch detection is a relatively common thing to do in the audio realm. After scouring the web I found a lot of useful Core Audio resources, but nothing directly related to pitch detection. In my searching I also found others who seemed to be looking for a way to do this. I decided to write a little app which demonstrates how pitch detection can be accomplished in iOS.

This is my first real tutorial so feedback is greatly appreciated. If you find this tutorial useful, let me know!

Cheers,

Demetri


A few things before we get started:

  • I’m assuming readers are somewhat familiar with Xcode and iOS development. If you are just learning the ropes, this tutorial may not be the best place to start. Apple has plenty of resources on their developer’s website to get started.
  • You need iOS 4.0 or higher to use this application because I use the new iOS Accelerate framework’s FFT functions for frequency analysis. You can download the latest SDK from Apple’s iOS Dev Center.
  • To keep the tutorial size from exploding, I’ve skimmed over some of the CoreAudio details and tried to focus on the pitch detection piece. If there is confusion with my code, let me know and I’ll elaborate.
  • I have stripped out some functionality of the app for clarity’s sake. I tried to remove most references to old code/comments though it’s possible I may have missed a few references.
  • Finally, I couldn’t have completed this app without the help of Michael Tyson’s post on Remote I/O. If you want a better understanding of the different parts of Core Audio, I highly recommend checking out his blog post on the topic: http://atastypixel.com/blog/using-remoteio-audio-unit/
  • Disclaimer 1: I know I glazed over a lot of details for the signal analysis portion of this application. It was never my intention for this tutorial to be the “end-all be-all” for signal analysis and pitch detection, but rather a foothold for those who want to learn to get started.
  • Disclaimer 2: This application is a good proof of concept but if you are planning on writing a tuner application for the iPhone and releasing it to the public, you’ll want to refine your frequency analysis algorithm (among other things).

EDIT: I’ve moved PitchDetector to github and also fixed a bug that is reproducible only on the device. Follow the link below to download/checkout the source code.

Note: Below is a link to the source code on github.com. The code is commented generously. If you have questions, let me know in the comments and I’ll elaborate as soon as possible.


Click here to view PitchDetector project page

Pitch Detector

  • Application overview: The application uses the Core Audio Remote I/O API to sample and analyze input signals from an iOS device’s microphone. The resulting frequency is then displayed on screen.
  • I’ll be walking through the Pitch Detector application lifecycle from the point a user taps the “Begin Listening” button, elaborating on certain pieces of the application as they are encountered.

Now without further ado…

Listener Controls

The “Begin Listening” button is linked to a toggle method which starts and stops the listener (in this context, a listener is an entity that listens for audio signals).

Below is the startListener method (see ListenerViewController – [line 37]):

- (void)startListener {
	[rioRef startListening:self];
}

rioRef is ListenerViewController’s reference to RIOInterface (see ListenerViewController.h). RIOInterface (read: Remote I/O interface) is the bread and butter of this application; all the Core Audio components needed are contained within this class. RIOInterface is a singleton class since we will only ever be dealing with a single Remote I/O Audio Unit…more on that later.
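The singleton idea itself is easy to sketch outside Objective-C: keep one statically allocated instance behind an accessor, so every caller shares the same Remote I/O state. The struct and names below are invented for illustration; they are not the project’s actual implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for the state RIOInterface carries around. */
typedef struct {
    double sampleRate;
    int    isListening;
} RIOState;

/* The accessor always returns the one shared instance;
   callers never allocate their own copy. */
RIOState *sharedRIOState(void) {
    static RIOState instance = { 44100.0, 0 };
    return &instance;
}
```

Every call to sharedRIOState() returns the same pointer, which is exactly the guarantee we want when there is only one Remote I/O unit to manage.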

RIOInterface.mm

Let’s look at RIOInterface’s  - (void)startListening:(ListenerViewController*)aListener method (RIOInterface.mm – [line 62]):

- (void)startListening:(ListenerViewController*)aListener {
	self.listener = aListener;
	[self createAUProcessingGraph];
	[self initializeAndStartProcessingGraph];
}

The two important methods here are [self createAUProcessingGraph] and [self initializeAndStartProcessingGraph]. Let’s discuss the former first (RIOInterface.mm – [line 240]):

- (void)createAUProcessingGraph {
	OSStatus err;
 
	AudioComponentDescription ioUnitDescription;
	ioUnitDescription.componentType = kAudioUnitType_Output;
	ioUnitDescription.componentSubType = kAudioUnitSubType_RemoteIO;
	ioUnitDescription.componentManufacturer = kAudioUnitManufacturer_Apple;
	ioUnitDescription.componentFlags = 0;
	ioUnitDescription.componentFlagsMask = 0;
 
	// Declare and instantiate an audio processing graph
	NewAUGraph(&processingGraph);
 
	// Add an audio unit node to the graph, then instantiate the audio unit.
	AUNode ioNode;
	AUGraphAddNode(processingGraph, &ioUnitDescription, &ioNode);
 
	AUGraphOpen(processingGraph); // indirectly performs audio unit instantiation
 
	// Obtain a reference to the newly-instantiated I/O unit. Each Audio Unit
	// requires its own configuration.
	AUGraphNodeInfo(processingGraph, ioNode, NULL, &ioUnit);
 
	// Initialize below.
	AURenderCallbackStruct callbackStruct = {0};
	UInt32 enableInput;
	UInt32 enableOutput;
 
	// Enable input and disable output.
	enableInput = 1; enableOutput = 0;
	callbackStruct.inputProc = RenderFFTCallback;
	callbackStruct.inputProcRefCon = self;
 
	err = AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_EnableIO,
							   kAudioUnitScope_Input,
							   kInputBus, &enableInput, sizeof(enableInput));
 
	err = AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_EnableIO,
							   kAudioUnitScope_Output,
							   kOutputBus, &enableOutput, sizeof(enableOutput));
 
	err = AudioUnitSetProperty(ioUnit, kAudioOutputUnitProperty_SetInputCallback,
							   kAudioUnitScope_Input,
							   kOutputBus, &callbackStruct, sizeof(callbackStruct));
 
	// Set the stream format.
	size_t bytesPerSample = [self ASBDForSoundMode];
 
	err = AudioUnitSetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
							   kAudioUnitScope_Output,
							   kInputBus, &streamFormat, sizeof(streamFormat));
 
	err = AudioUnitSetProperty(ioUnit, kAudioUnitProperty_StreamFormat,
							   kAudioUnitScope_Input,
							   kOutputBus, &streamFormat, sizeof(streamFormat));
 
	// Disable system buffer allocation. We'll do it ourselves.
	UInt32 flag = 0;
	err = AudioUnitSetProperty(ioUnit, kAudioUnitProperty_ShouldAllocateBuffer,
								  kAudioUnitScope_Output,
								  kInputBus, &flag, sizeof(flag));
 
	// Allocate AudioBuffers for use when listening.
	// TODO: Move into initialization...should only be required once.
	bufferList = (AudioBufferList *)malloc(sizeof(AudioBufferList));
	bufferList->mNumberBuffers = 1;
	bufferList->mBuffers[0].mNumberChannels = 1;
 
	bufferList->mBuffers[0].mDataByteSize = 512*bytesPerSample;
	bufferList->mBuffers[0].mData = calloc(512, bytesPerSample);
}

Some of the comments have been removed for conciseness. Just know this method builds the AUGraph, instantiates the Remote I/O node, and specifies all the properties needed for input to occur (e.g. enabling input, stream format, sample rate). Anyone who has worked with Core Audio knows setting these properties correctly is half the battle, and getting them wrong can cause a lot of headaches.
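One aside: the method above assigns every return value to err but never inspects it. When a property call fails silently, those headaches get much worse, so a small checked-call wrapper is worth the few lines. This is a generic sketch with a stub (fakeSetProperty) standing in for a real AudioUnitSetProperty call:

```c
#include <stdio.h>

typedef int OSStatus;  /* mirrors CoreAudio's OSStatus for this sketch */

/* Log the failing expression and its status code instead of discarding it. */
#define CheckStatus(expr) do {                                        \
    OSStatus _s = (expr);                                             \
    if (_s != 0) fprintf(stderr, "%s -> %d\n", #expr, (int)_s);       \
} while (0)

/* Stub standing in for an AudioUnitSetProperty-style call. */
OSStatus fakeSetProperty(int shouldFail) {
    return shouldFail ? -50 : 0;  /* -50 is Core Audio's paramErr */
}
```

Wrapping each AudioUnitSetProperty call in CheckStatus(...) pinpoints exactly which property configuration failed, instead of leaving you to discover it later when no audio arrives.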

Note the input data grabbed from the microphone is represented by signed 16-bit integers in linear PCM format. Initially I was using the 8.24 fixed-point format specified in the documentation, which led to many, many hours of debugging other parts of my code :)

Audio Processing

Once we’ve set up all the components, we need to tell them to start! This happens in the second method we mentioned above:

(RIOInterface.mm – [line 75])

- (void)initializeAndStartProcessingGraph

This method calls AUGraphStart(processingGraph) which tells the device to begin sampling audio signals and passing them to the callback function we declared when setting up the AUGraph.

Now it’s time for the good stuff! When the device has a set of audio data to give us, it calls OSStatus RenderFFTCallback(…) (RIOInterface.mm – [line 90]):

OSStatus RenderFFTCallback (void					*inRefCon,
					   AudioUnitRenderActionFlags 	*ioActionFlags,
					   const AudioTimeStamp			*inTimeStamp,
					   UInt32 						inBusNumber,
					   UInt32 						inNumberFrames,
					   AudioBufferList				*ioData)
{
	RIOInterface* THIS = (RIOInterface *)inRefCon;
	COMPLEX_SPLIT A = THIS->A;
	void *dataBuffer = THIS->dataBuffer;
	float *outputBuffer = THIS->outputBuffer;
	FFTSetup fftSetup = THIS->fftSetup;
 
	uint32_t log2n = THIS->log2n;
	uint32_t n = THIS->n;
	uint32_t nOver2 = THIS->nOver2;
	uint32_t stride = 1;
	int bufferCapacity = THIS->bufferCapacity;
	SInt16 index = THIS->index;
 
	AudioUnit rioUnit = THIS->ioUnit;
	OSStatus renderErr;
	UInt32 bus1 = 1;
 
	renderErr = AudioUnitRender(rioUnit, ioActionFlags,
								inTimeStamp, bus1, inNumberFrames, THIS->bufferList);
	if (renderErr < 0) {
		return renderErr;
	}

	// Fill the buffer with our sampled data. If we fill our buffer, run the
	// FFT.
	int read = bufferCapacity - index;
	if (read > inNumberFrames) {
		memcpy((SInt16 *)dataBuffer + index, THIS->bufferList->mBuffers[0].mData, inNumberFrames*sizeof(SInt16));
		THIS->index += inNumberFrames;
	} else {
		// If we enter this conditional, our buffer will be filled and we should
		// perform the FFT.
		memcpy((SInt16 *)dataBuffer + index, THIS->bufferList->mBuffers[0].mData, read*sizeof(SInt16));
 
		// Reset the index.
		THIS->index = 0;
 
		/*************** FFT ***************/
		// We want to deal with only floating point values here.
		ConvertInt16ToFloat(THIS, dataBuffer, outputBuffer, bufferCapacity);
 
		/**
		 Look at the real signal as an interleaved complex vector by casting it.
		 Then call the transformation function vDSP_ctoz to get a split complex
		 vector, which for a real signal, divides into an even-odd configuration.
		 */
		vDSP_ctoz((COMPLEX*)outputBuffer, 2, &A, 1, nOver2);
 
		// Carry out a Forward FFT transform.
		vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_FORWARD);
 
		// The output signal is now in a split real form. Use the vDSP_ztoc to get
		// a split real vector.
		vDSP_ztoc(&A, 1, (COMPLEX *)outputBuffer, 2, nOver2);
 
		// Determine the dominant frequency by taking the magnitude squared and
		// saving the bin which it resides in.
		float dominantFrequency = 0;
		int bin = -1;
		for (int i=0; i < n; i+=2) {
			float curFreq = MagnitudeSquared(outputBuffer[i], outputBuffer[i+1]);
			if (curFreq > dominantFrequency) {
				dominantFrequency = curFreq;
				bin = (i+1)/2;
			}
		}
		memset(outputBuffer, 0, n*sizeof(float));
 
		// Update the UI with our newly acquired frequency value.
		[THIS->listener frequencyChangedWithValue:bin*(THIS->sampleRate/bufferCapacity)];
		printf("Dominant frequency: %f   bin: %d\n", bin*(THIS->sampleRate/bufferCapacity), bin);
	}
 
	return noErr;
}

In case you didn’t already recognize it, this is a C function. The first parameter is a pointer to our Objective-C instance, which we cast in order to get references to the class members. After all the variables have been set up, we then call:

renderErr = AudioUnitRender(rioUnit, ioActionFlags,
								inTimeStamp, bus1, inNumberFrames, THIS->bufferList);

This method “renders” sampled audio data from the microphone into bufferList (recall we allocated bufferList when we set up the AUGraph earlier).

The next if-else statement copies the just-sampled data into a larger buffer (dataBuffer). If dataBuffer is not yet full, the callback is finished; when the buffer fills, we perform the FFT.
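Stripped of the Core Audio types, that accumulation pattern looks like this (the struct, names, and buffer size here are my own illustration, not the project’s):

```c
#include <string.h>

#define CAPACITY 1024  /* assumed analysis-buffer size for this sketch */

typedef struct {
    short buf[CAPACITY];
    int   index;       /* next free slot in buf */
} Accumulator;

/* Append `frames` incoming samples to the analysis buffer.
   Returns 1 when the buffer just filled (i.e. time to run the FFT),
   otherwise 0. Mirrors the callback's if-else logic above. */
int accumulate(Accumulator *a, const short *in, int frames) {
    int space = CAPACITY - a->index;
    if (frames < space) {
        memcpy(a->buf + a->index, in, frames * sizeof(short));
        a->index += frames;
        return 0;
    }
    /* Buffer fills on this call: take what fits, then reset the index. */
    memcpy(a->buf + a->index, in, space * sizeof(short));
    a->index = 0;
    return 1;
}
```

Note that, like the callback above, this sketch discards whatever portion of the incoming chunk doesn’t fit; a production app might carry the remainder over into the next analysis window.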

The first step in performing the FFT is converting our buffered data into the correct format. After a little bit of setup, we can let the CoreAudio API do this for us. You can take a look at the code yourself in void ConvertInt16ToFloat(…).
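If you just want the gist of that conversion: each signed 16-bit sample gets rescaled into a float. The project’s ConvertInt16ToFloat lets Core Audio do the work; the hand-rolled loop below is an illustrative equivalent, not the project’s code.

```c
#include <stddef.h>

/* Rescale signed 16-bit PCM samples into floats in roughly [-1.0, 1.0).
   32768 is the magnitude of the most negative 16-bit value. */
void int16ToFloat(const short *in, float *out, size_t count) {
    for (size_t i = 0; i < count; i++)
        out[i] = in[i] / 32768.0f;
}
```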

Once our data is in floating point format, we can use the Accelerate framework Apple released for iOS 4.0. These method calls have been highly optimized by the engineers at Apple. If you want to learn more, there is a WWDC 2010 session video you can download that discusses performance and comparisons to other libraries (e.g. FFTW).

The theory behind the FFT is beyond the scope of this tutorial (nor am I the one who should be explaining it), but you should know that when all is said and done, the sampled data will reside in the frequency domain. All that is left is to apply the algorithm of your choosing to the data to get the dominant frequency. As I said beforehand, the algorithm I use is by no means the best, but it gets the job done.
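For the frequency readout itself, the math is simple: each FFT bin spans sampleRate / N Hz, so the dominant frequency is the winning bin index times that bin width. A minimal sketch:

```c
/* Convert an FFT bin index to a frequency in Hz.
   Each bin covers sampleRate / n Hz, where n is the FFT size. */
float binToFrequency(int bin, float sampleRate, int n) {
    return bin * (sampleRate / n);
}
```

At a 44.1 kHz sample rate with a 1024-point FFT, for instance, each bin is about 43 Hz wide, which is part of why the disclaimers above recommend refining the algorithm (e.g. interpolating between bins) before shipping a real tuner.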

After we have performed our FFT, we clear the buffer and update the UI to show the current frequency. That’s it!