What can I do with music, really? FFT and the XNA content pipeline

By jwatte

I was unhappy with the low resolution of the visualization data that you get out of the XNA framework MediaPlayer. If you want to synchronize gameplay to music, the data you get is not sufficient.

To work around this, I first wrote my own song player that plays uncompressed data. This gives me the full sound waveform at runtime, which is useful, but it also makes each song about 40 MB. That takes a while to load from disk on the Xbox, and it severely limits the number of songs you can include in a given download size. On top of that, raw sample data is not the best form for analyzing rhythm.

I then took another tack and extended the song processor to create my own content type, which uses the built-in Song but also annotates it with extracted data. To get that data, I run an FFT on successive blocks of the music in the content processor and extract the magnitude of each frequency bin. I then throw away most of the high-frequency bins and reduce the resolution, normalizing each remaining frequency band to a value between 0 and 1 that is stored in a single byte. With 32 bins and one sample per block of 1024 sound frames, this comes to about 300 kB of data instead of 40 MB. The actual song data is stored compressed by the Song class itself; I'm not changing that.
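The size figures above can be checked with a little arithmetic. This is only a rough sketch: the sample rate, channel count, sample width, and the four-minute song length are assumptions of mine (CD-quality stereo), and the function names are made up for illustration.

```python
# Assumed audio format (not stated in the post): 44.1 kHz, stereo, 16-bit.
SAMPLE_RATE = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2

HOP = 1024   # one analysis sample per 1024 sound frames (from the post)
BINS = 32    # one byte per stored frequency bin (from the post)


def raw_pcm_bytes(seconds):
    """Size of the uncompressed waveform for a song of the given length."""
    return seconds * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


def annotation_bytes(seconds):
    """Size of the extracted data: BINS bytes per block of HOP frames."""
    blocks = (seconds * SAMPLE_RATE) // HOP
    return blocks * BINS


seconds = 240  # hypothetical four-minute song
print(raw_pcm_bytes(seconds))     # roughly 42 million bytes
print(annotation_bytes(seconds))  # roughly 330 thousand bytes
```

Under those assumptions the two numbers land close to the 40 MB and 300 kB mentioned above.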

One additional trick: for better frequency resolution I run a 2048-point FFT and only use the bottom 32 bins (minus the bottom-most). However, I only slide the window forward 1024 frames per sample, so successive analysis frames overlap by 50%.
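Here is a minimal sketch of that analysis loop, written in plain Python rather than the C# and C used in the actual content processor. Several things are assumptions on my part: the input is a mono list of floats, "the bottom 32 bins minus the bottom-most" is read as bins 1 through 32, and each magnitude is normalized against the largest magnitude in the whole song. The small recursive `fft` is a stand-in for the FFT library, and `analyze` is a hypothetical name.

```python
import cmath

FFT_SIZE = 2048  # analysis window length, in sound frames
HOP = 1024       # window advance per sample, giving 50% overlap
BINS = 32        # low-frequency bins kept per sample


def fft(x):
    """Plain recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def analyze(samples):
    """Return one bytes object of BINS values (0..255) per analysis frame."""
    frames = []
    for start in range(0, len(samples) - FFT_SIZE + 1, HOP):
        spectrum = fft(samples[start:start + FFT_SIZE])
        # Skip bin 0 and keep the next BINS magnitudes (assumed reading).
        frames.append([abs(spectrum[k]) for k in range(1, BINS + 1)])
    # Assumed normalization: scale by the largest magnitude in the song.
    peak = max((m for f in frames for m in f), default=0.0) or 1.0
    return [bytes(min(255, round(255 * m / peak)) for m in f) for f in frames]
```

At runtime, the frame for the current playback position would then be the one at index `position_in_frames // HOP`, again assuming this layout.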

To run the FFT, I use the DJBFFT library. It's written in old-school C and only comes with makefiles for UNIX, but it isn't that hard to set up a Visual Studio project that includes the appropriate files and adds dllexport to the symbols you need. I then use P/Invoke to import the FFT functions into the C# XNA content pipeline project and call them directly. Works great!

Attachment: break_the_ice_fft.jpg (220.03 KB)