Decoding VOX Files in C# (Converting VOX Files to WAV Files)

I wrote a C# class that decodes VOX files into WAV files. It follows the Dialogic ADPCM specification, so reading that specification first will make the code below much clearer; without it, the bit manipulation can look like another language altogether. The specification is quite simple once you boil it down. Note that the VOX files created by the NMS Communications libraries use a slightly different format from Dialogic ADPCM, so the code below will not work on those files without some tweaks.
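For orientation before the full class, the per-sample rule can be condensed into a few lines. This is only a sketch of how I read the specification, not a replacement for it: the names `AdpcmSketch`, `DecodeNibble` and `IndexAdjust` are made up for illustration, and the clamp to plus or minus 2047 simply mirrors the decoder further down.

```csharp
using System;

static class AdpcmSketch
{
    // The 49 step sizes from the Dialogic ADPCM table (same values as in the full decoder).
    static readonly int[] StepSizes =
    {
        16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97,
        107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449,
        494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
    };

    // Step index change selected by the three magnitude bits of a nibble.
    static readonly int[] IndexAdjust = { -1, -1, -1, -1, 2, 4, 6, 8 };

    // Decodes one 4-bit code, updating the running signal and step index in place.
    public static int DecodeNibble(int nibble, ref int stepIndex, ref int signal)
    {
        int step = StepSizes[stepIndex];

        // Bits 2, 1 and 0 add step, step/2 and step/4; step/8 is always added.
        int difference = step / 8;
        if ((nibble & 4) != 0) difference += step;
        if ((nibble & 2) != 0) difference += step / 2;
        if ((nibble & 1) != 0) difference += step / 4;

        // Bit 3 is the sign bit.
        if ((nibble & 8) != 0) difference = -difference;

        signal = Math.Max(-2047, Math.Min(2047, signal + difference));

        // The current nibble also selects the step size for the next sample.
        stepIndex = Math.Max(0, Math.Min(48, stepIndex + IndexAdjust[nibble & 7]));
        return signal;
    }
}
```

Starting from a signal of 0 and a step index of 0, the nibble 0x7 yields a sample of 30 and moves the index to 8; a following 0xF is then decoded with a step size of 34 and brings the signal to -33.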

My implementation to decode from VOX to WAV files is as follows:

using System;
using System.IO;

class VOXDecoder
{

    static float signal = 0;
    static int previousStepSizeIndex = 0;
    static bool computedNextStepSizeOnce = false;
    static int[] possibleStepSizes = new int[49] { 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552 };

    public static void Decode(string inputFile, out string outputFile)
    {
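        // Reset the decoder state so that one call does not leak into the next.
        signal = 0;
        previousStepSizeIndex = 0;
        computedNextStepSizeOnce = false;
        // Path.ChangeExtension also works when the input path has no directory component.
        outputFile = Path.ChangeExtension(inputFile, ".wav");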
        using (FileStream inputStream = File.Open(inputFile, FileMode.Open))
        using (BinaryReader reader = new BinaryReader(inputStream))
        using (FileStream outputStream = File.Create(outputFile))
        using (BinaryWriter writer = new BinaryWriter(outputStream))
        {
            // Note that 32-bit integer values always take up 4 bytes and 16-bit integer values (shorts) always take up 2 bytes.
            // Note that hex literals are typed as 32-bit integers unless cast to something else, such as short.
            // BinaryWriter writes integers in little-endian order, which is why the four-character tags below are spelled backwards in hex.
            // ChunkID: "RIFF"
            writer.Write(0x46464952);
            // ChunkSize: The size of the entire file in bytes minus 8 bytes for the two fields not included in this count: ChunkID and ChunkSize.
            writer.Write((int)(reader.BaseStream.Length * 4) + 36);
            // Format: "WAVE"
            writer.Write(0x45564157);
            // Subchunk1ID: "fmt " (with the space).
            writer.Write(0x20746D66);
            // Subchunk1Size: 16 for PCM.
            writer.Write(16);
            // AudioFormat: 1 for PCM.
            writer.Write((short)1);
            // NumChannels: 1 for Mono. 2 for Stereo.
            writer.Write((short)1);
            // SampleRate: 8000 is usually the default for VOX.
            writer.Write(8000);
            // ByteRate: SampleRate * NumChannels * bytes per stored sample = 8000 * 1 * 2.
            writer.Write(16000);
            // BlockAlign: NumChannels * bytes per stored sample. Every sample is written as a 2-byte short.
            writer.Write((short)2);
            // BitsPerSample: 16, because each sample is stored in a 16-bit short even though the VOX specification produces 12 bits per output sample.
            writer.Write((short)16);
            // Subchunk2ID: "data"
            writer.Write(0x61746164);
            // Subchunk2Size: NumSamples * NumChannels * BitsPerSample / 8. You can also think of this as the size of the read of the subchunk following this number.
            writer.Write((int)(reader.BaseStream.Length * 4));
            // Write the data stream to the file in linear audio.
            while (reader.BaseStream.Position != reader.BaseStream.Length)
            {
                byte b = reader.ReadByte();
                float firstDifference = GetDifference((byte)(b / 16));
                signal += firstDifference;
                writer.Write(TruncateSignalIfNeeded());
                float secondDifference = GetDifference((byte)(b % 16));
                signal += secondDifference;
                writer.Write(TruncateSignalIfNeeded());
            }
        }
    }

    static short TruncateSignalIfNeeded()
    {
        // Clamp the running signal to 12 bits since, as per the VOX spec, each 4-bit input has 12 output bits.
        // Note that 12 bits is 0b111111111111. That's 0xFFF in hex, or 4095 in decimal.
        // The sound wave is a signed signal, so with 1 bit reserved for the sign, that's 4095 / 2 rounded down to 2047.
        if (signal > 2047)
        {
            signal = 2047;
        }
        if (signal < -2047)
        {
            signal = -2047;
        }
        // Move the 12-bit value into the upper bits of the 2-byte sample so that playback uses the full volume range.
        return (short)(signal * 16);
    }

    static float GetDifference(byte nibble)
    {
        int stepSize = GetNextStepSize(nibble);
        float difference = ((stepSize * GetBit(nibble, 2)) + ((stepSize / 2) * GetBit(nibble, 1)) + (stepSize / 4 * GetBit(nibble, 0)) + (stepSize / 8));
        if (GetBit(nibble, 3) == 1)
        {
            difference = -difference;
        }
        return difference;
    }

    static byte GetBit(byte b, int zeroBasedBitNumber)
    {
        // Shift the bits to the right by the number of the bit you want, then bitwise AND with 1 to clear every bit to the left of it.
        return (byte)((b >> zeroBasedBitNumber) & 1);
    }

    static int GetNextStepSize(byte nibble)
    {
        // Decode the current nibble with the step size selected by the previous nibble (16 for the very first one).
        int stepSize = possibleStepSizes[previousStepSizeIndex];
        // Then let the current nibble choose the step size for the next sample, keeping the index inside the table (0 to 48).
        previousStepSizeIndex = Math.Max(0, Math.Min(48, previousStepSizeIndex + GetMagnitude(nibble)));
        return stepSize;
    }

    static int GetMagnitude(byte nibble)
    {
        if (nibble == 15 || nibble == 7)
            return 8;
        else if (nibble == 14 || nibble == 6)
            return 6;
        else if (nibble == 13 || nibble == 5)
            return 4;
        else if (nibble == 12 || nibble == 4)
            return 2;
        else
            return -1;
    }
}
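The size fields in the header all follow from three numbers: the sample rate, the channel count, and the number of bytes used to store each sample. Here is a minimal sketch of that arithmetic, assuming the canonical 44-byte PCM header and two output samples per VOX byte. `WavHeaderSketch` and `Compute` are hypothetical names and are not used by the class above.

```csharp
using System;

static class WavHeaderSketch
{
    // Derives the size-related header fields for uncompressed PCM output.
    public static (short BlockAlign, int ByteRate, int DataSize, int ChunkSize) Compute(
        int sampleRate, short channels, short bitsPerSample, long voxByteCount)
    {
        // Samples occupy whole bytes, so round the bit depth up to the next byte.
        short bytesPerSample = (short)((bitsPerSample + 7) / 8);
        short blockAlign = (short)(channels * bytesPerSample);
        int byteRate = sampleRate * blockAlign;

        // Each VOX byte carries two 4-bit codes, hence two output samples.
        long sampleCount = voxByteCount * 2;
        int dataSize = checked((int)(sampleCount * blockAlign));

        // ChunkSize counts everything after the first 8 bytes: 36 header bytes plus the data.
        int chunkSize = checked(36 + dataSize);
        return (blockAlign, byteRate, dataSize, chunkSize);
    }
}
```

For a 1,000-byte VOX file at 8000 Hz, mono, with 16 bits per stored sample, this gives a block align of 2, a byte rate of 16000, a data size of 4000 and a chunk size of 4036.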

It is easily called through the following two lines:

string outputWAVFilePath;
VOXDecoder.Decode(pathToYourVOXFile, out outputWAVFilePath);
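To sanity-check the result, the header of the generated file can be read back and its fields compared against each other. This is a rough sketch, assuming the simple 44-byte layout with no extra chunks; `WavHeaderCheck` and `HasConsistentPcmHeader` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeaderCheck
{
    // Returns true when the size-related fields of a canonical PCM header agree with each other.
    public static bool HasConsistentPcmHeader(Stream wav)
    {
        // leaveOpen = true so the caller keeps ownership of the stream.
        using (var reader = new BinaryReader(wav, Encoding.ASCII, true))
        {
            string riff = new string(reader.ReadChars(4));
            int chunkSize = reader.ReadInt32();
            string wave = new string(reader.ReadChars(4));
            if (riff != "RIFF" || wave != "WAVE")
                return false;

            reader.ReadBytes(8);                      // "fmt " tag and Subchunk1Size
            short audioFormat = reader.ReadInt16();
            short channels = reader.ReadInt16();
            int sampleRate = reader.ReadInt32();
            int byteRate = reader.ReadInt32();
            short blockAlign = reader.ReadInt16();
            short bitsPerSample = reader.ReadInt16();
            reader.ReadBytes(4);                      // "data" tag
            int dataSize = reader.ReadInt32();

            // Samples occupy whole bytes, so round the bit depth up.
            int bytesPerSample = (bitsPerSample + 7) / 8;
            return audioFormat == 1
                && blockAlign == channels * bytesPerSample
                && byteRate == sampleRate * blockAlign
                && chunkSize == 36 + dataSize;
        }
    }
}
```

It could be called by opening the output with `File.OpenRead(outputWAVFilePath)` and passing the stream in. A result of false usually means the byte rate or block align does not match the declared sample size.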

Give it a shot with this sample Dialogic ADPCM VOX audio file.

Alexandru
