Sound File Formats
==================

RIFF (.WAV file format)
-----------------------
* Covered in Microsoft Windows Multimedia SDK, Microsoft Windows 3.1 SDK,
  IBM OS/2 redbooks and Microsoft Developer Network CD (search for ADPCM).
  The RIFF subset that describes sampled audio is compatible between
  Microsoft Windows 3.1 and OS/2 2.x.

* Many sampling formats fit in the standard (i.e. the standard allows these
  formats), but only a few are "encouraged". The only format that is
  supported by all drivers and programs is PCM. 8-bit PCM is unsigned
  (0..255, 128 is silence), 16-bit PCM is signed (-32768..32767, 0 is
  silence). The reason for this discrepancy is probably the way hardware
  devices were implemented.

* In practice, many applications only support a fixed subset of the flexible
  RIFF format. Only a few applications parse the flexible "chunked" header
  (similar to that of TIFF); most assume a fixed header instead, so it is
  best to write files that adhere to that fixed layout. This also means that
  extensions (such as compression) will not be handled well by most existing
  software.

* Although any sample rate can be specified in the header of the RIFF file,
  only 11025 Hz, 22050 Hz and 44100 Hz are supported by many drivers. See
  also the notes on sampling frequencies below.

* When seen as a fixed header, the header of the .WAV file format is given
  below. The declaration assumes a compiler with 16-bit "int", 32-bit "long"
  and no structure padding, so that the header is 44 bytes. All values are
  stored in Little Endian format.

    struct WAVEFMT {
      char signature[4];   // must contain 'RIFF'
      long RIFFsize;       // size of file (in bytes) minus 8
      char type[4];        // must contain 'WAVE'
      char fmtchunk[4];    // must contain 'fmt ' (including blank)
      long fmtsize;        // size of format chunk, must be 16
      int  format;         // normally 1 (PCM)
      int  channels;       // number of channels, 1=mono, 2=stereo
      long samplerate;     // sampling frequency: 11025, 22050 or 44100
      long average_bps;    // average bytes per second; samplerate * align
      int  align;          // bytes per sample frame;
                           //   channels * bitspersample / 8
      int  bitspersample;  // should be 8 or 16
      char datchunk[4];    // must contain 'data'
      long samples;        // size of the sample data in bytes
    };

The .VOC file format (Creative Inc.)
------------------------------------
Creative .VOC files start with a header with the following format:

  offset  size  description
  ----------------------------------------------------------------------
  00h     14h   Contains the string "Creative Voice File" plus an EOF
                byte (1Ah).
  14h     2     The file offset to the sample data. This value usually
                is 001Ah.
  16h     2     Version number. The major version is in the high byte,
                the minor version in the low byte.
  18h     2     Validity check. This word contains the complement (NOT
                operation) of the value at offset 16h, added to 1234h.
  1Ah     ...   Start of the sample data.

All 16-bit and 24-bit values are stored in Little Endian (Intel format).

Audio data is split in blocks (often there is only one data block in the
file). Blocks start with a four byte header. The first byte of this block
header is the "block type". The other three bytes give the length of the
block excluding the header. The terminator block is an exception: only the
"block type" byte is stored (the "data length" bytes are absent).

  type  description
  ----------------------------------------------------------------------
  0     Terminator     No extra data.
                       According to the specification, this block should
                       be present in the "in memory" representation of
                       the voice file, but it is absent in most .VOC
                       files.

  1     Voice data     The first two bytes of extra data give the playback
                       speed and the compression mode. The rest of the
                       extra data are the encoded samples.
                         byte: playback speed
                               = 256 - 1000000/sample_frequency
                         byte: compression
                               0 = none (8 bits unsigned PCM samples)
                               1 = 4-bits packed (two 4-bit samples per
                                   byte)
                               2 = 2.67 bits packed (two 3-bit samples
                                   and one 2-bit sample per byte)
                               3 = 2 bits packed (four 2-bit samples per
                                   byte)

  2     Voice cont.    This block contains only samples, no data on
                       playback speed or compression. Therefore, a
                       "Voice data" block must have preceded this block.

  3     Silence        Three bytes extra data:
                         word: silence period in samples minus one (i.e.
                               for the exact period you must add 1 to
                               the value found here).
                         byte: playback speed, see block #1 ("Voice
                               data").

  4     Mark           Two bytes extra data:
                         word: mark value.
                       This block updates an internal status variable in
                       the driver. This variable can be queried by the
                       application software, for example to synchronize
                       the sound with animation.

  5     Text           The extra data of this block stores a
                       zero-terminated string. This block only carries
                       additional information for the application
                       software; it is ignored by the driver.

  6     Repeat start   Two bytes of extra data:
                         word: the repeat count.
                       The voice data is played once more than indicated
                       in this count. All voice data blocks between this
                       block and a block #7 ("Repeat end") are played
                       count+1 times. Repeats cannot be nested.

  7     Repeat end     No extra data.

  8     Extra info     Must be followed by block type 1 and supersedes
                       the playback information in that block. Four bytes
                       of extra data:
                         2 bytes: playback speed
                                  = 65536 - 256000000/sample_frequency
                         1 byte:  compression, see block type 1
                         1 byte:  mode, 0 = mono, 1 = stereo
                       In case of stereo samples, the compression must be
                       0 (none) and the frequency calculated from the
                       first 2 bytes must be halved.

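The header check, the two playback speed formulas and the block layout
described above can be sketched in C as follows. This is only a minimal
sketch and not code from Creative: all function names (voc_check_header,
voc_rate_from_byte, voc_rate_from_word, voc_next_block) are made up for this
example, and it assumes that the complete file has already been read into a
memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a 16-bit Little Endian value from a byte buffer. */
static unsigned voc_le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Read a 24-bit Little Endian value (the length field of a block). */
static unsigned long voc_le24(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16);
}

/* Check the 26-byte (1Ah) file header. Returns the file offset of the
 * first block (the word at 14h), or 0 if the header is not valid. */
unsigned voc_check_header(const unsigned char *hdr)
{
    unsigned version = voc_le16(hdr + 0x16);
    unsigned check   = voc_le16(hdr + 0x18);

    if (memcmp(hdr, "Creative Voice File\x1A", 20) != 0)
        return 0;
    /* validity word = (NOT version) + 1234h, truncated to 16 bits */
    if (check != ((~version + 0x1234u) & 0xFFFFu))
        return 0;
    return voc_le16(hdr + 0x14);
}

/* Block types 1 and 3: speed = 256 - 1000000/rate, solved for the rate. */
unsigned long voc_rate_from_byte(unsigned speed)
{
    return 1000000UL / (256u - (speed & 0xFFu));
}

/* Block type 8: speed = 65536 - 256000000/rate; halved for stereo. */
unsigned long voc_rate_from_word(unsigned speed, int stereo)
{
    unsigned long rate = 256000000UL / (65536UL - (speed & 0xFFFFu));
    return stereo ? rate / 2 : rate;
}

/* Parse the block that starts at buf[pos]. On success the block type and
 * the data length are stored and the position of the next block is
 * returned. Returns 0 at a terminator block, at the end of the buffer,
 * or when the block does not fit in the buffer. */
size_t voc_next_block(const unsigned char *buf, size_t size, size_t pos,
                      unsigned *type, unsigned long *length)
{
    assert(buf != NULL && type != NULL && length != NULL);
    if (pos >= size || buf[pos] == 0)
        return 0;                      /* end of data or terminator */
    if (size - pos < 4)
        return 0;                      /* truncated block header */
    *type = buf[pos];
    *length = voc_le24(buf + pos + 1);
    if (*length > size - pos - 4)
        return 0;                      /* block runs past the buffer */
    return pos + 4 + (size_t)*length;
}
```

A player would call voc_check_header() once and then call voc_next_block()
in a loop, starting at the returned offset, until it returns 0. The sample
rates are computed with integer division, so the result is only an
approximation of the rate the file was recorded at.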
Through its block structure, .VOC files support "silence compression". Most
.VOC players expect only a single "voice data" block, however. This is also
how most .VOC files are laid out: a header, optionally an "extra info" block
for stereo playback, and a "voice data" block.

The more powerful compression scheme used by .VOC files is often described
as an ADPCM variant, called "packing". Strictly speaking, it is not ADPCM,
because it stores the samples instead of the differences between the
samples. The data can be packed into 2-bit, 2.67-bit or 4-bit samples
(2.67 bit, or 2.5 bit as it is sometimes referred to, means that three
samples are packed into one byte; two 3-bit codes and one 2-bit code). Only
the format of 4-bit packing is covered here.

For 4-bit packing, two samples are stored in one byte (so one sample per
nibble). Each nibble has one sign bit (bit 3) and three magnitude bits
(0..2). The value in the magnitude bits (in the range 0..7) is multiplied
with a "step" value that ranges from 1 to 10. The result is a signed PCM
value between -70 and 70. Depending on the value of the magnitude bits, the
step value is incremented or decremented. It is incremented if the magnitude
is 5 or above and it is decremented if the magnitude is 0. Valid values for
the step value are 1, 2.5, 5 and 10.
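
The 4-bit unpacking described above could look like the sketch below. It is
a literal transcription of this text and has not been verified against
Creative's driver. Several details are assumptions because the text does not
specify them: the step moves one entry up or down in the list 1, 2.5, 5, 10;
the step starts at 1; the high nibble of each byte holds the first sample.
The names Voc4State, voc4_decode_nibble and voc4_decode are made up for this
example. The output is kept as "double" because a step of 2.5 gives
fractional values.

```c
#include <assert.h>
#include <stddef.h>

/* Valid step values as listed in the text. */
static const double voc4_steps[4] = { 1.0, 2.5, 5.0, 10.0 };

/* Decoder state: index into voc4_steps[]. Starting at index 0 (step 1)
 * is an assumption; the text does not give the initial step. */
typedef struct {
    int step_index;
} Voc4State;

/* Decode one 4-bit code: bit 3 is the sign, bits 0..2 the magnitude. */
double voc4_decode_nibble(Voc4State *st, unsigned code)
{
    unsigned magnitude = code & 7u;
    double value;

    assert(st->step_index >= 0 && st->step_index < 4);
    value = magnitude * voc4_steps[st->step_index];
    if (code & 8u)
        value = -value;

    /* Adapt the step for the next sample: up for a magnitude of 5 or
     * above, down for a magnitude of 0 (assumed: one table entry). */
    if (magnitude >= 5 && st->step_index < 3)
        st->step_index++;
    else if (magnitude == 0 && st->step_index > 0)
        st->step_index--;
    return value;
}

/* Unpack n_bytes of packed data; "out" must have room for 2 * n_bytes
 * values. Taking the high nibble first is an assumption.
 * Returns the number of samples written. */
size_t voc4_decode(Voc4State *st, const unsigned char *in, size_t n_bytes,
                   double *out)
{
    size_t i, n = 0;

    for (i = 0; i < n_bytes; i++) {
        out[n++] = voc4_decode_nibble(st, (unsigned)in[i] >> 4);
        out[n++] = voc4_decode_nibble(st, in[i] & 0x0Fu);
    }
    return n;
}
```

With a fresh state, the byte 7Ah decodes to the two values 7 and -5: the
first nibble (magnitude 7, step 1) raises the step to 2.5, which then scales
the second nibble (sign bit set, magnitude 2).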