
What is MP3?

MP3 is actually just the extension used for files that comply with the "MPEG Audio Layer III" standard.
MPEG Audio is one part of the larger MPEG family of encoding algorithms, which also covers video and multimedia.
MPEG Video, for instance, is the encoding technique used by Video CD and DVD-Video.

MPEG Audio is available in the following "flavours":

    * MPEG Audio 1:
      High quality audio encoding for 32kHz, 44.1kHz and 48kHz sampling rates
    * MPEG Audio 2:
      Low quality audio encoding for 16kHz, 22.05kHz and 24kHz sampling rates

There’s also one "unofficial" MPEG Audio version worth mentioning: MPEG Audio 2.5.
MPEG Audio 2.5 was added by Fraunhofer-Gesellschaft IIS in Germany, the developers of MPEG Audio Layer III, and provides encoding of audio at very low sampling rates: 8kHz, 11.025kHz and 12kHz.

Each MPEG Audio "version" can be encoded in:

    * Layer I:
      Low compression, fast algorithm
      Compression ratio of 1:4 (@192 kbps per audio channel)
    * Layer II:
      Medium compression, fairly fast algorithm
      Compression ratios of 1:6..8 (@128..96 kbps per audio channel)
    * Layer III:
      High compression, slow algorithm
      Compression ratios of 1:10..12 (@64..56 kbps per audio channel)

(Ratios based on "CD-Audio" quality)
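The ratios above come from simple arithmetic: uncompressed CD-Audio is 44,100 samples per second at 16 bits per sample, i.e. 705.6 kbps per channel, and each ratio is that figure divided by the coded bitrate. A small Python sketch of the calculation (the constant and function names are illustrative only):

```python
# Uncompressed CD-Audio, per channel: 44.1 kHz sampling, 16 bits per sample.
CD_SAMPLE_RATE = 44_100
CD_BITS_PER_SAMPLE = 16
CD_KBPS_PER_CHANNEL = CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE / 1000  # 705.6 kbps


def compression_ratio(kbps_per_channel):
    """How many times smaller the coded stream is than uncompressed CD-Audio."""
    return CD_KBPS_PER_CHANNEL / kbps_per_channel


# Bitrates per audio channel taken from the list above.
for layer, rates in [("I", [192]), ("II", [128, 96]), ("III", [64, 56])]:
    for kbps in rates:
        print(f"Layer {layer} @ {kbps} kbps/channel -> 1:{compression_ratio(kbps):.1f}")
```

This gives roughly 1:3.7 for Layer I, 1:5.5 to 1:7.4 for Layer II and 1:11 to 1:12.6 for Layer III, in line with the rounded figures in the list.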

Layer I is rarely used anymore, since processors have become dramatically faster and can now easily handle the more complex layers.
Layer II is mostly used on Video CD and in some professional equipment, but it is also dying out.
Layer III is the most widely used today, since current processors can encode Layer III data faster than real time.

By "processors" I mean processors in general, not only those in PCs.

Basically, the technique behind the MPEG Audio algorithm is that it "abuses" the limitations of human hearing.

Imagine you’re standing in the middle of a big city with lots of cars making a lot of noise, and all of a sudden you see a pal of yours at the other end of the street. You shout at him because you want to make an appointment with him for tomorrow. But because of the noise from the cars, he doesn’t hear you and just walks on. You could have spent all that energy on something else, because he didn’t notice you after all.

This is exactly what the MPEG Audio algorithm does: it discards the audio in the (uncompressed PCM-Wave) source file that isn’t audible; it simply filters out everything we humans can’t hear. The only big problem is that computers "think" digitally, in 1s and 0s, while we humans "think" in analog terms, meaning there is a minimum and a maximum value with everything in between.

So the algorithm has to work out which pitches (frequencies) are present in the audio and how strong (loud) each of them is. It does this by applying a so-called Fast Fourier Transform (FFT) to the signal.
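A minimal Python sketch of that analysis step, using a textbook recursive radix-2 FFT. It is not the transform code of any particular encoder; a real encoder also applies a window function to each block of samples before transforming it, which is left out here. The names `fft` and `spectrum` and the demo signal are illustrative only.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrum(frame, sample_rate):
    """Return (frequency_hz, amplitude) per bin, from 0 Hz up to sample_rate / 2."""
    n = len(frame)
    bins = fft(frame)
    result = []
    for k in range(n // 2 + 1):
        # Bins between 0 Hz and sample_rate / 2 hold half of a sine's energy
        # (the other half sits in the mirrored bin), so scale those by 2/n.
        scale = 1 / n if k in (0, n // 2) else 2 / n
        result.append((k * sample_rate / n, abs(bins[k]) * scale))
    return result


# Demo: a 1 kHz tone plus a quieter 2.5 kHz tone, 256 samples at 8 kHz.
SAMPLE_RATE = 8000
N = 256
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
         + 0.1 * math.sin(2 * math.pi * 2500 * t / SAMPLE_RATE)
         for t in range(N)]
for freq, amp in spectrum(frame, SAMPLE_RATE):
    if amp > 0.01:
        print(f"{freq:7.1f} Hz  amplitude {amp:.3f}")
```

Because both test tones fall exactly on FFT bins (8000 / 256 = 31.25 Hz per bin), the demo lists only the two tones and their strengths; real music spreads over many bins.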

Then it applies a psychoacoustic model (which can be compared to a database of the limits of our hearing) to leave out everything we can’t hear.
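One of the simplest "hearing limitations" such a model can contain is the threshold in quiet: how loud a tone must be at each frequency before we hear it at all. The Python sketch below uses Terhardt's well-known approximation of that curve and drops spectral components that fall below it. Two loud caveats: the mapping of digital amplitude to loudness (a full-scale sine played at 96 dB SPL) is an assumption, and real psychoacoustic models also compute masking by louder nearby sounds (the car noise in the street example), which this sketch leaves out. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def amplitude_to_db_spl(amplitude, full_scale_db=96.0):
    """Convert a linear amplitude (1.0 = full scale) to an assumed playback level."""
    # ASSUMPTION: a full-scale sine is heard at full_scale_db dB SPL.
    if amplitude <= 0:
        return -math.inf
    return full_scale_db + 20 * math.log10(amplitude)


def keep_audible(components):
    """Keep only (freq_hz, amplitude) components louder than the threshold in quiet."""
    # freq > 0 is checked first so 0 Hz never reaches f ** -0.8.
    return [(f, a) for f, a in components
            if f > 0 and amplitude_to_db_spl(a) > threshold_in_quiet_db(f)]


# Demo: the same quiet level is audible at 1 kHz but not at 50 Hz or 16 kHz.
print(keep_audible([(50, 1e-4), (1000, 1e-4), (16000, 1e-4)]))
```

The curve is lowest around 3 to 4 kHz and rises steeply towards very low and very high frequencies, which is why the same quiet component survives only at 1 kHz.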

After that, it tries to fit the remaining data into the space (bandwidth) we’ve set (the "Bitrate"). If not all bits fit, it can store part of the current frame’s data in space left unused by earlier frames, if there is any (this is called the bit reservoir). This is why the decoder needs to keep several frames in a buffer: the data for the frame it is currently decoding may start inside a frame it has already read. The result of this buffering is that the decoder runs somewhat behind in time (this is called latency). All decoders and players, like WinAmp, have this problem, although it can be hidden from the user with some tricks.
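A simplified Python model of the bit reservoir, as a sketch only: it counts bytes, ignores the frame header and side information, and the per-frame demands are made-up inputs. The 511-byte limit reflects the 9-bit `main_data_begin` back-pointer of MPEG-1 Layer III; the function name and the policy of simply capping a frame that asks for too much are hypothetical simplifications.

```python
def simulate_reservoir(demands, capacity, max_back=511):
    """Simplified bit reservoir, counted in bytes.

    demands  -- bytes of coded audio each frame would like to use (made-up input)
    capacity -- bytes each frame's own slot provides at the chosen bitrate
    max_back -- how far back a frame's data may start (511 for MPEG-1 Layer III)

    Returns one (granted, reservoir_after) pair per frame.
    """
    reservoir = 0  # unused bytes left behind by earlier frames
    result = []
    for need in demands:
        available = capacity + reservoir
        # A frame that needs more than is available must be quantized more
        # coarsely by a real encoder; this sketch simply caps it.
        granted = min(need, available)
        # Leftover space can be used by later frames, but only as far back
        # as the decoder is allowed to look.
        reservoir = min(available - granted, max_back)
        result.append((granted, reservoir))
    return result


# Demo: two easy frames save up space that a demanding third frame then uses.
for granted, left in simulate_reservoir([300, 300, 600, 450], capacity=400):
    print(f"granted {granted} bytes, reservoir now {left} bytes")
```

Easy passages leave space behind and a later, harder passage spends it, which is why the decoder has to hold on to data from frames it has already read.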

The MPEG algorithm produces a so-called "streaming format", which means that the data doesn’t really have a beginning or an end. It doesn’t even have to be stored on disk; it can also be transferred over a network, for instance (like ShoutCast, etc.).

Because it’s a streaming format, you can just "jump in" anywhere in the stream and start decoding. The only problem is that you need to know how to "read" the stream, i.e. you need to know the stream’s properties (bitrate, sample rate, number of channels, etc.). This is solved by dividing the stream into chunks called "Frames", each with its own "Header" describing these properties.
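A minimal Python sketch of "jumping in": scan byte by byte for the 11 sync bits that start every frame header, decode the 32-bit header, then skip ahead by the frame length to where the next header should be. It only handles MPEG-1 Layer III (other versions and layers use different bitrate tables and frame-length formulas), and `parse_header` and `find_frames` are illustrative names, not a library API. Real decoders also check that another valid header follows before trusting a match, since the sync pattern can appear by chance inside audio data, and they skip ID3 tags.

```python
# MPEG-1 Layer III tables; None marks "free format" (index 0) and invalid (15).
MPEG1_L3_BITRATES = (None, 32, 40, 48, 56, 64, 80, 96, 112,
                     128, 160, 192, 224, 256, 320, None)  # kbps
MPEG1_SAMPLE_RATES = (44100, 48000, 32000)                 # index 3 is reserved
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def parse_header(b):
    """Decode a 4-byte MPEG-1 Layer III frame header, or return None."""
    if len(b) < 4:
        return None
    h = int.from_bytes(b[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:            # 11 sync bits must all be 1
        return None
    version = (h >> 19) & 0b11
    layer = (h >> 17) & 0b11
    if version != 0b11 or layer != 0b01:        # 11 = MPEG-1, 01 = Layer III
        return None
    bitrate = MPEG1_L3_BITRATES[(h >> 12) & 0xF]
    sr_index = (h >> 10) & 0b11
    if bitrate is None or sr_index == 3:        # free format, invalid or reserved
        return None
    sample_rate = MPEG1_SAMPLE_RATES[sr_index]
    padding = (h >> 9) & 1
    mode = (h >> 6) & 0b11
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "mode": CHANNEL_MODES[mode],
        "channels": 1 if mode == 0b11 else 2,
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 = CRC present
        # MPEG-1 Layer III: 144 * bitrate (bits/s) / sample rate, plus padding.
        "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
    }


def find_frames(data, start=0):
    """Scan from any offset ("jumping in") and yield (offset, header) pairs."""
    i = start
    while i + 4 <= len(data):
        hdr = parse_header(data[i:i + 4])
        if hdr is None:
            i += 1                     # no header here: slide forward one byte
            continue
        yield i, hdr
        i += hdr["frame_length"]       # the next header should follow this frame
```

For the common header bytes `FF FB 90 64`, this decodes to MPEG-1 Layer III at 128 kbps, 44.1kHz, joint stereo, without CRC, with a frame length of 417 bytes (144 × 128000 / 44100, rounded down).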
