This is the second part of a new series on digital music. The first part, chronicling the astonishing deterioration in recorded music quality, is here.
The sounds we hear are usually created and always heard in what is termed an ‘analog’ process. Earlier types of recording and playing back music used analog processes at every step of the process from the microphone, through the mixing and mastering and then the duplicating and eventual playing back of the sound (typically on a record or some type of tape).
In time, parts of the process became digitized – the mixing, mastering, and archival storage of music, and although there have always been potential benefits from digital processing, it is interesting to note that some of the earlier digitally processed recordings were inferior to their analog processed counterparts.
For a while, there was a type of technological snobbery and automatic assumption that digital must be better than analog, because it was ‘cleverer’. Those early days are largely behind us and modern digital technologies are massively improved on the first generation that came out in the 1970s.
The digital parts of the process became more common and the analog steps less common, and from the early 1980s, the consumer level playback of music also became increasingly based on digital formats – first on compact discs and subsequently through MP3 and other types of digital music files that could be stored on computers and specialized music playing devices (and now on phones, tablets, and most other types of digital portable electronics as well as internet streaming services).
The Problems With All Digital Formats
Digital formats have some enormous advantages over analog formats, but they are not perfect, and some of their advantages are related to the ‘management’ of the overall sound recording process rather than related to the sound quality itself. Furthermore, digital formats add an extra element of potential impact on the music you hear – the music which starts off in analog format is then converted into digital format, and then, prior to being heard by you, has to be converted a second time back into analog format.
In very rough analogy, this double conversion can be thought of as a process a bit like photocopying an original document, and then photocopying the photocopy. You’ve surely seen how the quality of a copy of a copy of an original can be very much less than the quality of the original, and you’ve probably also noticed that some photocopiers do a better job of photocopying than others. It is the same with digitizing.
You’ve probably seen sample illustrations of a typical ‘pure’ sine wave (such as shown here), but most sound that we hear is nothing at all like that.
It has an incredibly varied mix of different frequencies and wave forms and intensities, all jumbled up together to create a very complex waveform shape, as is shown in the second picture, a screenshot of an oscilloscope.
The key challenge is that an analog wave form is curved and continuous.
A Popular Misunderstanding : What Happens When You Digitize an Analog Waveform?
Now for a key point which is often misunderstood. There are two ways of digitizing an analog wave form – a simple and obvious way, and a much cleverer and less intuitive way. Many people think that music is digitized the simple way, but it is not – it is digitized the clever way.
Let’s first explain the simple obvious way. This would be to convert from smooth (or even jagged) curves into a series of flat steps, each step having a straight vertical line up/down and a straight horizontal line across. Make the steps big and infrequent, and there’s a huge difference in shape and therefore sound as between the original analog sound and the final digital representation. So the steps have to be as small as possible, so as to more closely (but never exactly) approximate the analog waveforms they are trying to represent.
And therein lies a huge problem – getting the digital representation of the analog waveform as close as possible to the curves, but being limited only to straight lines, either horizontal or vertical. The image on the left clearly resembles the pure sine wave form shown above, but equally clearly, is also an imperfect representation rather than an exact copy.
The digitization process has to somehow convert this continuously variable complex shape into a simple series of electronic ‘bits’ – ones and zeros, if you prefer. And then, when playing back, the ones and zeros have to be reassembled into a form as closely resembling the original waveform as possible.
Although the waveform shapes look very complicated when seen on a screen, they all comprise a very simple ‘two dimensional’ form, a varying level of amplitude (generally shown as how far the wave form moves above or below the horizontal line) over the passing of time (which is portrayed by the horizontal line itself). To digitize an analog waveform, you need to create a series of ‘slices’ of this wave form, each of a fixed amount of time, and showing its amplitude at that time.
Now for the two big things that determine how accurate the digital representation of the analog wave form will be. The first is how ‘thin’ each slice is – ie, the more slices per second, the better (each slice, by the way, is called a ‘sample’ and the number of samples per second is called the sampling frequency). The second is how finely the amplitude of each slice is measured – the more possible values, the better (this is called the sampling depth).
But, and here’s the big thing. This is not how music is digitized, because it is clumsy and inefficient and ineffective. There’s a better way.
The True Way Music is Digitized
If you’d like to understand how this is possible, this is perhaps the best written and understandable explanation. It was written back in 2004, so some of the writer’s conclusions about the ill-advisedness of sampling at ‘excessive’ rates are no longer quite so valid, because computers these days are much faster, more powerful, and affordable. But his conclusion that there is no need to sample faster remains as rock solid and valid today as it always has been.
A possible analogy that might be meaningful to you is to compare the difference between bit mapped and vector graphics. The Nyquist-Shannon sampling process is akin to vector graphics, where the curves are created by mathematical functions rather than by simply building up a picture out of a series of dots. Vector graphics are infinitely expandable, without requiring larger data files or losing any resolution. There are sort of similar benefits, when digitizing music, if you do it via the Nyquist-Shannon process rather than time slicing the wave form into steps.
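To make the difference concrete, here is a minimal Python sketch of our own (it is not taken from the explanation mentioned above). It samples a 15 kHz tone at the CD rate, then estimates the value halfway between samples in two ways: by holding the previous sample (the ‘staircase’ picture), and by the sinc interpolation that the Nyquist-Shannon theorem describes. The function names, the 15 kHz test tone, and the finite block of 2,001 samples are all assumptions chosen for illustration; a real converter uses carefully designed filters rather than this brute-force sum.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, position):
    """Whittaker-Shannon interpolation at a fractional sample position."""
    return sum(s * sinc(position - n) for n, s in enumerate(samples))

SAMPLE_RATE = 44_100   # CD sampling rate, in Hz
TONE_HZ = 15_000       # illustrative test tone, below half the sampling rate
NUM_SAMPLES = 2_001    # finite stand-in for an endless stream of samples

def tone(position):
    """True value of the tone at a (possibly fractional) sample position."""
    return math.sin(2 * math.pi * TONE_HZ * position / SAMPLE_RATE)

samples = [tone(n) for n in range(NUM_SAMPLES)]

# Look halfway between samples, near the middle of the block, where the
# finite sum is the best stand-in for the ideal infinite one.
midpoints = [n + 0.5 for n in range(995, 1006)]
staircase_error = max(abs(samples[int(p)] - tone(p)) for p in midpoints)
sinc_error = max(abs(reconstruct(samples, p) - tone(p)) for p in midpoints)

print(f"worst staircase error:           {staircase_error:.3f}")
print(f"worst sinc-reconstruction error: {sinc_error:.5f}")
```

The staircase guess is badly wrong between samples, while the sinc reconstruction lands very close to the true curve; the small error that remains comes from using a finite number of samples rather than from the sampling rate.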
The Implications of the Common Misunderstanding about Digital Sampling
Our point here is perhaps surprising. The analogy or explanation used by many writers to explain how music is digitally sampled is wrong.
And now for the really big thing. Not only is the explanation wrong, but so too is their derivative conclusion – that a faster sampling rate will therefore give a better approximation of the music being sampled.
The sampling rate on normal CDs today – 44.1 kHz – actually gives a near perfect representation of all frequencies up to about 20 kHz. Increasing the sampling rate will not improve the clarity or quality or anything about frequencies under 20 kHz. It will only impact on frequencies above 20 kHz.
The only reason to sample at a greater rate is if there is additional relevant sound information that should also be recorded, and which is possible to be then played back again, above 20 kHz.
We discuss that later in this article.
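The arithmetic behind that claim is simply that a sampling rate can capture frequencies up to (strictly, just under) half its own value. A minimal sketch, with function names of our own choosing, shows how much room above 20 kHz some common sampling rates leave:

```python
AUDIBLE_LIMIT_HZ = 20_000  # nominal upper limit of hearing used in this article

def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2

def room_above_audible_hz(sample_rate_hz):
    """How far the representable band extends past the nominal 20 kHz limit."""
    return nyquist_limit_hz(sample_rate_hz) - AUDIBLE_LIMIT_HZ

for rate in (44_100, 48_000, 96_000):
    limit_khz = nyquist_limit_hz(rate) / 1000
    extra_khz = room_above_audible_hz(rate) / 1000
    print(f"{rate / 1000:g} kHz sampling: up to {limit_khz:g} kHz "
          f"({extra_khz:g} kHz above the audible limit)")
```

In practice the usable limit is a little lower than half the sampling rate, because the filters in real converters need some room to work.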
Sampling Depth Issues Explained Also
The ‘step/slice’ analogy/misunderstanding can also make it possible to misunderstand the ‘depth’ needed for each sample as well as the number of samples per second.
More ‘depth’ doesn’t mean more accurate ‘steps’, because we’re not talking about steps. All that the sample depth does is give us a greater range between the minimum and maximum sounds that can be sampled.
For example, a CD has a sampling frequency of 44,100 samples per second – shown as a frequency of 44.1 kHz. It has a sampling depth where each sample is recorded in a 16 bit computer ‘word’. That does not mean that the sample has one of 16 possible values, but rather one of 2^16 (two to the power of 16) different values, enabling it to show the sound level at any one of 65,536 different values.
This means that it can handle a range of sound levels from quietest to loudest of about 96 dB, and with some clever ‘cheating’ (called ‘dithering’) this range can be extended to slightly over 100 dB. As we discuss below, that is more than sufficient for any imaginable recording and playback requirement.
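The 65,536 and 96 dB figures both follow from the number of bits. Here is a short sketch using the common rule of thumb that dynamic range is 20 × log10 of the number of levels (about 6 dB per bit); the function names are ours, and the formula is the usual simplification rather than a full model of a real converter.

```python
import math

def quantization_levels(bits):
    """Number of distinct values a sample of the given bit depth can hold."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Rule-of-thumb dynamic range: 20 * log10(levels), about 6.02 dB per bit."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (8, 16, 24):
    print(f"{bits:2d} bits: {quantization_levels(bits):>10,} levels, "
          f"about {dynamic_range_db(bits):.1f} dB")
```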
Answering the Question – Measuring the Quality of Digital Music Recording
The higher the sampling frequency, the better, and the larger the sampling depth, also the better. So, at least in theory, a 25 kHz sampling frequency is not as good as a 50 kHz sampling frequency, and a 100 kHz sampling frequency is best of all. Similarly, if you had to choose between an 8, 16, or 24 bit sample depth, it would seem the 24 bit depth would be best.
And this answers the question we posed as the article heading – what makes digital music good or not so good in quality? Two things are most important : sample frequency and sample depth.
So far, so good, and probably – hopefully – this has also been acceptably easy to follow. But now for where things start to become a bit more esoteric and complicated.
First, we’ll simply state, without describing in detail, that there are a number of other issues involved in digital music capture, processing, and so on, all of which impact on quality. Most of these issues are subtle rather than obvious, and only become relevant when you’re exploring the top end of the very best quality equipment for recording, storing, and playing back music. So we’ll ignore them, but be aware that as you start to move towards the theoretical best sampling rate and depth, there are other issues that are probably not as clearly defined and described as sample rate and depth which will increasingly become relevant factors in the total quality of the sound you experience.
At What Point Do Improvements Become Meaningless?
Secondly, presumably there comes a point where the digital representation of the analog sound becomes so good that there’s no possible perceptible improvement by increasing the sample rate or depth any further.
This is the key point to understand – at what point do diminishing returns start to set in, and at what point is any further upgrading pointless?
There’s a third issue in the background as well – the difference between lossless and lossy forms of digital storage. We discussed that recently in our article about how modern digital music seems to be going down rather than up in quality, as well as many years ago when limitations on computer hardware made lossy compression more necessary than it is these days. We’ll not even link to these ten-year old articles, because so much has changed since then – although, sadly, the one thing that hasn’t really much changed is the prevalence of low quality lossy digital music formats, albeit now without any type of associated ‘excuse’ for why we should have to accept such low quality music any more.
To answer the third issue briefly, and particularly if you don’t want to read the full article linked in the previous paragraph, suffice it to say that if you’re seeking to create a library of high quality digital music these days, the best approach is to store your music in FLAC format.
Now let’s look carefully at the two parts of the second of these three issues.
How Much Sampling Frequency and Sampling Depth is Necessary?
If you have a pair of truly very high quality headphones (which these days can typically be expected to have reasonably flat frequency response way over 20kHz, maybe even over 30kHz), you can test your own hearing, for example, on this page, to see what your current upper frequency limit is.
If you’re not using high quality headphones, or if you’re using regular speakers (or, even worse, computer speakers), you’ll not be able to differentiate between your own hearing ability and the ability of the speakers to play the high frequencies and might not get an accurate result.
The human ear can also hear sounds down to about 20 Hz, but this is not so relevant in this discussion, because when digitizing a sound, the lower the frequency of the sound, the easier it is to digitize accurately. This is because the sound amplitude changes so quickly at a higher frequency that there need to be a lot of slices in order to accurately match its wave form.
Even if We Can or Can’t Hear Highest Frequencies, Are They Recorded and Played Back?
Here’s an interesting question. Let’s look not at what we can hear, but on what sounds can be recorded and played back. At least until recently, few microphones had much frequency response above about 15 kHz, and the same for loudspeakers – their frequency response would drop off above that sort of range as well.
In the days before digital recording and mastering, music was stored on reel to reel tape, and that had a high frequency limit as well, probably in the order of again about 15 kHz.
Much first and second generation digital recording was done at 44.1 kHz or 48 kHz sampling rates, limiting the frequencies stored to slightly less than half these rates too.
Note also that while there is some high-end gear these days with truly awesome recording and playback capabilities, any musical group recording on a budget is less likely to be at a studio that charges potentially tens of thousands of dollars to use their top end facilities and recording staff, and is more likely to be at a studio with ‘middle of the road’ gear and which charges ten times less for their facility.
So, much of the time, whether it is relevant/important or not, the highest frequencies are/were not recorded, and even if they are present in the recording, unless you have high-end speakers or headphones, they may not be able to be played back.
Should We Bother About Frequencies We Can’t Hear?
Now let’s consider – are frequencies above 20 kHz really irrelevant or not? A related point is ‘what is the highest frequency sound a musical instrument makes’? This is harder to answer than you might think, because most instruments play a sound that comprises both the main frequency of the note and also a series of overtones – notes higher than the original frequency.
Some of these overtones are whole-number multiples of the base frequency – these are called harmonics of that frequency. For example, if playing a standard A at a frequency of 440 Hz, an instrument will almost surely also include some sound from the next A up an octave (880 Hz), and up two octaves (1760 Hz), and so on. Harmonics are common. Overtones with different values are less common but also exist.
This varying mixture of overtones is part of what makes, eg, a violin sound different to a trumpet, and so on. On the left are wave patterns from different instruments, all playing the same note, but creating very different sounds.
So although the highest frequency note on a piano (C8) is ‘only’ 4186 Hz, its overtones go way, way, over 20 kHz. Even a much lower note, at, say, 1 kHz is still emitting possibly significant sound energy way up at 100 kHz.
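To see how quickly overtones climb past 20 kHz, here is a small sketch that lists idealised harmonics (exact whole-number multiples of the base frequency) and finds the first one above a chosen limit. The function names are ours, and real instruments differ: their overtones are not always exact multiples, and how much energy each one carries varies enormously.

```python
def harmonics(fundamental_hz, count):
    """First `count` idealised harmonics: 1x, 2x, 3x ... the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def first_harmonic_above(fundamental_hz, limit_hz):
    """Return (harmonic number, frequency) of the first harmonic above limit_hz."""
    n = 1
    while fundamental_hz * n <= limit_hz:
        n += 1
    return n, fundamental_hz * n

print(harmonics(440, 4))                    # standard A and its next three harmonics
print(first_harmonic_above(4186, 20_000))   # top C on a piano
print(first_harmonic_above(1000, 20_000))   # a 1 kHz note
```

Note that the octaves above 440 Hz are the 2nd and 4th harmonics (880 Hz and 1760 Hz); the 3rd harmonic, 1320 Hz, falls in between.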
The article also has a table of instruments and what percentage of their sound is over 40 kHz. You will note that these percentages often seem small, but when you convert even, say, 0.1% into a sound decibel difference, you are only talking about a shift of 30 dB. That’s a small difference in sound levels, even though an apparently large difference in percentages.
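The conversion from a percentage to decibels can be checked in a couple of lines. This sketch assumes the percentages are fractions of total sound energy, which is what the 30 dB figure implies; the function name is ours.

```python
import math

def energy_percent_to_db(percent):
    """Level of a share of the total sound energy, in dB relative to the total."""
    return 10 * math.log10(percent / 100)

for percent in (10, 1, 0.1):
    print(f"{percent}% of the energy is {energy_percent_to_db(percent):.0f} dB "
          f"relative to the total")
```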
If what we just said sounds confusing, we’re simply saying that there truly is much more sound energy out there from common musical instruments than the bit we primarily focus on, below 20 kHz.
But does that matter – why should we record something we can’t hear?
There are two parts to answering this question. Firstly, all the overtones we can’t hear modulate and change the shape of the sound we can hear and can also interfere with the recording process.
Secondly, we have additional hearing abilities through paths other than our ears – bone conduction, for example. Studies have been done which have reasonably convincingly shown that even when the subjects have reported not hearing the sounds, their bodies and brains do respond to frequencies well above 20 kHz.
Maybe there is some relevance to frequencies that based on ear-sensitivity alone might seem meaningless.
So while in theory we need only be interested in recording and playing back sounds with frequencies up to the limit of our hearing (which in my case is currently about 13 kHz), in reality, maybe there are some valid reasons to try and expand the range of frequencies we capture and potentially play back again.
In other words, we’d argue for a sampling rate somewhat greater than 44.1 kHz.
What is the Optimum Sampling Depth?
Sampling depth gives us a different thing to sampling frequency. The depth of sampling relates to the dynamic range as between the quietest and loudest parts of a piece of music that can be stored and played back.
So how much dynamic range do we need, and how many bits of sampling are required to give us that?
Would you be surprised to learn that this too is not an easy question to answer! Indeed, if you have the time and patience, here’s a very well written article that clearly shows you never need more than 16 bits of sampling depth, which is followed by, currently, almost 2000 responses and arguments (a sort of final summation by the original poster, here, is also well worth reading). Sure, much is repetitive and some is totally nonsensical, but clearly there is a lot of uncertainty about this point!
In case you didn’t click the links, or wish a simpler explanation, sound is commonly measured in decibels, and this is a logarithmic scale rather than a linear one. Each 10 dB increase represents a ten-fold increase in sound energy, so, for example, a 90 dB sound would be 1000 times more powerful than a 60 dB sound. To put it another way, each 3 dB represents an approximate doubling in sound energy.
Few people can sense a difference in sound level of 1 dB or less; most people start to notice a difference somewhere between 1 dB and 2 dB. In theory, a person with good hearing can start to notice sounds as quiet as 1 dB. In practice, very quiet sounds are usually overlooked or ignored, unless ‘unusual’ or alarming.
But in terms of the gap between the quietest sound we can hear and the loudest sound we can hear, there are two additional considerations. The first is that even a quiet room is actually not totally silent. I’m in a ‘silent’ room in an empty house in a quiet suburb at present, and while I couldn’t hear any specific sounds, two different sound meters were reporting background noises of 29.5 dBA and/or 42 dBC (the last letter implies different ‘weighting’ measures, irrelevant for this discussion). On occasion – birds chirping, cars driving past, planes flying overhead – the sound level can jump up a further 3-6 dB. In other words, any music quieter than about 35 dB would be lost in all the background noise – you see this, in a more extreme example, when driving in your car – the faster you go and the more traffic noise around you, the more you have to turn up the volume to hear the quiet bits.
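Those figures come straight from the definition of the decibel as ten times the base-10 logarithm of a power ratio. A minimal sketch, with a function name of our own, runs the conversion in the other direction:

```python
def power_ratio(db_difference):
    """How many times more sound energy a given decibel difference represents."""
    return 10 ** (db_difference / 10)

for db in (3, 10, 30):
    print(f"{db:2d} dB is about {power_ratio(db):,.1f} times the sound energy")
```

So the gap between 60 dB and 90 dB (30 dB) is a factor of 1000, and 3 dB is a factor of very nearly 2.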
The other limit is what is the maximum sound you can hear? Well, that’s not actually the right question. A better question is ‘what is the maximum sound level you can hear without pain or discomfort or hearing damage?’. The answer to that question is generally considered to be about 120 dB.
Here’s an interesting table of sound levels and the varying amounts of time you can be exposed to them before you start to suffer hearing loss. As you can see, anything above 85 dB starts to pose problems, and the louder the sound, the quicker the hearing damage.
So what that is telling us is that ideally, we want music that is never quieter than perhaps 40 dB, and never louder than perhaps 100 dB, which gives us about 60 dB of dynamic range or ‘difference’ between the quietest and loudest parts. Note that if you are listening through headphones, then your ‘floor’ – the background noise level – might be a bit lower, allowing for, but not requiring, a wider dynamic range.
There’s another important consideration. How much dynamic range does ‘real’ live music have? Modern pop/rock music actually has very little dynamic range – it is uniformly loud, and probably only has a 20dB – 30dB range. A symphony orchestra can have a range of perhaps 50dB up to a rare and unusual 70dB.
So it seems that the dynamic range we need for playback is similar to or greater than the dynamic range we need when recording. A 12 bit sampling depth would give about a 72 dB dynamic range which would seem adequate for most purposes.
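The same 6 dB per bit rule of thumb can be turned around to ask how many bits a given dynamic range needs. This is a sketch under that simplification only, with names of our own; it ignores dithering and the other practical factors discussed elsewhere in this article.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # roughly 6.02 dB of range per bit

def bits_needed(dynamic_range_db):
    """Smallest whole number of bits whose rule-of-thumb range covers the target."""
    return math.ceil(dynamic_range_db / DB_PER_BIT)

for label, range_db in (("typical listening room", 60),
                        ("widest orchestral range", 70),
                        ("CD specification", 96)):
    print(f"{label}: {range_db} dB needs {bits_needed(range_db)} bits")
```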
CDs have a 16 bit sampling depth, which in theory allows for about 96 dB of dynamic range (note that terrible term ‘in theory’ appearing once again!). There are other things that can limit the dynamic range down below its digital theoretical maximum and some techniques (‘dithering’) that can actually increase it above its maximum, but clearly at 16 bit, there is a lot of extra room as between the typical dynamic range of whatever is being recorded and the potential dynamic range of when it is played back.
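Dithering sounds like magic, so here is a small demonstration of the idea. It takes a tone whose peak is only 0.4 of one quantization step, so plain rounding turns it into pure silence, and then repeats the rounding after adding a little random noise first. The triangular noise used here is one common textbook form of dither; the names, the tone, and the fixed random seed are all our own assumptions, and real mastering tools are considerably more sophisticated.

```python
import math
import random

def quantize(value):
    """Round to the nearest whole quantization step."""
    return math.floor(value + 0.5)

def triangular_dither(rng):
    """Random noise with a triangular spread of up to one step either side of zero."""
    return rng.random() - rng.random()

rng = random.Random(0)  # fixed seed so that the run is repeatable
NUM_SAMPLES = 20_000

# A very quiet tone: its peak is only 0.4 of one quantization step.
signal = [0.4 * math.sin(2 * math.pi * n / 100) for n in range(NUM_SAMPLES)]

plain = [quantize(s) for s in signal]
dithered = [quantize(s + triangular_dither(rng)) for s in signal]

def similarity_to_signal(values):
    """Sum of products with the original tone: zero means no trace of it is left."""
    return sum(v * s for v, s in zip(values, signal))

print("without dither:", similarity_to_signal(plain))
print("with dither:   ", round(similarity_to_signal(dithered), 1))
```

Without dither every sample rounds to zero and the tone is lost entirely. With dither the output is noisier, but it still follows the original tone, which is how a 16 bit recording can carry sounds quieter than its smallest step.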
There are some arguments for still greater sampling depth and dynamic range. In particular, more sampling bits makes the recording process easier if you never have to worry about any sounds (‘transients’ in particular) exceeding the maximum recordable level, because when you go over the maximum level with digital recording, the sound deteriorates quickly and enormously. It also means that after manipulating a signal through a dozen different digital filters and processes, the ‘errors’ that unavoidably creep in (sort of again like the concept of a photocopy of a photocopy) remain too small to matter.
If you have 24 bit sample depth – a level commonly advocated by enthusiasts – you have an enormous 144 dB of range and almost 17 million different sound level values, more than enough surely for anything and everything. Not only that, this also exceeds the dynamic range of the electronic components themselves (all of which have varying degrees of underlying noise floors).
We can’t see any value in 24 bit sampling (other than in the intermediate steps in the recording studio). On the other hand, this paper cogently argues in favor of slightly more than 16 bits. We’d be happy with 18 or 20 bits.
Better than CD Sound
If we were to struggle to summarize the preceding, we could say that CDs have probably inadequate sampling rates but acceptable sampling depths.
In addition to the CD standard, there are a number of other semi-standard sampling rates often used in recording studios, invariably being better than the CD standard. A common frequency is 48 kHz, but while the extra 9% of sampling frequency is welcomed, it is hardly transformative in terms of perceived sound quality improvement.
A double CD frequency (88.2 kHz) and also 96 kHz (ie twice 48 kHz) are also encountered. Higher rates also exist, for example, 192 kHz.
Greater sampling depths are often found, too, with 20 bit and 24 bit both common. It is rare to see sound recorded at greater sampling depths, although sometimes there is pre-processing done at 30 or more bits before being reduced down to 24 bit for storage (and down to 16 for distribution).
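Higher rates and depths are not free, because the amount of raw data grows in direct proportion to both. As a rough sketch, assuming uncompressed two-channel (stereo) audio and using names of our own, the data rate is simply the sampling rate times the bits per sample times the number of channels:

```python
def pcm_bits_per_second(sample_rate_hz, bits_per_sample, channels=2):
    """Raw data rate of uncompressed audio (two channels assumed by default)."""
    return sample_rate_hz * bits_per_sample * channels

def megabytes_per_minute(sample_rate_hz, bits_per_sample, channels=2):
    """The same rate expressed as millions of bytes per minute of music."""
    bits = pcm_bits_per_second(sample_rate_hz, bits_per_sample, channels)
    return bits * 60 / 8 / 1_000_000

for rate, bits in ((44_100, 16), (96_000, 24), (192_000, 24)):
    print(f"{rate / 1000:g} kHz / {bits} bit: "
          f"{pcm_bits_per_second(rate, bits):,} bits per second, "
          f"about {megabytes_per_minute(rate, bits):.1f} MB per minute")
```

Lossless compression such as FLAC will reduce these figures, by an amount that depends on the music.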
The great thing about FLAC format recording is that it supports sampling rates up to 655 kHz and sampling depths of up to 32 bits – but note that not all current hardware will support this. Most recording hardware will top out at between 96 kHz and 192 kHz, and at 24 bits. However, we find it comforting that the current FLAC specification is reasonably ‘future proof’ and will allow for better digitization as and when it may occur.
If we had to choose, we’d like to see a faster sampling rate (ie a minimum of perhaps 60 kHz, or even slightly greater, but we’re unconvinced as to the merits of going above 100 kHz) as the higher priority for quality optimization. We’d not say no to a greater sampling depth, but we only see value in another two or four bits of sampling depth. Give us a faster sampling rate first, please.
Phew. To try and summarize all the preceding, the current ‘Red Book’ CD standard of a 44.1 kHz sampling rate and 16 bit sampling depth – at least as finally distributed to the public on CDs – is actually very good indeed and more than adequate for most purposes, most of the time, giving a better approximation to ‘real live sound’ than any of the previous analog technologies it displaced.
A slightly faster sampling rate might improve the sound quality slightly further. There’s little need for any additional sampling depth. Anything over 100 kHz and 20 bits is almost certainly overkill.
This was the second part of a new series on digital music. The first part, chronicling the astonishing deterioration in recorded music quality, is here. More parts will follow.