Background
Sound, acoustics & the sense of hearing
Interesting and basic facts about acoustics and our hearing
Chapter 1
What is airborne sound?
Basically, vibrating matter (i.e. matter repeatedly moving back and forth around a resting point) sets its surrounding medium (e.g. air) in motion. This can happen, for example, when a guitar string is struck, when the human vocal cords vibrate or when a loudspeaker membrane moves. Energy must be expended for the oscillation, i.e. the deflection from the resting position, and thus for the sound generation. When the vibrations spread through the surrounding medium to our eardrum, which also starts to vibrate, sound becomes audible to us. Hearing in a vacuum, i.e. in a space without an ambient medium, is therefore not possible.
A propagating oscillation arises when one particle of matter (e.g. an air molecule) sets an adjacent one into oscillation. Although the movement of a single molecule is only minimal, the vibration propagates very quickly and is referred to as a (sound) wave. The speed of sound, i.e. the speed of propagation, is about 344 m/s in air at room temperature.
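From the speed of sound, the wavelength of a sound wave follows as λ = c / f. A small worked Python sketch (the speed of sound is the value given above, the frequencies are example values):

c = 344.0  # speed of sound in air, m/s (value from the text)
for f in (20.0, 1000.0, 20000.0):
    print(f"{f:7.0f} Hz -> wavelength {c / f:.3f} m")
# 20 Hz corresponds to a wavelength of roughly 17 m, 20,000 Hz to roughly 1.7 cm.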
Waves are divided into longitudinal and transverse waves. While longitudinal waves oscillate in their direction of propagation (e.g. airborne sound), transverse waves oscillate perpendicular to their direction of propagation, as for example in an oscillating rope. In liquids and gases, sound propagates only in the form of longitudinal waves; in solids, it can propagate as both longitudinal and transverse waves. A closer look at the acoustics of a guitar shows how the longitudinal waves relevant for the transmission of airborne sound behave in a room: when a string is struck, the vibrating body sets the surrounding air molecules in motion. These in turn set their neighbouring molecules in motion, resulting in regions of compression (overpressure) and rarefaction (underpressure). Sound is therefore nothing more than a propagating pressure fluctuation superimposed on the atmospheric air pressure, or in other words a mechanical oscillation (deflection from the rest position) propagating in an elastic medium.
Different sound parameters produce different sensations in humans. The amplitude is responsible for the perceived volume of the sound event. The frequency determines the pitch of a sound; humans can hear a range between roughly 20 Hz and 20,000 Hz. The frequency spectrum defines the tone colour, because every sound event is a superposition of many sinusoidal oscillations of different frequency, phase and amplitude. More about this in chapter 2.
Chapter 2
Vibration superposition
If several oscillations overlap, acoustic phenomena can occur, e.g. interference or beating.
Interferences
Interference occurs when oscillations of the same frequency are superimposed. Here the phase relationship of the two signals is important, so that a distinction must be made between constructive and destructive interference: constructive interference occurs when two signals of the same frequency are superimposed in phase, i.e. with a phase difference of 0 degrees. Because the instantaneous deflections add up, superimposing two in-phase signals of the same amplitude produces a wave with twice the sound pressure amplitude. When two waves in opposite phase are superimposed (phase difference of 180 degrees) and their frequency and amplitude are the same, the signal is completely cancelled out, so that in this case we speak of destructive interference.
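A minimal Python sketch of both cases (using NumPy; frequency, amplitude and duration are chosen arbitrarily for illustration):

import numpy as np

fs = 48000                       # sampling rate in Hz (assumed)
t = np.arange(0, 0.01, 1 / fs)   # 10 ms of time
f = 440.0                        # frequency in Hz (assumed)

a = np.sin(2 * np.pi * f * t)                  # reference signal
b_in_phase = np.sin(2 * np.pi * f * t)         # phase difference 0 degrees
b_opposed = np.sin(2 * np.pi * f * t + np.pi)  # phase difference 180 degrees

print(np.max(np.abs(a + b_in_phase)))  # ~2.0: constructive interference doubles the amplitude
print(np.max(np.abs(a + b_opposed)))   # ~0.0: destructive interference cancels the signal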
Beats
Beats occur when several sound events differ only slightly in frequency. This causes the waves to shift continuously relative to each other, and a periodic alternation between constructive and destructive interference occurs. There are therefore moments when the signals reinforce each other "in phase" and moments when they cancel "in antiphase". We perceive this as a fluctuation in volume. If a 400 Hz signal and a 405 Hz signal overlap, for example, the beat frequency is 5 Hz.
A practical application of beats is tuning a guitar, where the tuning of two strings playing the same note can be compared: if you strike both strings while they are still out of tune with each other, the beat frequency is high and you hear a rapid fluctuation in volume. The closer the strings come to the same pitch, the slower the fluctuations become, as the beat frequency decreases.
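The beat frequency is simply the difference between the two frequencies. A minimal Python sketch using the 400 Hz / 405 Hz example above (NumPy; sampling rate and duration are assumed values):

import numpy as np

fs = 48000                     # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1 / fs)  # 2 seconds
mix = np.sin(2 * np.pi * 400 * t) + np.sin(2 * np.pi * 405 * t)

# The envelope of the mixed signal swells and fades with the beat frequency
# |400 - 405| = 5 Hz, i.e. five audible volume fluctuations per second.
print(abs(400 - 405))  # 5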
Complex oscillations
Complex oscillations result from the superposition of waves of different frequencies, which therefore have no fixed phase relationship to each other and are described as incoherent or non-correlated (unlike interference and beats, which concern waves with a concrete phase relationship to each other). When the individual deflections are added together, the result is not interference but a complex waveform. Because each sound event is a superposition of many sinusoidal oscillations of different frequency, phase and amplitude, an oscillation can be broken down into its individual sinusoidal components. This is done with the discrete Fourier transform, which provides information about the frequency spectrum and thus about the timbre. Simple examples are shrill or sharp timbres, which contain many high-frequency components, or a dull timbre, which indicates few high frequencies.
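A minimal sketch of this Fourier decomposition with NumPy's FFT (the three partials and their amplitudes and phases are chosen arbitrarily for illustration):

import numpy as np

fs = 8000                      # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)  # 1 second -> 1 Hz frequency resolution

# A complex oscillation built from three sinusoids of different frequency, amplitude and phase.
x = (1.00 * np.sin(2 * np.pi * 200 * t)
     + 0.50 * np.sin(2 * np.pi * 400 * t + 0.3)
     + 0.25 * np.sin(2 * np.pi * 600 * t + 1.1))

spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)  # amplitude spectrum
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[spectrum > 0.1])  # [200. 400. 600.] -> the individual sinusoids are recovered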
Difference between pitch and timbre
Not only for a sinusoidal oscillation but also for a repeating, i.e. periodic, complex oscillation, the period duration indicates the time the oscillation needs before it repeats. The reciprocal of this value is the fundamental frequency of the complex oscillation, which is perceived as the pitch. The other individual frequencies form the overtones that determine the tone colour. In the case of a non-periodic vibration with a very irregular waveform (e.g. a cymbal crash), it can therefore happen that no pitch is perceived and only the timbre produced by the overtones can be identified.
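As a short worked example (the period is chosen arbitrarily), the fundamental frequency is simply the reciprocal of the period:

period_s = 1 / 440            # period of a periodic complex oscillation, about 2.27 ms (assumed)
fundamental_hz = 1 / period_s
print(round(fundamental_hz))  # 440 -> the perceived pitch; the overtones shape the tone colour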
Difference between sound and noise
The difference between a sound and a noise is that a sound consists of a periodic oscillation, so that a fundamental frequency can be detected and a pitch can therefore be assigned to it. The frequencies of the individual sinusoidal oscillations in a sound are integer multiples of the fundamental frequency - the sinusoidal tones thus form a harmonic partial tone series. In practice, the individual partials of a sound differ in strength. In the frequency spectrum, the regions between these harmonic partials often contain noise-like components with a continuous spectrum, e.g. the attack noise of a musical instrument.
In contrast to sounds, noises are non-periodic oscillations to which often no pitch can be assigned because they lack a fundamental frequency. The oscillation consists of frequencies in a continuous spectrum between which there is no simple mathematical relationship, so that no partial tone series results from the individual frequencies.
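The distinction can be made concrete with a short NumPy sketch (fundamental frequency, number of partials and amplitudes are assumed values): a sound is built from integer multiples of a fundamental, while a noise has a continuous spectrum without such a series.

import numpy as np

fs = 8000                      # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
f0 = 220.0                     # fundamental frequency (assumed)

# Sound: harmonic partial series at integer multiples of f0, here with 1/n amplitudes.
partials = [1, 2, 3, 4, 5]
tone = sum((1.0 / n) * np.sin(2 * np.pi * n * f0 * t) for n in partials)

# Noise: random signal with a continuous spectrum, no fundamental and hence no pitch.
noise = np.random.default_rng(0).standard_normal(len(t))

print([n * f0 for n in partials])  # [220.0, 440.0, 660.0, 880.0, 1100.0] -> harmonic series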
Chapter 3
Sound propagation indoors
How we perceive our surroundings, i.e. sounds, noises, voices or music, is significantly influenced by the spatial conditions. Sound cannot propagate unhindered in a room but is reflected several times by walls and other sound-reflecting obstacles or objects, losing energy in the process. The direct sound is followed by so-called early reflections, of which there are usually six in a rectangular room, one for each wall, floor and ceiling surface. Each reflection in turn causes many further reflections. These dense reflections, which increase rapidly in number, form the reverberation, which decays through constant energy absorption at the reflecting surfaces until it finally dies away. The reverberation time is often given as an RT60 value (RT = reverberation time), which indicates the time it takes for the reverberation to decay by 60 dB (usually related to a frequency of 1000 Hz). For the best possible accuracy the reverberation time can be measured or, if this is not possible, calculated using the formulae of Eyring and Sabine.
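For the calculation just mentioned, Sabine's formula relates RT60 to the room volume V and the equivalent absorption area A, and Eyring's formula refines this for more absorbent rooms. A minimal sketch with assumed room data (V, S and the average absorption coefficient are made-up example values):

import math

V = 200.0          # room volume in m^3 (assumed)
S = 210.0          # total surface area in m^2 (assumed)
alpha_mean = 0.15  # average absorption coefficient of the surfaces (assumed)

A = alpha_mean * S                                         # equivalent absorption area in m^2
rt60_sabine = 0.161 * V / A                                # Sabine's formula
rt60_eyring = 0.161 * V / (-S * math.log(1 - alpha_mean))  # Eyring's formula

print(f"Sabine: {rt60_sabine:.2f} s, Eyring: {rt60_eyring:.2f} s")  # ~1.02 s and ~0.94 s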
In acoustics, a distinction is made between direct sound (sound radiated directly from the sound source) and diffuse sound (sound resulting from the totality of reflections). In the free field, usually only direct sound is present, whereas in a room the direct sound decreases with increasing distance from the sound source according to the inverse distance law, while the diffuse sound is approximately equally loud everywhere. The further away we are from a sound source in the room, the higher the proportion of diffuse sound in relation to the direct sound. The distance at which direct and diffuse sound are equally loud is called the reverberation radius, which is frequency-dependent.
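Both relationships can be sketched in a few lines of Python (the source level, room volume and reverberation time are assumed values; the 0.057 constant is a common approximation for an omnidirectional source):

import math

# Direct sound: the level drops by 6 dB per doubling of the distance (inverse distance law).
def direct_level_db(level_at_1m_db, distance_m):
    return level_at_1m_db - 20 * math.log10(distance_m)

print(direct_level_db(90, 2))  # ~84 dB
print(direct_level_db(90, 4))  # ~78 dB

# Reverberation radius: common approximation for an omnidirectional source.
V, rt60 = 200.0, 1.0                # room volume in m^3 and reverberation time in s (assumed)
print(0.057 * math.sqrt(V / rt60))  # ~0.81 m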
Reflection processes lead to various phenomena in the room. At walls, reflections of all frequencies cause a pressure build-up, which is greatest in the bass range; the frequency balance near a wall is therefore usually bass-heavy. Another phenomenon is standing waves, which usually occur between two parallel walls when the reflection is in phase with the incident sound wave. Because stationary pressure maxima and minima are produced, sound at certain points in the room cannot be correctly perceived or measured. This is particularly problematic at low frequencies, where the spatial distance between pressure maxima and minima is large and the sound rings on for a long time; for high frequencies, standing waves are unproblematic. A third phenomenon is flutter echo, which can occur between two parallel walls, usually at least eight metres apart, and makes the sound audibly bounce back and forth several times. The comb filter effect is another phenomenon that can occur during sound propagation: interference between the direct signal and its reflection, in which some frequencies are cancelled and others amplified.
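Two of these phenomena can be quantified with simple formulas: the axial standing waves (room modes) between two parallel walls at distance L lie at f_n = n * c / (2 * L), and a reflection delayed by Δt produces comb-filter notches at odd multiples of 1 / (2 * Δt). A short Python sketch with assumed dimensions:

c = 344.0  # speed of sound in m/s (as above)

# Axial room modes between two parallel walls 5 m apart (assumed distance).
L = 5.0
print([n * c / (2 * L) for n in range(1, 5)])  # [34.4, 68.8, 103.2, 137.6] Hz -> bass range

# Comb filter: a reflection arriving 1 ms after the direct sound (assumed delay)
# cancels frequencies at odd multiples of 1 / (2 * delay).
delay_s = 0.001
print([(2 * k + 1) / (2 * delay_s) for k in range(4)])  # [500.0, 1500.0, 2500.0, 3500.0] Hz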
Chapter 4
Psychoacoustics & hearing physiology
The auditory system
The human ear is a sensitive and delicate pressure receiver and can be divided into the outer, middle and inner ear. Sound travels through the outer ear, i.e. the pinna and the external auditory canal, to the eardrum, which separates the outer ear from the middle ear and, for its protection, is set back through the external auditory canal towards the inside of the skull. The eardrum is a thin membrane of fibrous tissue that is set in vibration by the sound pressure. In the middle ear, a lever system consisting of the three ossicles - hammer (malleus), anvil (incus) and stirrup (stapes) - transmits the vibration to the oval window and adapts the force of the sound pressure to the sensitivity of the inner ear. Here, the muscles attached to the hammer and stirrup can attenuate the sound pressure by 6 to 10 dB from about 70 to 80 dB(SPL) upwards. To enable pressure equalisation, the Eustachian tube connects the middle ear with the pharynx. In the inner ear, in addition to the semicircular canals (the organ of balance), there is the cochlea, a spiral-shaped cavity in the petrous bone, which consists of three fluid-filled chambers: the scala vestibuli, the cochlear duct and the scala tympani. Between the cochlear duct and the scala tympani lies the basilar membrane, which carries the organ of Corti. The hair cells located within it are stimulated particularly strongly, i.e. set into resonance, at different places for different frequencies. In this way the ear breaks down the complex oscillation into individual frequencies and transmits this information to the brain via specific auditory nerve fibres.
Music and sound trigger emotions in people, awaken memories, stimulate us or have a calming effect on our general well-being. What exactly happens in the brain during the reception of sound has been scientifically investigated for a long time. After the auditory nerve has first transmitted a sound impression to the brain stem, it travels from there to the auditory cortex. But this is not the only area that plays a role in sound perception: the involvement of one of the two speech centres (Broca's area), of motor and visual areas, of the limbic system for emotion processing and of the reward system shows that sound and music connect our brain areas. There is a release of endorphins, i.e. the body's own happiness hormones, and of the neurotransmitter dopamine, which is of great importance for the brain's reward system, as well as a decrease in the stress hormone cortisol. Making music together also increases the release of the bonding hormone oxytocin. In addition to these chemical effects, sound and music also have a structural influence on our brain thanks to its neuroplasticity. Auditory stimuli cause nerve cells to form new connections, so that brain areas become better networked with each other, and promote the development of the corpus callosum, i.e. the connection between the two brain hemispheres.
Directional hearing
In order to find their way around in their surroundings, people can locate sound sources very precisely. Directional hearing is a complex process that involves the analysis of three parameters: interaural time differences, interaural sound pressure differences and tone colour.
Interaural time differences are the temporal differences that arise when a signal reaches one ear before the other. To evaluate them, the brain compares the phase between the left and right ear as well as the envelopes of the oscillations, i.e. the volume curve. This mechanism alone would entail the risk of mislocalisation, since one and the same time difference can always be assigned to two possible angles (e.g. -45° and -135°). Vertical localisation of a sound source is not possible either - i.e. it cannot be distinguished whether a sound reaches the ears from above or below. The brain can therefore only derive an ambiguous horizontal angle.
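A simplified plane-wave model of the interaural time difference, ITD ≈ d * sin(azimuth) / c, makes this ambiguity visible in a short Python sketch (the ear distance of 0.18 m is an assumed value, not a measurement):

import math

def itd_seconds(azimuth_deg, ear_distance_m=0.18, c=344.0):
    # Simplified model: 0 degrees = straight ahead, positive angles towards the right ear.
    return ear_distance_m * math.sin(math.radians(azimuth_deg)) / c

print(f"{itd_seconds(-45) * 1e6:.0f} us")   # about -370 us
print(f"{itd_seconds(-135) * 1e6:.0f} us")  # the same value -> the angle is ambiguous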
Interaural sound pressure differences are detected by the brain and indicate which ear the signal reached first, because the sound pressure is higher there. For sound waves with frequencies of about 1.6 kHz and above, shadowing by the head becomes relevant: if a sound signal comes from the left, the sound from this frequency upwards can no longer diffract completely around the head, so that the difference in sound pressure is clearly perceptible with such lateral incidence. Just as with the analysis of interaural time differences described above, this method only allows a left/right distinction; no reliable information about a sound source above/below or in front/behind can be derived from it.
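A quick check of this limit (assuming the speed of sound from chapter 1 and a head diameter of roughly 0.2 m, an assumed value) shows why: around 1.6 kHz the wavelength shrinks to the order of the head size, so the head starts to cast an effective acoustic shadow.

c = 344.0          # speed of sound in m/s (as above)
print(c / 1600.0)  # ~0.215 m -> comparable to an assumed head diameter of ~0.2 m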
Tone colour matching is the most important mechanism for locating a sound source. In contrast to what has been described above, this is not an interaural comparison of sound impressions but a comparison of the (monaurally) received tone colour with a learned pattern stored in the brain. This complex process also makes it possible to distinguish up/down and front/back. The shape of the human head, and especially the shape of the auricles, causes shadowing effects, reflections and resonances at the skull and outer ear and thus influences the timbre of a sound signal in a specific way, depending on the direction from which the sound arrives at the eardrum: some frequencies are attenuated and others amplified. This filtering effect of the head is described by the head-related transfer function (HRTF). We do not consciously perceive these specific tone colours but extract the relevant directional information from them. The fact that this comparison must be preceded by a learning process is the reason why children have poorer directional hearing than adults. According to Theile's association model, the analysis of the sound direction on the basis of tone colour takes place in two stages: the direction-determining stage (in which the angle of incidence is determined by pattern matching) and the gestalt-forming stage (in which the actual tone colour is determined after inversion of the HRTF).
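Conceptually, the HRTF filtering can be sketched as a convolution of a mono signal with a pair of direction-dependent impulse responses. The two-tap impulse responses below are crude stand-ins chosen for illustration, not measured HRIRs:

import numpy as np

fs = 48000                                                 # sampling rate in Hz (assumed)
mono = np.random.default_rng(1).standard_normal(fs // 10)  # 100 ms test signal

# Stand-in impulse responses for a source on the left: the right ear receives the sound
# slightly later and attenuated. Real HRIRs are measured, frequency-dependent filters.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[30] = 0.5                    # about 0.6 ms later and quieter

left_ear = np.convolve(mono, hrir_left)
right_ear = np.convolve(mono, hrir_right)
print(left_ear.shape, right_ear.shape)  # two ear signals of equal length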
Distance hearing
Factors that make it possible to estimate the distance of a sound source are the volume, the timbre (distance-dependent drop in high frequencies, influence of air absorption, ...), the pre-delay (i.e. the time interval between the direct signal and the first reflection), the reverberation component and, last but not least, the experience against which all these impressions are weighed.
Auditory masking
Auditory masking means that one sound can render another inaudible. A distinction is made between spectral masking and temporal masking.
With spectral masking, we cannot hear a sound if a louder sound in the same frequency range is heard at the same time.
Background: The threshold in quiet is the sound pressure level at which the human ear can just barely perceive a sound. It is frequency-dependent: for low and very high frequencies the threshold in quiet is high, while it is relatively low for medium frequencies. The threshold-in-quiet curve thus indicates the sound pressure level at which a certain frequency becomes audible to us when there is otherwise silence.
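A frequently used analytic approximation of this threshold in quiet (commonly attributed to Terhardt; results in dB SPL, valid roughly across the audible range) reproduces exactly this shape, high at the low and very high ends and lowest around a few kilohertz:

import math

def threshold_in_quiet_db(f_hz):
    # Approximation of the absolute hearing threshold in dB SPL (attributed to Terhardt).
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

for f in (50, 200, 1000, 4000, 12000):
    print(f"{f:6d} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")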
When several sound events coincide, an existing sound reduces the sensitivity to newly occurring sound events. This results in a new hearing threshold that depends on the volume and frequency spectrum of the existing, masking sound event: the masking threshold. If a sound signal lies below this masking threshold of the masker, it is inaudible due to complete masking. Partial masking occurs when a sound signal is at or only slightly above the masking threshold. The masking effect is generally stronger at low frequencies than at high frequencies.
With temporal masking, the louder sound is not heard at the same time but with a time offset, either before or after. In premasking, a sound is masked because a sound with a higher sound pressure level follows shortly afterwards (at most about 20 ms later). The reason is that a sound signal of higher amplitude is processed faster by the brain than a quieter one. In postmasking, a sound is masked because a louder sound ended shortly before; this effect occurs up to about 200 ms after the end of the masking sound event. Like spectral masking, temporal masking is frequency-dependent.
To counteract masking effects, e.g. in music production, filters and equalisers are used to attenuate or emphasise individual frequency ranges and to tidy up the sound image. Simple volume and panorama controls also contribute to this. The result is a sound image that is tidy and pleasant to listen to.