We saw earlier how to play sound samples and change their parameters. Though this technique is simple and easy to begin with, it is not enough for making breakthrough sound art projects. One way to go further is to generate and synthesize sounds without using samples at all. Another way is to use samples as raw material for processing methods such as morphing and granular synthesis. Both ways are based on low-level algorithms, which construct sound as an array of audio samples in real time.
openFrameworks provides low-level sound input and output, and we process it directly in C++, so our sound processing pipeline can perform almost any trick with sound, works fast, and introduces only small lags.
There is one thing that is currently not so convenient to implement with openFrameworks: processing a sound stream with a variety of standard filters and effects. To do this, you need to program the filters yourself or use libraries or addons. Alternatively, you can use software such as Max/MSP or Ableton Live for sound generation and control it from openFrameworks via the OSC protocol. See Chapter 11, Networking, for more details.
For generating sound in real time, you need to start the sound output stream and then provide audio samples for the sound when it is requested by openFrameworks. The corresponding additions to the project's code are as follows:
In the testApp class declaration, add the following:

ofSoundStream soundStream;
void audioOut( float *output, int bufferSize, int nChannels );

At the end of the testApp::setup() function definition, add:

soundStream.setup( this, 2, 0, 44100, 512, 4 );
Here this is a pointer to our testApp object, which will receive requests for audio data from openFrameworks via calls to our testApp::audioOut function. Subsequently, 2 is the number of output channels (hence, stereo output), 0 is the number of input channels (hence, no input), and 44100 is the sample rate, that is, the number of audio samples played per second. The value 44100 means CD quality and is good in most situations. The last two parameters, 512 and 4, are the size of the buffer for audio samples and the number of buffers respectively. They are discussed later.
void testApp::audioOut( float *output, int bufferSize, int nChannels ){
    //... fill output array here
}
This is the function that should fill the output array with audio sample data; it actually generates the sound. Values of output should lie in the range from -1.0 to 1.0, otherwise audio clipping will occur (you will hear clicks in the sound). The size of output is equal to bufferSize * nChannels, and the samples of the channels are interleaved. Namely, if nChannels is equal to 2, then this is a stereo signal, so output[0] and output[1] hold the first audio samples for the left and the right channels. Correspondingly, output[2] and output[3] hold the second audio samples, and so on.
Also, there are a number of functions for managing audio devices. They are as follows:
- The soundStream.listDevices() function prints to console the list of devices.
- The soundStream.setDeviceID( id ) function selects a device, where id has type int. You should call this before soundStream.setup(). If no soundStream.setDeviceID( id ) was called, then the default system device is used.
- The soundStream.stop() function stops calling audioOut().
- The soundStream.start() function starts calling audioOut() again.
- The soundStream.close() function ends the use of the audio device by the soundStream object.

There are two important things about the sound generating function audioOut(). Firstly, the function is called by openFrameworks independently of the update() and draw() function calls. Namely, it is called at the request of the sound card, when the next buffer with audio samples for playing is needed.
Secondly, audioOut() should work fast. Otherwise, the sound card will not receive the buffer in time, and you will hear clicks in the output sound. You can tune this by changing the two last parameters in the following line:

soundStream.setup( this, 2, 0, 44100, 512, 4 );
Here 512 is the buffer size. If the buffer is bigger (for example, 1024), it is requested more rarely, so you have more time for filling it, which gives more robustness. On the contrary, a smaller buffer size, for example, 256, leads to better responsiveness (smaller latency) of the audio, because the delay between filling a buffer and playing it through the audio system is smaller. The last parameter, 4, is the number of buffers used by the sound card for storing sound. Similarly, increasing this parameter leads to better robustness and decreasing it leads to better audio responsiveness.
Now, we will consider an example of sound generation.
Warning
When using ofSoundStream for sound output in your projects, be careful! Due to possible errors in the project's code and for other reasons, it can suddenly generate clicks and very loud sounds. To avoid the hazard of damaging your ears, do not listen to the output of such projects using headphones.
Let's build a simple sound generator using Pulse Width Modulation (PWM). In electronics, PWM is a method of sending analog values through wires using just two levels of voltage (logical 1 and 0). The value is coded by changing the length of the pulse with logical value 1, while the overall cycle length is fixed. The following diagram shows the coding of a value val in the range from 0 to 1, with fixed cycle length c. You can see that the output signal is a periodic wave with wavelength equal to c, and the wave consists of two segments with values 1 and 0, with lengths val * c and c - val * c respectively:
Such a signal can be considered as a sound wave, with the wave frequency equal to 1.0 / c.
If val is equal to 0.5, then 1 and 0 values have equal length in the wave, and such a waveform is called a square wave.
Let's consider an example of PWM sound generation. The frequency and PWM value of the wave will depend on x and y mouse coordinates, so when you move the mouse, you will hear the sound changing.
This example is based on the emptyExample project in openFrameworks.
Add the following code to testApp.h, in the testApp class declaration. Note that the sound control parameters are userFreq and userPwm, a frequency and a PWM value. There are also separate variables freq and pwm for these parameters, which change relatively slowly. This lets us always obtain a smooth sound, even when the user changes the sound parameters fast (that is, moves the mouse rapidly).
//Function for generating audio
void audioOut( float *output, int bufferSize, int nChannels );

ofSoundStream soundStream; //Object for sound output setup

//User-changing parameters
float userFreq; //Frequency
float userPwm; //PWM value

//Parameters, used during synthesis
float freq; //Current frequency
float pwm; //Current PWM value
float phase; //Phase of the wave

//Buffer for rendering last generated audio buffer
vector<float> buf;
At the beginning of the testApp.cpp file, after the #include "testApp.h" line, add declarations of some constants as follows:
int bufSize = 512; //Sound card buffer size
int sampleRate = 44100; //Sound sample rate
float volume = 0.1; //Output volume
The setup() function sets the initial values and starts the sound output:
void testApp::setup(){
    userFreq = 100.0; //Some initial frequency
    userPwm = 0.5; //Some initial PWM value
    freq = userFreq;
    pwm = userPwm;
    phase = 0;
    buf.resize( bufSize );

    //Start the sound output
    soundStream.setup( this, 2, 0, sampleRate, bufSize, 4 );
}
The update() function is empty, and the draw() function draws the buffer with audio sample values on the screen:
void testApp::draw(){
    ofBackground( 255, 255, 255 ); //Set the background color

    //Draw the buffer values
    ofSetColor( 0, 0, 0 );
    for (int i=0; i<bufSize-1; i++) {
        ofLine( i, 100 - buf[i]*50, (i+1), 100 - buf[i+1]*50 );
    }
}
Also, we need to fill the mouseMoved() function to change the parameters according to the mouse position. The userFreq frequency will change in a range from 1 to 2000 Hz, and the PWM value userPwm will change in a range from 0 to 1:
void testApp::mouseMoved( int x, int y ){
    userFreq = ofMap( x, 0, ofGetWidth(), 1, 2000 );
    userPwm = ofMap( y, 0, ofGetHeight(), 0, 1 );
}
Finally, add the audioOut() function that generates the sound. You can see how we change the freq and pwm values in each loop iteration so that they approach userFreq and userPwm smoothly. Also note that phase is a value in the range from 0 to 1, and it changes in correspondence with freq and sampleRate at each audio sample generation:
void testApp::audioOut( float *output, int bufferSize, int nChannels ){
    //Fill output buffer,
    //and also move freq to userFreq and pwm to userPwm slowly
    for (int i=0; i<bufferSize; i++) {
        //freq smoothly reaches userFreq
        freq += ( userFreq - freq ) * 0.001;
        //pwm smoothly reaches userPwm
        pwm += ( userPwm - pwm ) * 0.001;

        //Change phase, and push it into [0, 1] range
        phase += freq / sampleRate;
        phase = fmodf( phase, 1.0 );

        //Calculate the output audio sample value
        //Instead of 1 and 0 we use 1 and -1 output values
        //for the sound wave to be symmetrical along the y-axis
        float v = ( phase < pwm ) ? 1.0 : -1.0;

        //Set the computed value to the left and the right
        //channels of output buffer,
        //also using global volume value defined above
        output[ i*2 ] = output[ i*2 + 1 ] = v * volume;

        //Set the value to buffer buf, used for rendering
        //on the screen
        //Note: bufferSize can occasionally differ from bufSize
        if ( i < bufSize ) {
            buf[ i ] = v;
        }
    }
}
Run the code and move the mouse left-right and up-down. You will hear a distinctive PWM sound and will see its waves:
Move the mouse and explore the sound when the mouse is in the center of the screen and at the screen borders. Because the x coordinate of the mouse sets the frequency and the y coordinate sets the PWM value, you will notice that moving the mouse in the middle of the screen gives a fat square-wave sound, and moving the mouse at the very top and bottom of the screen gives glitch-like pulse signals.
If you change the value 0.001 to 0.0001 in the lines freq += ( userFreq - freq ) * 0.001; and pwm += ( userPwm - pwm ) * 0.001;, then freq and pwm will move to userFreq and userPwm more slowly. So while moving the mouse, you will hear the glide effect used in synthesizers. On the contrary, if you set these values to 1.0, freq and pwm will just be equal to userFreq and userPwm, and you will hear a raw sound that changes rapidly as the mouse moves.
With some compilers, you need to perform the Rebuild command on your project in order for the audioOut() function to be linked to the project correctly. If the linking is not correct, you will just see a straight line on the screen and hear nothing. If you see the PWM waves on the screen but do not hear the sound, check your sound equipment and its volume settings.
You can extend the example by controlling its parameters with some analysis of live video taken from a camera, or with 3D camera data.
We will go further and see an example of transcoding image data into a sound signal directly.
Let's get an image and consider its central horizontal line. This is a one-dimensional array of colors. Now get the brightness of each color in the array. We obtain an array of numbers, which can be considered as PCM values of some sound and used for playing in the audioOut() function.
Certainly, there exist other methods for converting visual data to audio data and back. Moreover, there exist ways to convert audio and video to commands controlling robot motors, 3D printers, smell printers, and any other digital devices. All such transformations between different information types are called transcoding. Transcoding is possible due to the digital nature of the representation of all information in the computer. For example, the number 102 can simultaneously be interpreted as a pixel color component, an audio sample value, and an angle for a robot's servo motor. For detailed philosophical considerations on transcoding, see the book The Language of New Media, Lev Manovich, The MIT Press.
Such an algorithm is a transcoding of image to audio data. Let's code it using frames from a camera as input images. For details on using camera data, see Chapter 5, Working with Videos.
This example is based on the emptyExample project in openFrameworks. Add the following code to testApp.h, in the testApp class declaration:
//Function for generating audio
void audioOut( float *output, int bufferSize, int nChannels );

ofSoundStream soundStream; //Object for sound output setup
ofVideoGrabber grabber; //Video grabber
At the beginning of testApp.cpp, after the #include "testApp.h" line, add constants and variables:
//Constants
const int grabW = 1024; //Width of the camera frame
const int grabH = 768; //Height of the camera frame
const int sampleRate = 44100; //Sample rate of sound
const float duration = 0.25; //Duration of the recorded
                             //sound in seconds
const int N = duration * sampleRate; //Size of the PCM buffer
const float volume = 0.5; //Output sound volume
const int Y0 = grabH * 0.5; //y-position of the scan line

//Variables
vector<float> arr; //Temporary array of pixels' brightness
vector<float> buffer; //PCM buffer of sound sample
int playPos = 0; //The current position of the buffer playing
The setup() function sets the buffer arrays' sizes, runs the video grabber, and starts the sound output:
void testApp::setup(){
    //Set arrays' sizes and fill them with zeros
    arr.resize( grabW, 0.0 );
    buffer.resize( N, 0.0 );

    //Start camera
    grabber.initGrabber( grabW, grabH );

    //Start the sound output
    soundStream.setup( this, 2, 0, sampleRate, 512, 4 );
}
The update() function reads a frame from the camera and writes the brightness of the central line into buffer. It saves the pixels' brightness values into the array arr, which has a size equal to the image width grabW. Next, arr is stretched into the buffer array, which has size N, using linear interpolation.
Also, the values of the buffer are shifted so that the mean of its values is equal to zero. Such a transformation is the simplest method of DC-offset removal. Methods of DC-offset removal are always used in sound recording for centering recorded signals. This is a crucial procedure in the case of mixing several sounds because it helps to reduce the dynamic range of the mixed signal without any audible changes:
void testApp::update(){
    grabber.update(); //Update camera
    if ( grabber.isFrameNew() ) { //Check for new frame
        //Get pixels of the camera image
        ofPixels &pixels = grabber.getPixelsRef();

        //Read central line's pixels brightness to arr
        for (int x=0; x<grabW; x++) {
            //Get the pixel brightness
            float v = pixels.getColor( x, Y0 ).getLightness();
            //v lies in [0,255], convert it to [-1,1]
            arr[x] = ofMap( v, 0, 255, -1, 1, true );
        }

        //Stretch arr to buffer, using linear interpolation
        for (int i=0; i<N; i++) {
            //Get position in range [0, grabW)
            float pos = float(i) * grabW / N;
            //Get left and right indices, clamped to arr's bounds
            int pos0 = int( pos );
            int pos1 = min( pos0 + 1, grabW - 1 );
            //Interpolate (avoid division by zero at the last pixel)
            if ( pos0 == pos1 ) {
                buffer[i] = arr[pos0];
            }
            else {
                buffer[i] = ofMap( pos, pos0, pos1, arr[pos0], arr[pos1] );
            }
        }

        //DC-offset removal
        //Compute a mean value of buffer
        float mean = 0;
        for (int i=0; i<N; i++) {
            mean += buffer[i];
        }
        mean /= N;
        //Shift the buffer by mean value
        for (int i=0; i<N; i++) {
            buffer[i] -= mean;
        }
    }
}
The draw() function draws the camera image, marks the scan-line area with a yellow rectangle, and draws the buffer as a graph in the top part of the screen. See the draw() function code in the example's text.
Finally, the audioOut() function reads the values from the buffer and pushes them into the output array. The playing position is held in the playPos value. When the end of the buffer is reached, playPos is set to 0, so the buffer plays in a loop:
void testApp::audioOut( float *output, int bufferSize, int nChannels ) {
    for (int i=0; i<bufferSize; i++) {
        //Push current audio sample value from buffer
        //into both channels of output.
        //Also global volume value is used
        output[ 2*i ] = output[ 2*i + 1 ] = buffer[ playPos ] * volume;

        //Shift to the next audio sample
        playPos++;
        //When the end of buffer is reached, playPos is set to 0,
        //so we hear a looped sound
        playPos %= N;
    }
}
Run the example and direct the camera somewhere. You will see the camera image with the scan area selected by a yellow rectangle. At the top of the screen, you will see the corresponding graph of sound, and will hear this sound in a loop. Note how bright and dark pixels in the scan line correspond to the high and low graph values. Most likely, the sound you hear will be quite strange. This is because our ears are trained to hear periodic signals but normally, data from a camera image is not periodic.
Now, direct the camera at this stripes image (yes, direct the camera right at this picture in the book, or print it on paper from the file stripesSin0To880Hz.png):
If you fit the scan line to the horizontal line of the image, you will hear a sound tone sweeping from a low to a high tone, and see the image as shown in the following screenshot:
Actually, the stripes correspond to a sine wave with the frequency changing from 0 to 880 Hz, with a duration of one-fourth of a second. The corresponding graph of its PCM values is shown in the following screenshot:
You can see that the graph of the sound transcoded from the camera (at the top of the previous screenshot) is noisy but nevertheless similar to the original graph.
Now move the camera closer to the stripes image. You will notice how the tone of the sound decreases. If you move the camera very close, you will hear a bass sound.
Here is one more stripes image to play with. It codes an "ar" sound (the stripesAr.png file):
We hope that after you finish playing with this example, you will better understand and feel the nature of the PCM sound representation.
Now we will consider how to get sound data from a microphone and other input sound devices.