Open Sound System
This section explains how certain typical audio applications can be implemented using the OSS API. The preferred methods are explained first.
This method has a few drawbacks. If the application needs to handle keyboard or mouse input at the same time as playing or recording audio, this approach may not be adequate. Workarounds for this problem are described a bit later. However, surprisingly often this is not the case. You may want to start with the simplest possible approach first and improve the program later if that turns out to be necessary.
About 80% of all programs using audio belong to this class. They just play a sound without doing anything else at the same time. If playback needs to be stopped, it's usually done by killing the application (for example by hitting Control-C).
These kinds of applications can always rely on the default values selected by OSS. It's only necessary to select the sampling rate, the number of channels and the sample format; OSS takes care of the rest. The application just uses the write system call to write the data in one or more chunks. It's even possible to load the whole sound "file" into memory and then play it with a single write (however, this is usually not recommended).
A very common programming error is to use calls like select, SNDCTL_DSP_GETOSPACE, SNDCTL_DSP_GETODELAY and some others to avoid "blocking". This is really an error because the write system call itself is designed to handle synchronization automatically. Unfortunately practically all novice programmers seem to make this mistake for some reason. Maybe you are smart enough to avoid it.
The singen.c program is an example of a very minimal audio playback application. It generates the audio signal using the sine function. This part of the program can be replaced by some other code that, for example, reads the samples from a disk file.
Recording doesn't actually differ from playback that much. To convert a playback program into a recording one you just need to use read instead of write. The other change is that the device needs to be opened with the O_RDONLY flag instead of O_WRONLY. Otherwise everything said in the previous section is also valid for recording.
The audiolevel.c program is a simple template for a program that does recording.
Duplex means that the same program both records and plays audio. There are two types of duplex:
Doing half duplex is actually very simple: all you need to do is concatenate a recording and a playback program. When it's time to switch direction you simply close the device and re-open it for the opposite direction. At the moment there is no sample program for this mode. There are more elegant ways to do this, but in most cases there is no need for them.
Doing full duplex is a bit more challenging than half duplex. This mode of operation is described in the Using simultaneous audio recording and playback (full duplex) section of this manual.
The simple programs shown in the previous section are perfectly suitable for most purposes. However, they are not necessarily suitable for interactive applications that need to respond to keyboard or mouse events. In general such applications will use the select or poll system calls anyway, so this is probably the most recommended way to work. The use of select is described a bit later.
There is one way to improve the responsiveness of audio programs even without using select or poll: simply read or write less data at a time. If you always read or write (say) 10 milliseconds of audio data at a time, it's guaranteed that the program will never block for longer than 10 milliseconds. However, there is one additional requirement: the "fragment size" must be equal to or smaller than those 10 milliseconds (the number of fragments doesn't matter). You can find more info about fragments in the Audio timing considerations section. The time of 10 milliseconds was chosen arbitrarily here. Typical applications using this approach are games and video players, which in general don't need a better response time than 1/fps. For example, if the frame rate is 25 fps, the application probably wants to write 1/25 s = 40 milliseconds of audio for each frame.
This surprisingly easy method has one additional killer property: the wait times are synchronized with the sample clock, which is usually based on a high precision crystal (there are unfortunately some exceptions). Thanks to this, the application automatically wakes up at the right time to compute the next "frame". This means that applications using this approach don't need any other kind of timer to maintain the right frame rate.
TODO Some more sample programs will be shown here in the future.
Most audio programmers talk about latencies all the time. Getting the lowest possible latency seems to be the first design goal of many (usually open source) packages written by brilliant young programmers. However, seasoned professionals have learned how to compute the required latencies very precisely. The exact requirements depend on the application and usually need to be found out using some kind of listening tests. However, in most cases the latency doesn't need to be any lower than what is required for "lip sync". Of course you can try to obtain 1 ms precision in lip sync, but something like 40 ms (25 fps) or 33 ms (30 fps) is more common.
Of course the real veterans don't care about the latency thing at all unless it starts causing some problems.
There are two primary methods for controlling latencies. The usual approach is to limit the amount of data buffered by the sound driver. The easiest way to do this is simply to ask the driver to use a smaller buffer.
TODO Some sample program will be shown here in the future.
Another way is to constantly check how much data there is in the buffer and to avoid writing to the device until the buffer gets emptied. This second approach is difficult to implement and unreliable, so it cannot be recommended. We don't see it necessary to provide a sample program for it.
Yet another approach is to use a large audio buffer, which in general gives longish latencies (up to seconds). However, the listener doesn't notice any problems if the application takes this delay into account when updating its screen. This method is difficult to implement but it may give superior results in some applications. The use of large buffers makes the application immune to system load peaks caused by other concurrently running applications, while the display/graphics still stays exactly in sync with the audio.
TODO Some sample program will be shown here in the future.
Decreasing the buffer size will start causing dropouts once the latency drops below a certain limit. The reason is not a driver or device malfunction but something very simple: if the buffer size is (say) 1 millisecond, then the application must be able to feed new audio data to the device once per every 1 ms period (1000 times per second). This really means once per each 1/1000th of a second, possibly 24 hours a day. It's not enough for the application to write nothing during one period and then try to compensate by writing twice as much data during the next one.
What makes this fail after some system dependent limit is that there are other applications and devices in the system. They all require some time to run, which may prevent the audio application from running. The maximum value of this "system latency" defines the smallest audio buffer that can be used in the system. It's possible to improve the situation slightly by using linear priorities or some other operating system dependent tricks.
Latencies during recording behave in an entirely different way. The latency doesn't get smaller when the buffer size is decreased. For this reason OSS always allocates the maximum available buffer for recording. Please see the Audio timing considerations section for more info.
Audio applications have traditionally used the mixer to do things like recording source selection or control of playback and recording volumes. This has caused endless problems because some audio devices simply don't have an associated mixer device. In particular, professional devices use fixed input and output volumes and have an independent recording device for each channel.
These kinds of ugly hacks are no longer necessary with OSS 4.0, since the audio API has been extended to provide features for these purposes. Please see the Audio input and output volumes and routings section for more info.
Great magicians can fool you into believing that they have cut their partner into small pieces and then brought them back to life using some advanced magic. Everybody knows that the magic is not some powerful treatment but just a method of fooling the audience. However, it's still nearly impossible to see what actually happened.
Equally well, the great audio masters have this kind of tricks in their hats. It's often possible to avoid nasty programming techniques that require days or weeks of work and that may never work reliably. All you need to do is try to understand what the audience can really hear or see. There is no need to make things work better if nobody can notice any difference. It may be possible to measure some difference between applications using advanced measurement devices, but after all, such results are just plain numbers without any significance. We recommend getting a good book on psychoacoustics if you are interested in things like that.
The pause/continue mechanism used in all sound/media/video players is a good example of this kind of trick. Programmers often ask why OSS doesn't have any ioctl calls for pause and resume. The reason is very simple: you don't need them. Simply close the audio device when the user hits the pause key and re-open it when it's time to continue (this can be done in exactly the same way as at the beginning of the program).
The trick here is that nobody cares if playback doesn't resume at exactly the right sample after the moment when pause was hit. The pause feature is usually used when the telephone or doorbell rings. When playback continues, the listener has no chance of remembering where exactly it stopped.
The select system call is the traditional method for implementing applications that respond to events from multiple sources. The poll system call is another variant of select. Both of them work equally well with OSS, but we have used select in our sample programs because it may be supported in a larger number of environments.
There is a simple but powerful sample program that uses select to serve MIDI input at the same time as audio output. The softsynth.c program waits for MIDI input (for example key press/release messages from a MIDI keyboard or from some application using the MIDI loopback mechanism). It then plays the input notes using an array of sine wave generators.
In addition to simultaneous audio and MIDI, it's possible to handle GUI events at the same time using select/poll. Please look at the documentation of your favourite GUI toolkit for more info. The softsynth_gtk.c program demonstrates integrating a GTK+ GUI into the simple software synth application mentioned above. While this program does MIDI too, you can use it as an example for writing audio-only programs with a GUI. Note that unlike most older applications, this one doesn't use anything more than the usual 3-4 ioctl calls. They are simply no longer needed with OSS (in fact they may just cause trouble if used incorrectly).