Topic: How does object-oriented audio work?
David Buckley
Jedi Master Film Handler
Posts: 525
From: Oxford, N. Canterbury, New Zealand
Registered: Aug 2004
posted 12-02-2015 05:13 PM
The audio "objects", which are essentially the sound elements, are held as individual wave files all the way to exhibition; there is no audio "mix". Alongside the many audio elements there is metadata that describes where each element should appear in three-dimensional space.
The processor in the cinema is programmed with the speaker layout of the auditorium, so it has a map of the auditorium's three-dimensional space. In real time, each audio element is steered to the right combination of speakers to make it appear in the right place.
Note that the metadata can also specify dynamic changes of position, so stuff can move around in the three-dimensional space.
So essentially, every presentation of a movie has a mix created in real time in the processor for that exact auditorium. It is unlikely that two random auditoriums will ever have the same "mix" unless they are identical, with identical speaker layouts. But, and this is the whole point of the exercise, each auditorium should have the same sound-element placement, within the capabilities of that auditorium's speaker layout.
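To make that concrete, here is a minimal sketch of the idea in Python. The speaker names, the 0-to-1 coordinate space, and the simple distance-based gain law are all illustrative assumptions on my part; real renderers use more sophisticated techniques such as vector base amplitude panning.

```python
import math

# Hypothetical auditorium map: speaker name -> (x, y, z) position in a 0..1
# coordinate space. A real processor is configured with the actual layout.
SPEAKERS = {
    "screen_left":  (0.0, 0.0, 0.0),
    "screen_right": (1.0, 0.0, 0.0),
    "surr_left":    (0.0, 1.0, 0.0),
    "surr_right":   (1.0, 1.0, 0.0),
    "top_centre":   (0.5, 0.5, 1.0),
}

def render_gains(obj_pos, speakers=SPEAKERS):
    """Steer one audio object to a combination of speakers.

    Gains fall off with distance from the object's metadata position and are
    normalised so the total power is the same wherever the object sits.
    """
    raw = {name: 1.0 / (math.dist(obj_pos, pos) + 1e-6)   # closer -> louder
           for name, pos in speakers.items()}
    norm = math.sqrt(sum(g * g for g in raw.values()))
    return {name: g / norm for name, g in raw.items()}

# An object placed on the left wall is shared among the left-side speakers:
print(render_gains((0.0, 0.5, 0.0)))
```

Feed the same object position into two different speaker maps and you get two different gain sets, which is exactly why no two auditoriums ever play quite the same "mix".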
| IP: Logged
Carsten Kurz
Film God
Posts: 4340
From: Cologne, NRW, Germany
Registered: Aug 2009
posted 12-02-2015 05:15 PM
Well - more individual channels and speakers make all the difference on the exhibition side ;-)
But there's a bit more to it than just more channels: overhead speakers, higher angular/spatial resolution, timbre matching, rear LFE, five screen channels, etc. Not all of these are used in every object-based audio system, though; it depends on the implementation.
Here is Dolby's white paper on Atmos.
http://www.dolby.com/us/en/professional/cinema/products/dolby-atmos-next-generation-audio-for-cinema-white-paper.pdf
- Carsten
| IP: Logged
Bobby Henderson
"Ask me about Trajan."
Posts: 10973
From: Lawton, OK, USA
Registered: Apr 2001
posted 12-02-2015 05:47 PM
I have some crude screen shot examples I threw together in Adobe Audition.
Conventional audio for playback is channel-based, with each channel played as-is and routed to a specific speaker or array of speakers.
Here's a screen shot of a 2-channel stereo LPCM .WAV file.
Here's an example of a 5.1 audio track from a Blu-ray-style M2TS file. It looks like the stereo example, but with more channels of audio. For some reason it opened with an 8-channel layout.
Here is a simple example of object-oriented audio: a multi-track layout in Audition. Each track can hold different pieces (or objects) of audio along the timeline. Each track can have its own effects, volume controls, surround panning controls, etc. Those parameters can be changed at different points along the timeline. This example is limited to a 5.1 audio layout; Audition doesn't have any simple options for creating surround mixes greater than 7.1.
Essentially, object-based audio for theater playback (as in Atmos or DTS:X) goes from being like the first couple of examples to something more like the third. The movie's playback mix can contain a bunch of different pieces of audio, each carrying its own data to control position, spread across one or more speakers, intensity level, etc. This allows the cinema processor, such as a Dolby CP850, to render the audio mix out to the theater's sound system according to however many speakers and amplifiers it has. An entirely channel-based mix would not provide that flexibility, which is why a lot of DCPs contain both 5.1 and 7.1 mixes: 7.1 LPCM doesn't fold down to 5.1 in theaters.
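To put the DAW analogy into code, here is a rough sketch of the two data models. Every field name below is made up for illustration and doesn't match the actual Atmos or DTS:X metadata.

```python
from dataclasses import dataclass, field

@dataclass
class ChannelMix:
    # Channel-based audio: one fixed stream per named speaker; the processor
    # just routes each stream, there is nothing left to decide at show time.
    channels: dict          # e.g. {"L": "left.wav", "C": "center.wav", ...}

@dataclass
class AudioObject:
    # Object-based audio: the sound element plus instructions for placing it.
    wav_file: str                                      # the mono sound element
    position_keys: list = field(default_factory=list)  # (time_s, (x, y, z))
    gain_db: float = 0.0

# Channel-based: any fold-down is baked in when the mix is made.
legacy = ChannelMix(channels={"L": "l.wav", "C": "c.wav", "R": "r.wav"})

# Object-based: the cinema processor decides at show time which of its
# speakers get this sound, however many speakers it has.
flyover = AudioObject(
    wav_file="helicopter.wav",
    position_keys=[(0.0, (0.5, 0.0, 1.0)),   # starts above the screen
                   (4.0, (0.5, 1.0, 1.0))],  # ends above the back wall
)
print(flyover)
```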
| IP: Logged
Harold Hallikainen
Jedi Master Film Handler
Posts: 906
From: Denver, CO, USA
Registered: Aug 2009
posted 12-02-2015 08:56 PM
I did a presentation where I explained it like this:
1. In channel-based audio, the very talented mixer sends panning instructions to the mixing system, which results in audio driving a 5.1 or 7.1 speaker system. There's a feedback path: the mixer makes adjustments until it sounds as it should on the 5.1 or 7.1 speaker system. The resulting 6 or 8 channels of audio driving the amplifiers are recorded and distributed for playback in auditoriums with similar speaker configurations.
2. In object-based audio, the mixer does the same thing, adjusting panning controls to place individual audio fragments where desired on a larger group of speakers (typically up to 64). Instead of recording the audio input to each amplifier, the mono audio fragment is recorded along with the panning instructions (generally the coordinates of the sound, its "extent," coherence, and other attributes). On playback, the rendering system interprets this metadata and executes the panning instructions for the individual auditorium, which will have different speakers in different positions. For example, the dub stage may have a speaker exactly half way down the left wall, but the auditorium does not; the auditorium's rendering system will simulate this by driving speakers on each side of the desired position, as in the sketch below.
Actual systems use a combination of channels and objects. The auxiliary data track carries audio directed to specific speakers (perhaps 7.1), plus objects (mono audio fragments with associated metadata) that are added to the audio on those speakers and others. There are lots of resources on panning for object-based audio; a fairly recent one is at https://patents.google.com/patent/US9197979B2/en .
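Here is a minimal sketch of that side-wall example in Python. The speaker positions and the constant-power sine/cosine panning law are illustrative assumptions, not the method of any particular rendering system.

```python
import math

# Hypothetical positions of one auditorium's left-wall surrounds,
# front of the room (0.0) to back of the room (1.0).
LEFT_WALL = [0.2, 0.45, 0.7, 0.95]

def pan_on_wall(target, wall=LEFT_WALL):
    """Return (speaker_index, gain) pairs approximating a phantom source."""
    # Find the pair of speakers that bracket the desired position.
    for i in range(len(wall) - 1):
        a, b = wall[i], wall[i + 1]
        if a <= target <= b:
            frac = (target - a) / (b - a)
            # Constant-power law: the two gains always sum to unit power.
            return [(i, math.cos(frac * math.pi / 2)),
                    (i + 1, math.sin(frac * math.pi / 2))]
    # Outside the array: snap to the nearest end speaker.
    return [(0 if target < wall[0] else len(wall) - 1, 1.0)]

# The dub stage's speaker "exactly half way down the left wall" becomes a
# phantom image between this auditorium's speakers 1 and 2:
print(pan_on_wall(0.5))
```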
Harold
| IP: Logged
Harold Hallikainen
Jedi Master Film Handler
Posts: 906
From: Denver, CO, USA
Registered: Aug 2009
posted 12-05-2015 11:41 AM
My understanding of all this is based on reading the SMPTE RDDs on the Atmos bit stream and the MDA bit stream. Audio fragments can be sent to individual speakers or groups of speakers through "bed channels" or "render exceptions"; these are named speakers at specific locations. I've been (unsuccessfully) advocating using the coordinate system to identify speakers for render exceptions. Instead, they are identified by name, and not all speakers are identified by name.
An object consists of a series of metadata elements that describe how a waveform pointed to by the metadata is to be played. The metadata contains such things as position, "extent" (how big), correlation, "snap" (should the object be rendered by the closest speaker instead of a group of speakers), and a bunch of other stuff. It appears to be extremely flexible.
My best description of this is that instead of distributing the audio for each speaker, we are distributing the audio fragments and panning instructions. On playback, these instructions are followed to pan the audio fragments as appropriate in the auditorium based on what speakers are where. If we had, say, 64 speakers that were all in the same location in every auditorium, we could distribute the audio pre-panned, just as we do for 5.1 and 7.1.
Since the speakers do not move between playouts in the same auditorium, it is possible to "render once, play many." This is what the USL CMS does. After ingest, the content is decrypted, rendered to multiple channels (currently up to 16) based on the auditorium's speaker positions, re-encrypted, and saved for playback. At playback time, the customized channel-based audio is played the same as any other channel-based audio. "Prerendering" takes advantage of idle CPU time and allows for unlimited complexity in the rendering, since it need not happen in real time. In practice the prerendering runs considerably faster than real time, since the processor is idle between shows.
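A rough sketch of that prerendering pass in Python. The object format, the distance-based gain law, and everything else here are my own illustrative assumptions, not USL's actual pipeline (decryption and re-encryption are omitted).

```python
import math

def gains_for(position, layout):
    """Panning gains for one object position, normalised for constant power."""
    raw = [1.0 / (math.dist(position, spk) + 1e-6) for spk in layout]
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

def prerender(objects, layout):
    """Offline pass run once after ingest: objects in, fixed channels out.

    The result plays back like any channel-based track, so there is no
    real-time rendering cost at show time, and the offline pass can be as
    complex (and slower or faster than real time) as it needs to be.
    """
    n_samples = max(len(samples) for samples, _ in objects)
    channels = [[0.0] * n_samples for _ in layout]   # one buffer per speaker
    for samples, position in objects:
        for ch, g in enumerate(gains_for(position, layout)):
            for i, s in enumerate(samples):
                channels[ch][i] += g * s             # mix the contribution in
    return channels

# Two speakers on the x axis; one short object sitting nearer the right one.
layout = [(0.0, 0.0), (1.0, 0.0)]
objects = [([0.5, -0.5, 0.25], (0.8, 0.0))]
print(prerender(objects, layout))
```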
Harold
| IP: Logged