...
Audio Reach | Audio Weaver |
---|---|
Developed for power-constrained mobile products. Single use case. | Developed for high-performance automotive audio. Multiple concurrent use cases. |
Variable processing load. | Constant / deterministic real-time load |
Must keep cores loaded < 70% | Can load cores to 90% |
Separate AMS framework needed for low latency support. End-to-end digital latency of 3 x block size. | Native low-latency support. End-to-end digital latency of 2 x block size. |
TDM ports aligned within 12 samples | TDM ports aligned within 1 sample |
SysMon does not provide actionable information to fully load the system. | Easy to understand profiling. Per module, per thread, and per core. Show average and peak CPU load. |
Only supports Hexagon DSPs; no Arm. | Supports all cores including Arm. |
QXDM is a poor fit for real-time debugging. | Includes integrated visual debugging tools and legacy QXDM. |
Numerous side effects. Many features can only be supported by Qualcomm. | Architecture is fully documented and information is publicly available. |
...
This section gives an overview of Audio Weaver’s real-time architecture. Before diving into this section, you should read the Audio Weaver Architecture document, especially Chapter 3 which focuses on real-time audio.
Traditionally, Audio Weaver relied on a single hardware input pin and a single hardware output pin for all I/O in the system. All audio channels had to be sourced by the BSP code and interleaved on a single input, and all I/O had to use the same block size and sample rate. On the Snapdragon, I/O is instead handled by TDM and ALSA modules. This gives greater flexibility and works around many of the earlier limitations. We are in the process of updating Audio Weaver Designer to operate without the traditional “HW Input” and “HW Output” pins. For now, you still need to include these pins; they provide real-time I/O on the PC. Future releases will remove them completely.
Audio Weaver’s real-time architecture is based on fully synchronous operation. Audio processing occurs in fixed block sizes, and block sizes through the system are multiples of the fundamental block size of the system. This provides consistent CPU loading (no spikes), fixed scheduling, and efficient use of computational resources.
Audio Weaver uses double buffering at the hardware inputs and outputs. In a properly designed system that operates at a fixed block size, Audio Weaver achieves the theoretical lowest latency of 2 x block size. Audio Weaver adds no extraneous buffering that would increase latency.
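As a quick illustration of the 2 x block size figure, the sketch below converts a block size into end-to-end digital latency in milliseconds. The function name and the example values (48 samples at 48 kHz) are assumptions chosen for illustration; they are not part of any Audio Weaver API.

```python
def end_to_end_latency_ms(block_size_samples, sample_rate_hz, blocks=2):
    """Digital latency in msec when `blocks` blocks are buffered end to end.

    blocks=2 models the double buffering described above.
    """
    return 1000.0 * blocks * block_size_samples / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical example: 48-sample blocks at 48 kHz.
    print(end_to_end_latency_ms(48, 48_000))
```

Passing `blocks=3` gives the corresponding figure for a system with 3 x block size latency.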
...
Audio Weaver real-time processing requires one of the TDM audio devices to generate an interrupt at the Basic Block Size of the system. The Basic Block Size is the smallest interval of audio processing, and all other threads (layouts) execute at multiples of the Basic Block Size. For systems using Road Noise Cancellation (RNC), we recommend setting the Basic Block Size to 0.25 or 0.5 msec (12 or 24 samples at 48 kHz). For systems without RNC, we recommend setting the Basic Block Size to 1 msec (48 samples at 48 kHz). Only TDM devices can generate this interrupt; ALSA devices cannot.
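The relationship between a block time in msec and a block size in samples can be checked with a few lines of arithmetic. This is only a sketch: the helper name is hypothetical, and the rule that the result must be a whole number of samples is an assumption made here, not a documented Audio Weaver check.

```python
from fractions import Fraction


def block_size_samples(block_ms, sample_rate_hz=48_000):
    """Convert a block time in msec to a whole number of samples.

    Exact rational arithmetic avoids floating-point rounding surprises.
    Raises ValueError if the block time is not a whole number of samples.
    """
    samples = Fraction(str(block_ms)) * sample_rate_hz / 1000
    if samples.denominator != 1:
        raise ValueError(f"{block_ms} msec is not a whole number of samples")
    return int(samples)


if __name__ == "__main__":
    for ms in (0.25, 0.5, 1):
        print(ms, "msec ->", block_size_samples(ms), "samples")
```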
One of the serial ports in the system operates at the fundamental block size and is used as the underlying “master clock” for the system. This serial port is called the Synchronous Master and is located on the Primary Audio Controller (PAC). The Synchronous Master generates real-time interrupts to the PAC and the PAC signals the other audio processing cores via IPCC when they have work to do as shown below.
The PAC has the highest interrupt rate and the other audio processing cores can operate at multiples of the fundamental block size. For example, the PAC may operate at 0.5 msec while the GPDSP or Arm could operate at 10 msec.
...
This synchronous operation extends to all audio processing threads distributed across the various cores. Consider a 3 core design that operates at the following block sizes:
Core 0: 0.5 msec, 0.5 msec, 4 msec, 20 msec [PAC]
Core 1: 5 msec, 5 msec, 5 msec, 5 msec [Wakes up every 5 msec based on IPCC]
Core 2: 10 msec, 10 msec, 20 msec, 20 msec [Wakes up every 10 msec based on IPCC]
Processing is fully synchronous and triggered by the 0.5 msec interrupt controller. The system wide pattern of audio processing (“pumping”) is shown below:
...
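The pumping pattern for the 3-core example can also be sketched in code. The model below assumes that every thread is phase-aligned at tick 0 and pumps whenever the base tick count is a multiple of its clock divider; the actual scheduling in Audio Weaver may differ in detail. All names are hypothetical.

```python
from fractions import Fraction

BASE_MS = Fraction(1, 2)  # fundamental block time in the example (0.5 msec)

# Thread block times in msec, copied from the 3-core example above.
CORES = {
    0: [Fraction(1, 2), Fraction(1, 2), 4, 20],
    1: [5, 5, 5, 5],
    2: [10, 10, 20, 20],
}


def clock_dividers(block_times_ms):
    """Express each block time as an integer multiple of the base block time."""
    dividers = []
    for t in block_times_ms:
        ratio = Fraction(t) / BASE_MS
        if ratio.denominator != 1:
            raise ValueError(f"{t} msec is not a multiple of {BASE_MS} msec")
        dividers.append(int(ratio))
    return dividers


def threads_due(tick):
    """Return the (core, thread_index) pairs that pump on a given base tick."""
    due = []
    for core, times in CORES.items():
        for index, divider in enumerate(clock_dividers(times)):
            if tick % divider == 0:
                due.append((core, index))
    return due


if __name__ == "__main__":
    for tick in (1, 8, 10, 20, 40):
        print(f"t = {float(tick * BASE_MS)} msec:", threads_due(tick))
```

In this model Core 1 has work only every 10th base tick (5 msec) and Core 2 only every 20th (10 msec), which matches the IPCC wake-up intervals listed for those cores.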
This architecture imposes some restrictions as to how block sizes can be used. Here are some examples:
An algorithm cannot change its block size on the fly at run-time. (To realize this, you’ll need to operate the two algorithms in parallel and Activate / Inactivate the processing at run-time. You’ll need twice the memory, but the CPU load is constrained.)
Algorithms with block sizes of 240 and 256 samples cannot be directly cascaded. (To accomplish this, you would need a basic block size of 16 samples. You would first buffer down from 240 to 16 samples and then buffer up from 16 to 256 samples. This places the modules into layouts with clock dividers of 15 and 16.)
Asynchronous sample rate conversion cannot be supported inside the signal flow. This must be done at the edges in specialized modules (like the TDM and ALSA modules).
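The numbers in the 240 / 256 example follow from the greatest common divisor of the two block sizes. The sketch below shows the arithmetic; the function name is hypothetical. Any common divisor of the two block sizes would also give integer clock dividers, and the GCD is simply the largest such basic block size.

```python
from math import gcd


def cascade_plan(block_a, block_b):
    """Return (basic_block_size, divider_a, divider_b) for two block sizes."""
    basic = gcd(block_a, block_b)
    return basic, block_a // basic, block_b // basic


if __name__ == "__main__":
    print(cascade_plan(240, 256))  # (16, 15, 16)
```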
...
Audio Weaver operation is based on meeting real-time deadlines. In a complicated SOC that is executing multiple applications simultaneously, it is possible that an audio processing thread may not finish in time. Before Audio Weaver “pumps” a thread, it checks whether the last pump has already finished. If not, Audio Weaver declares that an overrun has occurred and starts a resynchronization process. The process is detailed in Section 3.2.2 of the Audio Weaver Architecture document.
During resynchronization, Audio Weaver stops pumping audio. It resets ChangeThread and BufferUp/BufferDown modules so that the ping-pong buffers are in phase and aligned. Then it restarts pumping. During resynchronization there will be a momentary dropout in audio.
Audio Weaver starts resynchronization if there is an overrun in any thread. When an overrun occurs, it stops and restarts all audio threads; it is not possible to resynchronize only the thread in which the overrun occurred.
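The overrun check and system-wide restart described above can be modeled with a toy scheduler. This is a simplified sketch of the behavior, not the Audio Weaver implementation: the class names, the `busy` flag, and the `resync_count` field are all hypothetical, and the real resynchronization also resets ChangeThread and BufferUp/BufferDown modules.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    busy: bool = False  # True while the previous pump is still running


@dataclass
class Scheduler:
    threads: list
    resync_count: int = 0  # hypothetical counter of resynchronizations

    def resynchronize(self):
        """Stop and restart every thread, not only the one that overran."""
        self.resync_count += 1
        for thread in self.threads:
            thread.busy = False

    def try_pump(self, thread):
        """Start a pump unless the previous one has not finished (overrun)."""
        if thread.busy:
            self.resynchronize()
            return False
        thread.busy = True
        return True

    def finish(self, thread):
        """Mark the current pump of a thread as complete."""
        thread.busy = False
```

A caller would invoke `try_pump` at each scheduled wake-up and `finish` when the thread's processing completes; a `False` return corresponds to the momentary dropout during resynchronization.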
...
When debugging overruns, the SystemVariable module is very helpful. The module can be configured to output the “ResetCounter”. This is a global variable, and the SystemVariable module can be placed on any core and in any thread. The ResetCounter increments whenever an overrun occurs. Often, you may not hear a pop or click, but you can see the moment that the overrun occurred using this module. The module can also count resets during long unsupervised tests.
You can go a step further and report overruns to the HLOS using the SystemVariable module together with an Event module. A reference implementation is shown below. The Event module must have an objectID assigned to it so that it is visible to TinyMix.
...
This processing can run on any audio processing core and detects overruns throughout the system (even on another core). It can run in a lower priority thread to save MHz. We recommend that it runs every 100 msec.
The SystemVariable module outputs the number of times that there was an overrun in any real-time Audio Weaver thread (in a layout). The Delay and Compare modules are additional logic that cause the Event module to trigger when the reset count changes.
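The Delay and Compare logic amounts to comparing the current reset count with the value seen on the previous poll. A minimal sketch of that change detector follows; the class and method names are hypothetical and stand in for the modules in the reference design rather than for any real API.

```python
class ResetCountWatcher:
    """Signals when the reset count differs from the previously polled value."""

    def __init__(self, initial=0):
        self._previous = initial  # plays the role of the Delay module

    def poll(self, reset_counter):
        """Return True if the count changed since the last poll (Compare)."""
        changed = reset_counter != self._previous
        self._previous = reset_counter
        return changed


if __name__ == "__main__":
    watcher = ResetCountWatcher()
    # Hypothetical counter readings taken on successive polls.
    for reading in (0, 0, 1, 1, 3):
        print(reading, watcher.poll(reading))
```

Because the detector only looks for a change, several overruns between two polls still produce a single trigger; the counter value itself carries the total.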
After an overrun has been reported, an application can separately query per-thread profiling information. This is available through the AWECore tuning command “PFID_GetAllProfiling” which returns detailed per-thread profiling information. This call can be used to identify exactly which real-time thread is overflowing. The command is issued to one of the audio processing cores (e.g., ADSP or GPDSP), and it can be used to read information for one thread (layout) or from multiple threads (layouts). For each layout, it returns:
timePerProcessMeasured - number of clock ticks between calls to the layout’s pump function. This is measured in real-time.
timePerProcessCalculated - number of clock ticks between calls to the layout’s pump function. This is computed theoretically based on the profiling clock speed and the pump rate of the thread. For example, if the thread executes every 1 msec and the ADSP uses a profiling clock of 19.2 MHz, this will be 19,200.
averageCycles - average number of clock ticks needed to complete execution. This is smoothed with a first order smoother.
instantaneousCycles - instantaneous clock ticks needed to complete the last pump of the layout. This changes continuously.
peakCycles - the peak number of clock ticks that have been measured since system startup.
overflowCount - number of times that this layout has not completed in real-time and caused a systemwide reset event.
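Once these values have been read back, finding the offending layout is a matter of filtering on overflowCount. The sketch below uses the field names listed above, but the record layout, the `name` label, and the sample numbers are assumptions made for illustration; the actual reply format of PFID_GetAllProfiling is not shown here.

```python
from dataclasses import dataclass


@dataclass
class LayoutProfile:
    name: str  # hypothetical label, not one of the returned fields
    timePerProcessMeasured: int
    timePerProcessCalculated: int
    averageCycles: float
    instantaneousCycles: int
    peakCycles: int
    overflowCount: int


def overrunning_layouts(profiles):
    """Layouts that have overrun at least once, worst offender first."""
    hits = [p for p in profiles if p.overflowCount > 0]
    return sorted(hits, key=lambda p: p.overflowCount, reverse=True)


def peak_fraction(profile):
    """Peak ticks used as a fraction of the ticks available per pump."""
    return profile.peakCycles / profile.timePerProcessCalculated


# Made-up sample data: a 0.5 msec and a 4 msec layout on a 19.2 MHz tick timer.
profiles = [
    LayoutProfile("core0/layout0", 9600, 9600, 3000.0, 3100, 4200, 0),
    LayoutProfile("core0/layout2", 76800, 76800, 61000.0, 60500, 79000, 3),
]

if __name__ == "__main__":
    for p in overrunning_layouts(profiles):
        print(p.name, p.overflowCount, round(peak_fraction(p), 3))
```

A peak fraction above 1.0 means that at least one pump took longer than its block time.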
...
Audio Weaver uses a time base for computing real-time loads. On the Hexagon DSP, we utilize the 19.2 MHz hardware “tick timer”. On Linux, we use the High Precision Event Timer which runs at 10 MHz. Suppose you are running audio processing on the Hexagon ADSP with a block size of 48 samples at a 48 kHz sample rate. The block time is 1 msec and Audio Weaver computes the number of hardware ticks available as 19.2 MHz x 1 msec = 19,200. This is computed based on known quantities and is not measured at run-time. Next, Audio Weaver measures how long it takes for the 1 msec processing thread to compute. Suppose it takes 6,000 hardware ticks. Audio Weaver will then report the load as:
6,000/19,200 = 31.3%
If the ADSP is running at 1.344 GHz, then the reported load will be approximately 420 MHz (31.25% of 1,344 MHz).
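The load calculation can be reproduced step by step. The sketch below follows the arithmetic in the example; the function names are hypothetical and the inputs are the values quoted above.

```python
def available_ticks(tick_rate_hz, block_samples, sample_rate_hz):
    """Timer ticks that elapse during one block of audio."""
    return tick_rate_hz * block_samples / sample_rate_hz


def load_percent(measured_ticks, ticks_available):
    """Share of the block time spent processing, in percent."""
    return 100.0 * measured_ticks / ticks_available


def load_mhz(percent, core_clock_mhz):
    """The same load expressed in MHz of the core clock."""
    return percent / 100.0 * core_clock_mhz


if __name__ == "__main__":
    ticks = available_ticks(19_200_000, 48, 48_000)
    pct = load_percent(6_000, ticks)
    print(ticks, pct, load_mhz(pct, 1344))
```

With the unrounded ratio of 31.25%, the MHz figure comes out at 420; applying the rounded 31.3% instead gives roughly 421.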
...
In some systems, there is a requirement to stop audio processing and then restart at a later time. When you do this, it is important to clear out internal state variables. Otherwise, when you wake up the system, a burst of audio will be heard due to state variables holding audio samples when the system entered the low power sleep state.
TinyMix allows you to send a “Stop Audio” command to the Audio Weaver processing cores. This command halts all audio DMA and also resets all internal state variables and clears out wire buffers. Issue this command prior to entering the low power state. Later, to “wake up” the audio processing, just issue the “Start Audio” command. DMA will resume and no pops or clicks will be heard.
...