Introduction
This document describes how Audio Weaver has been integrated on the Qualcomm Snapdragon SOC using low-level APIs. The integration bypasses Audio Reach and provides significant performance improvements. Use this document in conjunction with the generic description of the Audio Weaver architecture in Audio-Weaver--Architecture. The overall design is flexible enough to handle all automotive use cases and configurations. You should be able to fully engineer your automotive audio system using Audio Weaver without having to write custom Hexagon code [1].
The platform-specific code needed to wrap Audio Weaver is referred to as the Board Support Package (BSP), and we use that term throughout this document. The BSP is based on the Qualcomm AudioLite APIs, a low-level software layer which provides an RTOS, TDM port I/O, and basic system features.
The document covers integrations on the Gen4 SA8255 and the Gen5 SA8397 / SA8797 “Nordy” chipsets. The integrations are very similar and when they diverge, we will separately document each integration.
Platform Features
Graphical development
Multicore support - distribute audio processing across all Hexagon DSPs and the Arm.
Unified signal flow showing an overall view of all cores and threads
Integrated profiling
Highly optimized including HVX support on Gen5 chipsets
Access to IP
Over 500 Audio Weaver modules
Qualcomm voice IP
Custom module API
3rd party ecosystem
Real-time audio features
TDM serial port configuration via Audio Weaver modules
Independent TDM ports automatically synchronize to within 1 sample
ALSA I/O configuration via Audio Weaver modules
Low latency - as low as 0.5 msec digital-in to digital-out using a 0.25 msec block size
Early audio within 1.5 seconds
Resynchronizes automatically after CPU overrun
Software integration
BSP configurable via a text initialization file
Run-time control via TinyMix APIs
Text based control API with integrated data type, range, and array bounds checking
Subsystem Restart (SSR) feature reboots DSPs and restarts the Audio Service if there is a critical run-time failure
Asynchronous event handling
Integrated full-featured Matlab API for scripting and regression testing
Supports all automotive use cases with concurrent operation
TFLM and ONNX support
Fully documented
Online training available
Comparison with Audio Reach
Audio Reach | Audio Weaver |
Developed for power constrained mobile products. Single use case. | Developed for high-performance automotive audio. Multiple concurrent use cases. |
Variable processing load. | Constant / deterministic real-time load |
Must keep cores loaded < 70% | Can load cores to 90% |
Separate AMS framework needed for low latency support. End-to-end digital latency of 3 x block size. | Native low-latency support. End-to-end digital latency of 2 x block size. |
TDM ports aligned within 12 samples | TDM ports aligned within 1 sample |
SysMon does not provide actionable information to fully load the system. | Easy to understand profiling. Per module, per thread, and per core. Shows average and peak CPU load. |
Only supports Hexagon DSPs; no Arm. | Supports all cores including Arm. |
QXDM is a poor fit for real-time debugging. | Includes integrated visual debugging tools and legacy QXDM. |
Numerous side effects. Many features can only be supported by Qualcomm. | Architecture is fully documented and information is publicly available. |
Snapdragon Specific Modules
Audio Weaver includes a set of Snapdragon specific audio modules. These modules provide additional control and capabilities. These modules are found in Audio Weaver in the “Snapdragon” folder of the Module Browser.
We also describe how the Audio Weaver Event module operates in the context of a Snapdragon system. Each module is described in turn.
QXDM Logging
This module works in conjunction with the Qualcomm diagnostics and logging APIs. It allows you to stream real-time data from the Audio Weaver signal flow to the Qualcomm QXDM logging APIs. Once the data has been captured by QXDM, it can be parsed by the Qualcomm “QCAT” tool to generate WAV files.
The module has a single input pin and accepts fract32 data. It supports any number of channels and any block size. The module works on any Hexagon DSP and you can have multiple instances of the module throughout your design.
Module Arguments:
logCode. This is a UINT16 that is entered as a hexadecimal string (e.g., 0x1586). It is used together with logTapID to uniquely identify this module. There is a fixed number of logCodes available and you select one from a droplist:
If you are logging PCM data, then it doesn’t matter which logCode you use; it only matters that you pick the matching one in the QXDM application.
logTapID. This is a UINT16 that is entered as a hexadecimal string (e.g., 0x1234). It is used together with logCode to uniquely identify this module.
format. An integer which specifies the format of the data. Allowable values are: 0=PCM (the default), 1=Bitstream, and 2=Raw. The format flag is used by the Qualcomm tools to interpret the data. In Audio Weaver, you should stay with PCM (the default).
logBehavior. An integer which specifies in which thread the logging occurs. 0=deferred. 1=immediate (the default). If immediate, then the QXDM calls happen from the module’s real-time thread. If deferred, then several blocks are buffered up and then the calls to QXDM happen from a non-real-time thread.
deferredBufferSize. An integer which specifies the size of the circular buffer used for deferred logging. This specifies the number of blocks of data to store in the circular buffer.
transpose. Specifies whether the inspector is drawn horizontally (the default) or vertically.
logCode and logTapID are used to generate the file names by the QCAT utility.
The module can operate in “Immediate” mode which causes the QXDM functions to be called directly from the module’s real-time thread. Data is logged one block at a time. In “Deferred” mode, the data is added to a circular buffer. When the circular buffer is half full, then the module’s Set() function is called using Audio Weaver’s deferred functionality. The Set() function repeatedly calls the QXDM function to log the data. Note that if the number of blocks in the deferred buffer is large, then you may overwhelm the QXDM subsystem. We recommend keeping the deferredBufferSize between 6 and 12 blocks.
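The deferred path can be pictured with the following Python sketch. It is an illustration only, not the module's implementation: `DeferredLogger`, `log_fn`, and the policy of dropping a block when the buffer is full are hypothetical. The text above states only that blocks are buffered, that the deferred Set() call is requested at half full, and that Overrun Count increments when the buffer is full or the logging call fails.

```python
from collections import deque


class DeferredLogger:
    """Toy model of deferred logging: buffer blocks, flush outside the real-time thread."""

    def __init__(self, deferred_buffer_size, log_fn):
        self.capacity = deferred_buffer_size   # capacity in blocks
        self.buffer = deque()
        self.log_fn = log_fn                   # hypothetical stand-in for the QXDM log call
        self.blocks_logged = 0
        self.overrun_count = 0
        self.flush_pending = False

    def process(self, block):
        """Called once per block from the real-time thread."""
        if len(self.buffer) >= self.capacity:
            self.overrun_count += 1            # buffer full: this block is dropped (assumed policy)
            return
        self.buffer.append(block)
        if len(self.buffer) >= self.capacity // 2:
            self.flush_pending = True          # half full: request the deferred flush

    def deferred_set(self):
        """Called later from a non-real-time thread; drains the whole buffer."""
        while self.buffer:
            block = self.buffer.popleft()
            if self.log_fn(block):
                self.blocks_logged += 1
            else:
                self.overrun_count += 1        # the logging call reported an error
        self.flush_pending = False
```

In this model a larger buffer means more back-to-back logging calls per flush, which is the situation the 6 to 12 block recommendation is meant to avoid.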
Inspector
The module’s inspector is shown below when there are 6 channels being logged. This is the default horizontal orientation.
The controls are as follows:
Master Enable - A single Boolean which enables or disables the logging. By default, this is TRUE.
Chan N Enable - This is an array of Boolean values, one per channel. This allows you to log a subset of the channels. By default, channelEnable is all ones and all channels will be logged.
Blocks Logged - this is a counter which increments every time a block of data is logged.
Overrun Count - This increments anytime that Audio Weaver is unable to log a block of data. This may be because the QXDM API returns an error, or the buffer of deferred data is full.
Reset Counts - This is a momentary switch which resets Blocks Logged and Overrun Count.
Example
This example streams 2 channels of audio data to QXDM. The first channel is a sine wave and the second is pink noise.
The QXDM module has the default settings and streams to logCode 0x1586 and logTapID 0x0000.
Next, start the Qualcomm “QUTS Status App” and connect to your target device
Then launch QXDM. Go to the Tools menu and select “CFG File Generator”. On the right side of the dialog, search for the logID that you selected. If you do not select this, then the QXDM module will not log packets and you will get overruns.
Click “OK” to dismiss the window. At this point, data will be logged by QXDM. Build and run the system in Designer. The inspector will display the number of Blocks Logged:
The number of blocks logged should match the pump rate of your module. In this case, it updates 1000 times per second.
Return to QXDM to export your data. Right click and select “Copy All Items to File…” This saves an HDF file to disk.
Next, use the QCAT utility to convert the HDF file to WAV data. Start by opening the HDF file (File→Open).
Then on the View menu select “Vocoder Playback”. Select “Replace Dropped Frames”. This option will use the time stamps of the data packets to reconstruct the WAV data. If there are missing frames (because of dropped QXDM packets, or because you disabled recording or certain channels) then zeros will be inserted.
Click “Process” in the upper left corner to generate the WAV files. The QCAT utility generates 3 files per recorded channel: labels, raw data, and WAV data. In this example, it generates the files:
Interpreting the last row:
0x1586 = logCode
0x0 = session ID
0x1 = tapID
0x2 = channel number
Tx = direction, stated from the modem's perspective: Tx = upstream, Rx = to loudspeaker
48 kHz = sample rate
Q31 = numeric format
0 = ???
The file names have the logCode, logTapID, and channel name embedded. Opening the WAV files in Audacity, we see:
The gap around 11 seconds occurred because the “masterEnable” inspector button was toggled. The gap in channel 2 at 15 seconds occurred because “channelEnable” was toggled for that channel. If you do not select “Replace Dropped Frames” in QCAT, then the files will be shorter (no zero periods) and discontinuous.
Troubleshooting
Overruns are occurring
Did you remember to enable your logID in QXDM?
You may be overwhelming the QXDM subsystem and need to reduce the amount of data being logged. Reduce the number of channels or the sample rate of the data collected.
The WAV data is discontinuous
It is possible that no overruns are reported and yet the collected WAV data is discontinuous. When data is passed from Audio Weaver to QXDM, there is an indication that the data was successfully received by QXDM (if not, then Overrun Count increments). There is a second step where QXDM sends the data to the PC. If this fails, there is no indication back to Audio Weaver. Audio Weaver delivered the data to QXDM but QXDM had trouble logging it to the PC. As above, you’ll need to reduce the data rate.
Clock Settings
This module is part of R4.0 and will be replaced by the Clock Voting module in R4.1.
This module allows you to change the Hexagon DSP’s clock speed and the speed of its DDR memory access. The module has no input or output pins and can be placed in any Hexagon thread; it does not matter which thread it runs in. The module does not do audio processing per se, but its Set function sets the internal Hexagon state.
When the Hexagon DSP first boots, it uses the clock speed settings from the awe_config.xml file. Next, when the Audio Weaver signal flow is loaded (the AWB), if it contains a Clock Settings module, then this module will override the initial settings. Finally, this module can be used to adjust the Hexagon clock during run-time. Adjustments can be made using the inspector or with a ParamSet module.
If your design needs a fixed clock speed, then just update the awe_config.xml file. Only use this module if you need to dynamically adjust the clock speed.
Clock Voting
This module allows you to change the Hexagon DSP’s clock speed and the speed of its DDR memory access. The module has no input or output pins and can be placed in any Hexagon thread; it does not matter which thread it runs in. The module does not do audio processing per se, but its Set function sets the internal Hexagon state. The module’s Get function reads back the actual clock settings, which allows you to verify them without using SysMon.
When the Hexagon DSP first boots, it uses the clock speed settings from the awe_config.xml file. Next, when the Audio Weaver signal flow is loaded (the AWB), if it contains a Clock Voting module, then this module will override the initial settings. Finally, this module can be used to adjust the Hexagon clock during run-time. Adjustments can be made using the inspector or with a ParamSet module.
If your design needs a fixed clock speed, then just update the awe_config.xml file. Only use this module if you need to dynamically adjust the clock speed.
Shared Memory Mapper
This module is used to map shared memory between a client application running in the HLOS and an audio module running in the Audio Weaver signal flow. This module is not used by any of the supplied Audio Weaver modules, but is provided to allow customers to develop their own custom modules that leverage shared memory. The module appears in Designer as shown below.
The module has a single module argument which allows you to set the sample rate of the module’s output wire.
The module has an internal 3 element array “regionInfo” which is used to exchange memory region information between the HLOS and the audio processing cores. This array is initially all zeros when the system is built. During operation, the client application requests memory from the Qualcomm Audio Service (QAS). The application specifies:
Size in bytes of the allocation
Name of the Shared Memory Mapper module in their signal flow to share the memory with
The QAS performs the following operations:
The QAS allocates the memory with the correct sharing attributes
The 64-bit physical address and size of the region is written to the module’s regionInfo array via TinyMix commands. This is a 3 element array consisting of 32-bit values:
regionInfo[0] → Low 32-bits of the 64-bit physical address
regionInfo[1] → High 32-bits of the 64-bit physical address
regionInfo[2] → Size of the memory region, in bytes
The module’s Set function converts the physical address to a virtual address usable by the audio processing core (Hexagon or Arm).
The module then outputs the following 3 values on its output wire:
Wire[0] = Low 32-bits of the virtual address
Wire[1] = High 32-bits of the virtual address
Wire[2] = size of the shared region, in bytes
The Set function waits pumpDelay pump cycles before returning to the QAS.
The QAS then returns the SMMU mapped address to the client and also a handle for deallocating the region later.
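The 3 element layout used for regionInfo and for the output wire can be illustrated with a short Python sketch. The helper names `pack_region_info` and `unpack_address` are hypothetical and exist only for this example; the actual values are written by the QAS via TinyMix, and the physical-to-virtual conversion is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def pack_region_info(physical_address, size_bytes):
    """Return [low word, high word, size] for a 64-bit address and a 32-bit size."""
    if not 0 <= physical_address < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    if not 0 <= size_bytes <= MASK32:
        raise ValueError("size must fit in 32 bits")
    low = physical_address & MASK32             # regionInfo[0]
    high = (physical_address >> 32) & MASK32    # regionInfo[1]
    return [low, high, size_bytes]              # regionInfo[2] is the size in bytes


def unpack_address(low, high):
    """Recombine the low and high 32-bit words into one 64-bit address."""
    return (high << 32) | low


# Example with an arbitrary, made-up address and a 4096 byte region.
region_info = pack_region_info(0x0000000123456000, 4096)
```

A custom module reading the output wire would recombine Wire[0] and Wire[1] the same way and treat a size of zero as "not mapped".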
After regionInfo is written and successfully converted, the inspector will update:
“Is Initialized” becomes 1
Addr Low shows the low 32-bits of the virtual address
Addr High shows the high 32-bits of the virtual address (On the Hexagon the high 32-bits will always be zero)
Region Size - size of the shared memory region, in bytes.
We expect that custom modules will use this information to read and write the shared memory. If the size of the region is zero, then this indicates that the memory has not been allocated, or that it has subsequently been deallocated.
When the client application is finished with the shared memory, it should stop accessing the memory and call the TBD QAS deallocation function. This function:
Writes zeros to the named Shared Memory Mapper module’s regionInfo array.
The module’s Set function sets the output wire information to 0 and then waits pumpDelay pump cycles.
Next it unmaps the memory region
Then returns to the QAS, and the QAS returns to the client application.
The pump delay is provided to allow any downstream modules to have sufficient time to stop accessing the memory region.
TBD. Instead of basing the delay on the number of pumps, should we have just used a sleep() command? As designed, the mapping and unmapping only works when the real-time audio is pumping.
Remove the pumpDelay. Just let the client application signal all of the modules that are using the mapped memory to stop using it.
Event Module
This module is part of the standard Audio Weaver distribution and is useful for SOC integration like the Snapdragon. The module generates asynchronous events that originate in the Audio Weaver signal flow and need to be routed to a client application running on the HLOS. Client applications subscribe to the events through the TinyMix API. In addition to originating in a module, events can also originate in the BSP itself and be routed to the HLOS.
Typical uses are:
Sending a trigger word notification to the HLOS
Reporting data recurrently, e.g. VAD or RMS measurements
The BSP will use the same event propagation mechanism to convey internal information, like:
AWB has been successfully loaded on an instance
AWE core is pumping now (i.e. an SSR is ready)
An audio overrun has happened in the system (real-time performance impacted)
Peak CPU load of a DSP has exceeded 90%
This module was recently added to Audio Weaver in release D-8.D.2.6. An example of how the Event Module would be used to send a wake word event notification is shown below. The first input pin “Trigger” contains Boolean data and is used to trigger the event. The second input pin specifies the data payload to send back to the HLOS. The payload can contain up to 64 words of data, or 256 bytes. In the example below, the payload for the Sensory wake word is which command was detected.
The system designer can place an arbitrary number of Event modules in the signal flow and they can be placed on any audio processing core.
Module Arguments
The Event module has a configurable eventType argument which is set at design time. This eventType is a signed 32-bit integer which can be arbitrarily set by the system designer. The eventType is returned to the HLOS as part of the event notification.
There are no predefined event types and it is up to the system designer to define them. For example, you may have 4 wake word engines in the signal flow, one per seat location. Each wake word engine would have a corresponding Event module. The same eventType would be used for all of these wake word events. The payload would differentiate the seat location.
Module Inspector
The module inspector is shown below.
This provides further control of the module
Trigger Behavior - specifies when the event notification occurs. The default is “Deferred”, which means that the notification occurs in Audio Weaver’s deferred (non-real-time) thread. The other option is “Instance”, which causes the notification to occur in the real-time thread.
Trigger Type - specifies how the data on the trigger pin input is used:
Rising Edge - trigger when the input goes from 0 to 1 (or larger)
Falling Edge - trigger when the input goes from 1 (or larger) to 0
High - trigger any time the input is 1 or larger
Low - trigger any time the input equals 0
None - never trigger. This can be used to disable the event generation.
Callback Registered - Boolean readback variable that shows 1 if the Event module properly registered itself with the BSP. If this is 0, then the BSP hasn’t registered the event callbacks.
Successful Trigger Count - counter that increments whenever the BSP’s trigger callback successfully executes. On the Snapdragon, this means that the event structure was successfully added to the event queue in shared memory.
Failed Trigger Count - counter that increments whenever the BSP’s trigger callback returns an error. On the Snapdragon, this means that the event structure could not be added to the event queue in shared memory. It means your event queue is too small and you should adjust its size in your BSP configuration file.
Reset Counts - this is a push button which sets the two trigger counters to zero.
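The Trigger Type options can be summarized with the following Python sketch. It is a model of the rules listed above, not the module's code; the function names, the string labels, and the assumption that the trigger pin starts at 0 are all illustrative.

```python
def should_trigger(trigger_type, previous, current):
    """Decide whether one new trigger-pin value fires an event.

    Values of 1 or larger count as "high"; a value of 0 counts as "low".
    """
    if trigger_type == "rising":
        return previous == 0 and current >= 1
    if trigger_type == "falling":
        return previous >= 1 and current == 0
    if trigger_type == "high":
        return current >= 1
    if trigger_type == "low":
        return current == 0
    if trigger_type == "none":
        return False                      # event generation disabled
    raise ValueError(f"unknown trigger type: {trigger_type}")


def trigger_indices(trigger_type, samples, initial=0):
    """Return the indices at which an event would fire for a sequence of pin values."""
    fired = []
    previous = initial                    # assumption: the pin is treated as 0 before the first sample
    for index, current in enumerate(samples):
        if should_trigger(trigger_type, previous, current):
            fired.append(index)
        previous = current
    return fired
```

With the edge types a sustained detection produces a single event, whereas High and Low fire on every block for which the condition holds, so they fill the event queue faster.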
Receiving Events by Client Applications
In order for events to be visible by TinyMix, you have to assign an objectID to each event module. Then you use TinyMix as described in TBD to register and receive event callbacks.
System Variable
The SystemVariable module is a standard part of Audio Weaver. We mention it here because it is very useful for system bring up and debugging. It can also be used for detecting CPU overruns.
The module has an output pin with 1 channel and a block size of 1 (it is a control signal). The output wire will hold a single floating-point value. The module’s inspector allows you to choose which of 12 system variables to output:
This module returns internal information from the AWECore library and the layout (thread) that the module is running in. The following items can be selected:
SampleRate - this is the “Sample rate” field taken from the target information. It corresponds to the sample rate of the hardware input and output pins. In units of Hz.
ProfileClockSpeed - this is the “Profile clock rate” field taken from the target information. It corresponds to the speed of the profiling clock used for module profiling. In units of Hz. On the Hexagon, this is 19.2 MHz; on Linux, this is 10 MHz.
BlockSize - this is the “Basic block size” field taken from the target information. It corresponds to the fundamental block size of the BSP code and usually corresponds to the DMA buffer size. In units of samples.
CoreClockSpeed - this is the “CPU clock rate” field taken from the target information. It corresponds to the overall processor speed. In units of Hz.
AverageTime - this is the number of seconds required to complete the processing of the layout containing this module. This information is averaged at run-time with a first order smoother. In a single threaded system, this information corresponds to the “Average ticks per block” shown on the block-by-block profiling window.
PeakTime - this is the maximum number of seconds that was ever required to complete the processing of the layout containing this module. This information is tracked during real-time processing. In a single threaded system, this information corresponds to the “Peak ticks per block” shown on the block-by-block profiling window.
NumPumps - this is the number of times that this layout was pumped. It should increase linearly and the rate depends upon the clock divider of the hosting layout (thread).
CycleTime - this is the number of seconds between calls to the layout’s pump function. This information is averaged at run-time with a first order smoother. In a single threaded system, this information corresponds to the “Total ticks per block process available” shown on the block-by-block profiling window. For example, if you are processing audio at a 1 msec block size, then this value should be 0.001.
ElapsedTime - this is the total elapsed time in seconds for the layout containing this module. It equals (NumPumps x CycleTime).
PercentCPU - this is the percentage of CPU consumed by this layout. It is in the range of [0 100]. This is calculated per-frame. AverageTime/CycleTime approximates PercentCPU.
InstCycleTime - this is the same as CycleTime but is unsmoothed. It is the number of seconds between calls to this module’s process function. It is the instantaneous information for the previous pump cycle. No averaging occurs.
ResetCount - this is the number of times that the containing layout was reset. Reset occurs when there is an overrun in any audio processing thread throughout the system. This variable will be the same across all layouts. When debugging a system, you only need to watch for resets on a single thread and the ResetCount will reflect system-wide behavior.
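The relationships between several of these variables can be written out as a small Python sketch. The function names are illustrative, and the smoothing coefficient is an assumption: the text says only that a first order smoother is used, not what its coefficient is.

```python
def first_order_smooth(previous, new_value, coeff):
    """One update of a first-order smoother; coeff in (0, 1] is an assumed parameter."""
    return previous + coeff * (new_value - previous)


def percent_cpu(average_time, cycle_time):
    """Approximate PercentCPU (0 to 100) from AverageTime and CycleTime, both in seconds."""
    return 100.0 * average_time / cycle_time


def elapsed_time(num_pumps, cycle_time):
    """ElapsedTime in seconds, computed as NumPumps x CycleTime."""
    return num_pumps * cycle_time


def reset_occurred(previous_reset_count, current_reset_count):
    """True when ResetCount has changed, i.e. an overrun reset happened somewhere in the system."""
    return current_reset_count != previous_reset_count


# Example: a layout with a 1 msec block size that takes 0.45 msec on average to process.
load = percent_cpu(0.00045, 0.001)
```

Because ResetCount is the same across all layouts, a check such as `reset_occurred` only needs to watch one thread.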
You can combine the System Variable and Event modules as shown below to generate event notifications whenever the ResetCount variable changes.
Use Case Examples
We should include a full-featured block diagram which shows how to implement a complete automotive system in Audio Weaver. We could have placeholders for some functions, like RNC and Telephony, but it should be multicore and highlight best practices for designing systems.
Appendix A. System Configuration File
The R4.0 system uses an XML file to configure the Audio Weaver and the underlying BSP. This is converted to a binary file and loaded at system boot. An annotated version of the XML file is shown below.
<config platform="8755">
  <awe_process>
    <!--Unused. Will be removed in the future-->
    <enable type="BOOL">true</enable>
  </awe_process>
  <params>
    <log_level type="UINT32" max_level="2">
      <!-- Specifies the log levels for each core. -->
      <adsp type="UINT32">0</adsp>
      <gpdsp0 type="UINT32">0</gpdsp0>
      <gpdsp1 type="UINT32">0</gpdsp1>
      <arm type="UINT32">0</arm>
    </log_level>
    <thread_num>
      <!-- Number of child audio processing threads per core-->
      <adsp type="UINT8">4</adsp>
      <gpdsp0 type="UINT8">10</gpdsp0>
      <gpdsp1 type="UINT8">10</gpdsp1>
      <arm type="UINT8">4</arm>
    </thread_num>
    <adsp_thread_priority>
      <!-- thread_parent is the priority of the main audio processing interrupt.
           This is the highest priority audio thread. -->
      <thread_parent type="UINT8">50</thread_parent>
      <thread_child>
        <!-- Next separate priorities for each child thread.
             The number of child threads is specified above.
             The child threads should be in decreasing priority from the thread_parent. -->
        <thread_0 type="UINT8">51</thread_0>
        <thread_1 type="UINT8">52</thread_1>
        <thread_2 type="UINT8">53</thread_2>
        <thread_3 type="UINT8">54</thread_3>
      </thread_child>
    </adsp_thread_priority>
    <gpdsp0_thread_priority>
      <thread_parent type="UINT8">50</thread_parent>
      <thread_child>
        <thread_0 type="UINT8">51</thread_0>
        <thread_1 type="UINT8">52</thread_1>
        <thread_2 type="UINT8">53</thread_2>
        <thread_3 type="UINT8">54</thread_3>
        <thread_4 type="UINT8">55</thread_4>
        <thread_5 type="UINT8">56</thread_5>
        <thread_6 type="UINT8">57</thread_6>
        <thread_7 type="UINT8">58</thread_7>
        <thread_8 type="UINT8">59</thread_8>
        <thread_9 type="UINT8">60</thread_9>
      </thread_child>
    </gpdsp0_thread_priority>
    <gpdsp1_thread_priority>
      <thread_parent type="UINT8">50</thread_parent>
      <thread_child>
        <thread_0 type="UINT8">51</thread_0>
        <thread_1 type="UINT8">52</thread_1>
        <thread_2 type="UINT8">53</thread_2>
        <thread_3 type="UINT8">54</thread_3>
        <thread_4 type="UINT8">55</thread_4>
        <thread_5 type="UINT8">56</thread_5>
        <thread_6 type="UINT8">57</thread_6>
        <thread_7 type="UINT8">58</thread_7>
        <thread_8 type="UINT8">59</thread_8>
        <thread_9 type="UINT8">60</thread_9>
      </thread_child>
    </gpdsp1_thread_priority>
    <arm_thread_priority>
      <thread_parent type="UINT8">63</thread_parent>
      <thread_child>
        <thread_0 type="UINT8">62</thread_0>
        <thread_1 type="UINT8">61</thread_1>
        <thread_2 type="UINT8">60</thread_2>
        <thread_3 type="UINT8">59</thread_3>
      </thread_child>
    </arm_thread_priority>
    <!-- Number of audio processing cores -->
    <num_of_instances type="UINT32">4</num_of_instances>
    <adsp>
      <!-- Core ID (or instance ID) of the core. -->
      <coreID type="UINT8">0</coreID>
      <!-- Sizes of the Audio Weaver heaps, in units of 32-bit words. -->
      <fastheapA type="UINT32">250000</fastheapA>
      <fastheapB type="UINT32">250000</fastheapB>
      <slowheap type="UINT32">250000</slowheap>
      <cpuclock type="UINT32" unit="KHz">1344000</cpuclock>
    </adsp>
    <gpdsp0>
      <coreID type="UINT8">1</coreID>
      <fastheapA type="UINT32">250000</fastheapA>
      <fastheapB type="UINT32">250000</fastheapB>
      <slowheap type="UINT32">250000</slowheap>
      <cpuclock type="UINT32" unit="KHz">1708800</cpuclock>
    </gpdsp0>
    <gpdsp1>
      <coreID type="UINT8">2</coreID>
      <fastheapA type="UINT32">250000</fastheapA>
      <fastheapB type="UINT32">250000</fastheapB>
      <slowheap type="UINT32">250000</slowheap>
      <cpuclock type="UINT32" unit="KHz">1708800</cpuclock>
    </gpdsp1>
    <arm>
      <coreID type="UINT8">3</coreID>
      <fastheapA type="UINT32">7000000</fastheapA>
      <fastheapB type="UINT32">70000</fastheapB>
      <slowheap type="UINT32">70000</slowheap>
      <cpuclock type="UINT32" unit="KHz">2100000</cpuclock>
    </arm>
    <!-- Size of the shared memory heap. This is visible to all cores.
         In units of 32-bit words -->
    <shared_heap_size type="UINT32" unit="UINT32">262000</shared_heap_size>
    <!-- These values specify the "targetInfo" which is read back by the Audio Weaver
         Server when it connects to the target. This will be removed in the future. -->
    <block_size type="UINT16">48</block_size>
    <sample_rate type="UINT16">48000</sample_rate>
    <in_channel type="UINT16">16</in_channel>
    <out_channel type="UINT16">16</out_channel>
  </params>
</config>
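Before converting the XML to binary, it can be useful to sanity check it. The Python sketch below is one possible check, not a supplied tool: it parses a trimmed ADSP-only excerpt of the file above, confirms that the number of child thread priorities matches thread_num, and converts the heap sizes from 32-bit words to bytes. The function name `check_core` and the idea that a mismatch should be flagged are assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the configuration file (ADSP entries only).
CONFIG_XML = """
<config platform="8755">
  <params>
    <thread_num>
      <adsp type="UINT8">4</adsp>
    </thread_num>
    <adsp_thread_priority>
      <thread_parent type="UINT8">50</thread_parent>
      <thread_child>
        <thread_0 type="UINT8">51</thread_0>
        <thread_1 type="UINT8">52</thread_1>
        <thread_2 type="UINT8">53</thread_2>
        <thread_3 type="UINT8">54</thread_3>
      </thread_child>
    </adsp_thread_priority>
    <adsp>
      <fastheapA type="UINT32">250000</fastheapA>
      <fastheapB type="UINT32">250000</fastheapB>
      <slowheap type="UINT32">250000</slowheap>
    </adsp>
  </params>
</config>
"""

HEAP_TAGS = ("fastheapA", "fastheapB", "slowheap")


def check_core(params, core):
    """Return (child thread count matches thread_num, total heap size in bytes)."""
    declared = int(params.findtext(f"thread_num/{core}"))
    children = params.find(f"{core}_thread_priority/thread_child")
    heap_words = sum(int(e.text) for e in params.find(core) if e.tag in HEAP_TAGS)
    return declared == len(children), heap_words * 4   # heaps are in 32-bit words


params = ET.fromstring(CONFIG_XML).find("params")
threads_ok, heap_bytes = check_core(params, "adsp")
```

The same function can be called with "gpdsp0", "gpdsp1", or "arm" when the full file is loaded.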
Appendix B: Performance Benchmarks
This section contains some performance benchmarks for this release. During the measurements, processor clock and DDR speeds were set to maximum.
Interprocessor Communication
This is the measured MHz to transfer 16 channels of 32-bit data at 48 kHz between cores. Transfers were done with 1 msec blocks. This test uses the ChangeThread module and part of the reported MHz is on the sending core and part is on the receiving core.
Source | To ADSP | To GPDSP0 | To GPDSP1 | To Arm |
From ADSP | 2.5 MHz | 24.5 MHz | 23.5 MHz | 7.8 MHz |
From GPDSP0 | 26.8 MHz | 1.7 MHz | 23.6 MHz | 7.8 MHz |
From GPDSP1 | 26.6 MHz | 24.4 MHz | 1.4 MHz | 7.8 MHz |
From Arm | 25.5 MHz | 24.3 MHz | 23.5 MHz | 2.5 MHz |
When data is sent between different cores, it passes through the carveout shared memory. When the source and destination are the same core, the data goes to another hardware thread on that core and non-shared memory is used.
TDM I/O
The limiting factor for serial port I/O is the speed of the LP DMA memory. This is 85 MB/second for reading and 170 MB/second for writing. These tests were with 48 sample block sizes.
TDM Source
16 channels @ 48 kHz. 16-bit samples: 28.3 MHz (theoretical limit of 24 MHz)
16 channels @ 48 kHz. 32-bit samples: 49 MHz
TDM Sink
16 channels @ 48 kHz. 16-bit samples: 14.8 MHz (theoretical limit of 12 MHz)
16 channels @ 48 kHz. 32-bit samples: 25.6 MHz
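The Python sketch below shows one way to arrive at figures close to the quoted theoretical limits. The derivation is an assumption and is not stated with the measurements: it supposes that the core is stalled for the time the LP DMA memory needs to move the audio data, and that the core runs at the 1344 MHz ADSP cpuclock value from the Appendix A example. The function name is illustrative.

```python
def lpdma_limit_mhz(channels, sample_rate_hz, bytes_per_sample,
                    bandwidth_mb_per_s, core_clock_mhz):
    """Core MHz consumed if every byte moves at the stated LP DMA memory bandwidth."""
    bytes_per_second = channels * sample_rate_hz * bytes_per_sample
    busy_fraction = bytes_per_second / (bandwidth_mb_per_s * 1e6)
    return busy_fraction * core_clock_mhz


ADSP_CLOCK_MHZ = 1344.0   # assumption: cpuclock value from the Appendix A example

# TDM Source reads from LP DMA memory (85 MB/second).
source_16bit = lpdma_limit_mhz(16, 48000, 2, 85.0, ADSP_CLOCK_MHZ)
# TDM Sink writes to LP DMA memory (170 MB/second).
sink_16bit = lpdma_limit_mhz(16, 48000, 2, 170.0, ADSP_CLOCK_MHZ)
```

Under these assumptions the 16-bit cases come out near 24 MHz for the source and 12 MHz for the sink, which agrees with the limits quoted above.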
Input to Output Latency
This was measured by connecting the TDMSource directly to the TDMSink as shown below.
We measured the delay with an oscilloscope and the path included:
A/D → A2B → TDMSource → Copy → TDMSink → A2B → D/A
The latency varied based on the block size as shown below:
Block Size (samples) | Total Latency (msec) | Analog Latency (msec) | Digital Latency (msec) |
12 | 1.6 | 1.1 | 0.5 |
24 | 2.1 | 1.1 | 1.0 |
48 | 3.1 | 1.1 | 2.0 |
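The digital latency column follows the rule of 2 x block size. The Python sketch below reproduces the table under two assumptions: a 48 kHz sample rate and a constant 1.1 msec analog latency. The function names are illustrative.

```python
def block_time_msec(block_size_samples, sample_rate_hz=48000):
    """Duration of one block in milliseconds."""
    return 1000.0 * block_size_samples / sample_rate_hz


def digital_latency_msec(block_size_samples, sample_rate_hz=48000, blocks_of_delay=2):
    """Digital-in to digital-out latency, modeled as a whole number of block times."""
    return blocks_of_delay * block_time_msec(block_size_samples, sample_rate_hz)


ANALOG_LATENCY_MSEC = 1.1   # analog portion of the path, taken from the table

rows = []
for block_size in (12, 24, 48):
    digital = digital_latency_msec(block_size)
    total = round(ANALOG_LATENCY_MSEC + digital, 1)
    rows.append((block_size, total, digital))
```

Setting `blocks_of_delay=3` gives the corresponding figure for a framework with a latency of 3 x block size.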
In R4.1 provide measures with Synchronous Unaligned Ports
ALSA I/O
This test measures the overhead of streaming audio data between the HLOS and Audio Weaver. 16 channels of data streamed at 48 kHz. 1 msec block size in Audio Weaver and a 10 msec block size at the HLOS.
ALSA Source SRC (R4.1)
HLOS sends data at 44.1 kHz and Audio Weaver converts to 48 kHz.
Core | MHz / 16 chan | MHz / channel |
ADSP | 52.3 | 3.3 |
GPDSP0 | 51.2 | 3.2 |
GPDSP1 | 51.9 | 3.3 |
Arm | 18.7 | 1.2 |
ALSA Sink SRC (R4.1)
Audio Weaver converts 48 kHz to 44.1 kHz for the HLOS.
Core | MHz / 16 chan | MHz / channel |
ADSP | 62.4 | 3.9 |
GPDSP0 | 60.8 | 3.8 |
GPDSP1 | 60.1 | 3.8 |
Arm | 7.2 | 0.45 |
TDM + ALSA Latency
In this test, we measure the latency from analog in to analog out including a round trip through the HLOS using ALSA modules. The path is:
A/D → A2B → TDMSource → ALSA Sink → HLOS → ALSA Source → TDMSink → A2B → D/A
An application was running on the HLOS which would read the ALSA Source and send it to the ALSA Sink. A 48 kHz sample rate was used throughout and Audio Weaver was processing at a 48 sample block size. The HLOS application was using a 240 sample block size. The ALSA settings used in the test were:
ALSA Sink
Buffer size: 960 samples
Block size: 240 samples
startThreshold: 0 samples
stopThreshold: 0 samples
ALSA Source
Buffer size: 960 samples
Block size: 240 samples
startThreshold: 480 samples (prefill to avoid underruns)
stopThreshold: 0 samples
The measured latency was 13.2 msec. The breakdown is:
1 msec analog latency
2 msec TDM digital latency (using 1 msec block size)
10 msec ALSA latency (using 5 msec block size)
We then modified the test to include sample rate conversion between Audio Weaver and the HLOS. The measured latency was now:
ALSA Sample Rate (Hz) | ALSA Block Size (samples) | ALSA Buffer Size (samples) | Measured Latency (msec) |
8000 | 80 | 160 | 18.2 |
11025 | 120 | 240 | 21.6 |
12000 | 120 | 240 | 21.8 |
16000 | 160 | 320 | 15.6 |
22050 | 220 | 440 | 20.0 |
24000 | 240 | 480 | 15.4 |
32000 | 320 | 640 | 17.8 |
44100 | 480 | 960 | 19.8 |
48000 | 480 | 960 | 16.2 |
Maximum CPU Loading
In this test, we used the BiquadLoading module to load each thread in the system. We measured how many Biquad stages we could run before we started having CPU overruns. We used a 1 msec block size on the Hexagon DSPs and a 10 msec block size on the Arm. This test measures code and framework efficiency.
ADSP
Thread | BiquadStages | % Loading |
1A | 1600 | 95% |
1B | 1720 | 95% |
1C | 1720 | 95% |
1D | 1720 | 95% |
GPDSP0
Thread | BiquadStages | % Loading |
1A | 2000 | 90% |
1B | 2000 | 90% |
1C | 2000 | 90% |
1D | 2000 | 90% |
1E | 2000 | 90% |
1F | 2000 | 90% |
GPDSP1
Thread | BiquadStages | % Loading |
1A | 2000 | 90% |
1B | 2000 | 90% |
1C | 2000 | 90% |
1D | 2000 | 90% |
1E | 2000 | 90% |
1F | 2000 | 90% |
Arm (only loaded a single thread)
Thread | BiquadStages | % Loading |
10A | 5000 | 50% |
Early Audio KPIs
In release R4.0, we are not able to fully measure this KPI. However, we were able to measure the Audio Weaver contribution to the boot time. We instrumented the code and measured the time from when the main() function was reached on the ADSP until real-time audio interrupts started.
For the measurement, we used the file “SA8255_Early_Audio_Parallel_with_Load.awd”. This contains 1320 modules spread across all four cores. It also had a TDM input and TDM output port, and would generate audio on the A2B output. There was a signal generator on each core (sine wave or noise). The signal flow was designed so that you could distinguish the sound of each core and verify that each core was properly running just by listening. The top-level is simple and there were additional subsystems that would load up each of the cores.
All Hexagon DSPs are part of the early audio group while the Arm booted later. The combined AWB file was 335,760 bytes long and this was split into 4 separate AWBs, one per core:
SA8255_Early_Audio_Parallel_with_Load_0.awb [ADSP. 84,644 bytes]
SA8255_Early_Audio_Parallel_with_Load_1.awb [GPDSP0. 83,656 bytes]
SA8255_Early_Audio_Parallel_with_Load_2.awb [GPDSP1. 83,824 bytes]
SA8255_Early_Audio_Parallel_with_Load_3.awb [Arm. 83,680 bytes]
The DSPs log information when they boot. We observed:
00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!
00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!
00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!
00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio
The time from the ADSP booting until it is ready to generate audio is 17.5 msec.
The time from the ADSP booting until GPDSP0 and GPDSP1 are ready to generate audio is 27.5 msec.
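These figures can be reproduced from the log lines with a short Python sketch. It assumes that the ADSP "awe cfg is ready!" line marks the start of the measurement and that each "The first time to pump audio" line marks when a core is ready; the regular expression and function names are illustrative and match only the log format shown above.

```python
import re

LOG = """\
00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!
00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!
00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!
00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio
"""

LINE = re.compile(r"(\d+):(\d+):(\d+)\.(\d{6}) \[.*?\] AWE (\w+):(.*)")


def to_microseconds(hours, minutes, seconds, micros):
    """Convert the timestamp fields to integer microseconds to avoid rounding error."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000_000 + int(micros)


def first_pump_delays_msec(log_text, reference_core="ADSP"):
    """Milliseconds from the reference core's 'cfg is ready' line to each core's first pump."""
    ready, first_pump = {}, {}
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        *stamp, core, message = match.groups()
        timestamp = to_microseconds(*stamp)
        if "cfg is ready" in message:
            ready[core] = timestamp
        elif "first time to pump" in message:
            first_pump[core] = timestamp
    start = ready[reference_core]
    return {core: (t - start) / 1000.0 for core, t in first_pump.items()}


delays = first_pump_delays_msec(LOG)
```

The largest value in `delays` is the time until all early audio cores are pumping.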
A2B Board Setup
The board shown below is the “Rev D” version. There are 4 smaller jumper switches and they need to be set as shown:
1 Unless, of course, you want to write your own custom modules that execute on the Hexagon.
2 The objectID of this SourceInt module is specified in the system configuration file.
3 You will also need a Matlab license. We recommend Matlab 2022b.
4 This is a new feature which was recently implemented in Audio Weaver. If your design has custom modules, then you may need to add a custom resetState function.
5 This is how varying sample rates are handled in Audio Weaver. The buffers are oversized for the worst case transfer size and side information - in the timing information pins - is used to regulate the flow.
6 The discussion here is based on block times which are easier to follow. The actual configuration is based on block sizes.