Notice: Pre-Release Documentation
This document is part of a prerelease and is currently a work in progress. Some content may be incomplete, subject to change, or marked as TBD. We are actively updating this documentation and will continue to provide the most accurate and up-to-date information as development progresses. Thank you for your understanding!
This section contains some performance benchmarks for this release. During the measurements, processor clock and DDR speeds were set to maximum.
Interprocessor Communication
This is the measured load, in MHz, to transfer 16 channels of 32-bit data at 48 kHz between cores. Transfers were done in 1 msec blocks. The test uses the ChangeThread module; part of the reported MHz is consumed on the sending core and part on the receiving core.
Core | To ADSP | To GPDSP0 | To GPDSP1 | To Arm |
From ADSP | 2.5 MHz | 24.5 MHz | 23.5 MHz | 7.8 MHz |
From GPDSP0 | 26.8 MHz | 1.7 MHz | 23.6 MHz | 7.8 MHz |
From GPDSP1 | 26.6 MHz | 24.4 MHz | 1.4 MHz | 7.8 MHz |
From Arm | 25.5 MHz | 24.3 MHz | 23.5 MHz | 2.5 MHz |
When data is sent between different cores, it goes through the carveout shared memory. When data is sent within the same core, it goes to another hardware thread on that core and non-shared memory is used.
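To put the table in context, the amount of data moved in this test follows from the stated parameters. The sketch below is illustrative arithmetic only; the function and variable names are not part of Audio Weaver.

```python
# Illustrative arithmetic only; these names are not Audio Weaver APIs.
CHANNELS = 16
BYTES_PER_SAMPLE = 4        # 32-bit data
SAMPLE_RATE_HZ = 48_000
BLOCK_MS = 1                # 1 msec blocks


def block_payload_bytes(channels, bytes_per_sample, sample_rate_hz, block_ms):
    """Bytes carried by one block of audio across all channels."""
    samples_per_block = sample_rate_hz * block_ms // 1000
    return channels * bytes_per_sample * samples_per_block


payload = block_payload_bytes(CHANNELS, BYTES_PER_SAMPLE, SAMPLE_RATE_HZ, BLOCK_MS)
blocks_per_second = 1000 // BLOCK_MS
rate_bytes_per_second = payload * blocks_per_second

print(f"{payload} bytes per block")
print(f"{rate_bytes_per_second / 1e6:.3f} MB/second")
```

Each 1 msec block therefore carries 3072 bytes, for a sustained rate of about 3.07 MB/second in every cell of the table.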
TDM I/O
The limiting factor for serial port I/O is the speed of the LP DMA memory, which is 85 MB/second for reading and 170 MB/second for writing. These tests used a 48-sample block size.
TDM Source
16 channels @ 48 kHz. 16-bit samples: 28.3 MHz (theoretical limit of 24 MHz)
16 channels @ 48 kHz. 32-bit samples: 49 MHz
TDM Sink
16 channels @ 48 kHz. 16-bit samples: 14.8 MHz (theoretical limit of 12 MHz)
16 channels @ 48 kHz. 32-bit samples: 25.6 MHz
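As a rough cross-check of these figures, the sketch below estimates what fraction of each second the LP DMA memory would be busy with one TDM stream. It rests on two assumptions that this document does not state: that 1 MB means 10^6 bytes, and that TDMSource traffic is bounded by the read speed while TDMSink traffic is bounded by the write speed. All names are illustrative.

```python
# Assumptions (not stated in the document): 1 MB = 1e6 bytes, TDMSource is
# limited by the read speed and TDMSink by the write speed.
READ_MB_PER_S = 85.0
WRITE_MB_PER_S = 170.0


def stream_mb_per_s(channels, sample_rate_hz, bytes_per_sample):
    """Sustained data rate of a TDM stream in MB/second."""
    return channels * sample_rate_hz * bytes_per_sample / 1e6


def busy_fraction(stream_rate, memory_rate):
    """Fraction of each second spent moving the stream through memory."""
    return stream_rate / memory_rate


for bits in (16, 32):
    rate = stream_mb_per_s(16, 48_000, bits // 8)
    source = busy_fraction(rate, READ_MB_PER_S)
    sink = busy_fraction(rate, WRITE_MB_PER_S)
    print(f"{bits}-bit: {rate:.3f} MB/s, source {source:.2%}, sink {sink:.2%}")
```

Under this model the source costs twice as much as the sink for the same stream, which is consistent with the 2:1 ratio between the quoted theoretical limits (24 MHz and 12 MHz). Converting a busy fraction into MHz would require the core clock, which is not given here.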
Input to Output Latency
This was measured by connecting the TDMSource directly to the TDMSink as shown below.
We measured the delay with an oscilloscope and the path included:
A/D → A2B → TDMSource → Copy → TDMSink → A2B → D/A
The latency varied based on the block size as shown below:
Block Size (samples) | Total Latency (msec) | Analog Latency (msec) | Digital Latency (msec) |
12 | 1.6 | 1.1 | 0.5 |
24 | 2.1 | 1.1 | 1.0 |
48 | 3.1 | 1.1 | 2.0 |
TBD: Measurements with Synchronous Unaligned Ports will be provided in R4.1.
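The digital latency values in the table equal two block periods, assuming the 48 kHz sample rate used in the TDM tests. The sketch below reproduces the table from that observation. The two-block model and the function names are inferred from the measurements; they are not taken from the Audio Weaver implementation.

```python
SAMPLE_RATE_HZ = 48_000      # assumed; matches the TDM tests above
ANALOG_LATENCY_MS = 1.1      # from the table


def digital_latency_ms(block_size, blocks_buffered=2):
    """Latency of `blocks_buffered` block periods, in msec."""
    return blocks_buffered * block_size * 1000 / SAMPLE_RATE_HZ


def total_latency_ms(block_size):
    """Analog plus digital latency, in msec."""
    return ANALOG_LATENCY_MS + digital_latency_ms(block_size)


for block_size in (12, 24, 48):
    digital = digital_latency_ms(block_size)
    total = round(total_latency_ms(block_size), 1)
    print(f"{block_size} samples: digital {digital} msec, total {total} msec")
```

If the model holds, halving the block size removes one block period from the total, while the 1.1 msec analog portion stays fixed.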
ALSA I/O
This test measures the overhead of streaming audio data between the HLOS and Audio Weaver. 16 channels of data were streamed at 48 kHz, with a 1 msec block size in Audio Weaver and a 10 msec block size at the HLOS.
ALSA Source SRC (R4.1)
HLOS sends data at 44.1 kHz and Audio Weaver converts to 48 kHz.
Core | MHz / 16 chan | MHz / channel |
ADSP | 52.3 | 3.3 |
GPDSP0 | 51.2 | 3.2 |
GPDSP1 | 51.9 | 3.3 |
Arm | 18.7 | 1.2 |
ALSA Sink SRC (R4.1)
Audio Weaver converts 48 kHz to 44.1 kHz for the HLOS.
Core | MHz / 16 chan | MHz / channel |
ADSP | 62.4 | 3.9 |
GPDSP0 | 60.8 | 3.8 |
GPDSP1 | 60.1 | 3.8 |
Arm | 7.2 | 0.45 |
TDM + ALSA Latency
In this test, we measure the latency from analog in to analog out including a round trip through the HLOS using ALSA modules. The path is:
A/D → A2B → TDMSource → ALSA Sink → HLOS → ALSA Source → TDMSink → A2B → D/A
An application running on the HLOS read the ALSA Source and sent it to the ALSA Sink. A 48 kHz sample rate was used throughout. Audio Weaver processed with a 48-sample block size and the HLOS application used a 240-sample block size. The ALSA settings used in the test were:
ALSA Sink
Buffer size: 960 samples
Block size: 240 samples
startThreshold: 0 samples
stopThreshold: 0 samples
ALSA Source
Buffer size: 960 samples
Block size: 240 samples
startThreshold: 480 samples (prefill to avoid underruns)
stopThreshold: 0 samples
The measured latency was 13.2 msec. The approximate breakdown is:
1 msec analog latency
2 msec TDM digital latency (using 1 msec block size)
10 msec ALSA latency (using 5 msec block size)
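The ALSA settings are given in samples while the breakdown is in msec. The sketch below converts the settings at 48 kHz and sums the breakdown so the two can be compared; the names are illustrative only.

```python
SAMPLE_RATE_HZ = 48_000


def samples_to_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of `samples` samples in msec."""
    return samples * 1000 / sample_rate_hz


# Settings copied from the lists above (values in samples).
alsa_settings = {"buffer": 960, "block": 240, "source_start_threshold": 480}
for name, samples in alsa_settings.items():
    print(f"{name}: {samples} samples = {samples_to_ms(samples):g} msec")

# Latency breakdown copied from the list above (values in msec).
breakdown_ms = {"analog": 1, "tdm_digital": 2, "alsa": 10}
budget = sum(breakdown_ms.values())
measured = 13.2
print(f"budget {budget} msec, measured {measured} msec, "
      f"unaccounted {measured - budget:.1f} msec")
```

The 240-sample block is 5 msec and the 960-sample buffer is 20 msec. The 480-sample ALSA Source start threshold is 10 msec, the same size as the ALSA term in the breakdown.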
We then modified the test to include sample rate conversion between Audio Weaver and the HLOS. The measured latency was now:
ALSA Sample Rate (Hz) | ALSA Block Size (samples) | ALSA Buffer Size (samples) | Measured Latency (msec) |
8000 | 80 | 160 | 18.2 |
11025 | 120 | 240 | 21.6 |
12000 | 120 | 240 | 21.8 |
16000 | 160 | 320 | 15.6 |
22050 | 220 | 440 | 20.0 |
24000 | 240 | 480 | 15.4 |
32000 | 320 | 640 | 17.8 |
44100 | 480 | 960 | 19.8 |
48000 | 480 | 960 | 16.2 |
Maximum CPU Loading
In this test, we used the BiquadLoading module to load each thread in the system. We measured how many Biquad stages we could run before we started having CPU overruns. We used a 1 msec block size on the Hexagon DSPs and a 10 msec block size on the Arm. This test measures code and framework efficiency.
ADSP
Thread | BiquadStages | % Loading |
1A | 1600 | 95% |
1B | 1720 | 95% |
1C | 1720 | 95% |
1D | 1720 | 95% |
GPDSP0
Thread | BiquadStages | % Loading |
1A | 2000 | 90% |
1B | 2000 | 90% |
1C | 2000 | 90% |
1D | 2000 | 90% |
1E | 2000 | 90% |
1F | 2000 | 90% |
GPDSP1
Thread | BiquadStages | % Loading |
1A | 2000 | 90% |
1B | 2000 | 90% |
1C | 2000 | 90% |
1D | 2000 | 90% |
1E | 2000 | 90% |
1F | 2000 | 90% |
Arm (only loaded a single thread)
Thread | BiquadStages | % Loading |
10A | 5000 | 50% |
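The per-thread results can be totaled to compare cores. The sketch below sums the Biquad stages from the tables above. The dictionary layout is illustrative only, and the totals are simple sums that assume all threads on a core were loaded at the same time; they are not separately measured values.

```python
# Biquad stages per thread, copied from the tables above.
stages = {
    "ADSP": {"1A": 1600, "1B": 1720, "1C": 1720, "1D": 1720},
    "GPDSP0": dict.fromkeys(["1A", "1B", "1C", "1D", "1E", "1F"], 2000),
    "GPDSP1": dict.fromkeys(["1A", "1B", "1C", "1D", "1E", "1F"], 2000),
    "Arm": {"10A": 5000},  # only a single thread was loaded
}

totals = {core: sum(threads.values()) for core, threads in stages.items()}
for core, total in totals.items():
    print(f"{core}: {total} stages across {len(stages[core])} thread(s)")
```

The Arm figure is not directly comparable with the DSP figures, because it was taken with a 10 msec block size and at 50% loading.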
Early Audio KPIs
In release R4.0, we are not able to fully measure this KPI. However, we were able to measure the Audio Weaver contribution to the boot time. We instrumented the code and measured the time from when the main() function was reached on the ADSP until real-time audio interrupts started.
For the measurement, we used the file "SA8255_Early_Audio_Parallel_with_Load.awd". It contains 1320 modules spread across all four cores, has a TDM input port and a TDM output port, and generates audio on the A2B output. There is a signal generator (sine wave or noise) on each core. The signal flow was designed so that you can distinguish the sound of each core and verify, just by listening, that each core is running properly. The top level is simple, and additional subsystems load each of the cores.
All Hexagon DSPs are part of the early audio group while the Arm booted later. The combined AWB file was 335,760 bytes long and this was split into 4 separate AWBs, one per core:
SA8255_Early_Audio_Parallel_with_Load_0.awb [ADSP. 84,644 bytes]
SA8255_Early_Audio_Parallel_with_Load_1.awb [GPDSP0. 83,656 bytes]
SA8255_Early_Audio_Parallel_with_Load_2.awb [GPDSP1. 83,824 bytes]
SA8255_Early_Audio_Parallel_with_Load_3.awb [Arm. 83,680 bytes]
The DSPs log information when they boot. We observed:
00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!
00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!
00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!
00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio
The time from the ADSP booting until it is ready to generate audio is 17.5 msec.
The time from the ADSP booting until both GPDSP0 and GPDSP1 are ready to generate audio is 27.5 msec.
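These two figures follow directly from the log timestamps. The sketch below parses the log lines shown above and computes, for each core, the delay between the ADSP "awe cfg is ready!" entry and that core's first audio pump. The regular expression and helper names are illustrative and assume the timestamp format seen in this excerpt.

```python
import re

LOG = """\
00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!
00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!
00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!
00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio
"""

# hh:mm:ss.uuuuuu [file line] AWE <core>:<message>
LINE = re.compile(r"(\d+):(\d+):(\d+)\.(\d{6}) \[[^\]]*\] AWE (\w+):(.*)")


def parse_events(log):
    """Return {core: {message: timestamp in microseconds}}."""
    events = {}
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue
        hours, minutes, seconds, micros, core, message = match.groups()
        whole_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
        stamp_us = whole_seconds * 1_000_000 + int(micros)
        events.setdefault(core, {})[message.strip()] = stamp_us
    return events


events = parse_events(LOG)
start_us = events["ADSP"]["awe cfg is ready!"]
boot_ms = {
    core: (marks["The first time to pump audio"] - start_us) / 1000
    for core, marks in events.items()
}
for core, ms in boot_ms.items():
    print(f"{core}: {ms} msec after ADSP start")
```

The 27.5 msec figure corresponds to GPDSP1, the last core to start pumping audio; by the same calculation GPDSP0 is ready after 21.25 msec.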