
This section contains some performance benchmarks for this release. During the measurements, processor clock and DDR speeds were set to maximum.

Interprocessor Communication

The figures below are the measured processor load, in MHz, needed to transfer 16 channels of 32-bit data at 48 kHz between cores. Transfers were done in 1 msec blocks. The test uses the ChangeThread module; part of the reported MHz is spent on the sending core and part on the receiving core.

| | To ADSP | To GPDSP0 | To GPDSP1 | To Arm |
|---|---|---|---|---|
| From ADSP | 2.5 MHz | 24.5 MHz | 23.5 MHz | 7.8 MHz |
| From GPDSP0 | 26.8 MHz | 1.7 MHz | 23.6 MHz | 7.8 MHz |
| From GPDSP1 | 26.6 MHz | 24.4 MHz | 1.4 MHz | 7.8 MHz |
| From Arm | 25.5 MHz | 24.3 MHz | 23.5 MHz | 2.5 MHz |

Data sent between different cores passes through the carveout shared memory. When the sender and receiver are on the same core, the data goes to another hardware thread on that core and non-shared memory is used.
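For a sense of scale, the payload moved in this test can be worked out from the parameters stated above (16 channels, 32-bit data, 48 kHz, 1 msec blocks). The following is a minimal arithmetic sketch; the variable names are illustrative and not part of Audio Weaver.

```python
# Payload arithmetic for the interprocessor transfer test.
# All input values are taken from the test description above.
CHANNELS = 16
BYTES_PER_SAMPLE = 4       # 32-bit data
SAMPLE_RATE_HZ = 48_000
BLOCK_MSEC = 1

# Samples per channel in one block.
samples_per_block = SAMPLE_RATE_HZ * BLOCK_MSEC // 1000

# Bytes moved per block and per second across all channels.
bytes_per_block = samples_per_block * CHANNELS * BYTES_PER_SAMPLE
bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE

print(f"{samples_per_block} samples/channel/block")
print(f"{bytes_per_block} bytes/block")
print(f"{bytes_per_second} bytes/second")
```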

TDM I/O

The limiting factor for serial port I/O is the speed of the LP DMA memory: 85 MB/second for reading and 170 MB/second for writing. These tests used a 48-sample block size.

TDM Source

16 channels @ 48 kHz. 16-bit samples: 28.3 MHz (theoretical limit of 24 MHz)

16 channels @ 48 kHz. 32-bit samples: 49 MHz

TDM Sink

16 channels @ 48 kHz. 16-bit samples: 14.8 MHz (theoretical limit of 12 MHz)

16 channels @ 48 kHz. 32-bit samples: 25.6 MHz
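The stream data rates behind these figures can be compared with the LP DMA memory speeds quoted above. The sketch below is only bandwidth arithmetic: this section does not state how the theoretical limits were derived, and the code assumes that 1 MB means 10^6 bytes, that the TDM Source is bound by the read speed, and that the TDM Sink is bound by the write speed. The one thing it does show is that the 2:1 ratio between the write and read speeds matches the 2:1 ratio between the stated theoretical limits (24 MHz vs. 12 MHz).

```python
# Stream bandwidth versus LP DMA memory speed (illustrative arithmetic only).
CHANNELS = 16
SAMPLE_RATE_HZ = 48_000
LP_DMA_READ_BPS = 85e6     # 85 MB/second, assuming 1 MB = 10**6 bytes
LP_DMA_WRITE_BPS = 170e6   # 170 MB/second, same assumption


def stream_bytes_per_second(bits_per_sample):
    """Raw audio data rate for the 16-channel, 48 kHz stream."""
    return CHANNELS * SAMPLE_RATE_HZ * bits_per_sample // 8


def dma_fractions(bits_per_sample):
    """Fraction of the LP DMA read and write bandwidth the stream would occupy."""
    rate = stream_bytes_per_second(bits_per_sample)
    return rate / LP_DMA_READ_BPS, rate / LP_DMA_WRITE_BPS


for bits in (16, 32):
    read_frac, write_frac = dma_fractions(bits)
    print(f"{bits}-bit: {stream_bytes_per_second(bits)} bytes/s, "
          f"read {read_frac:.2%}, write {write_frac:.2%}")
```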

Input to Output Latency

This was measured by connecting the TDMSource directly to the TDMSink as shown below.

We measured the delay with an oscilloscope and the path included:

A/D → A2B → TDMSource → Copy → TDMSink → A2B → D/A

The latency varied based on the block size as shown below:

| Block Size (samples) | Total Latency (msec) | Analog Latency (msec) | Digital Latency (msec) |
|---|---|---|---|
| 12 | 1.6 | 1.1 | 0.5 |
| 24 | 2.1 | 1.1 | 1.0 |
| 48 | 3.1 | 1.1 | 2.0 |
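The digital latency figures above equal two block periods at 48 kHz. That is an observation from the measured numbers, not a model stated in this section, so the sketch below labels the factor of two as an assumption and checks it against the table.

```python
# Compare measured latency with "analog + 2 block periods" (assumed model).
SAMPLE_RATE_HZ = 48_000
ANALOG_MSEC = 1.1          # analog latency column from the table
BLOCKS_OF_DELAY = 2        # assumption inferred from the measurements


def block_msec(block_size):
    """Duration of one block in milliseconds."""
    return 1000.0 * block_size / SAMPLE_RATE_HZ


def estimated_latency(block_size):
    """Return (digital_msec, total_msec) under the assumed model."""
    digital = BLOCKS_OF_DELAY * block_msec(block_size)
    return digital, ANALOG_MSEC + digital


# block size -> (measured digital msec, measured total msec), from the table
measured = {12: (0.5, 1.6), 24: (1.0, 2.1), 48: (2.0, 3.1)}

for size, (meas_digital, meas_total) in measured.items():
    est_digital, est_total = estimated_latency(size)
    print(f"block {size}: estimated {est_digital:.1f}/{est_total:.1f} msec, "
          f"measured {meas_digital}/{meas_total} msec")
```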

Measurements with Synchronous Unaligned Ports are to be provided in R4.1.

ALSA I/O

This test measures the overhead of streaming audio data between the HLOS and Audio Weaver. 16 channels of data were streamed at 48 kHz, with a 1 msec block size in Audio Weaver and a 10 msec block size at the HLOS.

ALSA Source SRC (R4.1)

HLOS sends data at 44.1 kHz and Audio Weaver converts to 48 kHz.

| Core | MHz / 16 chan | MHz / channel |
|---|---|---|
| ADSP | 52.3 | 3.3 |
| GPDSP0 | 51.2 | 3.2 |
| GPDSP1 | 51.9 | 3.3 |
| Arm | 18.7 | 1.2 |

ALSA Sink SRC (R4.1)

Audio Weaver converts 48 kHz to 44.1 kHz for the HLOS.

| Core | MHz / 16 chan | MHz / channel |
|---|---|---|
| ADSP | 62.4 | 3.9 |
| GPDSP0 | 60.8 | 3.8 |
| GPDSP1 | 60.1 | 3.8 |
| Arm | 7.2 | 0.45 |

TDM + ALSA Latency

In this test, we measure the latency from analog in to analog out including a round trip through the HLOS using ALSA modules. The path is:

A/D → A2B → TDMSource → ALSA Sink → HLOS → ALSA Source → TDMSink → A2B → D/A

An application running on the HLOS read from the ALSA Source and sent the data to the ALSA Sink. A 48 kHz sample rate was used throughout. Audio Weaver processed with a 48-sample block size and the HLOS application used a 240-sample block size. The ALSA settings used in the test were:

ALSA Sink

  • Buffer size: 960 samples

  • Block size: 240 samples

  • startThreshold: 0 samples

  • stopThreshold: 0 samples

ALSA Source

  • Buffer size: 960 samples

  • Block size: 240 samples

  • startThreshold: 480 samples (prefill to avoid underruns)

  • stopThreshold: 0 samples

The measured latency was 13.2 msec. The approximate breakdown is:

  • 1 msec analog latency

  • 2 msec TDM digital latency (using 1 msec block size)

  • 10 msec ALSA latency (using 5 msec block size)
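The sample counts in the ALSA settings can be converted to time at 48 kHz to relate them to this breakdown. The sketch below is plain arithmetic on the values stated above. It shows that 10 msec corresponds both to two 240-sample blocks and to the 480-sample startThreshold on the ALSA Source; which of these accounts for the ALSA latency is not stated in this section.

```python
# Convert the ALSA settings to milliseconds and sum the stated breakdown.
SAMPLE_RATE_HZ = 48_000


def samples_to_msec(samples):
    """Duration of a sample count at 48 kHz, in milliseconds."""
    return 1000.0 * samples / SAMPLE_RATE_HZ


aw_block_msec = samples_to_msec(48)       # Audio Weaver block size
alsa_block_msec = samples_to_msec(240)    # ALSA block size
alsa_buffer_msec = samples_to_msec(960)   # ALSA buffer size
prefill_msec = samples_to_msec(480)       # ALSA Source startThreshold

# Breakdown as stated in the text (msec).
breakdown = {"analog": 1.0, "tdm_digital": 2.0, "alsa": 10.0}
breakdown_total = sum(breakdown.values())
MEASURED_MSEC = 13.2

print(f"AW block {aw_block_msec} msec, ALSA block {alsa_block_msec} msec")
print(f"ALSA buffer {alsa_buffer_msec} msec, prefill {prefill_msec} msec")
print(f"breakdown total {breakdown_total} msec vs measured {MEASURED_MSEC} msec")
```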

We then modified the test to include sample rate conversion between Audio Weaver and the HLOS. The measured latencies were:

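| ALSA Sample Rate (Hz) | ALSA Block Size (samples) | ALSA Buffer Size (samples) | Measured Latency (msec) |
|---|---|---|---|
| 8000 | 80 | 160 | 18.2 |
| 11025 | 120 | 240 | 21.6 |
| 12000 | 120 | 240 | 21.8 |
| 16000 | 160 | 320 | 15.6 |
| 22050 | 220 | 440 | 20.0 |
| 24000 | 240 | 480 | 15.4 |
| 32000 | 320 | 640 | 17.8 |
| 44100 | 480 | 960 | 19.8 |
| 48000 | 480 | 960 | 16.2 |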
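To compare the rows above on a common footing, the ALSA block and buffer sizes can be converted to milliseconds at each sample rate. This sketch assumes the block and buffer sizes are in samples, as in the ALSA settings listed earlier. It shows that every configuration uses a block of roughly 10 msec with a buffer of two blocks.

```python
# Convert each ALSA configuration from the table to block/buffer durations.
# Columns: sample rate (Hz), block size, buffer size, measured latency (msec).
# Block and buffer sizes are assumed to be in samples.
ROWS = [
    (8000, 80, 160, 18.2),
    (11025, 120, 240, 21.6),
    (12000, 120, 240, 21.8),
    (16000, 160, 320, 15.6),
    (22050, 220, 440, 20.0),
    (24000, 240, 480, 15.4),
    (32000, 320, 640, 17.8),
    (44100, 480, 960, 19.8),
    (48000, 480, 960, 16.2),
]


def to_msec(samples, rate_hz):
    """Duration of a sample count at the given rate, in milliseconds."""
    return 1000.0 * samples / rate_hz


durations = [
    (rate, to_msec(block, rate), to_msec(buffer, rate), latency)
    for rate, block, buffer, latency in ROWS
]

for rate, block_ms, buffer_ms, latency in durations:
    print(f"{rate:>6} Hz: block {block_ms:6.2f} msec, "
          f"buffer {buffer_ms:6.2f} msec, measured {latency} msec")
```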

Maximum CPU Loading

In this test, we used the BiquadLoading module to load each thread in the system. We measured how many Biquad stages we could run before we started having CPU overruns. We used a 1 msec block size on the Hexagon DSPs and a 10 msec block size on the Arm. This test measures code and framework efficiency.

ADSP

| Thread | BiquadStages | % Loading |
|---|---|---|
| 1A | 1600 | 95% |
| 1B | 1720 | 95% |
| 1C | 1720 | 95% |
| 1D | 1720 | 95% |

GPDSP0

| Thread | BiquadStages | % Loading |
|---|---|---|
| 1A | 2000 | 90% |
| 1B | 2000 | 90% |
| 1C | 2000 | 90% |
| 1D | 2000 | 90% |
| 1E | 2000 | 90% |
| 1F | 2000 | 90% |

GPDSP1

| Thread | BiquadStages | % Loading |
|---|---|---|
| 1A | 2000 | 90% |
| 1B | 2000 | 90% |
| 1C | 2000 | 90% |
| 1D | 2000 | 90% |
| 1E | 2000 | 90% |
| 1F | 2000 | 90% |

Arm (only loaded a single thread)

| Thread | BiquadStages | % Loading |
|---|---|---|
| 10A | 5000 | 50% |

Early Audio KPIs

In release R4.0, we are not able to fully measure this KPI. However, we were able to measure the Audio Weaver contribution to the boot time. We instrumented the code and measured the time from when the main() function was reached on the ADSP until real-time audio interrupts started.

For the measurement, we used the file “SA8255_Early_Audio_Parallel_with_Load.awd”. It contains 1320 modules spread across all four cores, has a TDM input port and a TDM output port, and generates audio on the A2B output. Each core has a signal generator (sine wave or noise). The signal flow was designed so that you can distinguish the sound of each core and verify, just by listening, that each core is running properly. The top level is simple, with additional subsystems that load up each of the cores.

All Hexagon DSPs are part of the early audio group, while the Arm boots later. The combined AWB file was 335,760 bytes long and was split into 4 separate AWBs, one per core:

SA8255_Early_Audio_Parallel_with_Load_0.awb [ADSP. 84,644 bytes]

SA8255_Early_Audio_Parallel_with_Load_1.awb [GPDSP0. 83,656 bytes]

SA8255_Early_Audio_Parallel_with_Load_2.awb [GPDSP1. 83,824 bytes]

SA8255_Early_Audio_Parallel_with_Load_3.awb [Arm. 83,680 bytes]

The DSPs log information when they boot. We observed:

00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!

00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio

00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!

00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio

00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!

00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio

The time from the ADSP booting until it is ready to generate audio is 17.5 msec.

The time from the ADSP booting until GPDSP0 and GPDSP1 are ready to generate audio is 27.5 msec.
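The two figures above can be reproduced from the log timestamps. The sketch below parses the six log lines quoted in this section and reports, for each DSP, the time from the ADSP "awe cfg is ready!" message to that core's "The first time to pump audio" message. The regular expression and the `ready` / `first_audio` labels are illustrative choices made for this example and are not part of the Audio Weaver tooling.

```python
import re

# Log lines copied from the boot log quoted above.
LOG = """\
00:00:43.617500 [awe_bsp.c 1140] AWE ADSP:awe cfg is ready!
00:00:43.635000 [awe_bsp.c 1417] AWE ADSP:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP0:awe cfg is ready!
00:00:43.638750 [awe_bsp.c 1417] AWE GPDSP0:The first time to pump audio
00:00:43.617500 [awe_bsp.c 1140] AWE GPDSP1:awe cfg is ready!
00:00:43.645000 [awe_bsp.c 1417] AWE GPDSP1:The first time to pump audio
"""

LINE_RE = re.compile(r"(\d+):(\d+):(\d+)\.(\d+) \[[^\]]*\] AWE (\w+):(.*)")


def parse_events(log):
    """Map core name -> {'ready': t_us, 'first_audio': t_us} in microseconds."""
    events = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        hours, minutes, seconds, frac, core, message = match.groups()
        micros = int(frac.ljust(6, "0")[:6])
        t_us = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000_000 + micros
        kind = "ready" if "cfg is ready" in message else "first_audio"
        events.setdefault(core, {})[kind] = t_us
    return events


events = parse_events(LOG)
adsp_ready_us = events["ADSP"]["ready"]

# Milliseconds from ADSP ready to each core's first audio pump.
delays_msec = {
    core: (times["first_audio"] - adsp_ready_us) / 1000.0
    for core, times in events.items()
}

for core, delay in delays_msec.items():
    print(f"{core}: {delay} msec")
```

The 27.5 msec figure is the later of the two GPDSP values, which in this log is GPDSP1.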

