Use optimal Block Size and Sample Rate

The Block Size and Sample Rate typically depend on the implementation and on the platform / application requirements. Choosing the optimal block size is nonetheless essential: it influences not only CPU load but also memory consumption. Block Size is also constrained by latency requirements, the target platform, and the algorithm used.
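
The latency side of this trade-off is simple arithmetic: one block spans block_size / sample_rate seconds, and the layout is processed sample_rate / block_size times per second. The following is a minimal Python sketch of that relationship only. The function names are illustrative and are not part of any Audio Weaver API, and the sketch says nothing about actual CPU load, which still has to be profiled on the target.

```python
def block_duration_ms(block_size: int, sample_rate_hz: float) -> float:
    """Duration of one block of audio in milliseconds."""
    if block_size <= 0 or sample_rate_hz <= 0:
        raise ValueError("block_size and sample_rate_hz must be positive")
    return 1000.0 * block_size / sample_rate_hz


def blocks_per_second(block_size: int, sample_rate_hz: float) -> float:
    """Number of blocks (processing calls) needed per second of audio."""
    if block_size <= 0 or sample_rate_hz <= 0:
        raise ValueError("block_size and sample_rate_hz must be positive")
    return sample_rate_hz / block_size


if __name__ == "__main__":
    # Illustrative values only: a larger block means more buffering
    # latency but fewer processing calls per second.
    for block_size in (32, 64, 128, 256):
        duration = block_duration_ms(block_size, 48000.0)
        calls = blocks_per_second(block_size, 48000.0)
        print(f"block size {block_size}: {duration:.3f} ms per block, "
              f"{calls:.1f} blocks per second")
```

Larger blocks spread any fixed per-call cost over more samples, at the price of a longer block duration and larger wire buffers.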

...

  • In some cases (e.g. SHARC with external memory) the CPU load might even increase beyond a certain block size, as more and more wires are allocated in external memory.

[Image: image-20240904-224911.png]

IMPORTANT NOTE: Please use the Profiling Tool on the intended target hardware. Profiling done in Native mode may not be reliable and may give different results from those on the target hardware.


Memory

The Audio Weaver heap is divided into three blocks: FastA, FastB, and Slow heap. Wires are allocated first, followed by the modules in the design. As a general rule, wires should be mapped to fast memory.

...

Profiling itself can contribute a non-trivial load to the overall layout (the more modules, the more overhead). Disabling the profiling API in AWECore helps with optimization.

Use Multichannel Modules

The majority of the modules in Audio Weaver have multichannel processing capability. Instead of using a separate module per channel, use a single multichannel module, which can apply the same or individual coefficients to each channel. The advantages include less function call overhead, less memory for individual wires, and lower profiling overhead. It also reduces the need for Interleaver and Deinterleaver modules.
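
The idea can be illustrated outside Audio Weaver with a minimal Python sketch. It assumes a wire carries interleaved samples (ch0, ch1, ch0, ch1, ...) and compares a deinterleave / per-channel scale / interleave chain with a single multichannel pass using per-channel gains. All function names here are hypothetical and do not correspond to Audio Weaver module APIs; the sketch shows only that both paths give the same output while the second needs no intermediate per-channel buffers.

```python
from typing import List, Sequence


def deinterleave(block: Sequence[float], num_channels: int) -> List[List[float]]:
    """Split an interleaved block into one list per channel."""
    return [list(block[ch::num_channels]) for ch in range(num_channels)]


def interleave(channels: Sequence[Sequence[float]]) -> List[float]:
    """Merge equally long per-channel lists back into an interleaved block."""
    block_size = len(channels[0])
    return [channel[i] for i in range(block_size) for channel in channels]


def scale_split(block: Sequence[float], gains: Sequence[float]) -> List[float]:
    """One mono scaler per channel: needs temporary per-channel buffers."""
    channels = deinterleave(block, len(gains))
    scaled = [[gain * s for s in channel]
              for channel, gain in zip(channels, gains)]
    return interleave(scaled)


def scale_multichannel(block: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Single pass over the interleaved block with per-channel gains."""
    num_channels = len(gains)
    return [gains[i % num_channels] * s for i, s in enumerate(block)]


if __name__ == "__main__":
    stereo_block = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # L, R, L, R, L, R
    gains = [0.5, 2.0]
    assert scale_split(stereo_block, gains) == scale_multichannel(stereo_block, gains)
    print(scale_multichannel(stereo_block, gains))
```

The split version makes three passes over the data and allocates a buffer per channel, which loosely mirrors the extra wires and function calls of a per-channel design.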

...

Single Vs Multithreaded

On a multithreaded platform, implementing a single-threaded design tends to increase the CPU usage. Use the threads available on the target instead: for example, the Hexagon ADSP has 4 threads, so the implemented design (awd) can use up to 3 threads while 1 thread is used for I/O for optimal CPU utilization. BufferUp/BufferDown as well as Decimating/Interpolating come with costs, so it usually makes sense to use them on “larger” portions of a signal flow, and not only on a few modules.

[Image: image-20240904-204733.png]

Set clockDivider on source modules (e.g. DCSource) → the clockDivider is propagated to all connected modules. New threads operate at a lower priority and must have a block size that is a multiple of the block size of the system input pin.
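
The block-size constraint above can be checked with a few lines of Python. This is a sketch under two assumptions that the text does not spell out: that the constraint means the new thread's block size must be an integer multiple of the block size at the system input pin, and that this ratio is the value a clock divider would express. The function name is hypothetical and not an Audio Weaver API.

```python
def clock_divider(base_block_size: int, thread_block_size: int) -> int:
    """Return thread_block_size / base_block_size if it is a whole number.

    base_block_size:   block size at the system input pin (assumed meaning).
    thread_block_size: block size requested for the lower-priority thread.
    """
    if base_block_size <= 0 or thread_block_size <= 0:
        raise ValueError("block sizes must be positive")
    if thread_block_size % base_block_size != 0:
        raise ValueError(
            f"{thread_block_size} is not a multiple of {base_block_size}"
        )
    return thread_block_size // base_block_size


if __name__ == "__main__":
    # A 128-sample thread fed from a 32-sample input pin is valid.
    print(clock_divider(32, 128))
    # A 48-sample thread is rejected because 48 is not a multiple of 32.
    try:
        clock_divider(32, 48)
    except ValueError as err:
        print("rejected:", err)
```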

[Image: image-20240906-224913.png]

Use Subsystems / Hierarchy

Instead of building one big design with all the modules on a single page of the awd, divide the different sections of the design into subsystems. Hierarchy and subsystems help in testing a design and identifying bugs in it. Subsystems, or reusable subsystems, can be used in other designs to avoid reinventing the wheel. Note that a few layers of subsystems can be beneficial, but a deep hierarchy can itself become a problem.

Control Logic

As a best practice, control signals are separated from the processing signal chain. In some cases the control signal can be completely decoupled using a “Control Bus”. This implementation method reduces the number of Object IDs in the design and reduces complexity. The control signal should have a decimated block size.

[Image: image-20240906-203003.png]

Different control signals can be combined into a multichannel control signal, which can be used to control Set/Get Tables. As shown in the figure below, the ParamSet modules are all controlled by DCSource modules, and the individual ParamSet modules in turn control the volume of individual channels through separate Scaler modules. In this use case, the optimized version combines all the individual Scaler modules into a single ScalerN module. This not only makes debugging easier but also decreases the CPU load further.

...


...

Using Modules Judiciously

  • A simple example of this optimization step involves the LimiterCore module. In the first figure, the absolute maximum of the input signal is taken and passed to the LimiterCore module. The MaxAbs module by itself is more expensive in this use case and can be replaced by a SubblockStatistics module with ‘statisticsType : MaxAbs’ instead. Understanding the various modules in Audio Weaver helps in choosing more efficient modules for the target.

[Image: image-20240904-180237.png]

[Image: image-20240904-180256.png]

  • Another optimization can be seen in the example below. In the first figure (Clipping_A), clipping is done on individual mono channels, with an Interleaver and Deinterleaver around them, which adds overhead. This can easily be mitigated by using multichannel clipping (Fig. Clipping_B) with the same module, which drastically decreases memory usage.

[Image: image-20240906-215306.png]

[Image: image-20240906-215327.png]

Profiling number comparison between Clipping_A and Clipping_B for a SHARC target.

...