Smart Remote 3 nRF52 v1.2
Audio subsystem

Smart Remote features a comprehensive audio subsystem that lets you customize your remote to different use cases involving audio input.

Audio functionality

Smart Remote provides the capability to perform a variety of audio processing operations on audio data.

Audio architecture

Two different memory pools play a crucial role in the audio architecture - the audio buffer pool and the audio frame pool. Both of them involve data circulation and function on a loop basis - buffers and frames are allocated, processed, and then returned to the pool to be reused.

Audio buffer characteristics:
  - Allocated from the audio buffer pool and filled with uncompressed PCM data by the PDM interface.
  - Returned to the pool after being converted into an audio frame.

Audio frame characteristics:
  - Allocated from the audio frame pool and filled with compressed data by the configured codec.
  - Returned to the pool after transmission, ready to be reused.

Figure: Audio buffer and audio frame pools
  1. The microphone (or microphones) that collects audio input is connected to the pulse density modulation (PDM) interface of the nRF52 SoC.
  2. When audio input appears, the PDM interface starts to fill audio buffers. These buffers are allocated from a memory pool called the audio buffer pool.
  3. When a buffer has been filled with audio data, an event is scheduled to the background scheduler. The background scheduler holds a pointer to the buffer as an argument to the event handler.
  4. The background scheduler runs the m_audio_process() function, which receives the audio buffer as an argument.
  5. The function converts the buffer into an audio frame according to the configured audio specification. The frame is allocated from a memory pool called the audio frame pool. The processed audio buffer is returned to the buffer pool.
  6. The frame transmission is then scheduled to the foreground scheduler.
  7. After data is sent, the frame returns to the audio frame pool and can be reused.
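
The allocate/process/return cycle in the steps above can be sketched as a fixed-size pool. This is a minimal illustration, not the actual firmware API: the function names, the pool size of 4, and the 320-sample buffer are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values -- the real sizes are build-time configuration. */
#define AUDIO_BUFFER_POOL_SIZE 4
#define AUDIO_BUFFER_SAMPLES   320

typedef struct {
    int16_t pcm[AUDIO_BUFFER_SAMPLES]; /* uncompressed PCM samples */
    bool    in_use;
} audio_buffer_t;

static audio_buffer_t m_buffer_pool[AUDIO_BUFFER_POOL_SIZE];

/* Allocate a free buffer; returns NULL when the pool is depleted,
 * which is the point at which audio data starts to be lost. */
static audio_buffer_t *audio_buffer_alloc(void)
{
    for (size_t i = 0; i < AUDIO_BUFFER_POOL_SIZE; i++) {
        if (!m_buffer_pool[i].in_use) {
            m_buffer_pool[i].in_use = true;
            return &m_buffer_pool[i];
        }
    }
    return NULL;
}

/* Return a processed buffer to the pool so it can be reused. */
static void audio_buffer_free(audio_buffer_t *p_buffer)
{
    p_buffer->in_use = false;
}
```

The audio frame pool follows the same pattern; only the element type (a compressed frame instead of a PCM buffer) and the pool size differ.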

This frame and buffer management mechanism compensates for jitter in the timing of m_audio_process() calls (caused by varying CPU load), as well as for jitter in data transmission.

There are certain situations in which the audio system might lose data. Each of these situations is clearly signaled as a warning on the console and included in audio statistics.

The audio module gathers information about lost frames and buffers. These statistics can be viewed using Audio gauges or Audio CLI commands.

Buffer and frame sizes

One execution of the m_audio_process() function corresponds to one buffer being converted into a frame.

Buffers hold data in PCM format, which means that each of them takes a considerable amount of memory. However, the scheduling applied in the audio subsystem is optimized to use few buffers, which is possible because jitter during audio processing is relatively small.

Frames require less memory because they hold compressed data. However, more of them are needed because jitter and latency are higher during transmission. Even though buffers and the corresponding frames differ in memory footprint, they carry the same number of samples.

The frame size fully depends on the used codec and its configuration. When you choose and configure a particular codec, the corresponding frame size and duration defines are set automatically.

The frame size in bytes and its duration in ms determine its maximum bit rate. A frame can be smaller (contain fewer bytes), and the bit rate is then correspondingly lower.
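
As an illustration, the maximum bit rate follows directly from those two values; the helper below is hypothetical, not part of the SDK. A 102-byte frame spanning 19.84 ms, for example, corresponds to about 41 kbit/s.

```c
/* Maximum bit rate in kbit/s for a frame of a given size and duration:
 * bits per frame divided by frame duration in ms, rounded to nearest. */
static unsigned int max_bitrate_kbit(unsigned int frame_bytes, float frame_ms)
{
    return (unsigned int)((frame_bytes * 8) / frame_ms + 0.5f);
}
```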

Configured sampling and actual sampling

The configured frame size in ms does not precisely reflect its actual value. The audio codecs available for Smart Remote support the standard sampling frequencies: 8 kHz, 16 kHz, 24 kHz, or 32 kHz. However, because of its characteristics, the nRF52 SoC cannot achieve exactly such sampling frequencies, which results in a discrepancy. For example, with 16000 Hz configured as the sampling frequency, the actual frequency is 16125 Hz, which means that 125 more samples are produced per second. The bit rate is also affected by this mechanism.

The used codec still treats the configured sampling value as the actual one. That is why, when configuring the codec, you must use the idealized sampling values. On the other hand, when analyzing audio using command-line tools, the audio subsystem shows the actual sampling and bit rate values.
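
The discrepancy can be reproduced from the PDM clocking. Assuming the nRF52 PDM peripheral's nominal 1.032 MHz clock and a decimation ratio of 64 in the PDM-to-PCM filter (both values are assumptions about the concrete configuration), integer division gives the actual output rate:

```c
/* Actual sampling rate produced by the PDM peripheral: the PDM clock
 * frequency divided by the filter decimation ratio.
 * 1032000 Hz / 64 = 16125 Hz, i.e. 125 samples/s more than the
 * configured 16000 Hz. */
static unsigned int pdm_actual_sample_rate(unsigned int pdm_clk_hz,
                                           unsigned int decimation)
{
    return pdm_clk_hz / decimation;
}
```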

Audio codecs configuration

The audio subsystem features four codecs that can be used for compressing audio data: ADPCM, BV32FP, Opus CELT/SILK, and SBC/mSBC. The following table presents their available configurations in terms of bit rate and sampling rate:

Codec       Sampling frequency   Bit rate
ADPCM       8 kHz                32 kbit/s
            16 kHz               64 kbit/s
            24 kHz               96 kbit/s
            32 kHz               128 kbit/s
BV32FP      16 kHz               32 kbit/s
Opus CELT   8 kHz                Configurable: VBR or CVBR/CBR, 16-128 kbit/s
            16 kHz               Configurable: VBR or CVBR/CBR, 16-128 kbit/s
            24 kHz               Configurable: VBR or CVBR/CBR, 16-128 kbit/s
Opus SILK   8 kHz                Configurable: VBR or CVBR/CBR, 16-128 kbit/s
            16 kHz               Configurable: VBR or CVBR/CBR, 16-128 kbit/s
SBC         16 kHz               Depends on codec configuration
            32 kHz               Depends on codec configuration
mSBC        16 kHz               62.5 kbit/s
            32 kHz               125 kbit/s

Audio transports

Two different BLE Services can be used for transmitting compressed audio frames: Voice over HID over GATT (VoHoG) and Voice over BLE for Android TV (ATVV). At least one of them must be enabled to use the audio feature (see BLE services for details).

The Packet scheduling mechanism applies to both services.

Voice over HID over GATT

Voice over HID over GATT (VoHoG) uses the standard HID over GATT (HoG) service to transmit compressed audio frames as HID Vendor Reports. Frames are fragmented into one or more chunks which are transmitted one at a time as HID reports. The host receives the chunk(s) and decompresses the audio frame for playback using the chosen codec (see Nordic Voice System for host-side details).
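
The fragmentation step might be sketched as follows. The 20-byte report payload is a placeholder value, and the function name is hypothetical; the real chunk size is dictated by the HID report descriptor.

```c
#include <stddef.h>

/* Placeholder payload size of one HID vendor report. */
#define VOHOG_REPORT_PAYLOAD_SIZE 20

/* Number of HID reports (chunks) needed to carry one compressed frame. */
static size_t vohog_chunk_count(size_t frame_bytes)
{
    return (frame_bytes + VOHOG_REPORT_PAYLOAD_SIZE - 1)
           / VOHOG_REPORT_PAYLOAD_SIZE;
}
```

For instance, a 102-byte frame is split into six reports: five full 20-byte chunks and one 2-byte remainder.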

See HID state subsystem for details regarding the HID subsystem, and HID report descriptor for HID descriptor and packet format details.

Android TV Voice Service

Android TV Voice (ATVV) Service can be used for audio transmission to the Android TV host.

See Voice over BLE for Android implementation for implementation details. If further information is required, please contact Google.

Note
ATVV Service has restrictions regarding audio codec choice and microphone sampling rate. These restrictions also apply to VoHoG when both Services are enabled.
Warning
Consider this transport as an experimental feature, which can change at any time. Currently, v0.4 is supported.

Audio gauges

The audio subsystem is highly configurable, which offers great flexibility. To ease its tuning and monitoring, a dedicated infrastructure called Audio gauges was created. When Audio gauges are enabled (CONFIG_AUDIO_GAUGES_ENABLED is set to 1), detailed statistics are collected during audio transmission. After an audio transfer stops, they are logged and can be observed on the console:

<info> m_audio: Enabled
<info> drv_audio_codec: OPUS/CELT Codec initialized (mode: VBR, complexity: 0, frame: 20 ms).
<info> m_audio: Disabled
<info> m_audio_gauges: Buffers processed: 222, lost: 0, discarded: 0 (loss ratio: 0%, discard ratio: 0%)
<info> m_audio_gauges: Frames processed: 219, lost: 0, discarded: 1 (loss ratio: 0%, discard ratio: 0%)
<info> m_audio_gauges: Bitrate (min/avg/max): 14/16/21 kbit/s
<info> m_audio_gauges: CPU usage (min/avg/max): 54%/57%/64%
<info> m_audio_gauges: - ANR CPU usage (min/avg/max): 23%/26%/31%
<info> m_audio_gauges: - Codec CPU usage (min/avg/max): 26%/29%/32%

Audio gauges show audio frame and buffer statistics, bit rate information, and the audio subsystem load (also referred to as "CPU usage" or "CPU load") of each audio processing stage, as well as of the audio subsystem as a whole. The CPU usage is calculated as the ratio of the time the CPU needs to process an audio frame to that frame's duration. For example, if a frame holds 20 ms of audio data but processing it takes 30 ms, the CPU load is indicated as 150%. If a CPU load over 100% persists for a longer period of time, the buffer pool is depleted and the latency can no longer be compensated; audio data is lost in that case. Short periods of CPU load over 100% are tolerated and can be compensated, as long as free buffers remain in the buffer pool.
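
The load calculation can be written out as follows; the function name and the microsecond units are assumptions for illustration:

```c
/* CPU load in percent: processing time relative to the frame duration.
 * Values above 100% mean processing falls behind real time. */
static unsigned int audio_cpu_load_pct(unsigned int processing_time_us,
                                       unsigned int frame_duration_us)
{
    return (100 * processing_time_us) / frame_duration_us;
}
```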

Audio CLI commands

Statistics gathered by Audio gauges, in greater detail and augmented with memory usage information, can also be viewed in real time using dedicated inspection commands in the command-line interface:

SR3-RTT> audio info
Configuration:
Sampling Frequency: 16125 Hz
Frame Length: 19.84 ms (320 samples, up to 102 bytes/frame = 41 kbit/s)
Status: Enabled
Capture time: 3:07
Frame loss ratio: 0% (0 out of 9452 frames)
Bit rate: 18 kbit/s (min/avg/max: 15/18/26 kbit/s)
CPU Usage: 58% (min/avg/max: 52%/58%/66%)
- ANR: 27% (min/avg/max: 23%/27%/33%)
- Codec: 29% (min/avg/max: 26%/29%/39%)
Buffer Pool Usage: 50% (2 out of 4 buffers)
- Maximum: 75% (3 out of 4 buffers)
Frame Pool Usage: 0% (0 out of 6 frames)
- Maximum: 16% (1 out of 6 frames)

The commands can also be used to dynamically alter the configuration of the audio subsystem. The following commands are supported:

