Smart Remote features a comprehensive audio subsystem that can be customized for different use cases involving audio input, and provides a variety of audio processing operations on audio data.
Two different memory pools play a crucial role in the audio architecture: the audio buffer pool and the audio frame pool. Both operate in a loop - buffers and frames are allocated, processed, and then returned to their pool to be reused.
Buffers are converted into frames by the m_audio_process() function, which receives the audio buffer as an argument. One execution of the m_audio_process() function corresponds to one buffer being converted into one frame. This frame and buffer management mechanism compensates both for jitter in the timing of m_audio_process() calls (related to greater CPU usage) and for jitter in data transmission.

There are certain situations in which the audio subsystem might lose data. Each of these situations is clearly signaled as a warning on the console and included in the audio statistics. For example, when the m_audio_process function cannot allocate a frame, it cannot process the buffer. In such a case, the buffer is freed without processing and its data is lost. The audio module gathers information about lost frames and buffers. These statistics can be viewed using Audio gauges or Audio CLI commands.
Buffers hold data in PCM format, which means that each of them takes a considerable amount of memory. However, the scheduling applied in the audio subsystem is optimized to require fewer buffers, which is possible because jitter during audio processing is relatively small.

Frames require less memory because they hold compressed data. However, more of them are required because jitter and latency are higher during transmission. Even though buffers and their corresponding frames differ in terms of used memory, they carry the same number of samples.
The frame size fully depends on the used codec and its configuration. When you choose and configure a particular codec, the following defines are set automatically:

- CONFIG_FRAME_SIZE_SAMPLES - Determines the number of audio samples a frame can hold.
- CONFIG_FRAME_SIZE_MS - Determines the duration of audio samples a frame can hold.
- CONFIG_FRAME_SIZE_BYTES - Determines the maximum size of a frame, in bytes.

The frame size in bytes and its duration in ms determine its maximum bit rate. A frame can be smaller (containing fewer bytes), and thus the bit rate might be reduced.
The configured frame size in ms does not precisely reflect its actual value. The audio codecs available for Smart Remote support the standard sampling frequencies: 8 kHz, 16 kHz, 24 kHz, or 32 kHz. However, because of its hardware characteristics, the nRF52 SoC cannot achieve these sampling frequencies exactly, which results in a discrepancy. For example, with 16000 Hz configured as the sampling frequency, the actual frequency is 16125 Hz, which means that 125 more samples are produced per second. The bit rate is also affected by this mechanism.

The used codec still treats the configured sampling value as the actual one. That is why, when configuring the codec, you must use the idealized sampling values. On the other hand, when analyzing audio using the command-line tools, the audio subsystem shows the actual sampling and bit rate values.
The audio subsystem features four codecs that can be used for compressing audio data: ADPCM, BV32FP, Opus CELT/SILK, and SBC/mSBC. The following table presents their available configurations in terms of bit rate and sampling rate:
| Codec | Sampling frequency | Bit rate |
|---|---|---|
| ADPCM | 8 kHz | 32 kbit/s |
| ADPCM | 16 kHz | 64 kbit/s |
| ADPCM | 24 kHz | 96 kbit/s |
| ADPCM | 32 kHz | 128 kbit/s |
| BV32FP | 16 kHz | 32 kbit/s |
| Opus CELT | 8 kHz | Configurable: VBR or CVBR/CBR, 16-128 kbit/s |
| Opus CELT | 16 kHz | Configurable: VBR or CVBR/CBR, 16-128 kbit/s |
| Opus CELT | 24 kHz | Configurable: VBR or CVBR/CBR, 16-128 kbit/s |
| Opus SILK | 8 kHz | Configurable: VBR or CVBR/CBR, 16-128 kbit/s |
| Opus SILK | 16 kHz | Configurable: VBR or CVBR/CBR, 16-128 kbit/s |
| SBC | 16 kHz | Depends on codec configuration |
| SBC | 32 kHz | Depends on codec configuration |
| mSBC | 16 kHz | 62.5 kbit/s |
| mSBC | 32 kHz | 125 kbit/s |
Two different BLE services can be used for transmission of the compressed audio frames: Voice over HID over GATT (VoHoG) and Voice over BLE for Android TV (ATVV). At least one of them must be enabled in order to use the audio feature (see BLE services for details).
The Packet scheduling mechanism applies to both services.
Voice over HID over GATT (VoHoG) uses the standard HID over GATT (HoG) service to transmit compressed audio frames as HID Vendor Reports. Frames are fragmented into one or more chunks which are transmitted one at a time as HID reports. The host receives the chunk(s) and decompresses the audio frame for playback using the chosen codec (see Nordic Voice System for host-side details).
See HID state subsystem for details regarding the HID subsystem, and HID report descriptor for HID descriptor and packet format details.
Android TV Voice (ATVV) Service can be used for audio transmission to the Android TV host.
See Voice over BLE for Android implementation for implementation details. If further information is required, please contact Google.
A highly configurable audio subsystem offers great flexibility. To ease its tuning and monitoring, a special infrastructure called Audio Gauges was created. When Audio Gauges are enabled (CONFIG_AUDIO_GAUGES_ENABLED is set to 1), detailed statistics are collected during audio transmission. After an audio transfer stops, they are logged and can be observed on the console.

Audio gauges show audio frame and buffer statistics, bit rate information, and the audio subsystem load (also referred to as "CPU usage" or "CPU load") of each audio processing stage, as well as of the audio subsystem as a whole. The CPU usage is calculated as the ratio of the time the CPU needs to process an audio frame to that frame's duration. For example, if a frame holds 20 ms of audio data but processing it takes 30 ms, the CPU load is indicated as 150%. If a CPU load over 100% persists for a longer period of time, the buffer pool becomes depleted and latency can no longer be compensated; audio data is lost in that case. Short periods of CPU load over 100% are tolerated and can be compensated, as long as there are free buffers in the buffer pool.
Statistics gathered by Audio Gauges, in greater detail and augmented with memory usage information, can also be viewed in real time using dedicated inspection commands through the command-line interface. These commands can also be used to dynamically alter the configuration of the audio subsystem. The following commands are supported:
- audio driver info
- audio driver set gain <gain>
- audio driver set gain <L-gain> <R-gain>
- audio info
- audio codec info
- audio codec set complexity <0-10>
- audio codec set bitrate auto
- audio codec set bitrate <bitrate> [vbr|cbr]