Final Report Proposal: Design and Implementation of Audio Buffers for Wireless Audio Transmission with Adaptive Sampling Rates

ziiln liu, 2026/2/15

Overview

Last quarter in CSE118, I designed and implemented the SyncSpeak project, in which an ESP32 microcontroller records an audio stream in real time and transmits the audio data to a PC (server) over Wi-Fi for further processing. At that time, my team could not stream the audio in real time because the Wi-Fi connection was unstable, so we decided to send the audio in 5-second chunks instead. This introduces a large latency and makes the system unsuitable for real-time applications. In this class, I want to redesign the audio buffer to support adaptive sampling rates and implement a more robust transmission protocol over a WebSocket connection, in order to reduce the latency and improve the stability of the system.

Introduction

Instead of a brute-force 5-second wait interval, I want to send a fixed amount of audio data (buffer contents in \([0, \text{chunk\_size}]\)) to the server every fixed time interval (\(T_\text{period}\)). In case of Wi-Fi instability, the system can adaptively adjust the sampling rate based on the previous buffer filling rate. For example, if the buffer is filling up too fast, we can reduce the sampling rate to prevent overflow. On the other hand, if the buffer is filling up too slowly, we can increase the sampling rate to improve the audio quality.

Baseline/Ideal Implementation

In the ideal case, the fixed time interval is \(T_{\text{period}} = 50\text{ ms}\) and the sampling rate is \(f_s = 16\text{ kHz}\). The chunk size is \(\text{chunk\_size} = f_s \times T_{\text{period}} \times 2 = 1600\) bytes, where the factor of 2 is the number of bytes per sample (presumably 16-bit mono audio), so the buffer filling rate is \(r_{\text{in}} = 16\text{ kHz} \times 2 = 32000\) bytes/s. In this case, the system does not adjust the sampling rate and \(T_{\text{complete}} < T_{\text{period}}\), where \(T_{\text{complete}}\) is the time to fill the buffer and transmit the data to the server.
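The arithmetic above can be checked with a short Python sketch. The function and constant names are hypothetical, and the value of 2 bytes per sample is an assumption inferred from the factor of 2 in the formula.

```python
# Baseline parameters taken from the text.
SAMPLE_RATE_HZ = 16_000   # f_s
PERIOD_MS = 50            # T_period
BYTES_PER_SAMPLE = 2      # assumption: the factor of 2 is bytes per sample


def chunk_size_bytes(sample_rate_hz: int, period_ms: int, bytes_per_sample: int) -> int:
    """chunk_size = f_s * T_period * bytes_per_sample, with T_period in ms."""
    return sample_rate_hz * period_ms * bytes_per_sample // 1000


def fill_rate_bytes_per_s(sample_rate_hz: int, bytes_per_sample: int) -> int:
    """r_in = f_s * bytes_per_sample."""
    return sample_rate_hz * bytes_per_sample


def meets_deadline(t_complete_ms: float, period_ms: float) -> bool:
    """Ideal-case condition from the text: T_complete < T_period."""
    return t_complete_ms < period_ms


if __name__ == "__main__":
    print(chunk_size_bytes(SAMPLE_RATE_HZ, PERIOD_MS, BYTES_PER_SAMPLE))
    print(fill_rate_bytes_per_s(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE))
```

With the baseline values, the two helpers reproduce the 1600-byte chunk and the 32000 bytes/s fill rate stated above.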

In this case, the state of the system is

\[x = [b, \tau]\]

Flow set: \(C = [0, B_\text{max}] \times [0, T_\text{period}]\)

where \(B_\text{max} = \text{chunk\_size} = 1600\) bytes and \(T_\text{period} = 50\text{ ms}\).

The flow map is

\[ \begin{aligned} \dot{b} &= r_\text{in} = 32000 \text{ bytes/s} \\ \dot{\tau} &= 1 \end{aligned} \]

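Jump set: \(D = [0, B_\text{max}] \times \{T_\text{period}\}\) (a jump, i.e. one chunk transmission, occurs only when the timer \(\tau\) reaches \(T_\text{period}\))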

and the jump map is

\[ \begin{aligned} b^+ &= \max(0, b - \text{chunk\_size}) \\ \tau^+ &= 0 \end{aligned} \]
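As a rough illustration, the baseline flow and jump maps can be stepped in discrete time. This is a minimal sketch, not the planned firmware: the 1 ms step, the function names, and the assumption that the jump fires exactly when \(\tau\) reaches \(T_\text{period}\) (as in the jump set of the adaptive model) are all choices made for the example.

```python
R_IN = 32_000          # bytes/s, buffer filling rate r_in
CHUNK_SIZE = 1_600     # bytes
B_MAX = CHUNK_SIZE     # bytes
T_PERIOD_MS = 50       # ms


def flow(b: int, tau_ms: int, dt_ms: int = 1) -> tuple[int, int]:
    """Flow map: b grows at r_in, tau grows at rate 1 (discretized in ms)."""
    return b + R_IN * dt_ms // 1000, tau_ms + dt_ms


def in_jump_set(b: int, tau_ms: int) -> bool:
    """Assumed jump condition: timer has reached T_period and b is in range."""
    return 0 <= b <= B_MAX and tau_ms == T_PERIOD_MS


def jump(b: int, tau_ms: int) -> tuple[int, int]:
    """Jump map: remove one chunk from the buffer and reset the timer."""
    return max(0, b - CHUNK_SIZE), 0


def simulate(total_ms: int) -> tuple[list[int], tuple[int, int]]:
    """Return the buffer level seen at each jump and the final (b, tau)."""
    b, tau = 0, 0
    levels_at_jump = []
    for _ in range(total_ms):
        b, tau = flow(b, tau)
        if in_jump_set(b, tau):
            levels_at_jump.append(b)
            b, tau = jump(b, tau)
    return levels_at_jump, (b, tau)


if __name__ == "__main__":
    print(simulate(200))
```

Under these assumptions the buffer reaches exactly one chunk at every jump and is emptied each time, which is the ideal behavior described in this section.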

Adaptive Sampling Implementation

When the network is unstable, the transmission may not complete within \(T_\text{period}\), so that \(T_\text{complete} > T_\text{period}\). In this case, the buffer fill percentage increases, since the buffer fills faster than it is drained by transmission. Therefore, we can adjust the sampling rate inversely to the buffer fill percentage: the fuller the buffer, the lower the sampling rate.

Define the state as \(x = [b, \tau, \rho]\)

where \(\rho\) is the sampling rate adjustment factor, which is a function of the buffer fill percentage. TODO: define the function \(\rho = f(b)\).

The flow set is \(C = [0, B_\text{max}] \times [0, T_\text{period}] \times [\rho_\text{min}, \rho_\text{max}]\), where \(\rho_\text{min}\) and \(\rho_\text{max}\) are the minimum and maximum sampling rate adjustment factors, respectively.

The flow map is

\[ \begin{aligned} \dot{b} &= \rho \, r_\text{in} \\ \dot{\tau} &= 1 \\ \dot{\rho} &= 0 \end{aligned} \]

The jump set is \(D = [0, B_\text{max}] \times \{T_\text{period}\} \times [\rho_\text{min}, \rho_\text{max}]\)

and the jump map is

\[ \begin{aligned} b^+ &= \max(0, b - \text{chunk\_size}) \\ \tau^+ &= 0 \\ \rho^+ &= f(b) \end{aligned} \]
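The adaptive flow and jump maps can be sketched in the same way. Since \(f(b)\) is still to be defined in this proposal, the `example_f` below is purely a hypothetical placeholder (a linear map from an empty buffer to a full one), and the bounds `RHO_MIN = 0.5` and `RHO_MAX = 1.0` are illustrative assumptions, not chosen values.

```python
from typing import Callable

State = tuple[float, float, float]  # (b in bytes, tau in seconds, rho)

R_IN = 32_000.0       # bytes/s at the nominal sampling rate
CHUNK_SIZE = 1_600.0  # bytes
B_MAX = CHUNK_SIZE    # bytes
T_PERIOD = 0.050      # seconds
RHO_MIN, RHO_MAX = 0.5, 1.0  # hypothetical bounds on the adjustment factor


def example_f(b: float) -> float:
    """Hypothetical placeholder for f(b): RHO_MAX when empty, RHO_MIN when full."""
    fill = min(max(b / B_MAX, 0.0), 1.0)  # buffer fill fraction, clamped to [0, 1]
    return RHO_MAX - (RHO_MAX - RHO_MIN) * fill


def flow(state: State, dt: float) -> State:
    """Flow map: b grows at rho * r_in, tau at rate 1, rho stays constant."""
    b, tau, rho = state
    return (b + R_IN * rho * dt, tau + dt, rho)


def jump(state: State, f: Callable[[float], float] = example_f) -> State:
    """Jump map: drain one chunk, reset the timer, update rho from the pre-jump b."""
    b, _tau, _rho = state
    return (max(0.0, b - CHUNK_SIZE), 0.0, f(b))


if __name__ == "__main__":
    full = (B_MAX, T_PERIOD, RHO_MAX)
    print(jump(full))  # rho drops toward RHO_MIN because the buffer was full
```

Passing `f` as a parameter keeps the jump map independent of whichever adjustment function is eventually chosen.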