Mvfst-rl is a platform for training and deploying reinforcement learning (RL) policies for more effective network congestion control that can adapt proactively to changing traffic patterns. Mvfst-rl uses PyTorch for RL training and is built on top of mvfst, our open source implementation of the Internet Engineering Task Force's QUIC transport protocol. Mvfst-rl implements congestion control with an asynchronous RL agent, making the training environment more realistic for real-world deployment. Its tight integration with mvfst enables seamless and efficient transfer of trained RL policies from research to deployment.
Existing RL environments for congestion control research are not compatible with real-world use cases because they use RL interfaces with agents that block the network sender. This is largely an artifact of building on top of frameworks designed for RL research with games, where resource constraints do not pose the challenges they do in large-scale production environments, in which even a delay of just a few milliseconds negatively affects performance.
Mvfst-rl takes a leap forward with an asynchronous RL agent for congestion control that is capable of handling delayed actions. The system accumulates network statistics and sends a state update asynchronously to an RL agent running in a separate thread. Once the agent performs the policy lookup, the resulting action is applied. This allows the network environment to take congestion control actions based on the RL policy without blocking the sender.
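The non-blocking pattern described above can be sketched in a few lines of Python. This is a simplified illustration, not mvfst-rl's actual interface: the class and method names (`AsyncAgent`, `send_state`, `latest_action`) and the integer-valued actions are all hypothetical stand-ins, and real state updates and policy lookups would involve network statistics and a PyTorch model.

```python
import queue
import threading

class AsyncAgent:
    """Hypothetical sketch of an asynchronous, non-blocking agent interface.

    The sender thread pushes state updates into a queue and continues
    immediately; a separate agent thread performs the (potentially slow)
    policy lookup and publishes the resulting action, which the sender
    picks up on its next congestion-control tick.
    """

    def __init__(self, policy):
        self._policy = policy              # maps state -> action (e.g. a cwnd update)
        self._states = queue.Queue()       # thread-safe channel for state updates
        self._latest_action = 0            # default action before the first lookup
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send_state(self, state):
        """Called by the sender; never blocks on the policy lookup."""
        self._states.put(state)

    def latest_action(self):
        """Called by the sender each tick; returns the most recent action."""
        with self._lock:
            return self._latest_action

    def _run(self):
        # Agent thread: drain state updates and run policy lookups.
        while True:
            state = self._states.get()
            if state is None:              # shutdown sentinel
                break
            action = self._policy(state)   # the potentially slow step
            with self._lock:
                self._latest_action = action

    def close(self):
        """Flush pending states and stop the agent thread."""
        self._states.put(None)
        self._thread.join()
```

The key design choice is that `latest_action` returns immediately with whatever action the agent has most recently produced, so the sender's control loop is decoupled from the latency of the policy lookup, exactly the property that blocking RL interfaces lack.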
Industry estimates show that more than 150 exabytes of data per month were sent over the internet in 2018, and this is expected to nearly double by 2021. Effective network congestion control strategies are key to keeping the internet operational at this massive scale. For decades, these systems have been dominated by handcrafted heuristics that can react to dynamic traffic patterns but are not able to learn new ways to anticipate them. RL-based systems hold the promise of taking proactive steps to reduce network congestion and adapt to varying network scenarios, and yet to our knowledge none has been transferred to a real-world production system. Our initial results with mvfst-rl show promise when applied to congestion control, and we hope to test our trained network control agents in production in the future.
Moreover, deployment of reinforcement learning in large-scale real-time systems presents challenges that are not seen in highly controlled and predictable RL environments popular in the research community. We hope mvfst-rl offers a new platform and a novel set of challenges for advancing RL research and developing new ways to use RL in real-world use cases.
More details are available in our paper.