MazeRL is an application-oriented deep reinforcement learning (RL) framework that addresses real-world decision problems. Our vision is to cover the complete development life cycle of RL applications, from simulation engineering to agent development, training and deployment.

Spotlight Features

  • Design and visualize your policy and value networks with the Perception Module. It is based on PyTorch and provides a large variety of neural network building blocks and model styles. Quickly compose powerful representation learners from building blocks such as dense, convolution, graph convolution and attention, recurrent architectures, action and observation masking, and self-attention.
  • Create the conditions for efficient RL training without writing boilerplate code: Maze supports best practices such as pre-processing and normalizing your observations.
  • Maze supports advanced environment structures that reflect the requirements of real-world industrial decision problems, such as multi-step and multi-agent scenarios. You can, of course, also work with existing Gym-compatible environments.
  • Use the provided Maze trainers (A2C, PPO, Impala, SAC, Evolution Strategies), which support dictionary action and observation spaces as well as multi-step training (auto-regressive policies). Or stick to your favorite tools and trainers by combining Maze with other RL frameworks.
  • Out-of-the-box support for advanced training workflows such as imitation learning from teacher policies and policy fine-tuning.
  • Keep even complex application and experiment configuration manageable with the Hydra Config System.
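
To make the observation normalization mentioned above concrete, here is a minimal sketch of running mean/standard-deviation normalization for dictionary observations. It is plain Python and does not use the Maze API; the class name RunningNormalizer, its methods, and the "inventory" key are hypothetical names chosen for illustration only.

```python
import math


class RunningNormalizer:
    """Hypothetical helper (not part of Maze): tracks running statistics
    per observation key with Welford's algorithm."""

    def __init__(self, epsilon=1e-8):
        self.epsilon = epsilon  # avoids division by zero for constant features
        self.count = {}
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations from the mean

    def update(self, observation):
        """Fold one dictionary observation into the running statistics."""
        for key, values in observation.items():
            if key not in self.count:
                self.count[key] = 0
                self.mean[key] = [0.0] * len(values)
                self.m2[key] = [0.0] * len(values)
            self.count[key] += 1
            n = self.count[key]
            for i, x in enumerate(values):
                delta = x - self.mean[key][i]
                self.mean[key][i] += delta / n
                self.m2[key][i] += delta * (x - self.mean[key][i])

    def normalize(self, observation):
        """Return a new observation with zero-mean, unit-variance features."""
        normalized = {}
        for key, values in observation.items():
            n = self.count.get(key, 0)
            if n < 2:
                # Not enough statistics yet: pass the values through unchanged.
                normalized[key] = list(values)
                continue
            normalized[key] = [
                (x - m) / (math.sqrt(m2 / n) + self.epsilon)
                for x, m, m2 in zip(values, self.mean[key], self.m2[key])
            ]
        return normalized


normalizer = RunningNormalizer()
for obs in ({"inventory": [0.0, 10.0]},
            {"inventory": [2.0, 30.0]},
            {"inventory": [4.0, 20.0]}):
    normalizer.update(obs)

centered = normalizer.normalize({"inventory": [2.0, 20.0]})
```

In this sketch the statistics are kept separately for every key, so each entry of a dictionary observation space is scaled independently of the others. A framework would typically estimate such statistics once from sampled observations and then apply them inside the environment wrapper stack; that part is omitted here.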
Maze on GitHub

Applied Reinforcement Learning

Applied reinforcement learning, using the example of stock replenishment for an industrial group.

Our CTO provides a walkthrough of how to apply reinforcement learning to stock replenishment and shows the significant cost-saving potential.

Reinforcement learning is often described as the next big step for AI, with the ability to surpass human capabilities in optimization and decision making. However, until recently very few practical use cases had been made public.