On September 1, 2022, WAIC 2022 opened in Shanghai, where Shanghai Artificial Intelligence Laboratory released the “OpenXLab” open-source AI system.
As one of the important projects, the OpenMMLab 2.0 vision algorithm system was also officially unveiled, showcasing its new architecture, algorithms, and ecosystem.
General Introduction
Back in 2018, open source was still not a widespread practice among researchers, which hindered third-party reproduction of research results, let alone fair comparison of different models.
At that time, the PyTorch ecosystem was also far from adequate to compete with TensorFlow. The situation changed with the first release of an OpenMMLab project.
The emergence of MMDetection, an object detection library with a modular design and rich algorithm support, brought a breath of fresh air to the research area.
Since then, more and more standardized algorithmic toolkits have emerged, and open source has gradually become a matter of course for researchers.
From 2018 to 2021, several more computer vision libraries covering broad research areas and algorithms were released as part of the OpenMMLab 1.0 project.
The abundance of frameworks has never diluted OpenMMLab’s long-term pursuit of code quality and ease of use, which has earned it a solid reputation among users.
Today, developers from more than a hundred nations and regions around the globe use OpenMMLab.
After a year of intensive research and development, OpenMMLab 2.0 is officially unveiled. We released the next-generation training architecture, MMEngine, whose unified execution engine flexibly supports more than 20 computer vision tasks across the algorithm libraries, as well as rich training processes such as semi-supervised and self-supervised learning.
On top of that, OpenMMLab 2.0 adds six new computer vision libraries (MMRotate, MMFewShot, MMFlow, MMHuman3D, MMSelfSup, and MMRazor) and the MMDeploy model deployment framework, realizing a seamless transition from model training to deployment and inference, and bridging the last mile of AI implementation.
New Architecture
Based on MMEngine, OpenMMLab 2.0 has a new core architecture with three main features: Generic, Unified and Flexible.
Generic
More than 20 algorithmic tasks in OpenMMLab 2.0 are based on a robust and generic trainer.
Compared with its counterpart in OpenMMLab 1.0, the new trainer builds data, model, and evaluation components in a more unified way for external libraries to use, supports distributed and non-distributed training on different chips (CPU, GPU, Apple M1, MLU, etc.) in a more scalable way, and supports some of the latest large-scale model training techniques, such as FullyShardedDataParallel.
With the latest MMEngine, various downstream libraries can use this trainer directly, enjoying the simplicity of the code, the latest training techniques and cutting-edge chip support.
The generic trainer can also be used on its own with algorithm libraries outside the OpenMMLab ecosystem, allowing users to train different tasks with a concise implementation.
For example, training on ImageNet with this trainer takes only 80 lines of code, versus more than 400 lines in plain PyTorch; it can even train CLIP in only 100 lines, compared with thousands of lines in OpenCLIP. Moreover, the trainer is readily compatible with models from TIMM, TorchVision, Detectron2, and other popular algorithm libraries.
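To make the “80 lines” claim concrete, here is a minimal sketch of training with the MMEngine trainer (the Runner). The MNIST setup, toy model, and hyperparameters below are our own illustration, not from the announcement:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

from mmengine.model import BaseModel
from mmengine.runner import Runner


class MNISTClassifier(BaseModel):
    """Toy classifier; the model returns a dict of losses in 'loss' mode."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28 * 28, 10),
        )

    def forward(self, imgs, labels, mode='loss'):
        logits = self.net(imgs)
        if mode == 'loss':
            # The trainer collects every entry of this dict as a loss/log item.
            return {'loss': F.cross_entropy(logits, labels)}
        return logits


train_loader = DataLoader(
    MNIST('data', train=True, download=True, transform=ToTensor()),
    batch_size=64, shuffle=True)

runner = Runner(
    model=MNISTClassifier(),
    work_dir='./work_dir',
    train_dataloader=train_loader,
    # 'SGD' is resolved through MMEngine's optimizer registry.
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.01, momentum=0.9)),
    train_cfg=dict(by_epoch=True, max_epochs=1),
)
runner.train()
```

The key convention is that the model subclasses BaseModel and returns a dict of losses; optimization, logging, and checkpointing are all handled by the Runner.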
Unified
The algorithm libraries in OpenMMLab 1.0 support more than 30 general research directions, such as perception, generation, and pre-training.
There are also various algorithms and training paradigms within each research direction, resulting in subtle differences between interfaces and consequently making the development cost of supporting new chips and training techniques proportional to the number of algorithm libraries.
In OpenMMLab 2.0, we have unified the training processes of different algorithms around abstractions such as Data Sample, Data Transform, Model, Evaluator, and Visualizer, designing their interfaces in a unified way and implementing base classes in MMEngine and MMCV to define them.
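For instance, the Data Sample abstraction builds on MMEngine’s BaseDataElement, which keeps meta information and data fields in one container with a uniform access API. A small sketch (the field names are made up for illustration):

```python
import torch
from mmengine.structures import BaseDataElement

# Meta information and data fields live in one container with a common API.
sample = BaseDataElement(
    metainfo=dict(img_shape=(224, 224)),
    gt_label=torch.tensor([3]))

sample.pred_label = torch.tensor([3])  # fields can also be set as attributes
print(sample.img_shape, sample.gt_label)  # metainfo is accessed the same way
```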
To keep the architecture open, all the key components above are managed by corresponding registries in MMEngine, so that any extension module conforming to the interface conventions can be used through configuration files, as long as it is properly registered.
Each algorithm library in OpenMMLab 2.0 inherits these registries and implements its components based on MMEngine’s base classes and interface conventions, so all libraries are now unified in their training/testing/visualization processes, data interfaces, and data flow. The registry also enhances code reuse: once a module conforming to the interface protocol is registered, it can be used not only by the library it lives in but also by other OpenMMLab projects, without duplicate implementation.
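A minimal sketch of the registry mechanism, using a standalone registry and a made-up TinyHead component for illustration (OpenMMLab libraries inherit their registries from MMEngine’s root registries rather than creating them from scratch):

```python
from mmengine.registry import Registry

# A standalone registry for illustration.
MODELS = Registry('model')


@MODELS.register_module()
class TinyHead:
    """Hypothetical component that conforms to a registry's conventions."""

    def __init__(self, num_classes):
        self.num_classes = num_classes


# Once registered, the component can be instantiated from a config dict,
# which is exactly how configuration files swap modules in and out.
head = MODELS.build(dict(type='TinyHead', num_classes=10))
```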
Flexible
OpenMMLab 2.0 introduces a more fine-grained modular design, enabling higher customizability of the entire training process. For example, we add more abstract modules, customizable hooks, and MessageHub, an inter-module information exchange mechanism, in the trainer.
These designs allow users to customize the training pipeline like building with LEGO bricks, freely plugging different modules in and out (a custom-hook sketch follows this list). For example:
- Dynamic adjustment of training strategies based on the number of iterations, the loss, and validation performance, enabling early stopping, ReduceLROnPlateau, and other scheduling techniques
- Arbitrary forms of model weight averaging, such as Exponential Moving Average (EMA) and Stochastic Weight Averaging (SWA)
- Flexible visualization and logging control for any data and any node during training
- Per-parameter optimizer configuration
- Flexible control of mixed precision training
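As an illustration of the hook mechanism, here is a hedged sketch of a custom hook; PrintLossHook and its logging interval are hypothetical:

```python
from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class PrintLossHook(Hook):
    """Hypothetical hook that logs training losses every 50 iterations."""

    def after_train_iter(self, runner, batch_idx, data_batch=None, outputs=None):
        if self.every_n_train_iters(runner, 50):
            # `outputs` is the loss dict returned by the model's train step.
            runner.logger.info(f'iter {runner.iter}: {outputs}')
```

Once registered, the hook can be enabled purely from configuration, e.g. by passing custom_hooks=[dict(type='PrintLossHook')] to the Runner.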
In addition to generality, unification, and flexibility, the new architecture also highlights the design and optimization of each module. Specifically, we spent as much time on design as on coding, weighing simplicity, extensibility, and efficiency throughout.
Regarding code readability and documentation, we hold MMEngine to an even higher standard, and it would be our pleasure if you learn something from reading the source code (feel free to star the repo when copying). We welcome you to try MMEngine.
New Algorithms
In OpenMMLab 2.0, we have also released:
Six new algorithm libraries
MMRotate extends object detection from horizontal bounding boxes to rotated bounding boxes, laying the foundation for applications in scene text detection, object detection in satellite images, and autonomous driving. It also provides efficient and powerful benchmark models for both academia and industry.
MMFlow supports mainstream optical flow estimation algorithms, and the out-of-the-box design allows users to quickly apply it to action recognition and video super-resolution tasks.
MMHuman3D provides a unified test benchmark for human parametric model development and supports 16 commonly used datasets with a unified data structure to help academic research and application development of human parametric models.
MMSelfSup supports a variety of self-supervised learning tasks and a series of cutting-edge algorithms, providing a unified benchmark for research in the direction of self-supervised learning.
MMFewShot provides a unified framework for training, inference, and evaluation of popular few-shot classification and detection algorithms, addressing the difficulty of evaluating and comparing few-shot learning methods across vision tasks in a unified way, which arises from the randomness of sample selection.
MMRazor integrates mainstream model compression algorithms, including network architecture search, model pruning, and distillation.
It supports flexibly combining these lightweight-model techniques and applying them to other OpenMMLab algorithm libraries, providing solutions for model use in practical scenarios.
One model deployment library
MMDeploy establishes a unified and efficient model conversion framework and implements a highly extensible, component-based SDK. It supports seven types of backend inference engines and enables one-click deployment of models trained with OpenMMLab’s algorithm libraries to hardware devices, where they run efficiently. Together, these form a high-performance deployment framework that adapts to a wide range of scenarios and chip hardware, meeting end users’ needs for AI applications.
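As a rough sketch of this workflow, model conversion can be driven from Python via MMDeploy’s APIs. All paths and the checkpoint below are placeholders, and exact signatures may vary across MMDeploy versions:

```python
from mmdeploy.apis import torch2onnx

# Placeholder paths: deploy_cfg selects the target backend (here ONNX
# Runtime); model_cfg and the checkpoint come from the training library.
torch2onnx(
    img='demo.jpg',
    work_dir='work_dir',
    save_file='end2end.onnx',
    deploy_cfg='configs/mmdet/detection/detection_onnxruntime_dynamic.py',
    model_cfg='faster_rcnn_r50_fpn_1x_coco.py',
    model_checkpoint='faster_rcnn_r50_fpn_1x_coco.pth',
    device='cpu')
```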