MMDetection V2.0: A Faster and Stronger General Detection Framework
MMDetection just crossed 10K GitHub stars with over 3.4K forks! 🎉
Since the release of MMDetection V1.0 in Jan 2020, we have received a lot of valuable feedback. There have also been many suggestions and contributions from the community that have helped improve the codebase continuously.
Through refactoring and optimizing the various components of the codebase, we have comprehensively improved the speed and accuracy of MMDetection, reaching the top tier among existing detection frameworks. With a more fine-grained modular design, the extensibility of MMDetection is greatly enhanced, making it a highly adaptable platform for detection-related projects. At the same time, we have added a wealth of documentation and tutorials to improve the user experience.
To celebrate our milestone of reaching 10K GitHub stars, we proudly present to you MMDetection V2.0! The following is a brief introduction to the major improvements in the new version of MMDetection. You may refer to the documentation for more details.
The largest and greatest model zoo
By the end of May 2020, the official model zoo of MMDetection supports algorithms reported in 35 papers (see the list here), offering more than 250 pre-trained models. This makes MMDetection the largest model zoo in the field of object detection. Beyond the official model zoo, at least 21 papers have released their code based on MMDetection, including the latest CVPR 2020 papers such as PolarMask and TSD (see the list here). The algorithms supported in the MMDetection model zoo are shown below.
In MMDetection V2.0, we add support for backbones such as Res2Net and RegNet, as well as methods like CARAFE, PISA, FSAF, and NAS-FCOS. This timely support of the state of the art allows MMDetection to serve not only as a standard benchmark for academic research, but also as a toolbox for implementing new ideas, and even as an ace in the hole for object detection competitions.
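Thanks to the registry-based design, switching to one of these new backbones is typically just a config change. The fragment below is a hedged sketch in MMDetection's Python-config style: the field names follow common conventions for the Res2Net backbone, but exact fields may differ between versions.

```python
# Illustrative config fragment: using Res2Net-101 as the backbone of a
# two-stage detector. Field names follow MMDetection conventions; the
# pretrained checkpoint path is an assumption, not an official URL.
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='Res2Net',           # registered backbone name
        depth=101,
        scales=4,                 # Res2Net-specific: number of scale branches
        base_width=26,            # Res2Net-specific: width of each branch
        num_stages=4,
        out_indices=(0, 1, 2, 3), # feed all four stages to the neck (e.g. FPN)
        frozen_stages=1,
        style='pytorch',
    ),
)
```

Because backbones are looked up by the registered `type` string, no detector code needs to change when the backbone is swapped.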
Flexible design, friendly experience
Modular design is the most important principle that MMDetection has followed since its first release. This design has enabled MMDetection to gain wide acceptance from the community.
The modular design of MMDetection V2.0 is more fine-grained than the previous version. This allows more modules to be replaced or adjusted more flexibly, catalyzing the evolution of MMDetection from an object detection framework to a platform to handle multiple tasks related to detection in parallel. The new modular design is shown below.
During the code refactoring phase, we focused on improving the user experience in response to the feedback. Here are a few examples:
- Pain point 1: Some modules are deeply encapsulated. For example, the target functions of AnchorHead are implemented in the “mmdet/core/” directory. To read and modify the related code, users need to jump back and forth between the two folders. This also makes it hard to decouple extension code from the core library.
- V2.0 improvements: We adjust the code structure to simplify the encapsulation hierarchy. The functions of each head are implemented in a single file as much as possible, so the implementation is clear at a glance and easier to extend.
- Pain point 2: The config file is very long and therefore susceptible to mistakes during modification. The changes made are not clearly visible, making mistakes even harder to pinpoint.
- V2.0 improvements: We design a new config system that supports multiple inheritance. Commonly used dataset configurations, basic models, and training strategies are placed in the “_base_” folder; each new config file can inherit one or more existing configs for most common fields, so users only need to overwrite a few fields in the new config. The figure below compares the configuration files of Mask R-CNN R-101 in versions 1.0 and 2.0: version 2.0 only requires 2 lines of code, and all the changes are clear at a glance.
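The single-file head design relies on the registry pattern: a head registers itself by name and is then built purely from a config. The sketch below is a minimal stand-in to illustrate the idea; `HEADS`, `register_head`, and `MyAnchorHead` here are toy illustrations, not MMDetection's actual `mmdet.models` registry API.

```python
# Minimal sketch of the registry pattern behind single-file heads.
# This is illustrative stand-in code, not the real mmdet HEADS registry.

HEADS = {}  # stand-in registry mapping a name string to a head class

def register_head(cls):
    """Register a head class under its own name (mimics a register decorator)."""
    HEADS[cls.__name__] = cls
    return cls

@register_head
class MyAnchorHead:
    """Forward logic, loss, and target computation all live in this one file,
    so extension code no longer needs to reach into a separate core directory."""
    def __init__(self, num_classes):
        self.num_classes = num_classes

    def loss(self, predictions, targets):
        # toy placeholder for the head-local loss computation
        return sum((p - t) ** 2 for p, t in zip(predictions, targets))

# Building from config only needs the registered name:
cfg = dict(type='MyAnchorHead', num_classes=80)
head = HEADS[cfg['type']](num_classes=cfg['num_classes'])
```

Because everything a head needs is defined next to it, a downstream project can add its own head file and config entry without touching the core library.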
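Conceptually, the `_base_` mechanism is a deep merge: a child config states only its overrides, which are merged field-by-field into the inherited base. The snippet below sketches that idea; `deep_merge` is an illustrative helper and the file path is hypothetical, not MMDetection's actual config loader.

```python
# Conceptual sketch of `_base_` config inheritance via a deep merge.
# `deep_merge` is an illustrative helper, not MMDetection's loader.

# Base config (e.g. a Mask R-CNN R-50 setting in the _base_ folder):
base = dict(
    model=dict(backbone=dict(type='ResNet', depth=50)),
    optimizer=dict(type='SGD', lr=0.02),
)

# A new R-101 config only needs to declare what changes:
child = dict(
    _base_='./mask_rcnn_r50_fpn_1x.py',    # hypothetical base path
    model=dict(backbone=dict(depth=101)),  # override just the depth
)

def deep_merge(base_cfg, override):
    """Recursively merge `override` into `base_cfg`, overriding leaves."""
    merged = dict(base_cfg)
    for key, value in override.items():
        if key == '_base_':
            continue  # the loader resolves this to the base file
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

cfg = deep_merge(base, child)
```

All unchanged fields (the backbone type, the optimizer) come from the base, so the child config stays short and its diff against the base is exactly what it intends to change.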
Faster speed
We also optimize the implementation of some modules to achieve a speedup of more than 30% for training and 60% for inference.
Higher performance
The performance of the basic models in MMDetection V2.0 has improved significantly compared to V1.x. We conducted ablation studies on various parameters in the technical report of V1.0 (https://arxiv.org/abs/1906.07155), but decided to keep the default settings as simple as possible, following typical conventions. In V2.0, we adjust some default settings to achieve performance gains without increasing the cost of training and inference, e.g., using L1 Loss instead of SmoothL1 Loss for regression. Detectron2 also has many remarkable features in this respect. The main difference between MMDetection V2.0 and Detectron2 is that Detectron2 adopts multi-scale training by default, while we stick with a single-scale training setting. With the same settings, the performance of the two codebases is almost identical.
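To make the L1 vs. SmoothL1 change concrete, the per-element forms of the two losses can be compared directly. The pure-Python sketch below is illustrative (MMDetection's implementations operate on tensors and support reduction and weighting):

```python
# Per-element comparison of SmoothL1 (Huber-style) and plain L1 loss.
# Pure-Python sketch for illustration; real implementations are tensor-based.

def smooth_l1(x, beta=1.0):
    """Quadratic for |x| < beta, linear beyond: 0.5*x^2/beta or |x| - 0.5*beta."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta

def l1(x):
    """Plain absolute error, with a constant gradient magnitude everywhere."""
    return abs(x)

# Near zero, SmoothL1 down-weights small residuals (0.125 vs 0.5 at x = 0.5),
# while L1 keeps pushing them toward zero with full strength.
small = (smooth_l1(0.5), l1(0.5))
# In the tails the two differ only by the constant 0.5*beta offset.
large = (smooth_l1(2.0), l1(2.0))
```

The intuition for the new default is that L1's constant gradient keeps refining small localization errors, which SmoothL1's quadratic region suppresses.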
Summary
Many researchers and developers feel that the quality of the codebases they use significantly affects the efficiency of research and development. A desirable object detection toolbox requires not only a stable and efficient engineering implementation, but also the flexibility to support new methods without much pain. MMDetection has been making constant efforts in this direction. You are welcome to create PRs and issues, so that we can make it better and stronger together.