MMDetection3D: The next-generation 3D object detection platform

OpenMMLab
5 min read · Sep 21, 2020

With the rapid development of autonomous driving technology and the popularization of LiDAR, 3D object detection has drawn much attention from both industry and academia in recent years. However, unlike 2D object detection, which has simple and universal codebases and benchmarks such as MMDetection, 3D object detection currently lacks such a universal codebase. We release MMDetection3D (MMDet3D for short) to fill this gap. Following the first release on July 23rd, MMDetection3D today releases v0.6.0, and we hope you enjoy it.

TL;DR:

MMDetection3D supports state-of-the-art single-modality and multi-modality 3D object detectors on indoor and outdoor datasets, including VoteNet, MVX-Net, H3DNet, CenterPoint, 3DSSD, and more. It can also directly use all the 300+ models and 40+ algorithms in MMDetection. This is the richest set of supported algorithms among all 3D detection codebases.

MMDetection3D supports the SUN RGB-D, ScanNet, Waymo, nuScenes, Lyft, and KITTI datasets, the largest number among 3D detection codebases. Since v0.6.0, MMDetection3D also supports nuImages, a new dataset released just this September.

MMDetection3D trains models faster than the other open-source 3D detection codebases we compared against. It can also be easily installed and used through pip.

To get you familiar with MMDetection3D, we give a brief introduction here. For more details, please refer to the documentation.

Support Multi-modality/Single-modality 3D Detection

Most 3D object detection codebases focus on single-modality detection from point clouds, while codebases for multi-modality detection usually do not support single-modality detection. As multi-modality 3D detection becomes more and more important, MMDetection3D supports multi-modality (image + point cloud) 3D detection; currently, the MVX-Net model on KITTI is supported. Based on MMDetection3D, new work can be directly compared with other single-modality or multi-modality methods, which reduces the burden of codebase migration and algorithm reimplementation.
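To make this concrete, a multi-modality detector is expressed as a config that composes an image branch and a point-cloud branch. The snippet below is a minimal, illustrative sketch in MMDetection-style config syntax; the field values are placeholders, and the exact fields follow the MVX-Net configs shipped in the repo, which may differ across versions.

```python
# Illustrative sketch of a multi-modality (image + point cloud) detector config.
# Values are placeholders; see configs/mvxnet/ in the repo for the real configs.
model = dict(
    type='DynamicMVXFasterRCNN',        # MVX-Net variant in MMDetection3D
    img_backbone=dict(                  # 2D image branch reused from MMDetection
        type='ResNet', depth=50, num_stages=4,
        out_indices=(0, 1, 2, 3), norm_eval=True),
    img_neck=dict(
        type='FPN', in_channels=[256, 512, 1024, 2048],
        out_channels=256, num_outs=5),
    pts_voxel_layer=dict(               # 3D branch operating on the point cloud
        max_num_points=-1, voxel_size=[0.05, 0.05, 0.1],
        point_cloud_range=[0, -40, -3, 70.4, 40, 1]),
    # ... fusion layer, 3D backbone, and detection head omitted for brevity
)
```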

Support Indoor/Outdoor Datasets

Because indoor and outdoor 3D detection datasets differ significantly, before MMDetection3D no codebase supported both indoor and outdoor datasets and their related methods well.

MMDetection3D currently supports two mainstream indoor datasets, ScanNet and SUN RGB-D, and four outdoor datasets: Waymo, nuScenes (with nuImages), KITTI, and Lyft. At the model level, we implement VoteNet and H3DNet, state-of-the-art methods on indoor datasets. We also implement state-of-the-art methods on outdoor datasets, such as CenterPoint, 3DSSD, Part-A2 Net, and PointPillars.

MMDetection3D implements uniform data preprocessing pipelines for indoor and outdoor datasets. At the same time, some models are implemented independently of the coordinate system. Our goal is that new work can be quickly evaluated and compared on both indoor and outdoor datasets, verifying the universality of a method and increasing its influence.
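As an example of this uniformity, a training pipeline is a list of transform dicts whose format is identical for indoor and outdoor datasets; only the transforms and their parameters change. The following is a minimal sketch for an outdoor (KITTI-style) setup; the transform names match those in the codebase, but the parameter values here are illustrative rather than the shipped settings.

```python
# Illustrative training pipeline for an outdoor dataset (KITTI-style).
# Indoor datasets (e.g. ScanNet) use the same list-of-dicts format with
# different transforms; the values below are examples only.
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
class_names = ['Pedestrian', 'Cyclist', 'Car']
train_pipeline = [
    dict(type='LoadPointsFromFile', load_dim=4, use_dim=4),
    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),
    dict(type='GlobalRotScaleTrans',                      # random rotation/scaling
         rot_range=[-0.78539816, 0.78539816], scale_ratio_range=[0.95, 1.05]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d']),
]
```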

Smooth Integration with MMDetection

Until now, no 3D detection codebase could also support the state of the art in 2D detection; MMDetection3D achieves this. Built on MMDetection and MMCV, MMDetection3D uses the same high-level APIs as MMDetection and reuses many of its modules. With the correct configuration files, users can take advantage of all the 300+ model checkpoints and 40+ algorithms in MMDetection’s model zoo.

This brings two benefits:

  1. Anyone who wants to try a multi-modality detector can reuse almost all 2D detectors in MMDetection, leaving many options and combinations to study.
  2. Many common modules implemented in MMDetection and MMCV, such as GCBlock, DCN, FPN, and FocalLoss, can be used directly in MMDetection3D, reducing the gap between 2D and 3D detection (see the sketch after this list). We also encourage the community to try new general detection techniques, such as losses and network modules, on 3D detectors with MMDetection3D; they might prove effective there as well.
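Because 2D components are registered in shared registries, swapping one into a 3D detector is a config edit. The snippet below is a hypothetical illustration of that pattern; the enclosing model config is abbreviated, and field names follow MMDetection/MMCV conventions that can vary slightly by version.

```python
# Hypothetical illustration: reusing 2D components from MMDetection/MMCV
# inside a 3D detector's config. The enclosing model config is abbreviated.
model = dict(
    # ...
    img_backbone=dict(
        type='ResNet', depth=50,
        dcn=dict(type='DCN', deform_groups=1, fallback_on_stride=False),  # DCN from MMCV
        stage_with_dcn=(False, False, True, True)),
    img_neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
                  out_channels=256, num_outs=5),          # FPN from MMDetection
    # A detection head could likewise switch its classification loss:
    # loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25)
)
```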

Fastest Training Speed

We compared MMDetection3D with multiple open-source 3D detection codebases. Our VoteNet, SECOND, PointPillars, and other models train faster than their counterparts in other codebases. The figure below compares training speed (samples per second) for 8-GPU distributed training with the same hyper-parameters.

Figure: comparison of training speed (samples per second). An x marks models not supported in the corresponding codebase.

MMDetection3D implements its modules based on MMDetection, so the two share the same design principles: the config system, model abstraction, and modular design of MMDetection3D mirror those of MMDetection. Anyone familiar with MMDetection can easily get started with MMDetection3D.

MMDetection3D can also be used as a package, like MMDetection or MMSegmentation, which makes it convenient to quickly start a new project on top of it. You can install the latest release directly with ‘pip install mmdet3d’, and you can use all of its functionality by simply importing mmdet3d in your new project.
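A minimal sketch of that workflow, assuming a config file and checkpoint you have downloaded: the high-level inference API mirrors MMDetection’s, though exact function names and paths below are illustrative and can vary by version.

```python
# Minimal sketch: after `pip install mmdet3d`, use it as a library.
# Config/checkpoint/data paths are placeholders; the API mirrors
# MMDetection's high-level interface and may differ across versions.
from mmdet3d.apis import init_detector, inference_detector

config_file = 'configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py'    # placeholder
checkpoint_file = 'checkpoints/second_kitti-3d-car.pth'                    # placeholder

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result, data = inference_detector(model, 'demo/kitti_000008.bin')  # raw point cloud
```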

MMDetection3D also includes many engineering optimizations. The original SECOND.Pytorch uses NumPy operators for target assignment and related steps and pushes many operations into the dataloader, which is neither efficient nor flexible. MMDetection3D directly uses the assigners from MMDetection (yes, MMDetection3D does not need to reimplement these modules) and, based on PyTorch, performs these heavy computations on CUDA, which is why it is faster. In other codebases, installing spconv is often a major obstacle; spconv is now directly integrated as a custom operator in MMDetection3D, making installation as smooth as MMDetection’s.

It is also easy to push MMDetection3D to state-of-the-art results. For example, with PointPillars + RegNet-3.2GF + FPN + FreeAnchor and test-time augmentation on the nuScenes dataset, we reach NDS 65 and mAP 57 using only common data augmentation, without CBGS or GT-sampling. This model is in our model zoo now.
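In config terms, such a combination is typically expressed by inheriting a base config and overriding a few fields. A hypothetical sketch of that pattern (the base file name is illustrative; the actual configs live under configs/free_anchor/ in the repo):

```python
# Hypothetical sketch: composing PointPillars + RegNet + FPN + FreeAnchor on
# nuScenes via config inheritance. The _base_ file name is illustrative.
_base_ = './hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d.py'

model = dict(
    pts_backbone=dict(
        _delete_=True,                    # replace the inherited backbone entirely
        type='NoStemRegNet',
        arch='regnetx_3.2gf',
        out_indices=(1, 2, 3)),
    pts_bbox_head=dict(type='FreeAnchor3DHead'))  # FreeAnchor-style head
```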

Conclusion

MMDetection3D fills many codebase gaps in the current 3D detection field, supports multiple multi-modality and single-modality state-of-the-art methods on indoor and outdoor datasets, and provides a powerful and easy-to-use codebase for the community to build on freely. You are welcome to fork, star, try it out, develop, and raise a PR.
