We propose a framework for differentiable neural architecture search (DNAS) that uses gradient-based methods to optimize convolutional neural network (CNN) architectures. We need CNNs that are both accurate (to enable better product capability and user experience) and efficient (to allow us to deliver our service to more people who do not have high-end mobile phones). But designing accurate, efficient CNNs for mobile devices is challenging because the design space is combinatorially large.
Previous neural architecture search (NAS) methods have been computationally intensive. The optimal CNN architecture depends on factors such as input resolution and target device, so each new case requires its own redesign. Previous work also focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. With our proposed framework, it is no longer necessary to enumerate and train individual architectures separately, which makes the process faster and better targeted to real-device constraints. We introduce FBNet, an optimized CNN architecture that is discovered by DNAS and surpasses state-of-the-art models, both those designed manually and those generated automatically.
DNAS allows us to explore a layer-wise search space, in which each layer can choose a different operator from a pool of candidates. The search space is represented by a stochastic super net. Each layer of the super net contains several parallel candidate operators, and the operator to execute is sampled from a probability distribution over those candidates. To search for the optimal architecture, we train the stochastic super net to optimize this architecture distribution, so that it more frequently samples the low-cost operators that contribute to increased accuracy. The cost of an operator can be measured by its FLOP count, its parameter size, or its actual latency, as benchmarked on target devices. When the training finishes, we can then sample optimal architectures from the trained distribution. This process is illustrated below.
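As a rough illustration of optimizing an architecture distribution with gradients, here is a minimal sketch for a single layer with three candidate operators. It is a toy under stated assumptions, not the method used in this work: the per-operator task losses, the costs, and the trade-off weight `lam` are made-up numbers, the distribution is assumed to be a softmax over logits `theta`, and the task loss of each operator is treated as a fixed constant rather than coming from a trained super net.

```python
import numpy as np

def softmax(theta):
    """Turn architecture logits into sampling probabilities."""
    z = theta - theta.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical numbers for ONE layer with three candidate operators.
# task_loss[i]: loss incurred if operator i is chosen (assumed constant here).
# cost[i]: cost of operator i (could be FLOPs, parameter size, or latency).
task_loss = np.array([0.30, 0.32, 0.50])
cost = np.array([4.0, 1.0, 0.5])
lam = 0.05  # assumed weight trading accuracy against cost

def expected_objective(theta):
    """Expected (task loss + lam * cost) under the sampling distribution."""
    p = softmax(theta)
    return float(p @ (task_loss + lam * cost))

def grad(theta):
    """Analytic gradient of expected_objective with respect to the logits."""
    p = softmax(theta)
    v = task_loss + lam * cost
    return p * (v - p @ v)

def train_distribution(steps=2000, lr=1.0):
    """Plain gradient descent on the logits, starting from a uniform distribution."""
    theta = np.zeros(len(cost))
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

theta = train_distribution()
probs = softmax(theta)

# After training, sample an operator for this layer from the learned distribution.
rng = np.random.default_rng(0)
sampled_op = int(rng.choice(len(probs), p=probs))
```

In this toy, the probability mass shifts toward the operator with the best combined loss and cost (index 1), which is accurate enough and much cheaper than the most accurate candidate. In a real super net the task loss depends on trained operator weights, and the discrete sampling step needs a differentiable relaxation or estimator, which this sketch sidesteps by using the exact expectation.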
DNAS provides a fast, powerful tool to automatically design new CNN architectures. The search space can contain arbitrary operators such as convolution, max pooling, or quantized convolution, which allows us to apply DNAS to different problems, including mixed precision quantization and efficient CNN search. The search is fast: it typically takes 24 hours on eight GPUs, which makes it 420x more efficient in computing resources than other methods. DNAS supports optimizing directly for actual latency on target devices, and the CNNs discovered by our method surpass the state of the art.
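One way to make measured latency usable during search is to benchmark each candidate operator once and then look the numbers up. The sketch below assumes this approach; the table `LATENCY_MS`, its values, and the helper names are hypothetical, and the total latency of a network is assumed to be the sum of its per-layer operator latencies.

```python
import numpy as np

# Hypothetical latency table in milliseconds, shape [layers, operators].
# In practice each entry would be benchmarked on the target device.
LATENCY_MS = np.array([
    [1.2, 0.8, 0.1],
    [2.0, 1.1, 0.1],
    [3.5, 1.9, 0.2],
])

def softmax_rows(theta):
    """Row-wise softmax: one sampling distribution per layer."""
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_latency(theta):
    """Expected total latency under the per-layer distributions.

    This is a smooth function of theta, so it can be added to a training loss.
    """
    return float((softmax_rows(theta) * LATENCY_MS).sum())

def sample_architecture(theta, rng):
    """Draw one operator index per layer from the learned distributions."""
    probs = softmax_rows(theta)
    return [int(rng.choice(probs.shape[1], p=row)) for row in probs]

def architecture_latency(arch):
    """Total looked-up latency of a concrete architecture (one index per layer)."""
    return float(sum(LATENCY_MS[layer, op] for layer, op in enumerate(arch)))

# Usage: with uniform logits every operator is equally likely in every layer.
uniform_theta = np.zeros_like(LATENCY_MS)
arch = sample_architecture(uniform_theta, np.random.default_rng(0))
print(arch, architecture_latency(arch), expected_latency(uniform_theta))
```

Because the expected latency is a weighted sum of fixed table entries, its gradient with respect to the logits is cheap to compute, and swapping in a table measured on a different device retargets the search without changing the rest of the code.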
See this work presented at CVPR 2019.