CORE MACHINE LEARNING

Parameter Prediction for Unseen Deep Architectures

November 03, 2021

Abstract

Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms that optimize neural network parameters remain largely hand-designed and computationally inefficient. We study whether we can use deep learning to directly predict these parameters by exploiting past knowledge from training other networks. We introduce DeepNets-1M, a large-scale dataset of diverse computational graphs of neural architectures, and use it to explore parameter prediction on CIFAR-10 and ImageNet. Leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass, taking a fraction of a second even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving 60% accuracy on CIFAR-10. On ImageNet, the top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, can potentially lead to a new, more computationally efficient paradigm for training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.
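The core idea in the abstract can be sketched in a few lines: encode an unseen architecture as a computational graph, run message passing over its nodes, and decode each node's embedding into that layer's parameters in a single forward pass. The sketch below is purely illustrative and uses only toy arithmetic; the class and function names (`Node`, `message_pass`, `decode`) and the average-neighbour update are assumptions standing in for the paper's actual GatedGNN-based hypernetwork.

```python
import random

random.seed(0)

EMB = 8  # node embedding size (illustrative choice)

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

class Node:
    """One operation in the computational graph, e.g. a conv layer."""
    def __init__(self, op, param_count):
        self.op = op
        self.param_count = param_count  # number of parameters to predict
        self.emb = rand_vec(EMB)        # initial node features

def message_pass(nodes, edges, rounds=2):
    """Average-neighbour message passing: a toy stand-in for the
    gated graph neural network used in the paper."""
    for _ in range(rounds):
        new = []
        for i, n in enumerate(nodes):
            neigh = [nodes[j].emb for a, j in edges if a == i] or [n.emb]
            avg = [sum(v[k] for v in neigh) / len(neigh) for k in range(EMB)]
            new.append([(a + b) / 2 for a, b in zip(n.emb, avg)])
        for n, e in zip(nodes, new):
            n.emb = e

def decode(node, weights):
    """Decode a node embedding into that layer's parameter tensor
    with a shared (toy) linear head."""
    score = sum(w * x for w, x in zip(weights, node.emb))
    return [score] * node.param_count  # one value repeated (toy decoder)

# A three-node "architecture": conv -> conv -> linear.
nodes = [Node("conv3x3", 27), Node("conv3x3", 81), Node("linear", 10)]
edges = [(0, 1), (1, 2)]

message_pass(nodes, edges)          # propagate structural information
decoder = rand_vec(EMB)             # shared decoder weights
params = {f"{n.op}_{i}": decode(n, decoder) for i, n in enumerate(nodes)}
total = sum(len(p) for p in params.values())
print(total)  # 118: all parameters produced in one forward pass
```

No gradient descent is run on the target network here: every parameter comes from one pass of the (toy) hypernetwork, which is the computational saving the abstract claims, scaled down from 24 million parameters to 118.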


AUTHORS

Boris Knyazev

Michal Drozdzal

Graham Taylor

Adriana Romero Soriano

Publisher

NeurIPS

Research Topics

Core Machine Learning

