Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Hattie Zhou, Janice Lan, Rosanne Liu, Jason Yosinski

Uber

Overview

The article explores the Lottery Ticket Hypothesis, which posits that large neural networks contain smaller subnetworks that, when trained in isolation from their original initial weights, perform comparably to the full network. It examines why these subnetworks work, introduces Supermasks (masks that give an untrained network better-than-chance accuracy), and compares different criteria for choosing which weights to keep.

What You'll Learn

1. How to identify and utilize Supermasks in neural networks

2. Why the initial weights of neural networks impact training outcomes

3. When to apply different weight masking criteria for optimal performance

Prerequisites & Requirements

Understanding of neural networks and weight pruning techniques

Key Questions Answered

What is the Lottery Ticket Hypothesis?

The Lottery Ticket Hypothesis suggests that within a large neural network, there exist smaller subnetworks, or 'lottery tickets', that can be trained to perform as well as the full network. They are found by training the full network, pruning the weights whose trained magnitude falls below a threshold, resetting the surviving weights to their initial values, and retraining.
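
The prune-and-reset step described above can be sketched in a few lines. This is a minimal illustration on a flat list of weights, not the authors' implementation: the function name `lottery_ticket_reset`, the `keep_fraction` parameter, and the toy weight values are assumptions, training itself is omitted (the "trained" weights are hard-coded), and the threshold is chosen implicitly by keeping a fixed fraction of the weights.

```python
def lottery_ticket_reset(w_init, w_final, keep_fraction):
    """Keep the largest-magnitude trained weights and reset them to their initial values."""
    n_keep = max(1, round(keep_fraction * len(w_final)))
    # Rank weight indices by trained magnitude, largest first.
    ranked = sorted(range(len(w_final)), key=lambda i: abs(w_final[i]), reverse=True)
    kept = set(ranked[:n_keep])
    mask = [1 if i in kept else 0 for i in range(len(w_final))]
    # Surviving weights go back to their initial values; pruned weights become zero.
    w_reset = [w0 if m else 0.0 for m, w0 in zip(mask, w_init)]
    return mask, w_reset

# Toy values (hypothetical): weights before and after training.
w_init = [0.10, -0.30, 0.05, 0.40]
w_final = [0.90, -0.02, -0.60, 0.45]

mask, w_reset = lottery_ticket_reset(w_init, w_final, keep_fraction=0.5)
print(mask)     # the two weights with the largest trained magnitude are kept
print(w_reset)  # kept weights carry their initial values, not their trained ones
```

In this sketch, retraining would start from `w_reset`, with the pruned positions held at zero.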

How do Supermasks improve neural network performance?

Supermasks are binary masks that, when applied to a randomly initialized, untrained network, produce better-than-chance accuracy without any training of the weights. They are built with a mask criterion such as 'large final, same sign', which keeps the weights that ended training with a large magnitude and the same sign as their initial value.

What criteria can be used for weight masking in neural networks?

The article discusses several criteria for weight masking, including the 'large final' criterion, which keeps the weights with the largest final magnitudes, and the 'magnitude increase' criterion, which keeps the weights whose magnitude grew the most during training. Comparing criteria shows which properties of a weight matter for identifying effective subnetworks.
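
Each mask criterion can be read as a score computed from a weight's initial and final value, with the highest-scoring weights kept. The sketch below follows the verbal descriptions in this summary. The function names, the `keep_fraction` parameter, and the toy values are assumptions, and the exact scoring formulas used by the authors may differ.

```python
def large_final(w_init, w_final):
    # Score by trained magnitude only.
    return abs(w_final)

def magnitude_increase(w_init, w_final):
    # Score by how much the magnitude grew during training.
    return abs(w_final) - abs(w_init)

def large_final_same_sign(w_init, w_final):
    # Score by trained magnitude, but only if the sign did not flip.
    return abs(w_final) if w_init * w_final > 0 else 0.0

def make_mask(score_fn, w_init, w_final, keep_fraction):
    """Return a 0/1 mask keeping the top-scoring fraction of weights."""
    scores = [score_fn(wi, wf) for wi, wf in zip(w_init, w_final)]
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(scores))]

# Toy values (hypothetical): the third weight flips sign during training.
w_init = [0.10, -0.30, 0.05, 0.60]
w_final = [0.50, -0.02, -0.65, 0.70]

for fn in (large_final, magnitude_increase, large_final_same_sign):
    print(fn.__name__, make_mask(fn, w_init, w_final, keep_fraction=0.5))
```

On these toy values the three criteria select three different pairs of weights, which is why the choice of criterion changes the resulting subnetwork.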

Why does reinitializing weights affect training performance?

Reinitializing weights in Lottery Ticket networks can degrade performance because the specific initial configuration of weights is crucial for their training success. The coupling between the pruning mask and the initial weights is essential for maintaining the network's performance.
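
To make the two settings concrete, the sketch below contrasts resetting the kept weights to their original initial values with redrawing them at random. It is an illustration only: the function names, the Gaussian distribution, the `scale` and `seed` parameters, and the toy values are assumptions rather than the authors' setup.

```python
import random

def rewind(mask, w_init):
    """Lottery ticket setting: kept weights restart from their original initial values."""
    return [w if m else 0.0 for m, w in zip(mask, w_init)]

def reinitialize(mask, scale, seed=0):
    """Control setting: kept weights are redrawn at random, discarding the originals."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) if m else 0.0 for m in mask]

# Toy values (hypothetical).
mask = [1, 0, 0, 1]
w_init = [0.10, -0.30, 0.05, 0.60]

w_rewound = rewind(mask, w_init)         # mask and initial values stay paired
w_reinit = reinitialize(mask, scale=0.3) # same mask, unrelated starting values

print(w_rewound)
print(w_reinit)
```

Both networks share the same sparsity pattern. Only the first preserves the pairing between the mask and the initial weights that the answer above describes as essential.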

Key Statistics & Figures

Test accuracy on MNIST with Supermasks

80 percent

Achieved without any training by applying the 'large final, same sign' mask criterion.

Test accuracy on CIFAR-10 with Supermasks

24 percent

Achieved without any training, using the same 'large final, same sign' mask criterion.

Test accuracy on MNIST using a signed constant

86 percent

Achieved by applying the mask to a signed constant instead of the actual initial weights.

Test accuracy on CIFAR-10 using a signed constant

41 percent

Achieved the same way as on MNIST: the mask is applied to a signed constant instead of the actual initial weights.
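
The difference between the two Supermask settings in these figures can be sketched as follows. In the first, the mask is applied to the untrained initial weights; in the second, each initial weight is first replaced by a constant that keeps only its sign. The function names, the constant `c`, and the toy values are assumptions for illustration, and how the constant should be chosen is not specified in this summary.

```python
import math

def apply_mask(mask, weights):
    """Zero out masked-off weights; kept weights pass through unchanged."""
    return [w if m else 0.0 for m, w in zip(mask, weights)]

def signed_constant(w_init, c):
    """Replace each initial weight by +c or -c according to its sign.

    Note: a weight that is exactly 0.0 maps to +c under math.copysign.
    """
    return [math.copysign(c, w) for w in w_init]

# Toy values (hypothetical): untrained weights and a mask from some criterion.
w_init = [0.10, -0.30, 0.05, 0.60]
mask = [1, 0, 0, 1]

supermask_net = apply_mask(mask, w_init)                           # masked initial weights
signed_const_net = apply_mask(mask, signed_constant(w_init, 0.2))  # masked signed constant

print(supermask_net)
print(signed_const_net)
```

Neither network is trained at any point. The second variant discards the magnitudes of the initial weights entirely and keeps only their signs.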

Key Actionable Insights

1. Applying the 'large final, same sign' mask criterion to an untrained network can yield accuracy well above chance.
This criterion keeps weights that end training with a large magnitude and the same sign they started with. Masks built this way reached 80 percent test accuracy on MNIST and 24 percent on CIFAR-10 without any training of the masked network.

2. The pruning mask and the initial weights are coupled, so treat them as a pair.
Preserving the initial values of the surviving weights when retraining a pruned network helps it keep its performance, which should inform how training protocols for pruned networks are designed.

3. Experimenting with different weight masking criteria can uncover more effective subnetworks within larger models.
By evaluating several criteria, practitioners can identify which ones perform best for their specific applications, leading to more efficient model training.

Common Pitfalls

1. Overlooking the importance of initial weight configurations can lead to suboptimal training outcomes.

Reinitializing the surviving weights after pruning discards the original values the mask was derived for. This can degrade the network's performance, because the specific pairing of the mask with the initial weights is critical for successful training.

Related Concepts

Neural Network Pruning Techniques

Weight Initialization Strategies

Transfer Learning And Meta-learning Implications