Analyzing the affinity score of pruned layers

October 20, 2023   
Keywords: machine learning, generalization, regularization, pruning, model compression
Prerequisites: Deep Learning, Statistics
Difficulty: Medium/Hard (M.Sc.). Not suitable for B.Sc.

Abstract

The affinity score is a recently introduced metric that quantifies the non-linearity of a transformation between two random variables X and Y. It can be used to measure the non-linearity of a neural network layer, seen as a transformation from its input to its output. The authors of the paper introducing it show that there appears to be a solid connection between the affinity score and the loss attained by a fully-trained neural network.
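
As a concrete starting point, the sketch below shows one possible way to probe the (non-)linearity of a layer empirically: capture the layer's inputs and outputs with a forward hook and measure how well a least-squares affine map explains the outputs. This is only an illustrative proxy, not the affinity score defined in the paper; the model, layer, and data loader are placeholders, and the approach is only practical for layers of modest dimensionality.

    import torch


    @torch.no_grad()
    def affine_fit_r2(layer, model, loader, device="cpu", max_batches=10):
        """R^2 of the best affine fit from the layer's inputs to its outputs.

        Values close to 1 suggest the layer acts almost linearly on this data;
        lower values indicate stronger non-linearity. Illustrative proxy only.
        """
        xs, ys = [], []

        def hook(_module, inputs, output):
            xs.append(inputs[0].flatten(1).cpu())
            ys.append(output.flatten(1).cpu())

        handle = layer.register_forward_hook(hook)
        model.eval()
        for i, (batch, _target) in enumerate(loader):
            if i >= max_batches:
                break
            model(batch.to(device))
        handle.remove()

        X = torch.cat(xs)                       # (n_samples, d_in)
        Y = torch.cat(ys)                       # (n_samples, d_out)
        X1 = torch.cat([X, torch.ones(X.shape[0], 1)], dim=1)   # add bias column
        W = torch.linalg.lstsq(X1, Y).solution  # affine map minimizing ||X1 W - Y||
        ss_res = (Y - X1 @ W).pow(2).sum()
        ss_tot = (Y - Y.mean(dim=0)).pow(2).sum()
        return float(1.0 - ss_res / ss_tot)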

Neural network pruning is a well-known technique for model compression, i.e., for reducing the memory footprint of a machine learning model. Specifically, pruning removes parameters (i.e., connections) from a neural network according to a given criterion.

A toy model of pruning: a small neural network has some of its parameters removed, leaving its connectivity pattern sparser. The image above illustrates this toy example. Image is own work.
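
As an illustration of what removing parameters according to a given criterion can look like in practice, the sketch below applies unstructured L1 (magnitude) pruning with PyTorch's built-in pruning utilities; the choice of layer types and the default pruning rate are illustrative assumptions, not part of the proposal.

    import torch.nn as nn
    import torch.nn.utils.prune as prune


    def magnitude_prune(model: nn.Module, amount: float = 0.3) -> nn.Module:
        """Zero out the `amount` fraction of smallest-magnitude weights per layer."""
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(module, name="weight", amount=amount)
                prune.remove(module, "weight")  # make the pruning mask permanent
        return model

After pruning, the network is typically re-trained (fine-tuned) for a few epochs before its generalization is evaluated.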

An appealing, yet largely unexplained, property of pruning is that, when applied at a low rate and followed by some epochs of re-training, the resulting network may generalize better than the original, dense model. Pruning can therefore also be seen as a regularizer, in addition to being a model compression technique. This project proposal aims to study the regularization effect of pruning at different rates by identifying possible trends in the non-linearity of the pruned layers.

Required work

  • Literature review on pruning and methods for comparing hidden representations of neural networks
  • Pick multiple datasets, possibly one simple (not MNIST), one of medium difficulty (e.g., CIFAR-10), and one hard (e.g., Tiny-ImageNet or CIFAR-100).
  • Apply pruning at different rates and re-train the pruned models (a rough pipeline sketch is given after this list)
  • Analyze the data (affinity score vs. accuracy)
  • (extra) Extend the work to non-vision datasets
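
A rough sketch of the overall pipeline is given below: starting from a trained dense model, prune at several rates, fine-tune briefly, then record test accuracy together with a per-layer non-linearity measure. Here `fine_tune`, `evaluate`, and `nonlinearity_score` are hypothetical placeholders for the actual training loop, evaluation code, and affinity score implementation; `magnitude_prune` is the sketch shown earlier.

    import copy

    import torch.nn as nn


    def run_experiment(dense_model, train_loader, test_loader,
                       rates=(0.0, 0.2, 0.4, 0.6, 0.8), epochs=5):
        results = []
        for rate in rates:
            model = copy.deepcopy(dense_model)       # fresh copy of the trained baseline
            if rate > 0:
                magnitude_prune(model, amount=rate)  # sketch given earlier
            fine_tune(model, train_loader, epochs=epochs)       # hypothetical helper
            scores = {name: nonlinearity_score(module, model, test_loader)  # hypothetical helper
                      for name, module in model.named_modules()
                      if isinstance(module, (nn.Linear, nn.Conv2d))}
            results.append({"rate": rate,
                            "accuracy": evaluate(model, test_loader),       # hypothetical helper
                            "nonlinearity": scores})
        return results

The resulting records can then be plotted as accuracy versus per-layer non-linearity across pruning rates, which is the analysis step listed above.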

Relevant literature