Google Research’s Prediction Depth: Understanding the Laws That Govern How Deep Learning Models Process Data
One way to understand the principles governing how deep learning models process data is to study input data points that have different levels or types of example difficulty. Previous studies have defined example difficulty in different ways: from a statistical view, it refers to the probability of predicting the ground truth label for an example, while from a learning view, it refers to how difficult the example is for a model to learn.
These two notions, however, share two fundamental limitations: they do not capture how data is processed inside a converged model, and they do not distinguish between examples that are difficult for different reasons.
In the paper Deep Learning Through the Lens of Example Difficulty, a Google research team tackles these problems by proposing “prediction depth,” computed from hidden embeddings, as a new measure of example difficulty. Their study reveals the surprising fact that the prediction depth of a given input is strongly related to a model’s uncertainty, confidence, accuracy and speed of learning for that data point.
The researchers use hidden-layer probes to determine example difficulty. They first introduce a computational view of example difficulty parameterized by the prediction depth, then show that prediction depth is both a meaningful and robust notion of example difficulty. They also provide detailed accounts of how prediction depth can be used to better understand three important aspects of deep learning: the accuracy and consistency of a prediction; the order in which data is learned; and the simplicity of the learned function (as measured by the margin) in the vicinity of a data point.
The team conducted empirical analysis across various architectures and datasets to ensure that the results are robust to those choices. The architectures include ResNet18 (He et al., 2016), VGG16 (Simonyan and Zisserman, 2015) and MLPs, trained on the CIFAR10, CIFAR100 (Krizhevsky et al., 2009), Fashion MNIST (FMNIST) (Xiao et al., 2017) and SVHN (Netzer et al., 2011) datasets. In the ResNet18 experiment on CIFAR10, the proposed method increased accuracy from 25% to 98% for inputs that were most “ambiguous without their label”.
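To make the statistical view concrete, here is a minimal Python sketch that estimates, for each example, how often a set of models predicts its ground truth label. The function name `label_agreement_rate` and the idea of averaging over several independently trained models are assumptions made for illustration; the article does not specify how the probability is estimated.

```python
import numpy as np

def label_agreement_rate(preds, labels):
    """Fraction of models that predict the ground truth label, per example.

    preds:  array of shape (n_models, n_examples) with predicted class ids
            (assumed to come from models trained with different seeds).
    labels: array of shape (n_examples,) with ground truth class ids.
    """
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    # Compare every model's prediction with the label, then average over models.
    return (preds == labels[None, :]).mean(axis=0)

# Toy usage: four hypothetical models, three examples.
toy_preds = [[0, 1, 2],
             [0, 2, 2],
             [1, 1, 2],
             [0, 0, 2]]
toy_labels = [0, 1, 2]
rates = label_agreement_rate(toy_preds, toy_labels)
```

A low rate marks an example as statistically difficult, but the number says nothing about where inside the network the prediction is formed, which is the gap prediction depth is meant to address.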
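The probing idea can be sketched in a few lines of Python. The sketch below assumes a k-nearest-neighbour probe on each hidden layer's embeddings and takes the prediction depth to be the earliest layer from which every later probe agrees with the network's final prediction. The function names, the Euclidean distance, the value of `k` and the zero-based layer indexing are illustrative assumptions, not details taken from the article.

```python
import numpy as np

def knn_probe_predict(train_emb, train_labels, query_emb, k=3):
    """Classify query embeddings by majority vote of the k nearest
    training embeddings (Euclidean distance) at one hidden layer."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_emb[:, None, :] - train_emb[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_labels[nearest]  # shape (n_query, k)
    n_classes = int(train_labels.max()) + 1
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    # Ties are broken in favour of the lowest class id.
    return counts.argmax(axis=1)

def prediction_depth(probe_preds, final_preds):
    """Earliest layer index from which all later probes match the final prediction.

    probe_preds: array (n_layers, n_examples), one row of probe outputs per layer.
    final_preds: array (n_examples,), the network's final classifications.
    Returns n_layers when even the last probe disagrees with the network.
    """
    probe_preds = np.asarray(probe_preds)
    final_preds = np.asarray(final_preds)
    agree = probe_preds == final_preds[None, :]
    # suffix_agree[l] is True when layers l, l+1, ... all agree with the final prediction.
    suffix_agree = np.flip(
        np.logical_and.accumulate(np.flip(agree, axis=0), axis=0), axis=0
    )
    return probe_preds.shape[0] - suffix_agree.sum(axis=0)
```

In this sketch, an example whose probes match the final prediction from the first layer onward gets depth 0, while an example whose probes only settle in the last layers gets a large depth and would be treated as difficult.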
Google researchers summarize the study’s contributions as follows:
- Introduce a measure of computational example difficulty: the prediction depth (PD).
- Show that the prediction depth is larger for examples that visually appear more difficult, and that prediction depth is consistent between architectures and random seeds.
- Show empirically that prediction depth appears to establish a linear lower bound on the consistency of a prediction, and that predictions are on average more accurate for validation points with small prediction depths.
- Demonstrate that final predictions for data points that converge earlier during training are typically determined in earlier layers, which links the network’s training history to the processing of data in its hidden layers.
- Show that both the adversarial input margin and the output margin are larger for examples with smaller prediction depths. Design an intervention to reduce the output margin of a network and show that this leads to predictions being made only in the latest hidden layers.
- Identify three extreme forms of example difficulty by considering the prediction depth in the training and validation splits independently, and demonstrate how a simple algorithm that uses the hidden embeddings in an intermediate layer to make predictions can lead to dramatic improvements in accuracy for inputs that strongly exhibit a specific form of example difficulty.
- Use the results to present a coherent picture of deep learning that unifies four seemingly unrelated deep learning phenomena: early layers generalize while later layers memorize; networks converge from the input layer towards the output layer; easy examples are learned first; and networks present simpler functions earlier in training.
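The output margin mentioned in the list can be illustrated with a short Python sketch. It assumes the common definition of output margin as the gap between the largest and second-largest logits; the article does not spell out the definition, and the function name `output_margin` is hypothetical.

```python
import numpy as np

def output_margin(logits):
    """Gap between the largest and second-largest logit for each example.

    logits: array of shape (n_examples, n_classes), with n_classes >= 2.
    """
    logits = np.asarray(logits, dtype=float)
    # Sort each row in ascending order and keep the two largest values.
    top2 = np.sort(logits, axis=-1)[..., -2:]
    return top2[..., 1] - top2[..., 0]

# Toy usage: the second example is classified with a wider margin than the first.
toy_logits = [[2.0, 0.5, 1.0],
              [0.1, 0.1, 3.0]]
margins = output_margin(toy_logits)
```

Under this definition, a larger margin means the winning class is separated more clearly from the runner-up, which is the sense in which the list associates large margins with small prediction depths.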
Overall, the proposed prediction depth notion of example difficulty reveals what paper co-author Behnam Neyshabur calls “surprising relationships with different deep learning phenomena.” The Google team notes that their results derive from the representation of a deep model, which is hierarchical by construction, and that similar results are therefore likely to appear in larger models, larger datasets and tasks other than image classification, although further testing in these and other areas remains to be done.
The researchers say they hope their study can help in the development of models that capture heteroscedastic uncertainty, improve understanding of how deep networks respond to distribution shift, and advance curriculum learning approaches and machine learning fairness.
The paper Deep Learning Through the Lens of Example Difficulty is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang