INNER: information theory of deep neural networks

During my PhD, I worked on the WASP project INNER: information theory of deep neural networks, with my co-participants Giuseppe Durisi and Fredrik Kahl. In this project, we aimed to shed light on the generalization capabilities of deep neural networks by leveraging tools from information theory.

The main message of this work is that, as long as the information that a machine learning algorithm extracts from its training data is suitably bounded, its performance on training data gives an accurate indication of its performance on unseen data. Among our findings, we demonstrated that this information-theoretic notion of complexity encompasses several classical ones, and showed that for neural networks, it is often beneficial to consider the information stored in losses rather than in parameters. A popular-science summary of the work is available on YouTube.

A more formal summary of the work is available in my PhD thesis, or in this monograph (written with Giuseppe Durisi, Benjamin Guedj, and Maxim Raginsky). The project resulted in state-of-the-art bounds on the generalization gap for several benchmark deep learning settings, with the work presented at NeurIPS, ISIT, ICML ITR3, and in JSAIT, available at: [1], [2], [3], [4], [5], [6].
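To give a flavor of the bounds in question, a classical result in this line of work, due to Xu and Raginsky, controls the expected generalization gap via the mutual information between the learned weights and the training data. This is a representative example from the literature, not one of the project's specific bounds:

```latex
% For a learning algorithm mapping a training set S = (Z_1, \dots, Z_n)
% of n i.i.d. samples to weights W, with a loss \ell(w, Z) that is
% \sigma-sub-Gaussian under the data distribution, the expected
% generalization gap satisfies
\left| \mathbb{E}\!\left[ L_{\mathcal{D}}(W) - L_{S}(W) \right] \right|
  \le \sqrt{\frac{2\sigma^2 \, I(W; S)}{n}},
% where L_{\mathcal{D}} is the population loss, L_S the empirical
% (training) loss, and I(W; S) is the mutual information between the
% weights and the training data.
```

When $I(W;S)$ is small relative to $n$, training performance transfers to unseen data, which is the precise sense in which bounding the extracted information controls generalization.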


Fredrik Hellström
University College London
London, United Kingdom
