Oberseminar "Mathematik des Maschinellen Lernens und Angewandte Analysis" - M.Sc. Albert Alcalde Zafra
Clustering in pure-attention hardmax transformers
| Date: | 20.11.2024, 14:15 - 15:15 |
| Category: | Event |
| Location: | Hubland Nord, Geb. 40, 01.003 |
| Organizer: | Institut für Mathematik |
| Speaker: | M.Sc. Albert Alcalde Zafra - Universität Erlangen |
Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. In this talk, we rigorously characterize the asymptotic behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of interacting particles in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special particles we call leaders. We leverage this theoretical understanding to solve a sentiment analysis problem from language processing using a fully interpretable transformer model, which effectively captures 'context' by clustering meaningless words around leader words carrying the most meaning.
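
To make the particle picture concrete, here is a minimal numerical sketch of one pure-attention hardmax layer. It assumes a simplified update in which each particle moves a step toward the mean of the particles maximizing its attention score and is then rescaled in place of a normalization sublayer; the step size `alpha` and the identity attention matrix are illustrative assumptions, not the exact parametrization of the talk.

```python
import numpy as np

def hardmax_attention_step(X, alpha=0.5):
    """One layer of a simplified pure-attention hardmax transformer.

    Each particle (row of X) attends only to the particles that
    maximize its score <x_i, x_j> (the hardmax), moves a step toward
    their mean, and is rescaled by 1/(1 + alpha) in lieu of a
    normalization sublayer. alpha and the identity attention matrix
    are illustrative choices, not the talk's exact parametrization.
    """
    scores = X @ X.T                       # pairwise scores <x_i, x_j>
    top = scores.max(axis=1, keepdims=True)
    mask = np.isclose(scores, top)         # argmax set for each particle
    targets = (mask @ X) / mask.sum(axis=1, keepdims=True)
    return (X + alpha * targets) / (1.0 + alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))               # 50 random particles in the plane
for _ in range(300):                       # depth (number of layers) grows large
    X = hardmax_attention_step(X)
print(len(np.unique(X.round(3), axis=0)))  # few distinct points remain: clusters
```

Iterating this map, the particles collapse onto a small number of points, heuristically illustrating the clustered equilibrium around leader particles described in the abstract.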


