Mathematics of Machine Learning

Oberseminar "Mathematics of Machine Learning and Applied Analysis" - M.Sc. Albert Alcalde Zafra

Clustering in pure-attention hardmax transformers
Date: 11/20/2024, 2:15 PM - 3:15 PM
Category: event
Location: Hubland Nord, Geb. 40, 01.003
Organizer: Institut für Mathematik
Speaker: M.Sc. Albert Alcalde Zafra - Universität Erlangen

Transformers are extremely successful models in machine learning with poorly understood mathematical properties. In this talk, we rigorously characterize the asymptotic behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of interacting particles in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special particles we call leaders. We leverage this theoretical understanding to solve a sentiment analysis problem from language processing using a fully interpretable transformer model, which effectively captures ‘context’ by clustering meaningless words around leader words carrying the most meaning.
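The particle-system view described in the abstract can be illustrated with a small numerical sketch. The update rule below is an assumption made for illustration and is not taken from the announcement: each particle moves a fraction alpha / (1 + alpha) of the way toward the average of the particles that maximize the inner product <A x_i, x_j>. The names `hardmax_attention_step` and `run_layers`, the matrix `A`, and the step parameter `alpha` are hypothetical, and the normalization sublayer mentioned in the abstract is omitted.

```python
import numpy as np


def hardmax_attention_step(X, A, alpha=1.0, tol=1e-9):
    """Apply one assumed hardmax self-attention layer to the particles in X.

    X has shape (n, d), one particle per row. A has shape (d, d).
    """
    # scores[i, j] = <A x_i, x_j>
    scores = X @ A.T @ X.T
    # Hardmax: particle i attends only to the particles with maximal score.
    best = scores.max(axis=1, keepdims=True)
    mask = (scores >= best - tol).astype(float)
    # Average of the attended particles (ties are weighted equally).
    targets = (mask @ X) / mask.sum(axis=1, keepdims=True)
    # Move each particle part of the way toward its target.
    return X + alpha / (1.0 + alpha) * (targets - X)


def run_layers(X, A, alpha=1.0, n_layers=200):
    """Iterate the layer map, i.e. run a deep pure-attention transformer."""
    for _ in range(n_layers):
        X = hardmax_attention_step(X, A, alpha)
    return X


if __name__ == "__main__":
    # Two particles that attend only to themselves (they act as leaders)
    # and two particles that attend to a leader.
    X0 = np.array([[2.0, 0.0], [1.0, 0.0], [-2.0, 0.0], [-1.0, 0.0]])
    print(run_layers(X0, np.eye(2)))
```

In this toy configuration with `A` equal to the identity, the particles at (2, 0) and (-2, 0) maximize the score against themselves and therefore stay fixed, while the two remaining particles drift toward them layer by layer. This mirrors, in a simplified setting, the clustering around leaders that the talk describes.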

 
