Events

Priberam Machine Learning Lunch Seminar – Marcos Treviso

Date: April 8
Time: 1 p.m.
Location: Anfiteatro PA2, Pavilhão de Matemática, Alameda campus

Speaker: Marcos Treviso (IT/IST)
Title: “Pushing the Limits of Sparse Attention: From Theory to Practical Efficiency”

Abstract:

Adaptive sparse attention mechanisms have emerged as a powerful alternative to dense attention in transformers, offering more interpretability for sequence modeling. Despite this, their widespread adoption has been limited by computational inefficiencies and insufficient understanding of their theoretical properties compared to dense attention models. In this talk, I will present recent advancements in adaptive sparse attention, exploring its expressivity, generalization ability, and hardware-aware optimizations. First, I’ll explore the expressivity of sparsemax attention, showing how it relates to linear attention with selective updates, and why entmax with α=1.5 offers even greater expressive power. Second, I’ll discuss our findings on generalization capabilities, where sparse attention shows superior performance on longer sequences compared to dense attention, particularly when considering an appropriate scaling. Finally, I’ll introduce AdaSplash, our hardware-aware implementation of α-entmax attention that outperforms FlashAttention-2 at high levels of sparsity. Throughout the talk, I’ll highlight how these advances collectively establish adaptive sparse attention as a robust alternative that can redefine the landscape of long sequence modeling.
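As background for the sparsemax attention discussed above, the following is a minimal NumPy sketch of the sparsemax transformation (Martins & Astudillo, 2016), the α=2 case of the α-entmax family: unlike softmax, it projects the score vector onto the probability simplex and can assign exactly zero weight to low-scoring positions. The variable names are illustrative, not taken from the speaker's implementation.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex.
    Low scores below a data-dependent threshold get exactly zero mass."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]               # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum       # positions kept in the support
    k_z = k[support][-1]                      # support size
    tau = (cumsum[k_z - 1] - 1) / k_z         # threshold
    return np.maximum(z - tau, 0.0)

# Attention weights over three keys: the weakest scores are pruned to 0.
scores = np.array([2.0, 1.0, -1.0])
p = sparsemax(scores)                         # sums to 1, with exact zeros
```

Dense softmax would give every position nonzero weight here; sparsemax concentrates all mass on the top-scoring position, which is the source of both the interpretability and the hardware-level sparsity that AdaSplash exploits.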

Biographical Note:

Marcos Treviso is a Postdoctoral Researcher at Instituto de Telecomunicações, where he focuses on advancing sparse attention mechanisms for natural language processing. His research spans theoretical analysis of sparse attention expressivity, generalization to longer contexts, and hardware-efficient implementations. His recent work includes theoretical connections between sparsemax attention and linear attention, studies of sparse attention’s superior generalization to longer sequence lengths, and hardware-aware optimizations for efficient transformers. Marcos earned his Ph.D. with Distinction and Honour from IST, University of Lisbon, under the supervision of Prof. André Martins. He serves as a Reviewer, Area Chair, and Senior Area Chair at major NLP conferences, including ACL, helping drive research in efficient language processing techniques.

Priberam is a member of the IST Spin-Off® Community.

The “Priberam Machine Learning Lunch Seminars” are free to attend, subject to prior registration.

More information and registration.