Science and Technology

‘EuroLLM-22B’: Open European artificial intelligence model launched with Técnico’s participation

A European consortium presents a large language model designed to support all 24 official EU languages and trained on European supercomputing resources.

Instituto Superior Técnico participates in the development of EuroLLM-22B, an open large language model (LLM) developed by a European artificial intelligence (AI) research consortium and officially presented on 15 December. The project brings together Técnico researchers from the ELLIS Unit Lisbon and Instituto de Telecomunicações (IT), and aims to create a foundation model for AI applications that reflects Europe’s linguistic diversity.

Técnico’s participation is led by André Martins, a professor at Técnico and researcher at IT whose work in natural language processing is internationally recognised. Describing the model’s development, Martins, who is also the project coordinator, stresses that “it has been a long journey and a team effort at various levels”, involving “data filtering, pre-training, adaptation to long contexts, and post-training”.

The launch of EuroLLM-22B marks a significant milestone for the initiative. “We are very proud to launch EuroLLM-22B today”, says André Martins, adding that “this is another important step towards strengthening AI sovereignty”.

The model is freely available to researchers, academic institutions, start-ups and other organisations, a principle André Martins considers central to the project’s objectives. “We want EuroLLM to become an engine of innovation, allowing anyone to develop technology based on this model”, he says, stressing that lowering barriers to entry is key to accelerating European innovation in AI.

In an international landscape dominated by models from large technology companies, often with restricted access, EuroLLM offers a European alternative designed specifically for the linguistic and cultural diversity of the European Union.

With an initial focus on multilingualism, EuroLLM-22B supports the 24 official EU languages, as well as 11 additional languages considered to be of strategic importance. The EuroLLM roadmap includes future multimodal capabilities such as speech, vision, and video, supported by a new EuroHPC (European High Performance Computing Joint Undertaking) Extreme Scale Access project scheduled to start in 2026.

With 22 billion parameters, EuroLLM-22B is the largest model in the EuroLLM family, following EuroLLM-1.7B and EuroLLM-9B. It was trained from scratch on the MareNostrum 5 supercomputer at the Barcelona Supercomputing Center, using EuroHPC Joint Undertaking infrastructure. Public benchmark results indicate that the model performs competitively on multilingual tasks against global models of similar size.

EuroLLM is available on Hugging Face.

André Martins conducts research in machine learning and natural language processing with competitive European funding, including a European Research Council (ERC) grant awarded in February 2023 for the study of artificial neural networks.