On 2 December, Unbabel, a member of the Técnico spin-off community, announced the creation of EuroLLM-9B, a language model designed to support all 24 official languages of the European Union. These include European Portuguese, the language with which the AMALIA large language model, developed with the participation of Técnico researchers, will also work. EuroLLM is being developed by a total of nine partners: in addition to Unbabel, they include Técnico, the Instituto de Telecomunicações, and various European higher education institutions.
EuroLLM is intended to counteract the ‘Anglophone bias’ of most large language models, which are designed to work in English and trained on data in that language. This model will therefore be adapted to the cultural and linguistic diversity of the European continent.
Additionally, EuroLLM seeks to counteract the dominance of players such as OpenAI, Google, and Meta, whose models may entail risks related to limited openness (i.e. limited transparency about how the code operates) and possible future restrictions on access.
André Martins, a professor at Técnico and vice president of AI Research at Unbabel, leads the Portuguese team for this project. He describes their efforts as “an exciting first step toward strengthening Europe’s digital sovereignty, which is now more important than ever.” “The goal is for EuroLLM to serve as a catalyst for innovation, providing opportunities for anyone to build upon it,” he says. He also views this project as “a success story for the European supercomputing network and its role in the development of artificial intelligence.”
The model was trained on the MareNostrum 5 supercomputer, installed in Barcelona. The technical information, shared by the EuroLLM team, is available on the project’s website.
In 2023, André Martins received a consolidation grant from the European Research Council (ERC) totalling around 2 million euros. In 2017, he had already won an ERC grant, worth 1.4 million euros, to carry out his research work.