Researchers from Instituto Superior Técnico are part of the team that will develop AMALIA, the Portuguese large language model (LLM) that was announced by Prime Minister Luís Montenegro on 11 November during the Web Summit in Lisbon. This large-scale language model designed in Portugal will be trained using data in the Portuguese language on national and European supercomputers.
An LLM is a model that uses artificial intelligence to process, understand, and generate text in natural language. It can be used as a component in various systems, such as dialogue systems and chatbots, search systems, and automatic question-answering systems.
AMALIA (Automatic Multimodal Language AI Assistant) will start from a model with around 9 billion parameters (pre-trained on 4 billion words), which will then be fine-tuned on Portuguese data extracted and filtered from Arquivo.PT.
The project will take 18 months to complete and the first version of the model is expected to be ready by the end of the first quarter of 2025.
At the opening of the Web Summit, the Prime Minister listed future applications of this LLM in education (with ‘an artificial intelligence educational tutor’ for each student), access to public administration services (‘simpler, more direct and more personalised’), and business growth (companies will be able to ‘design their services in an era of artificial intelligence also in Portuguese’).
The team responsible for developing AMALIA includes members of Instituto de Telecomunicações (a research centre affiliated with Técnico), Unbabel (a member of the IST spin-off community), NOVA University Lisbon and Fundação para a Ciência e a Tecnologia.
In the media:
- «Another edition of the Web Summit comes to an end, having broken several records» – With Pedro Amaral (SIC Notícias)
- «Portuguese ChatGPT could cost “between 10 and 20 million euros”» – With Arlindo Oliveira (Rádio Renascença)
- «“The Portuguese ChatGPT will be built perfectly on time and will not be small”» (ECO)
- «From Lisbon to Luanda, some believe the “Portuguese-style” ChatGPT will help women and schools» – With Pedro Amaral (Rádio Renascença)
- «Priority of the new Portuguese LLM is to respect and preserve the “sovereignty of the Portuguese language”» (Sapo TEK)
- «AI programme in Portuguese will be the responsibility of Nova and Técnico» (Executive Digest)
- «AI programme in Portuguese is in the hands of Nova and Técnico» (Público)