OpenEuroLLM, a new European initiative, aims to develop open-source large language models (LLMs) that cover all EU languages, including those of countries negotiating EU membership.
The project, co-led by Jan Hajič from Charles University and Peter Sarlin from Silo AI, seeks to advance Europe’s digital sovereignty by creating transparent and culturally diverse AI models. The EU has committed €37.4 million for the development of these models, with additional funding from EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands.
Despite its ambitious goals, the project faces skepticism about its feasibility, given the complexity of coordinating more than 20 collaborating organizations. Critics such as Anastasia Stasenko argue that smaller, more focused teams could execute the vision more effectively.
Nonetheless, Hajič remains optimistic, pointing to the project’s preparatory work through the High-Performance Language Technologies (HPLT) initiative, which laid the groundwork for OpenEuroLLM’s datasets, models, and workflows.
The project aims to release the first version of the models by mid-2026, with the final product ready by 2028. OpenEuroLLM plans to develop a multilingual core LLM for general tasks, while smaller versions may be created for more specific use cases. The main challenge lies in balancing linguistic diversity and model quality, particularly for languages with limited digital resources.
OpenEuroLLM shares similar goals with another EU-backed project, EuroLLM, which also focuses on creating an open-source European language model. Although OpenEuroLLM has a significant budget, its focus is on building foundational models for future AI applications in Europe, not on creating consumer-grade products such as chatbots.
The project’s ultimate aim is to ensure digital sovereignty by providing Europe with AI infrastructure that supports its languages and cultures, making it a crucial step for the EU in AI development.