Webinar amália: Towards a Multimodal LLM for European Portuguese

Join us for an inspiring session on the development of amália, Portugal’s large language model designed to bring the richness of European Portuguese into the new era of multimodal artificial intelligence.

This webinar will feature Prof. João Magalhães from NOVA LINCS, Universidade NOVA de Lisboa, who will present the goals, architecture, and progress of this national AI initiative. The talk will explore how amália combines text, speech, image, and video understanding, and how it contributes to building culturally aligned and trustworthy AI systems for public, academic, and enterprise use.

Date and Time:
Wednesday, November 12th, 2025 | 10:00 AM CET (9:00 AM WET, Lisbon time)
Online | Free Registration

REGISTRATION

This webinar is organized by the Slovak National Supercomputing Centre as part of the EuroCC project (National Competence Centre – NCC Slovakia) in cooperation with NCC Portugal, within the LLM Webinar Series connecting high-performance computing with artificial intelligence, culture, and innovation. The webinar will be held in English.

Abstract:

amália is a government-backed large language model focused on European Portuguese, designed to capture linguistic nuances, cultural context, and multimodal capability across text, speech, image, and video. The project aims for high-impact applications in public administration, research, and industry, with a strong emphasis on trust, alignment, and data sovereignty.

This presentation outlines the process and key methodologies employed by the amália LLM Team at NOVA University and Instituto Superior Técnico (Universidade de Lisboa), with a focus on the use of European HPC resources such as the MareNostrum 5 (MN5) and Deucalion supercomputers. Earlier Portuguese-language initiatives (such as GlórIA) and European initiatives (such as EuroLLM) inform the benchmarks and evaluation and help define the roadmap from beta to public release.

The core development pipeline is explored through three key dimensions:
1. Initial data preparation and training: a multi-month process of transforming noisy HTML and PDF documents into high-quality raw text, followed by tokenization and the core training stages of language modeling and instruction tuning.
2. Model alignment through Direct Preference Optimization (DPO) and reinforcement learning with Group Relative Policy Optimization (GRPO) and verifiable rewards (RLVR), aimed at making responses safer and more trustworthy.
3. Infrastructure setup for RLVR (GRPO) training on MN5, where inference nodes sample model responses and training nodes collect those samples and run the verifiable-reward checks. This highlights the complexity of configuring multiple custom environments (e.g., mathematics, programming, biology).
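The announcement does not show the amália team's code. As a rough, non-authoritative illustration of the alignment ideas named in items 2 and 3, the sketch below shows (a) a toy verifiable reward for a mathematics environment, (b) GRPO-style group-relative advantages, where each sampled response is scored against the other responses to the same prompt, and (c) the per-pair DPO loss. The function names `math_reward`, `group_relative_advantages`, and `dpo_loss`, the "compare the last number with a reference answer" check, the population standard deviation, and the `beta=0.1` default are all assumptions made for illustration.

```python
import math
import re
from statistics import mean, pstdev


def math_reward(completion: str, reference: str) -> float:
    """Hypothetical verifiable reward for a math environment (assumption):
    1.0 if the last number in the completion equals the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward by the mean and standard
    deviation of its own group (all samples drawn for the same prompt).
    If every sample gets the same reward, all advantages are 0."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one preference pair. Inputs are summed log-probabilities of the
    chosen and rejected responses under the policy and the frozen reference model."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) = softplus(-margin), computed without overflow
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Four sampled answers to the same prompt, with reference answer "42"
    completions = ["... so the answer is 42", "I think it is 41", "42", "no idea"]
    rewards = [math_reward(c, "42") for c in completions]
    print(rewards, group_relative_advantages(rewards))
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

In the usage example, the two correct answers receive advantages of roughly +1 and the two wrong ones roughly -1, so no separate learned value model is needed to judge which samples to reinforce. In a distributed setup like the one described in item 3, generating `completions` would be the inference nodes' job, while reward computation and the policy update would run on the training nodes.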

The talk concludes with key insights into the computational and methodological rigor required to efficiently develop state-of-the-art LLMs, positioning this work at the forefront of Europe’s innovation path in AI.

Speaker:

Prof. João Magalhães – CMU Portugal co-Director; Head of the Multimodal Systems Group, NOVA LINCS, Universidade NOVA de Lisboa

Prof. João Miguel da Costa Magalhães is a Full Professor in the Department of Informatics at Universidade NOVA de Lisboa and a senior researcher at NOVA LINCS. He serves as co-Director of the CMU Portugal Program and leads the Multimodal Systems Group. His research focuses on vision-language models, multimodal learning, and AI systems for semantic multimedia. He earned his PhD from Imperial College London and has been a key figure in Portugal’s AI and digital innovation ecosystem.

Topics Include:

Building the amália model: architecture, training, and data curation
Multimodal AI for text, speech, image, and video
Cultural alignment and linguistic sovereignty in LLMs
Evaluation, transparency, and responsible AI governance
Future roadmap and collaboration opportunities

Outline:

Introduction: Why a Portuguese multimodal LLM
From Language to Multimodality: Scope and capabilities of amália
Data and Alignment: Linguistic diversity and cultural fidelity
Model Architecture and Training Process
Status and Roadmap: Beta achievements and next steps
Applications and Impact Scenarios
Discussion and Q&A
