Webinar: How a Large Language Model is Created for a Low-Resource Language

We invite you to join our next expert webinar, which will focus on advances in artificial intelligence for the Slovenian language. The event will take place online in mid-September and will offer a unique opportunity to look behind the scenes of developing a large language model tailored for a low-resource language. The speaker will be Domen Vreš from the Faculty of Computer and Information Science, University of Ljubljana, who has long been engaged in research in the fields of natural language processing and artificial intelligence.

Speaker: Domen Vreš
Affiliation: Faculty of Computer and Information Science, University of Ljubljana
Title of the lecture: Advances in Artificial Intelligence for the Slovenian Language: Training and Scaling the GaMS Large Language Model
Format: Online
Date and Time: 17 September 2025 at 10:00 CEST

About the webinar
This webinar will provide an in-depth look at the development of GaMS, a large language model (LLM) tailored for the Slovenian language. We will begin by exploring the general principles of LLM training and explain why this process requires substantial computational resources and data. Participants will learn about the entire workflow, from data preparation through training to model performance evaluation.

We will then focus on scaling LLM training on EuroHPC infrastructure, specifically on the Leonardo supercomputer. Scaling up made it possible to improve the model's performance and quality. Domen Vreš will also share insights into the technical and organisational aspects of collaborating on such a project.

Finally, we will delve into the specific challenges of developing an LLM for a low-resource language such as Slovenian. What makes this process more demanding? How can the limitations of available data be overcome? And what creative solutions did the team apply in the GaMS model?

A dedicated part of the webinar will address the challenges of evaluating LLMs in low-resource environments, where established benchmarks and comprehensive evaluation datasets are often lacking. The speaker will explain how this gap can be bridged through the collection and integration of human feedback, and how such data is used to further improve and fine-tune the model.

Target audience:
The webinar is suitable for artificial intelligence professionals, natural language processing researchers, students, technology enthusiasts, and anyone interested in the development and applications of large language models for low-resource languages.

Whether you are an AI expert or simply seeking inspiration, this webinar offers a clear and engaging insight into the groundbreaking work behind the GaMS model and its significance for the future of Slovenian language processing.
Don’t miss the opportunity to participate and gain valuable insights directly from an expert at the University of Ljubljana.
