Recording of the Event: Slovak Language in the Era of Large Language Models (with the Support of the Leonardo Supercomputer)

On June 11, 2025, a joint webinar on language modeling was held, organized by the National Competence Centres for HPC in Slovakia and Italy. The discussion focused on the challenges and solutions related to using large language models (LLMs) for less-resourced languages such as Slovak.

Participants were introduced to several innovative approaches aimed at reducing linguistic inequality in the era of artificial intelligence:

  • Generation of bilingual datasets: We used a database of professionally edited Slovak books and the LLaMA 3.3 70B Instruct model to translate between Slovak and English, resulting in datasets for training translation models and improving machine-translated Slovak.
  • Summarization of scientific texts: Using the Gemini Flash Experimental model and the PLOS database, we generated Slovak summaries of scientific articles, contributing to the development of domain-specific language in LLMs.
  • Enhancing cultural context: We are preparing a dataset based on Slovak sources aimed at improving the models’ ability to understand culturally specific topics and local context.

The webinar was hosted by Marek Dobeš, with Radovan Garabík and Peter Bednár as co-authors of the project. The research runs on high-performance computing infrastructure: the Slovak supercomputer Devana and the Italian supercomputer Leonardo, operated by the Cineca supercomputing center.

The case study highlights the potential for applying these methodologies to other low-resource languages. We believe that the insights gained from this project can inspire experts around the world.
