{"id":10484,"date":"2025-05-29T09:58:00","date_gmt":"2025-05-29T07:58:00","guid":{"rendered":"https:\/\/eurocc.nscc.sk\/?p=10484"},"modified":"2025-05-29T09:59:14","modified_gmt":"2025-05-29T07:59:14","slug":"webinar-slovencina-v-ere-velkych-jazykovych-modelov-s-podporou-superpocitaca-leonardo","status":"publish","type":"post","link":"https:\/\/eurocc.nscc.sk\/en\/webinar-slovencina-v-ere-velkych-jazykovych-modelov-s-podporou-superpocitaca-leonardo\/","title":{"rendered":"Slovak Language in the Era of Large Language Models (with the Support of the Leonardo Supercomputer)"},"content":{"rendered":"<div class=\"is-layout-flow wp-block-group alignfull posts-all\"><div class=\"wp-block-group__inner-container\">\n<div class=\"is-layout-flex wp-container-4 wp-block-columns\">\n<div class=\"is-layout-flow wp-block-column\" style=\"flex-basis:60%\">\n<div class=\"is-layout-flow wp-block-group alignfull\"><div class=\"wp-block-group__inner-container\">\n<p><strong>Slovak Language in the Era of Large Language Models (with the Support of the Leonardo Supercomputer)<\/strong><\/p>\n\n\n\n<h5 class=\"post-h\"><hr style=\"border: 1px solid #b8870b; width: 100px;\"><\/h5>\n\n\n\n<p>You are warmly invited to a joint webinar on language modeling, organized by the National Competence Centres for HPC in Slovakia and <a href=\"https:\/\/euroccitaly.it\/en\/\">Italy<\/a>.  
\u00a0The rise of large language models (LLMs), which require vast amounts of training data, initially put users of low-resource languages at a disadvantage.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n<\/div><\/div>\n<\/div>\n\n\n\n<div class=\"is-layout-flow wp-block-column\" style=\"flex-basis:50%\"><div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><a href=\"https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-scaled.jpg\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-1024x576.jpg\" alt=\"\" class=\"wp-image-10485\" srcset=\"https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-1024x576.jpg 1024w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-300x169.jpg 300w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-768x432.jpg 768w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-1536x864.jpg 1536w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-2048x1152.jpg 2048w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-18x10.jpg 18w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-1200x675.jpg 1200w, https:\/\/eurocc.nscc.sk\/wp-content\/uploads\/2025\/05\/Prednasky-1280-x-720-px-4-1980x1114.jpg 1980w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure><\/div><\/div>\n<\/div>\n\n\n\n<h5 class=\"post-h\"><\/h5>\n\n\n\n<p>As part of our project, we are working to overcome this barrier for the Slovak language through several strategies that may also offer methodological insights for other low-resource languages:<\/p>\n\n\n\n<ul>\n<li><strong>Generating Bilingual Datasets:<\/strong>\u00a0Using a carefully curated database of professionally edited Slovak books, we employ 
the LLaMA 3.3 70B Instruct model to translate texts into English and then back into Slovak. This process allows us to create two datasets: one for training a compact open-source model for English-to-Slovak translation, and another for improving the quality of machine-translated Slovak.<\/li>\n\n\n\n<li><strong>Summarizing Scientific Texts:<\/strong>\u00a0Using Gemini Flash Experimental and the PLOS scientific database, we generate summaries of scientific articles in Slovak. This dataset supports the training of Slovak LLMs in the area of specialized scientific terminology.<\/li>\n\n\n\n<li><strong>Enhancing Cultural Context:<\/strong>\u00a0Although models like DeepSeek and ChatGPT perform relatively well in Slovak, they struggle with culturally specific and contextual topics related to Slovakia. We plan to synthesize texts from Slovak sources to create a dataset that fills this gap.<\/li>\n<\/ul>\n\n\n\n<p><strong>Date and Time: <\/strong>June 11, 2025, 10:00 \u2013 11:00 CEST<br><strong>Venue: <\/strong>Online<br><strong>Language: <\/strong>English<br><strong>Speaker: <\/strong>Marek Dobe\u0161<br><strong>Co-authors: <\/strong>Radovan Garab\u00edk and Peter Bedn\u00e1r<br><strong><a href=\"https:\/\/forms.office.com\/e\/UECHKV1gA3\" target=\"_blank\" rel=\"noreferrer noopener\">Registration<\/a><\/strong><\/p>\n\n\n\n<p>Our aim is to mitigate data scarcity for the Slovak language and enhance the performance of LLMs in terms of linguistic accuracy, scientific discourse, and cultural relevance. We believe that the approaches explored in this case study may inspire similar efforts for other low-resource languages.<\/p>\n\n\n\n<p>This research is conducted on high-performance infrastructure, specifically the Slovak national supercomputer Devana and
<a href=\"https:\/\/leonardo-supercomputer.cineca.eu\">Leonardo<\/a>, one of Europe\u2019s most powerful supercomputers, operated by <a href=\"https:\/\/www.cineca.it\/en\">Cineca<\/a> in Italy. These platforms enable us to process multilingual datasets, train models at scale, and test advanced LLM techniques in a resource-efficient way.<\/p>\n\n\n\n<p>Although our case study focuses on Slovak, the methods and tools we are developing are broadly applicable to other underrepresented languages around the world. We warmly invite collaborators from all countries, not only from Central Europe or Italy but from any region where a lack of language data poses a barrier to AI development. Our project demonstrates how European collaboration and shared use of supercomputing resources can open up new possibilities for inclusive, multilingual language modeling, especially for countries that have so far had limited opportunities to contribute to the creation of multilingual language models.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"is-layout-flow wp-block-group alignfull call-bot\"><div class=\"wp-block-group__inner-container\">\n<div class=\"is-horizontal is-content-justification-center is-layout-flex wp-container-5 wp-block-buttons\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"\/en\/calls-for-proposals\/\">BACK<\/a><\/div>\n<\/div>\n<\/div><\/div>\n<\/div><\/div>","protected":false},"excerpt":{"rendered":"<p>You are warmly invited to a joint webinar on language modeling, organized by the National Competence Centres for HPC in Slovakia and Italy. 
The rise of large language models (LLMs), which require vast amounts of training data, initially put users of low-resource languages at a disadvantage.<\/p>","protected":false},"author":2,"featured_media":10485,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"templates\/template-full-width.php","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/posts\/10484"}],"collection":[{"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/comments?post=10484"}],"version-history":[{"count":10,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/posts\/10484\/revisions"}],"predecessor-version":[{"id":11058,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/posts\/10484\/revisions\/11058"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/media\/10485"}],"wp:attachment":[{"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/media?parent=10484"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/categories?post=10484"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/eurocc.nscc.sk\/en\/wp-json\/wp\/v2\/tags?post=10484"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}