EuroHPC JU Call for Proposals for Benchmark Access 2024
The purpose of the EuroHPC JU Benchmark Access calls is to support researchers and HPC application developers by giving them the opportunity to test or benchmark their applications on the upcoming/available EuroHPC Pre-exascale and/or Petascale system prior to applying for Extreme Scale and/or Regular Access.
The EuroHPC Benchmark call is designed for code scalability tests or tests of AI applications, the outcome of which is to be included in a proposal to a future EuroHPC Extreme Scale and/or Regular Access call. Users receive a limited number of node hours; the maximum allocation period is three months.
EuroHPC JU Call for Proposals for Development Access 2024
The purpose of the EuroHPC JU Development Access calls is to support researchers and HPC application developers by giving them the opportunity to develop, test and optimise their applications on the upcoming/available EuroHPC Pre-exascale and/or Petascale system prior to applying for Extreme Scale and/or Regular Access. The EuroHPC Development call is designed for projects focusing on code and algorithm development and optimisation, as well as the development of AI application methods.
This can be in the context of research projects from academia or industry, or as part of large public or privately funded initiatives such as Centres of Excellence or Competence Centres. Users will typically be allocated a small number of node hours; the allocation period is one year and is renewable up to two times.
We invite you to a consultation and training meeting with experts from the National Competence Centre for HPC. The event is organised for companies that work with large volumes of data or develop their own products and technologies.
Programme:
How to process data quickly and with modern methods using HPC
Success stories from the portfolio of the National Competence Centre for HPC
Availability of HPC services in Slovakia and in Europe
How a company can take part in free test projects
Training courses and lectures offered by the National Competence Centre for HPC
Tour of the Devana supercomputer
Discussion: a personal consultation with the NCC can be arranged if you are interested
Date: 19 March 2024 (Tuesday), 14:00 to 15:30
Venue: Výpočtové stredisko SAV, Dúbravská cesta 9, 845 35 Karlova Ves
Expert conference Supercomputer and Slovakia in Bratislava (15 Nov): On 14 November 2024, the expert conference Superpočítač a Slovensko (Supercomputer and Slovakia), organised by the Ministry of Investments, Regional Development and Informatization of the Slovak Republic, took place at Hotel Devín in Bratislava. The conference focused on current trends and developments in high-performance computing in Slovakia. The event included a presentation by L. Demovičová of the National Competence Centre for HPC.
High-performance computing conference in Portugal (12 Nov): The 4th High-Performance Computing Meeting 2024, held on 5 and 6 November at the University of Beira Interior in Covilhã, established itself as a key gathering of users, technical staff and partners of the high-performance computing ecosystem in Portugal.
REGISTRATION OPEN: New series of popular science lectures on interesting HPC applications (6 Oct): We have opened registration for a series of lectures in the winter semester 2024 devoted to fascinating topics in which high-performance computing plays a key role. This semester we will focus on areas such as meteorology, climatology, chemistry, large language models and many more.
The International Day of Women and Girls in Science, celebrated annually on February 11th, was proclaimed by the United Nations General Assembly in 2015. This day reminds us that women and girls play a crucial role in science and technology and that their support in this area needs to be further enhanced. The contribution of women in science is invaluable, from basic research to the development of technologies. Women, such as Marie Curie, Rosalind Franklin, and many others, have had a fundamental impact on our understanding of the world.
Nevertheless, to this day, women remain underrepresented in science and often face obstacles such as prejudice and inequality. Women in STEM (Science, Technology, Engineering, and Mathematics) fields face multiple challenges, including gender stereotypes and a lack of mentors and support. They also very often encounter worse conditions for career advancement. These obstacles are complex, and overcoming them requires coordinated efforts at national and international levels.
International Day of Women and Girls in Science is a celebration, but also a call to action. The aim is not only to recognize the contributions of women in science, but also to:
Raise awareness of the importance of gender equality in science and research.
Support policies that open up more opportunities for women and girls in scientific disciplines.
Remove barriers that prevent women from reaching their full potential in science.
Inspire future generations of women to follow their passions in science and technology fields.
The International Day of Women and Girls in Science is an important reminder that despite the progress we have made, there is still a long way to go to achieve full gender equality in science. The aim of this international day is to point out that we need to create a world in which women and girls will be able to contribute to scientific progress and innovation without limitations (professional, social or otherwise).
According to UNESCO data, fewer than 30% of researchers worldwide are women. Therefore, it is essential to support the full and equal access of women and girls to employment in science.
Doc. Mgr. Mariana Derzsi, PhD. received her doctorate in physical chemistry at the Institute of Inorganic Chemistry of the SAS. Right after her studies, she worked at the Polish Academy of Sciences in Krakow, later at the University of Warsaw, and was also a visiting scientist at Chimie ParisTech in Paris and at the University of Padua in Italy. After ten years, she returned to Slovakia and has been working for six years at the Research Institute of Advanced Technologies in Trnava at the Faculty of Materials Technology of the Slovak Technical University, where she founded the Computer Modeling of Materials research group.
When/how did you find out that you wanted to be a scientist and who or what inspired you to study in the STEM field?
From a young age, during my elementary school years, I was drawn to fairy tales and films for teens with a scientific theme, as well as documentaries about nature. In high school, I gradually realized that I wanted to dedicate my life to learning. I spent a school year in the USA, which opened up new horizons for me. I experienced how enriching mastering a foreign language can be, allowing me to delve into another culture, pose new questions, think differently, and truly understand and feel different perspectives on the same matter. Moreover, rote memorization did not engage me; I needed to understand concepts to remember them. Therefore, when choosing a university, it was clear to me that it would involve natural sciences and languages. Thus, I began interdisciplinary studies at Comenius University: physics at the Faculty of Mathematics, Physics, and Informatics (FMFI), and English language at the Faculty of Arts. During the first year, my fascination with physics grew, especially because of the atmosphere at the FMFI faculty. I was engrossed in books by Stephen Hawking and gradually began to read all the international popular science literature on physics that I could find. At FMFI, I met many professors who treated us, the university students, as equal partners. The lectures were inspiring (the ones I attended 🙂), and several lecturers were legends in our eyes. It was all wonderful, but it also intensified the sense of responsibility for my studies. I aspired to be like them. By the end of the first year, I had decided to dedicate myself fully to physics, so I requested a transfer to the scientific physics program and bid farewell to the Faculty of Arts. This was the moment I knew I wanted to be a scientist and consciously chose this path. It was not easy. After this decision, I failed exams multiple times, even finding myself as the only girl in a written exam, and the only one to repeatedly fail while all the boys passed. 
During several supplementary consultations, it was also hinted to me that, as a woman, I should perhaps think more about starting a family. One examiner even asked me why I had come to his exam, noting that girls usually did not choose him, opting instead for examiners with whom they could 'pass' more easily. I did not take it to heart. Those who raised these points did not discriminate against or underestimate me. They were fully committed to me. I think they were simply surprised. When I completed all the exams I had set as my goal before requesting the transfer, I was told that the transfer would not be allowed. This shocked me, but I did not give up and found a way. Since then, I have been following it.
Did your family, friends and teachers support you in your decision to study STEM subjects?
My family, friends and teachers played key roles. For example, my primary school teacher advised my parents to send me to gymnasium. If she hadn't done that, I would have ended up in gardening school. Who knows, maybe that path would have led me to microbiology. My parents were surprised because I was more of an average student or a slacker. But the teacher probably saw that my academic results were more a consequence of my lack of interest than a reflection of my abilities. I am grateful to her for that. That's when I started thinking about myself differently.
My parents never imposed on me what I should study. They let me choose freely. I will never forget how my father told me and set an example for me every day, that I should find in life what will make me happy and fulfilled and do the best I can. It was very nice and I remembered it forever. Close friends and classmates were my support, mirror and challenge. They played an important role in my decision to study physics. I didn't believe in myself and thought that as a girl I didn't belong at FMFI. I have never been to any knowledge competition, I did not excel in natural sciences, and therefore I would not have dared to make this decision myself. They gave me confidence and convinced me that those things don't matter. That I should go after what draws me and not give up when I haven't even tried it. And so I went. I also had classmates who inspired me with their intellect and approach to study. We challenged each other, for example, in speed reading English classic literature for entrance exams.
What would be your message to girls who would like to pursue their interest in STEM subjects but hesitate to enter a field where women are significantly underrepresented?
Go for what draws you and don't give up when you haven't even tried it. And even if you've already tried it, don't give up after the first failures or rejections. If you want it, do it!
Can you please briefly describe what you do now?
I am engaged in computer modeling of molecular and crystal structures of inorganic materials. From the moment I learned that all living and non-living nature is made up of atoms, I have been fascinated by the fact that what essentially distinguishes one object or material from another is primarily the way in which the atoms are arranged. Regardless of the chemical composition, in principle we can achieve some property of the material by arranging the atoms in it appropriately. Not under every condition will it be possible to achieve the desired arrangement of atoms for a given chemical composition, but we can always find conditions under which we can force them to do so. And this is basically what I do. I deal with the predictive modeling of new molecular and crystal structures and the study of the relationship between the structure and the properties of materials that result from the given structure. We can approximate it using the example of diamond and graphite. Both materials are made of the same chemical element - carbon. They differ only in how the carbon atoms in graphite and diamond are arranged. One arrangement makes the material translucent (transmits all light), very hard (we can cut glass with it) and electrically non-conductive (insulator) – that's diamond. A different arrangement causes the material to be opaque, black, because it absorbs all light, it is soft (we easily split the individual layers), and its electrical properties depend on the direction in the crystal (it is more conductive in one direction, less conductive in another) - that is graphite. I focus on the prediction of crystal structures of new transition metal halides and oxides. I focus primarily on binary systems, that is, compounds that consist of two chemical elements. Even with such a simple composition, we observe a fascinating variety of possible atomic arrangements (molecular and crystal structures) and the properties that result from them. 
Understanding the relationship between structure and properties helps us to predict the behavior of the material based on the structure, and conversely from the behavior of the material we can deduce the arrangement of the atoms in it. This applies to all materials, inorganic as well as organic or complex biological systems. This is simply fascinating!
AI-BOOST, an EU-funded project, has launched its first competition, offering up to €250K in equity-free funding to up to four high-potential European SMEs and startups that have the technical capacity and experience to work on large-scale AI models. Large-scale AI models refer to a new generation of general-purpose AI systems that can adapt to various domains and tasks without significant modification. Thanks to their adaptability, these models hold immense potential to revolutionise various industries.
Notable examples of large-scale AI models include OpenAI’s GPT-4 and Meta’s LLaMA 2. It is of strategic importance for European sovereignty to ensure European companies master this field.
SMEs can gain:
LUMI: up to two prizes of €250K and an allocation of 2 million GPU hours per project on the LUMI supercomputer
Leonardo: up to two prizes of €250K and an allocation of 2 million GPU hours per project on the Leonardo supercomputer
The main objective of the AI-BOOST project, which will also open up further calls in the future, is to support scientific progress in the main areas of artificial intelligence. European small and medium-sized enterprises and startups with high potential, which have technical capacities and experience with working on Large-Scale AI models, can apply.
Securing Tomorrow's Digital Frontiers: Qubit Conference® Sets New Cybersecurity Trends in Central Europe
The event, known for its high-quality lineup of expert lectures and top-notch speakers from around the world, has become a hub for building relationships, sharing progressive ideas, technologies, and strategies that shape the future of digital security for both small and large enterprises. The Qubit Conference® has become a must-attend event for professionals who want to keep up with the latest trends and innovations in cybersecurity, resulting in its growing not only in the number of participants year by year, but also in its impact on the industry.
The conference annually welcomes over 500 participants from dozens of countries, of which an average of 44% are new attendees seeking innovation and challenges in the field of cybersecurity. According to the latest Qubit Conference® satisfaction survey, 100% of the participants proudly state that they would recommend the conference to anyone looking for exceptional experiences in cybersecurity.
Uncovering Trends, Solutions, and a Unique Opportunity for Networking
Digital trends, as well as the threats associated with them, are always at the center of attention. The conference regularly hosts representatives from technology companies that are at the forefront of innovations in cybersecurity. Presentations and workshops focus on introducing revolutionary solutions that change the paradigm in the fight against cyber threats.
Participants can expect concrete examples of implementations and guidance on how to integrate these innovations into their security strategies. The experience is not only theoretical, from the audience, but also practical, through interactive workshops where each participant can try out the implementation of new technologies and solutions. This provides numerous opportunities to acquire new skills and understand their practical applications.
The Qubit Expo showcases companies that bring the latest innovations in software, hardware, data analysis, and other areas that push the boundaries of what is possible in digital security. With rapid technological advances, there comes a unique opportunity for networking. Discussions and workshops provide participants with opportunities to create new professional or business connections.
The Qubit Conference® is undoubtedly an event that pushes the boundaries of knowledge, innovation, and networking in the field of cybersecurity. This is evidenced by the latest Qubit Conference® satisfaction survey, in which a significant majority, 97% of participants, express interest in attending future workshops and training sessions prepared by Qubit. Additionally, 94% of participants confirm that all educational activities of the Qubit Conference® are focused on current trends. Furthermore, for more than four fifths of the participants (83%), the conference represents an excellent opportunity for creating new professional relationships or acquiring potential customers.
For the first time on the stage, cyber protection for small and medium-sized enterprises
The Qubit Conference® definitely confirms its position as a leading technology event in Central Europe. This autumn, it launched a series of webinars and cybersecurity training sessions for small and medium-sized enterprises (SMEs). This initiative responds to the growing need to secure digital infrastructure for smaller businesses, which are increasingly exposed to complex cyber threats.
The discussion on the stage was rich in expert perspectives from professionals with extensive experience in cyber protection. The topic of discussion was not only the current threats but also specific strategies and tools that SMEs can implement to enhance their cyber resilience. The concluding panel discussion offered the attending SMEs an invaluable opportunity to interact directly with the experts and obtain specific advice for their individual needs.
The year 2024 will open the debate on the human touch in advanced technologies
The Qubit Conference® proves that it is more than just a one-time event. The organizers have already announced plans to expand and strengthen the community that is emerging within the conference. This will include a regular series of workshops and meetings, which will allow cybersecurity, IT, and innovation experts to maintain contacts and collaborate even after the conference itself has ended.
As Katarína Gamboš, the Senior Event Producer of Qubit Conference®, states, "The year 2024 will not only bring a new beginning but also a revolution in how we perceive and integrate the human touch into advanced technologies. The theme of the upcoming Qubit Conference® Prague 2024, 'Bringing humans back to cyber,' introduces into the discussion the value of the human factor in an era where technological progress moves at an incredible speed. 'Bringing humans back to cyber' is not just a slogan, but also an opportunity for experts from around the world to share their views and ideas on how we should perceive and incorporate the human factor into technological innovations. At Qubit Conference® Prague 2024, our goal is to open a discussion about how we, as individuals, can ensure that technological progress contributes to the benefit of the whole society and maintains ethical standards."
Graduate students and postdoctoral scholars from institutions in Australia, Europe, Japan and the United States are invited to apply for the 13th International High Performance Computing (HPC) Summer School, to be held on 7-12 July, 2024 in Kobe, Japan, hosted by the RIKEN Center for Computational Science. Applications to participate in the summer school will be accepted until 23:59 AOE January 31, 2024.
The summer school is sponsored by the RIKEN Center for Computational Science (R-CCS), the EuroHPC Joint Undertaking (EuroHPC JU), the Pawsey Supercomputing Research Centre (Pawsey) and the ACCESS program. Additional sponsors, who will conduct separate, internal selection processes, include EPCC (U.K.) and NICIS CHPC (South Africa). It is important to note that certain places for the 2024 school are still being offered on a preliminary basis and will be confirmed subject to funding availability.
The summer school will familiarize the best students in computational sciences with major state-of-the-art aspects of HPC and Big Data Analytics for a variety of scientific disciplines, catalyze the formation of networks, provide advanced mentoring, facilitate international exchange and open up further career options.
Leading computational scientists and HPC technologists from partner regions will offer instruction in parallel sessions on a variety of topics such as:
HPC and Big Data challenges in major scientific disciplines
Shared-memory programming
Distributed-memory programming
GPU programming
Performance analysis and optimization on modern CPUs and GPUs
Software engineering
Numerical libraries
Big Data analytics
Deep learning
Scientific visualization
Canadian, European, Japanese, Australian and U.S. HPC-infrastructures
The expense-paid program will benefit scholars from Australian, European, Japanese and U.S. institutions who use advanced computing in their research.
The ideal candidate will have many of the following qualities; however, this list is not meant to be a “checklist”, and applicants need not meet all criteria:
A familiarity with HPC, not necessarily an HPC expert, but rather a scholar who could benefit from including advanced computing tools and methods into their existing computational work
A graduate student with a strong research plan or a postdoctoral fellow in the early stages of their research careers
Regular practice with, or interest in, parallel programming
Applicants from any research discipline are welcome, provided their research activities include computational work.
The first two days of the program comprise two tracks that run concurrently. You need to choose your preferred track in your application.
An introduction to shared-memory parallelism and accelerator programming.
Advanced distributed-memory programming.
School fees, meals and housing will be covered for all accepted applicants to the summer school. Reasonable flight costs will also be covered for those travelling to/from the school.
Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data
Many businesses spend large amounts of resources on communicating with clients. Usually, the goal is
to provide clients with information, but sometimes there is also a need to request specific information
from them.
In addressing this need, there has been a significant effort put into the development of chatbots
and voicebots, which on one hand serve the purpose of providing information to clients, but they can
also be utilized to contact a client with a request to provide some information.
A specific real-world example is to contact a client, via text or via phone, to update their postal address. The address may have possibly changed over time, so a business needs to update this information
in its internal client database.
Nonetheless, when requesting such information through novel channels, such as chatbots or voicebots,
it is important to verify the validity and format of the address. In such cases, address information
usually comes as free-form text input or as a speech-to-text transcription. Such inputs may contain
substantial noise or variations in the address format. To this end, it is necessary to filter out the noise
and extract corresponding entities, which constitute the actual address. This process of extracting
entities from an input text is known as Named Entity Recognition (NER). In our particular case we
deal with the following entities: municipality name, street name, house number, and postal code. This
technical report describes the development and evaluation of a NER system for extraction of such
information.
Problem Description and Our Approach
This work is a joint effort of the Slovak National Competence Center for High-Performance Computing
and nettle, s.r.o., a Slovak-based start-up focusing on natural language processing, chatbots,
and voicebots. Our goal is to develop a highly accurate and reliable NER model for address parsing. The
model accepts both free text and speech-to-text transcribed text. Our NER model constitutes
an important building block in real-world customer care systems, which can be employed in various
scenarios where address extraction is relevant.
The challenging aspect of this task was handling data that was available exclusively in the Slovak
language. This makes our choice of a baseline model very limited.
Currently, there are several publicly available NER models for the Slovak language. These models
are based on the general purpose pre-trained model SlovakBERT [1]. Unfortunately, all these models
support only a few entity types, while the support for entities relevant to address extraction is missing.
A straightforward utilization of popular Large Language Models (LLMs) like GPT is not an option
in our use cases because of data privacy concerns and time delays caused by calls to these rather
time-consuming LLM APIs.
We propose fine-tuning SlovakBERT for NER. The NER task in our case is actually a classification task at the token level. We aim to achieve proficiency in recognizing address entities with only a
tiny number of real-world examples available. In Section 2.1 we describe our dataset as well as the data
creation process. The significant lack of available real-world data prompts us to generate synthetic
data to cope with data scarcity. In Section 2.2 we propose SlovakBERT modifications in order to train
it for our task. In Section 2.3 we explore iterative improvements in our data generation approach.
Finally, we present model performance results in Section 3.
Data
The aim of the task is to recognize street names, house numbers, municipality names, and postal codes
from the spoken sentences transcribed via speech-to-text. Only 69 instances of real-world collected
data were available. Furthermore, all of those instances were highly affected by noise, e.g., natural
speech hesitations and speech transcription glitches. Therefore, we use this data exclusively for testing.
Table 1 shows two examples from the collected dataset.
Artificial generation of a training dataset emerged as the only, yet still viable, option to tackle the
problem of data shortage. Inspired by the 69 real instances, we programmatically made numerous
external API calls to OpenAI to generate similar realistic-looking examples. The BIO annotation scheme [2]
was used to label the dataset. This scheme is a method used in NLP to annotate tokens in a sequence
as the beginning (B), inside (I), or outside (O) of entities. We are using 9 annotations: O, B-Street,
I-Street, B-Housenumber, I-Housenumber, B-Municipality, I-Municipality, B-Postcode, I-Postcode.
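The BIO scheme described above can be illustrated with a small sketch. The nine labels come from the text; the helper `decode_bio`, the example sentence, and its tags are hypothetical and only show how B/I/O tags group tokens into entities.

```python
# Illustrative sketch of the BIO scheme; not the authors' implementation.
ENTITY_TYPES = ["Street", "Housenumber", "Municipality", "Postcode"]

# "O" plus a B- and an I- label for each entity type: 1 + 4 * 2 = 9 labels.
LABELS = ["O"] + [
    f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")
]


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    An I- tag that does not continue an open entity of the same type
    is treated like O (a simplifying assumption of this sketch).
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the previous one if it is open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Made-up example sentence (not taken from the dataset).
tokens = ["bývam", "na", "Dúbravská", "cesta", "9", "v", "Bratislave", "845", "35"]
tags = ["O", "O", "B-Street", "I-Street", "B-Housenumber",
        "O", "B-Municipality", "B-Postcode", "I-Postcode"]
entities = decode_bio(tokens, tags)
```

Multi-word entities such as a street name or a split postal code are recovered by following a B- tag with I- tags of the same type.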
We generated data in multiple iterations as described below in Section 2.3. Our final training
dataset consisted of more than 10^4 sentences/address examples. For data generation we used the GPT3.5-turbo API along with some prompt engineering. Since data generation through this API is
limited by the number of tokens (both generated and prompt tokens), we could not pass the
list of all possible Slovak street names and municipality names within the prompt. Hence, data was
generated with the placeholders streetname and municipalityname, which were subsequently replaced
by randomly chosen street and municipality names from the list of street and municipality names,
respectively. A complete list of Slovak street and municipality names was obtained from the web pages
of the Ministry of Interior of the Slovak republic [3].
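The placeholder substitution can be sketched as follows. This is only one possible implementation under stated assumptions: the sentences are already tokenized and tagged, each placeholder is a single token, and multi-word names expand into a B- tag followed by I- tags. The name lists, the example sentence, and the helper `fill_placeholder` are hypothetical.

```python
import random


def fill_placeholder(tokens, tags, placeholder, label, names, rng):
    """Replace a placeholder token by a random name and expand its BIO tags."""
    out_tokens, out_tags = [], []
    for token, tag in zip(tokens, tags):
        if token == placeholder:
            name_tokens = rng.choice(names).split()
            out_tokens.extend(name_tokens)
            # The first token of the name begins the entity, the rest continue it.
            out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(name_tokens) - 1))
        else:
            out_tokens.append(token)
            out_tags.append(tag)
    return out_tokens, out_tags


# Hypothetical excerpts; the real lists come from the address register [3].
streets = ["Hlavná", "Dlhá ulica", "Námestie slobody"]
municipalities = ["Nitra", "Banská Bystrica"]

tokens = ["Bývam", "na", "streetname", "5", "municipalityname"]
tags = ["O", "O", "B-Street", "B-Housenumber", "B-Municipality"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
tokens, tags = fill_placeholder(tokens, tags, "streetname", "Street", streets, rng)
tokens, tags = fill_placeholder(tokens, tags, "municipalityname", "Municipality", municipalities, rng)
print(list(zip(tokens, tags)))
```

Keeping the names out of the prompt this way means the prompt length does not grow with the size of the name lists.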
Using the generative model behind the OpenAI API, we obtained natural-sounding sentences without
having to write the data manually, which sped up the process significantly. However,
this approach had its downsides. The generated dataset contained many mistakes,
mainly wrong annotations, which had to be corrected manually. The generated dataset was split so that 80% was used for model training, 15% for validation, and 5% as synthetic test data,
so that we could compare the performance of the model on real test data as well as on artificial test
data.
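A minimal sketch of the 80/15/5 split, assuming a simple random shuffle. The text does not say how the split was performed, so the shuffling, the seed, and the function name `split_dataset` are illustrative assumptions.

```python
import random


def split_dataset(examples, train_frac=0.80, val_frac=0.15, seed=42):
    """Shuffle and split into train / validation / test; the test set gets the remainder."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy stand-in for the generated sentences.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```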
Model Development and Training
Two general-purpose pre-trained models were used and compared: SlovakBERT [1] and a distilled
version of this model [4]. Herein we refer to the distilled version as DistilSlovakBERT. SlovakBERT
is an open-source model pre-trained on the Slovak language with a Masked Language Modeling (MLM)
objective. It was trained on a general Slovak web-based corpus, but it can be easily adapted to new
domains to solve new tasks [1]. DistilSlovakBERT is a pre-trained model obtained from the SlovakBERT
model by a method called knowledge distillation, which significantly reduces the size of the model
while retaining 97% of its language understanding capabilities.
We modified both models by adding a token classification layer, obtaining in both cases models
suitable for NER tasks. The final classification layer consists of 9 neurons corresponding to the 9 entity
annotations: each of the 4 address parts is represented by two annotations (beginning and
inside), and one additional annotation marks the absence of any entity. The number of parameters for each model
and its components is summarized in Table 2.
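The count of 9 output neurons follows directly from the label set: 4 entity types with 2 tags each, plus the outside tag. The sketch below builds the label list and the two index mappings that a token classification head typically needs. The variable names are assumptions, and the text does not state which library was used to attach the layer.

```python
ENTITY_TYPES = ["Street", "Housenumber", "Municipality", "Postcode"]

# "O" first, then a B-/I- pair for every entity type.
labels = ["O"] + [f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")]

label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
num_labels = len(labels)  # size of the classification layer

print(num_labels, labels)
```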
Model training was highly susceptible to overfitting. To tackle this and further improve the
training process, we used a linear learning rate scheduler, weight decay, and other hyperparameter tuning strategies.
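As an illustration of a linear learning rate schedule, the sketch below decays the rate linearly to zero over the training steps, with an optional linear warmup. The warmup phase, the base learning rate, and the step counts are assumptions; the report does not list the actual hyperparameters.

```python
def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical settings: 100 optimization steps, base learning rate 1e-4.
for step in (0, 25, 50, 75, 100):
    print(step, linear_lr(step, total_steps=100, base_lr=1e-4))
```

A steadily shrinking step size late in training is one common way to limit overfitting in the final epochs, alongside weight decay.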
The computing resources of the HPC system Devana, operated by the Computing Centre, Centre of
Operations of the Slovak Academy of Sciences, were used for model training, specifically
a GPU node with one NVIDIA A100 GPU. For more convenient data analysis and debugging, we used an
interactive environment based on Open OnDemand, which gives researchers remote web
access to supercomputers.
Training required only 10 to 20 epochs to converge for both models. In the described
HPC setting, one training epoch over the 9492 samples of the training
dataset took on average 20 seconds for SlovakBERT and 12 seconds for DistilSlovakBERT. Inference on 69 samples takes 0.64
seconds for SlovakBERT and 0.37 seconds for DistilSlovakBERT, which demonstrates the models' efficiency
in real-time NLP pipelines.
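The timings above can be converted into training throughput and per-sentence inference latency with simple arithmetic. The sketch uses only the figures reported in the text; the variable names are illustrative.

```python
TRAIN_SAMPLES = 9492
TEST_SAMPLES = 69

epoch_seconds = {"SlovakBERT": 20, "DistilSlovakBERT": 12}
inference_seconds = {"SlovakBERT": 0.64, "DistilSlovakBERT": 0.37}

stats = {}
for name in epoch_seconds:
    throughput = TRAIN_SAMPLES / epoch_seconds[name]               # samples per second
    latency_ms = 1000 * inference_seconds[name] / TEST_SAMPLES     # ms per sentence
    stats[name] = (throughput, latency_ms)
    print(f"{name}: {throughput:.0f} training samples/s, {latency_ms:.1f} ms per sentence")
```

This works out to roughly 475 and 791 training samples per second, and about 9.3 ms and 5.4 ms per sentence at inference, for SlovakBERT and DistilSlovakBERT respectively.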
Iterative Improvements
Although only 69 instances of real data were available, their complexity was quite challenging to
imitate in generated data. The generated dataset was created using several different prompts, resulting
in 11,306 sentences that resembled human-generated content. The work consisted of a number of
iterations. Each iteration can be split into the following steps: generate data, train a model, visualize
the prediction errors on the real and artificial test datasets, and analyze them. This way we identified
patterns that the model failed to recognize. Based on these insights we generated new data that
followed these newly identified patterns. The patterns we devised in the various iterations are presented
in Table 3. Both models were trained on each newly expanded dataset, with SlovakBERT's
accuracy always exceeding that of DistilSlovakBERT. Therefore, we decided to use
only SlovakBERT as the base model from then on.
Results
The confusion matrix for the model trained in Iteration 1 (see
Table 3) is displayed in Table 4. This model correctly recognized only 67.51% of entities in the test dataset. A granular examination of the errors revealed that the training dataset did not represent
real-world sentences well enough and that more, and more representative,
data was needed. Table 4 shows that the most common error was identifying a municipality as a
street. We noticed that this occurred when the municipality name appeared before the street name in the
address. This led to the data generation in Iteration 2 and Iteration 3.
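The error analysis described above can be sketched as counting (true tag, predicted tag) pairs and listing the off-diagonal ones. The tags below are hypothetical and only mimic the municipality-as-street error; the token-level accuracy computed here is an assumption, since the text does not define exactly how the reported percentage was calculated.

```python
from collections import Counter


def confusion_counts(true_tags, pred_tags):
    """Count how often each (true, predicted) tag pair occurs."""
    return Counter(zip(true_tags, pred_tags))


def accuracy(true_tags, pred_tags):
    """Share of tokens whose predicted tag matches the true tag."""
    correct = sum(t == p for t, p in zip(true_tags, pred_tags))
    return correct / len(true_tags)


# Hypothetical example: the municipality precedes the street and is mislabeled.
true_tags = ["B-Municipality", "B-Street", "I-Street", "B-Housenumber", "O"]
pred_tags = ["B-Street", "B-Street", "I-Street", "B-Housenumber", "O"]

counts = confusion_counts(true_tags, pred_tags)
errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
print(errors)
print(accuracy(true_tags, pred_tags))
```

Sorting `errors` by count points to the patterns that are worth targeting in the next round of data generation.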
This process of detailed analysis of prediction errors and subsequent data generation accounts for
most of the improvements in the accuracy of our model. The goal was to achieve more than 90%
accuracy on the test data. The model's predictive accuracy kept increasing with systematic data generation.
Eventually, the whole dataset was duplicated, with the duplicates being in uppercase/lowercase. (The
pre-trained model we used is case sensitive, and some test instances contained street and municipality
names in lowercase.) This made the model more robust to the form in which it receives input and led
to a final accuracy of 93.06%. The confusion matrix of the final model can be seen in Table 5.
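A minimal sketch of such a case-based duplication, assuming each example is a pair of token and tag lists and that the added copy is fully lowercased. The exact casing scheme used for the duplicates is not specified in the text, so this is one plausible variant; the function name `add_case_variants` is hypothetical.

```python
def add_case_variants(examples):
    """Return the dataset with a lowercased copy appended after each example."""
    augmented = []
    for tokens, tags in examples:
        augmented.append((tokens, tags))
        # Tags are unchanged: only the surface form of the tokens differs.
        augmented.append(([token.lower() for token in tokens], tags))
    return augmented


# Hypothetical example.
examples = [(["Hlavná", "12", "Nitra"], ["B-Street", "B-Housenumber", "B-Municipality"])]
augmented = add_case_variants(examples)
for tokens, tags in augmented:
    print(tokens, tags)
```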
There are still some errors; notably, tokens that should have been tagged as outside were occasionally misclassified as municipality. We have opted not to tackle this issue further, as it happens
on words that may resemble subparts of our entity names, but, in reality, do not represent entities
themselves. See an example below in Table 6.
Conclusions
In this technical report we trained a NER model built upon the pre-trained SlovakBERT language model as
the base. The model was trained and validated exclusively on an artificially generated dataset. This
representative, high-quality synthetic data was iteratively expanded. Together with hyperparameter tuning, this iterative approach allowed us to reach a predictive accuracy exceeding
90% on the real dataset. Since the real dataset contained a mere 69 instances, we decided to use it only for testing.
Despite the limited amount of real data, our model exhibits promising performance. This approach
highlights the potential of using an exclusively synthetic dataset, especially in cases where the amount
of real data is not sufficient for training.
This model can be used in real-world applications within NLP pipelines to extract and verify the
correctness of addresses transcribed by speech-to-text mechanisms. If a larger real-world dataset
becomes available, we recommend retraining the model and possibly also expanding the synthetic dataset with
more generated data, as the existing dataset might not represent newly occurring data
patterns. The model is available at https://huggingface.co/nettle-ai/slovakbert-address-ner
Acknowledgement
The research results were obtained with the support of the Slovak National competence centre for
HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement
101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.
AUTHORS
Bibiána Lajčinová – Slovak National Supercomputing Centre
Patrik Valábek – Slovak National Supercomputing Centre; Institute of Information Engineering, Automation, and Mathematics, Slovak University of Technology in Bratislava
[1] Matús Pikuliak, Stefan Grivalsky, Martin Konopka, Miroslav Blsták, Martin Tamajka, Viktor Bachratý, Marián Simko, Pavol Balázik, Michal Trnka, and Filip Uhlárik. Slovakbert: Slovak masked language model. CoRR, abs/2109.15254, 2021.
[2] Lance Ramshaw and Mitch Marcus. Text chunking using transformation-based learning. In Third Workshop on Very Large Corpora, 1995.
[3] Ministerstvo vnútra Slovenskej republiky. Register adries. https://data.gov.sk/dataset/register-adries-register-ulic. Accessed: August 21, 2023.
[4] Ivan Agarský. Hugging face model hub. https://huggingface.co/crabz/distil-slovakbert, 2022. Accessed: September 15, 2023.
Intent Classification for Bank Chatbots through LLM Fine-Tuning (12 Sep). This article evaluates the use of large language models for intent classification in a chatbot with predefined responses, intended for banking-sector websites. We focus on the effectiveness of the SlovakBERT model and compare it with multilingual generative models, such as Llama 8b instruct and Gemma 7b instruct, in both their pre-trained and fine-tuned versions. The results indicate that SlovakBERT outperforms the other models in both classification accuracy and the rate of false-positive predictions.
Leveraging LLMs for Efficient Religious Text Analysis (5 Aug). The analysis and research of texts with religious themes have historically been the domain of philosophers, theologians, and other social sciences specialists. With the advent of artificial intelligence, such as large language models (LLMs), this task takes on new dimensions. These technologies can be leveraged to reveal various insights and nuances contained in religious texts, interpreting their symbolism and uncovering their meanings. This acceleration of the analytical process allows researchers to focus on specific aspects of texts relevant to their studies.
Mapping Tree Positions and Heights Using PointCloud Data Obtained Using LiDAR Technology (25 Jul). The aim of the collaboration between the National Supercomputing Centre (NSCC) and the company SKYMOVE, within the National Competence Centre for HPC project, was to design and implement a pilot software solution for processing data acquired by LiDAR (Light Detection and Ranging) technology mounted on drones.
New call for proposal for the Slovak scientific community: access to Leonardo supercomputer
The Leonardo Consortium, composed of six European countries led by Italy, procured and in November 2022 put into operation the currently sixth most powerful supercomputer in the world. Slovakia, as one of the consortium members, provides its expertise in the field of HPC through the Computing Center of the Slovak Academy of Sciences and offers high-level technical and engineering support to user communities. Thanks to this collaboration, Slovak users have a unique opportunity to participate in a national call and gain access to the Leonardo system.
The supercomputer Leonardo has a performance of approximately 250 PFlop/s, and the total allocation available for Slovak projects is 56,000 GPU node-hours and 25,000 CPU node-hours per year. Therefore, in collaboration with the National Supercomputing Center, the Computing Center of the Slovak Academy of Sciences is opening the first call for proposals to access Leonardo's computing resources. Given the size of the allocation, support will be provided to a smaller number of projects, primarily those that require simultaneous utilization of a large number of computing nodes.
Access is open to all fields of science and research, and eligible applicants are from Slovak public universities or institutions of the Slovak Academy of Sciences. Supported projects should enable progress and innovation in their chosen area, with added value for addressing societal and/or technological challenges in Slovakia.
Applications used in projects should be thoroughly tested, demonstrating high efficiency and scalability on HPC systems or the need for extensive simulations that require a significant amount of CPU/GPU time. These should be highly parallelized applications capable of efficiently utilizing the available resources, the allocation of which would be challenging on the current national HPC infrastructure (the Devana supercomputer). The computational power requirement and resource utilization must be clearly and comprehensively described in the proposal. You can find the specifications of individual Leonardo modules HERE.
The call is open until January 31, 2024. The evaluation results and the selected projects will be published two weeks after the deadline, and successful applicants will be informed about the next steps via email. Individual projects will be evaluated by the expert staff of VS SAV and NSCC with regard to scientific contribution and the most efficient use of computing capacities. Projects with a maximum duration of 12 months can be submitted through the user portal register.nscc.sk. After registration, fill in the form in the Projects / Leonardo project section.
Before submitting an application, we ask interested parties to thoroughly familiarize themselves with the conditions of this call.
In case of any questions or uncertainties, please contact us at eurocc@nscc.sk
Expert conference Superpočítač a Slovensko (Supercomputer and Slovakia) in Bratislava (15 Nov). On 14 November 2024, the expert conference Superpočítač a Slovensko, organized by the Ministry of Investments, Regional Development and Informatization of the Slovak Republic, took place at Hotel Devín in Bratislava. The conference focused on current trends and developments in high-performance computing in Slovakia. The event included a presentation by L. Demovičová of the National Competence Centre for HPC.
High-performance computing conference in Portugal (12 Nov). The 4th High-Performance Computing Meeting 2024, held on 5 and 6 November at the University of Beira Interior in Covilhã, established itself as a key gathering of users, technical staff, and partners of the high-performance computing ecosystem in Portugal.
REGISTRATION OPEN: New series of popular science lectures on interesting HPC applications (6 Oct). We have opened registration for a series of lectures in the winter semester of 2024, devoted to fascinating topics in which high-performance computing plays a key role. This semester we will focus on areas such as meteorology, climatology, chemistry, large language models, and many more.