The National Competence Center for High-Performance Computing (HPC) has established a new strategic partnership with IT Valley Košice under the HPC Ambassador program. This collaboration aims to strengthen technological innovation and development in eastern Slovakia, contributing to the growth of the innovation ecosystem across the country.
Goals and Vision
The partnership focuses on promoting the adoption of HPC technologies among members of IT Valley Košice, including companies, academic institutions, and research organizations. Through this effort, we aim to create an environment that fosters the development of talent and innovative companies capable of competing on the global stage.
IT Valley Košice strives to build a technologically advanced business environment in eastern Slovakia, and this partnership significantly contributes to establishing the region as a center of excellence for business, research, and education.
Practical Collaboration
The National Competence Center for HPC will provide relevant information, training, and services to IT Valley Košice members, while IT Valley Košice will promote these opportunities and help identify organizations ready to leverage HPC technologies. Members will gain access to top-notch support and expert consultations.
Additionally, IT Valley Košice will play a key role in facilitating knowledge and technology transfer between academia and the IT industry. Educational events, workshops, and competitions will be organized to enhance the region's innovation potential. We look forward to successful collaboration with IT Valley Košice and to projects that will support the innovative and entrepreneurial environment in Slovakia.
In today's rapidly changing market, it is crucial for small and medium enterprises (SMEs) to effectively leverage new technologies to stay competitive. One of the innovative solutions that can significantly advance SMEs is High-Performance Computing (HPC).
If you are interested in how HPC can improve your business, don't hesitate to join a special online event organized by the National Competence Centre for HPC. The event will take place online on September 4, 2024. Registration is mandatory.
Why You Shouldn't Miss It
The event will focus on the possibilities of using HPC in Central Europe and provide practical information and examples of how even smaller companies can use this technology to enhance their business processes. HPC can help speed up product development, optimize manufacturing processes, improve service quality, and reduce costs, which is especially important for SMEs looking for ways to gain a competitive edge.
What Can You Look Forward To?
During the event, you will have the opportunity to hear real case studies from various industries that demonstrate how HPC has helped small and medium-sized businesses achieve their goals. You'll learn how engineering companies use HPC for simulations and optimization of design proposals or how pharmaceutical companies utilize this technology to accelerate the development of new drugs.
Collaboration and Support
Another important topic will be the collaboration between SMEs and technology centers. You will learn how these organizations can provide the necessary infrastructure and expertise that SMEs need to utilize HPC. Experts from the National Competence Centers for HPC in Central Europe will also present opportunities to access modern computing resources that would otherwise be financially inaccessible.
Register Today!
Don't miss this unique opportunity and join the event on September 4, 2024, which will take place online. It's a chance to gain valuable information for free, make new connections, and discover how HPC can take your business to the next level. Registration is open, so don't hesitate and secure your spot today!
We also have a giveaway for event participants!
Get inspired and find out how you can gain a competitive advantage in the global market with HPC!
Leveraging LLMs for Efficient Religious Text Analysis
The analysis and research of texts with religious themes have historically been the domain of philosophers, theologians, and other social sciences specialists. With the advent of artificial intelligence, such as large language models (LLMs), this task takes on new dimensions. These technologies can be leveraged to reveal various insights and nuances contained in religious texts — interpreting their symbolism and uncovering their meanings — far faster than manual analysis alone. This acceleration of the analytical process allows researchers to focus on specific aspects of texts relevant to their studies.
One possible research task in the study of texts with religious themes involves examining the works of authors affiliated with specific religious communities. By comparing their writings with the official doctrines and teachings of their denominations, researchers can gain deeper insights into the beliefs, convictions, and viewpoints of the communities shaped by the teachings and unique contributions of these influential authors.
This report proposes an approach utilizing embedding indices and LLMs for efficient analysis of texts with religious themes. The primary objective is to develop a tool for information retrieval, specifically designed to efficiently locate relevant sections within documents. The identification of discrepancies between the retrieved sections of texts from specific religious communities and the official teaching of the particular religion the community originates from is not part of this study; this task is entrusted to theological experts.
This work is a joint effort of the Slovak National Competence Center for High-Performance Computing and the Faculty of Theology at Trnava University. Our goal is to develop a tool for information retrieval using LLMs to help theologians analyze religious texts more efficiently. To achieve this, we are leveraging the resources of the HPC system Devana to handle the computations and large datasets involved in this project.
Dataset
The texts used for the research in this study originate from the religious community known as the Nazareth Movement (commonly referred to as ”Beňovci”), which began to form in the 1970s. The movement, which some scholars identify as having sect-like characteristics, is still active today, in a reduced and altered form. Its founder, Ján Augustín Beňo (1921–2006), was a secretly ordained Catholic priest during the totalitarian era. Beňo encouraged members of the movement to actively live their faith through daily reading of biblical texts and applying them in practice through specific resolutions. The movement spread throughout Slovakia, with small communities existing in almost every major city. It also spread to neighboring countries such as Poland, the Czech Republic, Ukraine, and Hungary. In 2000, the movement included approximately three hundred married couples, a thousand children, and 130 priests and students preparing for priesthood. The movement had three main goals: radical prevention in education, fostering priests who could act as parental figures to identify and nurture priestly vocations in children, and the production and distribution of samizdat materials needed for catechesis and evangelization.
Twenty-seven documents with texts from this community are available for research. These documents, which significantly influenced the formation of the community and its ideological positions, were reproduced and distributed during the communist regime in the form of samizdats — literature banned by the communist regime. After the political upheaval, many of them were printed and distributed to the public outside the movement. Most of the analyzed documents consist of texts intended for ”morning reflections” — short meditations on biblical texts. The documents also include the founder’s comments on the teachings of the Catholic Church and selected topics related to child rearing, spiritual guidance, and catechesis for children.
Although the documents available to us contained a few duplications, this did not pose a problem for the information retrieval task and will thus remain unaddressed in this report. All of the documents are written exclusively in the Slovak language.
One of the documents is annotated for test purposes by experts from the partner faculty, who have long been studying the Nazareth Movement. By annotations, we refer to text parts labeled as belonging to one of five classes, where these classes represent five topics, namely:
Directive obedience
Hierarchical upbringing
Radical adoption of life model
Human needs fulfilled only in religious community and family
Strange/Unusual/Intense
Additionally, each of these topics is supplemented with a set of queries designed to test the retrieval capabilities of our solution.
Strategy/Solution
There are multiple strategies appropriate for solving this task, including text classification, topic modelling, retrieval-augmented generation (RAG), and fine-tuning of LLMs. However, the theologians’ requirement is to identify specific parts of the text for detailed analysis, necessitating the retrieval of exact wording. Therefore, we chose to rely on information retrieval alone. This approach differs from RAG, which typically combines an information retrieval component with text generation, in that it focuses solely on retrieving textual data, without the additional step of generating new content.
Information retrieval leverages LLMs to transform complex data such as text, into a numerical representation that captures the semantic meaning and context of the input. This numerical representation, known as embedding, can be used to conduct semantic searches by analysing the positions and proximity of embeddings within a multi-dimensional vector space. By using queries, the system can retrieve relevant parts of the text by measuring the similarity between the query embeddings and the text embeddings. This approach does not require any fine-tuning of the existing LLMs, therefore the models can be used without any modification and the workflow remains quite simple.
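As an illustration of this retrieval step, the following minimal Python sketch ranks pre-computed chunk embeddings by cosine similarity to a query embedding. The function names and the top-k interface are illustrative, not part of the actual pipeline.

```python
import numpy as np

def cosine_scores(query_emb: np.ndarray, chunk_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of chunk vectors."""
    q = query_emb / np.linalg.norm(query_emb)
    m = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    return m @ q

def retrieve(query_emb: np.ndarray, chunk_embs: np.ndarray, chunks: list[str], top_k: int = 5):
    """Return the top_k chunks most similar to the query embedding."""
    scores = cosine_scores(query_emb, chunk_embs)
    best = np.argsort(scores)[::-1][:top_k]
    return [(chunks[i], float(scores[i])) for i in best]
```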
Model choice
Four pre-trained language models were leveraged to acquire vector representations of the chunked text: Slovak-BERT, the multilingual E5 model, OpenAI’s text-embedding-3-small, and BGE M3. Their specific contributions will be discussed in the following parts of the study.
Data preprocessing
The first step of data preprocessing involved text chunking. The primary reason for this step was to meet the requirement of religious scholars for retrieval of paragraph-sized chunks. Besides, documents needed to be split into smaller chunks anyway due to the limited input lengths of some LLMs. For this purpose, the Langchain library was utilized. It offers hierarchical chunking that produces overlapping chunks of a specific length (with a desired overlap) to ensure that the context is preserved. Chunks with lengths of 300, 400, 500 and 700 symbols were generated. Subsequent preprocessing steps included removal of diacritics, case normalization according to the requirements of the models and stopwords removal. The removal of stopwords is a common practice in natural language processing tasks. While some models may benefit from the exclusion of stopwords to improve relevancy of retrieved chunks, others may take advantage of retaining stopwords to preserve contextual information essential for understanding the text.
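For illustration, a chunking step of this kind can be sketched with Langchain's RecursiveCharacterTextSplitter. The input file name and the overlap value below are placeholders, as the report does not state the exact overlap used.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter  # newer releases: langchain_text_splitters

document_text = open("document.txt", encoding="utf-8").read()  # placeholder input file

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # chunk length in characters; 300, 400, 500 and 700 were generated
    chunk_overlap=50,  # illustrative overlap to preserve context across chunk boundaries
)
chunks = splitter.split_text(document_text)
```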
Vector Embeddings
Vector embeddings were created from text chunks using selected pre-trained language models.
For the Slovak-BERT model, generating embeddings involves running the model without any additional layers for inference and then using the embedding of the first token, which aggregates the semantic meaning of the whole chunk, as the context embedding. The other models produce embeddings in the required form, so no further postprocessing was needed.
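A minimal sketch of this procedure with the Hugging Face transformers library is shown below; the checkpoint name "gerulata/slovakbert" is assumed to be the published Slovak-BERT model.

```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "gerulata/slovakbert"  # assumed Hugging Face checkpoint for Slovak-BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def embed_chunk(text: str) -> torch.Tensor:
    """Use the hidden state of the first token as the context embedding of the chunk."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0, 0]
```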
In the subsequent results section, the performance of all created embedding models will be analyzed and compared based on their ability to capture and represent the semantic content of the text chunks.
Results
Prior to conducting quantitative tests, all embedding indices underwent preliminary evaluation to determine the level of understanding of the Slovak language and the specific religious terminology by the selected LLMs. This preliminary evaluation involved subjective judgement of the relevance of retrieved chunks.
These tests revealed that the E5 model embeddings exhibit limited effectiveness on our data. When retrieving for a specific query, the retrieved chunks contained most of the key words used in the query, but did not capture the context of the query. One explanation could be that this model prioritizes word-level matches over nuanced context in Slovak, possibly because its Slovak training data was less extensive or less contextually rich, leading to weaker performance. However, these observations are not definitive conclusions but rather hypotheses based on current, limited results. A decision was made not to further evaluate the performance of the embedding indices leveraging E5 embeddings, as this seemed irrelevant given their inability to effectively capture the nuances of the religious texts. On the other hand, the abilities of the Slovak-BERT model, based on the comparatively simple RoBERTa architecture, exceeded expectations. Moreover, the performance of text-embedding-3-small and BGE M3 embeddings met expectations, as the first, subjectively evaluated test demonstrated a very good grasp of the context, proficiency in the Slovak language, and understanding of the nuances within the religious texts.
Therefore, quantitative tests were performed only on embedding indices utilizing Slovak-BERT, OpenAI’s text-embedding-3-small and BGE M3 embeddings.
Given the problem specification and the nature of test annotations, there arises a potential concern regarding the quality of the annotations. It is possible that some text parts were misclassified as there may be sections of text that belong to multiple classes. This, combined with the possibility of human error, can affect the consistency and accuracy of the annotations.
With this consideration in mind, we have opted to focus solely on recall evaluation. By recall, we mean the proportion of correctly retrieved chunks out of the total number of annotated chunks, regardless of the fraction of false positive chunks. Recall will be evaluated for every topic and for every length-specific embedding index for all selected LLMs.
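In code, the recall of a single topic and embedding index can be expressed as the following illustrative helper, assuming chunk identifiers are comparable between the retrieved and annotated sets:

```python
def recall(retrieved_ids: set[int], annotated_ids: set[int]) -> float:
    """Fraction of annotated chunks that were retrieved; false positives are ignored."""
    if not annotated_ids:
        return 0.0
    return len(retrieved_ids & annotated_ids) / len(annotated_ids)
```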
Moreover, the provided test queries might also reflect the complexity and interpretative nature of religious studies. For example, consider the query ”God’s will” for the topic Directive obedience. While a careful reader understands how this query relates to the given topic, it might not be as clear to a language model. Therefore, apart from evaluating with the provided queries, another evaluation was conducted using queries acquired through contextual augmentation. Contextual/query augmentation is a prompt engineering technique for enhancing text data quality that is well documented in the research literature. This technique involves prompting a language model to generate a new query based on the initial query and other contextual information in order to formulate a better query. The language model used for generating queries through this augmentation technique was GPT-3.5, and these queries will be referred to as ”GPT queries” throughout the rest of the report.
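The query augmentation step can be sketched as follows with the OpenAI chat API; the prompt wording is purely illustrative and not the exact prompt used in the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment_query(query: str, topic: str) -> str:
    """Ask the model to reformulate a terse query into a fuller, topic-aware one."""
    prompt = (
        f"Topic: {topic}\n"
        f"Original query: {query}\n"
        "Reformulate the query into a short Slovak sentence that captures the topic context."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```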
Slovak-BERT embedding indices
Recall evaluation for embedding indices utilizing Slovak-BERT embeddings, for four different chunk sizes with and without stopwords removal, is presented in Figure 1. The evaluation covers each topic specified in the list in the Dataset section and includes both original queries and GPT queries.
We observe that GPT queries generally yield better results compared to the original queries, except for the last two topics, where both sets of queries produce similar results. It is also apparent that Slovak-BERT-based embeddings benefit from stopwords removal in most cases. The highest recall values were achieved for the third topic, Radical adoption of life model, with a chunk size of 700 symbols and removed stopwords, reaching more than 47%. In contrast, the worst results were observed for the topic Strange/Unusual/Intense, where neither the original nor the GPT queries successfully retrieved relevant parts. In some cases none of the relevant parts were retrieved at all.
Figure 1. Recall values obtained for all topics using both original and GPT queries, across various chunk sizes of embeddings generated using the Slovak-BERT model. Embedding indices marked as +SW include stopwords, while -NoSW indicates stopwords were removed.
OpenAI’s text-embedding-3-small embedding indices
Similar to the evaluation of the Slovak-BERT embedding indices, evaluation charts for embedding indices utilizing OpenAI’s text-embedding-3-small embeddings are presented in Figure 2. The recall values are generally much higher than those observed with Slovak-BERT embeddings. As with the previous results, GPT queries produce better outcomes. We can observe a subtle trend in the dependency of recall on chunk size – longer chunk sizes generally yield higher recall values.
An interesting observation can be made for the topic Radical adoption of life model. When using the original queries, hardly any relevant results were retrieved. However, when using GPT queries, recall values were much higher, reaching almost 90% for chunk sizes of 700 symbols.
Regarding the removal of stopwords, its impact on embeddings varies. For topics 4 and 5, stopwords removal proves beneficial. However, for the other topics, this preprocessing step does not offer advantages.
Topics 4 and 5 exhibited the weakest performance among all topics. This may be due to the nature of the queries provided for these topics, which are quotes or full sentences, compared to the queries for other topics, which are phrases, keywords or expressions. It appears that this model performs better with the latter type of queries. On the other hand, since the queries for topics 4 and 5 are full sentences, their embeddings benefit from stopwords removal, as it probably helps in handling the context of sentence-like queries.
Topic 4 is very specific and abstract, while topic 5 is very general, making it understandable that capturing this topic in queries is challenging. The specificity of topic 4 might require more nuanced test queries, as the provided test queries probably did not contain all nuances of a given topic. Conversely, the general nature of topic 5 might benefit from a different analytical approach. Methods like Sentiment Analysis could potentially grasp the strange, unusual, or intense mood in relation to the religious themes analysed.
BGE M3 embedding indices
Evaluation charts for embedding indices utilizing BGE M3 embeddings are presented in Figure 3. The recall values demonstrate a performance falling between Slovak-BERT and OpenAI’s text-embedding-3-small embeddings. While not always reaching the recall values of OpenAI’s embeddings, BGE M3 embeddings show competitive performance, particularly considering their open-source availability; OpenAI’s embeddings are accessible only through an API, which may pose a problem for data confidentiality.
With these embeddings, we also observe the same phenomenon as with OpenAI’s text-embedding-3-small embeddings: shorter, phrase-like queries are preferred over quote-like queries. Recall values are therefore higher for the first three topics.
Stopwords removal seems to be mostly beneficial, mainly for the last two topics.
Conclusion
This paper presents an approach for the analysis of texts with religious themes using numerical text representations known as embeddings, generated by three selected pre-trained large language models: Slovak-BERT, OpenAI’s text-embedding-3-small and the BGE M3 embedding model. These models were selected after an evaluation showed that their proficiency in the Slovak language and religious terminology is sufficient to handle the task of information retrieval for the given set of documents.
Challenges related to the quality of test queries were addressed using a query augmentation technique. This approach helped in formulating appropriate queries, resulting in more relevant retrieval of text chunks and capturing the nuances of the topics that interest theologians.
Evaluation results proved the effectiveness of the embeddings produced by these models, particularly text-embedding-3-small from OpenAI, which exhibited strong contextual understanding and linguistic proficiency. The recall value for this model’s retrieval abilities varied depending on the topic and queries used, with the highest values reaching almost 90% for the topic Radical adoption of life model when using GPT queries and a chunk length of 700 symbols. Generally, text-embedding-3-small performed best with the longest chunk lengths studied, showing a trend of increasing recall with increasing chunk length. The topic Strange/Unusual/Intense had the lowest recall, possibly due to the uncertainty in the topic specification.
For Slovak-BERT embedding indices, the recall values were slightly lower, but still impressive given the simplicity of this language model. Better results were achieved using GPT queries, with the best recall value of 47.1% for the topic Radical adoption of life model at a chunk length of 700 symbols, with embeddings created from chunks with removed stopwords. Generally, this embedding model benefited most from the stopwords removal preprocessing step.
As for BGE M3 embeddings, the results were impressive, achieving high recall, though not as high as OpenAI’s embeddings. Considering that BGE M3 is an open-source model, these results are remarkable.
These findings highlight the potential of leveraging LLMs for specialized domains such as the analysis of texts with religious themes. Future work could explore the connections between text chunks using clustering techniques applied to the embeddings, to discover hidden associations and inspirations of the texts’ authors. For theologians, future work lies in examining the retrieved text parts to identify deviations from the official teaching of the Catholic Church, shedding light on the movement’s interpretations and insights.
Acknowledgment
Research results were obtained with the support of the Slovak National competence centre for HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement 101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.
Computational resources were procured in the national project National competence centre for high performance computing (project code: 311070AKF2) funded by European Regional Development Fund, EU Structural Funds Informatization of society, Operational Program Integrated Infrastructure.
Bibiána Lajčinová – Slovak National Supercomputing Centre
Jozef Žuffa – Faculty of Theology, Trnava University
Milan Urbančok – Faculty of Theology, Trnava University
References:
[1] Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšťák, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, and Filip Uhlárik. Slovakbert: Slovak masked language model, 2021.
[2] Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2024.
[3] Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. Multi-lingual e5 text embeddings: A technical report, 2024.
The Starmus Science Festival, known for its unique combination of scientific lectures, music, and art, was held this year in May in Bratislava. This unique festival attracted numerous science enthusiasts who enjoyed inspirational presentations and discussions with leading scientists from around the world.
The festival was not just about passively watching lectures. Participants had the opportunity to engage directly in various scientific demonstrations and interactive activities that provided practical examples of scientific principles. The demonstration booths were popular spots where visitors could try out different experiments and technologies.
In the panel discussions, experts also addressed ethical questions and the societal impact of new technologies. These discussions provided a deeper understanding of the challenges we face in connection with rapid technological advancement.
One of the most intriguing lectures of the festival was given by Neil Lawrence, titled "What Makes Us Unique in the Age of AI." Neil Lawrence, a renowned scientist in the field of artificial intelligence, covered a wide range of topics related to our uniqueness in an era of rapid AI development. He discussed how we can preserve human values and abilities at a time when artificial intelligence is increasingly penetrating our lives. The lecture was inspiring and provided deep insights into the future of human-technology interaction.
Neil Lawrence spoke about the importance of an interdisciplinary approach in science and technological progress. His presentation emphasized how crucial collaboration between different fields is to achieving significant scientific discoveries. He pointed out that the combination of various scientific disciplines can lead to new and groundbreaking insights.
Another part of the lecture focused on the latest discoveries in space. Lawrence used visualizations and animations to explain complex concepts in a simple way, which appealed to the general audience. During the lecture, he also addressed the history of space exploration. He illustrated his points with numerous historical photographs and videos, which added an authentic and informative character to his presentation. Another significant topic was breakthroughs in genetics and biotechnology. Lawrence explained how these new technologies have the potential to treat previously incurable diseases and improve the quality of life for many people. In discussions about artificial intelligence, he emphasized its ability to transform various sectors, including medicine, transportation, and education. He stressed the importance of ethics and responsibility in the development and implementation of AI technologies.
Another crucial point in his lecture was the advancements in renewable energy sources and sustainability. He highlighted the importance of investing in solar and wind energy and innovative technologies that can help reduce the carbon footprint. The discussion also focused on global initiatives and cooperation between different countries to address climate change. Lawrence also focused on the latest technologies in medicine, particularly AI, explaining how artificial intelligence helps doctors diagnose diseases more quickly and accurately. He also spoke about how new technologies enable personalized medicine tailored to individual patients.
Another topic was quantum technology in computers and communication. He emphasized how quantum computers can revolutionize various sectors, including medicine and finance, by allowing faster and more efficient information processing. He also discussed the importance of oceans for our planet and the need to protect them. He warned about threats like pollution and climate change that endanger the marine ecosystem and stressed the need for international cooperation in ocean protection.
Finally, he addressed the issue of space debris and its impact on future space missions. He discussed technologies and strategies being developed to address the growing problem of debris in orbit. The final part of the lecture focused on the challenges and benefits of integrating neuroscience and artificial intelligence. He discussed how AI can help understand and treat neurological disorders and how studying the brain can contribute to the development of smarter AI systems.
Lawrence also talked about ecological innovations and their potential to change the way we live and work. He discussed how new technologies can contribute to sustainable development and reduce the negative impact on the environment. He also addressed the development of space technologies and their potential to improve life on Earth. He spoke about how space research contributes to progress in areas such as material sciences, energy, and communication.
The last discussion focused on the importance of education in science and technology for future generations. Lawrence emphasized the need for investments in educational programs that promote critical thinking and innovative solutions to global problems. He also discussed virtual reality (VR) technology and its applications in education and healthcare. He explained how VR can enhance learning by providing immersive and interactive environments and how it can help patients in rehabilitation and therapy.
The latest information emphasized the importance of interdisciplinary research and collaboration between different scientific fields. Lawrence explained how the combination of various expertise can lead to innovative solutions to complex global problems such as climate change and health crises (pandemics).
Scientific and Artistic Projects
The festival also brought a discussion about the future of art and science. Collaboration between artists and scientists was showcased through various multimedia projects that demonstrated how these two worlds can come together to create innovative and inspiring works.
The Starmus festival is not only a celebration of science but also a platform for sharing knowledge and inspiration. This year's edition in Bratislava once again highlighted the importance of dialogue between science and the public. It allowed scientists, artists, and the general public to meet, discuss, and jointly seek solutions to current global challenges. We are already looking forward to the next editions and the new discoveries they will bring.
About Starmus Festival
Starmus is a festival of science, art, and music created by Garik Israelian, PhD., an astrophysicist from the Institute of Astrophysics of the Canary Islands (IAC), and Sir Brian May, PhD., an astrophysicist and the lead guitarist of the iconic rock band Queen. It consists of presentations by astronauts, cosmonauts, Nobel laureates, thinkers, and prominent figures from various scientific and musical fields. Starmus brings these exceptional people together to share their knowledge and experiences and to jointly seek answers to humanity's big questions.
Stephen Hawking Medal for Science Communication
In 2015, Stephen Hawking and Alexei Leonov, along with Brian May, created the Stephen Hawking Medal for Science Communication, awarded to individuals and teams for significant contributions to science communication. Previous recipients of the Stephen Hawking Medal include Dr. Jane Goodall, Elon Musk, Neil deGrasse Tyson, Brian Eno, Hans Zimmer, and the documentary Apollo 11.
This year's Starmus festival brought a wealth of new knowledge and inspiration that will resonate with participants and the general public for a long time. The festival once again confirmed its important role in promoting science, art, and education worldwide.
Real-life Examples of HPC Utilization in Poland, Czech Republic, and Slovakia
Today, we hosted an informative webinar that highlighted the potential of high-performance computing (HPC) through real-life success stories and engaging projects supported by National Competence Centers for HPC. In addition to examples from the Slovak NCC, the webinar also showcased the expertise and experiences of neighboring competence centers in the Czech Republic and Poland.
Michal Pitoňák shared experiences from four successful HPC use cases, including the transfer and optimization of CFD computational workflows in the HPC environment, anomaly detection in time series to prevent gambling using deep learning, entity identification for address extraction from transcribed interviews using synthetic data, and measurement of structural parameters of capsules using AI and ML techniques. Tomáš Karásek presented examples of using artificial intelligence to solve engineering problems focused on energy and transportation. Szymon Mazurek introduced the SpeakLeash initiative, a community-driven project to develop a national large language model (LLM) ecosystem in Poland.
Don't miss the next opportunity to get inspired and discover how HPC can support innovation and success in your projects.
Join us for the next webinar on September 4th from 14:00 to 15:30.
Register on our website EuroCC Slovakia and secure your spot at this event full of inspiring and practical insights.
Mapping Tree Positions and Heights Using PointCloud Data Obtained Using LiDAR Technology
The goal of the collaboration between the Slovak National Supercomputing Centre (NSCC) and the company SKYMOVE within the National Competence Center for HPC project was to design and implement a pilot software solution for processing data obtained using LiDAR (Light Detection and Ranging) technology mounted on drones.
Data collection
LiDAR is an innovative method of remote distance measurement that is based on measuring the travel time of laser pulse reflections from objects. LiDAR emits light pulses that hit the ground or object and return to the sensors. By measuring the return time of the light, LiDAR determines the distance to the point where the laser beam was reflected.
LiDAR can emit 100k to 300k pulses per second, capturing dozens to hundreds of pulses per square meter of the surface, depending on specific settings and the distance to the scanned object. This process creates a point cloud (PointCloud) consisting of potentially millions of points. Modern LiDAR use involves data collection from the air, where the device is mounted on a drone, increasing the efficiency and accuracy of data collection. In this project, drones from DJI, particularly the DJI M300 and Mavic 3 Enterprise (Fig. 1), were used for data collection. The DJI M300 is a professional drone designed for various industrial applications, and its parameters make it suitable for carrying LiDAR.
The DJI M300 drone was used as a carrier for the Geosun LiDAR (Fig. 1). This is a mid-range, compact system with an integrated laser scanner and a positioning and orientation system. Given the balance between data collection speed and data quality, the data was scanned from a height of 100 meters above the surface, allowing for the scanning of larger areas in a relatively short time with sufficient quality.
The collected data was geolocated in the S-JTSK coordinate system (EPSG:5514) and the Baltic Height System after adjustment (Bpv), with coordinates given in meters or meters above sea level. In addition to LiDAR data, aerial photogrammetry was performed simultaneously, allowing for the creation of orthophotomosaics. Orthophotomosaics provide a photographic record of the surveyed area in high resolution (3 cm/pixel) with positional accuracy up to 5 cm. The orthophotomosaic was used as a basis for visual verification of the positions of individual trees.
Figure 1. DJI M300 Drone (left) and Geosun LiDAR (right).
Data classification
The primary dataset used for the automatic identification of trees was a LiDAR point cloud in LAS/LAZ format (uncompressed and compressed form). LAS files are a standardized format for storing LiDAR data, designed to ensure efficient storage of large amounts of point data with precise 3D coordinates. LAS files contain information about position (x, y, z), reflection intensity, point classification, and other attributes necessary for LiDAR data analysis and processing. Due to their standardization and compactness, LAS files are widely used in geodesy, cartography, forestry, urban planning, and many other fields requiring detailed and accurate 3D representations of terrain and objects.
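In Python, such a file can be inspected with the laspy library, for example as sketched below; the file name is a placeholder, and reading compressed LAZ additionally requires the lazrs or laszip backend.

```python
import numpy as np
import laspy

las = laspy.read("area.las")                  # placeholder path to a LAS/LAZ tile
points = np.vstack((las.x, las.y, las.z)).T   # 3D coordinates in metres
classes = np.asarray(las.classification)      # per-point class codes
high_vegetation = points[classes == 5]        # class 5 = high vegetation
```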
The point cloud needed to be processed into a form that would allow for an easy identification of individual tree or vegetation points. This process involves assigning a specific class to each point in the point cloud, known as classification.
Various tools can be used for point cloud classification. Given our positive experience, we decided to use the Lidar360 software from GreenValley International [1]. In the point cloud classification, the individual points were classified into the following categories: unclassified (1), ground (2), medium vegetation (4), high vegetation (5), buildings (6). A machine learning method was used for classification, which, after being trained on a representative training sample, can automatically classify points of any input dataset (Fig. 2).
The training sample was created by manually classifying points in the point cloud into the respective categories. For the purposes of automated tree identification in this project, the ground and high vegetation categories are essential. However, for the best classification results of high vegetation, it is also advisable to include other classification categories. The training sample was composed of multiple smaller areas from the entire region including all types of vegetation, both deciduous and coniferous, as well as various types of buildings. Based on the created training sample, the remaining points of the point cloud were automatically classified. It should be noted that the quality of the training sample significantly affects the final classification of the entire area.
Figure 2. Example of a point cloud of an area colored using an orthophotomosaic (left) and the corresponding classification (right) in CloudCompare.
Data segmentation
In the next step, the classified point cloud was segmented using the CloudCompare software [2]. Segmentation generally means dividing classified data into smaller units – segments that share common characteristics. The goal of segmenting high vegetation was to assign individual points to specific trees.
For tree segmentation, the TreeIso plugin in the CloudCompare software package was used, which automatically recognizes trees based on various height and positional criteria (Fig. 3). The overall segmentation consists of three steps:
Grouping points that are close together into segments and removing noise.
Merging neighboring point segments into larger units.
Composing individual segments into a whole that forms a single tree.
The result is a complete segmentation of high vegetation. These segments are then saved into individual LAS files and used for further processing to determine the positions of individual trees. A significant drawback of this tool is that it operates only in serial mode, meaning it can utilize only one CPU core, which greatly limits its use in an HPC environment.
Figure 3. Segmented point cloud in CloudCompare using the TreeIso plugin module.
As an alternative method for segmentation, we explored the use of orthophotomosaics of the studied areas. Using machine learning methods, we attempted to identify individual tree crowns in the images and, based on the determined geolocation coordinates, identify the corresponding segments in the LAS file. For detecting tree crowns from the orthophotomosaic, the YOLOv5 model [3] with pretrained weights from the COCO128 database [4] was used. The training data consisted of 230 images manually annotated using the LabelImg tool [5]. The training run consisted of 300 epochs, with images divided into batches of 16 samples, and their size was set to 1000x1000 pixels, which proved to be a suitable compromise between computational demands and the number of trees per section. The insufficient quality of this approach was particularly evident in areas with dense vegetation (forested areas), as shown in Figure 4. We believe this was due to the insufficient robustness of the chosen training set, which could not adequately cover the diversity of the image data (especially across different vegetative periods). For these reasons, we did not develop segmentation from photographic data further and focused solely on segmentation in the point cloud.
Figure 4. Tree segmentation in the orthophotomosaic using the YOLOv5 tool. The image illustrates the problem of detecting individual trees in the case of dense vegetation (continuous canopy).
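For reference, the tested crown-detection inference with a trained YOLOv5 checkpoint can be sketched as follows; the weight and image paths are placeholders.

```python
import torch

# Load custom crown-detection weights trained on the annotated orthophoto tiles.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # placeholder weights

results = model("tile_1000x1000.png", size=1000)   # inference on one 1000x1000 px tile
detections = results.pandas().xyxy[0]              # bounding boxes, confidences and classes
print(detections[["xmin", "ymin", "xmax", "ymax", "confidence"]])
```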
To fully utilize the capabilities of the Devana supercomputer, we deployed the lidR library [6] in its environment. This library, written in R, is a specialized tool for processing and analyzing LiDAR data, providing an extensive set of functions and tools for reading, manipulating, visualizing, and analyzing LAS files. With lidR, tasks such as filtering, classification, segmentation, and object extraction from point clouds can be performed efficiently. The library also allows for surface interpolation, creating digital terrain models (DTM) and digital surface models (DSM), and calculating various metrics for vegetation and landscape structure. Due to its flexibility and performance, lidR is a popular tool in geoinformatics and is also suitable for HPC environments, as most of its functions and algorithms are fully parallelized within a single compute node, allowing for full utilization of available hardware. When processing large datasets where the performance or capacity of a single compute node is insufficient, splitting the dataset into smaller parts and processing them independently can leverage multiple HPC nodes simultaneously.
The lidR library includes the locate_trees() function, which can reliably identify tree positions. Based on selected parameters and algorithms, the function analyzes the point cloud and identifies tree locations. In our case, the lmf algorithm, based on maximum height localization, was used [7]. The algorithm is fully parallelized, enabling efficient processing of relatively large areas in a short time.
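The lidR functions themselves are written in R; purely as a conceptual illustration of what a local-maximum filter such as lmf does, the following Python sketch finds tree-top candidates on a rasterized canopy height model. The window size and minimum height are illustrative, not the parameters used in the project.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def locate_tree_tops(chm: np.ndarray, window: int = 5, min_height: float = 2.0):
    """Return (row, col, height) of local maxima on a CHM raster above a minimum height."""
    local_max = maximum_filter(chm, size=window) == chm
    candidates = local_max & (chm > min_height)
    rows, cols = np.nonzero(candidates)
    return list(zip(rows, cols, chm[rows, cols]))
```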
The identified tree positions can then be used in the silva2016 algorithm for segmentation with the segment_trees() function [8]. This function segments the identified trees into separate LAS files (Fig. 5), similar to the TreeIso plugin module in CloudCompare. These segmented trees in LAS files are then used for further processing, such as determining the positions of individual trees using the DBSCAN clustering algorithm [9].
Figure 5. Tree positions determined using the lmf algorithm (left, red dots) and corresponding tree segments identified by the silva2016 algorithm (right) using the lidR library.
Detection of tree trunks using the DBSCAN clustering algorithm
To determine the position and height of trees in individual LAS files obtained from segmentation, we used various approaches. The height of each tree was obtained based on the z-coordinates for each LAS file as the difference between the minimum and maximum coordinates of the point clouds. Since some point cloud segments contained more than one tree, it was necessary to identify the number of tree trunks within these segments.
Tree trunks were identified using the DBSCAN clustering algorithm with the following settings: maximum distance between two points within one cluster (= 1 meter) and minimum number of points in one cluster (= 10). The position of each identified trunk was then obtained based on the x and y coordinates of the cluster centroids. The identification of clusters using the DBSCAN algorithm is illustrated in Figure 6.
Figure 6. Segments of the point cloud, PointCloud (left column), and the corresponding detected clusters at heights of 1-5 meters (right column).
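A sketch of this clustering step with scikit-learn is given below. The eps and min_samples values follow the settings above, the 1-5 m trunk-level slice mirrors Figure 6, and ground_z is assumed to be the minimum elevation of the segment.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def trunk_positions(points: np.ndarray, ground_z: float) -> np.ndarray:
    """Cluster trunk-level points of one tree segment and return cluster centroids (x, y)."""
    z_rel = points[:, 2] - ground_z
    slab = points[(z_rel > 1.0) & (z_rel < 5.0)][:, :2]    # keep x, y of the 1-5 m slice
    labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(slab)
    centroids = [slab[labels == k].mean(axis=0) for k in set(labels) if k != -1]
    return np.array(centroids)
```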
Determining tree heights using surface interpolation
As an alternative method for determining tree heights, we used the Canopy Height Model (CHM). CHM is a digital model that represents the height of the tree canopy above the terrain. This model is used to calculate the height of trees in forests or other vegetative areas. CHM is created by subtracting the Digital Terrain Model (DTM) from the Digital Surface Model (DSM). The result is a point cloud, or raster, that shows the height of trees above the terrain surface (Fig. 7).
If the coordinates of a tree's position are known, we can easily determine the corresponding height of the tree at that point using this model. The model can easily be computed using the lidR library with the grid_terrain() function, which creates the DTM, and the grid_canopy() function, which calculates the DSM.
Figure 7. Canopy Height Model (CHM) for the studied area (coordinates in meters on the X and Y axes), with the height of each point in meters represented using a color scale.
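Conceptually, once the DTM and DSM rasters share the same grid (in lidR this is done in R with grid_terrain() and grid_canopy()), the CHM and a height lookup reduce to a few lines. The raster origin and cell size below are illustrative parameters.

```python
import numpy as np

def canopy_height(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """Canopy Height Model: surface model minus terrain model on a shared grid."""
    return dsm - dtm

def tree_height(chm: np.ndarray, x: float, y: float,
                origin: tuple[float, float], cell: float) -> float:
    """Look up the canopy height at a tree position (origin = top-left corner of the raster)."""
    col = int((x - origin[0]) / cell)
    row = int((origin[1] - y) / cell)   # raster rows grow downward from the top edge
    return float(chm[row, col])
```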
Comparison of results
To compare the results achieved by the approaches mentioned before, we focused on the Petržalka area in Bratislava, where manual measurements of tree positions and heights had already been conducted. From the entire area (approximately 3500x3500 m), we selected a representative smaller area of 300x300 m (Fig. 2). We obtained results for the TreeIso plugin module in CloudCompare (CC), working on a PC in a Windows environment, and results for the locate_trees() and segment_trees() algorithms using the lidR library in the HPC environment of the Devana supercomputer. We qualitatively and quantitatively evaluated the tree positions using the Munkres (Hungarian Algorithm) [10] for optimal matching. The Munkres algorithm, also known as the Hungarian Algorithm, is an efficient method for finding the optimal matching in bipartite graphs. Its use in matching trees with manually determined positions means finding the best match between trees identified from LiDAR data and their known positions. By setting an appropriate distance threshold in meters (e.g., 5 m), we can qualitatively determine the number of accurately identified tree positions. The results are processed using histograms and percentage accuracy of tree positions depending on the chosen precision threshold (Fig. 8).
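The matching step can be sketched with SciPy's implementation of the Hungarian method; the 5 m threshold corresponds to the example above, and the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def matched_fraction(detected: np.ndarray, reference: np.ndarray,
                     threshold: float = 5.0) -> float:
    """Optimal one-to-one matching of detected vs. reference tree positions."""
    cost = cdist(detected, reference)             # pairwise distances in metres
    rows, cols = linear_sum_assignment(cost)      # minimum-cost assignment (Hungarian method)
    hits = np.sum(cost[rows, cols] <= threshold)  # matched pairs within the threshold
    return hits / len(reference)
```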
We found that both methods achieve almost the same result at a 5-meter distance threshold, approximately 70% accurate tree positions. The method used in CloudCompare shows better results, i.e., a higher percentage at lower threshold values, as reflected in the corresponding histograms (Fig. 8). When comparing both methods, we achieve up to approximately 85% agreement at a threshold of up to 5 meters, indicating the qualitative parity of both approaches. The quality of the results is mainly influenced by the accuracy of vegetation classification in point clouds, as the presence of various artifacts incorrectly classified as vegetation distorts the results. Tree segmentation algorithms cannot eliminate the impact of these artifacts.
Figure 8. The histograms on the left display the number of correctly identified trees depending on the chosen distance threshold in meters (top: CC – CloudCompare - method, bottom: lidR method). The graphs on the right show the percentage success rate of correctly identified tree positions based on the method used and the chosen distance threshold in meters.
Parallel efficiency analysis of the locate_trees() algorithm in the lidR library
To determine the efficiency of parallelizing the locate_trees() algorithm in the lidR library, we applied the algorithm to the same study area using different numbers of CPU cores – 1, 2, 4, up to 64 (the maximum of the compute node of Devana HPC system). To assess sensitivity to problem size, we tested it on three areas of different sizes – 300x300, 1000x1000, and 3500x3500 meters. The times measured are shown in Table 1, and the scalability of the algorithm is illustrated in Figure 9. The results show that the scalability of the algorithm is not ideal. When using approximately 20 CPU cores, the algorithm's efficiency drops to about 50%, and with 64 CPU cores, the efficiency is only 15-20%. The efficiency is also affected by the problem size – the larger the area, the lower the efficiency, although this effect is not as pronounced. In conclusion, for effective use of the algorithm, it is suitable to use 16-32 CPU cores and to achieve maximum efficiency of the available hardware by appropriately dividing the study area into smaller parts. Using more than 32 CPU cores is not efficient but still allows for further acceleration of the computation.
Figure 9. Speedup of the lmf algorithm in the locate_trees() function of the lidR library, depending on the number of CPU cores (NCPU) and the size of the studied area (in meters).
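For clarity, the speedup and parallel efficiency quoted above follow the usual definitions; a minimal helper:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-core runtime to runtime on n cores."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores used."""
    return speedup(t_serial, t_parallel) / n_cores
```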
Final evaluation
We found that achieving good results requires carefully setting the parameters of the algorithms used, as the number and quality of the resulting tree positions depend heavily on these settings. If obtaining the most accurate results is the goal, a possible strategy would be to select a representative part of the study area, manually determine the tree positions, and then adjust the parameters of the respective algorithms. These optimized settings can then be used for the analysis of the entire study area.
The quality of the results is also influenced by various other factors, such as the season, which affects vegetation density, the density of trees in the area, and the species diversity of the vegetation. The quality of the results is further impacted by the quality of vegetation classification in the point cloud, as the presence of various artifacts, such as parts of buildings, roads, vehicles, and other objects, can negatively affect the results. The tree segmentation algorithms cannot always reliably filter out these artifacts.
Regarding computational efficiency, we can conclude that using an HPC environment provides a significant opportunity for accelerating the evaluation process. For illustration, processing the entire study area of Petržalka (3500x3500 m) on a single compute node of the Devana HPC system took approximately 820 seconds, utilizing all 64 CPU cores. Processing the same area in CloudCompare on a powerful PC using a single CPU core took approximately 6200 seconds, which is about 8 times slower.
Authors
Marián Gall – Slovak National Supercomputing Centre
Michal Malček – Slovak National Supercomputing Centre
Lucia Demovičová – Centrum spoločných činností SAV v. v. i., organizačná zložka Výpočtové stredisko
Dávid Murín – SKYMOVE s. r. o.
Robert Straka – SKYMOVE s. r. o.
[6] Roussel J., Auty D. (2024). Airborne LiDAR Data Manipulation and Visualization for Forestry Applications.
[7] Popescu, Sorin & Wynne, Randolph. (2004). Seeing the Trees in the Forest: Using Lidar and Multispectral Data Fusion with Local Filtering and Variable Window Size for Estimating Tree Height. Photogrammetric Engineering and Remote Sensing. 70. 589-604. 10.14358/PERS.70.5.589.
[8] Silva C. A., Hudak A. T., Vierling L. A., Loudermilk E. L., Brien J. J., Hiers J. K., Khosravipour A. (2016). Imputation of Individual Longleaf Pine (Pinus palustris Mill.) Tree Attributes from Field and LiDAR Data. Canadian Journal of Remote Sensing, 42(5).
[9] Ester M., Kriegel H. P., Sander J., Xu X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD-96 Proceedings, pp. 226–231.
[10] Kuhn H. W. (1955). The Hungarian Method for the Assignment Problem. Naval Research Logistics Quarterly, 2: 83–97.
We invite you to the interesting event POP3 Profiling and Optimization Tools – 46th VI-HPS Tuning Workshop. The event is organized by the POP3 CoE in cooperation with the National Competence Centers for HPC from Slovakia, Czechia, Poland, Austria, Hungary and Slovenia.
Virtual Institute—High Productivity Supercomputing (VI-HPS) is an initiative that aims to enhance the productivity of supercomputing applications by providing a comprehensive set of tools and methodologies for performance analysis, debugging, and tuning. It brings together expertise and resources from various organisations to support developing and optimising high-performance computing applications.
The workshop is designed to facilitate collaborative learning and application tuning, with a particular emphasis on teams of two or more participants working with the same or closely related application codes that the teams are developing.
The first day of the workshop introduces participants to the POP Centre of Excellence (CoE), detailing its services, methodology, and tools for performance assessments and second-level services.
On the second day, the focus shifts to getting started with open-source multi-platform tools for analysing MPI+OpenMP application executions on CPU architectures.
The third day delves into more advanced usage, including analysing application executions on combined CPU and GPU architectures. During this hands-on workshop, participants will be introduced to the use of Paraver/Extrae and Scalasca/Score-P/CUBE toolsets for CPUs and GPUs.
Paraver/Extrae is a performance analysis toolset designed for tracing and analysing the execution of parallel applications. Extrae captures detailed execution traces, while Paraver provides powerful visualisation and analysis capabilities to help identify performance bottlenecks and optimise parallel code.
Scalasca/Score-P/CUBE is an integrated performance analysis toolkit for parallel applications. Score-P collects performance data in profiles and execution traces, Scalasca analyses and identifies performance issues, and CUBE facilitates exploration of the results, helping developers tune their applications.
Annotation
The course is organised in collaboration with POP3 CoE, NCC Austria, NCC Czechia, NCC Hungary, NCC Poland, NCC Slovakia and NCC Slovenia.
Additionally, other tools from the POP CoE will be available for participants to utilise throughout the workshop.
Target Audience and Purpose of the Course: Attendees will learn how to use the parallel performance analysis tools of the Performance Optimisation and Productivity (POP) CoE and a corresponding methodology for applying those tools to assess execution performance and scaling efficiency of their own parallel application codes in a portable fashion.
Level: Intermediate/advanced. No knowledge of any parallel performance tools is required (though serial code profiling experience is advantageous). However, participants are expected to be familiar with building and running (potentially hybrid, GPU-enabled) parallel applications.
Course format: The hands-on parts will only be available to on-site participants, who should bring their own codes to work on.
The lecture parts will be available to an unlimited number of participants, who can join online.
Prerequisites: Participants should be familiar with one or more parallel programming paradigms, such as MPI and OpenMP (on CPUs), and preferably also the use of OpenMP, OpenACC, CUDA, or similar (for GPUs). When registering for the workshop, participants should report the programming languages and paradigms employed by their application codes, along with relevant framework/library dependencies. Note that applications using AI/ML frameworks such as TensorFlow are unsuitable for this workshop.
Technical requirements: Participants with their own application code(s) should have these installed and running on the Karolina supercomputer before the event. A representative execution test case should also be prepared, suitable for running on a single node in several minutes. The required tools will be available on Karolina (CPU and GPU partitions). However, participants may also install graphical tools on their own notebook computers. Each participant will get access to the mentioned clusters before the event.
Starts: 4.09.2024, 9:00 CET. Ends: 6.09.2024, 17:00 CET. Venue: online and face-to-face at IT4Innovations in Ostrava.
Call for Ideas: Seeking Slovak SME Partners for FFPlus Project Consortium
NCC Slovakia is looking for Slovak SME partners to form a consortium for the prestigious FFplus project proposal. The aim is to leverage High-Performance Computing (HPC) in addressing specific business challenges, e.g. modelling and simulation, data analytics, AI, etc., and achieving significant industrial impact.
The selected SMEs can benefit from our own or other state-of-the-art European Tier-0 HPC infrastructure, code efficiency optimization and/or parallelization, and domain-specific technical support. The expected output is a success story in the form of a white paper, with no obligation to reveal details of the technical solution or any other proprietary/IP information or data.
What We Offer:
HPC Infrastructure: Access to state-of-the-art HPC systems.
Technical Support and co-development: Expert guidance in HPC utilization, workflow and code optimization and parallelization.
Application Guidance: NCC Slovakia will guide and accompany partners throughout the application process.
Expected Output:
White Paper: A short success story documenting the business impact achieved through HPC adoption. Note: There is no open science condition for this output.
Key Focus Areas:
Uptake of HPC by SMEs: Targeting businesses with no prior experience in HPC to solve real-world challenges.
Positive Business Impact: Demonstrate how HPC adoption leads to tangible business benefits.
Diverse Application Domains: Prioritizing projects with the highest business impact potential.
Eligibility:
Slovak SMEs: Must have fewer than 250 employees and an annual turnover of less than 50 million EUR.
Non-Research-Oriented: SMEs should be commercially driven; a focus on academic/fundamental research is not supported.
Project Details:
Submission Deadline: September 4th, 2024, 17:00 Brussels local time
Project Duration: Maximum of 15 months, starting January 1st, 2025
Funding Budget: Total of €4M for all sub-projects
Maximum Funding per Experiment: Up to 200 K€, with up to 150 K€ per organization in the consortium. The main participant, i.e. the SME, can participate in only one experiment.
Total maximum number of consortium partners: 5 (the main participant plus supporting participants).
Proposal Expectations:
Alignment: Clearly define the business challenge and the necessity of HPC use.
Impact: Present the potential positive business impact.
Objectives: Set specific, achievable goals.
Consortium: Include all necessary parties for effective project execution.
Resources and Costs: Outline required resources and associated costs.
Data Protection: Address any data protection concerns.
Success Stories: Support in generating publishable success stories.
Submission Guidelines:
Format: Proposals must be submitted in English and comprise two parts: Part A (administrative information) and Part B (proposal body).
Electronic Submission: Proposals must be submitted electronically using the designated submission tool.
Join us in demonstrating the transformative potential of HPC for SMEs. Contact NCC Slovakia today so that we can build a partnership and apply for this project together.
The FFplus project launched a new open call for European small and medium-sized enterprises. They are looking for agile innovative companies that decide to use supercomputers in practice and thus gain a competitive advantage on the market.
The FFplus project is the fourth continuation of a highly successful initiative that helps businesses overcome obstacles to using supercomputing and high-performance data analytics in practice, as well as in the development of generative AI. Its primary goal is to strengthen the global competitiveness of European industry.
In past years, dozens of companies from all over Europe have successfully completed the open calls using supercomputers. Let their stories inspire you; you can find them on the FFplus website.
The FFplus project call is divided into 2 parts:
BUSINESS EXPERIMENTS
The first part of this call is intended for businesses with no previous experience with supercomputing, across all disciplines. As part of this call, companies have the opportunity to submit their "experiments", i.e. projects solving a specific business challenge with the help of supercomputing technologies, high-performance data analytics, or artificial intelligence. The estimated duration of an experiment is at most 15 months, with a planned start on January 1, 2025.
A sum of EUR 4 million will be distributed among all the selected projects for the financing of experiments.
The deadline for submitting applications is September 4, 2024 at 5 p.m.
INNOVATION STUDIES
The second part of the FFplus challenge will support companies and startups that are already active in the field of generative AI and that lack the necessary computing resources to develop their own models. The goal is to facilitate and strengthen the technological development of European companies in the field of AI.
Participating enterprises will be supported to increase their innovation potential by leveraging new generative AI models, such as large language models (LLMs), based on their existing expertise, application area, business model, and potential for expansion.
Submitted "innovation studies" must use extensive European supercomputing resources (pre-exascale and exascale) to develop and adapt generative AI models (e.g. LLMs).
A sum of EUR 4 million intended for the financing of experiments will be distributed among all selected sub-projects.
The deadline for submitting applications is September 4, 2024 at 5 p.m.
Are you interested in this opportunity? You can find out more information on the project website.
The experts from the National Competence Centre for HPC will be happy to help you with the submission of the project; contact us.
Semi-Supervised Learning in Aerial Imagery: Implementing Uni-Match with Frame Field learning for Building Extraction
Building extraction in GIS (geographic information system) is pivotal for urban planning, environmental studies, and infrastructure management, allowing for accurate mapping of structures, including the detection of illegal constructions for regulatory compliance. Integrating extracted building data with other geospatial layers enhances the understanding of urban dynamics and spatial relationships. Given the scale and complexity of these tasks, there is a growing need to automate building extraction using deep learning techniques, which offer improved accuracy and efficiency in handling large-scale geospatial data.
State-of-the-art image segmentation models primarily output in raster format, whereas GIS applications often require vector polygons. One method to meet this requirement is Frame Field learning, which addresses the gap between the raster outputs of image segmentation models and the vector format needed in GIS. This approach significantly enhances the accuracy of building vectorization by aligning with ground-truth contours and providing topologically clean vector objects.
These models are trained using a 'supervised learning' method, necessitating a large number of labeled examples for training. However, obtaining such a significant volume of data can be extremely challenging and expensive. A potential solution to this problem is 'semi-supervised learning,' a method that reduces reliance on labeled data. In semi-supervised learning, the model is trained with a mix of a small set of labeled data and a larger set of unlabeled data. Hence, the goal of this collaboration between the Slovak National Competence Center for High-Performance Computing and Geodeticca Vision s.r.o. was to identify, implement, and evaluate an appropriate semi-supervised method for Frame Field learning.
Methods
Frame Field learning
The key idea of frame field learning [1] is to help the polygonization method resolve ambiguous cases caused by discrete probability maps (the output of image segmentation models). This is accomplished by introducing an additional output to the segmentation neural network, namely a frame field (see Fig. 1), which represents the structural features and geometrical characteristics of the building.
Frame fields
A frame field is a 4-PolyVector field that assigns four vectors to each point on a plane. Specifically, the first two vectors are constrained to be opposite to the other two, meaning each point is assigned a set of vectors {u, −u, v, −v}. This approach is particularly suited to buildings, as they are regular structures with sharp corners, and capturing directionality at these sharp corners requires two directions.
Figure 1: Visualization of the frame field output on the image from training set [1].
Frame Field learning
Figure 2: Diagram of the frame field learning [1]
The learning process of frame fields can be summarized as follows:
The network's input is a 3×H×W RGB image.
To generate a feature map, any deep segmentation backbone can be used, such as U-Net; the feature map is then processed to output detailed segmentation maps.
The training is supervised with ground truth rasterized polygons for interiors and edges, utilizing a mix of cross-entropy and Dice loss for accurate segmentation.
To train the frame field, three losses are used:
L_align enforces alignment of the frame field with the tangent direction.
L_align90 prevents the frame field from collapsing to a line field.
L_smooth measures the smoothness of the frame field.
Additional regularization losses are introduced to maintain output consistency, aligning the spatial gradients of the predicted maps with the frame field; a brief sketch of the alignment and smoothness terms is given below.
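For orientation, the following is a minimal PyTorch-style sketch of the alignment and smoothness terms, assuming the frame field at each pixel is encoded by two complex coefficient maps c0 and c2, so that the four field directions are roots of f(z) = z^4 + c2·z^2 + c0, as in [1]. The masking and weighting details are illustrative assumptions, and L_align90 and the regularization (coupling) losses, defined in [1], are not reproduced here.

import torch

def align_loss(c0, c2, tangent_angle, mask):
    # Alignment sketch: penalize |f(tau)|^2 at the ground-truth tangent
    # direction tau = exp(i * theta), restricted to contour pixels (mask).
    tau = torch.polar(torch.ones_like(tangent_angle), tangent_angle)  # complex e^{i*theta}
    f_tau = tau ** 4 + c2 * tau ** 2 + c0
    return (f_tau.abs() ** 2 * mask).sum() / mask.sum().clamp(min=1)

def smooth_loss(c0, c2):
    # Smoothness sketch: penalize spatial gradients of the coefficient maps.
    def grad_energy(c):
        dy = c[..., 1:, :] - c[..., :-1, :]
        dx = c[..., :, 1:] - c[..., :, :-1]
        return (dy.abs() ** 2).mean() + (dx.abs() ** 2).mean()
    return grad_energy(c0) + grad_energy(c2)

These terms are then linearly combined with the segmentation losses (cross-entropy and Dice) to form the full training objective.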
Vectorization
Figure 3: Visualization of the vectorization process [1]
The vectorization process transforms classified raster images into vector polygons using a polygonization method based on the Active Skeleton Model (ASM). The principle of this algorithm is the iterative shifting of the vertices of a skeleton graph to their ideal positions. This method optimizes a skeleton graph (a network of pixels outlining the building's structure) created by a thinning method applied to the building wall probability map. The iterative shifting is controlled by a gradient optimization method aimed at minimizing an energy function, which includes specific components related to the structure and geometry being analysed (combined as summarized after the list):
E_probability fits the skeleton paths to the contour of the building interior probability map at a certain probability threshold, e.g. 0.5.
E_frame field align aligns each edge of the skeleton graph with the frame field.
E_length ensures that the node distribution along paths remains homogeneous as well as tight.
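Taken together, the ASM optimization thus minimizes a total energy of the form (per-term weights are an assumption; see [1] for the exact formulation):

E = E_probability + E_frame field align + E_length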
UniMatch semi-supervised learning
UniMatch [2], an advanced semi-supervised learning method in the consistency regularization category, builds upon the foundational principles established by FixMatch [3], a baseline method in this domain, which primarily operates on the principle of pseudo-labeling combined with consistency regularization.
The basic principle of the FixMatch method involves generating pseudo-labels for unlabeled data from the predictions of a neural network. Specifically, for a weakly perturbed unlabeled input x_w, a prediction p_w is generated, which serves as a pseudo-label for the prediction p_s of the strongly perturbed input x_s. Subsequently, the value of a loss function, for example cross-entropy(p_w, p_s), is calculated, considering only areas of p_w with a probability value greater than a certain threshold, e.g. > 0.95.
UniMatch builds upon and extends the FixMatch methodology, introducing two core enhancements:
UniPerb (Unified Perturbations for Images and Features): perturbations are also applied at the feature level. Practically, this means applying a dropout function to the output (i.e., the features) of the encoder part of the neural network, randomly ignoring features, which then proceed to the decoder part of the network, generating the prediction p_fp.
Instead of using one strong perturbation, two strong perturbations, x_s1 and x_s2, are utilized.
Figure 4: (a) The FixMatch baseline; (b) the UniMatch method used. FP denotes feature perturbation; w and s denote weak and strong perturbation, respectively [2].
Ultimately, there are three unsupervised loss functions: cross-entropy(p_w, p_fp), cross-entropy(p_w, p_s1), and cross-entropy(p_w, p_s2). These are then linearly combined with the supervised loss function.
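For illustration, a minimal PyTorch-style sketch of this confidence-masked combination follows. The function and variable names, the threshold value, and the equal weighting of the three terms are illustrative assumptions, not the project's exact implementation.

import torch
import torch.nn.functional as F

def unimatch_unsupervised_loss(p_w, p_fp, p_s1, p_s2, threshold=0.95):
    # p_w:   softmax prediction for the weakly perturbed input (pseudo-label source)
    # p_fp:  prediction from the feature-perturbed (dropout) branch
    # p_s1/2: predictions for the two strongly perturbed inputs
    conf, pseudo = p_w.max(dim=1)               # per-pixel confidence and class index
    mask = (conf >= threshold).float()          # keep only confident pixels

    def masked_ce(p):
        ce = F.nll_loss(torch.log(p.clamp(min=1e-8)), pseudo, reduction="none")
        return (ce * mask).sum() / mask.sum().clamp(min=1)

    return (masked_ce(p_fp) + masked_ce(p_s1) + masked_ce(p_s2)) / 3.0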
This method is currently among the state-of-the-art semi-supervised learning methods. Its main advantage is its simplicity of implementation; its disadvantage is its sensitivity to the choice of suitable weak and strong perturbations.
Integrating UniMatch Semi-Supervised Learning with Frame Field Learning
Implementation Strategy for UniMatch in Frame Field Learning
To integrate UniMatch into our Frame Field learning framework, we first differentiated between weak and strong perturbations. For weak perturbations, we chose basic spatial transformations such as rotation, mirroring, and vertical/horizontal flips. These are well-suited for aerial imagery and straightforward to implement.
For strong perturbations, we opted for photometric transformations. These include adjustments in hue, color, and brightness, providing a more significant alteration to the images compared to spatial transformations.
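As an illustration, such perturbations could be set up with torchvision transforms roughly as follows. The specific transforms and parameter values are illustrative assumptions; in practice the spatial transforms must also be applied consistently to the corresponding labels.

import torchvision.transforms as T

# Weak perturbations: simple spatial transforms suitable for aerial imagery
weak_aug = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=90),
])

# Strong perturbations: photometric changes (hue, colour, brightness)
strong_aug = T.Compose([
    T.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.25),
    T.RandomGrayscale(p=0.2),
])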
Incorporating feature perturbation loss was a crucial step. We implemented this by introducing a dropout mechanism between the encoder and decoder parts of the network. This dropout selectively omits features at the feature level, which is essential for the UniMatch approach.
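A minimal sketch of this idea, assuming the network can be split into an encoder and a decoder module, is shown below; the class name and the dropout rate are illustrative, and a real U-Net would handle skip-connection features analogously.

import torch.nn as nn

class FeaturePerturbedSegmenter(nn.Module):
    # Apply dropout to encoder features before decoding (the UniPerb idea).
    def __init__(self, encoder, decoder, p=0.5):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder
        self.feature_dropout = nn.Dropout2d(p=p)

    def forward(self, x, perturb_features=False):
        feats = self.encoder(x)
        if perturb_features:
            feats = self.feature_dropout(feats)   # randomly drop feature channels
        return self.decoder(feats)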
Regarding the dual-stream perturbations of UniMatch, we adapted our model to handle two types of strong perturbations. The dual-stream approach involves using the weak perturbation prediction as a pseudo-label and training the model using the strong perturbation predictions as loss functions. We have two strong perturbations, hence the term 'dual-stream'. Each of these perturbations contributes to the overall robustness and effectiveness of the model in semi-supervised learning scenarios, especially in the context of building extraction from complex aerial imagery.
Through these adjustments, the UniMatch method was successfully integrated into the Frame Field learning algorithm, increasing its ability to learn effectively from annotated and, above all, unannotated data.
Experiments
Dataset
Labeled Data
Our labeled data come from three different sources, which are detailed in Table 1.
Table 1: Overview of 3 data sources of labeled data used for training the models with details.
Unlabeled Data
For the unlabeled dataset, we selected high-quality aerial images from Geodetický a kartografický ústav (GKÚ) [6], available for free public use. We specifically targeted a diverse area of 7,000 km², ensuring a wide representation of various landscapes and urban settings.
Data Processing: Patching
We processed both labeled and unlabeled images into patches of size 320x320 px. This patch size is specifically chosen to match the input requirements of our neural network. From the labeled data, this process resulted in approximately 55,000 patches. Similarly, from the unlabeled dataset, we obtained around 244,000 patches.
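For illustration, a minimal sketch of such patching with NumPy follows; the real pipeline may handle borders, overlap, and georeferencing differently.

import numpy as np

def make_patches(image: np.ndarray, size: int = 320):
    # Cut an H x W x C image into non-overlapping size x size patches;
    # border remainders are simply dropped in this sketch.
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patches.append(image[y:y + size, x:x + size])
    return patches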
Training setup
Model Architecture
We designed our model using a U-Net architecture with an EfficientNet-B4 backbone. This combination provides a good balance of accuracy and efficiency, crucial for handling the complexity of our segmentation tasks. The EfficientNet-B4 backbone was specifically chosen for its optimal balance between memory usage and performance. In Frame Field learning, U-Net architecture has been shown to be highly effective, as evidenced by its strong performance in prior studies.
Training Process
For training, we used the AdamW optimizer, which combines the advantages of Adam optimization with weight decay, aiding in better model generalization. To prevent overfitting, we implemented L2 regularization. Additionally, we used the ReduceLROnPlateau learning rate scheduler. This scheduler adjusts the learning rate based on validation loss, ensuring efficient training progress.
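In PyTorch terms, this setup corresponds roughly to the following; the stand-in model and the hyperparameter values are illustrative assumptions.

import torch

model = torch.nn.Conv2d(3, 2, kernel_size=1)   # stand-in for the U-Net/EfficientNet-B4 model

# AdamW = Adam with decoupled weight decay (acting as L2-style regularization)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Reduce the learning rate when the validation loss stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                       factor=0.5, patience=5)

# called once per epoch with the current validation loss:
# scheduler.step(val_loss)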
Semi-Supervised Learning Adjustments
A key aspect of our training was adjusting the ratio of unlabeled to labeled patches. We experimented with ratios ranging from 1:1 to 1:5 (labeled:unlabeled). This variability allowed us to explore the impact of different amounts of unlabeled data on the learning process. It enabled us to identify the optimal balance for training our model, ensuring effective learning while leveraging the advantages of semi-supervised learning in handling large and diverse datasets.
Model evaluation
In our evaluation of the building footprint extraction model, we chose metrics that precisely measure how well our predictions align with real-world structures.
Intersection over Union (IoU)
The key metric we used is Intersection over Union (IoU). It computes the overlap between the model's predictions and the actual building shapes. An IoU score close to 1 means that our predictions closely match the real buildings. This metric is essential for assessing the geometric accuracy of segmented regions, as it reflects how precisely building boundaries are delineated. Moreover, by evaluating the ratio of the correctly predicted area to the combined area (the union of the predicted and actual areas), IoU gives a clear measure of the model's effectiveness in capturing the true extent and shape of buildings in a complex urban landscape.
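A minimal sketch of the IoU computation on boolean masks:

import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    # Intersection over Union of two boolean building masks
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(intersection) / float(union) if union > 0 else 1.0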
Precision, Recall and F1
Precision measures the accuracy of the model's building predictions, indicating the proportion of correctly identified buildings out of all identified buildings, thereby reflecting the model's specificity. Recall assesses the model's ability to capture all actual buildings, with a high recall score highlighting its sensitivity in detecting buildings. The F1 Score combines precision and recall into a single metric, offering a balanced view of the model's performance by ensuring that high scores result from both high precision and high recall.
Complexity Aware IoU (cIoU)
We also utilized Complexity Aware IoU (cIoU) [7]. This metric addresses a shortfall in IoU by balancing segmentation accuracy and the complexity of the polygon shapes. While IoU alone can lead models to create overly complex polygons, cIoU ensures that the complexity of the polygons (number of vertices) is kept realistic, reflecting the typically less complex structure of real buildings.
N Ratio Metric
The N ratio metric was an additional component of our evaluation strategy. It contrasts the number of vertices in our predicted shapes with those in the actual buildings [7]. This helps in understanding whether our model accurately replicates the detailed structure of the buildings.
Max Tangent Angle Error
To ensure clean geometry in building extraction tasks, accurately measuring contour regularity is essential. The Max Tangent Angle Error (MTAE) [1] metric is designed to address this need by supplementing the Intersection over Union (IoU) metric. It specifically targets the limitation of IoU, where segmentations with rounded corners may receive higher scores than those with more precise, sharp corners. By evaluating the alignment of edges through the comparison of tangent angles at sampled points along predicted and ground truth contours, MTAE effectively penalizes inaccuracies in edge orientation. This focus on edge precision is critical for producing clean vector representations of buildings, emphasizing the importance of accurate edge delineation in segmentation tasks.
Evaluation Process
The trained models were tested on a large dataset of full-size aerial images (instead of the small patches on which the network was trained). Such testing gives a more accurate picture of how these models perform in real-world use. To extract buildings from full-size images, we used a sliding-window technique, producing predictions segment by segment. An averaging technique was applied to the edges of overlapping segments, which is important for minimizing artifacts and maintaining consistency across the prediction map. The resulting full-size prediction map was then vectorized into precise vector polygons using the Active Skeleton Model (ASM) algorithm.
Results
Table 2: Training results for the baseline (supervised learning) approach and for semi-supervised learning approaches with different ratios of labeled to unlabeled images.
The results from our experiments, reflecting performance of segmentation model trained under different conditions, reveal significant insights (see Table 2). We evaluated the model's performance in a baseline scenario without semi-supervised learning and in scenarios where semi-supervised learning was applied with varying ratios of labeled to unlabeled data (1:1, 1:3, and 1:5).
1. IoU: Starting from the baseline IoU of 80.50%, we observed a steady increase in this metric as we introduced more unlabeled data into the training process, reaching up to 85.77% with a 1:5 labeled-to-unlabeled ratio.
2. Precision, Recall, and F1 Score: The precision of the model, which measures how accurate the predictions are, improved from 85.75% in the baseline to 90.04% in the 1:5 ratio setup. Similarly, recall, which indicates how well the model can find all relevant instances, slightly increased from 94.27% to 94.76%. The F1 Score, which balances precision and recall, also saw an improvement from 89.81% to 92.34%. These improvements suggest that the model became more accurate and reliable in its predictions when semi-supervised learning was used.
3. N Ratio and cIoU: The results show a notable decrease in the N Ratio from 2.33 in the baseline to 1.65 in the semi-supervised 1:5 ratio setup, indicating that the semi-supervised model generates simpler, yet accurate, vector shapes that more closely resemble the actual structures. This simplification likely contributes to the enhanced usability of the output in practical GIS applications. Concurrently, the complexity-aware IoU (cIoU) significantly improved from 48.89% in the baseline to 64.75% in the 1:5 ratio, suggesting that the semi-supervised learning approach not only improves the overlap between the predicted and actual building footprints but also produces simpler vector shapes, which are closer to real-world buildings in terms of geometry.
4. Mean Max Tangent Angle Error (MTAE): The mean MTAE's reduction from 18.60° in the baseline to 17.45° in the 1:5 semi-supervised setting signifies an improvement in the geometric precision of the model's predictions. This suggests that the semi-supervised model is better at capturing the architectural features of buildings with more accurately defined angles, contributing to the production of topologically simpler and cleaner vector polygons.
Training on High-Performance Computing (HPC) Machine
HPC Configuration
Our training was conducted on a High-Performance Computing (HPC) machine equipped with substantial computational resources. The HPC had 8 nodes, each outfitted with 4 NVIDIA A100 GPUs with 40GB of VRAM, 64 CPU cores, and 256GB of RAM. For task scheduling, the system utilized Slurm.
PyTorch Lightning Framework
We employed the PyTorch Lightning framework, which offers user-friendly multi-GPU settings. This framework allows the specification of the number of GPUs per node, the total number of nodes, various distributed strategies, and the option for mixed-precision training.
Experiences with Slurm and PyTorch Lightning
When training on a single GPU, our Slurm configuration was as follows:
#SBATCH --partition=ngpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64000
In PyTorch Lightning, we set the trainer as:
trainer = Trainer(accelerator="gpu", devices=1)
Since we allocated one GPU of the four available in a node, we also allocated 16 of the 64 available CPUs. For the data loaders, we therefore assigned 16 workers in total. Because semi-supervised learning uses two data loaders (one for labeled and one for unlabeled data), we allocated 8 workers to each. It was critical to ensure that the total number of data-loader workers did not exceed the available CPUs, otherwise training would crash.
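In code, this corresponds roughly to the following; the dataset objects and batch sizes are illustrative assumptions.

from torch.utils.data import DataLoader

# two loaders, one for labeled and one for unlabeled patches; 16 CPUs -> 8 workers each
labeled_loader = DataLoader(labeled_dataset, batch_size=8, shuffle=True, num_workers=8)
unlabeled_loader = DataLoader(unlabeled_dataset, batch_size=8, shuffle=True, num_workers=8)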
Distributed Data Parallel (DDP) Strategy
Using PyTorch Lightning's Distributed Data Parallel (DDP) option, we ensured each GPU across the nodes operated independently:
Each GPU processed a portion of the dataset.
All processes initiated the model independently.
Each conducted forward and backward passes in parallel.
Gradients were synchronized and averaged across processes.
Each process updated its optimizer individually.
With this approach, the total number of data loaders equals the number of GPUs multiplied by the number of data-loader types. For example, in a semi-supervised learning setup with 4 GPUs and two types of data loaders (labeled and unlabeled), we ended up with 8 data loaders, each with 8 workers, i.e. 64 workers in total.
To fully utilize one node with four GPUs, we used the following configuration:
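The exact settings are not reproduced here; by analogy with the single-GPU and multi-node configurations shown in this section, a plausible setup would be along these lines (an assumption, not the verbatim configuration used):

#SBATCH --partition=ngpu
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=64
#SBATCH --mem=256000

trainer = Trainer(accelerator="gpu", devices=4, strategy="ddp")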
Using PyTorch Lightning, it is possible to leverage multiple nodes on an HPC system. For instance, using 4 nodes with 4 GPUs each (16 GPUs in total) was configured as:
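The original Trainer call is not reproduced here; with the standard PyTorch Lightning API it would presumably look something like this (an assumption):

trainer = Trainer(accelerator="gpu", devices=4, num_nodes=4, strategy="ddp")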
Correspondingly, the Slurm configuration was set to:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
These settings and experiences highlight the scalability and flexibility of training complex machine learning models in an HPC environment, especially for tasks demanding significant computational resources, such as semi-supervised learning in geospatial data analysis.
Training Scalability Analysis
Table 3: Training results for the supervised and semi-supervised learning approaches with 1, 2, 4, and 8 GPUs. For each configuration, the time per epoch and the speedup ratio relative to 1 GPU are given.
In the Training Scalability Analysis, we carefully examined the impact of expanding computational resources on the efficiency of training models, utilizing the PyTorch Lightning framework. This investigation covered both supervised and semi-supervised learning approaches, with a particular emphasis on the effects of increasing GPU numbers, including setups involving 2 nodes (or 8 GPUs).
Figure 5: This graph compares the actual speedup ratios for supervised and semi-supervised learning against the number of GPUs, alongside the ideal linear speedup ratio. It showcases the closer alignment of semi-supervised learning with ideal scalability, emphasizing its greater efficiency gains from increased computational resources.
A key finding from this analysis was that the increase in speedup ratios for supervised learning did not perfectly align with the number of GPUs utilized. Ideally, doubling the number of GPUs would directly double the speedup ratio (e.g., using 4 GPUs would result in a 4x speedup). However, the actual speedup ratios were lower than this ideal expectation. This discrepancy can be attributed to the overhead associated with managing multiple GPUs and nodes, particularly the need to synchronize data across all GPUs, which introduces efficiency losses.
Semi-supervised learning showed a slightly different trend, closer to the ideal (linear) speedup. The complexity and higher computational demands of semi-supervised learning appear to mitigate the impact of the overhead costs, allowing more efficient use of multiple GPUs. Despite the challenges associated with synchronizing data across multiple GPU cards and compute nodes, the higher computational demands of semi-supervised learning allow resources to be scaled more efficiently, i.e. a speedup closer to the ideal scenario.
Conclusion
The research presented in this whitepaper has successfully demonstrated the effectiveness of integrating UniMatch semi-supervised learning with Frame Field learning for the task of building extraction from aerial imagery. This integration addresses the challenges associated with the scarcity of labeled data in deep learning applications for geographic information systems (GIS), providing a cost-effective and scalable solution.
Our findings reveal that employing semi-supervised learning significantly enhances the model's performance across several key metrics, including Intersection over Union (IoU), precision, recall, F1 Score, N Ratio, complexity-aware IoU (cIoU), and Mean Max Tangent Angle Error (MTAE). Notably, the improvements in IoU and cIoU metrics underscore the model's increased accuracy in delineating building footprints and generating vector shapes that closely resemble actual structures. This outcome is pivotal for applications in urban planning, environmental studies, and infrastructure management, where precise mapping and analysis of building data are crucial.
The methodology adopted, which combines Frame Field learning with the innovative UniMatch approach, has proven to be highly effective in leveraging both labeled and unlabeled data. This strategy not only improves the geometric precision of the model's predictions but also ensures the generation of cleaner, topologically accurate vector polygons. Furthermore, the scalability and efficiency of training on a High-Performance Computing (HPC) machine using the PyTorch Lightning framework and Distributed Data Parallel (DDP) strategy have been instrumental in handling the extensive computational demands of the semi-supervised learning process on the data at hand, within a time frame ranging from tens of minutes to hours.
This work highlights the potential of semi-supervised learning for improving automatic building extraction from aerial imagery. The implementation of UniMatch in the Frame Field learning method represents a significant step forward, providing a robust solution to the challenges of data scarcity and the need for high accuracy in geospatial data analysis. This approach improves the efficiency and accuracy of building extraction and also opens new possibilities for applying semi-supervised learning methods in GIS and related fields.
Acknowledgment
Research results were obtained with the support of the Slovak National competence centre for HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement 101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.
Computational resources were procured in the national project National competence centre for high performance computing (project code: 311070AKF2) funded by European Regional Development Fund, EU Structural Funds Informatization of society, Operational Program Integrated Infrastructure.
[1] Nicolas Girard, Dmitriy Smirnov, Justin Solomon, and Yuliya Tarabalka. “Polygonal Building Extraction by Frame Field Learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2021), pp. 5891-5900.
[2] L. Yang, L. Qi, L. Feng, W. Zhang, and Y. Shi. “Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation”. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2023), pp. 7236-7246. doi: 10.1109/CVPR52729.2023.00699.
[3] Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, and Colin Raffel. “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence”. In: CoRR, vol. abs/2001.07685 (2020). Available: https://arxiv.org/abs/2001.07685.
[4] Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, and Pierre Alliez. “Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark”. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2017). IEEE.
[5] Adrian Boguszewski, Dominik Batorski, Natalia Ziemba-Jankowska, Tomasz Dziedzic, and Anna Zambrzycka. “LandCover.ai: Dataset for Automatic Mapping of Buildings, Woodlands, Water and Roads from Aerial Imagery”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2021), pp. 1102-1110.
[7] Stefano Zorzi, Shabab Bazrafkan, Stefan Habenschuss, and Friedrich Fraundorfer. “PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 1848-1857.
Professional conference Superpočítač a Slovensko in Bratislava (15 Nov): On 14 November 2024, the professional conference Superpočítač a Slovensko (Supercomputer and Slovakia), organized by the Ministry of Investments, Regional Development and Informatization of the Slovak Republic, took place at the Devín Hotel in Bratislava. The conference focused on current trends and developments in high-performance computing in Slovakia. The event included a presentation by L. Demovičová from the National Competence Centre for HPC.
High-performance computing conference in Portugal (12 Nov): The 4th High-Performance Computing Meeting 2024, held on 5 and 6 November at the University of Beira Interior in Covilhã, has established itself as a key gathering of users, technicians, and partners of the high-performance computing ecosystem in Portugal.
REGISTRATION OPEN: A new series of popular science lectures on interesting HPC applications (6 Oct): We have opened registration for the winter semester 2024 lecture series, in which we will explore fascinating topics where high-performance computing plays a key role. This semester we will focus on areas such as meteorology, climatology, chemistry, large language models, and many more.