Categories
General

Invitation to Qubit Conference® Slovakia 2024: Don't miss the unique opportunity to participate in the top event in the field of cyber security

Qubit Conference® Slovakia 2024 is approaching and we are happy to invite you to this prestigious event, which will take place on November 13 and 14, 2024 in the wonderful surroundings of the Congress & Wellness Hotel Chopok in Jasná. This is a unique opportunity for professionals from various industries to meet, share their experiences and gain the latest knowledge in the field of cyber security.

What awaits you?

The conference offers a rich and varied program that includes more than 30 renowned speakers and numerous panel discussions and training sessions. The first day will be dedicated to five key panel discussions, which will focus on the most important challenges and current topics in the field of cyber security:

  • The Joys and Sorrows of Everyday Cyber Security Operations: How to Manage Cyber Security's Everyday Challenges and Contingencies?
  • Don't be afraid of "GRC": Governance, risk and compliance - how to ensure that your organization meets all regulatory requirements while minimizing risks?
  • Here's how we solved it: Cyber incident resolution case studies straight from the experts.
  • Taking care of people in IT from A to Z: How to effectively recruit, develop and retain top IT professionals?
  • Future threats: Looking at security through the window of the future - what new threats are emerging and how to prepare for them?

The first conference day will end with networking, where you will find a pleasant community atmosphere, colleagues from the industry and the opportunity to establish new business cooperation. The evening culminates in a bowling tournament, which promises great fun and an opportunity for informal meetings with other participants.

Second day: Intensive training with experts

The second day of the conference will focus on practical trainings, which will be led by experienced experts. Participants have the opportunity to choose from three full-day training sessions:

  • Solving incidents using open source tools - This training, led by an expert from ESET, Ladislav Bač, will focus on the effective use of open source tools in solving cyber incidents.
  • Code Strong - mental resilience in times of chaos - Zuzana Reľovská from Wellbeing will teach you how to develop mental resilience in times of constant change and cyber threats.
  • We manage cyber risks quantitatively - Michal Hanus from Cyber Rangers will present methods of quantitative assessment and management of cyber risks.

These trainings will provide participants with a deeper understanding and practical skills that they can immediately apply in their daily work.

Why should you be part of the Qubit Conference® Slovakia 2024?

Qubit Conference® Slovakia 2024 is not only about lectures and trainings. It is a platform where cybersecurity experts from various industries, including finance, energy, manufacturing, telecommunications, pharmaceuticals, critical infrastructure, IT and government institutions, meet. The conference offers a unique opportunity for networking and exchanging experiences with colleagues and specialists from the entire region.

In addition, the Qubit Conference® has gradually become a key event in the field of cyber security in Central Europe, welcoming more than 200 participants every year. It's a place to share the latest technologies and strategies that are shaping the future of digital security.

Register Today!

Do not miss the opportunity to participate in this important event. Register for Qubit Conference® Slovakia 2024 as soon as possible and secure your place among cyber security experts. You can find more information about the conference and the possibility of registration on the official website of the event.

We look forward to your participation and believe that the Qubit Conference® Slovakia 2024 will bring you new knowledge, inspiration and valuable contacts that will move you and your organization forward in the field of cyber security.

New HPC Ambassador: IT Valley Košice

The National Competence Center for High-Performance Computing (HPC) has established a new strategic partnership with IT Valley Košice under the HPC Ambassador program. This collaboration aims to strengthen technological innovation and development in eastern Slovakia, contributing to the growth of the innovation ecosystem across the country.

Goals and Vision

The partnership focuses on promoting the adoption of HPC technologies among members of IT Valley Košice, including companies, academic institutions, and research organizations. Through this effort, we aim to create an environment that fosters the development of talent and innovative companies capable of competing on the global stage.

IT Valley Košice strives to build a technologically advanced business environment in eastern Slovakia, and this partnership significantly contributes to establishing the region as a center of excellence for business, research, and education.

Practical Collaboration

The National Competence Center for HPC will provide relevant information, training, and services to IT Valley Košice members, while IT Valley Košice will promote these opportunities and help identify organizations ready to leverage HPC technologies. Members will gain access to top-notch support and expert consultations.

Additionally, IT Valley Košice will play a key role in facilitating knowledge and technology transfer between academia and the IT industry. Educational events, workshops, and competitions will be organized to enhance the region's innovation potential. We look forward to successful collaboration with IT Valley Košice and to projects that will support the innovative and entrepreneurial environment in Slovakia.

Webinar: HPC for Small and Medium Enterprises

In today's rapidly changing market, it is crucial for small and medium enterprises (SMEs) to effectively leverage new technologies to stay competitive. One of the innovative solutions that can significantly advance SMEs is High-Performance Computing (HPC).

If you are interested in how HPC can improve your business, don't hesitate to join a special online event organized by the National Competence Centre for HPC. The event will take place online on September 4, 2024. Registration is mandatory.

Why You Shouldn't Miss It?

The event will focus on the possibilities of using HPC in Central Europe and provide practical information and examples of how even smaller companies can use this technology to enhance their business processes. HPC can help speed up product development, optimize manufacturing processes, improve service quality, and reduce costs, which is especially important for SMEs looking for ways to gain a competitive edge.

What Can You Look Forward To?

During the event, you will have the opportunity to hear real case studies from various industries that demonstrate how HPC has helped small and medium-sized businesses achieve their goals. You'll learn how engineering companies use HPC for simulations and optimization of design proposals or how pharmaceutical companies utilize this technology to accelerate the development of new drugs.

Collaboration and Support

Another important topic will be the collaboration between SMEs and technology centers. You will learn how these organizations can provide the necessary infrastructure and expertise that SMEs need to utilize HPC. Experts from the National Competence Centers for HPC in Central Europe will also present opportunities to access modern computing resources that would otherwise be financially inaccessible.

Register Today!

Don't miss this unique opportunity and join the event on September 4, 2024, which will take place online. It's a chance to gain valuable information for free, make new connections, and discover how HPC can take your business to the next level. Registration is open, so don't hesitate and secure your spot today!

We also have a giveaway for event participants.

Get inspired and find out how you can gain a competitive advantage in the global market with HPC!


Leveraging LLMs for Efficient Religious Text Analysis

The analysis and research of texts with religious themes have historically been the domain of philosophers, theologians, and other social sciences specialists. With the advent of artificial intelligence, particularly large language models (LLMs), this task takes on new dimensions. These technologies can be leveraged to reveal various insights and nuances contained in religious texts, helping to interpret their symbolism and uncover their meanings. This acceleration of the analytical process allows researchers to focus on the specific aspects of texts relevant to their studies.

One possible research task in the study of texts with religious themes involves examining the works of authors affiliated with specific religious communities. By comparing their writings with the official doctrines and teachings of their denominations, researchers can gain deeper insights into the beliefs, convictions, and viewpoints of the communities shaped by the teachings and unique contributions of these influential authors.

This report proposes an approach utilizing embedding indices and LLMs for efficient analysis of texts with religious themes. The primary objective is to develop a tool for information retrieval, specifically designed to efficiently locate relevant sections within documents. The identification of discrepancies between the retrieved sections of texts from specific religious communities and the official teaching of the particular religion the community originates from is not part of this study; this task is entrusted to theological experts.

This work is a joint effort of the Slovak National Competence Center for High-Performance Computing and the Faculty of Theology at Trnava University. Our goal is to develop a tool for information retrieval using LLMs to help theologians analyze religious texts more efficiently. To achieve this, we are leveraging the resources of the Devana HPC system to handle the computations and large datasets involved in this project.

Dataset

The texts used for the research in this study originate from the religious community known as the Nazareth Movement (commonly referred to as "Beňovci"), which began to form in the 1970s. The movement, which some scholars identify as having sect-like characteristics, is still active today, in a reduced and changed form. Its founder, Ján Augustín Beňo (1921-2006), was a secretly ordained Catholic priest during the totalitarian era. Beňo encouraged members of the movement to actively live their faith through daily reading of biblical texts and applying them in practice through specific resolutions. The movement spread throughout Slovakia, with small communities existing in almost every major city. It also spread to neighboring countries such as Poland, the Czech Republic, Ukraine, and Hungary. In 2000, the movement included approximately three hundred married couples, a thousand children, and 130 priests and students preparing for priesthood. The movement had three main goals: radical prevention in education, fostering priests who could act as parental figures to identify and nurture priestly vocations in children, and the production and distribution of samizdat materials needed for catechesis and evangelization.

Twenty-seven documents with texts from this community are available for research. These documents, which significantly influenced the formation of the community and its ideological positions, were reproduced and distributed during the communist regime in the form of samizdats (literature banned by the communist regime). After the political upheaval, many of them were printed and distributed to the public outside the movement. Most of the analyzed documents consist of texts intended for "morning reflections", short meditations on biblical texts. The documents also include the founder's comments on the teachings of the Catholic Church and selected topics related to child rearing, spiritual guidance, and catechesis for children.

Although the documents available to us contained a few duplications, this did not pose a problem for the information retrieval task and is therefore not addressed in this report. All of the documents are written exclusively in Slovak.

One of the documents is annotated for test purposes by experts from the partner faculty, who have long been studying the Nazareth Movement. By annotations, we refer to text parts labeled as belonging to one of five classes, which represent the following five topics:

  1. Directive obedience
  2. Hierarchical upbringing
  3. Radical adoption of life model
  4. Human needs fulfilled only in religious community and family
  5. Strange/Unusual/Intense

Additionally, each of these topics is supplemented with a set of queries designed to test the retrieval capabilities of our solution.

Table 1

Strategy/Solution

There are multiple strategies appropriate for solving this task, including text classification, topic modelling, retrieval-augmented generation (RAG), and fine-tuning of LLMs. However, the theologians’ requirement is to identify specific parts of the text for detailed analysis, necessitating the retrieval of exact wording. Therefore, a choice was made to leverage information retrieval. This approach differs from RAG, which typically incorporates both information retrieval and text generation components, in focusing solely on retrieving textual data, without the additional step of new content generation.

Information retrieval leverages LLMs to transform complex data, such as text, into a numerical representation that captures the semantic meaning and context of the input. This numerical representation, known as an embedding, can be used to conduct semantic searches by analysing the positions and proximity of embeddings within a multi-dimensional vector space. By using queries, the system can retrieve relevant parts of the text by measuring the similarity between the query embeddings and the text embeddings. This approach does not require any fine-tuning of the existing LLMs, so the models can be used without modification and the workflow remains simple.
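
The similarity search described above can be illustrated with a minimal sketch. It assumes cosine similarity as the proximity measure, which the report does not specify, and uses tiny hand-written vectors in place of real model embeddings. The function names `cosine_similarity` and `retrieve` are hypothetical and are not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, chunk_embeddings, top_k=3):
    """Return (chunk index, similarity) pairs for the top_k most similar chunks."""
    scored = [
        (idx, cosine_similarity(query_embedding, emb))
        for idx, emb in enumerate(chunk_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
chunk_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embedding = [1.0, 0.1, 0.0]

top = retrieve(query_embedding, chunk_embeddings, top_k=2)
for idx, score in top:
    print(f"chunk {idx}: similarity {score:.3f}")
```

In a real index, the chunk embeddings would be computed once and stored, and only the query would be embedded at search time.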

Model choice

Four pre-trained models were considered for generating embeddings: Slovak-BERT [1], OpenAI's text-embedding-3-small, BGE M3 [2], and multilingual E5 [3].

These four models were leveraged to acquire vector representations of the chunked text, and their specific contributions will be discussed in the following parts of the study.

Data preprocessing

The first step of data preprocessing involved text chunking. The primary reason for this step was to meet the requirement of religious scholars for retrieval of paragraph-sized chunks. In addition, documents needed to be split into smaller chunks anyway due to the limited input lengths of some LLMs. For this purpose, the Langchain library [4] was utilized. It offers hierarchical chunking that produces overlapping chunks of a specific length (with a desired overlap) to ensure that the context is preserved. Chunks with lengths of 300, 400, 500 and 700 symbols (characters) were generated. Subsequent preprocessing steps included removal of diacritics, case normalization according to the requirements of the models, and stopwords removal. The removal of stopwords is a common practice in natural language processing tasks. While some models may benefit from the exclusion of stopwords to improve the relevancy of retrieved chunks, others may take advantage of retaining stopwords to preserve contextual information essential for understanding the text.
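
The preprocessing steps above can be sketched in plain Python. This is an illustrative stand-in, not the Langchain splitter used in the study: it cuts fixed-size character windows rather than splitting hierarchically on separators. The overlap value and the tiny stopword set are assumptions, since the report gives neither.

```python
import unicodedata


def chunk_text(text, chunk_size, overlap):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (e.g. 'č' -> 'c')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def preprocess(chunk, stopwords, lowercase=True, remove_stopwords=True):
    """Apply diacritics removal, optional lowercasing and stopword removal."""
    text = strip_diacritics(chunk)
    if lowercase:
        text = text.lower()
    if remove_stopwords:
        text = " ".join(w for w in text.split() if w.lower() not in stopwords)
    return text


# Hypothetical mini stopword list (already without diacritics).
STOPWORDS = {"a", "sa", "v", "na", "je"}

sample = "Čítanie a modlitba"
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
print(preprocess(sample, STOPWORDS))
```

Keeping stopword removal behind a flag makes it straightforward to build both variants of each index, with and without stopwords, from the same chunks.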

Table 2

Vector Embeddings

Vector embeddings were created from text chunks using selected pre-trained language models.

For the Slovak-BERT model, generating an embedding involves running the model without any additional layers for inference and then using the embedding of the first token, which serves as an aggregate representation of the whole chunk, as the context embedding. The other models produce embeddings in the required form, so no further postprocessing was needed.

In the subsequent results section, the performance of all created embedding models will be analyzed and compared based on their ability to capture and represent the semantic content of the text chunks.

Results

Prior to conducting quantitative tests, all embedding indices underwent preliminary evaluation to determine the level of understanding of the Slovak language and the specific religious terminology by the selected LLMs. This preliminary evaluation involved subjective judgement of the relevance of retrieved chunks.

These tests revealed that the E5 model embeddings exhibit limited effectiveness on our data. For a given query, the retrieved chunks contained most of the keywords used in the query but did not reflect its context. One possible explanation is that this model prioritizes word-level matches over the nuanced context in the Slovak language, perhaps because its Slovak training data was less extensive or less contextually rich. However, these observations are hypotheses based on current, limited results rather than definitive conclusions. A decision was made not to evaluate the embedding indices based on E5 embeddings any further, given their inability to effectively capture the nuances of the religious texts. On the other hand, the abilities of the Slovak-BERT model, which is based on the relatively simple RoBERTa architecture, exceeded expectations. Moreover, the performance of text-embedding-3-small and BGE M3 embeddings met expectations: the first, subjectively evaluated test demonstrated a very good grasp of context, proficiency in the Slovak language, and understanding of the nuances within the religious texts.

Therefore, quantitative tests were performed only on embedding indices utilizing Slovak-BERT, OpenAI’s text-embedding-3-small and BGE M3 embeddings.

Given the problem specification and the nature of test annotations, there arises a potential concern regarding the quality of the annotations. It is possible that some text parts were misclassified as there may be sections of text that belong to multiple classes. This, combined with the possibility of human error, can affect the consistency and accuracy of the annotations.

With this consideration in mind, we have opted to focus solely on recall evaluation. By recall, we mean the proportion of correctly retrieved chunks out of the total number of annotated chunks, regardless of the fraction of false positive chunks. Recall will be evaluated for every topic and for every length-specific embedding index for all selected LLMs.
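
The recall measure defined above can be written as a short sketch. It assumes that chunks are identified by integer IDs and that a chunk counts as correctly retrieved when its ID appears among the annotated ones. How retrieved chunks were matched to annotated text parts in the study is not described, so this matching rule is an assumption.

```python
def recall(retrieved_ids, annotated_ids):
    """Fraction of annotated chunks that appear among the retrieved chunks."""
    annotated = set(annotated_ids)
    if not annotated:
        return 0.0  # no annotated chunks for this topic, recall is undefined
    hits = annotated & set(retrieved_ids)
    return len(hits) / len(annotated)


# Toy example: 4 annotated chunks, 2 of them retrieved -> recall 0.5.
annotated_ids = [2, 5, 7, 9]
retrieved_ids = [1, 2, 3, 5, 8]
print(recall(retrieved_ids, annotated_ids))
```

False positives (here chunks 1, 3 and 8) do not lower the score, which matches the decision to ignore precision given the uncertainty in the annotations.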

Moreover, the provided test queries might also reflect the complexity and interpretative nature of religious studies. For example, consider the query "God's will" for the topic Directive obedience. While a careful reader understands how this query relates to the given topic, the relation might not be as clear to a language model. Therefore, apart from the evaluation using the provided queries, another evaluation was conducted using queries acquired through contextual augmentation. Contextual (or query) augmentation is a prompt engineering technique for enhancing text data quality and is well documented in research papers [5, 6]. The technique involves prompting a language model to generate a new, better-formulated query based on the initial query and other contextual information. The language model used to generate queries through query augmentation was GPT 3.5, and these queries are referred to as "GPT queries" throughout the rest of the report.

Slovak-BERT embedding indices

Recall evaluation for embedding indices utilizing Slovak-BERT embeddings for four different chunk sizes, with and without stopwords removal, is presented in Figure 1. The evaluation covers each topic specified in the list in Section 2 and includes both original queries and GPT queries.

We observe that GPT queries generally yield better results than the original queries, except for the last two topics, where both sets of queries produce similar results. It is also apparent that Slovak-BERT-based embeddings benefit from stopwords removal in most cases. The highest recall values were achieved for the third topic, Radical adoption of life model, with a chunk size of 700 symbols and stopwords removed, reaching more than 47%. In contrast, the worst results were observed for the topic Strange/Unusual/Intense, where neither the original nor the GPT queries successfully retrieved relevant parts; in some cases, none of the relevant parts were retrieved at all.

Figure 1: Recall values obtained for all topics using both original and GPT queries, across various chunk sizes of embeddings generated using the Slovak-BERT model. Embedding indices marked as +SW include stopwords, while -NoSW indicates stopwords were removed.

OpenAI’s text-embedding-3-small embedding indices

Similar to the evaluation for Slovak-BERT embedding indices, evaluation charts for embedding indices utilizing OpenAI's text-embedding-3-small embeddings are presented in Figure 2. The recall values are generally much higher than those observed with Slovak-BERT embeddings. As with the previous results, GPT queries produce better outcomes. We can observe a subtle dependency of recall on chunk size: longer chunks generally yield higher recall values.

An interesting observation can be made for the topic Radical adoption of life model. When using the original queries, hardly any relevant results were retrieved. However, when using GPT queries, recall values were much higher, reaching almost 90% for chunk sizes of 700 symbols.

Regarding the removal of stopwords, its impact on embeddings varies. For topics 4 and 5, stopwords removal proves beneficial. However, for the other topics, this preprocessing step does not offer advantages.

Topics 4 and 5 exhibited the weakest performance among all topics. This may be due to the nature of the queries provided for these topics, which are quotes or full sentences, whereas the queries for the other topics are phrases, keywords or expressions. It appears that this model performs better with the latter type of queries. On the other hand, since the queries for topics 4 and 5 are full sentences, the embeddings benefit from stopwords removal, as it probably helps in handling the context of sentence-like queries.

Topic 4 is very specific and abstract, while topic 5 is very general, making it understandable that capturing this topic in queries is challenging. The specificity of topic 4 might require more nuanced test queries, as the provided test queries probably did not contain all nuances of a given topic. Conversely, the general nature of topic 5 might benefit from a different analytical approach. Methods like Sentiment Analysis could potentially grasp the strange, unusual, or intense mood in relation to the religious themes analysed.

Figure 2: Recall values assessed for all topics using both original and GPT queries, utilizing various chunk sizes of embeddings generated with the text-embedding-3-small model. Embedding indices labeled +SW include stopwords, and those labeled -NoSW have stopwords removed.

BGE M3 embedding indices

Evaluation charts for embedding indices utilizing BGE M3 embeddings are presented in Figure 3. The recall values fall between those of Slovak-BERT and OpenAI's text-embedding-3-small embeddings. Although they do not reach the recall values of OpenAI's embeddings in some cases, BGE M3 embeddings show competitive performance, particularly considering that the model is open source, whereas OpenAI's embeddings are accessible only through an API, which might pose a problem for data confidentiality.

With these embeddings, we also observe the same phenomenon as with OpenAI's text-embedding-3-small embeddings: shorter, phrase-like queries are preferred over quote-like queries. Therefore, recall values are higher for the first three topics.

Stopwords removal seems to be mostly beneficial, mainly for the last two topics.

Figure 3: Recall values for all topics using original and GPT queries, with embeddings of different chunk sizes produced by the BGE M3 model. Indices labeled as +SW contain stopwords, while -NoSW indicates their removal.

Conclusion

This paper presents an approach to the analysis of texts with religious themes using numerical text representations known as embeddings, generated by three selected pre-trained large language models: Slovak-BERT, OpenAI's text-embedding-3-small and the BGE M3 embedding model. These models were selected after a preliminary evaluation indicated that their proficiency in the Slovak language and religious terminology is sufficient for the task of information retrieval on the given set of documents.

Challenges related to the quality of test queries were addressed using the query augmentation technique. This approach helped in formulating more appropriate queries, resulting in more relevant retrieval of text chunks and better coverage of the nuances of the topics that interest theologians.

Evaluation results demonstrated the effectiveness of the embeddings produced by these models, particularly text-embedding-3-small from OpenAI, which exhibited strong contextual understanding and linguistic proficiency. The recall of this model's retrieval varied depending on the topic and queries used, with the highest values reaching almost 90% for the topic Radical adoption of life model when using GPT queries and a chunk length of 700 symbols. Generally, text-embedding-3-small performed best with the longest chunk lengths studied, showing a trend of increasing recall with increasing chunk length. The topic Strange/Unusual/Intense had the lowest recall, possibly due to the uncertainty in topic specification.

For Slovak-BERT embedding indices, the recall values were lower, but still impressive given the simplicity of this language model. Better results were achieved using GPT queries, with the best recall value of 47.1% for the topic Radical adoption of life model at a chunk length of 700 symbols, with embeddings created from chunks with removed stopwords. Generally, this embedding model benefited most from the stopwords removal preprocessing step.

As for BGE M3 embeddings, the results were impressive, achieving high recall, though not as high as OpenAI's embeddings. However, considering that BGE M3 is an open-source model, these results are remarkable.

These findings highlight the potential of leveraging LLMs for specialized domains such as the analysis of texts with religious themes. Future work could explore the connections between text chunks using clustering techniques with embeddings to discover hidden associations and inspirations of the text authors. For theologians, future work lies in examining the retrieved text parts to identify deviations from the official teaching of the Catholic Church, shedding light on the movement's interpretations and insights.

Acknowledgment

Research results were obtained with the support of the Slovak National competence centre for HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement 101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.

Computational resources were procured in the national project National competence centre for high performance computing (project code: 311070AKF2) funded by European Regional Development Fund, EU Structural Funds Informatization of society, Operational Program Integrated Infrastructure.


Authors

Bibiána Lajčinová – Slovak National Supercomputing Centre
Jozef Žuffa – Faculty of Theology, Trnava University
Milan Urbančok – Faculty of Theology, Trnava University

References:

[1] Matúš Pikuliak, Štefan Grivalský, Martin Konôpka, Miroslav Blšťák, Martin Tamajka, Viktor Bachratý, Marián Šimko, Pavol Balážik, Michal Trnka, and Filip Uhlárik. Slovakbert: Slovak masked language model, 2021.

[2] Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, 2024.

[3] Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. Multi-lingual e5 text embeddings: A technical report, 2024.

[4] Harrison Chase. Langchain. https://github.com/langchain-ai/langchain, 2022. Accessed: May 2024.

[5] Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, and Nan Duan. Query rewriting for retrieval-augmented large language models, 2023.

[6] Rolf Jagerman, Honglei Zhuang, Zhen Qin, Xuanhui Wang, and Michael Bendersky. Query expansion by prompting large language models, 2023.


Starmus Festival: Science, Art and AI

The Starmus Science Festival, known for its unique combination of scientific lectures, music, and art, was held this year in May in Bratislava. This unique festival attracted numerous science enthusiasts who enjoyed inspirational presentations and discussions with leading scientists from around the world.

The festival was not just about passively watching lectures. Participants had the opportunity to engage directly in various scientific demonstrations and interactive activities that provided practical examples of scientific principles. The demonstration booths were popular spots where visitors could try out different experiments and technologies.

In the panel discussions, experts also addressed ethical questions and the societal impact of new technologies. These discussions provided a deeper understanding of the challenges we face in connection with rapid technological advancement. One of the most intriguing lectures of the festival was given by Neil Lawrence, titled "What Makes Us Unique in the Age of AI." Neil Lawrence, a renowned scientist in the field of artificial intelligence, covered a wide range of topics related to our uniqueness in an era of rapid AI development. He discussed how we can preserve human values and abilities at a time when artificial intelligence is increasingly penetrating our lives. The lecture was inspiring and provided deep insights into the future of human-technology interaction.

Neil Lawrence spoke about the importance of an interdisciplinary approach in science and technological progress. His presentation emphasized how crucial collaboration between different fields is to achieving significant scientific discoveries. He pointed out that the combination of various scientific disciplines can lead to new and groundbreaking insights.

Another part of the lecture focused on the latest discoveries in space. Lawrence used visualizations and animations to explain complex concepts in a simple way, which appealed to the general audience. During the lecture, he also addressed the history of space exploration. He illustrated his points with numerous historical photographs and videos, which added an authentic and informative character to his presentation. Another significant topic was breakthroughs in genetics and biotechnology. Lawrence explained how these new technologies have the potential to treat previously incurable diseases and improve the quality of life for many people. In discussions about artificial intelligence, he emphasized its ability to transform various sectors, including medicine, transportation, and education. He stressed the importance of ethics and responsibility in the development and implementation of AI technologies.

Another crucial point in his lecture was the advancements in renewable energy sources and sustainability. He highlighted the importance of investing in solar and wind energy and innovative technologies that can help reduce the carbon footprint. The discussion also focused on global initiatives and cooperation between different countries to address climate change. Lawrence also focused on the latest technologies in medicine, particularly AI, explaining how artificial intelligence helps doctors diagnose diseases more quickly and accurately. He also spoke about how new technologies enable personalized medicine tailored to individual patients.

Another topic was quantum technology in computers and communication. He emphasized how quantum computers can revolutionize various sectors, including medicine and finance, by allowing faster and more efficient information processing. He also discussed the importance of oceans for our planet and the need to protect them. He warned about threats like pollution and climate change that endanger the marine ecosystem and stressed the need for international cooperation in ocean protection.

Finally, he addressed the issue of space debris and its impact on future space missions. He discussed technologies and strategies being developed to address the growing problem of debris in orbit. The final part of the lecture focused on the challenges and benefits of integrating neuroscience and artificial intelligence. He discussed how AI can help understand and treat neurological disorders and how studying the brain can contribute to the development of smarter AI systems.

Lawrence also talked about ecological innovations and their potential to change the way we live and work. He discussed how new technologies can contribute to sustainable development and reduce the negative impact on the environment. He also addressed the development of space technologies and their potential to improve life on Earth. He spoke about how space research contributes to progress in areas such as material sciences, energy, and communication.

The last discussion focused on the importance of education in science and technology for future generations. Lawrence emphasized the need for investments in educational programs that promote critical thinking and innovative solutions to global problems. He also discussed virtual reality (VR) technology and its applications in education and healthcare. He explained how VR can enhance learning by providing immersive and interactive environments and how it can help patients in rehabilitation and therapy.

The latest information emphasized the importance of interdisciplinary research and collaboration between different scientific fields. Lawrence explained how the combination of various expertise can lead to innovative solutions to complex global problems such as climate change and health crises (pandemics).

Scientific and Artistic Projects

The festival also brought a discussion about the future of art and science. Collaboration between artists and scientists was showcased through various multimedia projects that demonstrated how these two worlds can come together to create innovative and inspiring works.

The Starmus festival is not only a celebration of science but also a platform for sharing knowledge and inspiration. This year's edition in Bratislava once again highlighted the importance of dialogue between science and the public. It allowed scientists, artists, and the general public to meet, discuss, and jointly seek solutions to current global challenges. We are already looking forward to the next editions and the new discoveries they will bring.

About Starmus Festival

Starmus is a festival of science, art, and music created by Garik Israelian, PhD, an astrophysicist from the Institute of Astrophysics of the Canary Islands (IAC), and Sir Brian May, PhD, an astrophysicist and the lead guitarist of the iconic rock band Queen. It consists of presentations by astronauts, cosmonauts, Nobel laureates, thinkers, and prominent figures from various scientific and musical fields. Starmus brings these exceptional people together to share their knowledge and experiences and to jointly seek answers to humanity's big questions.

Stephen Hawking Medal for Science Communication

In 2015, Stephen Hawking and Alexei Leonov, along with Brian May, created the Stephen Hawking Medal for Science Communication, awarded to individuals and teams for significant contributions to science communication. Previous recipients of the Stephen Hawking Medal include Dr. Jane Goodall, Elon Musk, Neil deGrasse Tyson, Brian Eno, Hans Zimmer, and the documentary Apollo 11.

This year's Starmus festival brought a wealth of new knowledge and inspiration that will resonate with participants and the general public for a long time. The festival once again confirmed its important role in promoting science, art, and education worldwide.

Categories
General

Real-life Examples of HPC Utilization in Poland, Czech Republic, and Slovakia

Today, we hosted an informative webinar that highlighted the potential of high-performance computing (HPC) through real-life success stories and engaging projects supported by National Competence Centers for HPC. In addition to examples from the Slovak NCC, the webinar also showcased the expertise and experiences of neighboring competence centers in the Czech Republic and Poland.

The speakers included Michal Pitoňák from the Slovak National Supercomputing Center, Tomáš Karásek from IT4Innovations National Supercomputing Center in the Czech Republic, and Szymon Mazurek from AGH University in Krakow.

Michal Pitoňák shared experiences from four successful HPC use cases, including the transfer and optimization of CFD computational workflows in the HPC environment, anomaly detection in time series to prevent gambling using deep learning, entity identification for address extraction from transcribed interviews using synthetic data, and measurement of structural parameters of capsules using AI and ML techniques. Tomáš Karásek presented examples of using artificial intelligence to solve engineering problems focused on energy and transportation. Szymon Mazurek introduced the SpeakLeash initiative, a community-driven project to develop a national large language model (LLM) ecosystem in Poland.

Don't miss the next opportunity to get inspired and discover how HPC can support innovation and success in your projects.

Join us for the next webinar on September 4th from 14:00 to 15:30.

Register on our website EuroCC Slovakia and secure your spot at this event full of inspiring and practical insights.

We look forward to your participation!

Categories
Success-Stories General

Mapping Tree Positions and Heights Using PointCloud Data Obtained Using LiDAR Technology

The goal of the collaboration between the Slovak National Supercomputing Centre (NSCC) and the company SKYMOVE within the National Competence Center for HPC project was to design and implement a pilot software solution for processing data obtained using LiDAR (Light Detection and Ranging) technology mounted on drones.

Data collection

LiDAR is an innovative method of remote distance measurement that is based on measuring the travel time of laser pulse reflections from objects. LiDAR emits light pulses that hit the ground or object and return to the sensors. By measuring the return time of the light, LiDAR determines the distance to the point where the laser beam was reflected. 
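The time-of-flight relation described above can be written as d = c·t/2, where t is the measured round-trip time of the pulse. The following is a minimal Python sketch of that arithmetic only; the function names are illustrative, and the small slowdown of light in air is ignored as a simplifying assumption.

```python
# Speed of light in vacuum (m/s). Light in air is only marginally slower,
# which this sketch ignores (simplifying assumption).
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance (m) to the reflecting point from the pulse round-trip time (s)."""
    # The pulse travels to the target and back, so the path is halved.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def round_trip_from_range(distance_m: float) -> float:
    """Round-trip time (s) expected for a target at the given distance (m)."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_S


# Illustrative value: a target 100 m away returns the pulse after roughly 667 ns.
echo_time_s = round_trip_from_range(100.0)
distance_m = range_from_round_trip(echo_time_s)
```

The nanosecond scale of the round-trip time shows why the sensor needs very precise timing to resolve distances at the centimeter level.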

LiDAR can emit 100,000 to 300,000 pulses per second, capturing dozens to hundreds of pulses per square meter of the surface, depending on specific settings and the distance to the scanned object. This process creates a point cloud (PointCloud) consisting of potentially millions of points. Modern LiDAR use involves data collection from the air, where the device is mounted on a drone, increasing the efficiency and accuracy of data collection. In this project, drones from DJI, particularly the DJI M300 and Mavic 3 Enterprise (Fig. 1), were used for data collection. The DJI M300 is a professional drone designed for various industrial applications, and its parameters make it suitable for carrying LiDAR.

The DJI M300 drone was used as a carrier for the Geosun LiDAR (Fig. 1). This is a mid-range, compact system with an integrated laser scanner and a positioning and orientation system. As a compromise between data collection speed and data quality, the data was scanned from a height of 100 meters above the surface, allowing larger areas to be scanned in a relatively short time with sufficient quality.

The collected data was geolocated in the S-JTSK coordinate system (EPSG:5514) and the Baltic Height System after adjustment (Bpv), with coordinates given in meters or meters above sea level. In addition to LiDAR data, aerial photogrammetry was performed simultaneously, allowing for the creation of orthophotomosaics. Orthophotomosaics provide a photographic record of the surveyed area in high resolution (3 cm/pixel) with positional accuracy up to 5 cm. The orthophotomosaic was used as a basis for visual verification of the positions of individual trees.

Figure 1. DJI M300 Drone (left) and Geosun LiDAR (right).

Data classification

The primary dataset used for the automatic identification of trees was a LiDAR point cloud in LAS/LAZ format (uncompressed and compressed form). LAS files are a standardized format for storing LiDAR data, designed to ensure efficient storage of large amounts of point data with precise 3D coordinates. LAS files contain information about position (x, y, z), reflection intensity, point classification, and other attributes necessary for LiDAR data analysis and processing. Due to their standardization and compactness, LAS files are widely used in geodesy, cartography, forestry, urban planning, and many other fields requiring detailed and accurate 3D representations of terrain and objects.

The point cloud needed to be processed into a form that would allow easy identification of individual tree or vegetation points. This process, known as classification, assigns a specific class to each point in the point cloud.

Various tools can be used for point cloud classification. Given our positive experience, we decided to use the LiDAR360 software from GreenValley International [1]. The individual points were classified into the following categories: unclassified (1), ground (2), medium vegetation (4), high vegetation (5), buildings (6). A machine learning method was used for classification, which, after being trained on a representative training sample, can automatically classify points of any input dataset (Fig. 2).
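Once every point carries a class code, selecting the points needed for tree identification is a simple filter. The sketch below is illustrative only: it assumes the points have already been read from the LAS file into plain tuples (x, y, z, class code), and the function names are hypothetical. The class codes are the ones listed above, which correspond to the standard ASPRS LAS classification codes.

```python
from collections import Counter

# Class codes as listed in the text.
CLASS_NAMES = {
    1: "unclassified",
    2: "ground",
    4: "medium vegetation",
    5: "high vegetation",
    6: "buildings",
}


def select_class(points, code):
    """Return only the points whose class code equals `code`.

    `points` is a list of (x, y, z, class_code) tuples (assumed layout).
    """
    return [p for p in points if p[3] == code]


def class_histogram(points):
    """Count points per class name, useful as a sanity check after classification."""
    counts = Counter(p[3] for p in points)
    return {CLASS_NAMES.get(c, f"other ({c})"): n for c, n in sorted(counts.items())}


# Tiny made-up sample: two ground points, two high-vegetation points, one building.
sample = [
    (0.0, 0.0, 135.2, 2),
    (1.0, 0.0, 135.3, 2),
    (0.5, 0.5, 148.9, 5),
    (0.6, 0.4, 150.1, 5),
    (9.0, 9.0, 142.0, 6),
]
high_vegetation = select_class(sample, 5)
```

For tree identification, only the ground (2) and high vegetation (5) subsets would be passed on to the later steps.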

The training sample was created by manually classifying points in the point cloud into the respective categories. For the purposes of automated tree identification in this project, the ground and high vegetation categories are essential. However, for the best classification results of high vegetation, it is also advisable to include other classification categories. The training sample was composed of multiple smaller areas from the entire region including all types of vegetation, both deciduous and coniferous, as well as various types of buildings. Based on the created training sample, the remaining points of the point cloud were automatically classified. It should be noted that the quality of the training sample significantly affects the final classification of the entire area.


Figure 2. Example of a point cloud of an area colored using an orthophotomosaic (left) and the corresponding classification (right) in CloudCompare.

Data segmentation

In the next step, the classified point cloud was segmented using the CloudCompare software [2]. Segmentation generally means dividing classified data into smaller units – segments that share common characteristics. The goal of segmenting high vegetation was to assign individual points to specific trees.

For tree segmentation, the TreeIso plugin in the CloudCompare software package was used, which automatically recognizes trees based on various height and positional criteria (Fig. 3). The overall segmentation consists of three steps:

  1. Grouping points that are close together into segments and removing noise.
  2. Merging neighboring point segments into larger units.
  3. Composing individual segments into a whole that forms a single tree.

The result is a complete segmentation of high vegetation. These segments are then saved into individual LAS files and used for further processing to determine the positions of individual trees. A significant drawback of this tool is that it operates only in serial mode, meaning it can utilize only one CPU core, which greatly limits its use in an HPC environment.

Figure 3. Segmented point cloud in CloudCompare using the TreeIso plugin module.

As an alternative method for segmentation, we explored the use of orthophotomosaics of the studied areas. Using machine learning methods, we attempted to identify individual tree crowns in the images and, based on their geolocation coordinates, identify the corresponding segments in the LAS file. For detecting tree crowns from the orthophotomosaic, the YOLOv5 model [3] with pretrained weights from the COCO128 database [4] was used. The training data consisted of 230 images manually annotated using the LabelImg tool [5]. Training ran for 300 epochs with a batch size of 16 images. The image size was set to 1000x1000 pixels, which proved to be a suitable compromise between computational demands and the number of trees per image section.

The quality of this approach was insufficient, particularly in areas with dense vegetation (forested areas), as shown in Figure 4. We believe this was due to the insufficient robustness of the chosen training set, which could not adequately cover the diversity of image data (especially for different vegetative periods). For these reasons, we did not develop segmentation from photographic data further and focused solely on segmentation in the point cloud.

Figure 4. Tree segmentation in the orthophotomosaic using the YOLOv5 tool. The image illustrates the problem of detecting individual trees in the case of dense vegetation (continuous canopy).

To fully utilize the capabilities of the Devana supercomputer, we deployed the lidR library [6] in its environment. This library, written in R, is a specialized tool for processing and analyzing LiDAR data, providing an extensive set of functions and tools for reading, manipulating, visualizing, and analyzing LAS files. With lidR, tasks such as filtering, classification, segmentation, and object extraction from point clouds can be performed efficiently. The library also allows for surface interpolation, creating digital terrain models (DTM) and digital surface models (DSM), and calculating various metrics for vegetation and landscape structure. Due to its flexibility and performance, lidR is a popular tool in geoinformatics and is also suitable for HPC environments, as most of its functions and algorithms are fully parallelized within a single compute node, allowing for full utilization of available hardware. When processing large datasets where the performance or capacity of a single compute node is insufficient, splitting the dataset into smaller parts and processing them independently can leverage multiple HPC nodes simultaneously.
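The splitting of a large dataset into independently processed parts, mentioned at the end of the paragraph above, can be sketched as a simple tiling of the bounding box. This is a minimal Python illustration and not part of lidR; the function name `make_tiles`, the tile size, and the buffer around each tile (added here so that tree crowns on tile edges are not cut) are assumptions, not settings taken from the project.

```python
import math


def make_tiles(xmin, ymin, xmax, ymax, tile_size, buffer=0.0):
    """Split a bounding box into square tiles.

    Returns a list of (core, buffered) pairs, each a tuple
    (x0, y0, x1, y1) in the same units as the input (e.g. meters).
    The buffered box extends the core by `buffer`, clipped to the extent.
    """
    nx = math.ceil((xmax - xmin) / tile_size)
    ny = math.ceil((ymax - ymin) / tile_size)
    tiles = []
    for iy in range(ny):
        for ix in range(nx):
            x0 = xmin + ix * tile_size
            y0 = ymin + iy * tile_size
            core = (x0, y0, min(x0 + tile_size, xmax), min(y0 + tile_size, ymax))
            buffered = (
                max(core[0] - buffer, xmin),
                max(core[1] - buffer, ymin),
                min(core[2] + buffer, xmax),
                min(core[3] + buffer, ymax),
            )
            tiles.append((core, buffered))
    return tiles


# Illustrative: a 3500x3500 m area cut into 500 m tiles with a 10 m buffer.
tiles = make_tiles(0, 0, 3500, 3500, 500, buffer=10)
```

Each tile could then be submitted as a separate job to a different compute node, with results restricted to the core box to avoid counting trees in the overlap twice.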

The lidR library includes the locate_trees() function, which can reliably identify tree positions. Based on selected parameters and algorithms, the function analyzes the point cloud and identifies tree locations. In our case, the lmf algorithm, based on maximum height localization, was used [7]. The algorithm is fully parallelized, enabling efficient processing of relatively large areas in a short time.
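The idea behind maximum-height localization can be illustrated on a small height raster: a cell is reported as a treetop if it is tall enough and higher than every other cell in its neighbourhood. The Python sketch below is a simplified, raster-based illustration of a local maximum filter, not the lidR implementation; the parameter names `window` and `min_height` and their default values are assumptions.

```python
def local_maxima(height_grid, window=3, min_height=2.0):
    """Find treetop candidates in a 2D height raster (list of rows).

    A cell qualifies if its height is at least `min_height` and strictly
    greater than all other cells in the window x window neighbourhood.
    Returns a list of (row, col, height).
    """
    half = window // 2
    rows, cols = len(height_grid), len(height_grid[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = height_grid[r][c]
            if h < min_height:
                continue  # too low to be a tree
            neighbours = [
                height_grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c, h))
    return tops


# Made-up 5x5 height raster (meters) with two crowns.
grid = [
    [0, 0, 0, 0, 0],
    [0, 5, 3, 0, 0],
    [0, 3, 2, 0, 0],
    [0, 0, 0, 8, 1],
    [0, 0, 0, 1, 0],
]
treetops = local_maxima(grid)
```

A larger window merges nearby maxima into one tree, while a smaller one splits wide crowns into several, which is one reason the parameter settings matter so much for the resulting tree counts.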

The identified tree positions can then be used in the silva2016 algorithm for segmentation with the segment_trees() function [8]. This function segments the identified trees into separate LAS files (Fig. 5), similar to the TreeIso plugin module in CloudCompare. These segmented trees in LAS files are then used for further processing, such as determining the positions of individual trees using the DBSCAN clustering algorithm [9].

Figure 5. Tree positions determined using the lmf algorithm (left, red dots) and corresponding tree segments identified by the silva2016 algorithm (right) using the lidR library. 

Detection of tree trunks using the DBSCAN clustering algorithm

To determine the position and height of trees in the individual LAS files obtained from segmentation, we used various approaches. The height of each tree was obtained from the z-coordinates in its LAS file as the difference between the maximum and minimum z-coordinate of the point cloud. Since some point cloud segments contained more than one tree, it was necessary to identify the number of tree trunks within these segments.

Tree trunks were identified using the DBSCAN clustering algorithm with the following settings: maximum distance between two points within one cluster (= 1 meter) and minimum number of points in one cluster (= 10). The position of each identified trunk was then obtained based on the x and y coordinates of the cluster centroids. The identification of clusters using the DBSCAN algorithm is illustrated in Figure 6.
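To make the role of the two DBSCAN settings concrete, here is a compact pure-Python sketch of the algorithm using the values quoted above (1 meter, 10 points), followed by the centroid calculation. It is an illustration, not the code used in the project: clustering on x and y only is an assumption, and a real workflow would more likely rely on an optimized library implementation.

```python
import math


def dbscan(points, eps=1.0, min_pts=10):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def cluster_centroids(points, labels):
    """Mean (x, y) of every cluster; noise points (-1) are ignored."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}
```

With these settings, two trunks standing more than 1 meter apart end up in separate clusters, while isolated returns with fewer than 10 neighbours are discarded as noise.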

Figure 6. Point cloud segments (left column) and the corresponding detected clusters at heights of 1-5 meters (right column).

Determining tree heights using surface interpolation

As an alternative method for determining tree heights, we used the Canopy Height Model (CHM). CHM is a digital model that represents the height of the tree canopy above the terrain. This model is used to calculate the height of trees in forests or other vegetative areas. CHM is created by subtracting the Digital Terrain Model (DTM) from the Digital Surface Model (DSM). The result is a point cloud, or raster, that shows the height of trees above the terrain surface (Fig. 7).

If the coordinates of a tree's position are known, we can easily determine the corresponding height of the tree at that point using this model. The model can be calculated using the lidR library with the grid_terrain() function, which creates the DTM, and the grid_canopy() function, which calculates the DSM.
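The relation CHM = DSM - DTM and the height lookup at a known tree position can be sketched on small rasters as follows. This is a minimal Python illustration under stated assumptions, not the lidR workflow: both rasters are assumed to share the same grid, row 0 is assumed to start at the lower y bound, negative differences are clamped to zero, and all names and values are hypothetical.

```python
def canopy_height_model(dsm, dtm):
    """Cell-wise DSM - DTM; negative differences are clamped to 0 (assumption)."""
    return [
        [max(surface - terrain, 0.0) for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]


def height_at(chm, x, y, origin_x, origin_y, cell_size):
    """Canopy height of the raster cell containing the point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return chm[row][col]


# Made-up 2x2 rasters with elevations in meters above sea level.
dsm = [[152.0, 160.5], [151.0, 158.0]]
dtm = [[150.0, 150.5], [151.5, 150.0]]
chm = canopy_height_model(dsm, dtm)

# Height of a tree located at x = 1.5 m, y = 0.2 m on a 1 m grid.
tree_height = height_at(chm, 1.5, 0.2, origin_x=0.0, origin_y=0.0, cell_size=1.0)
```

In practice the raster resolution limits how well the lookup captures the treetop, so sampling the maximum over a few neighbouring cells may be preferable to a single-cell read.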

Figure 7. Canopy Height Model (CHM) for the studied area (coordinates in meters on the X and Y axes), with the height of each point in meters represented using a color scale.

Comparison of results

To compare the results of the approaches described above, we focused on the Petržalka area in Bratislava, where manual measurements of tree positions and heights had already been conducted. From the entire area (approximately 3500x3500 m), we selected a representative smaller area of 300x300 m (Fig. 2). We obtained results for the TreeIso plugin module in CloudCompare (CC), working on a PC in a Windows environment, and results for the locate_trees() and segment_trees() algorithms using the lidR library in the HPC environment of the Devana supercomputer.

We evaluated the tree positions qualitatively and quantitatively using the Munkres algorithm, also known as the Hungarian algorithm [10], an efficient method for finding the optimal matching in bipartite graphs. Here it finds the best match between the trees identified from LiDAR data and their manually determined positions. By setting an appropriate distance threshold in meters (e.g., 5 m), we can determine the number of accurately identified tree positions. The results are presented as histograms and as the percentage of accurate tree positions depending on the chosen threshold (Fig. 8).

We found that both methods achieve almost the same result at a 5-meter distance threshold, approximately 70% accurate tree positions. The method used in CloudCompare shows better results, i.e., a higher percentage, at lower threshold values, as reflected in the corresponding histograms (Fig. 8). When the two methods are compared with each other, they agree on up to approximately 85% of tree positions at a threshold of up to 5 meters, indicating the qualitative parity of both approaches. The quality of the results is mainly influenced by the accuracy of vegetation classification in point clouds, as artifacts incorrectly classified as vegetation distort the results. Tree segmentation algorithms cannot eliminate the impact of these artifacts.
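The matching step can be sketched as an assignment problem on the matrix of pairwise distances between detected and reference positions. The Python code below is an illustrative implementation of the Hungarian method in its common potentials-based form, followed by a distance threshold; it is not the project's code. Applying the threshold after the optimal assignment, rather than excluding distant pairs beforehand, is an assumption, and the function names are hypothetical.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns `match`, where match[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j]: row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def match_trees(detected, reference, threshold=5.0):
    """Optimally pair detected and reference (x, y) positions.

    Returns (detected_index, reference_index, distance) for pairs whose
    distance does not exceed `threshold` (meters).
    """
    if not detected or not reference:
        return []
    swap = len(detected) > len(reference)  # the solver needs rows <= columns
    rows, cols = (reference, detected) if swap else (detected, reference)
    cost = [[math.dist(a, b) for b in cols] for a in rows]
    pairs = []
    for i, j in enumerate(hungarian(cost)):
        if cost[i][j] <= threshold:
            pairs.append((j, i, cost[i][j]) if swap else (i, j, cost[i][j]))
    return pairs
```

The percentage of accurately identified positions at a given threshold is then the number of returned pairs divided by the number of reference trees, and repeating the call for several thresholds yields curves of the kind shown in Figure 8.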

Figure 8. The histograms on the left display the number of correctly identified trees depending on the chosen distance threshold in meters (top: CC – CloudCompare - method, bottom: lidR method). The graphs on the right show the percentage success rate of correctly identified tree positions based on the method used and the chosen distance threshold in meters.

Parallel efficiency analysis of the locate_trees() algorithm in the lidR library

To determine the parallel efficiency of the locate_trees() algorithm in the lidR library, we applied the algorithm to the same study area using different numbers of CPU cores: 1, 2, 4, and so on up to 64 (the maximum of a compute node of the Devana HPC system). To assess sensitivity to problem size, we tested it on three areas of different sizes: 300x300, 1000x1000, and 3500x3500 meters. The times measured are shown in Table 1, and the scalability of the algorithm is illustrated in Figure 9. The results show that the scalability of the algorithm is not ideal. When using approximately 20 CPU cores, the algorithm's efficiency drops to about 50%, and with 64 CPU cores, the efficiency is only 15-20%. The efficiency is also affected by the problem size: the larger the area, the lower the efficiency, although this effect is not as pronounced. In conclusion, for effective use of the algorithm, it is advisable to use 16-32 CPU cores and to make full use of the available hardware by appropriately dividing the study area into smaller parts. Using more than 32 CPU cores is not efficient but still allows for further acceleration of the computation.
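The quantities used in this analysis follow the usual definitions: speedup is the single-core time divided by the time on N cores, and parallel efficiency is the speedup divided by N. The short Python sketch below applies these definitions; the function names are illustrative, and the only numbers used are the efficiencies quoted above, not the measured times from Table 1.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the single-core run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Fraction of ideal (linear) scaling achieved on n_cores."""
    return speedup(t_serial, t_parallel) / n_cores


def implied_speedup(eff, n_cores):
    """Speedup that corresponds to a given efficiency on n_cores."""
    return eff * n_cores


# Efficiencies quoted in the text, converted back to speedups.
at_20_cores = implied_speedup(0.50, 20)        # about 10x
at_64_cores_low = implied_speedup(0.15, 64)    # about 9.6x
at_64_cores_high = implied_speedup(0.20, 64)   # about 12.8x
```

Read this way, tripling the core count from about 20 to 64 changes the speedup only from roughly 10x to somewhere between 9.6x and 12.8x, which is why running several smaller jobs on 16-32 cores each is the better use of a node.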

Figure 9. Speedup of the lmf algorithm in the locate_trees() function of the lidR library as a function of the number of CPU cores (NCPU) and the size of the study area (in meters).

Final evaluation

We found that achieving good results requires carefully setting the parameters of the algorithms used, as the number and quality of the resulting tree positions depend heavily on these settings. If obtaining the most accurate results is the goal, a possible strategy would be to select a representative part of the study area, manually determine the tree positions, and then adjust the parameters of the respective algorithms. These optimized settings can then be used for the analysis of the entire study area.

The quality of the results is also influenced by various other factors, such as the season, which affects vegetation density, the density of trees in the area, and the species diversity of the vegetation. The quality of the results is further impacted by the quality of vegetation classification in the point cloud, as the presence of various artifacts, such as parts of buildings, roads, vehicles, and other objects, can negatively affect the results. The tree segmentation algorithms cannot always reliably filter out these artifacts.

Regarding computational efficiency, we can conclude that using an HPC environment provides a significant opportunity for accelerating the evaluation process. For illustration, processing the entire study area of Petržalka (3500x3500 m) on a single compute node of the Devana HPC system took approximately 820 seconds, utilizing all 64 CPU cores. Processing the same area in CloudCompare on a powerful PC using a single CPU core took approximately 6200 seconds, which is about 8 times slower.

Full version of the article SK
Full version of the article EN

Authors
Marián Gall – Slovak National Supercomputing Centre
Michal Malček – Slovak National Supercomputing Centre
Lucia Demovičová – Centrum spoločných činností SAV v. v. i., organizačná zložka Výpočtové stredisko
Dávid Murín – SKYMOVE s. r. o.
Robert Straka – SKYMOVE s. r. o.

References:

[1] https://www.greenvalleyintl.com/LiDAR360/

[2] https://github.com/CloudCompare/CloudCompare/releases/tag/v2.13.1

[3] https://github.com/ultralytics/yolov5

[4] https://www.kaggle.com/ultralytics/coco128

[5] https://github.com/heartexlabs/labelImg

[6] Roussel J., Auty D. (2024). Airborne LiDAR Data Manipulation and Visualization for Forestry Applications.

[7] Popescu, Sorin & Wynne, Randolph. (2004). Seeing the Trees in the Forest: Using Lidar and Multispectral Data Fusion with Local Filtering and Variable Window Size for Estimating Tree Height. Photogrammetric Engineering and Remote Sensing. 70. 589-604. 10.14358/PERS.70.5.589.

[8] Silva C. A., Hudak A. T., Vierling L. A., Loudermilk E. L., Brien J. J., Hiers J. K., Khosravipour A. (2016). Imputation of Individual Longleaf Pine (Pinus palustris Mill.) Tree Attributes from Field and LiDAR Data. Canadian Journal of Remote Sensing, 42(5). 

[9] Ester M., Kriegel H. P., Sander J., Xu X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD-96 Proceedings, pp. 226–231.

[10] Kuhn H. W., “The Hungarian Method for the assignment problem”, Naval Research Logistics Quarterly, 2: 83–97, 1955


Devana: Call for projects for standard access to HPC 1/25 19 Jan - The call for projects for standard access to HPC 1/25 is open. Projects can be submitted within open calls 3 times a year. Access can be requested exclusively through the user portal register.nscc.sk.
Categories
General

Workshop: POP3 Profiling and Optimisation Tools

We invite you to the POP3 Profiling and Optimisation Tools workshop, the 46th VI-HPS Tuning Workshop. The event is organized by POP3 CoE in cooperation with the National Competence Centers for HPC from Austria, Czechia, Hungary, Poland, Slovakia and Slovenia.

Virtual Institute—High Productivity Supercomputing (VI-HPS) is an initiative that aims to enhance the productivity of supercomputing applications by providing a comprehensive set of tools and methodologies for performance analysis, debugging, and tuning. It brings together expertise and resources from various organisations to support developing and optimising high-performance computing applications.

The workshop is designed to facilitate collaborative learning and application tuning, with a particular emphasis on teams of two or more participants working with the same or closely related application codes the teams are developing.

  • The first day of the workshop introduces participants to the POP Centre of Excellence (CoE), detailing its services, methodology, and tools for performance assessments and second-level services.
  • On the second day, the focus shifts to getting started with open-source multi-platform tools for analysing MPI+OpenMP application executions on CPU architectures.
  • The third day delves into more advanced usage, including analysing application executions on combined CPU and GPU architectures. During this hands-on workshop, participants will be introduced to the use of Paraver/Extrae and Scalasca/Score-P/CUBE toolsets for CPUs and GPUs.

Paraver/Extrae is a performance analysis toolset designed for tracing and analysing the execution of parallel applications. Extrae captures detailed execution traces, while Paraver provides powerful visualisation and analysis capabilities to help identify performance bottlenecks and optimise parallel code.

Scalasca/Score-P/CUBE is an integrated performance analysis toolkit for parallel applications. Score-P collects performance data in profiles and execution traces, Scalasca analyses and identifies performance issues, and CUBE facilitates exploration of the results, helping developers tune their applications.

Additionally, other tools from the POP CoE will be available for participants to utilise throughout the workshop.

Target Audience and Purpose of the Course:
Attendees will learn how to use the parallel performance analysis tools of the Performance Optimisation and Productivity (POP) CoE and a corresponding methodology for applying those tools to assess execution performance and scaling efficiency of their own parallel application codes in a portable fashion.

Level
Intermediate/advanced. No prior knowledge of parallel performance tools is required (though serial code profiling experience is advantageous). However, participants are expected to be familiar with building/running (potentially hybrid, GPU-enabled) parallel applications.

Course format
The hands-on parts will only be available for on-site participants, who should bring their codes to work on.

The tutorial/lecture parts will be available to an unlimited number of participants, who can attend online.

Prerequisites
Participants should be familiar with one or more parallel programming paradigms, such as MPI and OpenMP (on CPUs), and preferably also the use of OpenMP, OpenACC, CUDA, or similar (for GPUs). When registering for the workshop, participants should report the programming languages and paradigms employed by their application codes, along with relevant framework/library dependencies. Note that applications using AI/ML frameworks such as TensorFlow are unsuitable for this workshop.

Technical requirements
Participants with their own application code(s) should have these installed and running on the Karolina supercomputer before the event. A representative execution test case should also be prepared, suitable for running on a single node in several minutes. The required tools will be available on Karolina (CPU and GPU partitions). However, participants may also install graphical tools on their own notebook computers. Each participant will get access to Karolina before the event.

Starts: 4.09.2024, 9:00 CET
Ends: 6.09.2024, 17:00 CET
Venue: online and in person at IT4Innovations in Ostrava

The event will be held in English.

More info about the event
Registration

Categories
Calls-Finished General

Call for Ideas: Seeking Slovak SME Partners for FFPlus Project Consortium

NCC Slovakia is looking for Slovak SME partners to form a consortium for the prestigious FFPlus project proposal. The aim is to leverage High-Performance Computing (HPC) to address specific business challenges, such as modelling and simulation, data analytics, and AI, and to achieve significant industrial impact.

The selected SMEs can benefit from our own or state-of-the-art European Tier-0 HPC infrastructure, code efficiency optimization and/or parallelization, and domain and technical support. The expected output is a success story in the form of a white paper, with no obligation to reveal details of the technical solution or any other proprietary / IP information or data.

What We Offer:

  • HPC Infrastructure: Access to state-of-the-art HPC systems.
  • Technical Support and co-development: Expert guidance in HPC utilization, workflow and code optimization and parallelization.
  • Application Guidance: NCC Slovakia will guide and accompany partners throughout the application process.

Expected Output:

  • White Paper: A short success story documenting the business impact achieved through HPC adoption. Note: There is no open science condition for this output.

Key Focus Areas:

  • Uptake of HPC by SMEs: Targeting businesses with no prior experience in HPC to solve real-world challenges.
  • Positive Business Impact: Demonstrate how HPC adoption leads to tangible business benefits.
  • Diverse Application Domains: Prioritizing projects with the highest business impact potential.

Eligibility:

  • Slovak SMEs: Must have fewer than 250 employees and an annual turnover of less than 50 million EUR.
  • Non-Research-Oriented: SMEs should be commercially driven; a focus on academic / fundamental research is not supported.

Project Details:

  • Submission Deadline: September 4th, 2024, 17:00 Brussels local time
  • Project Duration: Maximum of 15 months, starting January 1st, 2025
  • Funding Budget: Total of €4M for all sub-projects
  • Maximum Funding per Experiment: Up to €200K per experiment and up to €150K per organization in the consortium. The main participant, i.e. the SME, can participate in only one experiment.
  • Maximum number of consortium partners: 5 (the main participant plus supporting participants)

Proposal Expectations:

  • Alignment: Clearly define the business challenge and the necessity of HPC use.
  • Impact: Present the potential positive business impact.
  • Objectives: Set specific, achievable goals.
  • Consortium: Include all necessary parties for effective project execution.
  • Resources and Costs: Outline required resources and associated costs.
  • Data Protection: Address any data protection concerns.
  • Success Stories: Support in generating publishable success stories.

Submission Guidelines:

  • Format: Proposals must be submitted in English and comprise two parts: Part A (administrative information) and Part B (proposal body).
  • Electronic Submission: Proposals must be submitted electronically using the designated submission tool.

Join us in demonstrating the transformative potential of HPC for SMEs. Contact NCC Slovakia today so that we can build a partnership and apply for this project together.


Categories
Calls-Finished General

Fortissimo: call for proposal for SME

The FFplus project has launched a new open call for European small and medium-sized enterprises. It is looking for agile, innovative companies that want to put supercomputers to practical use and thereby gain a competitive advantage on the market.

The FFplus project is the fourth continuation of a very successful initiative that helps businesses overcome obstacles to using supercomputers and high-performance data analysis in practice, or to working with and developing generative AI. Its primary goal is to strengthen the global competitiveness of European industry.

In past years, dozens of companies from all over Europe have succeeded in the open calls and put supercomputers to use. Let their stories inspire you; you can find them on the FFplus website.

The FFplus project call is divided into 2 parts:

  1. BUSINESS EXPERIMENTS

The first part of this call is intended for businesses from any discipline with no previous experience with supercomputing. Companies have the opportunity to submit their "experiments", i.e. projects that solve a specific business challenge with the help of supercomputer technologies, high-performance data analysis or artificial intelligence. An experiment may last at most 15 months, with a planned start on January 1, 2025.

A sum of EUR 4 million will be distributed among all the selected projects for the financing of experiments.

The deadline for submitting applications is September 4, 2024 at 5 p.m.

  2. INNOVATION STUDIES

The second part of the FFplus call will support companies and startups that are already active in the field of generative AI but lack the computing resources needed to develop their own models. The goal is to facilitate and strengthen the technological development of European companies in the field of AI.

Participating enterprises will be supported in increasing their innovation potential by leveraging new generative AI models, such as large language models (LLMs), based on their existing expertise, application area, business model and potential for expansion.

Submitted "innovation studies" must use extensive European supercomputing resources (pre-exascale and exascale) to develop and adapt generative AI models (e.g. LLMs).

A sum of EUR 4 million intended for the financing of experiments will be distributed among all selected sub-projects.

The deadline for submitting applications is September 4, 2024 at 5 p.m.

Are you interested in this opportunity? You can find more information on the project website.

The experts from the National Competence Center for HPC will be happy to help you with submitting your project. Contact us.