Semi-Supervised Learning in Aerial Imagery: Implementing Uni-Match with Frame Field learning for Building Extraction

Building extraction in GIS (geographic information system) is pivotal for urban planning, environmental studies, and infrastructure management, allowing for accurate mapping of structures, including the detection of illegal constructions for regulatory compliance. Integrating extracted building data with other geospatial layers enhances the understanding of urban dynamics and spatial relationships. Given the scale and complexity of these tasks, there is a growing need to automate building extraction using deep learning techniques, which offer improved accuracy and efficiency in handling large-scale geospatial data.

State-of-the-art image segmentation models primarily produce raster output, whereas GIS applications often require vector polygons. Frame Field learning is one method that bridges this gap between the raster outputs of segmentation models and the vector format needed in GIS. It significantly improves the accuracy of building vectorization by aligning polygons with ground-truth contours, and it produces topologically clean vector objects.

These models are trained with 'supervised learning', which requires a large number of labeled examples. Obtaining such a volume of data can be extremely challenging and expensive. A potential solution is 'semi-supervised learning', which reduces reliance on labeled data by training the model on a mix of a small labeled set and a larger unlabeled set. The goal of this collaboration between the Slovak National Competence Center for High-Performance Computing and Geodeticca Vision s.r.o. was therefore to identify, implement, and evaluate a suitable semi-supervised method for Frame Field learning.

Methods
Frame Field learning

The key idea of frame field learning [1] is to help the polygonization method resolve ambiguous cases caused by discrete probability maps (the output of image segmentation models). This is achieved by adding an extra output to the image segmentation network: a frame field (see Fig. 1), which represents the structural and geometric characteristics of the building.

Frame fields

A frame field is a 4-PolyVector field that assigns four vectors to each point of the plane. The first two vectors are constrained to be opposite to the other two, so each point is assigned a set {u, −u, v, −v}. This representation suits buildings well: they are regular structures with sharp corners, and capturing the directionality at a sharp corner requires two directions.
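Following the representation used in [1] (to our understanding of that paper), a frame {u, −u, v, −v} at one pixel can be stored as two complex coefficients c0 and c2 of the polynomial f(z) = z^4 + c2 z^2 + c0 = (z^2 − u^2)(z^2 − v^2), with u and v written as complex numbers. Because only u^2 and v^2 appear, the encoding is automatically invariant to the sign flips u → −u and v → −v. The NumPy sketch below is illustrative only; the function names are hypothetical.

```python
import numpy as np


def encode_frame(u: complex, v: complex) -> tuple:
    """Encode the frame {u, -u, v, -v} as coefficients (c0, c2) such that
    z**4 + c2*z**2 + c0 == (z**2 - u**2) * (z**2 - v**2)."""
    c0 = u**2 * v**2
    c2 = -(u**2 + v**2)
    return c0, c2


def decode_frame(c0: complex, c2: complex) -> np.ndarray:
    """Recover the four frame vectors (as complex numbers) as the polynomial's roots."""
    return np.roots([1, 0, c2, 0, c0])


# A right-angle corner: u points along x, v along y.
c0, c2 = encode_frame(1 + 0j, 1j)
print(np.sort_complex(decode_frame(c0, c2)))  # the four vectors +-u, +-v
```

A network predicting (c0, c2) per pixel therefore never has to choose between u and −u, which is what makes this parametrization convenient to learn.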


Figure 1: Visualization of the frame field output on an image from the training set [1].

Frame Field learning

Figure 2: Diagram of frame field learning [1].

The learning process of frame fields can be summarized as follows:

  1. The network's input is a 3×H×W RGB image.
  2. Any deep segmentation model, such as U-Net, can be used to generate a feature map, which is then processed into detailed segmentation maps.
  3. Training is supervised with ground-truth rasterized polygons for building interiors and edges, using a combination of cross-entropy and Dice loss.
  4. The frame field is trained with three losses:
    1. L_align enforces alignment of the frame field with the tangent direction of the contour.
    2. L_align90 prevents the frame field from collapsing to a line field.
    3. L_smooth enforces smoothness of the frame field.
  5. Additional regularization losses keep the outputs mutually consistent by aligning the spatial gradients of the predicted maps with the frame field.
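Two of the frame field losses can be sketched in NumPy, assuming (as we read [1]) that the network outputs complex maps c0 and c2, that L_align is the mean over pixels of the ground-truth edge mask times |f(e^{iθ_τ}; c0, c2)|^2 (θ_τ being the contour tangent angle), and that L_smooth is the mean squared magnitude of the spatial gradients of c0 and c2. L_align90 and the regularization losses are omitted. This is an illustration, not the training code used in this work.

```python
import numpy as np


def frame_poly(z, c0, c2):
    """Evaluate f(z; c0, c2) = z^4 + c2 z^2 + c0 element-wise."""
    return z**4 + c2 * z**2 + c0


def align_loss(c0, c2, tangent_angle, gt_edge):
    """L_align sketch: f should vanish at the ground-truth tangent direction on edge pixels.
    c0, c2: (H, W) complex maps; tangent_angle: (H, W) radians; gt_edge: (H, W) in {0, 1}."""
    z_tau = np.exp(1j * tangent_angle)
    return float(np.mean(gt_edge * np.abs(frame_poly(z_tau, c0, c2)) ** 2))


def smooth_loss(c0, c2):
    """L_smooth sketch: mean squared magnitude of the spatial gradients of c0 and c2."""
    total = np.zeros(c0.shape)
    for c in (c0, c2):
        gy, gx = np.gradient(c)  # finite differences along rows and columns
        total += np.abs(gx) ** 2 + np.abs(gy) ** 2
    return float(np.mean(total))
```

A frame with u along x and v along y everywhere (c0 = −1, c2 = 0) gives zero alignment loss for horizontal or vertical tangents and a positive loss for diagonal ones.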

Vectorization

Figure 3: Visualization of the vectorization process [1]

The vectorization process converts the raster prediction into vector polygons using a polygonization method based on the Active Skeleton Model (ASM). The algorithm optimizes a skeleton graph, a network of pixels outlining the building's structure, obtained by applying a thinning method to the building wall probability map. The vertices of this graph are shifted iteratively toward their ideal positions by gradient-based minimization of an energy function whose terms relate to the structure and geometry being analyzed:

  1. E_probability fits the skeleton paths to the contour of the building interior probability map at a given probability threshold, e.g. 0.5.
  2. E_frame-field-align aligns each edge of the skeleton graph with the frame field.
  3. E_length keeps the distribution of nodes along the paths homogeneous and tight.
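The exact ASM energy terms are defined in [1] and are not reproduced here. As a toy illustration of gradient-based vertex shifting and of what a length term does, the sketch below assumes E_length is the sum of squared segment lengths along one skeleton path (an assumption) and leaves out E_probability and E_frame-field-align entirely. With the end nodes pinned, gradient descent spreads the interior nodes evenly and tightly between them.

```python
import numpy as np


def length_energy(path):
    """Toy E_length: sum of squared segment lengths along an open path of (n, 2) nodes."""
    seg = np.diff(path, axis=0)
    return float(np.sum(seg ** 2))


def length_energy_grad(path):
    """Analytic gradient of length_energy with respect to every node."""
    seg = np.diff(path, axis=0)   # seg[i] = p[i+1] - p[i]
    grad = np.zeros_like(path)
    grad[1:] += 2 * seg           # derivative of |p[i+1] - p[i]|^2 w.r.t. p[i+1]
    grad[:-1] -= 2 * seg          # derivative of |p[i+1] - p[i]|^2 w.r.t. p[i]
    return grad


def relax_path(path, steps=500, lr=0.1):
    """Plain gradient descent on the toy energy; the two end nodes stay fixed."""
    p = np.asarray(path, dtype=float).copy()
    for _ in range(steps):
        g = length_energy_grad(p)
        g[[0, -1]] = 0.0          # pin the end nodes (e.g. junctions or corners)
        p -= lr * g
    return p
```

In the real method the other energy terms pull the nodes toward the probability contour and the frame field, so the result is not simply a straight, evenly spaced path.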

UniMatch semi-supervised learning

UniMatch [2] is an advanced semi-supervised learning method from the consistency regularization family. It builds on FixMatch [3], a baseline method in this domain, which combines pseudo-labeling with consistency regularization.

The basic principle of FixMatch is to generate pseudo-labels for unlabeled data from the network's own predictions. For a weakly perturbed unlabeled input x_w, the prediction p_w serves as the pseudo-label for the prediction p_s of x_s, a strongly perturbed version of the same input. The loss, for example cross-entropy(p_w, p_s), is then computed only over regions where p_w exceeds a confidence threshold, e.g. 0.95.
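A minimal NumPy sketch of this confidence-masked pseudo-label loss, assuming p_w and p_s are per-pixel class probabilities of the same, pixel-aligned input (the function name is hypothetical). Averaging over all pixels, with masked-out pixels contributing zero, follows the FixMatch convention.

```python
import numpy as np


def fixmatch_unsup_loss(p_w, p_s, threshold=0.95, eps=1e-12):
    """Confidence-masked pseudo-label cross-entropy in the spirit of FixMatch.

    p_w, p_s: arrays of shape (..., C) with class probabilities predicted for the
    weak and the strong view. p_w acts as a fixed target (no gradient in training).
    """
    pseudo = np.argmax(p_w, axis=-1)                 # hard pseudo-label per pixel
    mask = np.max(p_w, axis=-1) > threshold          # keep confident pixels only
    p_true = np.take_along_axis(p_s, pseudo[..., None], axis=-1)[..., 0]
    ce = -np.log(p_true + eps)                       # per-pixel cross-entropy
    return float(np.sum(ce * mask) / mask.size)      # mean over all pixels
```

For two pixels where only the first is confident (0.97 > 0.95), only the first pixel's cross-entropy contributes to the loss.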

UniMatch builds upon and extends the FixMatch methodology, introducing two core enhancements:

  1. UniPerb (Unified Perturbations for Images and Features): perturbations are also applied at the feature level. In practice, a dropout function is applied to the output (the features) of the network's encoder, randomly dropping features before they are passed to the decoder, which then produces the prediction p_fp.
  2. Dual-stream perturbations: instead of a single strong perturbation, two strongly perturbed views, x_s1 and x_s2, are generated from the same input, and both are supervised by the same pseudo-label p_w.
Figure 4: (a) The FixMatch baseline. (b) The UniMatch method used here. FP denotes feature perturbation; w and s denote weak and strong perturbation, respectively [2].

This yields three unsupervised loss terms: cross-entropy(p_w, p_fp), cross-entropy(p_w, p_s1), and cross-entropy(p_w, p_s2). They are linearly combined with the supervised loss.
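The combination can be sketched as follows. The article does not report the weights; the defaults below (0.5 for the feature-perturbation term, 0.25 for each strong stream, and halving the sum with the supervised loss) reflect our reading of the public UniMatch reference code and should be treated as an assumption. Applying the unsupervised terms to per-pixel class probabilities only is also an assumption.

```python
import numpy as np


def masked_ce(p_target, p_pred, threshold=0.95, eps=1e-12):
    """Cross-entropy of p_pred against argmax(p_target), counted only where
    max(p_target) > threshold and averaged over all pixels."""
    pseudo = p_target.argmax(axis=-1)
    mask = p_target.max(axis=-1) > threshold
    picked = np.take_along_axis(p_pred, pseudo[..., None], axis=-1)[..., 0]
    return float(np.sum(-np.log(picked + eps) * mask) / mask.size)


def unimatch_loss(loss_sup, p_w, p_fp, p_s1, p_s2, w_fp=0.5, w_s=0.25, threshold=0.95):
    """Combine the supervised loss with the three unsupervised UniMatch-style terms.
    p_w: weak-view prediction (pseudo-label source); p_fp: prediction from the
    dropped-out features of the weak view; p_s1, p_s2: the two strong views."""
    loss_fp = masked_ce(p_w, p_fp, threshold)
    loss_s1 = masked_ce(p_w, p_s1, threshold)
    loss_s2 = masked_ce(p_w, p_s2, threshold)
    loss_unsup = w_fp * loss_fp + w_s * (loss_s1 + loss_s2)
    return (loss_sup + loss_unsup) / 2.0
```

Weighting the feature-perturbation stream as heavily as both image streams together keeps the two kinds of perturbation balanced.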

This method is currently one of the state-of-the-art semi-supervised learning methods. Its main advantage is its ease of implementation; its main drawback is its sensitivity to the choice of suitable weak and strong perturbations.

Integrating UniMatch Semi-Supervised Learning with Frame Field Learning

Implementation Strategy for UniMatch in Frame Field Learning

To integrate UniMatch into our Frame Field learning framework, we first distinguished between weak and strong perturbations. For weak perturbations, we chose basic spatial transformations such as rotation and horizontal/vertical flips (mirroring). These are well suited to aerial imagery and straightforward to implement.

For strong perturbations, we opted for photometric transformations. These include adjustments in hue, color, and brightness, providing a more significant alteration to the images compared to spatial transformations. 
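A NumPy sketch of the two perturbation families. The exact transformations and their ranges are not given in the text, so the 90-degree rotations, the jitter ranges, and the YIQ-based hue rotation below are illustrative assumptions. Because the strong view is produced by photometric changes on top of the weak view, the two views stay pixel-aligned and the weak-view prediction remains a valid pixel-wise pseudo-label.

```python
import numpy as np


def weak_perturb(img, rng):
    """Weak (geometric) perturbation of an (H, W, 3) image in [0, 1]:
    random 90-degree rotation plus optional horizontal and vertical flips."""
    out = np.rot90(img, k=int(rng.integers(4)), axes=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]   # vertical flip
    return np.ascontiguousarray(out)


_RGB2YIQ = np.array([[0.299, 0.587, 0.114],
                     [0.596, -0.274, -0.322],
                     [0.211, -0.523, 0.312]])
_YIQ2RGB = np.linalg.inv(_RGB2YIQ)


def strong_perturb(img, rng, max_brightness=0.3, max_saturation=0.5, max_hue=0.1):
    """Strong (photometric) perturbation applied on top of the weak view:
    brightness, saturation ("color") and hue jitter; geometry is unchanged."""
    out = img * (1.0 + rng.uniform(-max_brightness, max_brightness))      # brightness
    gray = out.mean(axis=-1, keepdims=True)
    out = gray + (out - gray) * (1.0 + rng.uniform(-max_saturation, max_saturation))
    yiq = out @ _RGB2YIQ.T                                                 # to YIQ
    theta = 2 * np.pi * rng.uniform(-max_hue, max_hue)                     # hue angle
    c, s = np.cos(theta), np.sin(theta)
    i, q = yiq[..., 1].copy(), yiq[..., 2].copy()
    yiq[..., 1] = c * i - s * q                                            # rotate I-Q plane
    yiq[..., 2] = s * i + c * q
    return np.clip(yiq @ _YIQ2RGB.T, 0.0, 1.0)                             # back to RGB
```

Typical use: x_w = weak_perturb(x, rng), then x_s1 = strong_perturb(x_w, rng) and x_s2 = strong_perturb(x_w, rng) with independent random draws.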

Incorporating the feature perturbation loss was a crucial step. We implemented it by introducing a dropout mechanism between the encoder and decoder parts of the network. This dropout randomly omits features at the feature level, which is essential for the UniMatch approach.
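The dropout type and rate are not stated. The sketch assumes channel-wise dropout (whole feature channels zeroed, in the style of PyTorch's nn.Dropout2d) with p = 0.5 and the usual 1/(1 − p) rescaling of surviving channels. In a training step one would compute p_fp = decoder(feature_dropout(encoder(x_w))), where encoder and decoder are hypothetical stand-ins for the two halves of the U-Net.

```python
import numpy as np


def feature_dropout(features, p=0.5, rng=None):
    """Channel-wise dropout on encoder features of shape (C, H, W): each channel is
    zeroed with probability p and surviving channels are rescaled by 1 / (1 - p)."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(features.shape[0]) >= p   # one keep/drop decision per channel
    return features * keep[:, None, None] / (1.0 - p)
```

Dropping whole channels, rather than single activations, removes entire feature maps and so yields a stronger perturbation of the decoder input.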

For UniMatch's dual-stream perturbations, we adapted our model to handle two strong perturbations. The prediction on the weakly perturbed input serves as the pseudo-label, and the predictions on the two strongly perturbed inputs are trained against it; having two strong streams gives the approach its name, 'dual-stream'. Both streams contribute to the robustness and effectiveness of the model in the semi-supervised setting, which is especially important for building extraction from complex aerial imagery.

Through these modifications, UniMatch was successfully integrated into the Frame Field learning algorithm, improving its ability to learn efficiently from labeled and, above all, unlabeled data.

Experiments
Dataset
Labeled Data

Our labeled data come from three different sources, summarized in Table 1.

Table 1: Overview of the three sources of labeled data used to train the models.

Unlabeled Data

For the unlabeled dataset, we selected high-quality aerial images from Geodetický a kartografický ústav (GKÚ, the Geodetic and Cartographic Institute) [6], which are freely available for public use. We targeted a diverse area of 7,000 km², ensuring broad coverage of different landscapes and urban settings.

Data Processing: Patching

We split both labeled and unlabeled images into patches of 320×320 px, a size chosen to match the input requirements of our neural network. This yielded approximately 55,000 patches from the labeled data and around 244,000 patches from the unlabeled data.
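A minimal patching sketch. How image borders were handled is not stated; this version assumes non-overlapping tiles and simply drops border strips narrower than one patch (padding or overlapping tiles would be equally reasonable choices).

```python
import numpy as np


def extract_patches(image, size=320):
    """Cut an (H, W, C) image into non-overlapping size x size patches, row by row.
    Border strips narrower than `size` are dropped (an assumed policy)."""
    H, W = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, H - size + 1, size)
            for x in range(0, W - size + 1, size)]
```

Under this policy an image yields floor(H / 320) × floor(W / 320) patches; a 1000 × 700 px tile, for instance, gives 3 × 2 = 6.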

Training setup
Model Architecture

We designed our model using a U-Net architecture with an EfficientNet-B4 backbone. The EfficientNet-B4 backbone was chosen for its balance between memory usage and accuracy, which matters given the complexity of our segmentation task. U-Net architectures have performed strongly in prior Frame Field learning studies.

Training Process

For training, we used the AdamW optimizer, which combines Adam optimization with decoupled weight decay to improve generalization. To prevent overfitting, we also applied L2 regularization. The learning rate was controlled by the ReduceLROnPlateau scheduler, which lowers the learning rate when the validation loss stops improving.
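The scheduler's rule can be illustrated with a simplified pure-Python re-implementation of the ReduceLROnPlateau idea (mode "min"). It is not PyTorch's class: the improvement threshold and cooldown options of torch.optim.lr_scheduler.ReduceLROnPlateau are left out, and the settings actually used in this work are not reported.

```python
class PlateauScheduler:
    """Simplified ReduceLROnPlateau: multiply the learning rate by `factor` once the
    monitored validation loss has not improved for more than `patience` epochs."""

    def __init__(self, lr, factor=0.1, patience=10, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the learning rate to use."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```

With factor=0.5 and patience=2, three consecutive epochs without improvement halve the learning rate.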

Semi-Supervised Learning Adjustments

A key aspect of our training was the ratio of labeled to unlabeled patches, which we varied from 1:1 to 1:5 (labeled:unlabeled). This let us study how the amount of unlabeled data affects the learning process and identify the best balance for training the model while exploiting large and diverse unlabeled datasets.
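The text does not say how the ratio was enforced. One common option, assumed here, is to draw `ratio` unlabeled patches per labeled patch in every training step, treating one pass over the labeled set as an epoch and cycling the (reshuffled) unlabeled set whenever it runs out. The generator below is a hypothetical stdlib sketch of that sampling scheme.

```python
import itertools
import random


def mixed_batches(labeled, unlabeled, batch_size, ratio, seed=0):
    """Yield (labeled_batch, unlabeled_batch) pairs with `ratio` unlabeled patches
    per labeled patch. One epoch is one pass over the shuffled labeled set; the
    unlabeled set is reshuffled and cycled whenever it runs out."""
    if not unlabeled:
        raise ValueError("unlabeled set must not be empty")
    rng = random.Random(seed)
    lab = list(labeled)
    rng.shuffle(lab)

    def unlabeled_stream():
        while True:
            pool = list(unlabeled)
            rng.shuffle(pool)
            yield from pool

    stream = unlabeled_stream()
    for i in range(0, len(lab), batch_size):
        lb = lab[i:i + batch_size]
        ub = list(itertools.islice(stream, ratio * len(lb)))
        yield lb, ub
```

With ratio=5 and a labeled batch of 8 patches, each step would see 40 unlabeled patches.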

Model evaluation

In our evaluation of the building footprint extraction model, we chose metrics that precisely measure how well our predictions align with real-world structures.

Intersection over Union (IoU)

The key metric we used is Intersection over Union (IoU). It measures the agreement between the model's predictions and the actual shape of the buildings; an IoU score close to 1 means our predictions closely match the real buildings. This metric is essential for assessing the geometric accuracy of segmented regions, because it reflects how precisely building boundaries are delineated. By relating the correctly predicted area to the combined area (the union of the predicted and actual areas), IoU gives a clear measure of how well the model captures the true shape and context of buildings in complex urban landscapes.
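IoU on rasterized binary masks can be computed as below; whether the evaluation here rasterized the polygons or compared them as geometries is not stated, so the mask-based form is an assumption.

```python
import numpy as np


def iou(pred, gt):
    """Intersection over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred, gt).sum() / union)
```

For example, the left half and the top half of a 4×4 grid overlap in 4 pixels out of a union of 12, giving IoU = 1/3.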

Precision, Recall and F1

Precision measures the accuracy of the model's building predictions: the proportion of correctly identified buildings among all buildings the model identified, so a high precision means few false positives. Recall measures the model's ability to capture all actual buildings; a high recall indicates high sensitivity in detecting buildings. The F1 Score, the harmonic mean of precision and recall, combines both into a single metric that is high only when precision and recall are both high.
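The three metrics follow directly from true-positive, false-positive and false-negative counts (whether these counts are pixels or matched buildings is not specified here). As a sanity check, plugging the baseline precision and recall from the Results (85.75% and 94.27%) into the harmonic mean reproduces the reported F1 of 89.81%.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works for fractions or percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


print(round(f1_score(85.75, 94.27), 2))  # baseline F1 from the reported precision and recall
```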

Complexity Aware IoU (cIoU)

We also used the complexity-aware IoU (cIoU) [7]. It addresses a shortcoming of IoU by balancing segmentation accuracy against the complexity of the polygon shapes: optimizing IoU alone can lead models to produce overly complex polygons, whereas cIoU also requires the number of vertices to stay realistic, reflecting the typically simple structure of real buildings.

N Ratio Metric

The N ratio metric was an additional component of our evaluation strategy. It compares the number of vertices in the predicted polygons with the number in the actual buildings [7]; a value close to 1 means the predictions have about as many vertices as the ground truth. This shows whether the model reproduces the structure of the buildings at the right level of detail.
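As we understand PolyWorld [7], cIoU multiplies IoU by one minus the relative difference RD = |N_pred − N_gt| / (N_pred + N_gt) between the predicted and ground-truth vertex counts, and the N ratio is the ratio of predicted to ground-truth vertex counts. The sketch below assumes one matched building pair for cIoU and sums vertex counts over all buildings for the N ratio; the exact aggregation used in the evaluation is an assumption.

```python
def relative_difference(n_pred: int, n_gt: int) -> float:
    """Relative difference between predicted and ground-truth vertex counts."""
    return abs(n_pred - n_gt) / (n_pred + n_gt)


def ciou(iou_value: float, n_pred: int, n_gt: int) -> float:
    """Complexity-aware IoU for one matched building: IoU scaled by (1 - RD)."""
    return iou_value * (1.0 - relative_difference(n_pred, n_gt))


def n_ratio(pred_vertex_counts, gt_vertex_counts) -> float:
    """Total predicted vertices divided by total ground-truth vertices."""
    return sum(pred_vertex_counts) / sum(gt_vertex_counts)
```

A building predicted with 12 vertices where the ground truth has 4 keeps only half of its IoU under cIoU (RD = 8/16), which is how cIoU penalizes over-complex polygons.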

Max Tangent Angle Error

To ensure clean geometry in building extraction tasks, accurately measuring contour regularity is essential. The Max Tangent Angle Error (MTAE) [1] metric is designed to address this need by supplementing the Intersection over Union (IoU) metric. It specifically targets the limitation of IoU, where segmentations with rounded corners may receive higher scores than those with more precise, sharp corners. By evaluating the alignment of edges through the comparison of tangent angles at sampled points along predicted and ground truth contours, MTAE effectively penalizes inaccuracies in edge orientation. This focus on edge precision is critical for producing clean vector representations of buildings, emphasizing the importance of accurate edge delineation in segmentation tasks.
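A simplified sketch of the idea behind MTAE [1] for one matched pair of contours: sample both closed polygons uniformly by arc length, compare the tangent angles at corresponding samples, and keep the maximum absolute difference. It assumes the contours are already matched, share the same orientation and start at corresponding vertices; the reference implementation of [1] may differ in these details. The reported "mean MTAE" would then be the average of this value over buildings.

```python
import numpy as np


def _tangent_angles(poly, n):
    """Tangent angle (radians) at n points spread uniformly by arc length along a
    closed polygon given as (k, 2) vertices without a repeated closing vertex."""
    pts = np.asarray(poly, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts              # edge i goes from vertex i to i+1
    lengths = np.linalg.norm(edges, axis=1)
    cum = np.concatenate([[0.0], np.cumsum(lengths)])   # arc length at each vertex
    s = (np.arange(n) + 0.5) * cum[-1] / n              # sample positions (mid-intervals)
    idx = np.searchsorted(cum, s, side="right") - 1     # edge that contains each sample
    return np.arctan2(edges[idx, 1], edges[idx, 0])


def max_tangent_angle_error(pred, gt, n=100):
    """Simplified MTAE in degrees for one matched pair of closed contours."""
    a = _tangent_angles(pred, n)
    b = _tangent_angles(gt, n)
    diff = np.abs((a - b + np.pi) % (2 * np.pi) - np.pi)   # angle difference wrapped to [0, pi]
    return float(np.degrees(diff.max()))
```

Rotating a unit square by 10° about its center produces an MTAE of 10°, whereas IoU alone would still rate the rotated square fairly highly.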

Evaluation Process

The trained models were tested on a large dataset of full-size aerial images (rather than the small patches used for training), which gives a more accurate picture of how such models perform in real-world use. To extract buildings from full-size images, we used a sliding-window technique that produces predictions for individual segments of the image. An advanced averaging technique was applied to the overlapping edges of the segments, which is important for minimizing artifacts and keeping the prediction map consistent. The resulting full-size prediction map was then vectorized into precise vector polygons using the Active Skeleton Model (ASM) algorithm.

Results

Table 2: Training results for the baseline approach (supervised learning) and for the semi-supervised approaches with different ratios of labeled to unlabeled images.

Table 2 summarizes the performance of the segmentation model trained under different conditions. We evaluated the model in a baseline scenario without semi-supervised learning and in scenarios where semi-supervised learning was applied with labeled-to-unlabeled ratios of 1:1, 1:3, and 1:5.

  1. IoU: Starting from a baseline of 80.50%, IoU increased steadily as more unlabeled data was introduced into training, reaching 85.77% at a 1:5 labeled-to-unlabeled ratio.
  2. Precision, Recall, and F1 Score: Precision, which measures how accurate the predictions are, improved from 85.75% in the baseline to 90.04% in the 1:5 setup. Recall, which indicates how well the model finds all relevant instances, increased slightly from 94.27% to 94.76%. The F1 Score, which balances precision and recall, improved from 89.81% to 92.34%. These improvements suggest that semi-supervised learning made the model's predictions more accurate and reliable.
  3. N Ratio and cIoU: The N Ratio decreased notably from 2.33 in the baseline to 1.65 in the semi-supervised 1:5 setup, indicating that the semi-supervised model generates simpler, yet accurate, vector shapes that more closely resemble the actual structures. This simplification likely makes the output more usable in practical GIS applications. At the same time, the complexity-aware IoU (cIoU) improved substantially from 48.89% in the baseline to 64.75% at the 1:5 ratio, suggesting that semi-supervised learning not only improves the overlap between predicted and actual building footprints but also produces simpler vector shapes whose geometry is closer to that of real buildings.
  4. Mean Max Tangent Angle Error (MTAE): The reduction of the mean MTAE from 18.60° in the baseline to 17.45° in the 1:5 semi-supervised setting indicates improved geometric precision. The semi-supervised model captures the architectural features of buildings with more accurately oriented edges and corners, contributing to topologically simpler and cleaner vector polygons.

Training on High-Performance Computing (HPC) Machine

HPC Configuration

Our training was conducted on a High-Performance Computing (HPC) machine equipped with substantial computational resources. The HPC had 8 nodes, each outfitted with 4 NVIDIA A100 GPUs with 40GB of VRAM, 64 CPU cores, and 256GB of RAM. For task scheduling, the system utilized Slurm.

PyTorch Lightning Framework

We employed the PyTorch Lightning framework, which offers user-friendly multi-GPU settings. This framework allows the specification of the number of GPUs per node, the total number of nodes, various distributed strategies, and the option for mixed-precision training.

Experiences with Slurm and PyTorch Lightning

When training on a single GPU, our Slurm configuration was as follows:
#SBATCH --partition=ngpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64000

In PyTorch Lightning, we set the trainer as:

from pytorch_lightning import Trainer
trainer = Trainer(accelerator="gpu", devices=1)

Because we allocated one of the four GPUs in a node, we also allocated 16 of its 64 CPU cores and therefore assigned 16 workers to the data loaders. Since semi-supervised learning uses two data loaders (one for labeled and one for unlabeled data), each received 8 workers. It was critical that the total number of data-loader workers not exceed the allocated CPUs; otherwise training crashed.

Distributed Data Parallel (DDP) Strategy

With PyTorch Lightning's Distributed Data Parallel (DDP) strategy, each GPU across the nodes was driven by its own independent process:

  • Each GPU processed its own portion of the dataset.
  • Each process initialized its own replica of the model.
  • Forward and backward passes ran in parallel on all processes.
  • Gradients were synchronized and averaged across processes.
  • Each process then applied the optimizer step to its own replica, so all replicas stayed identical.
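The DDP steps listed above can be illustrated with a small NumPy simulation (no GPUs or torch.distributed involved; all names are hypothetical). With equal-size shards and a mean-reduced loss, averaging the per-process gradients, which is what the all-reduce does, gives exactly the full-batch gradient, and because every replica applies the same update, the replicas never diverge.

```python
import numpy as np


def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def ddp_step(replicas, shards, lr=0.1):
    """One simulated DDP step: each 'process' computes a gradient on its own shard,
    the gradients are averaged (standing in for the all-reduce), and every
    process applies the same SGD update to its own model replica."""
    grads = [mse_grad(w, X, y) for w, (X, y) in zip(replicas, shards)]
    avg = np.mean(grads, axis=0)
    return [w - lr * avg for w in replicas], avg


rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
shards = [(X[i::4], y[i::4]) for i in range(4)]   # 4 "GPUs", 2 samples each
replicas, avg_grad = ddp_step([np.zeros(3) for _ in range(4)], shards)
```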

With this approach, the total number of data loaders equals the number of GPUs multiplied by the number of data loaders per process. For example, semi-supervised learning on 4 GPUs with two data loaders each (labeled and unlabeled) results in 8 data loaders with 8 workers each, 64 workers in total.
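The worker budget described above reduces to a simple division. This hypothetical helper splits the CPU cores allocated by Slurm evenly across all data-loader worker pools on a node; its result would be passed as num_workers to each torch.utils.data.DataLoader. It ignores the main process's own core usage, which is a simplification.

```python
def workers_per_loader(cpus_per_node: int, gpus_per_node: int, loaders_per_gpu: int) -> int:
    """Workers each DataLoader may use so that the total across all GPU processes
    and loaders on the node does not exceed the CPU cores allocated by Slurm."""
    pools = gpus_per_node * loaders_per_gpu
    workers = cpus_per_node // pools
    if workers < 1:
        raise ValueError("not enough CPU cores for one worker per loader")
    return workers


print(workers_per_loader(16, 1, 2))  # single-GPU job with 16 cores
print(workers_per_loader(64, 4, 2))  # full node: 4 GPUs with 64 cores
```

Both configurations from the text end up with 8 workers per loader, 16 and 64 workers in total respectively.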

To fully utilize one node with four GPUs, we used the following configuration:

#SBATCH --partition=ngpu
#SBATCH --gres=gpu:4
#SBATCH --exclusive
#SBATCH --cpus-per-task=64
#SBATCH --mem=256000

In PyTorch Lightning, we set the trainer as:

trainer = Trainer(accelerator="gpu", devices=4, strategy="ddp")

Utilizing Multiple Nodes

With PyTorch Lightning, training can also span multiple HPC nodes. For instance, 4 nodes with 4 GPUs each (16 GPUs in total) were configured as:

trainer = Trainer(accelerator="gpu", devices=4, strategy="ddp", num_nodes=4)

Correspondingly, the Slurm configuration was set to:

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4

These settings and experiences highlight the scalability and flexibility of training complex machine learning models on an HPC environment, especially for tasks demanding significant computational resources like semi-supervised learning in geospatial data analysis.

Training Scalability Analysis

Table 3: Training results of the supervised and semi-supervised approaches with 1, 2, 4, and 8 GPUs. For each configuration, the time per epoch and the speedup relative to 1 GPU are given.

In the training scalability analysis, we examined how adding computational resources affects training efficiency with the PyTorch Lightning framework. The analysis covered both supervised and semi-supervised learning, with particular emphasis on increasing the number of GPUs, up to a setup with 2 nodes (8 GPUs).

Figure 5: This graph compares the actual speedup ratios for supervised and semi-supervised learning against the number of GPUs, alongside the ideal linear speedup ratio. It showcases the closer alignment of semi-supervised learning with ideal scalability, emphasizing its greater efficiency gains from increased computational resources.

A key finding of this analysis was that, for supervised learning, the speedup did not grow in proportion to the number of GPUs. Ideally, doubling the number of GPUs would double the speedup (e.g., 4 GPUs would give a 4x speedup), but the measured speedups were lower than this ideal. The gap can be attributed to the overhead of managing multiple GPUs and nodes, in particular the need to synchronize gradients across all GPUs.
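The speedup in Table 3 and Figure 5 follows the usual definition S(N) = T(1) / T(N), where T(N) is the time per epoch on N GPUs; the parallel efficiency S(N) / N equals 1.0 for ideal linear scaling. The helper below only illustrates this arithmetic; the epoch times in the test are made up and are not the measured results.

```python
def scaling_report(epoch_times: dict) -> dict:
    """Speedup and parallel efficiency per GPU count.

    epoch_times maps a GPU count to the measured time per epoch and must
    contain the single-GPU baseline under key 1.
    """
    t1 = epoch_times[1]
    report = {}
    for n, t in sorted(epoch_times.items()):
        speedup = t1 / t                     # ideal value: n
        report[n] = (speedup, speedup / n)   # efficiency 1.0 means linear scaling
    return report
```

An efficiency that falls as N grows is the signature of the synchronization overhead described above.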

Semi-supervised learning showed a somewhat different trend, closer to the ideal (linear) increase in speedup. Its greater complexity and higher computational demands appear to reduce the relative impact of the overhead: despite the cost of synchronizing across multiple GPUs and compute nodes, the larger amount of computation per step lets resources scale more efficiently, i.e., the speedup comes closer to the ideal scenario.

Conclusion

The research presented in this whitepaper has successfully demonstrated the effectiveness of integrating UniMatch semi-supervised learning with Frame Field learning for the task of building extraction from aerial imagery. This integration addresses the challenges associated with the scarcity of labeled data in deep learning applications for geographic information systems (GIS), providing a cost-effective and scalable solution.

Our findings reveal that employing semi-supervised learning significantly enhances the model's performance across several key metrics, including Intersection over Union (IoU), precision, recall, F1 Score, N Ratio, complexity-aware IoU (cIoU), and Mean Max Tangent Angle Error (MTAE). Notably, the improvements in IoU and cIoU metrics underscore the model's increased accuracy in delineating building footprints and generating vector shapes that closely resemble actual structures. This outcome is pivotal for applications in urban planning, environmental studies, and infrastructure management, where precise mapping and analysis of building data are crucial.

The methodology adopted, which combines Frame Field learning with the innovative UniMatch approach, has proven to be highly effective in leveraging both labeled and unlabeled data. This strategy not only improves the geometric precision of the model's predictions but also ensures the generation of cleaner, topologically accurate vector polygons. Furthermore, the scalability and efficiency of training on a High-Performance Computing (HPC) machine using the PyTorch Lightning framework and Distributed Data Parallel (DDP) strategy have been instrumental in handling the extensive computational demands of the semi-supervised learning process on the data at hand, within a time frame ranging from tens of minutes to hours.

This work highlights the potential of semi-supervised learning for improving automatic building extraction from aerial imagery. Integrating UniMatch into the Frame Field learning method is a significant step forward, providing a robust answer to the scarcity of labeled data and the need for high accuracy in geospatial data analysis. The approach improves both the efficiency and the accuracy of building extraction, and it opens new possibilities for applying semi-supervised learning methods in GIS and related fields.

Acknowledgment

Research results were obtained with the support of the Slovak National competence centre for HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement 101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.

Computational resources were procured in the national project National competence centre for high performance computing (project code: 311070AKF2) funded by European Regional Development Fund, EU Structural Funds Informatization of society, Operational Program Integrated Infrastructure.

Authors

Patrik Sabol – Geodeticca Vision s.r.o., Floriánska 19, 044 01 Košice, Slovakia

 Bibiána Lajčinová – National Supercomputing Center, Dúbravská cesta 3484/9, 84104 Bratislava-Karlová Ves, Slovakia

Full version of the article SK

Full version of the article EN

References:

[1] Nicolas Girard, Dmitriy Smirnov, Justin Solomon, and Yuliya Tarabalka. “Polygonal Building Extraction by Frame Field Learning”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2021), pp. 5891-5900.

[2] L. Yang, L. Qi, L. Feng, W. Zhang, and Y. Shi. “Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation”. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2023), pp. 7236-7246. doi: 10.1109/CVPR52729.2023.00699.

[3] Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, and Colin Raffel. “FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence”. In: CoRR, vol. abs/2001.07685 (2020). Available: https://arxiv.org/abs/2001.07685.

[4] Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, and Pierre Alliez. “Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark”. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (2017). IEEE.

[5] Adrian Boguszewski, Dominik Batorski, Natalia Ziemba-Jankowska, Tomasz Dziedzic, and Anna Zambrzycka. “LandCover.ai: Dataset for Automatic Mapping of Buildings, Woodlands, Water and Roads from Aerial Imagery”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2021), pp. 1102-1110.

[6] “Ortofotomozaika.” Geoportal SK. Accessed February 14, 2024. https://www.geoportal.sk/sk/zbgis/ortofotomozaika/.

[7] Stefano Zorzi, Shabab Bazrafkan, Stefan Habenschuss, and Friedrich Fraundorfer. “PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 1848-1857.

 

 



Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data

Many businesses spend large amounts of resources on communicating with clients. Usually the goal is to provide clients with information, but sometimes there is also a need to request specific information from them. To address this need, significant effort has gone into developing chatbots and voicebots, which can both provide information to clients and contact a client with a request for information. A concrete real-world example is contacting a client, by text or by phone, to update their postal address: the address may have changed over time, so the business needs to update it in its internal client database.

However, when requesting such information through novel channels, such as chatbots or voicebots, it is important to verify the validity and format of the address. In such cases, the address usually arrives as free-form text input or as a speech-to-text transcription, and such inputs may contain substantial noise or variations in address format. It is therefore necessary to filter out the noise and extract the entities that constitute the actual address. This process of extracting entities from input text is known as Named Entity Recognition (NER). In our case, we deal with the following entities: municipality name, street name, house number, and postal code. This technical report describes the development and evaluation of a NER system for extracting this information.

Problem Description and Our Approach

This work is a joint effort of the Slovak National Competence Center for High-Performance Computing and nettle, s.r.o., a Slovak start-up focusing on natural language processing, chatbots, and voicebots. Our goal is to develop a highly accurate and reliable NER model for address parsing that accepts both free text and speech-to-text transcriptions. The model is an important building block of real-world customer care systems and can be employed in various scenarios where address extraction is relevant.

The challenging aspect of this task was that the data were exclusively in Slovak, which severely limits the choice of a baseline model. Several publicly available NER models for Slovak exist, all based on the general-purpose pre-trained model SlovakBERT [1]. Unfortunately, these models support only a few entity types, and none of them covers the entities relevant to address extraction. Directly using popular Large Language Models (LLMs) such as GPT is not an option in our use cases, because of data privacy concerns and the delays caused by calls to these rather slow LLM APIs.

We propose fine-tuning SlovakBERT for NER, which in our case is a token-level classification task. We aim to achieve proficiency in recognizing address entities with only a tiny number of real-world examples available. In the Data section we describe our dataset and the data creation process; the severe lack of real-world data prompted us to generate synthetic data. In Model Development and Training we describe the modifications of SlovakBERT needed to train it for our task. In Iterative Improvements we explore iterative refinements of our data generation approach. Finally, we present the model's performance in Results.
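Treating NER as token-level classification has one practical wrinkle: SlovakBERT splits words into subword tokens, so word-level tags must be mapped onto subwords before training. The report does not say which strategy was used. The sketch below shows one common choice, assuming the `word_ids` convention of Hugging Face fast tokenizers (`None` for special tokens, otherwise the index of the source word): tag only the first subword of each word and mark the rest with -100, which PyTorch's `CrossEntropyLoss` ignores by default. The example tokenization is hypothetical.

```python
# Hypothetical sketch: align word-level BIO tags with subword tokens.
LABELS = ["O", "B-Street", "I-Street", "B-Housenumber", "I-Housenumber",
          "B-Municipality", "I-Municipality", "B-Postcode", "I-Postcode"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
IGNORE_INDEX = -100  # target value skipped by PyTorch's CrossEntropyLoss by default


def align_labels(word_labels, word_ids):
    """Give each word's tag to its first subword; special tokens and
    continuation subwords get IGNORE_INDEX so they do not affect the loss."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(LABEL2ID[word_labels[wid]])
        previous = wid
    return aligned


# Words "Bauerova", "12", "Košice" assumed to split into 5 subwords inside <s> ... </s>.
tags = ["B-Street", "B-Housenumber", "B-Municipality"]
word_ids = [None, 0, 0, 1, 2, 2, None]
print(align_labels(tags, word_ids))  # [-100, 1, -100, 3, 5, -100, -100]
```

At inference time the same `word_ids` can be used in reverse, reading the prediction of each word's first subword.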

Data

The aim of the task is to recognize street names, house numbers, municipality names, and postal codes in spoken sentences transcribed via speech-to-text. Only 69 real-world instances were available, and all of them were heavily affected by noise, e.g., natural speech hesitations and transcription glitches. Therefore, we used this data exclusively for testing. Table 1 shows two examples from the collected dataset.

Table 1: Two example instances from our collected real-world dataset. The Sentence column showcases the original address text. The Tokenized text column contains the tokenized sentence representation, and the Tags column contains tags for the corresponding tokens. Note here that not every instance necessarily contains all considered entity types. Some instances contain noise, while others have grammar/spelling mistakes: the token "Dalsie" is not part of an address, and the street name "bauerova" is not capitalized.

Artificially generating a training dataset emerged as the only viable option for tackling the data shortage. Inspired by the 69 real instances, we programmatically made numerous calls to the OpenAI API to generate similar, realistic-looking examples. The dataset was labeled using the BIO annotation scheme [2], a method used in NLP to mark each token in a sequence as the beginning (B) of an entity, inside (I) an entity, or outside (O) of any entity. We use 9 annotations: O, B-Street, I-Street, B-Housenumber, I-Housenumber, B-Municipality, I-Municipality, B-Postcode, I-Postcode.
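To make the BIO scheme concrete, the sketch below converts entity spans over tokens into the 9 tags listed above. The sentence and its span offsets are a hypothetical example, not an instance from the dataset; note how the two-token postal code "040 01" becomes B-Postcode followed by I-Postcode.

```python
LABELS = ["O", "B-Street", "I-Street", "B-Housenumber", "I-Housenumber",
          "B-Municipality", "I-Municipality", "B-Postcode", "I-Postcode"]


def bio_tags(n_tokens, entities):
    """entities: list of (start, end, type) token spans, with `end` exclusive.
    The first token of a span gets B-<type>, the following ones I-<type>."""
    tags = ["O"] * n_tokens
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical example sentence ("The address is Bauerova 12, 040 01 Košice").
tokens = ["Adresa", "je", "Bauerova", "12", "040", "01", "Košice"]
tags = bio_tags(len(tokens), [(2, 3, "Street"), (3, 4, "Housenumber"),
                              (4, 6, "Postcode"), (6, 7, "Municipality")])
assert all(tag in LABELS for tag in tags)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```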

We generated data in multiple iterations, as described in the Iterative Improvements section below. Our final training dataset consisted of more than 10,000 sentences/address examples. For data generation we used the GPT-3.5-turbo API along with some prompt engineering. Since data generation through this API is limited by the number of tokens (both generated and prompt tokens), we could not pass the list of all possible Slovak street and municipality names within the prompt. Hence, data was generated with the placeholders streetname and municipalityname, which were subsequently replaced by names chosen at random from the lists of street and municipality names, respectively. A complete list of Slovak street and municipality names was obtained from the web pages of the Ministry of Interior of the Slovak Republic [3].
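A minimal sketch of the placeholder substitution, assuming each placeholder appears as a single token carrying its entity tag. Because real names can span several words (e.g. "Banská Bystrica"), the BIO tags have to be expanded as well: B- for the first word and I- for the rest. The token-level representation, the `PLACEHOLDERS` mapping and the tiny name lists are illustrative assumptions; the actual lists came from the ministry's address register.

```python
import random

# Assumed mapping from placeholder token to the entity type it stands for.
PLACEHOLDERS = {"streetname": "Street", "municipalityname": "Municipality"}


def fill_placeholders(tokens, tags, names, rng):
    """Replace each placeholder token with a randomly drawn name and expand its
    tag into B-/I- tags covering every word of that name."""
    out_tokens, out_tags = [], []
    for token, tag in zip(tokens, tags):
        if token in PLACEHOLDERS:
            entity = PLACEHOLDERS[token]
            words = rng.choice(names[token]).split()
            out_tokens.extend(words)
            out_tags.extend([f"B-{entity}"] + [f"I-{entity}"] * (len(words) - 1))
        else:
            out_tokens.append(token)
            out_tags.append(tag)
    return out_tokens, out_tags


names = {"streetname": ["Hlavná", "Štúrova"], "municipalityname": ["Banská Bystrica"]}
tokens = ["Mesto", "municipalityname", ",", "ulica", "streetname", "5"]
tags = ["O", "B-Municipality", "O", "O", "B-Street", "B-Housenumber"]
new_tokens, new_tags = fill_placeholders(tokens, tags, names, random.Random(42))
print(list(zip(new_tokens, new_tags)))
```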

Using the OpenAI API, we obtained natural-sounding sentences without having to write the data manually, which sped up the process significantly. However, this approach did not come without downsides. The generated dataset contained many mistakes, mainly incorrect annotations, which had to be corrected manually. The generated dataset was split so that 80% was used for training, 15% for validation, and 5% as synthetic test data, allowing us to compare the model's performance on real test data with its performance on artificial test data.
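The 80/15/5 split can be sketched as below. Shuffling with a fixed seed and using integer percentages (to avoid floating-point rounding surprises) are our own choices; the report does not describe how the split was performed.

```python
import random


def split_dataset(samples, train_pct=80, val_pct=15, seed=0):
    """Shuffle reproducibly, then cut into train / validation / test parts.
    Whatever remains after train and validation (about 5%) becomes the test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 150 50
```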

Model Development and Training

Two general-purpose pre-trained models were utilized and compared: SlovakBERT [1] and a distilled version of this model [4], which we refer to as DistilSlovakBERT. SlovakBERT is an open-source model pre-trained on Slovak using a Masked Language Modeling (MLM) objective. It was trained on a general Slovak web-based corpus, but it can easily be adapted to new domains and tasks [1]. DistilSlovakBERT was obtained from SlovakBERT by knowledge distillation, a method that significantly reduces model size while retaining 97% of its language understanding capabilities.

We modified both models by adding a token classification layer, obtaining in both cases models suitable for NER. The final classification layer consists of 9 neurons corresponding to the 9 entity annotations: each of the 4 address parts is represented by two annotations (beginning and inside of the entity), plus one annotation for the absence of any entity. The number of parameters of each model and its components is summarized in Table 2.

Table 2: The number of parameters in our two NER models and their respective counts for the base model and the classification head.
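Conceptually, the classification head is a single linear map from each token's hidden state to 9 scores, followed by an argmax. The NumPy sketch below illustrates this with random weights; it assumes a hidden size of 768 (RoBERTa-base-sized, which we believe matches SlovakBERT but have not taken from Table 2). In practice such a head is typically added with Hugging Face's `AutoModelForTokenClassification` and `num_labels=9`, which applies dropout and then a linear layer; the report does not state the exact tooling.

```python
import numpy as np

LABELS = ["O", "B-Street", "I-Street", "B-Housenumber", "I-Housenumber",
          "B-Municipality", "I-Municipality", "B-Postcode", "I-Postcode"]
HIDDEN_SIZE = 768  # assumed encoder hidden size

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(HIDDEN_SIZE, len(LABELS)))  # classifier weights
b = np.zeros(len(LABELS))                                     # classifier bias


def classify_tokens(hidden_states):
    """Map encoder outputs of shape (seq_len, HIDDEN_SIZE) to one tag per token."""
    logits = hidden_states @ W + b  # (seq_len, 9): one score per label
    return [LABELS[i] for i in logits.argmax(axis=-1)]


head_params = W.size + b.size
print(head_params)  # 768 * 9 + 9 = 6921
print(classify_tokens(rng.normal(size=(5, HIDDEN_SIZE))))
```

Under this assumption the head adds only a few thousand parameters, so almost all trainable parameters belong to the pre-trained encoder.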

Model training was highly susceptible to overfitting. To tackle this and further improve training, we used a linear learning rate scheduler, weight decay, and other hyperparameter tuning strategies.

Computing resources of the HPC system Devana, operated by the Computing Centre, Centre of Operations of the Slovak Academy of Sciences, were leveraged for model training, specifically a GPU node with one NVIDIA A100 GPU. For more convenient data analysis and debugging, we used an interactive environment based on OpenOnDemand, which gives researchers remote web access to supercomputers.

The training process required only 10 to 20 epochs to converge for both models. In the described HPC setting, one training epoch on the 9,492 training samples took on average 20 seconds for SlovakBERT and 12 seconds for DistilSlovakBERT. Inference on 69 samples takes 0.64 seconds for SlovakBERT and 0.37 seconds for DistilSlovakBERT, which demonstrates that both models are efficient enough for real-time NLP pipelines.

Iterative Improvements

Although only 69 real instances were available, their complexity proved quite challenging to imitate in generated data. The generated dataset was created using several different prompts, resulting in 11,306 sentences resembling human-generated content. The work consisted of a number of iterations, each of which can be split into the following steps: generate data, train a model, visualize the prediction errors on the real and artificial test datasets, and analyze them. This way we identified patterns that the model failed to recognize, and based on these insights we generated new data that followed these patterns. The patterns devised in the various iterations are presented in Table 3. Both models were trained on each newly expanded dataset, and SlovakBERT's accuracy always exceeded that of DistilSlovakBERT. We therefore decided to use only SlovakBERT as the base model from then on.

Results

The confusion matrix for the model trained in Iteration 1 (see Table 3) is displayed in Table 4. This model correctly recognized only 67.51% of the entities in the test dataset. A granular examination of the errors revealed that the training dataset did not represent real-world sentences well enough, and that more, and more representative, data needed to be generated. Table 4 shows that the most common error was identifying a municipality as a street. We noticed that this occurred when the municipality name appeared before the street name in the address. This led to the data generated in Iterations 2 and 3.
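A confusion matrix like the one in Table 4 can be built by collapsing the B-/I- prefixes into entity types and counting (true, predicted) pairs. The report does not specify exactly how tokens were counted, so this is one plausible token-level variant; the four-token example is hypothetical and reproduces the "municipality predicted as street" error described above.

```python
import numpy as np

CLASSES = ["O", "Street", "Housenumber", "Municipality", "Postcode"]
INDEX = {c: i for i, c in enumerate(CLASSES)}


def entity_type(tag):
    """'B-Street' -> 'Street', 'I-Postcode' -> 'Postcode', 'O' -> 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def confusion_matrix(true_tags, pred_tags):
    m = np.zeros((len(CLASSES), len(CLASSES)), dtype=int)  # rows: truth, cols: prediction
    for t, p in zip(true_tags, pred_tags):
        m[INDEX[entity_type(t)], INDEX[entity_type(p)]] += 1
    return m


def accuracy(m):
    return np.trace(m) / m.sum()


true = ["B-Municipality", "B-Street", "B-Housenumber", "O"]
pred = ["B-Street", "B-Street", "B-Housenumber", "O"]
m = confusion_matrix(true, pred)
print(m)
print(f"accuracy = {accuracy(m):.2%}")  # 75.00%
```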

Table 3: The iterative improvements of data generation. Each prompt was used twice: first with and then without noise, i.e., natural human speech hesitations. Where mentioned, the prompt allowed some address parts to be shuffled or omitted.

This process of detailed prediction error analysis followed by targeted data generation accounts for most of the improvement in our model's accuracy. The goal was to exceed 90% accuracy on the test data, and the model's accuracy kept increasing with systematic data generation. Finally, the whole dataset was duplicated, with the copies differing in letter case (uppercase/lowercase). (The pre-trained model is case-sensitive, and some test instances contained street and municipality names in lowercase.) This made the model more robust to the form of its input and led to a final accuracy of 93.06%. The confusion matrix of the final model is shown in Table 5.
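The case-based duplication can be sketched as follows. We assume here that each duplicate is a fully lowercased copy of the original sentence with unchanged tags; the report only says the duplicates differ in upper/lowercase, so which tokens were changed is an assumption.

```python
def case_augment(dataset):
    """dataset: list of (tokens, tags). Returns the originals followed by
    lowercased copies; tags are unchanged because case does not change the entity."""
    augmented = list(dataset)
    for tokens, tags in dataset:
        augmented.append(([t.lower() for t in tokens], list(tags)))
    return augmented


data = [(["Bauerova", "12", "Košice"], ["B-Street", "B-Housenumber", "B-Municipality"])]
for tokens, tags in case_augment(data):
    print(tokens, tags)
```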

Table 4: Confusion matrix of the model trained on the dataset from the first iteration, reaching a predictive accuracy of 67.51%.
Table 5: Confusion matrix of the final model, with a predictive accuracy of 93.06%. Compared with the results in Table 4, accuracy increased by 25.55 percentage points.

Some errors remain; notably, tokens that should have been tagged as outside (O) were occasionally misclassified as municipality. We opted not to address this issue further, as it happens on words that resemble subparts of our entity names but do not actually represent entities. See the example in Table 6.

Table 6: Examples of the final model's predictions for two test sentences. The first sentence contains one incorrectly classified token: the third token "Kal", with ground truth label O, was predicted as B-Municipality. The misclassification of "Kal" as a municipality occurred due to its similarity to subwords found in "Kalsa", but the ground truth labeling was based on context and the authors' judgment. The second sentence has all its tokens classified correctly.

Conclusions

In this technical report we trained a NER model using the pre-trained SlovakBERT language model as the base. The model was trained and validated exclusively on an artificially generated dataset. This representative, high-quality synthetic data was iteratively expanded. Together with hyperparameter fine-tuning, this iterative approach allowed us to reach a predictive accuracy exceeding 90% on the real dataset. Since the real dataset contained a mere 69 instances, we decided to use it only for testing. Despite the limited amount of real data, our model exhibits promising performance. This approach highlights the potential of using an exclusively synthetic dataset, especially when the amount of real data is insufficient for training.

This model can be utilized in real-world NLP pipelines to extract and verify the correctness of addresses transcribed by speech-to-text. If a larger real-world dataset becomes available, we recommend retraining the model and possibly expanding the synthetic dataset with more generated data, as the existing dataset might not represent newly occurring data patterns.
The model is available at https://huggingface.co/nettle-ai/slovakbert-address-ner

Acknowledgement

The research results were obtained with the support of the Slovak National competence centre for HPC, the EuroCC 2 project and Slovak National Supercomputing Centre under grant agreement 101101903-EuroCC 2-DIGITAL-EUROHPC-JU-2022-NCC-01.

AUTHORS

Bibiána Lajčinová – Slovak National Supercomputing Centre

Patrik Valábek – Slovak National Supercomputing Centre; Institute of Information Engineering, Automation, and Mathematics, Slovak University of Technology in Bratislava

Michal Spišiak – nettle, s. r. o.

Full version of the article SK
Full version of the article EN

References:

[1] Matús Pikuliak, Stefan Grivalsky, Martin Konopka, Miroslav Blsták, Martin Tamajka, Viktor Bachratý, Marián Simko, Pavol Balázik, Michal Trnka, and Filip Uhlárik. Slovakbert: Slovak masked language model. CoRR, abs/2109.15254, 2021.

[2] Lance Ramshaw and Mitch Marcus. Text chunking using transformation-based learning. In Third Workshop on Very Large Corpora, 1995.

[3] Ministerstvo vnútra Slovenskej republiky. Register adries. https://data.gov.sk/dataset/register-adries-register-ulic. Accessed: August 21, 2023.

[4] Ivan Agarský. Hugging face model hub. https://huggingface.co/crabz/distil-slovakbert, 2022. Accessed: September 15, 2023.


When a production line knows what will happen in 10 minutes 5 Feb - Every disruption on a production line creates stress. Machines stop, people wait, production slows down, and decisions must be made under pressure. In the food industry—especially in the production of filled pasta products, where the process follows a strictly sequential set of technological steps—one unexpected issue at the end of the line can bring the entire production flow to a halt. But what if the production line could warn in advance that a problem will occur in a few minutes? Or help decide, already during a shift, whether it still makes sense to plan packaging later the same day? These were exactly the questions that stood at the beginning of a research collaboration that brought together industrial data, artificial intelligence, and supercomputing power.
AI helps save women's lives 17 Dec - Fear of breast cancer is a silent companion of many women. A single invitation to a preventive screening, one phone call from a doctor, or one wait for results is enough, and the mind fills with questions: "Am I okay?" "What if I'm not?" "Could something be missed?" Even when screening confirms a negative finding, worries often persist.
The future of soil hidden in data 5 Nov - High-Performance Computing (HPC) offers researchers the ability to process enormous volumes of data and uncover connections that would otherwise remain hidden. Today, it is no longer just a tool for technical disciplines – it is increasingly valuable in social and environmental research as well. A great example is a project that harnessed the power of HPC to gain deeper insight into the relationship between humans, soil, and the landscape.

Anomaly Detection in Time Series Data: Gambling prevention using Deep Learning

Preventing problem gambling among online casino players is a challenging goal, with positive impacts both on players' well-being and for casino providers aiming at responsible gambling. To facilitate this, we propose an unsupervised deep learning method to identify players showing signs of problem gambling based on the available data in the form of time series. We compare our proposed transformer-based autoencoder architecture for anomaly detection with recurrent and convolutional neural network autoencoder architectures and highlight its advantages. Since the players' clinical diagnoses were not part of the data at hand, we evaluated the outcome of our study by analyzing the correlation between the anomaly scores obtained from the autoencoder and several proxy indicators associated with problem gambling in the literature.

Preventing problem or pathological gambling, currently conceptualized as a behavioural pattern in which individuals stake an object of value (typically money) on the uncertain prospect of a larger reward [1], [2], is of high societal importance. Research over the past decade has revealed multiple similarities between pathological gambling and substance use disorders [3]. With the high accessibility of the Internet, the incidence of pathological gambling has increased. This disorder can have significant negative consequences for the affected individual and their family. Detecting early warning signs of problem gambling is therefore crucial for maintaining players' well-being. This work is a joint effort of the Slovak National Competence Center for High-Performance Computing; DOXXbet, ltd., a sports betting and online casino operator; and Codium, ltd., the software developer of the DOXXbet sports betting and iGaming platform, with the goal of enhancing customer service and player engagement by identifying and preventing problem gambling behaviour. This proof of concept is a foundation for future tools that will help the casino mitigate negative consequences for players, even at the cost of reduced revenue for the provider, in line with European trends in risk management related to problem gambling.

In our study we propose a completely unsupervised deep learning approach using a transformer-based autoencoder (AE) architecture to detect anomalies in the dataset, i.e., players with anomalous behaviour. The dataset at hand does not include clinical diagnoses, and only a few of the proxy indicators reported in the literature are available: requests to increase spending limits, chasing losses by gambling more (referred to as chasing episodes later in this article), usage of multiple payment methods, frequent withdrawals of small amounts of money, and others mentioned later in the text. Clearly, not every anomalous user necessarily has a gambling problem, hence the proxy indicators are used in combination with the AE results, namely the anomaly score. Our approach rests on the idea that a compulsive gambler is an anomaly among active casino players; the literature reports their fraction among all players to be between 0.5% and 5% for chance-based games.

Data

The data acquired for this research consist of sequences of data points collected over time, tracking multiple aspects of player behaviour, such as the frequency and timing of gaming activities, the frequency and amount of cash deposits, the payment methods used for deposits, and information about bets, wins, losses, withdrawals, and requests to change the deposit limit. Feature engineering resulted in 19 features in the form of time series (TS), each consisting of multiple time stamps. Inspired by Seth et al. [4], these features can be classified into three categories: "time", "money", and "despair". Table 1 summarizes the full set of TS features with a short explanation. Each feature is a sequence of N values, where each value corresponds to one of N consecutive time windows. Each value was produced by aggregating the daily data in the respective time window; the window length, and whether the window is sliding, are specified in Table 1. Hence, each sample required a history of N time windows. The feature engineering procedure is displayed in Figure 1 and the final data shape is depicted in Figure 2.

Figure 1: Visualization of the data aggregation from a daily basis into time windows, and eventually into TS features. t1, …, t450 represent time stamps for daily data x1, …, x450. Daily data points from a time window are aggregated into a single value zi for all i ∈ {1, …, 8}.
Figure 2: Final data shape obtained after feature engineering. Each sample is represented by 19 features consisting of 8 time windows.
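The aggregation in Figure 1 can be sketched as follows: the most recent days of a daily series are cut into N windows (8 in Figure 2) and each window is reduced to one value. The window length of 7 days, the stride parameter used for sliding windows, and summation as the aggregate are illustrative assumptions; the actual per-feature settings are listed in Table 1. Stacking the 19 resulting feature vectors gives the (19, 8) sample shape of Figure 2.

```python
import numpy as np


def to_windows(daily, n_windows, window_len, stride=None, agg=np.sum):
    """Aggregate a daily series into n_windows values; the last window ends on the
    last day. stride == window_len gives non-overlapping windows, a smaller
    stride gives sliding (overlapping) windows."""
    stride = window_len if stride is None else stride
    daily = np.asarray(daily, dtype=float)
    needed = window_len + stride * (n_windows - 1)  # days of history required
    if daily.shape[0] < needed:
        raise ValueError(f"need {needed} days of history, got {daily.shape[0]}")
    first = daily.shape[0] - needed
    starts = [first + i * stride for i in range(n_windows)]
    return np.array([agg(daily[s:s + window_len]) for s in starts])


daily_deposits = np.arange(1, 451)  # hypothetical 450 days of daily data
z = to_windows(daily_deposits, n_windows=8, window_len=7)
print(z)  # 8 values, oldest window first
```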

AE models comparison

An autoencoder (AE) is a "self-supervised" deep learning method suitable for anomaly detection in time series. The idea behind using this type of neural network for anomaly detection is based on the model's reconstruction capability. The AE learns to reconstruct the data in the training set, and since the training set should ideally contain only "normal" observations, the model learns to reconstruct only such observations correctly. Therefore, when an input observation is anomalous, the trained AE cannot reconstruct it sufficiently well, resulting in a high reconstruction error. This reconstruction error can be used as an anomaly score for the given observation, where a higher score means a higher probability that the observation deviates from the general trend.
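Given a trained AE, the anomaly score described above is simply the per-sample mean squared error between input and reconstruction (Figure 5 labels it MSE). In the sketch below the "reconstruction" is simulated with small random noise instead of a real model, and one sample is deliberately reconstructed badly; the (samples, 19 features, 8 windows) layout follows Figure 2, and the axis order is our assumption.

```python
import numpy as np


def anomaly_scores(x, x_hat):
    """Per-sample mean squared reconstruction error.
    x, x_hat: arrays of shape (n_samples, n_features, n_windows)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return ((x - x_hat) ** 2).mean(axis=(1, 2))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 19, 8))
x_hat = x + rng.normal(scale=0.05, size=x.shape)  # stand-in for AE output: small errors
x_hat[3] = 0.0                                    # sample 3 is reconstructed badly
scores = anomaly_scores(x, x_hat)
print(scores.round(4), "most anomalous:", int(scores.argmax()))
```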

In this study, we trained a transformer-based AE model in which both the encoder and the decoder contain a Multi-Head Attention layer with four heads and 32-dimensional key and value vectors. This layer is followed by a feed-forward neural network with dropout layers and residual connections. The entire AE model has just over 100k trainable parameters.
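To show what a Multi-Head Attention layer with four heads and 32-dimensional keys computes, here is a NumPy sketch of scaled dot-product self-attention (without bias terms) over one sample, treating the 8 time windows as sequence positions and the 19 features as the model dimension. Frameworks offer this as a single layer, e.g. `tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)`, but the report does not state the framework, and whether features were projected to another dimension first is unknown; the random weights are placeholders for learned ones.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads=4, key_dim=32):
    """x: (seq_len, d_model); wq, wk, wv: (d_model, num_heads * key_dim);
    wo: (num_heads * key_dim, d_model). Returns (output, attention weights)."""
    seq_len = x.shape[0]

    def split_heads(h):  # (seq_len, heads * key_dim) -> (heads, seq_len, key_dim)
        return h.reshape(seq_len, num_heads, key_dim).transpose(1, 0, 2)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(key_dim)  # (heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                    # each row sums to 1
    context = weights @ v                                 # (heads, seq_len, key_dim)
    merged = context.transpose(1, 0, 2).reshape(seq_len, num_heads * key_dim)
    return merged @ wo, weights


rng = np.random.default_rng(0)
d_model, heads, key_dim = 19, 4, 32
x = rng.normal(size=(8, d_model))  # one sample: 8 time windows x 19 features
wq, wk, wv = (rng.normal(scale=0.1, size=(d_model, heads * key_dim)) for _ in range(3))
wo = rng.normal(scale=0.1, size=(heads * key_dim, d_model))
attn_out, weights = multi_head_self_attention(x, wq, wk, wv, wo, heads, key_dim)
y = x + attn_out  # residual connection around the attention block
print(y.shape, weights.shape)  # (8, 19) (4, 8, 8)
```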

Reconstruction loss and Prediction ability

We performed a 3-fold cross-validation by splitting the data into training, validation, and test sets, and trained the models on each split to assess their stability. The resulting average loss values and their variances are displayed in Table 3. The average reconstruction error of the Transformer model is significantly lower than that of all the other models. The LSTM B model comes second in reconstruction performance, and the CNN model appears to have the worst. The test loss is always higher than the training and validation losses. The reason is that the 211 data points removed from the training set during data cleaning were moved to the test set. Without moving these samples, the test loss would be as low as 0.012 for the transformer-based model, 0.33 for the CNN model, 0.27 for the LSTM A model, and 0.13 for the LSTM B model. A more detailed overview of the models' performance is given by the histograms of test-set loss values in Figure 3. All histograms have a heavy right tail, which is expected for datasets containing anomalies.

Figure 3: Reconstruction error histograms of the transformer-based AE model for the test set. On the x-axis is the value of the anomaly score and on the y-axis is the frequency of the corresponding value.

To demonstrate the quality of the time-series reconstruction, the original (blue line) and reconstructed (red line) values for a randomly selected anomalous observation of one player are shown in Figure 4. The anomaly score for each model is given in the graph captions.

Figure 4: Comparison of the predictive ability of the AE models. All models reconstructed the same observation from the test set. The blue line represents the input data and the red line the reconstruction obtained using the transformer-based AE model. The number shown in the graph header is the anomaly score for that data sample.

Results

Since clinical diagnosis was not part of the data we had, we can only rely on auxiliary indicators to identify players with potentially problem gambling. We approached this task by detecting anomalies in the data, but we are aware that not all anomalies necessarily indicate a gambling problem. Therefore, we will correlate the results of the AE model with the following auxiliary indicators:

  • Mean number of logins in a time window.
  • Mean number of withdrawals in a time window.
  • Mean number of small and frequent withdrawals in a time window.
  • Mean number of requests for the change of the deposit limit in a time window.
  • Total number of chasing episodes over the N time windows.

Figure 5 depicts the correlation of the anomaly score with the proxy indicators. Each subplot contains 10 bars, each bar representing one decile of the data samples (i.e. each bar represents 10% of data samples sorted by anomaly score). The bar colors represent the category value of the respective proxy indicator.

Figure 5, panels (a) to (e): Each bar in the graphs represents one decile of the anomaly score (MSE). The colors represent the categories of the relevant auxiliary indicators, with category values specified in the legend.
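The decile view in Figure 5 can be reproduced in simplified form by sorting samples by anomaly score, cutting them into 10 equal groups, and measuring how often a proxy indicator occurs in each group. The sketch reports the share of a single yes/no condition per decile, whereas the figure shows several categories per bar; the scores and indicator in the example are synthetic.

```python
import numpy as np


def share_by_decile(scores, indicator):
    """Mean of a boolean indicator within each decile of ascending anomaly score."""
    order = np.argsort(scores)           # sample indices, lowest score first
    deciles = np.array_split(order, 10)  # 10 (nearly) equal-size groups
    indicator = np.asarray(indicator, dtype=float)
    return np.array([indicator[d].mean() for d in deciles])


rng = np.random.default_rng(1)
scores = rng.exponential(size=1000)                     # hypothetical anomaly scores
chasing = rng.random(1000) < np.clip(scores / 5, 0, 1)  # hypothetical indicator, likelier at high scores
print(share_by_decile(scores, chasing).round(2))
```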

A distinctive pattern in players' behaviour can be observed: players with larger anomaly scores tend to exhibit high values for all evaluated indicators. A higher frequency of logins goes together with a higher anomaly score, with more than half of the players in the last decile of reconstruction error having a mean number of logins per time window greater than 50. The same applies to the mean number of cash withdrawals per time window. Players with a low anomaly score have no or very few withdrawals, whilst more than one fourth of players in the last anomaly score decile have two or more withdrawals per time window on average. Another secondary indicator we utilize is the number of small and frequent withdrawals. Most of the players with at least one such event are among the 10% of players with the highest MSE. The number of requests for a deposit limit change shows a more subtle pattern. Players in the first five deciles generally have no requests for a limit change (with very few exceptions), while the frequency of limit change requests tends to rise as the anomaly score increases. The last proxy indicator depicted is the number of chasing episodes, whose frequency rises with the anomaly score. More than half of the players in the last decile have at least one chasing episode in the time window.

If these plots are overlaid to identify the share of players fulfilling multiple proxy indicators, the following observations result: among the 5% of players with the highest anomaly scores, 98.6% satisfy at least one proxy indicator and 77.3% satisfy at least three. Among the top 2% of players by reconstruction error, almost 90% satisfy at least three indicators. The thresholds used to calculate these proportions are >= 1 chasing episode, >= 1 limit change, >= 1 small and frequent withdrawal, >= 31 logins, and >= 1.25 withdrawals on average per time window.
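The overlap statistics above can be computed by counting, for each player, how many of the five thresholds are met, and then looking at the top fraction of players by anomaly score. The thresholds are the ones quoted in the text; the column names, the four players and their scores are hypothetical. On the real data one would call `share_in_top` with `top_fraction=0.05` or `0.02`.

```python
import numpy as np

# Thresholds from the text; the column names are hypothetical.
THRESHOLDS = {
    "chasing_episodes": 1,            # total over the N time windows
    "limit_change_requests": 1,
    "small_frequent_withdrawals": 1,
    "mean_logins": 31,                # per time window
    "mean_withdrawals": 1.25,         # per time window
}


def indicators_met(players):
    """Number of proxy indicators each player satisfies (value >= threshold)."""
    return sum((np.asarray(players[name]) >= thr).astype(int)
               for name, thr in THRESHOLDS.items())


def share_in_top(scores, counts, top_fraction, min_indicators):
    """Share of the top `top_fraction` of players (by anomaly score) meeting
    at least `min_indicators` indicators."""
    n_top = max(1, round(len(scores) * top_fraction))
    top = np.argsort(scores)[-n_top:]
    return float((counts[top] >= min_indicators).mean())


players = {
    "chasing_episodes":           [0, 0, 2, 1],
    "limit_change_requests":      [0, 0, 1, 0],
    "small_frequent_withdrawals": [0, 0, 1, 0],
    "mean_logins":                [10, 40, 55, 35],
    "mean_withdrawals":           [0.0, 0.5, 2.0, 1.25],
}
scores = np.array([0.1, 0.2, 0.9, 0.8])  # hypothetical anomaly scores
counts = indicators_met(players)
print(counts)                                # [0 1 5 3]
print(share_in_top(scores, counts, 0.5, 3))  # 1.0
```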

Conclusion

In this work, we successfully applied a transformer-based autoencoder (AE) to detect anomalies in a dataset of online casino players, with the aim of detecting problem gamblers in an unsupervised manner. We derived 19 features from the raw time series (TS) data, reflecting players' behaviour in terms of time, money, and despair. We compared this architecture with three other AE architectures based on LSTM and convolutional layers and found that the transformer-based AE achieved the best reconstruction capability of the four models. This model's anomaly score also correlates strongly with proxy indicators commonly associated with gambling disorder in the literature, such as the number of logins, the number of withdrawals, and the number of chasing episodes. This alignment of the AE's anomaly score with the proxy indicators gives insight into how effectively the predictions identify players with potential problem gambling. Even though these proxy indicators were also used as predictors, we suggest using them as a secondary check when detecting players with potential problem gambling in order to avoid false positives, as not all anomalies are linked to gambling disorder. Our findings demonstrate the potential of transformer-based AEs for unsupervised anomaly detection in TS data, particularly in the context of online casino player behaviour.

Full version of the article

References:

[1] Alex Blaszczynski and Lia Nower. “A Pathways Model of Problem and Pathological Gambling”. In: Addiction (Abingdon, England) 97 (June 2002), pp. 487–99. doi: 10.1046/j.1360-0443.2002.00015.x.

[2] National Research Council. Pathological Gambling: A Critical Review. Washington, DC: The National Academies Press, 1999. isbn: 978-0-309-06571-9. doi: 10.17226/6329. url: https://nap.nationalacademies.org/catalog/6329/pathological-gambling-a-critical-review.

[3] Luke Clark et al. “Pathological Choice: The Neuroscience of Gambling and Gambling Addiction”. In: Journal of Neuroscience 33.45 (2013), pp. 17617–17623. issn: 0270-6474. doi: 10.1523/JNEUROSCI.3231-13.2013. eprint: https://www.jneurosci.org/content/33/45/17617.full.pdf. url: https://www.jneurosci.org/content/33/45/17617.

[4] Deepanshi Seth et al. “A Deep Learning Framework for Ensuring Responsible Play in Skill-based Cash Gaming”. In: 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) (2020), pp. 454–459.


Measurement of microcapsule structural parameters using artificial intelligence (AI) and machine learning (ML)

The main aim of the collaboration between the National Competence Centre for HPC (NCC HPC) and the Institute of Polymers of SAV (IP SAV) was the design and implementation of a pilot software solution for automatic processing of polymer microcapsule images using an artificial intelligence (AI) and machine learning (ML) approach. The microcapsules consist of a semi-permeable polymeric membrane developed at IP SAV.

Automatic image processing has several benefits for IP SAV. It saves time, since manually measuring microcapsule structural parameters is time-consuming given the huge number of images produced during the process. In addition, automatic image processing minimizes the errors inevitably connected with manual measurements. Images from the optical microscope obtained at 4.0 zoom usually contain one or more microcapsules and represent an input for the AI/ML process. Images obtained at 2.5 zoom, on the other hand, usually contain three to seven microcapsules, so detecting each individual microcapsule is essential.

The images from the optical microscope are processed in two steps: first, the microcapsules are localized and detected; second, a series of operations is applied to obtain the structural parameters of each microcapsule.

Microcapsule detection

A YOLOv5 model [1] with weights pre-trained on the COCO128 dataset [2] was used for microcapsule detection. The training set consisted of 96 images, manually annotated with the graphical image annotation tool LabelImg [3]. One training run consisted of 300 epochs with 6 batches of 16 images each and an image size of 640 pixels. One training run took approximately 3.5 hours on an NVIDIA GeForce GTX 1650 GPU.

Detection with the trained YOLOv5 model is shown in Figure 1. The reliability of the trained model, verified on 12 images, was 96%, with a throughput of approximately 40 frames per second on the same graphics card.

Figure 1: (a) microcapsule image from optical microscope (b) detected microcapsule (c) cropped detected microcapsule for 4.0 zoom, (d) microcapsule image from optical microscope (e) detected microcapsule (f) cropped detected microcapsule for 2.5 zoom.
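
To illustrate the cropping step shown in Figure 1(c) and (f), the following pure-Python sketch converts detections in the normalised `class x_center y_center width height` format used by YOLO label files into pixel boxes and cuts out one sub-image per microcapsule. It is an assumption that the project cropped the boxes this way; the function names are hypothetical, and a grayscale image is represented as a plain list of rows to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

# One YOLO-format detection: (class_id, x_center, y_center, width, height),
# with coordinates normalised to [0, 1] relative to the image width/height.
Detection = Tuple[int, float, float, float, float]


def yolo_to_pixel_box(det: Detection, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalised YOLO box to pixel corners (x0, y0, x1, y1), clipped to the image."""
    _, xc, yc, w, h = det
    x0 = max(0, round((xc - w / 2) * img_w))
    y0 = max(0, round((yc - h / 2) * img_h))
    x1 = min(img_w, round((xc + w / 2) * img_w))
    y1 = min(img_h, round((yc + h / 2) * img_h))
    return x0, y0, x1, y1


def crop_detections(image: Sequence[Sequence[int]], dets: List[Detection]) -> List[List[List[int]]]:
    """Return one cropped sub-image (a list of pixel rows) per detected microcapsule."""
    img_h, img_w = len(image), len(image[0])
    crops = []
    for det in dets:
        x0, y0, x1, y1 = yolo_to_pixel_box(det, img_w, img_h)
        # Slice rows first (y), then columns (x) within each row.
        crops.append([list(row[x0:x1]) for row in image[y0:y1]])
    return crops


# Usage: a synthetic 100 x 80 "image" whose pixel value encodes its position.
image = [[y * 1000 + x for x in range(100)] for y in range(80)]
crops = crop_detections(image, [(0, 0.5, 0.5, 0.2, 0.25)])
print(len(crops[0]), len(crops[0][0]))  # 20 rows, 20 columns
```

Clipping to the image bounds matters for microcapsules touching the image edge, where the predicted box can extend past the border.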

Measurement of microcapsule structural parameters using AI/ML

The binary masks of the inner and outer membranes of the microcapsules are created separately as the output of a deep neural network with the U-Net architecture [4], which was originally developed for biomedical image processing. The first training set for the U-Net network consisted of 140 images taken at 4.0 zoom with the corresponding masks; the second consisted of 140 images taken at 2.5 zoom with the corresponding masks. One training run consisted of 200 epochs with 7 batches of 20 images each, and the image size was set to 1280 pixels (4.0 zoom) or 640 pixels (2.5 zoom). Of the images, 10% were used for validation. The reliability of the trained model, verified on 20 images, exceeded 96%. Training took less than 2 hours on the HPC system with IBM Power 7 nodes and had to be repeated several times.

The obtained binary masks were then post-processed with fill-holes [5] and watershed [6] operations to remove unwanted residues. Next, an ellipse was fitted to each binary mask using the scikit-image measure module [7]. The first and second principal axes of the fitted ellipse are used to calculate the structural parameters of the microcapsule. An example of the inner and outer binary masks and the fitted ellipses is shown in Figure 2.

Figure 2: (a) input image from optical microscope (b) inner binary mask (c) outer binary mask (d) output image with fitted ellipses.
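
The post-processing and ellipse-fitting steps described above can be sketched without external dependencies. This is an assumption, not the project's code: the hypothetical `fill_holes` follows the idea behind `scipy.ndimage.binary_fill_holes` (background that cannot be reached from the image border becomes foreground), and the hypothetical `ellipse_axes` computes the axes of the ellipse with the same second central moments as the mask, which is how `skimage.measure.regionprops` defines `axis_major_length` and `axis_minor_length`. The source does not state which scikit-image routine was used, and the watershed step is omitted here.

```python
import math
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = membrane region, 0 = background


def fill_holes(mask: Mask) -> Mask:
    """Fill background regions that are not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with every background pixel lying on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not outside[ny][nx] and mask[ny][nx] == 0:
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Everything that is not reachable background becomes foreground.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


def ellipse_axes(mask: Mask) -> Tuple[float, float]:
    """Major/minor axis lengths of the ellipse with the same second central moments as the mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    l1, l2 = half_trace + root, half_trace - root
    # For a filled ellipse with semi-axis a, the variance along that axis is a^2 / 4,
    # so the full axis length 2a equals 4 * sqrt(variance).
    return 4 * math.sqrt(l1), 4 * math.sqrt(max(l2, 0.0))
```

For a filled disk of radius 20 pixels both axes come out close to 40 pixels, i.e. the diameter. For real images the NumPy-based SciPy and scikit-image functions would be far faster than these loops.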

The structural parameters obtained by our AI/ML approach (denoted “U-Net”) were compared with those obtained by manual measurements performed at IP SAV. A different method (denoted “Retinex”) was used as another independent source of reference data. The Retinex approach was implemented by RNDr. Andrej Lúčny, PhD., from the Department of Applied Informatics of the Faculty of Mathematics, Physics and Informatics in Bratislava. This approach is not based on AI/ML; the ellipse fitting is performed by aggregating line elements with low curvature using a so-called retinex filter [8]. The Retinex approach is a good reference thanks to its relatively high precision, but it is not fully automatic, especially for the inner membrane of the microcapsule.

Figure 3 compares the three approaches (U-Net, Retinex, IP SAV) for obtaining the structural parameters of microcapsules imaged at 4.0 zoom.

Figure 3: (a) microcapsule diameter for different batches (b) difference between the diameters of the fitted ellipse (first principal axis) and microcapsule (c) difference between the diameters of the fitted ellipse (second principal axis) and microcapsule. Red lines in (b) and (c) represent the threshold given by IP SAV. The images were obtained at 4.0 zoom.

All results, except for 4 images from batch 194 (about 1.5%), are within the threshold defined by IP SAV. As Figure 3(a) shows, the microcapsule diameters calculated with U-Net and Retinex are in good agreement with each other. The performance of the U-Net model could be improved significantly in the future, either by expanding the training set or by additional post-processing. The agreement between the manual measurements and U-Net/Retinex could be further improved by unifying the method used to derive the microcapsule structural parameters from the binary masks.

The AI/ML model will be available as a cloud solution on the HPC systems of CSČ SAV, so no additional investment in HPC infrastructure will be necessary at IP SAV. The production phase, which goes beyond the scope of the pilot solution, involves integrating this approach into a desktop application.

References

[1] https://github.com/ultralytics/yolov5

[2] https://www.kaggle.com/ultralytics/coco128

[3] https://github.com/heartexlabs/labelImg

[4] https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

[5] https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_fill_holes.html

[6] https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_watershed.html

[7] https://scikit-image.org/docs/stable/api/skimage.measure.html

[8] D.J. Jobson, Z. Rahman, G.A. Woodell, IEEE Transactions on Image Processing 6 (7) 965-976, 1997.


Use case: Transfer and optimization of CFD calculations workflow in HPC environment

Authors: Ján Škoviera (National Competence Centre for HPC), Sylvain Suzan (Shark Aero)

Shark Aero designs and manufactures ultralight sport aircraft with a two-seat tandem cockpit. For design development, the company uses the popular open-source software package OpenFOAM [1]. OpenFOAM performs CFD (Computational Fluid Dynamics) simulations using the finite volume method (FVM). After the model is created in Computer-Aided Design (CAD) software, the computational domain is divided into discrete cells, the so-called “mesh”. The simulation accuracy depends strongly on the mesh density, and the computational and memory requirements rise with the third power of the linear mesh density: halving the cell size in every direction increases the number of cells roughly eightfold. For some simulations, the computational demands can therefore be a limiting factor. The workflow was thus transferred to a High-Performance Computing (HPC) environment, with a special focus on investigating how efficiently the computational tasks can be parallelized for the given model type.

METHODS

The project used compute nodes with two 6-core Intel Xeon L5640 CPUs @ 2.27 GHz, 48 GB of RAM and 2 × 500 GB disks. All calculations were run in a standard HPC environment under the Slurm job scheduling system. This is acceptable for this type of workload, where neither a real-time response nor immediate data processing is required. For the CFD simulations we continued to use the OpenFOAM and ParaView software packages (version 9). A Singularity container was used to deploy the calculations, with a view to a potential transfer of the workload to another HPC system. Simply moving the workflow to the HPC system, without further changes, gave a speed-up of approximately 1.5× compared with a standard laptop.

PARALLELIZATION

Parallel execution can shorten the overall calculation by using more computing units concurrently. To parallelize the task, the original mesh must be divided into domains, i.e. parts that are processed concurrently. The domains, however, need to communicate through the processor boundaries, i.e. the domain faces where the original enclosing mesh was cut. The larger the processor boundary surface, the more I/O is required to resolve the boundary conditions. Communication across processor boundaries is handled by the distributed-memory Message Passing Interface (MPI), and the distinction between cores on the same node and cores on different compute nodes is hidden from the user. This limits how efficiently many parallel processes can be used, since over-parallelized jobs can actually run slower because of communication and I/O bottlenecks. The domains should therefore be created so that the processor boundaries are minimized. One possible strategy is to cut the original mesh only along planes parallel to the smallest face of the enclosing mesh. Careless division into domains can increase the amount of data to be transferred far beyond a reasonable level, and dividing the mesh along multiple axes also creates more processor boundaries.

Figure 1: Illustration of mesh segmentation. The enclosing mesh is represented by the transparent boxes.
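
The effect of the cutting strategy can be quantified with a simplified model that treats the enclosing mesh as a uniform box and measures the processor boundary as the total area of the cuts (proportional to the number of boundary faces for a uniform mesh). The functions below are a hypothetical sketch; the `(nx, ny, nz)` triple corresponds loosely to the `n` vector of OpenFOAM's `simple` decomposition method, and a real snappyHexMesh mesh is of course not uniform.

```python
from itertools import product
from typing import Tuple

Box = Tuple[float, float, float]      # (Lx, Ly, Lz) of the enclosing mesh
Split = Tuple[int, int, int]          # number of slices along x, y and z


def boundary_area(size: Box, n: Split) -> float:
    """Total internal cut area when the box is split into nx * ny * nz equal blocks."""
    lx, ly, lz = size
    nx, ny, nz = n
    # Each cut perpendicular to x spans a full y-z face, and likewise for y and z.
    return (nx - 1) * ly * lz + (ny - 1) * lx * lz + (nz - 1) * lx * ly


def best_split(size: Box, n_domains: int) -> Split:
    """Brute-force the (nx, ny, nz) with nx * ny * nz == n_domains that minimises the cut area."""
    candidates = [
        (nx, ny, nz)
        for nx, ny, nz in product(range(1, n_domains + 1), repeat=3)
        if nx * ny * nz == n_domains
    ]
    return min(candidates, key=lambda n: boundary_area(size, n))
```

For an elongated 8 × 1 × 1 box split into 8 domains, slabs along x (cuts parallel to the smallest face) give a cut area of 7, whereas a 2 × 2 × 2 split gives 17 and slabs along y give 56.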

The calculations were done in four steps: enclosing mesh creation, mesh segmentation, model inclusion and CFD simulation. The enclosing mesh was created with the blockMesh utility, the mesh was segmented with the decomposePar utility, the model was included with the snappyHexMesh program, and the CFD simulation itself was run with simpleFoam. The most computationally demanding step is snappyHexMesh. This is understandable: in the CFD simulation, the calculation has to be repeated several times for every edge of the mesh in every iteration, whereas model inclusion creates new vertices and deletes old ones based on the positions of the vertices in the model mesh. This requires building an “octree” (a partitioning of three-dimensional space by recursively subdividing it into eight octants), repeated inverse searches, and octree re-balancing. Each of these processes is O(N log N) in the best case and O(N²) in the worst case, N being the number of vertices. The CFD step itself scales linearly with the number of edges, i.e. “close to” linearly with N, since only spatially proximate nodes are interconnected [2].

We developed a workflow that creates a given number of domains by cuts parallel to the yz plane (x being the aircraft's longitudinal, nose-to-tail axis), which simplifies the decision making. After a new model is included, one can simply specify the number of domains and run the calculation, minimizing the human intervention needed to parallelize it.
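
The four-step workflow and the "just specify the number of domains" idea can be sketched as a small Python driver. This is a hypothetical illustration: `decompose_par_dict`, `pipeline_commands` and `run_pipeline` are not part of OpenFOAM; the dictionary uses the `simple` decomposition method with `n (N 1 1)` so that cuts are made only along x, i.e. parallel to the yz plane (the exact keywords accepted may differ between OpenFOAM versions); and auxiliary steps a real case may need, such as reconstructing the decomposed results, are omitted.

```python
import pathlib
import subprocess
from typing import List


def decompose_par_dict(n_domains: int) -> str:
    """decomposeParDict text that slices the mesh into n_domains slabs along x."""
    return f"""FoamFile
{{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}}

numberOfSubdomains {n_domains};

method          simple;

simpleCoeffs
{{
    n           ({n_domains} 1 1);
    delta       0.001;
}}
"""


def pipeline_commands(n_domains: int) -> List[List[str]]:
    """Command lines for the four steps, in the order described in the text."""
    mpi = ["mpirun", "-np", str(n_domains)]
    return [
        ["blockMesh"],                                       # 1. enclosing background mesh
        ["decomposePar"],                                    # 2. split into n_domains parts
        mpi + ["snappyHexMesh", "-parallel", "-overwrite"],  # 3. include the aircraft model
        mpi + ["simpleFoam", "-parallel"],                   # 4. steady-state CFD run
    ]


def run_pipeline(case_dir: str, n_domains: int) -> None:
    """Write the dictionary and run each step (requires a sourced OpenFOAM environment)."""
    system_dir = pathlib.Path(case_dir) / "system"
    (system_dir / "decomposeParDict").write_text(decompose_par_dict(n_domains))
    for cmd in pipeline_commands(n_domains):
        # check=True stops the pipeline at the first failing step.
        subprocess.run(cmd, cwd=case_dir, check=True)
```

Under Slurm, `run_pipeline` would be called from inside a batch job, with `n_domains` matching the number of allocated tasks.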

RESULTS AND CONCLUSION

The relative speed-up of the calculation is mainly limited by I/O. If the computational tasks are well below the I/O bound, the elapsed time is inversely proportional to the number of domains. In less demanding calculations, i.e. for small models, the processes can easily be over-parallelized.

Figure 2: Dependence of the real elapsed time on the number of processes for snappyHexMesh and simpleFoam. For simpleFoam, the time starts to diverge for more than 8 processes, since the data traffic outweighs the advantage of parallelization. Ideal scaling shows the theoretical time needed to finish the calculation if data traffic and processor boundary condition resolution were not involved.
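
The over-parallelization effect can be illustrated with a deliberately simple, hypothetical cost model that is not fitted to the measurements in Figure 2: the ideal compute time T1/p plus a communication overhead that grows linearly with the number of processes p, i.e. T(p) = T1/p + c(p - 1). Setting the derivative to zero gives an optimum near p ≈ sqrt(T1/c), so a larger, denser mesh (larger T1) pushes the optimum towards more processes.

```python
from typing import Dict


def toy_elapsed_time(t_serial: float, overhead: float, p: int) -> float:
    """Hypothetical cost model: ideal compute time t_serial / p plus overhead * (p - 1)."""
    return t_serial / p + overhead * (p - 1)


def scaling_table(t_serial: float, overhead: float, max_p: int) -> Dict[int, Dict[str, float]]:
    """Elapsed time, speed-up and parallel efficiency for p = 1 .. max_p processes."""
    table: Dict[int, Dict[str, float]] = {}
    for p in range(1, max_p + 1):
        t = toy_elapsed_time(t_serial, overhead, p)
        speedup = t_serial / t  # relative to the single-process run
        table[p] = {"time": t, "speedup": speedup, "efficiency": speedup / p}
    return table


def best_process_count(t_serial: float, overhead: float, max_p: int) -> int:
    """Number of processes with the shortest modelled elapsed time."""
    table = scaling_table(t_serial, overhead, max_p)
    return min(table, key=lambda p: table[p]["time"])
```

With T1 = 100 and c = 1 the modelled optimum is 10 processes; quadrupling the serial work to T1 = 400 moves it to 20.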

Once the mesh density is high enough, the time to calculate the CFD step is also inversely proportional to the number of parallel processes. As shown in the second pair of figures, with a twofold increase in mesh density, the calculations stay below the I/O bound even in the CFD step. Even though in this case the CFD step is fast compared with the meshing process, calculating long time intervals could make it the most time-consuming step.

Designing aircraft parts requires simulating relatively small models many times under varying conditions. The mesh density needed for these simulations falls into the medium category. When transferring the calculations to the HPC environment, we had to take into account the real needs of the end user in terms of model size, mesh density and the required precision of the results. Using HPC has several advantages:

  • The end user is relieved of the need to maintain their own computational capacity.
  • Even when restricted to single-threaded jobs, the simulations can be offloaded to HPC with a high speed-up, making even very demanding and precise calculations feasible.
  • For even more efficient calculations, a simple way of exploiting parallelization was determined for this particular workload, and the limitations of parallel runs for the given use case and conditions were identified. The total speed-up reached under practical conditions is 7.3×. The speed-up generally grows with the complexity of the calculation and the mesh precision.


MEMO98

MEMO98 is a non-profit, non-governmental organisation that has been monitoring the media in the context of elections and other events for more than 20 years and has carried out its activities in more than 50 countries. Recently, the organisation has also been addressing the impact of social media on the integrity of electoral processes.

The information environment has changed significantly in recent years, especially with the advent of social media. Apart from positive aspects, such as enhanced possibilities for receiving and sharing information, social media has also enabled misinformation to be disseminated to a wide audience quickly and at low cost. MEMO98 analysed the campaign for the parliamentary elections held in Moldova on July 11, 2021 on five social media platforms: Facebook, Instagram, Odnoklassniki, Telegram and YouTube.

Social media data was collected using CrowdTangle (a Facebook-owned social media analysis tool). On Facebook alone, posts by candidates and individual political parties received 1.82 million interactions, and posts by party chairmen received 1.09 million. Before this project, MEMO98 had no experience with tools for big data processing and analysis. NCC experts helped design a solution for data processing and visualization that uses the freely available software Gephi [1] in the HPC environment. The output is a so-called network map, an interactive scheme for finding and analysing the dissemination of specific terms and web addresses in the context of the election campaign. As part of the project, NCC also provided access to computing resources for testing the solution, as well as individual training, so that MEMO98 can work independently with this solution in the HPC environment in the future.
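
A network map starts from a table of who shared what. As a hypothetical illustration (the project's actual preprocessing and the CrowdTangle export columns are not described here), the sketch below aggregates (account, URL) share records into a weighted edge list in the `Source,Target,Type,Weight` CSV layout that Gephi's spreadsheet import recognizes for edge tables. The function names are illustrative only.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO, Tuple


def write_gephi_edges(shares: Iterable[Tuple[str, str]], out: TextIO) -> Counter:
    """Aggregate (account, url) share records into a weighted, directed edge table."""
    weights = Counter(shares)  # number of times each account shared each URL
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (account, url), weight in sorted(weights.items()):
        writer.writerow([account, url, "Directed", weight])
    return weights


def edges_csv(shares: Iterable[Tuple[str, str]]) -> str:
    """Convenience wrapper returning the edge table as a CSV string."""
    buf = io.StringIO()
    write_gephi_edges(shares, buf)
    return buf.getvalue()
```

Accounts and URLs become the two node types of a bipartite graph; a URL shared by many accounts then shows up as a hub once Gephi's layout algorithms are applied.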

Preliminary results and conclusions of the monitoring are published by MEMO98 on its website [2].

References


[1] Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.

[2] Network mapping, Moldova Early Parliamentary Elections July 2021, Monitoring of Social Media – Preliminary Findings. Available here:

https://memo98.sk/article/moldovan-social-media-reflected-a-division-in-society

https://memo98.sk/uploads/content_galleries/source/memo/moldova/2021/preliminary-findings-on-the-monitoring-of-parliamentary-elections-2021-on-social-media.pdf

