Measurement of microcapsule structural parameters using artificial intelligence (AI) and machine learning (ML)

The main aim of the collaboration between the National Competence Centre for HPC (NCC HPC) and the Institute of Polymers of SAV (IP SAV) was the design and implementation of a pilot software solution for automatic processing of polymer microcapsule images using an artificial intelligence (AI) and machine learning (ML) approach. The microcapsules consist of a semi-permeable polymeric membrane which was developed at the IP SAV.


Automatic image processing has several benefits for IP SAV. It saves time, since manual measurement of microcapsule structural parameters is time-consuming due to the huge number of images produced during the process. In addition, automatic image processing minimizes the errors that are inevitably connected with manual measurements. The images from the optical microscope obtained at 4.0 zoom usually contain one or more microcapsules, and they represent the input for the AI/ML pipeline. The images obtained at 2.5 zoom, on the other hand, usually contain several (three to seven) microcapsules, so detection of the individual microcapsules is essential.

The images from the optical microscope are processed in two steps: the first is localization and detection of the microcapsule; the second consists of a series of operations leading to the structural parameters of the microcapsule.

Microcapsule detection

The YOLOv5 model [1] with weights pre-trained on the COCO128 dataset [2] was employed for microcapsule detection. The training set consisted of 96 images, which were manually annotated using the graphical image annotation tool LabelImg [3]. A training unit consisted of 300 epochs; the images were subdivided into 6 batches of 16 images each, and the image size was set to 640 pixels. The computational time of one training unit on an NVIDIA GeForce GTX 1650 GPU was approximately 3.5 hours.
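For illustration, a training run with these parameters could be launched from the YOLOv5 repository [1] roughly as follows; the dataset configuration file name microcapsules.yaml and the yolov5s starting checkpoint are our assumptions, not details taken from the project:

    # Obtain YOLOv5 and its dependencies
    git clone https://github.com/ultralytics/yolov5
    cd yolov5 && pip install -r requirements.txt

    # 300 epochs, batches of 16 images, 640 px input size;
    # microcapsules.yaml (illustrative name) describes the annotated dataset
    python train.py --img 640 --batch 16 --epochs 300 \
        --data microcapsules.yaml --weights yolov5s.pt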

Detection using the trained YOLOv5 model is presented in Figure 1. The reliability of the trained model, verified on 12 images, was 96%, and the throughput on the same graphics card was approximately 40 frames per second.
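Inference with the trained weights can then be scripted through the PyTorch Hub interface that YOLOv5 exposes; the weights path below is only YOLOv5's default output location for a training run, and the image name is illustrative:

    import torch

    # Load the custom-trained detector (path assumes YOLOv5's default output layout)
    model = torch.hub.load("ultralytics/yolov5", "custom",
                           path="runs/train/exp/weights/best.pt")

    # Detect microcapsules in one microscope image
    results = model("microcapsule.jpg")
    print(results.xyxy[0])  # one row per detection: x1, y1, x2, y2, confidence, class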

Figure 1: (a) microcapsule image from the optical microscope, (b) detected microcapsule, (c) cropped detected microcapsule, all at 4.0 zoom; (d) microcapsule image from the optical microscope, (e) detected microcapsule, (f) cropped detected microcapsule, all at 2.5 zoom.

Measurement of microcapsule structural parameters using AI/ML

The binary masks of the inner and outer membranes of the microcapsules are created individually, as the output of a deep-learning neural network with the U-Net architecture [4]. This network was originally developed for image processing in biomedical applications. The first training set for the U-Net network consisted of 140 images obtained at 4.0 zoom with the corresponding masks; the second set consisted of 140 images obtained at 2.5 zoom with the corresponding masks. A training unit consisted of 200 epochs; the images were subdivided into 7 batches of 20 images each, and the image size was set to 1280 pixels (4.0 zoom) or 640 pixels (2.5 zoom). 10% of the images were used for validation. The reliability of the trained model, verified on 20 images, exceeded 96%. The training process took less than 2 hours on the HPC system with IBM Power 7 nodes, and it had to be repeated several times.

The obtained binary masks were subsequently post-processed using fill-holes [5] and watershed [6] operations to remove unwanted residues. The binary masks were then fitted with an ellipse using the scikit-image measure library [7]. The first and second principal axes of the fitted ellipse are used for the calculation of the microcapsule structural parameters. An example of the inner and outer binary masks and the fitted ellipses is shown in Figure 2.
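A minimal sketch of this post-processing stage, assuming a predicted binary mask is already available as a NumPy array (the watershed separation step [6] is omitted for brevity, and all file and function names are illustrative):

    import numpy as np
    from scipy import ndimage
    from skimage import measure

    def principal_axes(mask):
        """Fill holes in a binary mask and return the principal axes
        of the ellipse fitted to the largest connected component."""
        filled = ndimage.binary_fill_holes(mask)        # fill-holes operation [5]
        labels = measure.label(filled)                  # connected components
        sizes = np.bincount(labels.ravel())[1:]         # component sizes (skip background)
        largest = labels == (np.argmax(sizes) + 1)      # keep the biggest component
        props = measure.regionprops(largest.astype(int))[0]
        # Axes of the ellipse with the same second central moments as the region [7]
        return props.axis_major_length, props.axis_minor_length

    mask = np.load("inner_mask.npy") > 0.5              # illustrative input
    major, minor = principal_axes(mask)
    print(f"principal axes: {major:.1f} px, {minor:.1f} px")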

Figure 2: (a) input image from the optical microscope, (b) inner binary mask, (c) outer binary mask, (d) output image with fitted ellipses.

Structural parameters obtained by our AI/ML approach (denoted "U-Net") were compared to those obtained by manual measurements performed at the IP SAV. A different model (denoted "Retinex") was used as an independent source of reference data. The Retinex approach was implemented by RNDr. Andrej Lúčny, PhD., from the Department of Applied Informatics of the Faculty of Mathematics, Physics and Informatics in Bratislava. This approach is not based on AI/ML; the ellipse fitting is performed by the aggregation of line elements with low curvature using a so-called retinex filter [8]. The Retinex approach is a good reference due to its relatively high precision, but it is not fully automatic, especially for the inner membrane of the microcapsule.
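For context, the single-scale retinex of Jobson et al. replaces each pixel intensity I(x, y) by R(x, y) = log I(x, y) − log[F(x, y) ∗ I(x, y)], where F is a Gaussian surround function and ∗ denotes convolution; this suppresses slowly varying illumination and enhances edges, which is what makes the low-curvature line elements easy to extract for the fitting stage. (This is our summary of the cited technique [8], not a description of the specific implementation.)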

Figure 3 summarizes a comparison between the three approaches (U-Net, Retinex, IP SAV manual measurement) used to obtain the 4.0 zoom microcapsule structural parameters.


Figure 3: (a) microcapsule diameter for different batches, (b) difference between the diameters of the fitted ellipse (first principal axis) and the microcapsule, (c) difference between the diameters of the fitted ellipse (second principal axis) and the microcapsule. The red lines in (b) and (c) represent the threshold given by IP SAV. The images were obtained using 4.0 zoom.

All obtained results, except for 4 images of batch 194 (ca. 1.5%), are within the threshold defined by the IP SAV. As can be seen from Figure 3(a), the microcapsule diameters calculated using U-Net and Retinex are in good agreement with each other. The U-Net model performance can be significantly improved in the future, either by expanding the training set or by additional post-processing. The agreement between the manual measurement and U-Net/Retinex may be further improved by unifying the method of obtaining the microcapsule structural parameters from the binary masks.

The AI/ML model will be available as a cloud solution on the HPC systems of CSČ SAV, so no additional investment into the HPC infrastructure of IP SAV will be necessary. The production phase, which goes beyond the scope of the pilot solution, envisages the integration of this approach into a desktop application.

References

[1] https://github.com/ultralytics/yolov5

[2] https://www.kaggle.com/ultralytics/coco128

[3] https://github.com/heartexlabs/labelImg

[4] https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

[5] https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_fill_holes.html

[6] https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_watershed.html

[7] https://scikit-image.org/docs/stable/api/skimage.measure.html

[8] D. J. Jobson, Z. Rahman, G. A. Woodell, IEEE Transactions on Image Processing 6(7), 965–976, 1997.


Use case: Transfer and optimization of CFD calculations workflow in HPC environment

Authors: Ján Škoviera (National competence centre for HPC), Sylvain Suzan (Shark Aero)

The Shark Aero company designs and manufactures ultralight sport aircraft with a two-seat tandem cockpit. For design development they use the popular open-source software package OpenFOAM [1]. The CFD (Computational Fluid Dynamics) simulations use the Finite Element Method (FEM). After the model is created using Computer-Aided Design (CAD) software, it is divided into discrete cells, the so-called "mesh". The simulation accuracy depends strongly on the mesh density, with the number of mesh vertices, and hence the computational and memory requirements, rising with the third power of the linear mesh resolution. For some simulations the computational demands can be a limiting factor. The workflow was therefore transferred into a High-Performance Computing (HPC) environment, with a special focus on investigating the efficiency of computational task parallelization for the given model type.

METHODS

Compute nodes with two 6-core Intel Xeon L5640 processors @ 2.27 GHz, 48 GB of RAM and 2x 500 GB disks were used for this project. All calculations were done in a standard HPC environment using the Slurm job scheduling system. This is an acceptable solution for this type of workload, where neither real-time response nor immediate data processing is required. For the CFD simulations we continued to use the OpenFOAM and ParaView version 9 software packages. A Singularity container was used for deployment of the calculations, having in mind a potential transfer of the workload to another HPC system. The speed-up gained from a straightforward transfer to the HPC system, before any optimization, was approximately 1.5x compared to a standard laptop.
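A minimal Slurm batch script for such a containerized run might look as follows; the container image name, partition and resource numbers are illustrative assumptions, not the project's actual configuration:

    #!/bin/bash
    #SBATCH --job-name=cfd-openfoam
    #SBATCH --nodes=1
    #SBATCH --ntasks=8            # one MPI rank per mesh domain
    #SBATCH --time=04:00:00
    #SBATCH --partition=compute   # illustrative partition name

    # Run the OpenFOAM solver inside the Singularity container (image name assumed)
    srun singularity exec openfoam9.sif simpleFoam -parallel -case "$PWD"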

PARALLELIZATION

Parallelized task execution can increase the speed of the overall calculation by utilizing more computing units concurrently. In order to parallelize the task, one needs to divide the original mesh into domains, i.e. parts that will be processed concurrently. The domains, however, need to communicate through the processor boundaries, i.e. the domain sides where the original enclosing mesh was divided. The larger the processor boundary surface, the more I/O is required to resolve the boundary conditions. Processor boundary communication is facilitated by the distributed-memory Message Passing Interface (MPI) protocol, and the distinction between CPU cores on the same node and on different compute nodes is abstracted from the user. This places certain limits on the efficient use of many parallel processes, since overly parallelized job executions can actually be slower due to communication and I/O bottlenecks. The domains should therefore be created in a way that minimizes the processor boundaries. One possible strategy is to divide the original mesh only in the direction co-planar with the smallest side of the original enclosing mesh; careless division into domains, such as division along multiple axes, creates more processor boundaries and increases the amount of data to be transferred beyond reasonable measure.
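In OpenFOAM this single-direction strategy is expressed in system/decomposeParDict, read by the decomposePar utility; a sketch for eight slab-shaped domains cut along one axis only (the subdomain counts are an illustrative choice):

    // system/decomposeParDict (sketch)
    numberOfSubdomains  8;
    method              simple;

    simpleCoeffs
    {
        n       (8 1 1);    // split along x only -> cuts parallel to the yz plane
        delta   0.001;
    }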

Figure 1: Illustration of mesh segmentation. The enclosing mesh is represented by the transparent boxes.

The calculations were done in four steps: enclosing mesh creation, mesh segmentation, model inclusion and CFD simulation. The enclosing mesh creation was done using the blockMesh utility, the mesh segmentation using the decomposePar utility, the model inclusion using the snappyHexMesh program, and the CFD simulation itself using simpleFoam. The most computationally demanding step is snappyHexMesh. This is understandable: while in the CFD simulation the calculation needs to be done several times for every edge of the mesh and every iteration, in the model inclusion step new vertices are created and old ones deleted based on the positions of the vertices in the model mesh. This requires the creation of an "octree" (a partitioning of three-dimensional space by recursively subdividing it into eight octants), repeated inverse search, and octree re-balancing. Each of these processes is N·log(N) in the best case and N² in the worst case, N being the number of vertices. The CFD itself scales linearly with the number of edges, i.e. close to linearly with N, since only spatially proximate nodes are interconnected. We developed a workflow that creates a number of domains divided by planes parallel to the yz plane (x being the axis of the aircraft nose), which simplifies the decision making. After the inclusion of a new model, one can simply specify the number of domains and run the calculation, minimizing the human intervention needed to parallelize it.
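The four steps can be chained in a short driver script; the sketch below assumes a standard OpenFOAM case directory and the eight-domain decomposition from above:

    #!/bin/bash
    blockMesh                                          # 1. create the enclosing mesh
    decomposePar                                       # 2. segment it into domains
    mpirun -np 8 snappyHexMesh -parallel -overwrite    # 3. include the aircraft model
    mpirun -np 8 simpleFoam -parallel                  # 4. run the CFD simulation
    reconstructParMesh -constant                       # merge the parallel-built mesh
    reconstructPar                                     # merge results for ParaView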

RESULTS AND CONCLUSION

The relative speed-up of the calculation is mainly determined by I/O limits. If the computational tasks are well below the I/O bound, the computation time is inversely proportional to the number of domains. In less demanding calculations, i.e. for small models, the processes can easily be over-parallelized.
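This behaviour can be summarized by a simple cost model (our illustration, not taken from the source): the elapsed time is roughly T(n) ≈ T_comp/n + T_comm(n), where n is the number of domains. The computing term T_comp/n shrinks with every added domain, while the communication and I/O term T_comm(n) grows with the total processor boundary surface, so T(n) reaches a minimum at some finite n, beyond which the job is over-parallelized.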

Figure 2: Dependence of the real elapsed time on the number of processes for snappyHexMesh and simpleFoam. In the case of simpleFoam the time starts to diverge for more than 8 processes, since the data traffic overcomes the parallelization advantage. Ideal scaling shows the theoretical time needed to finish the calculation if data traffic and processor boundary condition resolution were not involved.

Once the mesh density is high enough, the time to calculate the CFD step is also inversely proportional to the number of parallel processes. As shown in the second pair of figures, with a twofold increase in mesh density the calculations stay below the I/O bound even in the CFD step. Even though the CFD step is in this case fast compared to the meshing process, simulating long time intervals could make it the most time-consuming step.

The design of aircraft parts requires simulating relatively small models multiple times under varying conditions. The mesh density needed for these simulations falls into the medium category. When transferring the calculations to the HPC environment, we had to take into account the real needs of the end user in terms of model size, mesh density and required precision of the results. There are several advantages of using HPC:

  • The end user is relieved of the need to maintain their own computational capacities.
  • Even when restricted to single-thread jobs, the simulations can be offloaded to HPC with a high speed-up, making even very demanding and precise calculations feasible.
  • For even more effective calculations, a simple way of utilizing parallelization was determined for this particular workload, and the limitations of parallel runs for the given use case and conditions were identified. The total speed-up reached under practical conditions was 7.3x; the speed-up generally grows with the calculation complexity and the mesh precision.


MEMO98

MEMO98 is a non-profit non-governmental organisation that has been monitoring media in the context of elections and other events for more than 20 years and has carried out its activities in more than 50 countries. Recently, the organisation has also been studying the impact of social media on the integrity of electoral processes.

The information environment has changed significantly in recent years, especially due to the advent of social media. Apart from some positive aspects, such as enhanced possibilities of receiving and sharing information, social media have also enabled the dissemination of misinformation to a wide audience quickly and at low cost. MEMO98 analysed the campaign for the parliamentary elections held on July 11, 2021 in Moldova on five social media platforms: Facebook, Instagram, Odnoklassniki, Telegram and YouTube.

Social media data was collected using CrowdTangle (a Facebook-owned social media analysis tool). The number of interactions with posts of candidates and individual political parties on Facebook alone was 1.82 million; the number of interactions with posts of party chairmen climbed to 1.09 million. Prior to the start of this project, MEMO98 had no experience with tools for big data processing and analysis. NCC experts helped design a solution for data processing and visualization utilizing the freely available software Gephi [1] in the HPC environment. The output is a so-called network map, an interactive scheme for finding and analysing the dissemination of specific terms and web addresses in the context of the election campaign. As part of the project, NCC also provided access to computing resources for testing the solution, as well as individual training, so that MEMO98 can work independently with this solution in the HPC environment in the future.
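For illustration, the kind of network map described above can be assembled from collected posts with a few lines of Python and exported in the GEXF format that Gephi [1] opens; the CSV layout and column name below are assumptions, not the project's actual data schema:

    import csv
    from itertools import combinations
    import networkx as nx

    G = nx.Graph()

    # posts.csv (illustrative): one row per post, with the extracted
    # terms/URLs separated by semicolons in a "terms" column
    with open("posts.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            items = [t for t in row["terms"].split(";") if t]
            # Terms appearing in the same post become connected nodes;
            # repeated co-occurrence increases the edge weight
            for a, b in combinations(items, 2):
                w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
                G.add_edge(a, b, weight=w + 1)

    # Export for interactive exploration in Gephi
    nx.write_gexf(G, "network_map.gexf")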

Preliminary results and conclusions of the monitoring are published by MEMO98 on its website [2].

References


[1] Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.

[2] Network mapping, Moldova Early Parliamentary Elections July 2021, Monitoring of Social Media – Preliminary Findings. Available here:

https://memo98.sk/article/moldovan-social-media-reflected-a-division-in-society

https://memo98.sk/uploads/content_galleries/source/memo/moldova/2021/preliminary-findings-on-the-monitoring-of-parliamentary-elections-2021-on-social-media.pdf

