Systems and methods are provided for generating a comparison database and for using it in a downstream neural network for anomaly detection in medical images. A set of medical images with annotations is received and filtered for a subset with a decisive detection, determined by an anomaly detection algorithm, of a (non-)existence of anomalies, congruently with the annotation. The filtered set is augmented using a first and a second auto-encoder-decoder by optimizing a distance between encoded states of pairs of medical images. The distance is maximized (or minimized) for positive (or negative) pairs having the same (or disjoint) decisively detected (non-)existence of anomalies before encoding and/or after decoding the encoded state using the first (or second) auto-encoder-decoder. A probability of a (non-)existence of an anomaly is determined. The encoded states of the augmented set are stored along with the determined probabilities.
Legal claims defining the scope of protection, as filed with the USPTO.
. A computer-implemented method for generating a comparison database for use in a downstream neural network for anomaly detection in medical images received from a medical scanner, the method comprising:
. The computer-implemented method of, further comprising:
. The computer-implemented method of, wherein filtering the received set of medical images comprises:
. The computer-implemented method of, further comprising:
. The computer-implemented method of, wherein augmenting the filtered set of medical images comprises training the first auto-encoder-decoder and the second auto-encoder-decoder for optimizing the distances between the encoded states of each pair.
. The computer-implemented method of, wherein the medical images are two-dimensional, 2D, images, and/or 2D slices of volumetric images.
. The computer-implemented method of, wherein the distance between encoded states is determined by a similarity metric between the pairs of encoded states.
. The computer-implemented method of, wherein the similarity metric comprises a mean squared error.
. The computer-implemented method of, further comprising:
. The computer-implemented method of, further comprising:
. The computer-implemented method of, wherein the first auto-encoder-decoder is trained for maximizing a distance of positive pairs of medical images, for which the same existence and non-existence of anomalies for any anomaly class within a set of anomaly classes has been decisively detected before encoding and/or after decoding the encoded states.
. The computer-implemented method of, wherein the second auto-encoder-decoder is trained for minimizing a distance of negative pairs of medical images, for which disjoint existences of anomalies have been decisively detected before encoding and/or after decoding the encoded state.
. A computing device for generating a comparison database for use in a downstream neural network for anomaly detection in medical images received from a medical scanner, the computing device comprising:
. The computing device of, wherein the downstream neural network for performing anomaly detection in medical images received from the medical scanner using the comparison database comprises:
. The computing device of, wherein the downstream neural network is trained for performing anomaly detection in medical images received from the medical scanner, wherein in a training phase, in the step of receiving the medical image, annotations in a relation to the medical image are received as Ground Truth, and wherein an assigned existence and non-existence of anomalies to the received medical image is further compared with the received Ground Truth, and wherein the comparing comprises determining at least one value of a loss function, wherein training the downstream neural network comprises optimizing the at least one value of the loss function.
. A system for performing anomaly detection in medical images received from a medical scanner, the system comprising:
Complete technical specification and implementation details from the patent document.
This application claims the benefit of DE 10 2024 113 543.3, filed on May 15, 2024, and EP 24465521.3 filed on May 15, 2024, both of which are hereby incorporated by reference in their entirety.
Embodiments relate to a technique for generating a comparison database for use in a downstream neural network (NN) for anomaly detection in medical images received from a medical scanner.
Chest X-rays are a commonly performed diagnostic imaging procedure, conducted millions of times worldwide each year. However, interpreting these X-rays is a highly subjective and intricate task in radiology, with varying levels of agreement among different readers. The degree of agreement, measured by a kappa value, ranges from 0.2 to 0.77, depending on factors like the reader's experience, the specific abnormality being identified, and the clinical environment (Moncada, 2011), (Hopstaken, 2004).
Radiologists encounter difficulties in analyzing extensive patient health data promptly. This is due to the complexity of interpreting X-ray images, where overlapping tissues and low contrast resolutions often lead to missed detections and diagnoses.
Using deep learning (DL) techniques for medical image analysis presents various challenges. In chest X-ray classification, one notable challenge arises from imbalanced datasets, leading to inaccurate results. Additionally, class imbalance may contribute to model overfitting (Jaiswal, 2019).
Radiologists reviewing chest X-rays have access to essential clinical information, including the patient's medical history, symptoms, and often additional data from tests like blood work. This comprehensive information aids radiologists not only in identifying visible abnormalities (e.g., consolidation) but also in deducing their likely causes (e.g., pneumonia). By integrating data from multiple sources alongside the chest X-ray image, the radiologist may enhance his or her sensitivity and specificity, reducing the risk of blindly following algorithms suggesting inaccurate labels that do not align with external data sources.
Transfer learning (Yang, 2020) is a widely used technique in radiology, particularly in chest X-ray analysis. It involves leveraging pre-trained models, originally developed for tasks like ImageNet (Deng, 2009) classification, and adapting them for medical image analysis (Hashmi, 2020), (Apostolopoulos, 2020), (Farooq, 2020). This approach offers two methods: feature extraction (Yang, 2020), where the pre-trained model's lower layers are kept fixed, and fine-tuning (Yang, 2020), where some layers are adjusted for specific medical tasks. Transfer learning may significantly improve model performance, particularly when dealing with limited medical data. However, it requires pre-training on similar datasets and careful consideration of learning rates for optimal results.
Image segmentation (Sultana, 2020), the process of dividing digital images into fragments, is valuable in medical image analysis (Bhattacharya, 2021). For instance, in (Narayanan, 2020), the authors applied transfer learning with various models to detect pneumonia in chest X-ray images, achieving high accuracy, especially with the InceptionV3 model. They also introduced a CNN architecture for distinguishing bacterial from viral pneumonia, using image segmentation with a U-Net architecture based on lung masks from the Shenzhen Dataset (Candemir, 2014), (Jaeger, 2014). This segmentation significantly improved accuracy.
(LaLonde, 2018) introduced a convolutional-deconvolutional capsule network for lung segmentation, enhancing results while reducing parameters. Their model achieved an accuracy of 98.47% in lung segmentation. Similarly, (Bonheur, 2019) proposed “Matwo-CapsNet,” a semantic segmentation network based on capsule network concepts, extending previous work.
Ensemble classification (Chollet, 2021) combines multiple models to improve accuracy and reliability by reducing prediction variability. (Hashmi, 2020) used a weighted classifier with five models for pneumonia classification, achieving 98.43% accuracy. (Chouhan, 2020) used majority voting with five models for 96.4% accuracy. (Pant, 2020) combined U-Nets based on ResNet-34 and EfficientNet-B4, reaching 90% accuracy. (Hilmizen, 2020) experimented with various model combinations and achieved 99.87% accuracy by concatenating ResNet-50 and VGG-16.
Many deep neural network architectures have been employed in image classification studies. VGG, ResNet, Inception, MobileNet, DenseNet, CapsNet, U-Net, EfficientNet, and SqueezeNet have all been used for tasks like pneumonia and COVID-19 detection in chest X-rays. These models have achieved varying levels of accuracy, often surpassing 90% accuracy in their respective applications.
However, it remains an open problem that conventional image analysis methods provide in many cases uncertain disease labeling and/or abnormality detection.
Embodiments provide a (for example automated and/or real-time) solution for an improved prediction on the existence and non-existence of anomalies from medical images, for example those which conventionally are considered not sufficiently decisive or clear. Embodiments may further (time-efficiently) integrate (for example in an automated manner) medical data from multiple resources for medical image analysis in view of potential anomalies.
Embodiments provide a method of generating a comparison database for use in a downstream NN for anomaly (also: abnormality) detection in medical images received from a medical scanner, a method of performing, by a downstream NN, anomaly detection in medical images received from a medical scanner using the generated comparison database, a method of training a downstream NN for performing anomaly detection in medical images received from a medical scanner, two (e.g., auto-) encoder-decoders, a computing device, a downstream NN, a system including the downstream NN, a computer program (and/or computer program product), and a computer-readable storage medium (also: memory).
Embodiments are described with respect to the claimed methods as well as with respect to the (e.g., auto-) encoder-decoders, computing device and downstream NN. Features, advantages or alternative embodiments herein may be assigned to the other objects (e.g., the computer program or a computer program product) and vice versa. In other words, embodiments for the (e.g., auto-) encoder-decoders, computing device and downstream NN may be improved with features described or claimed in the context of the methods. In this case, the functional features of the method are embodied by structural units of the system and vice versa, respectively.
As to a first method aspect, a (for example computer-implemented) method of generating a comparison database for use in a downstream neural network (NN) for anomaly detection in medical images received from a medical scanner is provided. The method includes a step of receiving a set of medical images. Each medical image within the set includes an annotation. The method further includes a step of filtering the received set of medical images for a subset of medical images with a decisive detection of an existence and non-existence of anomalies for a set of anomaly classes. The decisive detection is determined by an anomaly detection algorithm congruently with the received annotation. The method further includes a step of augmenting the filtered set of medical images using two (e.g., auto-) encoder-decoders by optimizing a distance between encoded states of pairs of medical images within the filtered set. The optimizing of the distance of the encoded states includes maximizing, using a first (e.g., auto-) encoder-decoder, the distance for positive pairs of medical images having the same decisively detected existence and non-existence of anomalies before encoding and/or after decoding the encoded state. The optimizing of the distance of the encoded states further includes minimizing, using a second (e.g., auto-) encoder-decoder, the distance for negative pairs of medical images having disjoint decisively detected existences of anomalies before encoding and/or after decoding the encoded state. It may further be required that any medical image within a negative pair has (e.g., only) decisively determined non-existences of anomalies (e.g., for no anomaly within the set of anomaly classes, the anomaly detection algorithm is non-decisive and/or ambiguous). The augmenting further includes determining a probability of an existence or non-existence of an anomaly for each anomaly class from the decoding of the encoded state. 
The method further includes a step of storing the encoded states of the augmented set of medical images along with the determined (e.g., decisive) probabilities of existences and non-existences of anomalies for generating the comparison database.
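The storing step can be illustrated with a minimal sketch. The source does not specify a storage format, so the JSON layout, the field names (`image_id`, `encoded_state`, `probabilities`) and the helper functions below are hypothetical and serve only to show encoded states being kept together with their per-class probabilities.

```python
import json

def make_entry(image_id, encoded_state, probabilities):
    """Bundle one encoded state with its per-class anomaly probabilities.

    All field names are illustrative assumptions, not taken from the source.
    """
    return {
        "image_id": image_id,
        "encoded_state": [float(v) for v in encoded_state],
        "probabilities": {cls: float(p) for cls, p in probabilities.items()},
    }

def serialize_database(entries):
    """Serialize the comparison database to a JSON string."""
    return json.dumps({"entries": entries}, sort_keys=True)

def load_database(text):
    """Restore the list of entries from a JSON string."""
    return json.loads(text)["entries"]

# Hypothetical encoded states (latent vectors) and decisive probabilities.
entries = [
    make_entry("img-001", [0.12, -0.80, 0.33], {"pneumonia": 0.97, "mass": 0.02}),
    make_entry("img-002", [0.90, 0.10, -0.45], {"pneumonia": 0.03, "mass": 0.96}),
]
restored = load_database(serialize_database(entries))
```

A production system would more likely use a binary or vector-database format; JSON is chosen here only because it keeps the sketch self-contained.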
An improved anomaly detection in medical images (e.g., caused by pneumonia and/or COVID-19, e.g., derived from X-ray chest images) and related diagnosis, for example in the presence of overlapping tissues and/or low contrast resolution, is thus provided. The improvement may advantageously include both an accuracy (and/or reliability) and a time-efficiency.
The generated comparison database may, for example, compensate for imbalanced datasets (e.g., for training a downstream NN for anomaly detection and/or diagnostic support) and/or may help avoid overfitting.
Embodiments aim at solving the issue of uncertain disease labelling (and/or abnormality detection) by using for example contrastive learning, optionally in conjunction with a (e.g., large language model, LLM) generative task using related (and/or similar) medical (e.g., radiology) findings (e.g., from medical reports, vital functions, laboratory results and/or further medical images such as historic and/or previous medical images of the same patient) as input.
For example, by using the comparison database for anomaly detection in medical images, an accuracy and reliability of the anomaly detection may be improved and a prediction variability may be reduced. By the computer implementation, the anomaly detection may further be performed in a time-efficient manner, for example for real-time application, such as during a medical consultation. Thereby, a need for follow-up visits and/or follow-up dates for performing further medical imaging may be avoided.
The medical scanner may make use of a predetermined imaging modality (e.g., X-ray imaging, also: radiography; and/or magnetic resonance tomography, MRT) and/or may provide images of a predetermined anatomical area (e.g., the chest).
The annotation may include an image classification, an image segmentation, and/or text indicative of an existence or non-existence of an anomaly per anomaly class. The annotation may, e.g., be provided by a medical expert. Alternatively, or in addition, the annotation may be obtained at least semi-automatically (e.g., based on an automated segmentation, the medical expert may determine the existence of an anomaly).
Alternatively, or in addition, the annotation may include a kappa value, which conventionally measures how often multiple medical experts (also: clinicians), examining the same patients (or the same imaging results), agree that a particular finding (e.g., an anomaly) is present (also: existent) or absent (also: non-existent).
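As a worked illustration of such an agreement measure, the sketch below computes Cohen's kappa for two readers and a single binary finding. The source does not state which kappa variant is meant, so the choice of Cohen's kappa is an assumption, and the reader labels are invented for the example.

```python
def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers labelling the same cases.

    Labels are 1 (finding present) or 0 (finding absent).
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must label the same non-empty set of cases")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    labels = set(reader_a) | set(reader_b)
    expected = sum(
        (reader_a.count(label) / n) * (reader_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical readings of ten images by two readers.
reader_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
reader_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(reader_a, reader_b)  # observed 0.8, chance 0.5, kappa about 0.6
```

For more than two readers, a multi-rater statistic such as Fleiss' kappa would be needed instead.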
The set of anomaly classes may include multiple anomaly classes, for example per anatomical region. Alternatively, or in addition, an anomaly class may include a tumor, a lesion, and/or an alteration of a type and/or density of tissue. E.g., anomalies that may exist and/or may be detectable in chest images (and/or images including the lungs) may include pneumonia, a mass, atelectasis, consolidation, edema, emphysema and/or fibrosis. Chest images (and/or images including the lungs) may, e.g., be acquired by radiography.
Alternatively, or in addition, anomalies that may exist and/or may be detectable in chest images (and/or images including the heart) may include a myocardial infarct (also: heart attack), cardiomyopathy, valve disorders, congenital heart defects, pericardial diseases, a mass (and/or tumor), and/or coronary artery disease. Chest images (and/or images including the heart) may, e.g., be acquired by MRT, e.g., as a two-dimensional (2D) slice.
Further alternatively or in addition, anomalies that may exist and/or may be detectable in brain images may include an aneurysm, a focal point (and/or indication) of epilepsy, a mass (e.g., tumor), white matter hyperintensities, anomalous calcifications, and/or a (e.g., chronic) brain infarct. Brain images may, e.g., be acquired by MRT, e.g., as a two-dimensional (2D) slice.
The anomaly detection algorithm may be configured to determine a value of a probability of an existence or non-existence of an anomaly per anomaly class. A decisive detection may include the determined probability value exceeding (and/or reaching) a high (and/or detection) threshold, e.g., at least 90% (and for example 95%). A decisive non-detection may include the determined probability value undercutting (and/or reaching) a low (and/or non-detection) threshold, e.g., 10% (and for example 5%).
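The threshold logic can be sketched as follows, assuming the example values of 95% and 5% and inclusive comparisons ("exceeding and/or reaching"). The function name and the three return labels are illustrative and do not come from the source.

```python
HIGH_THRESHOLD = 0.95  # example detection threshold from the text
LOW_THRESHOLD = 0.05   # example non-detection threshold from the text

def classify_probability(p, high=HIGH_THRESHOLD, low=LOW_THRESHOLD):
    """Map one per-class probability to a decisive or ambiguous outcome."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p >= high:
        return "present"    # decisive detection
    if p <= low:
        return "absent"     # decisive non-detection
    return "ambiguous"      # neither threshold reached

outcomes = [classify_probability(p) for p in (0.97, 0.50, 0.03)]
```

Everything between the two thresholds is treated as non-decisive, which is the band of images the filtering step later discards.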
The first (e.g., auto-) encoder-decoder (also: positive-pair, e.g., auto-, encoder-decoder) and/or the second (e.g., auto-) encoder-decoder (also: negative-pair, e.g., auto-, encoder-decoder) may be trained (and/or continue to be trained) using inverse contrastive learning for generating (and/or augmenting; also: enhancing) the comparison database.
Conventional contrastive learning involves training a model to differentiate between similar and dissimilar pairs of data points by maximizing their similarity within the same (and/or positive) class and minimizing it between different (and/or negative) classes. By contrast, the first (e.g., auto-) encoder-decoder minimizes the similarity (and/or maximizes the distance) between positive pairs of encoded states, and the second (e.g., auto-) encoder-decoder maximizes the similarity (and/or minimizes the distance) between negative pairs of encoded states.
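The two inverted objectives can be written down in a minimal sketch. It assumes a mean squared error between encoded states as the distance, which the claims name as one possible similarity metric. The loss functions themselves are illustrative, since the source gives no explicit loss formula.

```python
def mse_distance(z_a, z_b):
    """Mean squared error between two encoded states of equal length."""
    if len(z_a) != len(z_b) or not z_a:
        raise ValueError("encoded states must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(z_a, z_b)) / len(z_a)

def positive_pair_loss(z_a, z_b):
    """Loss for the first encoder-decoder: minimizing it MAXIMIZES the distance."""
    return -mse_distance(z_a, z_b)

def negative_pair_loss(z_a, z_b):
    """Loss for the second encoder-decoder: minimizing it MINIMIZES the distance."""
    return mse_distance(z_a, z_b)

# Hypothetical two-dimensional encoded states.
z_a, z_b = [0.0, 0.0], [2.0, 0.0]
pos = positive_pair_loss(z_a, z_b)
neg = negative_pair_loss(z_a, z_b)
```

Taken alone, the positive-pair loss is unbounded below. In the described method the modification is limited by requiring that the anomaly detection result of the decoded image stays unchanged.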
An encoded state (also: encoded representation, encoded embedding, latent space, and/or state in feature space) may be the output of the encoder of an (e.g., auto-) encoder-decoder. Alternatively, or in addition, the encoded state may be a compressed state and/or a condensed state (e.g., including fewer bytes than the medical image on which it is based).
The first (e.g., auto-) encoder-decoder and/or the second (e.g., auto-) encoder-decoder may be (e.g., continuously) fine-tuned for improved inverse contrastive learning and/or for modifying encoded states (for example by the encoder of the corresponding, e.g., auto-, encoder-decoder) complying with decisive anomaly detection results of the related decoded images (for example provided by the decoder of the corresponding (e.g., auto-) encoder-decoder).
The generated comparison database may be used for transfer learning for anomaly detection in medical images by a downstream NN.
The method may further include a step of augmenting the set of medical images. The augmenting of the set of medical images may include, for any medical image within the set, adding noise and/or performing at least one geometric transformation. The at least one geometric transformation may include flipping the medical image (e.g., horizontally and/or vertically), rotating the medical image (e.g., by 90 degrees), cropping the medical image, and/or re-scaling (briefly also: scaling) the medical image. Scaling may include changing an image size (e.g., in terms of a digital data volume and/or resolution).
The step of augmenting the set of medical images by adding noise and/or performing one or more geometric transformations may be performed externally (e.g., before the set of medical images is received by a computing device). Alternatively, or in addition, the step of augmenting the set of medical images by adding noise and/or performing one or more geometric transformations may be performed on the received set of medical images before the step of filtering.
By the augmenting of the set of medical images by adding noise and/or performing geometric transformations, a flexibility and/or effectiveness of the two (e.g., auto-) encoder-decoders may be improved.
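The noise and geometric augmentations named above can be sketched on a toy image. The representation of an image as a nested list of pixel intensities in [0, 1], the Gaussian noise model, and all function names are assumptions made for illustration. Re-scaling is omitted for brevity.

```python
import random

def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def crop(image, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def add_gaussian_noise(image, sigma=0.01, seed=None):
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]

# Hypothetical 2x2 image.
image = [[0.0, 0.2], [0.4, 0.6]]
augmented = [
    flip_horizontal(image),
    flip_vertical(image),
    rotate_90_clockwise(image),
    add_gaussian_noise(image, sigma=0.01, seed=0),
]
```

Each function returns a new image and leaves the input untouched, so the original and its variants can all be added to the set.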
Filtering the received set of medical images may include applying the anomaly detection algorithm to each received medical image within the set. The applied anomaly detection algorithm may determine a probability of an existence or non-existence of an anomaly for each anomaly class within a predetermined set of anomaly classes. The filtering of the received set of medical images may further include selectively retaining a medical image if the determined probability of the applied anomaly detection algorithm is decisive. The probability being decisive may include, for each anomaly class within the predetermined set of anomaly classes, a probability of the existence of an anomaly reaching and/or being above a predetermined high threshold (e.g., at and/or above 95% probability). Alternatively, or in addition, the probability being decisive may include, for each anomaly class within the predetermined set of anomaly classes, a probability of the non-existence of an anomaly reaching and/or being below a predetermined low threshold (e.g., at and/or below 5% probability). The filtering may require a (e.g., independent and/or separate) decisive determination for each anomaly class.
The filtering of the received set of medical images may further include comparing, for each retained (e.g., by the determination by the anomaly detection algorithm being decisive for each anomaly class) medical image, the detected existences and non-existences of anomalies for each anomaly class within the predetermined set of anomaly classes with the received annotation (e.g., without an assigned probability and/or with statements like “anomaly (e.g., an abnormal mass) exists” only). The filtering of the received set of medical images may further include selectively retaining each medical image for which the result of the comparing is consensual for each anomaly class.
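The two filtering stages (decisive for every class, then consensual with the annotation for every class) can be combined in a short sketch. The record layout with `probs` and `annotation` dictionaries is hypothetical, and the annotation is assumed to give a plain yes/no statement for the same anomaly classes that the detector scores.

```python
def decisive_findings(probs, high=0.95, low=0.05):
    """Return {class: True/False} if every class is decisive, else None."""
    findings = {}
    for cls, p in probs.items():
        if p >= high:
            findings[cls] = True    # decisively present
        elif p <= low:
            findings[cls] = False   # decisively absent
        else:
            return None             # at least one class is ambiguous
    return findings

def filter_images(records, high=0.95, low=0.05):
    """Keep records that are decisive and agree with their annotation."""
    kept = []
    for rec in records:
        findings = decisive_findings(rec["probs"], high, low)
        if findings is not None and findings == rec["annotation"]:
            kept.append({**rec, "findings": findings})
    return kept

# Hypothetical detector outputs and expert annotations.
records = [
    {"id": "img-001", "probs": {"pneumonia": 0.98, "mass": 0.01},
     "annotation": {"pneumonia": True, "mass": False}},   # decisive, consensual
    {"id": "img-002", "probs": {"pneumonia": 0.60, "mass": 0.02},
     "annotation": {"pneumonia": True, "mass": False}},   # ambiguous
    {"id": "img-003", "probs": {"pneumonia": 0.97, "mass": 0.03},
     "annotation": {"pneumonia": False, "mass": False}},  # contradicts annotation
]
kept_ids = [rec["id"] for rec in filter_images(records)]
```

Only the first record survives: the second fails the decisiveness test and the third fails the comparison with the annotation.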
Alternatively, or in addition, the filtering of the received set of medical images may include an optional grouping of the retained medical images according to their detected existences and non-existences of anomalies into positive pairs of identical decisively detected existences of one or more anomalies (for example obtained by the anomaly detection algorithm and confirmed by the annotation).
The comparison database may be composed of groups, each having the same detected existences of some anomalies and non-existences of the other anomalies within the set of anomaly classes.
Some of the groups may be disjoint in terms of each detected existence of an anomaly. Any negative pair of medical images (and/or their encoded states) may correspond to one medical image selected from a first group and a second medical image being selected from a second group that is disjoint from the first group in terms of each detected existence of an anomaly (and/or anomaly class). Alternatively, or in addition, any positive pair may include two medical images (and/or their encoded states) within the same group (e.g., the first group).
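The grouping and pair construction can be sketched as follows. Each group is keyed by the set of anomaly classes decisively detected as present. Two groups are treated as disjoint when those sets share no class, which under this reading also makes a group without any detected anomaly disjoint from every other group. That interpretation, the record layout and the function names are assumptions.

```python
from itertools import combinations, product

def group_by_findings(records):
    """Group image ids by the set of anomaly classes detected as present."""
    groups = {}
    for rec in records:
        key = frozenset(cls for cls, present in rec["findings"].items() if present)
        groups.setdefault(key, []).append(rec["id"])
    return groups

def build_pairs(groups):
    """Positive pairs come from one group, negative pairs from disjoint groups."""
    positive = [pair for ids in groups.values() for pair in combinations(ids, 2)]
    negative = []
    for (key_a, ids_a), (key_b, ids_b) in combinations(groups.items(), 2):
        if key_a.isdisjoint(key_b):
            negative.extend(product(ids_a, ids_b))
    return positive, negative

# Hypothetical filtered records with decisive findings.
records = [
    {"id": "img-1", "findings": {"pneumonia": True, "mass": False}},
    {"id": "img-2", "findings": {"pneumonia": True, "mass": False}},
    {"id": "img-3", "findings": {"pneumonia": False, "mass": True}},
    {"id": "img-4", "findings": {"pneumonia": True, "mass": True}},
]
positive, negative = build_pairs(group_by_findings(records))
```

Here `img-4` forms no negative pair because its findings overlap with both other groups, and it forms no positive pair because it is alone in its group.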
The step of augmenting the filtered set of medical images may include training the two (e.g., auto-) encoder-decoders for optimizing the distances between the encoded states of each pair.
The first (e.g., auto-) encoder-decoder may be trained to modify an encoded state, such that a distance with respect to a reference encoded state is maximized, while a result of detecting the existences and non-existences of anomalies remains unchanged (e.g., up to fluctuations in the corresponding probability values within the limits of the high and low thresholds, such as changing from 95% to 96%, and/or from 5% to 4%, for the decoded image obtained from the modified encoded state).
The second (e.g., auto-) encoder-decoder may be trained to modify an encoded state, such that a distance with respect to a reference encoded state is minimized, while a result of detecting the existences and non-existences of anomalies remains unchanged (e.g., up to fluctuations in the corresponding probability values within the limits of the high and low thresholds, such as changing from 95% to 96%, and/or from 5% to 4%, for the decoded image obtained from the modified encoded state).
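The constraint shared by both encoder-decoders (change the distance to a reference encoded state while the detection result stays the same) can be illustrated with a toy search. This is not the training procedure of the source: a random local search stands in for the trained encoder-decoder, and `toy_detect` is a made-up stand-in for decoding an encoded state and running the anomaly detection algorithm on the decoded image.

```python
import math
import random

def mse(u, v):
    """Mean squared error between two encoded states."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def findings(probs, high=0.95, low=0.05):
    """Decisive findings per class, or None if any class is ambiguous."""
    out = {}
    for cls, p in probs.items():
        if p >= high:
            out[cls] = True
        elif p <= low:
            out[cls] = False
        else:
            return None
    return out

def toy_detect(z):
    """Hypothetical 'decode, then detect' step: one class driven by z[0]."""
    return {"mass": 1.0 / (1.0 + math.exp(-10.0 * z[0]))}

def push_away(z, z_ref, detect, steps=200, step_size=0.1, seed=0):
    """Increase the distance to z_ref while keeping the findings unchanged."""
    target = findings(detect(z))
    if target is None:
        raise ValueError("the starting state must have decisive findings")
    rng = random.Random(seed)
    best = list(z)
    for _ in range(steps):
        cand = [v + rng.uniform(-step_size, step_size) for v in best]
        # Accept only moves that grow the distance and preserve the findings.
        if mse(cand, z_ref) > mse(best, z_ref) and findings(detect(cand)) == target:
            best = cand
    return best

z_start, z_ref = [1.0, 0.0], [1.0, 0.2]
z_far = push_away(z_start, z_ref, toy_detect)
```

Replacing the `>` comparison with `<` gives the corresponding sketch for the second encoder-decoder, which pulls the encoded state toward the reference under the same constraint.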
Each encoder-decoder may be an auto-encoder-decoder, for example configured to reconstruct the same medical image.
The medical images may be two-dimensional (2D) images and/or 2D slices of volumetric images (also denoted as three-dimensional, 3D, images).
The medical images may be planar images and/or may include planar slices of volumetric images.
The medical images may include a predetermined anatomical area, e.g., a (for example human) chest and/or (for example human) brain. The anomaly detection algorithm may be configured for detecting anomalies in relation to the predetermined anatomical area and/or one or more organs located within the predetermined anatomical area (e.g., the lung being located in the chest area).
The medical images may be acquired in relation to a human (e.g., patient) or an animal (e.g., a mammal, such as a horse).
The medical images, for which the anomaly detection is to be performed, may be acquired by radiography (also: X-ray imaging); ultrasound (US), for example echocardiography; scintigraphy; optical coherence tomography (OCT); magnetic resonance tomography (MRT); computed tomography (CT); positron emission tomography (PET); and/or single-photon emission computed tomography (SPECT).
X-ray images and US images may be acquired as 2D images.
November 20, 2025