An image processing device includes a processor, in which the processor is configured to: derive, for each of a plurality of two-dimensional images including a structure and having a spatial connection, existence range information for defining a spatial existence range of the structure in a direction intersecting the two-dimensional images, by using a derivation model; and integrate the existence range information derived for each of the plurality of two-dimensional images, based on spatial positions of the plurality of two-dimensional images, by using an integration model.
. An image processing device comprising:
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. The image processing device according to,
. An image processing method executed by a computer, the image processing method comprising:
. A non-transitory computer-readable storage medium that stores an image processing program causing a computer to execute a procedure comprising:
The present application claims priority from Japanese Patent Application No. 2024-045627, filed on Mar. 21, 2024, the entire disclosure of which is incorporated herein by reference.
The present disclosure relates to an image processing device, an image processing method, and an image processing program.
In recent years, with the advancement of medical equipment, such as a computed tomography (CT) apparatus and a magnetic resonance imaging (MRI) apparatus, three-dimensional images having a higher quality and a higher resolution have been used for image diagnosis.
When a subject is imaged by an imaging apparatus such as the CT apparatus or the MRI apparatus, scout imaging is performed before the main imaging, which acquires the three-dimensional image. The scout imaging acquires a two-dimensional image for positioning (scout image) and is used to determine the imaging range. An operator of the imaging apparatus, such as a technician, sets the imaging range for the main imaging while viewing the scout image.
However, setting the imaging range while viewing the scout image is time-consuming because the operator must perform the setting manually. In addition, the setting accuracy depends on the skill and experience of the operator, so it varies from operator to operator. Therefore, a method of estimating the three-dimensional position of an organ included in the tomographic images constituting a scout image has been proposed (see, for example, WO2021/205990A).
The method described in WO2021/205990A derives, for each tomographic image, a bounding box that defines the three-dimensional position of the organ included in that tomographic image. It then integrates the position coordinates of the bounding boxes by obtaining a statistical value of them, which yields a bounding box specifying the most probable three-dimensional position of the organ. However, since this method does not consider the positions of the tomographic images, the coordinate position of the integrated bounding box may not be accurate.
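The following is a minimal sketch of such statistical integration. It assumes the median as the statistic and a six-value box encoding (x1, y1, z1, x2, y2, z2). Neither choice is taken from WO2021/205990A; the function name is hypothetical. The sketch shows that the slice positions never enter the computation, which is the limitation described above.

```python
from statistics import median

def integrate_by_median(boxes):
    """Integrate per-slice boxes coordinate-wise by the median.

    Each box is (x1, y1, z1, x2, y2, z2), i.e. two diagonal vertices
    (an assumed encoding). Slice positions are not used at all.
    """
    if not boxes:
        raise ValueError("need at least one box")
    # zip(*boxes) groups the i-th coordinate of every box together.
    return tuple(median(coords) for coords in zip(*boxes))

boxes = [
    (10, 20, 0, 50, 60, 100),
    (12, 18, 5, 52, 62, 90),
    (11, 22, -5, 48, 58, 140),
]
print(integrate_by_median(boxes))  # (11, 20, 0, 50, 60, 100)
```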
The present disclosure has been made in view of the above-described circumstances, and an object of the present disclosure is to enable highly accurate derivation of a position of a structure such as an organ in consideration of a position of a tomographic image.
The present disclosure provides an image processing device comprising: a processor, in which the processor is configured to: derive, for each of a plurality of two-dimensional images including a structure and having a spatial connection, existence range information for defining a spatial existence range of the structure in a direction intersecting the two-dimensional images, by using a derivation model; and integrate the existence range information derived for each of the plurality of two-dimensional images, based on spatial positions of the plurality of two-dimensional images, by using an integration model.
The present disclosure provides an image processing method executed by a computer, the image processing method including: deriving, for each of a plurality of two-dimensional images including a structure and having a spatial connection, existence range information for defining a spatial existence range of the structure in a direction intersecting the two-dimensional images, by using a derivation model; and integrating the existence range information derived for each of the plurality of two-dimensional images, based on spatial positions of the plurality of two-dimensional images, by using an integration model.
The present disclosure provides an image processing program causing a computer to execute a procedure including: deriving, for each of a plurality of two-dimensional images including a structure and having a spatial connection, existence range information for defining a spatial existence range of the structure in a direction intersecting the two-dimensional images, by using a derivation model; and integrating the existence range information derived for each of the plurality of two-dimensional images, based on spatial positions of the plurality of two-dimensional images, by using an integration model.
According to the present disclosure, the position of the structure can be derived with high accuracy in consideration of the position of the tomographic image.
Hereinafter, an embodiment of the present disclosure will be described with reference to the drawings. First, the configuration of a medical information system to which the image processing device according to the present embodiment is applied will be described. One of the drawings shows the schematic configuration of the medical information system. In the medical information system shown there, a computer including the image processing device according to the present embodiment, an imaging apparatus, and an image storage server are connected via a network in a communicable state.
The computer includes the image processing device according to the present embodiment, and an image processing program according to the present embodiment is installed in the computer. The computer may be a workstation or a personal computer directly operated by a doctor who makes a diagnosis, or may be a server computer connected to the workstation or the personal computer via the network. The image processing program is stored in a storage device of the server computer connected to the network, or in a network storage so as to be accessible from the outside, and is downloaded in response to a request and installed in the computer used by the doctor. Alternatively, the image processing program is distributed in a state of being recorded on a recording medium, such as a digital versatile disc (DVD) or a compact disc read-only memory (CD-ROM), and is installed in the computer from the recording medium.
The imaging apparatus is an apparatus that generates a two-dimensional image or a three-dimensional image representing a part of a subject to be diagnosed by imaging that part. Specifically, it is a radiography apparatus, a computed tomography (CT) apparatus, a magnetic resonance imaging (MRI) apparatus, a positron emission tomography (PET) apparatus, or the like. The image of the subject generated by the imaging apparatus is transmitted to and stored in the image storage server. It should be noted that the three-dimensional image includes a plurality of tomographic images, or an image composed of three-dimensional coordinates generated from the plurality of tomographic images.
The image storage server is a computer that stores and manages various types of data, and comprises a large-capacity external storage device and software for database management. The image storage server communicates with other devices via the wired or wireless network, and transmits and receives image data and the like to and from them. Specifically, the image storage server acquires various types of data, including the image data of the images generated by the imaging apparatus, via the network, and stores and manages the data in a recording medium such as the large-capacity external storage device. It should be noted that the storage format of the image data and the communication between the devices via the network are based on a protocol such as Digital Imaging and Communications in Medicine (DICOM).
Hereinafter, the image processing device according to the present embodiment will be described. One of the drawings shows the hardware configuration of the image processing device according to the present embodiment. As shown there, the image processing device includes a central processing unit (CPU), a display, an input device, a memory, and a network interface (I/F) connected to the network. The CPU, the display, the input device, the memory, and the network I/F are connected to a bus. It should be noted that the CPU is an example of a processor in the present disclosure.
The memory includes the storage unit and a random access memory (RAM). The RAM is a primary storage memory, for example a static random access memory (SRAM) or a dynamic random access memory (DRAM).
The storage unit is a non-volatile memory and is implemented by, for example, at least one of a hard disk drive (HDD), a solid state drive (SSD), an electrically erasable and programmable read only memory (EEPROM), or a flash memory. The image processing program according to the present embodiment is stored in the storage unit as a storage medium. The CPU reads out the image processing program from the storage unit, loads it into the RAM, and executes it. It should be noted that the storage unit also stores a derivation model and a transformer, which will be described below.
The display is a device that displays various screens, for example a liquid crystal display or an electroluminescence (EL) display. The input device is a device with which a user performs input, for example at least one of a keyboard, a mouse, a microphone for audio input, a touchpad for proximity input including contact, or a camera for gesture input. The network I/F is an interface for connection to the network.
Hereinafter, the functional configuration of the image processing device according to the present embodiment will be described. One of the drawings shows this functional configuration. As shown there, the image processing device comprises an information acquisition unit, a derivation unit, an integration unit, a learning unit, and a display controller. When the CPU executes the image processing program, the CPU functions as the information acquisition unit, the derivation unit, the integration unit, the learning unit, and the display controller.
The information acquisition unit acquires a medical image G to be processed from the image storage server in response to an instruction issued by an operator using the input device. In the present embodiment, the medical image G is a scout image used for positioning during imaging with the CT apparatus or the MRI apparatus. The scout image includes a plurality of tomographic images, has a larger slice interval than the three-dimensional image, and contains fewer tomographic images than the three-dimensional image. Therefore, in the present embodiment, the three-dimensional image is referred to as a thin slice image, and an image having a large slice interval, such as the scout image, is referred to as a thick slice image.
The difference between the thin slice image and the thick slice image lies in the resolution in the direction perpendicular to the slice plane. Since the slices of the thin slice image are dense in that direction, a structure can be recognized with high accuracy. In contrast, since the slice interval in that direction is larger in the thick slice image, the structure is reproduced less accurately in the thick slice image than in the thin slice image. It should be noted that, in the present embodiment, the scout image has a larger slice interval than the three-dimensional image, but the present disclosure is not limited to this. Since the thick slice image need only have a lower resolution in the direction perpendicular to the slice plane than the three-dimensional image, the thick slice image also includes an image having a larger slice thickness than the three-dimensional image.
It should be noted that, in the present embodiment, it is assumed that a scout image for acquiring a three-dimensional image of a liver is acquired by the imaging apparatus as the medical image G.
In addition, the information acquisition unit acquires, from the image storage server, training data used to train the derivation model and the integration model described below. The training data and the learning will also be described below.
The derivation unit derives, for each of a plurality of two-dimensional images including the structure and having a spatial connection, existence range information for defining a spatial existence range of the structure in a direction intersecting the two-dimensional images, by using the derivation model. That is, for each of the plurality of tomographic images included in the medical image G, the derivation unit uses the derivation model to define the existence range of the structure within the plane of the tomographic image, and to derive, as the existence range information, three-dimensional coordinates that define the positions of the upper and lower end parts of the liver outside the tomographic plane, in the direction intersecting the tomographic image. The existence range information is, for example, the three-dimensional coordinates of a plurality of vertices that define a rectangular parallelepiped surrounding the structure in three-dimensional space, that is, a bounding box. In particular, in the present embodiment, the derivation unit derives, as the existence range information, the three-dimensional coordinates of the two vertices farthest from each other (diagonal vertices) among the vertices that define the bounding box.
The integration unit integrates the existence range information derived for each of the plurality of two-dimensional images, based on the spatial positions of the plurality of two-dimensional images, by using the transformer. That is, the three-dimensional coordinates of the diagonal vertices of the bounding box derived for each of the tomographic images included in the medical image G are subjected to positional encoding and input to the transformer, which derives the integrated three-dimensional coordinates of the diagonal vertices. The transformer is an example of an integration model according to the present disclosure.
One of the drawings shows the processing performed by the derivation unit and the integration unit. In that drawing, it is assumed that the medical image G includes four tomographic images D1 to D4 of axial cross sections. The tomographic images D1 to D4 are examples of a plurality of tomographic images having the same imaging direction according to the present disclosure. As shown there, the orthogonal directions within the plane of the tomographic images D1 to D4 are an x-direction and a y-direction, and the direction intersecting the tomographic images D1 to D4 is a z-direction.
The derivation unit derives, based on the tomographic images D1 to D4 and by using the derivation model, the three-dimensional coordinates of the diagonal vertices of bounding boxes B1 to B4 as the existence range information. The bounding boxes B1 to B4 define the spatial existence range of the liver included in the tomographic images D1 to D4, respectively.
Here, the bounding boxes B1 to B4 are rectangular parallelepipeds whose sides are parallel to the x-direction, the y-direction, and the z-direction. Because of this, defining the two diagonal vertices among the eight vertices of a bounding box is sufficient to define the shape of the rectangular parallelepiped. For example, for the bounding box of the tomographic image shown in the drawing, defining the two vertices that are the end points of a diagonal line defines the rectangular parallelepiped shape of that bounding box. Accordingly, in the present embodiment, the derivation model outputs the three-dimensional coordinates of the diagonal vertices among the eight vertices of the bounding box as the existence range information.
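The claim that two diagonal vertices fully determine an axis-aligned box can be checked with a short sketch. It recovers all eight vertices from the two diagonal corners. The function name is hypothetical. Sorting per axis is an assumption added so that either diagonal ordering is accepted.

```python
from itertools import product

def box_vertices(corner_a, corner_b):
    """Return the 8 vertices of an axis-aligned box from two diagonal vertices."""
    # Per-axis minimum and maximum, so the corners may be given in any order.
    lo = [min(a, b) for a, b in zip(corner_a, corner_b)]
    hi = [max(a, b) for a, b in zip(corner_a, corner_b)]
    # Each vertex independently takes lo or hi on x, y and z: 2**3 = 8 vertices.
    return [tuple(hi[i] if pick else lo[i] for i, pick in enumerate(picks))
            for picks in product((False, True), repeat=3)]

verts = box_vertices((0, 0, 0), (2, 3, 4))
print(len(verts))            # 8
print(verts[0], verts[-1])   # (0, 0, 0) (2, 3, 4)
```

Concatenating the two corners, for example `(*corner_a, *corner_b)`, gives the six values that serve as the per-slice feature vector in this embodiment.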
The derivation model is constructed by, for example, a method described in WO2021/205990A. That is, the derivation model consists of, for example, a convolutional neural network (hereinafter, a CNN), and is constructed by machine learning using training data. After this learning, when a tomographic image is input, the model defines the position of the structure within the tomographic plane of the input image and outputs three-dimensional coordinate information that defines the position, outside the tomographic plane, of the end parts of the structure in the direction intersecting the tomographic image. The machine learning for constructing the derivation model will be described below.
In the present embodiment, the output of the derivation model is the three-dimensional coordinates of the two diagonal vertices of the bounding box. Therefore, the existence range information is a six-dimensional feature vector. The existence range information derived for the tomographic images D1 to D4 is denoted by μ1 to μ4, respectively.
The transformer is constructed by using a transformer model, as proposed, for example, in Vaswani, Ashish, et al., "Attention is all you need," Advances in Neural Information Processing Systems, 2017. The transformer integrates the input feature vectors by repeatedly deriving the degree of similarity between them and summing them with weights that correspond to the derived degrees of similarity, and it outputs the integrated feature vector.
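The similarity-weighted summation described above can be illustrated with a minimal sketch of scaled dot-product self-attention. The sketch makes several assumptions that are not part of the source. It uses identity query, key, and value projections, whereas a trained transformer learns them, together with multiple heads, feed-forward layers, and normalization. It also uses mean pooling as a hypothetical readout to produce a single integrated vector. With raw coordinates, the similarities are dominated by vector magnitude, which is one reason real models learn projections and normalize their inputs.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """One step of scaled dot-product self-attention with identity projections."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Degree of similarity between this vector and every input vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Similarity-weighted sum of all input vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out

def integrate(vectors, steps=2):
    """Repeat attention, then mean-pool into one integrated vector (assumed readout)."""
    for _ in range(steps):
        vectors = self_attention(vectors)
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

# Each coordinate of the result is a convex combination of the inputs' coordinates.
integrated = integrate([[0, 0, 0, 1, 1, 1], [1, 1, 1, 2, 2, 2]])
```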
In the present embodiment, when the pieces of existence range information μ1 to μ4 are input to the transformer, positional encoding is performed to incorporate the z-direction position information of each tomographic image into the corresponding existence range information (that is, the feature vector). Specifically, pieces of position information p1 to p4 of the tomographic images D1 to D4 in the z-direction are incorporated into the existence range information μ1 to μ4, respectively. Because this position information concerns the z-direction only, it can be represented by a one-dimensional value. Therefore, the feature vector input to the transformer has seven dimensions: the six dimensions of the diagonal vertices plus one dimension of position information.
As the reference point of the positional encoding in the present embodiment, for example, the centroids of the bounding boxes B1 to B4 derived for the tomographic images D1 to D4 can be derived, and a representative value of the z-coordinates of these centroids can be used. The representative value may be, for example, the average value or the median value of the z-coordinates of the centroids, but the present disclosure is not limited to this. Alternatively, the centroid of any one of the bounding boxes B1 to B4 may be derived, and the z-coordinate of the center of the cross section of the bounding box in the tomographic image closest to the z-coordinate of that centroid may be used as the reference point.
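A minimal sketch of this positional encoding is shown below. It assumes that "incorporating" the position means appending one value to the six-value vector, which matches the seven dimensions stated above, rather than the sinusoidal addition used by Vaswani et al. It also assumes that each slice's position information is its z-coordinate measured relative to the median centroid z-coordinate. Function names are hypothetical.

```python
from statistics import median

def centroid_z(box):
    # box = (x1, y1, z1, x2, y2, z2); the centroid's z is the midpoint of z1 and z2.
    return (box[2] + box[5]) / 2

def positional_encode(boxes, slice_z):
    """Append each slice's z-position, relative to a reference, as a seventh value."""
    # Reference point: median of the centroid z-coordinates
    # (statistics.mean would give the average-value variant).
    z_ref = median(centroid_z(b) for b in boxes)
    return [list(b) + [z - z_ref] for b, z in zip(boxes, slice_z)]

boxes = [(0, 0, 10, 5, 5, 30), (0, 0, 12, 5, 5, 28), (0, 0, 8, 5, 5, 36)]
encoded = positional_encode(boxes, [15, 20, 25])
print(encoded[0])  # [0, 0, 10, 5, 5, 30, -5.0]
```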
In the present embodiment, the positional-encoded existence range information μ1 to μ4 is input to the transformer, which integrates it and outputs the integrated existence range information Uμ, that is, the three-dimensional coordinates of the diagonal vertices of an integrated bounding box UB.
It should be noted that, in a case in which the feature vector is input to the transformer, the positional encoding may be performed by weighting the feature vector. For example, the positional encoding may be performed by increasing the weighting for the existence range information derived from the tomographic image located closer to the center among the tomographic images from which the existence range information is derived, and decreasing the weighting for the existence range information derived from the tomographic image located farther from the center.
In addition, the weighting may be increased for the existence range information derived from the tomographic image having a larger area of the included structure (that is, the liver) among the plurality of tomographic images. In addition, the weighting may be increased for the existence range information derived from the tomographic image including the structure having a higher anatomical relevance to the structure to be diagnosed among the structures included in the tomographic image. For example, in a case in which the structure is the liver, the weighting in a case of performing the positional encoding may be increased for the existence range information derived from the tomographic image including a blood vessel related to the liver, a part of the intestine, or the like.
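The weighting variants above can be sketched as per-slice scale factors applied to the feature vectors. The specific formulas below are hypothetical and are not stated in the source: a weight that decays with distance from the center of the slice stack, multiplied by the structure's area relative to the largest area. An optional relevance factor stands in for the anatomical-relevance criterion, for example a value above 1 for slices containing a blood vessel related to the liver.

```python
def slice_weights(slice_z, areas, relevance=None, scale=50.0):
    """Hypothetical weights: larger near the stack center and for larger structure areas."""
    if relevance is None:
        relevance = [1.0] * len(slice_z)
    center = (min(slice_z) + max(slice_z)) / 2
    max_area = max(areas)
    weights = []
    for z, area, r in zip(slice_z, areas, relevance):
        w_center = 1.0 / (1.0 + abs(z - center) / scale)   # decays away from the center
        w_area = area / max_area if max_area > 0 else 1.0   # relative structure area
        weights.append(w_center * w_area * r)
    return weights

def weight_vectors(vectors, weights):
    """Scale each feature vector by its slice weight."""
    return [[w * x for x in v] for v, w in zip(vectors, weights)]

weights = slice_weights([0, 50, 100], [100, 400, 200])
print(weights)  # [0.125, 1.0, 0.25]
```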
The learning unit constructs the derivation model and the transformer through machine learning. The derivation model is constructed by training a convolutional neural network (CNN), and the transformer is constructed by training a transformer model (TFM). For training the CNN and the TFM, tomographic images and the three-dimensional coordinates of the diagonal vertices of the bounding box of the structure included in those tomographic images are used as the training data.
One of the drawings shows the training data. As shown there, the training data consists of a medical image for training TG, which includes a plurality of tomographic images for training TD1, TD2, ..., and three-dimensional coordinate information Tμ of a bounding box for training TB representing the existence range of the structure (liver) included in the medical image for training TG. The three-dimensional coordinate information Tμ is the three-dimensional coordinates of the diagonal vertices of the bounding box for training TB, and it serves as the ground truth data in the training data.
Another drawing shows the training of the CNN and the TFM. The learning unit inputs the tomographic images for training TD1, TD2, ... included in the medical image for training TG to the CNN, and causes the CNN to output bounding boxes for training TB1, TB2, ... that define the existence ranges of the structure included in those tomographic images. Specifically, the three-dimensional coordinates of the diagonal vertices of these bounding boxes are output as existence range information for training Tμ1, Tμ2, and so on.
The learning unit performs positional encoding on the existence range information for training Tμ1, Tμ2, ... to incorporate the z-direction position information Tp1, Tp2, ... of the corresponding tomographic images for training into each piece of existence range information. The learning unit then inputs the positional-encoded existence range information for training to the TFM. The TFM integrates it and outputs the three-dimensional coordinates of the diagonal vertices of an integrated bounding box TUBO as the integrated existence range information Uμ.
The learning unit derives, as a first loss, the difference between the existence range information for training Tμ1, Tμ2, ... and the three-dimensional coordinate information Tμ of the diagonal vertices of the bounding box for training TB, which is the ground truth data. The learning unit also derives, as a second loss, the difference between the integrated existence range information Uμ and the three-dimensional coordinate information Tμ of the training data. The learning unit then trains the CNN, weighting the first loss and the second loss as appropriate, until both losses are equal to or less than a predetermined threshold value. It trains the TFM, weighting the second loss as appropriate, until the second loss is equal to or less than the predetermined threshold value.
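The two losses can be sketched as follows. The source says only "difference", so mean absolute error is an assumption here. The weights `a` and `b` and all function names are hypothetical. The sketch computes loss values only; it performs no gradient update.

```python
def mean_abs_diff(pred, target):
    """Mean absolute error between two coordinate vectors (assumed 'difference')."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def training_losses(per_slice_preds, integrated_pred, ground_truth, a=1.0, b=1.0):
    """Return (first loss, second loss, weighted CNN objective)."""
    # First loss: per-slice CNN outputs versus the ground-truth diagonal vertices.
    first = sum(mean_abs_diff(p, ground_truth) for p in per_slice_preds) / len(per_slice_preds)
    # Second loss: the TFM's integrated output versus the same ground truth.
    second = mean_abs_diff(integrated_pred, ground_truth)
    # CNN objective combines both losses; the TFM would be trained on `second` alone.
    return first, second, a * first + b * second

gt = [0, 0, 0, 10, 10, 10]
first, second, total = training_losses(
    [[1, 1, 1, 11, 11, 11], [0, 0, 0, 10, 10, 10]],
    [0, 0, 0, 10, 10, 12],
    gt,
)
```

Training would then stop once both values are at or below the chosen threshold.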
As the learning progresses, the CNN becomes able to accurately derive, for each tomographic image of an input medical image, the three-dimensional coordinates of the diagonal vertices of a bounding box that defines the spatial existence range of the structure included in that tomographic image. Likewise, when positional-encoded existence range information of the structure (that is, the three-dimensional coordinates of the diagonal vertices of the bounding boxes) for the plurality of tomographic images is input, the TFM becomes able to integrate it and accurately derive the three-dimensional coordinates of the diagonal vertices of the bounding box that defines the spatial existence range of the structure included in the medical image. By advancing the learning in this way, the CNN is constructed as the derivation model and the TFM is constructed as the transformer.
The display controller displays the existence range of the structure derived from the input medical image G, that is, the bounding box, superimposed on the medical image G. One of the drawings shows a display screen. As shown there, the medical image G with the bounding box superimposed on it is displayed on the display screen. It should be noted that the displayed medical image G is one of the tomographic images included in the medical image G, and the operator can switch the displayed tomographic image by operating the input device.
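The following sketch shows one way the integrated box could be superimposed on the currently displayed axial slice: by taking its in-plane cross section. The source does not specify how the box is drawn. Both choices below are hypothetical: the rectangle drawn is the box's x-y extent, and nothing is drawn on slices outside the box's z-range.

```python
def box_section_on_slice(box, slice_z):
    """In-plane rectangle (x_min, y_min, x_max, y_max) where an axis-aligned box
    meets an axial slice at slice_z, or None if the slice misses the box."""
    x1, y1, z1, x2, y2, z2 = box
    if not min(z1, z2) <= slice_z <= max(z1, z2):
        return None  # the slice lies outside the structure's z-range
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

print(box_section_on_slice((10, 20, 0, 50, 60, 100), 40))   # (10, 20, 50, 60)
print(box_section_on_slice((10, 20, 0, 50, 60, 100), 120))  # None
```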
Hereinafter, the processing performed in the present embodiment will be described with reference to a flowchart in the drawings. First, the information acquisition unit acquires the medical image G to be processed from the image storage server (step ST). Next, the derivation unit derives, for each of the tomographic images D1 to D4 included in the medical image G, the existence range information indicating the spatial existence range of the structure (step ST). Next, the integration unit integrates the existence range information derived for the tomographic images D1 to D4 to derive the integrated existence range information (step ST). Finally, the display controller displays the medical image G on the display with the bounding box based on the integrated existence range information superimposed (step ST), and the processing ends.
As described above, in the present embodiment, for each of the tomographic images included in the medical image G, the existence range information defining the spatial existence range of the structure in the direction intersecting the tomographic image is derived by using the derivation model. The existence range information is then integrated by using the transformer based on the spatial positions of the tomographic images D1 to D4. Therefore, the position of the structure can be defined accurately in consideration of the positions of the tomographic images.
It should be noted that, in the above-described embodiment, the existence range information indicating the existence range of the structure is derived from the tomographic images of axial cross sections included in the medical image G, but the present disclosure is not limited to this. The existence range information may also be derived by using a medical image G that includes tomographic images of a sagittal cross section and a coronal cross section in addition to the axial cross section.
For example, as shown in one of the drawings, the processing may proceed as follows. First, bounding boxes (that is, pieces of existence range information) are derived by using the derivation model from a tomographic image of the axial cross section, a tomographic image of the sagittal cross section, and a tomographic image of the coronal cross section. Next, positional encoding is performed to incorporate the position information of each of these three tomographic images. The three pieces of existence range information are then integrated by using the transformer, and the three-dimensional coordinates of the diagonal vertices of an integrated bounding box UB are derived as the integrated existence range information Uμ. These three tomographic images are examples, according to the present disclosure, of tomographic images that have different imaging directions and are obtained for a plurality of tomographic planes sharing three-dimensional absolute coordinate values.
In this case, the position information of each tomographic plane used for the positional encoding is three-dimensional rather than one-dimensional. The positional encoding therefore incorporates three-dimensional position information into the existence range information, which is a six-dimensional feature vector, so a nine-dimensional feature vector is input to the transformer to derive the integrated existence range information Uμ.
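The nine-dimensional input can be sketched by appending a three-dimensional plane position to each six-value box. How a plane's position is expressed as a 3-D point is not specified in the source. Here, as a hypothetical choice, it is the volume center moved onto the plane along the plane's normal axis. The sketch assumes patient coordinates in which x runs left-right, y runs anterior-posterior, and z runs superior-inferior, so sagittal, coronal, and axial planes have normals along x, y, and z, respectively.

```python
# Index of the coordinate axis normal to each plane type (assumed patient axes).
AXIS_OF_NORMAL = {"sagittal": 0, "coronal": 1, "axial": 2}

def plane_position(orientation, offset, volume_center):
    """Hypothetical 3-D plane position: the volume center moved onto the plane."""
    point = list(volume_center)
    point[AXIS_OF_NORMAL[orientation]] = offset
    return point

def encode_planes(boxes, planes, volume_center):
    """Append a 3-D plane position to each 6-value box, giving 9-value vectors."""
    return [list(box) + plane_position(o, off, volume_center)
            for box, (o, off) in zip(boxes, planes)]

boxes = [(10, 20, 0, 50, 60, 100)] * 3
planes = [("axial", 40.0), ("sagittal", 30.0), ("coronal", 45.0)]
vectors = encode_planes(boxes, planes, (30.0, 40.0, 50.0))
print(len(vectors[0]))  # 9
```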
In addition, in the above-described embodiment, the derivation unit derives, as the existence range information, the three-dimensional coordinates of the diagonal vertices of the bounding boxes in the tomographic images D1 to D4, but the present disclosure is not limited to this. Feature values for specifying the three-dimensional coordinates, which the derivation model derives at an intermediate stage of its calculation, may be derived instead. In this case, the integration unit integrates the feature values by using the transformer to derive the integrated existence range information of the structure included in the medical image G.
September 25, 2025