US-20250349116-A1

Computer-Implemented Systems and Methods for Intelligent Image Analysis Using Spatio-Temporal Information

Published: November 13, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A computer-implemented method is provided for detecting at least one feature of interest in images captured with an imaging device. The method includes receiving an ordered set of images from the captured images, the ordered set of images being temporally ordered, and analyzing one or more subsets of the ordered set of images using a local spatio-temporal processing module, the local spatio-temporal processing module being configured to determine the presence of characteristics related to the at least one feature of interest in each image of each subset of images and to annotate the subset of images based on the determined characteristics in each image of each subset of images. The method further includes processing a set of feature vectors of the ordered set of images using a global spatio-temporal processing module, the global spatio-temporal processing module being configured to refine the determined characteristics associated with each subset of images, and calculating one or more numerical values for each image using a timeseries analysis module, each numerical value being representative of the at least one feature of interest and calculated using the refined characteristics associated with each subset of images and spatio-temporal information. Still further, the method may include generating a report, a data or electronic file, integration into another reporting system or electronic medical records, and/or generating an electronic display on the at least one feature of interest using the multiple values associated with each image of each subset of the ordered set of images.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A computer-implemented system for processing images for features of interest, comprising:

. (canceled)

. The system of, wherein the determined likelihood of characteristics in each image of the subset of images includes a float value between 0 and 1.

. (canceled)

. The system of, wherein the one or more processors are further configured to:

. The system of, wherein to refine the likelihood of the characteristics the one or more processors are further configured to:

. The system of, wherein to analyze the ordered set of images using the local spatio-temporal processing module to determine presence of characteristics the one or more processors are further configured to:

. The system of, wherein each quality score is an ordinal number between 0 and R, wherein a score 0 represents minimum quality and a score R represents maximum quality.

. The system of, wherein to process the ordered set of images using the global spatio-temporal processing module the one or more processors are further configured to:

. The system of, wherein to analyze the one or more subsets of the ordered set of images using the local spatio-temporal processing module to determine the presence of characteristics the one or more processors are further configured to:

. The system of, wherein to process the one or more subsets of the ordered set of images using the global spatio-temporal processing module the one or more processors are further configured to:

. The system of, wherein the numerical value associated with each image is interpretable to determine a probability to identify the at least one feature of interest within the image.

. The system of, wherein the one or more processors are further configured to:

. The system of, wherein a size of the subset of images is configurable by a user of the system.

. The system of, wherein a size of the subset of images is dynamically determined based on a requested feature of interest.

. The system of, wherein a size of the subset of images is dynamically determined based on the determined characteristics.

. The system of, wherein the one or more subsets of images include shared images.

. The system of, wherein the ordered set of images are received directly from the imaging device during a medical procedure.

. The system of, wherein the presence of at least one feature of interest is determined from a portion of the captured images.

. The system of, wherein the generated report on the at least one feature of interest is generated from the captured images during or right after a medical procedure.

. The system of, wherein the generated report of at least one feature of interest is provided in a predefined format to integrate it with another reporting system.

. The system of, wherein the generated report of at least one feature of interest includes at least one of a recommended action based on a medical guideline, a recommended action of a set of recommended actions based on medical guidelines, or another action performed during a medical procedure.

. A non-transitory computer readable medium including instructions that when executed by at least one processor, cause the at least one processor to perform operations to detect at least one feature of interest in images captured with an imaging device, the operations comprising:

. A computer-implemented method for detecting at least one feature of interest in images captured with an imaging device, the method comprising the following operations performed by at least one processor:

-. (canceled)

Detailed Description

Complete technical specification and implementation details from the patent document.

The present disclosure relates generally to the field of video processing and image analysis. More specifically, and without limitation, this disclosure relates to systems, methods, and computer-readable media for processing captured video content from an imaging device and performing intelligent image analysis, such as determining the presence of one or more features of interest or actions taken during a medical procedure. The systems and methods disclosed herein may be used in various applications, including for medical image analysis and diagnosis.

In video processing and image analysis systems, it is often desirable to detect objects or features of interest. A feature of interest may be a person, place, or thing. In some applications, such as systems and methods for medical image analysis, the location and classification of a detected feature of interest (e.g., an abnormality such as a formation on or of human tissue) is important for diagnosis of a patient. However, extant computer-implemented systems and methods suffer from a number of drawbacks, including the inability to accurately detect features of interest and/or recognize characteristics related to features of interest. In addition, extant systems and methods are inefficient and do not provide ways to analyze images intelligently, including with regard to the image sequence or presence of events.

Modern medical procedures require precise and accurate examination of a patient's body and organs. Endoscopy is a medical procedure aimed at providing a physician with video images of the internal parts of a patient's body and organs for diagnosis. In the gastrointestinal tract of the human body, the procedure can be performed by introducing a probe with a video camera through the mouth or anus of the patient. During an endoscopic procedure, a physician manually navigates the probe through the gastrointestinal tract while watching the video in real time on a display device. The video may also be captured, stored, and examined after the endoscopic procedure. As an alternative, capsule endoscopy is a procedure where a capsule containing a small camera is swallowed to examine the gastrointestinal tract of a patient. The sequence of images taken by the capsule during its transit is transmitted wirelessly to a receiving device and stored for examination by the physician after completion of the procedure. The frame rate of a capsule device can vary (e.g., 2 to 6 frames per second) and a large volume of images may be taken during an examination procedure.

From a computer vision perspective, the captured content from either a real-time video endoscopy or capsule procedure is a temporally ordered succession of images containing information about a patient, e.g., the internal mucosa of the gastrointestinal tract. Accurate and precise analysis of the captured image data is essential to properly examine the patient and identify lesions, polyps, or other features of interest. Also, there is usually a large number of images collected for each patient. One of the most important medical tasks that needs to be performed by the physician is the examination of this large set of images to make a proper diagnosis including with respect to the presence or absence of features of interest, such as pathological regions in the imaged mucosa. However, going through these images manually is time consuming and inefficient. As a result, the review process can lead to a physician making errors and/or making a misdiagnosis.

In order to improve diagnosis, decrease the time needed for medical image examination, and reduce the possibility of errors, the inventors have determined that it is desirable to have a computer-implemented system and method that is able to intelligently process images and identify the presence of a pathology or other features of interest within all images from a video endoscopy or capsule procedure, or other medical procedure. By way of example, a feature of interest may also include an action being taken on or in the images, an anatomical location or other location of interest in the images, a clinical index level of the images, and so on. Trained neural networks, spatio-temporal image analysis, and other features and techniques are disclosed herein for this purpose. As will be appreciated from this disclosure, the present invention and embodiments may be applied to a wide variety of image capture and analysis applications and are not limited to the examples presented herein.

Embodiments of the present disclosure include systems, methods, and computer-readable media for processing images captured from an imaging device and performing an intelligent image analysis, such as determining the presence of one or more features of interest. Systems and methods consistent with the present disclosure can provide benefits over extant systems and techniques, including by addressing one or more of the above-referenced drawbacks and/or other shortcomings of extant systems and techniques. Consistent with some disclosed embodiments, systems, methods, and computer-readable media are provided for processing images from a video endoscopy or capsule procedure or other medical procedure, where the images are temporally ordered. Example embodiments include systems and methods that intelligently process captured images using spatio-temporal information to accurately assess the likelihood of the presence of an abnormality, a pathology, or other features of interest within the images. As a further example, a feature of interest can be a parameter or statistic related to an endoscopy or capsule procedure or other medical procedure. By way of example, a feature of interest of an endoscopy procedure may be a clean withdrawal time or time for traversal of a probe or a capsule through an organ. A feature of interest in an image may also be determined based on the presence or absence of characteristics related to that feature of interest. These and other embodiments, features, and implementations are described more fully herein. A feature of interest may be any feature in or related to one or more images, in particular in or related to a scene or field of view represented in one or more images, that is identifiable, or detectable, by analyzing the or each image. A feature of interest may for example be an object, a location, an action, or a condition (e.g., a clinical index level).

In some embodiments, images captured by an imaging device, such as an endoscopy video camera or capsule camera, include images of a gastrointestinal tract or organ. The images may come from a medical imaging device used during, for example, a gastroscopy, a colonoscopy, or an enteroscopy. A feature of interest in the images may be an abnormality or other pathology, for example. The abnormality or pathology may comprise a formation on or of human tissue, a change in human tissue from one type of cell to another type of cell, or an absence of human tissue from a location where the human tissue is expected. The formation may comprise a lesion, a polypoid lesion, or a non-polypoid lesion. Other examples of features of interest include an anatomical or other location, an action, a clinical index (e.g., cleanliness), and so on. Consequently, as will be appreciated from this disclosure, the example embodiments may be utilized in a medical context in a manner that is not specific to any single disease but may rather be generally applied.

According to one general aspect of the present disclosure, a computer-implemented system is provided for processing images captured by an imaging device. The computer-implemented system may include at least one processor configured to detect at least one feature of interest in images captured by an imaging device. The at least one processor may be configured to: receive an ordered set of images from the captured images, the ordered set of images being temporally ordered; analyze one or more subsets of the ordered set of images individually using a local spatio-temporal processing module, the local spatio-temporal processing module being configured to determine the presence of characteristics related to at least one feature of interest in each image of each subset of images and to annotate the subset of images with a feature vector based on the determined characteristics in each image of each subset of images; process a set of feature vectors of the ordered set of images using a global spatio-temporal processing module, the global spatio-temporal processing module being configured to refine the determined characteristics associated with each subset of images, wherein each feature vector of the set of feature vectors includes information about each determined characteristic of the at least one feature of interest; and calculate a numerical value for each image using a timeseries analysis module, the numerical value being representative of the presence of at least one feature of interest and calculated using the refined characteristics associated with each subset of images and spatio-temporal information. Further, the at least one processor may be configured to generate a report on the at least one feature of interest using the numerical value associated with each image of each subset of the ordered set of images. The report may be generated after the completion of the endoscopy or other medical procedure. The report may include information related to all features of interest identified in the processed images.
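The three stages named above (local analysis of subsets, global refinement over the whole sequence, and a per-image numerical value) can be sketched as a simple data flow. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names `local_stage`, `global_stage`, and `timeseries_stage` are hypothetical, an image is represented as a flat list of intensities, mean intensity stands in for a "characteristic", and a moving average stands in for the refinement that the disclosure attributes to trained networks.

```python
from typing import List, Sequence

# Hypothetical stand-in: an image is a flat, non-empty list of pixel intensities in [0, 1].
Image = Sequence[float]


def local_stage(images: List[Image], subset_size: int) -> List[List[float]]:
    """Analyze consecutive subsets and annotate each image with a feature vector."""
    vectors: List[List[float]] = []
    for start in range(0, len(images), subset_size):
        subset = images[start:start + subset_size]
        for img in subset:
            # Placeholder characteristic (assumption): mean intensity of the image.
            vectors.append([sum(img) / len(img)])
    return vectors


def global_stage(vectors: List[List[float]], radius: int = 1) -> List[float]:
    """Refine each characteristic using neighbours on both sides of the image."""
    values = [v[0] for v in vectors]
    refined: List[float] = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        refined.append(sum(window) / len(window))
    return refined


def timeseries_stage(refined: List[float]) -> List[float]:
    """Produce one numerical value per image, clipped to the range [0, 1]."""
    return [min(1.0, max(0.0, r)) for r in refined]
```

A caller would chain the stages in order, for example `timeseries_stage(global_stage(local_stage(images, subset_size=2)))`, and pass the resulting per-image values to whatever report generation is in use.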

The at least one processor of the computer-implemented system may be further configured to determine a likelihood of characteristics related to at least one feature of interest in each image of the subset of images. Additionally, the at least one processor may be configured to determine the likelihood of characteristics in each image of the subset of images by encoding each image of the subset of the images and aggregating the spatio-temporal information of the determined characteristics using a recurrent neural network or a temporal convolution network.

To refine the determined characteristics, a non-causal temporal convolution network may be utilized. For example, the at least one processor of the system may be configured to refine the likelihood of the characteristics in each image of the subset of images by applying a non-causal temporal convolution network. The at least one processor may be further configured to refine the likelihood of the characteristics by applying one or more signal processing techniques including low pass filtering and/or Gaussian smoothing, for example.
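Of the refinement options mentioned above, Gaussian smoothing is the simplest to illustrate. The sketch below smooths a sequence of per-image likelihoods with a symmetric kernel, so each output depends on both earlier and later images (non-causal in the same sense as the text, although it is plain filtering and not a temporal convolution network). The function names, the default `sigma` and `radius`, and the edge handling by renormalizing the kernel are assumptions made for the example.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Return a normalized, symmetric Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_likelihoods(likelihoods: List[float],
                       sigma: float = 1.0,
                       radius: int = 2) -> List[float]:
    """Smooth per-image likelihoods using past and future neighbours."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(likelihoods)
    smoothed: List[float] = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for offset, w in zip(range(-radius, radius + 1), kernel):
            j = i + offset
            if 0 <= j < n:  # skip neighbours that fall outside the sequence
                acc += w * likelihoods[j]
                norm += w
        # Renormalize so that values near the edges are not biased toward zero.
        smoothed.append(acc / norm)
    return smoothed
```

Because the kernel weights are non-negative and renormalized, inputs in the range 0 to 1 stay in that range after smoothing, which keeps the output usable as a likelihood.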

According to a still further aspect, the at least one processor of the system may be configured to analyze the ordered set of images using the local spatio-temporal processing module to determine presence of characteristics by determining a vector of quality scores, wherein each quality score in the vector of quality scores corresponds to each image of the subset of the images. Additionally, the at least one processor may be configured to process the ordered set of images using the global spatio-temporal processing module by refining quality scores of each image of the subset of images of the one or more subsets of the ordered set of images using signal processing techniques. The at least one processor may be further configured to analyze the one or more subsets of the ordered set of images using the local spatio-temporal processing module to determine the presence of characteristics by generating, using a deep convolutional neural network, a pixel-wise binary mask for each image of the subset of images. The at least one processor may be further configured to process the one or more subsets of the ordered set of images using the global spatio-temporal processing module by refining the binary mask for image segmentation using morphological operations exploiting prior information about the shape and distribution of the determined characteristics.
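As one possible reading of the mask refinement described above, a morphological opening (erosion followed by dilation) removes isolated specks from a pixel-wise binary mask while preserving larger regions. The sketch below is a plain-Python illustration, not the disclosed method: the 3x3 square structuring element, the treatment of out-of-bounds pixels as background, and the prior that a genuine characteristic is larger than a single pixel are all assumptions. It expects a non-empty rectangular mask.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 values


def _filter3x3(mask: Mask, require_all: bool) -> Mask:
    """Apply a 3x3 neighbourhood test to every pixel of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
            ]
            # Erosion keeps a pixel only if all neighbours are set;
            # dilation sets a pixel if any neighbour is set.
            out[r][c] = int(all(neighbours)) if require_all else int(any(neighbours))
    return out


def erode(mask: Mask) -> Mask:
    return _filter3x3(mask, require_all=True)


def dilate(mask: Mask) -> Mask:
    return _filter3x3(mask, require_all=False)


def opening(mask: Mask) -> Mask:
    """Erode then dilate: drops regions smaller than the structuring element."""
    return dilate(erode(mask))
```

A production system would more likely use an image-processing library for these operations; the loops here only make the neighbourhood logic explicit.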

As disclosed herein, implementations may include one or more of the following features. The determined likelihood of characteristics in each image of the subset of images may include a float value between 0 and 1. The quality score may be an ordinal number between 0 and R, wherein a score 0 represents minimum quality and a score R represents the maximum quality. The numerical value associated with each image may be interpretable to determine the probability of identifying the at least one feature of interest within the image. The output may be a first numerical value for an image where the at least one feature of interest is not detected. The output may be a second numerical value for an image where the at least one feature of interest is detected. The size or volume of the subset of images may be configurable by a user of the system. The size or volume of the subset of images may be dynamically determined based on a requested feature of interest. The size or volume of the subset of images may be dynamically determined based on the determined characteristics. The one or more subsets of images may include shared images.
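The last features above (a configurable subset size and subsets that share images) correspond to a sliding window over the ordered sequence. The following sketch shows one way to produce such subsets; the function name `make_subsets` and the `stride` parameter are hypothetical, and choosing a stride smaller than the size is simply one way to obtain shared images.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_subsets(frames: Sequence[T], size: int, stride: int) -> List[List[T]]:
    """Split an ordered sequence into subsets of at most `size` frames.

    When `stride` is smaller than `size`, consecutive subsets share frames.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    subsets: List[List[T]] = []
    for start in range(0, len(frames), stride):
        subsets.append(list(frames[start:start + size]))
        if start + size >= len(frames):
            break  # the final frame has been covered
    return subsets
```

With seven frames, a size of 4, and a stride of 2, frames 2 and 3 appear in both the first and second subsets, and the last subset is shorter because the sequence ends.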

Another general aspect of the present disclosure relates to a computer-implemented system for spatio-temporal analysis of images captured with an imaging device. The computer-implemented system may comprise at least one processor configured to receive video captured from an imaging device including a plurality of image frames. The at least one processor may be further configured to: access a temporally ordered set of images from the captured images; detect, using an event detector module, an occurrence of an event in the temporally ordered set of images, wherein a start time and an end time of the event are identified by a start image frame and an end image frame in the temporally ordered set of images; select, using a frame selector module, an image from a group of images in the temporally ordered set of images, bounded by the start image frame and the end image frame, based on an associated score and a quality score of the image, wherein the associated score of the selected image indicates a presence of at least one feature of interest; merge a subset of images from the selected images based on a matching presence of the at least one feature of interest using an objects descriptor module, wherein the subset of images is identified based on spatial and temporal coherence using spatio-temporal information; and split the temporally ordered set of images into temporal intervals which satisfy the temporal coherence of a selected task.
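The event detection and frame selection steps above can be illustrated with per-frame scores. In this sketch an event is taken to be a maximal run of frames whose associated score meets a threshold, and the selected frame is the one that maximizes the product of associated score and quality score. Both rules, the threshold value, and the function names are assumptions made for illustration; the disclosure does not specify how the two scores are combined.

```python
from typing import List, Optional, Tuple


def detect_events(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs for runs with score >= threshold."""
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # event begins
        elif s < threshold and start is not None:
            events.append((start, i - 1))  # event ended on the previous frame
            start = None
    if start is not None:
        events.append((start, len(scores) - 1))  # event runs to the last frame
    return events


def select_frame(event: Tuple[int, int],
                 scores: List[float],
                 quality: List[float]) -> int:
    """Pick the frame inside the event with the best combined score (assumed: product)."""
    start, end = event
    return max(range(start, end + 1), key=lambda i: scores[i] * quality[i])
```

Under the product rule, a frame with a slightly lower associated score can still be selected when its quality score is markedly higher, which is the trade-off the frame selector is described as making.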

According to the disclosed system, the at least one processor may be further configured to determine spatio-temporal information of characteristics related to the at least one feature of interest for subsets of images of the video content using a local spatio-temporal processing module and determine the spatio-temporal information of all images of the video content using a global spatio-temporal processing module. In addition, the at least one processor may be configured to split the temporally ordered set of images into temporal intervals by identifying a subset of the temporally ordered set of images with the presence of the at least one feature of interest. The at least one processor may also be configured to identify a subset of the temporally ordered set of images with the presence of the at least one feature of interest by adding bookmarks to images in the temporally ordered set of images, wherein the bookmarked images are part of the subset of the temporally ordered set of images. Additionally, or alternatively, the at least one processor may be configured to identify a subset of the temporally ordered set of images with the presence of the at least one feature of interest by extracting a set of images from the subset of the temporally ordered set of images.

Implementations may include one or more of the following features. The extracted set of images may include characteristics related to the at least one feature of interest. The color may vary with a level of relevance of an image of the subset of temporally ordered set of images for the at least one feature of interest. The color may vary with a level of relevance of an image of the subset of temporally ordered set of images for characteristics related to the at least one feature of interest.

Another general aspect includes a computer-implemented system for performing a plurality of tasks on a set of images. The computer-implemented system may comprise at least one processor configured to receive video captured from an imaging device including a set of image frames. The at least one processor may be further configured to: receive a plurality of tasks, wherein at least one task of the plurality of tasks is associated with a request to identify at least one feature of interest in the set of images; analyze, using a local spatio-temporal processing module, a subset of images of the set of images to identify the presence of characteristics associated with the at least one feature of interest; and iterate execution of a timeseries analysis module for each task of the plurality of tasks to associate a numerical score for each task with each image of the set of images.
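The iteration over tasks described above can be sketched as a loop that runs one analysis function per task and attaches the resulting score to every image. The names `run_tasks` and `TaskFn`, and the representation of a task as a function from a list of characteristics to a list of scores, are hypothetical choices for this example and not part of the disclosure.

```python
from typing import Callable, Dict, List

# Hypothetical task signature: per-image characteristics in, per-image scores out.
TaskFn = Callable[[List[float]], List[float]]


def run_tasks(characteristics: List[float],
              tasks: Dict[str, TaskFn]) -> List[Dict[str, float]]:
    """Run each task's timeseries analysis and attach one score per task to each image."""
    per_image: List[Dict[str, float]] = [dict() for _ in characteristics]
    for name, analyze in tasks.items():
        values = analyze(characteristics)
        if len(values) != len(characteristics):
            raise ValueError(f"task {name!r} must return one score per image")
        for record, value in zip(per_image, values):
            record[name] = value
    return per_image
```

Each entry of the result is a mapping from task name to score for one image, so a downstream report can look up any task's value for any frame without rerunning the analysis.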

Consistent with the present disclosure, a system of one or more computers can be configured to perform operations or actions by virtue of having software, firmware, hardware, or a combination of them installed for the system that in operation causes or cause the system to perform those operations or actions. One or more computer programs can be configured to perform operations or actions by virtue of including instructions that, when executed by data processing apparatus (such as one or more processors), cause the apparatus to perform such operations or actions.

Systems and methods consistent with the present disclosure may be implemented using any suitable combination of software, firmware, and hardware. Implementations of the present disclosure may include programs or instructions that are machine constructed and/or programmed specifically for performing functions associated with the disclosed operations or actions. Still further, non-transitory computer-readable storage media may be used that store program instructions, which are executable by at least one processor to perform the steps and/or methods described herein.

It will be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosed embodiments.

Example embodiments are described below with reference to the accompanying drawings. The figures are not necessarily drawn to scale. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It should also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.

In the following description, various working examples are provided for illustrative purposes. However, it will be appreciated that the present disclosure may be practiced without one or more of these details.

Throughout this disclosure there are references to “disclosed embodiments,” which refer to examples of inventive ideas, concepts, and/or manifestations described herein. Many related and unrelated embodiments are described throughout this disclosure. The fact that some “disclosed embodiments” are described as exhibiting a feature or characteristic does not mean that other disclosed embodiments necessarily share that feature or characteristic.

Embodiments described herein include a non-transitory computer readable medium containing instructions that, when executed by at least one processor, cause the at least one processor to perform a method or set of operations. Non-transitory computer readable media may be any medium capable of storing data in any memory in a way that may be read by any computing device with a processor to carry out methods or any other instructions stored in the memory. The non-transitory computer readable medium may be implemented as software, firmware, hardware, or any combination thereof. Software may preferably be implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine may be implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described in this disclosure may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium may be any computer readable medium except for a transitory propagating signal.

The memory may include any mechanism for storing electronic data or instructions, including Random Access Memory (RAM), Read-Only Memory (ROM), a hard disk, an optical disk, a magnetic medium, a flash memory, or other permanent, fixed, volatile, or non-volatile memory. The memory may include one or more separate storage devices, collocated or dispersed, capable of storing data structures, instructions, or any other data. The memory may further include a memory portion containing instructions for the processor to execute. The memory may also be used as a working memory device for the processors or as a temporary storage.

Some embodiments may involve at least one processor. A processor may be any physical device or group of devices having electric circuitry that performs a logic operation on input or inputs. For example, the at least one processor may include one or more integrated circuits (IC), including application-specific integrated circuit (ASIC), microchips, microcontrollers, microprocessors, all or part of a central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), field-programmable gate array (FPGA), server, virtual server, or other circuits suitable for executing instructions or performing logic operations. The instructions executed by at least one processor may, for example, be pre-loaded into a memory integrated with or embedded into the controller or may be stored in a separate memory.

In some embodiments, the at least one processor may include more than one processor. Each processor may have a similar construction, or the processors may be of differing constructions that are electrically connected or disconnected from each other. For example, the processors may be separate circuits or integrated in a single circuit. When more than one processor is used, the processors may be configured to operate independently or collaboratively. The processors may be coupled electrically, magnetically, optically, acoustically, mechanically, or by other means that permit them to interact.

Embodiments consistent with the present disclosure may involve a network. A network may constitute any type of physical or wireless computer networking arrangement used to exchange data. For example, a network may be the Internet, a private data network, a virtual private network using a public network, a Wi-Fi network, a LAN or WAN network, and/or other suitable connections that may enable information exchange among various components of the system. In some embodiments, a network may include one or more physical links used to exchange data, such as Ethernet, coaxial cables, twisted pair cables, fiber optics, or any other suitable physical medium for exchanging data. A network may also include one or more networks, such as a private network, a public switched telephone network (“PSTN”), the Internet, and/or a wireless cellular network. A network may be a secured network or unsecured network. In other embodiments, one or more components of the system may communicate directly through a dedicated communication network. Direct communications may use any suitable technologies, including, for example, BLUETOOTH™, BLUETOOTH LE™ (BLE), Wi-Fi, near field communications (NFC), or other suitable communication methods that provide a medium for exchanging data and/or information between separate entities.

In some embodiments, machine learning networks or algorithms may be trained using training examples, for example in the cases described below. Some non-limiting examples of such machine learning algorithms may include classification algorithms, video classification algorithms, data regressions algorithms, image segmentation algorithms, temporal video segmentation algorithms, visual detection algorithms (such as object detectors, face detectors, person detectors, motion detectors, edge detectors, etc.), visual recognition algorithms (such as face recognition, person recognition, object recognition, etc.), speech recognition algorithms, action recognition algorithms, mathematical embedding algorithms, natural language processing algorithms, support vector machines, random forests, nearest neighbors algorithms, deep learning algorithms, artificial neural network algorithms, convolutional neural network algorithms, recursive neural network algorithms, linear machine learning models, non-linear machine learning models, ensemble algorithms, and so forth. For example, a trained machine learning network or algorithm may comprise an inference model, such as a predictive model, a classification model, a regression model, a clustering model, a segmentation model, an artificial neural network (such as a deep neural network, a convolutional neural network, a recursive neural network, etc.), a random forest, a support vector machine, and so forth. In some examples, the training examples may include example inputs together with the desired outputs corresponding to the example inputs. Further, in some examples, training machine learning algorithms using the training examples may generate a trained machine learning algorithm, and the trained machine learning algorithm may be used to estimate outputs for inputs not included in the training examples. The training may be supervised or non-supervised, or a combination thereof. 
In some examples, engineers, scientists, processes, and machines that train machine learning algorithms may further use validation examples and/or test examples. For example, validation examples and/or test examples may include example inputs together with the desired outputs corresponding to the example inputs. A trained machine learning algorithm and/or an intermediately trained machine learning algorithm may be used to estimate outputs for the example inputs of the validation examples and/or test examples, the estimated outputs may be compared to the corresponding desired outputs, and the trained machine learning algorithm and/or the intermediately trained machine learning algorithm may be evaluated based on a result of the comparison. In some examples, a machine learning algorithm may have parameters and hyper-parameters, where the hyper-parameters are set manually by a person or automatically by a process external to the machine learning algorithm (such as a hyper-parameter search algorithm), and the parameters of the machine learning algorithm are set by the machine learning algorithm according to the training examples. In some implementations, the hyper-parameters are set according to the training examples and the validation examples, and the parameters are set according to the training examples and the selected hyper-parameters. The machine learning networks or algorithms may be further retrained based on any output.
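
The split between parameters and hyper-parameters described above can be illustrated with a minimal sketch. It is not the disclosed training procedure: the one-weight regularized regression model, the candidate values, and the names `fit_weight`, `mean_squared_error`, and `select_hyperparameter` are all assumptions chosen for illustration. The weight is set from the training examples, the hyper-parameter is chosen by an external search over the validation examples, and the test examples are used only for the final evaluation.

```python
from typing import Optional, Sequence, Tuple

Example = Tuple[float, float]  # (example input, desired output)


def fit_weight(train: Sequence[Example], lam: float) -> float:
    """Set the parameter w from the training examples for a fixed hyper-parameter lam."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mean_squared_error(w: float, examples: Sequence[Example]) -> float:
    """Compare estimated outputs (w * x) with the desired outputs."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)


def select_hyperparameter(
    train: Sequence[Example],
    validation: Sequence[Example],
    candidates: Sequence[float],
) -> Tuple[float, float, float]:
    """External search: keep the candidate with the lowest validation error."""
    best: Optional[Tuple[float, float, float]] = None
    for lam in candidates:
        w = fit_weight(train, lam)
        err = mean_squared_error(w, validation)
        if best is None or err < best[2]:
            best = (lam, w, err)
    if best is None:
        raise ValueError("at least one candidate hyper-parameter is required")
    return best


# Illustrative data only (y = 2x); not taken from the disclosure.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
validation = [(4.0, 8.0), (5.0, 10.0)]
test = [(6.0, 12.0)]

lam, w, validation_error = select_hyperparameter(train, validation, [0.0, 1.0, 10.0])
test_error = mean_squared_error(w, test)
```

Keeping the test examples out of both the fitting step and the search gives an evaluation that is not biased by the hyper-parameter choice.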

Certain embodiments disclosed herein may include computer-implemented systems for performing operations or methods comprising a series of steps. The computer-implemented systems and methods may be implemented by one or more computing devices, which may include one or more processors as described herein, configured to process real-time video. The computing device may be one or more computers or any other devices capable of processing data. Such computing devices may include a display such as an LCD display, augmented reality (AR), or virtual reality (VR) display. However, the computing device may also be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a user device having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system and/or the computing device can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet. The computing device can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship between client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

The referenced figure is a block diagram of an example intelligent detector system, consistent with embodiments of the present disclosure. As further disclosed herein, intelligent detector system may be a computer-implemented system and comprise one or more convolutional neural networks (CNN) to process images from a medical procedure to identify requested features of interest in the images. Feature(s) of interest can be a pathology or a list of pathologies a physician is looking for in the images (e.g., to diagnose a patient). By way of further example, a feature of interest may also include an action to be taken on or in the images, an anatomical location or other location of interest in the images, a clinical index level of the images, and so on. These and other examples are within the scope of the present disclosure. By way of example, an action may include actions taken by a physician during the medical procedure or as part of a subsequent procedure, including actions or procedures identified by the system as a result of a spatio-temporal review of images from the medical procedure. For example, an action may include a recommended action or procedure in accordance with a medical guideline, such as performing a biopsy, removing a lesion, or exploring/analyzing a surface/mucosa of an organ. The action or procedure may be identified based on the images captured and processed by intelligent detector system.

Intelligent detector system may receive as input a collection of temporally ordered images of a medical procedure, such as an endoscopy or colonoscopy procedure. Intelligent detector system may output a report or information including one or more numerical value(s) (e.g., score(s)) for each image. The numerical value(s) may relate to a medical category such as a particular pathology and provide information regarding the probability of the presence of the medical category within an image frame. The images processed by intelligent detector system may be images captured from a medical procedure that are stored in a database or memory device for subsequent retrieval and processing by intelligent detector system. In some embodiments, the output provided by intelligent detector system resulting from processing the images may include a report with numerical score(s) assigned to the images and recommended next steps in accordance with medical guideline(s), for example. The report may be generated after the completion of the endoscopy or other medical procedure. The report may include information related to all features of interest identified in the processed images. Still further, in some embodiments, the output provided by intelligent detector system may include recommended action(s) to be performed by the physician (e.g., performing a biopsy, removing a lesion, exploring/analyzing the surface/mucosa of an organ, etc.) in view of an identified feature(s) of interest in the images from the medical procedure. During a medical procedure, intelligent detector system may directly receive the video or image frames from a medical image device, process the video or image frames, and provide, during the procedure or right after it (i.e., within a short time interval, from no delay to a few minutes), feedback to the operator regarding action(s) performed by the operator, as well as a final report containing multiple measured variables, clinical indices, and details about what was observed and/or in which anatomical location and/or how the operator behaved/acted during the medical procedure. Performed actions may include a recommended action or procedure in accordance with a medical guideline, such as performing a biopsy, removing a lesion, or exploring/analyzing a surface/mucosa of an organ. In some embodiments, a recommended action may be part of a set of recommended actions based on medical guidelines. A detailed description of an example computer system implementing intelligent detector system for real-time processing is presented in the description below.

As disclosed herein, intelligent detector system may generate a report after completion of a medical procedure that includes information based on the processing of the captured video by local spatio-temporal processing module, global spatio-temporal processing module, and timeseries analysis module. The report may include information related to the features of interest identified during the medical procedure, as well as other information such as numerical value(s) or score(s) for each image. As explained, the numerical value(s) may relate to a medical category such as a particular pathology and provide information regarding the probability of the presence of the medical category within an image frame. Further details regarding intelligent detector system and the operations of local spatio-temporal processing module, global spatio-temporal processing module, and timeseries analysis module are provided below with reference to the attached drawings.

In some embodiments, the report generated by the system may include additional recommended action(s) based on the processing of stored images from a medical procedure or real-time processing of images from the medical procedure. Additional recommended action(s) could include actions or procedures that could have been performed during a medical procedure and actions or procedures to be performed after the medical procedure. Additional recommended action(s) may be part of a set of recommended action(s) based on medical guidelines. Further, as described above, the system may process video in real-time to provide concurrent feedback to an operator about what is happening or identified in the video and during a medical procedure.

The output generated by intelligent detector system may include a dashboard display or similar report. The dashboard may provide a summary report of the medical procedure, for example, an endoscopy or colonoscopy. The dashboard may provide quality scores and/or other information for the procedure. The scores and/or other information may summarize the examination behavior of the healthcare professional and provide information for identified features of interest, such as the number of identified polyps. In some embodiments, the information generated by the system is provided and displayed as an overlay on the video from the medical procedure, so that an operator can view the information as part of an augmented video feed during or right after the end of the medical procedure. This information may be provided with some or no delay.

Intelligent detector system may also generate reports in the form of an electronic file, a set of data, or data transmission. By way of example, the output generated by the system may follow a standardized format and/or be integrated into records such as electronic health records (EHR). The output of the system may also be compliant with regulations such as HIPAA for interoperability and privacy. In some embodiments, the output may be integrated into other reports. For example, the output of intelligent detector system may be integrated into an electronic medical or health record for a patient. Intelligent detector system may include an API to facilitate such integration and/or provide output in the form of a standardized data set or template. Standardized templates may include predefined forms or tables that can be filled with data values generated by intelligent detector system by processing input video or image frames from a medical procedure. In some embodiments, reports may be generated by the system in a machine-readable format, such as an XML file, to support their transmission or storage, as well as integration with other systems and applications. In some embodiments, reports may be provided in other formats such as a Word, Excel, HTML, or PDF file format. In some embodiments, intelligent detector system may upload data or a report to a server or database over a network (see, e.g., the network discussed above). Intelligent detector system may also transfer output to a server or database by making an API call and transmitting output data or a formatted report, for example, as a JSON document.
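
As a rough illustration of a machine-readable report serialized as a JSON document, the sketch below fills a simple template with per-image scores. The field names (`procedure`, `feature_of_interest`, `frames`, `index`, `score`) and the function `build_report` are hypothetical; the disclosure does not define a schema, and no network call is made here.

```python
import json
from typing import Any, Dict, List


def build_report(procedure: str, feature: str, frame_scores: List[float]) -> Dict[str, Any]:
    """Fill a hypothetical report template with per-image numerical values."""
    return {
        "procedure": procedure,
        "feature_of_interest": feature,
        "frames": [
            {"index": index, "score": score}
            for index, score in enumerate(frame_scores)
        ],
    }


# Illustrative values only.
report = build_report("colonoscopy", "polyp", [0.1, 0.8, 0.9])

# Serialize to a JSON document suitable for storage or transmission.
payload = json.dumps(report, indent=2)
```

A receiving system could parse `payload` with `json.loads` and map the fields into its own record format.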

The referenced figure illustrates an example dashboard with an output summary generated using an intelligent detector system, consistent with embodiments of the present disclosure. Using the modules of intelligent detector system, an output summary may be generated for the procedure. The output summary may provide quality scores and/or other information such as the procedure time, the withdrawal time, and the clean withdrawal time. Further examples of information that may be part of the summary include the time spent performing specific actions (such as recommended action(s) discussed above) and the time spent in distinct anatomical locations. Information related to features of interest, such as polyps, may also be provided. For example, timeseries analysis module may generate a summary of the number of polyps identified based on characteristics observed in the image frames. Timeseries analysis module may aggregate the information generated by processing images of input video using the local and global spatio-temporal processing modules. The summary dashboard may also include visual descriptions of features of interest identified by intelligent detector system. For example, selected frames of video of a procedure may be augmented with markings such as a green bounding box around the location of each identified feature of interest, as shown in the example frames. The frames may be related to different examined portions of the colon, such as the ileocaecal valve, foramen, and triradiate fold, which themselves may be features of interest requested by a user of intelligent detector system. Although the example illustrates the information for a procedure being displayed as part of a single dashboard, multiple dashboards may be generated with output summaries for each portion of the colon or other organ examined as part of the medical procedure.
In some embodiments, combined scores or values are generated based on inputs received as multiple vectors (e.g., image score vectors) generated by the local and global spatio-temporal processing modules.

As disclosed above, a feature of interest may relate to a medical category or pathology. Intelligent detector system may be implemented to handle a request to detect one or more medical categories (i.e., one or more features of interest) in the images. In the case of multiple features of interest, one instance of the components of intelligent detector system may be implemented for each medical category or feature of interest. As will be appreciated from this disclosure, instances of intelligent detector system may be implemented with any combination of hardware, firmware, and software, according to the speed or throughput needs, volume of images to be processed, and other requirements of the system.

In some embodiments, a single instance of intelligent detector system may output multiple numerical values for each image, one for each medical category. In one example embodiment, pathologies detected by intelligent detector system may include polyps in the colon mucosa. Further, by way of example, intelligent detector system may output a numerical value (e.g., 0) for all images among the input images where a polyp is not detected by intelligent detector system and may output another numerical value (e.g., 1) for all images among the input images where the intelligent detector detects at least one polyp. In some embodiments, the numerical values can be arranged relative to a range or scale and/or indicate the probability of the presence of a polyp or other feature of interest.
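
One simple way to relate the two output styles described above (a probability-like score versus a 0/1 value per image) is thresholding. The sketch below is an assumption for illustration only: the cut-off of 0.5 and the function name `binarize` are not taken from the disclosure.

```python
from typing import List, Sequence


def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map per-image probability scores to 1 (detected) or 0 (not detected).

    The default threshold of 0.5 is an illustrative assumption.
    """
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical per-image polyp probabilities for five frames.
polyp_probabilities = [0.05, 0.40, 0.75, 0.92, 0.10]
labels = binarize(polyp_probabilities)
```

For several medical categories, the same function could be applied to one score list per category, giving one value per image per category.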

The source of the input images may vary according to the imaging device, memory device, and/or needs of the application. For example, intelligent detector system may be configured to process a video feed directly from a video endoscopy device and receive temporally ordered input images that are subsequently processed by the system, consistent with the embodiments disclosed herein. As a further example, intelligent detector system may be configured to receive the input images from a database or memory device, the stored images being temporally ordered and previously captured using an imaging device, such as a video camera of an endoscopy device or a camera of a capsule device. Images received by intelligent detector system may be processed and analyzed to identify one or more features of interest, such as one or more types of polyps or lesions.

The example system may be implemented in various environments and for various applications. For example, the captured input images may be stored in a local database or memory device, or they may be accessed and received by intelligent detector system over a network from a remote storage location, such as cloud storage. Intelligent detector system may also be configured to process a streaming video feed from a current medical procedure and to process the input images as they are collected from the feed (e.g., via pre-processing and buffering). Further, the operation of intelligent detector system may be programmed or triggered to start upon one or more conditions. For example, intelligent detector system may be configured to analyze input images directly upon receiving them (e.g., via a video feed or a set of stored input images from memory) or upon receiving commands from a user. The output of intelligent detector system may also be configured as desired. For example, as previously discussed, intelligent detector system may analyze input images for one or more features of interest and generate a report indicating the presence of the one or more features of interest in the processed images. The report may take the form of an electronic file, a graphical display, and/or electronic transmission of data. As will be appreciated, other outputs and report formats are within the scope of the present disclosure. In some embodiments, reports of different formats may be preconfigured and used as templates for generating reports by filling the template with values generated by intelligent detector system. In some embodiments, reports are formatted to be integrated into other reporting systems, such as electronic medical records (EMRs). The report format may be a machine-readable format such as XML or Excel for integrating with other reporting systems.

By way of example, intelligent detector system may process a recorded video or images and provide a fully automated report and/or other output that details the feature(s) of interest observed in the processed images. Intelligent detector system may use artificial intelligence or machine learning components to efficiently and accurately process the input images and make decisions about the presence of features of interest based on image analysis and/or spatio-temporal information. Further, for each feature of interest that is requested or under investigation, intelligent detector system can estimate its presence within the images and provide a report or other output with information indicating the likelihood of the presence of that feature and other details, such as the relative time from the beginning of the procedure or sequence of images where the feature of interest appears, estimated anatomical location, duration, most significant images, location within these images, and/or number of occurrences.
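
Several of the report details listed above (relative time of appearance, duration, most significant image, number of occurrences) can be derived from a per-frame score sequence. The following is a minimal sketch under stated assumptions, not the disclosed timeseries analysis: an occurrence is taken to be a run of consecutive frames whose score meets a threshold, and the frame rate, the 0.5 threshold, and the names `Occurrence` and `find_occurrences` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Occurrence:
    start_frame: int
    end_frame: int            # inclusive
    start_seconds: float      # relative time from the first frame
    duration_seconds: float
    peak_frame: int           # frame with the highest score in the run


def find_occurrences(
    scores: Sequence[float], fps: float, threshold: float = 0.5
) -> List[Occurrence]:
    """Group consecutive frames with score >= threshold into occurrences."""
    occurrences: List[Occurrence] = []
    start: Optional[int] = None
    # A trailing -inf sentinel closes a run that reaches the last frame.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            end = i - 1
            peak = max(range(start, end + 1), key=lambda j: scores[j])
            occurrences.append(
                Occurrence(start, end, start / fps, (end - start + 1) / fps, peak)
            )
            start = None
    return occurrences


# Illustrative scores for eight frames captured at an assumed 2 frames per second.
scores = [0.1, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.8]
found = find_occurrences(scores, fps=2.0)
number_of_occurrences = len(found)
```

When several frames in a run share the highest score, `peak_frame` is the earliest of them.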

In one embodiment, intelligent detector system may be configured to automatically determine the presence of gastrointestinal pathologies without the aid of a physician. As discussed above, the input images may be captured and received in different ways and using different types of imaging devices. For example, a video endoscopy device, capsule device, or other medical or imaging device may record and provide the input images. The input images may be part of a live video feed or may be part of a stored set of images received from a local or remote storage location (e.g., a local database or cloud storage). Intelligent detector system may be operated as part of a procedure or service at a clinic or hospital, or it may be provided as an online or cloud service for end users to enable self-diagnostics or remote testing.

By way of example, to start an examination procedure, a user may ingest a capsule device or pill cam. The capsule device may include an imaging device and during the procedure wirelessly transmit images of the user's gastrointestinal tract to a smartphone, tablet, laptop, computer, or other device (e.g., user device). The captured images may then be uploaded by a network connection to a database, cloud storage or other storage device (e.g., image source). Intelligent detector system may receive the input images from the image source and analyze the images for one or more requested feature(s) of interest (e.g., polyps or lesions). A final report may then be electronically provided as output to the user and/or their physician. The report may include a scoring or probability indicator for each observed feature of interest and/or other relevant information or medical recommendations. Additionally, intelligent detector system can detect pathophysiological characteristics that are related to, and indicators of, a feature of interest and score those characteristics that are determined to be present. Examples of such characteristics include bleeding, inflammation, ulceration, neoplastic tissues, etc. Further, in response to detected feature(s) of interest, the report may include information or recommendations based on medical guidelines, such as recommendations to consult with a physician and/or to take additional diagnostic examinations, for example. One or more actions may also be recommended to the physician (e.g., perform a biopsy, remove a lesion, explore/analyze the surface/mucosa of an organ, etc.) based on the analysis of the images by intelligent detector system either in real-time with the medical procedure or after the medical procedure is completed.

As another example, intelligent detector system could assist a physician or specialist with analyzing the video content recorded during a medical procedure or examination. The captured images may be part of the video content recorded during, for example, a gastroscopy, a colonoscopy, or an enteroscopy procedure. Based on the analysis performed by intelligent detector system, the full video recording could be displayed to the physician or specialist along with a colored timeline bar, where different colors correspond to different feature(s) of interest and/or scores for the identified feature(s) of interest.

As a still further example, a physician, specialist, or other individual could use intelligent detector system to create a synopsis of the video recording or set of images by focusing on images with the desired feature(s) of interest and discarding irrelevant image frames. Intelligent detector system may be configured to allow a physician or user to tune or select the feature(s) of interest for detection and the duration of each synopsis based on a total duration time and/or other parameters, such as preset preceding and trailing times before and after a sequence of frames with the selected feature(s) of interest. Intelligent detector system can also be configured to combine all or the most relevant frames according to the requested feature(s) of interest.
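
The synopsis idea above, keeping flagged frames plus preset preceding and trailing context and discarding the rest, can be sketched as an interval-merging step. This is an illustrative assumption about how such a selection might work, with context expressed in frames instead of seconds; the name `synopsis_ranges` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def synopsis_ranges(
    flags: Sequence[bool], preceding: int, trailing: int
) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) frame ranges to keep in the synopsis.

    Each flagged frame is padded with `preceding` frames before it and
    `trailing` frames after it; overlapping or adjacent ranges are merged.
    """
    last_index = len(flags) - 1
    ranges: List[Tuple[int, int]] = []
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        lo = max(0, i - preceding)
        hi = min(last_index, i + trailing)
        if ranges and lo <= ranges[-1][1] + 1:
            # Extend the previous range instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], hi))
        else:
            ranges.append((lo, hi))
    return ranges


# Illustrative per-frame flags for a ten-frame video.
flags = [False, False, True, True, False, False, False, False, True, False]
ranges = synopsis_ranges(flags, preceding=1, trailing=2)
kept_frames = sum(hi - lo + 1 for lo, hi in ranges)
```

A total-duration limit could be layered on top by ranking the ranges and dropping the least relevant ones until `kept_frames` fits the budget.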

As illustrated in the referenced figure, intelligent detector system may include a local spatio-temporal processing module, a global spatio-temporal processing module, a timeseries analysis module, and a task manager. These components may be implemented through any suitable combination of hardware, software, and/or firmware. Further, the number and arrangement of these components may be modified, and it will be appreciated that the example embodiment is provided for purposes of illustration and does not limit the scope of the invention and its embodiments. Further example features and details related to these components are provided below.

Referring again to the example embodiment, local spatio-temporal processing module may be configured to provide a local perspective by processing subset(s) of images of an input video or set of input images. Local spatio-temporal processing module may select subset(s) of images and process the images to generate scores based on the determined presence of characteristics related to one or more features of interest. For example, assume an endoscopy input video V includes a collection of T image frames. Characteristics may define the features of interest requested by a user of intelligent detector system. For example, characteristics may include physical and/or biological aspects, such as size, orientation, color, shape, etc. of a feature of interest. Characteristics may also include metadata such as data identifying a portion of a video or time period in a video. For example, characteristics of a colonoscopy procedure video may identify portion(s) of the colon, such as ascending, transverse, or descending. In another example, characteristics may relate to one or more portions of an endoscopy procedure video, such as the amount of motion in the images, the presence of an instrument, or the duration of a segment with reduced motion. Characteristics defining content may also indicate the behavior of a physician, clinician, or other individual performing a medical procedure. For example, such characteristics may identify portions of the video with the longest pauses with no movement or the greatest time spent exploring the surface of an organ. In some embodiments, characteristics may be a feature of interest. For instance, features of interest and characteristics of a colonoscopy procedure video may be a portion of the colon, such as ascending, transverse, or descending.

Local spatio-temporal processing module may be configured to process the whole input video in chunks by iterating over sequential batches or subsets of N image frames. Local spatio-temporal processing module may also be configured to provide output that includes vectors or quality scores representing the determined characteristics of the feature(s) of interest in each image frame. In some embodiments, local spatio-temporal processing module may output quality values and segmentation maps associated with each image frame. Further example details related to local spatio-temporal processing module are provided below with reference to the corresponding example embodiment.

The subset of images processed by local spatio-temporal processing module may include shared or overlapping images. Further, the size or arrangement of the subset of images may be defined or controlled based on one or more factors. For example, the size or volume of the subset of images may be configurable by a physician or other user of the system. As a further example, local spatio-temporal processing module may be configured so that the size or volume of the subset of images is dynamically determined based on the requested feature(s) of interest. Additionally, or alternatively, the size of the subset of images may be dynamically determined based on the determined characteristics related to the requested feature(s) of interest.
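
Iterating over a video of T frames in subsets of N frames, with optional overlap between consecutive subsets, can be sketched as a sliding window over frame indices. The function name `iter_subsets` and the `stride` parameter are assumptions for illustration; a stride smaller than the subset size produces shared frames, and a stride equal to the size produces disjoint batches.

```python
from typing import Iterator, List


def iter_subsets(num_frames: int, size: int, stride: int) -> Iterator[range]:
    """Yield index ranges covering frames 0..num_frames-1 in order.

    Each subset holds up to `size` frames; consecutive subsets start
    `stride` frames apart, so they overlap whenever stride < size.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    start = 0
    while start < num_frames:
        yield range(start, min(start + size, num_frames))
        if start + size >= num_frames:
            break  # the last frame has been covered
        start += stride


# Illustrative values: T = 10 frames, N = 4 frames per subset, stride of 2.
subsets: List[List[int]] = [list(r) for r in iter_subsets(10, size=4, stride=2)]
```

A dynamically determined subset size, as described above, would amount to choosing `size` and `stride` per request before iterating.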

Global spatio-temporal processing module may be configured to provide a global perspective by processing all subset(s) of images analyzed by local spatio-temporal processing module. For example, global spatio-temporal processing module may process the whole input video or set of input images by processing all outputs of local spatio-temporal processing module at once or together. Further, global spatio-temporal processing module may be configured to provide output that includes numerical scores for each image frame by processing the vectors of determined characteristics related to the feature(s) of interest. In some embodiments, global spatio-temporal processing module may process the images and vectors and output refined quality scores and segmentation maps of each image. Further example details related to global spatio-temporal processing module are provided below with reference to the corresponding example embodiment.

Patent Metadata

Filing Date

Unknown

Publication Date

November 13, 2025

Inventors

Unknown
