Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A video image processing method for adding privacy effects to a display video image of a scene, the method comprising: obtaining a plurality of input video images of the scene from a corresponding plurality of video cameras that are spaced apart; determining a region of interest, including an image of an object, in each of the plurality of input video images to define a foreground and a background, the foreground including the image of the object; rectifying the plurality of input video images based on the image of the object in each input video image to produce a plurality of rectified video images having aligned foregrounds and having backgrounds that are not aligned due to relative spatial relationships of the plurality of video cameras corresponding to the plurality of input video images; creating the display video image by combining the plurality of rectified video images, the foreground of the display video image being in-focus and the background of the display video image appearing to be blurred due to the backgrounds of the rectified video images not being aligned; and transmitting the display video image for display.
This invention relates to video image processing that adds a privacy effect to a displayed video image by keeping a foreground object sharp while obscuring the background. Multiple video cameras that are spaced apart each capture an input video image of the same scene. In every input image, a region of interest containing an image of an object (for example, a person) is determined; that region defines the foreground, and the rest of the image is the background. The input images are then rectified based on the image of the object so that the foregrounds line up across all images. Because the cameras view the scene from different positions, the backgrounds do not line up after this rectification. Combining the rectified images produces the display video image: the aligned foreground appears in focus, while the misaligned background appears blurred. The blur therefore comes from the differing camera viewpoints rather than from a separately recited blur filter. The display video image is then transmitted for display.
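The rectify-and-combine step can be illustrated with a toy example. The sketch below is not the patented implementation: it assumes rectification reduces to a horizontal shift, that the object's column in each image is already known, and that combining means per-pixel averaging. The names `shift_row` and `rectify_and_combine` are hypothetical.

```python
def shift_row(row, dx, fill=0.0):
    """Shift a row of pixels by dx columns (negative = left), padding with fill."""
    out = [fill] * len(row)
    for x, value in enumerate(row):
        nx = x + dx
        if 0 <= nx < len(row):
            out[nx] = value
    return out


def rectify_and_combine(images, object_columns):
    """Align every image on its object column, then average the aligned images.

    images: list of 2D lists of floats, all the same size.
    object_columns: column of the foreground object in each image
    (assumed to come from the region-of-interest step).
    """
    reference = object_columns[0]
    # Assumption: "rectifying" is modelled as a pure horizontal translation.
    rectified = [
        [shift_row(row, reference - col) for row in image]
        for image, col in zip(images, object_columns)
    ]
    height, width = len(images[0]), len(images[0][0])
    count = len(rectified)
    return [
        [sum(img[y][x] for img in rectified) / count for x in range(width)]
        for y in range(height)
    ]


# Two one-row "images": the near object sits at column 3 (camera A) and
# column 5 (camera B); a distant background feature sits at column 6 in both.
cam_a = [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]]
cam_b = [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]]
display = rectify_and_combine([cam_a, cam_b], [3, 5])
```

In this toy case the object keeps full intensity at column 3, while the background feature is split into two half-intensity copies at columns 4 and 6, which is the smearing that reads as blur.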
2. The method of claim 1, further comprising receiving an indication of an amount of blur to apply to the display video image, wherein the obtaining of the plurality of input video images includes selecting a number of input video images as the plurality of input video images in response to the indication of the amount of blur.
This claim adds control over how much blur appears in the display video image of claim 1. The method receives an indication of an amount of blur to apply, and the number of input video images used as the plurality of input video images is selected in response to that indication. The claim does not state how the number maps to the blur amount; one plausible reading is that combining more images with misaligned backgrounds yields a stronger blur, while combining fewer yields a weaker one. All other steps (determining the region of interest, rectifying, combining, and transmitting) are those of claim 1.
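As a rough illustration of choosing the number of images from a blur indication, the sketch below assumes the indication is a value between 0.0 and 1.0 and that more images means more blur. Neither assumption is stated in the claim, and `select_image_count` is a hypothetical name.

```python
def select_image_count(blur_amount, available):
    """Map a blur indication in [0.0, 1.0] to a number of input images.

    Assumption: the mapping is linear and more images gives more blur.
    At least two images are always used, since the claim requires a plurality.
    """
    if available < 2:
        raise ValueError("a plurality of input video images needs at least two")
    blur_amount = max(0.0, min(1.0, blur_amount))  # clamp out-of-range requests
    return 2 + round(blur_amount * (available - 2))


minimum = select_image_count(0.0, 6)
middle = select_image_count(0.5, 6)
maximum = select_image_count(1.0, 6)
```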
3. The method of claim 1, further comprising receiving an indication of an amount of blur to apply to the display video image, wherein the obtaining of the plurality of input video images includes selecting the plurality of input video images based on the relative spatial relationships of the cameras corresponding to the plurality of input video images in response to the indication of the amount of blur.
This claim also controls the blur of claim 1, but through which cameras are used rather than how many. The method receives an indication of an amount of blur to apply to the display video image. In response, the input video images are selected based on the relative spatial relationships of the cameras that captured them. For example, if strong blur is requested, images from cameras positioned farther apart may be selected, because their backgrounds are more misaligned after rectification; if weak blur is requested, images from cameras closer together may be selected. The claim itself does not prescribe this particular rule, only that the selection depends on the cameras' spatial relationships and on the blur indication.
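One way to picture camera selection by spacing is to pick the pair of cameras whose separation best matches a requested blur. The sketch below assumes cameras lie on a line at known positions, that the blur indication is a value between 0.0 and 1.0, and that blur grows with camera separation; `select_camera_pair` is a hypothetical name, and picking exactly two cameras is a simplification.

```python
from itertools import combinations


def select_camera_pair(positions, blur_amount):
    """Return indices of the two cameras whose spacing best fits the blur request.

    positions: camera positions along a line, in any consistent unit.
    Assumption: blur 0.0 asks for the narrowest spacing and 1.0 for the
    widest available spacing, with a linear target in between.
    """
    widest = max(positions) - min(positions)
    target = max(0.0, min(1.0, blur_amount)) * widest

    def mismatch(pair):
        i, j = pair
        return abs(abs(positions[i] - positions[j]) - target)

    return min(combinations(range(len(positions)), 2), key=mismatch)


positions = [0.0, 0.1, 0.3, 0.6]  # illustrative positions in metres
low_blur_pair = select_camera_pair(positions, 0.0)
high_blur_pair = select_camera_pair(positions, 1.0)
```

With these positions, a request for minimal blur picks the two closest cameras and a request for maximal blur picks the two outermost ones.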
4. The method of claim 1, further comprising: providing one video image of the plurality of input video images for display; wherein determining the region of interest includes receiving an indication of a portion of the one video image as an indication of the image of the object in the region of interest.
This claim describes one way of determining the region of interest in claim 1: by an indication made on a displayed image. One video image from the plurality of input video images is provided for display. The method then receives an indication of a portion of that image, and that portion is treated as indicating the image of the object in the region of interest. In practice this could be a viewer marking the object on the displayed image, although the claim does not limit how the indication is produced. The indicated object then defines the foreground used in the rectifying and combining steps of claim 1.
5. The method of claim 1, wherein: the obtaining of the plurality of input video images includes receiving the plurality of input video images from a client device via a network; and the transmitting of the display video image for display includes transmitting the display video image to the client device via the network.
This claim places the method of claim 1 in a networked arrangement. The plurality of input video images is received from a client device via a network, and the resulting display video image is transmitted back to that client device via the network. The processing between those two steps (determining the region of interest, rectifying the images, and combining them into a display video image with a blurred background) is as recited in claim 1; this claim adds only the network exchange. The effect is that the privacy processing can run remotely from the device that supplies the camera images and receives the result.
6. The method of claim 1, further comprising: receiving data representing a plurality of audio signals corresponding to a voice of a presenter; wherein: the determining of the region of interest includes determining the region of interest using sound source localization based on the plurality of audio signals; and the image of the object includes an image of the presenter.
This claim uses audio to locate the foreground subject of claim 1. The method receives data representing a plurality of audio signals corresponding to the voice of a presenter, and determines the region of interest using sound source localization based on those signals. The image of the object in the region of interest includes an image of the presenter, so the presenter becomes the in-focus foreground of the display video image while the background is blurred as in claim 1. The claim does not specify the localization technique or how the audio signals are captured; multiple microphones at known positions are a common arrangement for sound source localization.
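A common textbook approach to sound source localization is time difference of arrival between two microphones. The sketch below illustrates that general technique only; the claim does not name it, and the two-microphone far-field geometry, the function names `estimate_lag` and `bearing_from_lag`, and all numeric values are assumptions.

```python
import math


def estimate_lag(first, second, max_lag):
    """Return the lag in samples by which `second` trails `first`.

    Uses brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(first):
            j = i + lag
            if 0 <= j < len(second):
                score += value * second[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_lag(lag, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Convert a sample lag into a bearing in degrees (0 = straight ahead).

    Assumption: a distant source and two microphones on a common baseline.
    """
    ratio = speed_of_sound * lag / sample_rate_hz / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing past +/-1
    return math.degrees(math.asin(ratio))


# The second microphone hears the same pulse three samples later.
mic_a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
lag = estimate_lag(mic_a, mic_b, max_lag=4)
bearing = bearing_from_lag(lag, sample_rate_hz=16000, mic_spacing_m=0.2)
```

A bearing obtained this way could then be mapped to a column range in each camera image to seed the region of interest; that mapping depends on camera calibration and is left out here.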
7. The method of claim 1, wherein determining the region of interest includes processing at least two of the plurality of input video images to estimate a depth of the region of interest, wherein the background includes objects in the plurality of input video images having depths greater that the estimated depth of the region of interest.
This claim uses depth to separate foreground from background in claim 1. At least two of the plurality of input video images are processed to estimate a depth of the region of interest. The background is then defined to include objects in the input video images whose depths are greater than that estimated depth, that is, objects farther from the cameras than the region of interest. The claim does not specify how depth is estimated; with cameras that are spaced apart, the shift of a feature between two views (its disparity) is a standard cue, since nearer objects shift more than distant ones.
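The depth test in this claim can be illustrated with the standard pinhole stereo relation, depth = focal length × baseline / disparity. The sketch below assumes two parallel, calibrated cameras and known disparities; the claim does not recite this formula, and the function names and numbers are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Estimate depth in metres from the pixel shift between two views.

    Assumption: parallel pinhole cameras, so depth = focal * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def is_background(object_disparity_px, roi_disparity_px, focal_px, baseline_m):
    """An object is background when it is deeper than the region of interest."""
    roi_depth = depth_from_disparity(roi_disparity_px, focal_px, baseline_m)
    object_depth = depth_from_disparity(object_disparity_px, focal_px, baseline_m)
    return object_depth > roi_depth


# Illustrative numbers: 800 px focal length, cameras 0.25 m apart.
roi_depth = depth_from_disparity(40, focal_px=800, baseline_m=0.25)
wall_is_background = is_background(10, 40, focal_px=800, baseline_m=0.25)
```

Here the region of interest, with a 40 px disparity, is nearer than a wall with a 10 px disparity, so the wall is classified as background.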
8. An image processing system for adding privacy effects to a display video image, the system comprising: a processor; and a memory storing computer executable instructions configured to cause the processor to: obtain a plurality of input video images of a scene from a corresponding plurality of video cameras that are spaced apart; determine a region of interest, including an image of an object, in each of the plurality of input video images to define a foreground and a background, the foreground including the image of the object; rectify the plurality of input video images based on the image of the object in each input video image to produce a plurality of rectified video images having aligned foregrounds and having backgrounds that are not aligned due to relative spatial relationships of the plurality of video cameras corresponding to the plurality of input video images; create the display video image by combining the plurality of rectified video images, the foreground of the display video image being in-focus and the background of the display video image appearing to be blurred due to backgrounds of the rectified video images not being aligned; and transmit the display video image for display.
This claim covers an image processing system that carries out the same steps as the method of claim 1. The system comprises a processor and a memory storing computer executable instructions. The instructions cause the processor to obtain input video images of a scene from a plurality of video cameras that are spaced apart; determine, in each image, a region of interest that includes an image of an object, which defines a foreground and a background; rectify the images based on the image of the object so that the foregrounds are aligned while the backgrounds remain misaligned because of the cameras' relative spatial relationships; combine the rectified images into a display video image whose foreground is in focus and whose background appears blurred due to that misalignment; and transmit the display video image for display. The result is a privacy-enhanced view in which background details are obscured while the foreground object remains clear.
9. The image processing system of claim 8, wherein: the computer executable instructions are further configured to cause the processor to receive an indication of an amount of blur to apply to the display video image; and the computer executable instructions that cause the processor to obtain the plurality of input video images include instructions that cause the processor to select a number of input video images as the plurality of input video images in response to the indication of the amount of blur.
This claim is the system counterpart of claim 2. The instructions further cause the processor to receive an indication of an amount of blur to apply to the display video image. When obtaining the input video images, the processor selects the number of input video images to use in response to that indication. The blur in question is the background blur of claim 8, produced by combining rectified images whose backgrounds are not aligned. The claim does not state how the number of images maps to the blur amount; one plausible reading is that more images are combined for a stronger blur and fewer for a weaker one.
10. The image processing system of claim 8, wherein: the computer executable instructions are further configured to cause the processor to receive an indication of an amount of blur to apply to the display video image; and the computer executable instructions that cause the processor to obtain the plurality of input video images include instructions that cause the processor to select the plurality of input video images based on the relative spatial relationships of the video cameras corresponding to the selected plurality of input video images in response to the indication of the amount of blur.
This claim is the system counterpart of claim 3. The instructions further cause the processor to receive an indication of an amount of blur to apply to the display video image. When obtaining the input video images, the processor selects which input video images to use based on the relative spatial relationships of the video cameras that captured them, in response to that indication. The blur in question is the background blur of claim 8, which arises because the backgrounds of the rectified images are not aligned; choosing cameras with different spacings is therefore a way to vary how strongly the background is blurred.
11. The image processing system of claim 8, further comprising computer executable instructions that cause the processor to: provide one video image of the plurality of input video images for display; wherein the computer executable instructions that cause the processor to determine the region of interest include instructions that cause the processor to receive an indication of a portion of the one video image as an indication the image of the object in the region of interest.
This claim is the system counterpart of claim 4. The instructions cause the processor to provide one video image of the plurality of input video images for display. To determine the region of interest, the processor receives an indication of a portion of that image, and that portion is treated as indicating the image of the object in the region of interest. In practice this could be a user selecting the object on the displayed image, although the claim does not limit how the indication is produced. The indicated object then defines the foreground that is aligned and kept in focus by the system of claim 8.
12. The image processing system of claim 8, wherein: the computer executable instructions that cause the processor to obtain the plurality of input video images include instructions that cause the processor to receive the plurality of input video images from a client device via a network; and the computer executable instructions that cause the processor to transmit the display video image for display include instructions that cause the processor to transmit the display video image to the client device via the network.
This claim is the system counterpart of claim 5. The processor receives the plurality of input video images from a client device via a network and transmits the resulting display video image back to that client device via the network. The processing in between (determining the region of interest, rectifying the images, and combining them into a display video image with a blurred background) is as recited in claim 8; this claim adds only the network exchange with the client device.
13. The image processing system of claim 8, further comprising: computer executable instructions that cause the processor to receive data representing a plurality of audio signals corresponding to a voice of a presenter; wherein: the computer executable instructions that cause the processor to determine the region of interest include instructions that cause the processor to determine the region of interest using sound source localization based on the plurality of audio signals; and the image of the object includes an image of the presenter.
This claim is the system counterpart of claim 6. The instructions cause the processor to receive data representing a plurality of audio signals corresponding to the voice of a presenter, and to determine the region of interest using sound source localization based on those signals. The image of the object in the region of interest includes an image of the presenter, so the presenter becomes the in-focus foreground of the display video image while the background is blurred as in claim 8. The claim does not specify the localization technique or how the audio signals are captured.
14. The image processing system of claim 8, wherein: the computer executable instructions that cause the processor to determine the region of interest include instructions that cause the processor to estimate a depth of the region of interest based on at least two input video images of the plurality of input video images; and the background includes objects in the plurality of input video images having depths greater that the estimated depth of the region of interest.
This claim is the system counterpart of claim 7. To determine the region of interest, the processor estimates a depth of the region of interest based on at least two of the input video images. The background then includes objects in the input video images whose depths are greater than that estimated depth, that is, objects farther from the cameras than the region of interest. The claim does not specify how the depth is estimated or require any further processing such as generating a depth map.
15. An apparatus for adding privacy effects to a display image of a scene comprising: means for obtaining a plurality of input video images of the scene from a corresponding plurality of video cameras that are spaced apart; means for determining a region of interest, including an image of an object, in each of the plurality of input video images to define a foreground and a background, the foreground including the image of the object; means for rectifying the plurality of input video images based on the image of the object in each input video image to produce a plurality of rectified video images having aligned foregrounds and having backgrounds that are not aligned due to relative spatial relationships of the plurality of video cameras corresponding to the plurality of input video images; means for creating the display video image by combining the plurality of rectified video images, the foreground of the display video image being in-focus and the background of the display video image appearing to be blurred due to the backgrounds of the rectified video images not being aligned; and means for transmitting the display video image for display.
This claim covers an apparatus, written in means-plus-function form, that corresponds step for step to the method of claim 1. The apparatus comprises means for obtaining input video images of a scene from a plurality of video cameras that are spaced apart, and means for determining, in each image, a region of interest that includes an image of an object, which defines a foreground and a background. It further comprises means for rectifying the images based on the image of the object, so that the foregrounds are aligned while the backgrounds remain misaligned because of the cameras' relative spatial relationships. Means for creating the display video image combine the rectified images, so that the foreground appears in focus and the background appears blurred due to the misalignment, an effect that resembles the shallow depth of field of a camera lens. Finally, means for transmitting send the display video image for display.
16. The apparatus of claim 15, further comprising: means for receiving an indication of an amount of blur to apply to the display video image, wherein the means for obtaining the plurality of input video images includes means for selecting a number of input video images as the plurality of input video images in response to the indication of the amount of blur.
This claim is the apparatus counterpart of claims 2 and 9. The apparatus includes means for receiving an indication of an amount of blur to apply to the display video image, and the means for obtaining the input video images includes means for selecting the number of input video images in response to that indication. The claim does not state how the number maps to the blur amount; one plausible reading is that a higher blur level results in more input images being combined, while a lower level uses fewer images for a less blurred background.
17. The apparatus of claim 15, further comprising: means for receiving an indication of an amount of blur to apply to the display video image, wherein the means for obtaining the plurality of input video images includes means for selecting the plurality of input video images based on the relative spatial relationships of the video cameras corresponding to the plurality of input video images in response to the indication of the amount of blur.
This claim is the apparatus counterpart of claims 3 and 10. The apparatus includes means for receiving an indication of an amount of blur to apply to the display video image, and the means for obtaining the input video images includes means for selecting which input video images to use based on the relative spatial relationships of the corresponding video cameras, in response to that indication. For example, a higher blur setting may prompt the selection of images from more widely spaced cameras, whose backgrounds are more misaligned after rectification. The blur in question is the privacy blur of the background recited in claim 15.
18. The apparatus of claim 15, wherein the means for determining the region of interest includes means for receiving an indication of a portion of at least one input video image of the plurality of input video images as an indication of the image of the object in the region of interest.
This invention relates to video processing systems that determine a region of interest (ROI) within video images in order to locate an object. The problem addressed is the need for efficient and accurate identification of the specific area in the input video images where the object of interest is located. The means for determining the ROI includes means for receiving an indication of a portion of at least one input video image from the plurality of input video images. This indication specifies the image of the object within the ROI, allowing the system to focus processing on that specific area rather than the entire frame. The apparatus also includes means for generating a plurality of output video images, where each output image corresponds to a different input video image and includes the object within the ROI. This ensures that the object remains visible across the images, even if its position changes.
19. The apparatus of claim 15, further comprising: means for receiving data representing a location of a presenter; wherein: the means for determining of the region of interest includes means for determining the region of interest responsive to the location of the presenter; and the image of the object includes an image of the presenter.
This invention relates to an apparatus for processing video images of a presenter. In this claim the object in the region of interest is the presenter: the image of the object includes an image of the presenter. The apparatus includes means for receiving data representing the presenter's location, and the means for determining the region of interest operates in response to that location, so the region of interest follows the presenter as the presenter moves. This keeps the presenter in the in-focus foreground of the display video image while the background is obscured. The apparatus may be part of a larger system for live presentations, video conferencing, or augmented reality applications.
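One way a region of interest could follow a presenter's location is sketched below. The claim does not say what form the location data takes; this sketch assumes it is a pixel coordinate in the frame and that the region of interest is a fixed-size box centred on it. The function name `roi_from_presenter` and the box convention are hypothetical.

```python
from typing import Tuple

# Assumed box convention: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def roi_from_presenter(x: int, y: int, frame_w: int, frame_h: int,
                       box_w: int, box_h: int) -> Box:
    """Centre a fixed-size box on the presenter and keep it inside the frame."""
    if box_w > frame_w or box_h > frame_h:
        raise ValueError("box must fit inside the frame")
    # Clamp the top-left corner so the whole box stays within the frame.
    left = min(max(x - box_w // 2, 0), frame_w - box_w)
    top = min(max(y - box_h // 2, 0), frame_h - box_h)
    return (left, top, left + box_w, top + box_h)
```

Calling the function again whenever new location data arrives moves the box with the presenter, and the clamping keeps it valid when the presenter is near the edge of the frame.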
20. The apparatus of claim 15, wherein the means for determining the region of interest includes means for estimating a depth of the region of interest, wherein the background includes objects in the plurality of input video images having depths greater that the estimated depth of the region of interest.
This invention relates to video processing systems that isolate a region of interest (ROI) in a sequence of input video images by distinguishing it from background objects. The problem addressed is accurately separating foreground content from background elements in dynamic scenes, particularly when depth information is available. The apparatus includes a depth estimation module that calculates the depth of the ROI, allowing the system to classify background objects as those with greater depth values than the estimated ROI depth. This enables effective foreground-background segmentation, improving visual quality in applications like video editing, surveillance, and augmented reality. The system may also include additional processing modules to refine the ROI extraction, such as motion tracking or edge detection, to enhance accuracy in complex environments. By leveraging depth data, the invention improves upon traditional segmentation techniques that rely solely on color or texture analysis, reducing errors in cluttered or low-contrast scenes. The apparatus is designed to operate in real-time or near-real-time, making it suitable for live video processing tasks.
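The depth-based classification described above can be sketched as follows. The text does not say how depth is estimated, so this sketch assumes a per-pixel depth map is already available, takes the median depth inside the region of interest as its estimated depth, and treats anything deeper than that (plus an optional margin) as background. The function names, the box convention, and the margin are assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple

DepthMap = List[List[float]]          # assumed: depth in metres per pixel
Box = Tuple[int, int, int, int]       # (top, left, bottom, right), exclusive ends


def estimate_roi_depth(depth_map: DepthMap, roi: Box) -> float:
    """Estimate the ROI depth as the median depth of the pixels inside it."""
    top, left, bottom, right = roi
    values = [d for row in depth_map[top:bottom] for d in row[left:right]]
    if not values:
        raise ValueError("region of interest is empty")
    return median(values)


def background_mask(depth_map: DepthMap, roi_depth: float,
                    margin: float = 0.0) -> List[List[bool]]:
    """Mark as background every pixel deeper than the ROI depth plus a margin."""
    threshold = roi_depth + margin
    return [[d > threshold for d in row] for row in depth_map]
```

The median is used here only because it tolerates a few stray background pixels inside the box; a mean or another estimator would fit the same description.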
September 29, 2020