US-20250303990-A1

Systems and Methods for Performing Commands in a Vehicle Using Speech and Image Recognition

Published: October 2, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Systems and methods are disclosed herein for implementation of a vehicle command operation system that may use multi-modal technology to authenticate an occupant of the vehicle to authorize a command and receive natural language commands for vehicular operations. The system may utilize sensors to receive data indicative of a voice command from an occupant of the vehicle. The system may receive second sensor data to aid in the determination of the corresponding vehicular operation in response to the received command. The system may retrieve authentication data for the occupants of the vehicle. The system authenticates the occupant to authorize a vehicular operation command using a neural network based on at least one of the first sensor data, the second sensor data, and the authentication data. Responsive to the authentication, the system may authorize the operation to be performed in the vehicle based on the vehicular operation command.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A machine comprising:

. The machine of, wherein the one or more processors are further to:

. The machine of, wherein the one or more processors are further to store data that associates the occupant with at least one of the region of the machine or the component that is associated with the region of the machine.

. The machine of, wherein the identifier of the occupant comprises one or more of:

. A method comprising:

. The method of, wherein the determining the particular component comprises:

. The method of, wherein the determining the particular component comprises determining that the historical data indicates that the particular component includes a higher percentage of performing the operation as compared to one or more other components of the plurality of components.

. The method of, wherein the determining the particular component uses one or more neural networks trained using at least a portion of the historical data, the at least the portion of the historical data indicating that the particular component includes a higher percentage of performing the operation as compared to one or more other components of the plurality of components.

. The method of, further comprising:

. The method of, wherein the occupant is at least one of a speaker associated with the speech or associated with an identifier indicated by the speech.

. A system comprising:

. The system of, wherein the determination that the one or more images depict the occupant as being associated with the component of the machine comprises:

. The system of, wherein the one or more processors are further to:

. The system of, wherein the system is comprised in at least one of:

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation of U.S. patent application Ser. No. 18/062,163, filed Dec. 6, 2022, which is a continuation of U.S. patent application Ser. No. 16/867,395, filed May 5, 2020, each of which is hereby incorporated by reference in its entirety.

The present disclosure is directed to techniques for operating a vehicle, specifically techniques for performing commands using speech and image recognition.

Vehicle systems may implement voice commands to perform specific vehicle operations. In a particular approach, the system may require a physical button to be pressed by an occupant to engage the system to receive a voice command. This assumes that the occupant of the vehicle pushing the button (e.g., the driver pressing the button on the steering wheel) is authorized to provide a voice command. This approach is deficient in a scenario where other occupants are authorized to initiate a voice command, but cannot press the button due to lack of physical access, or where different levels of authorization are desired depending on the person issuing the command. Moreover, current approaches require the command to include specific preprogrammed nomenclature of the vehicle system (e.g., a voice command may recite “lower the right-rear window”). This approach is deficient when occupants do not know the specific preprogrammed nomenclature and instead express commands using natural language.

Accordingly, to overcome the limitations of current voice command systems for vehicles, systems and methods are described herein for a vehicle command operation system that may use multi-modal technology to authenticate an occupant of the vehicle to authorize a command and receive natural language commands for vehicular operations. The system may utilize sensors to receive data indicative of a voice command from an occupant of the vehicle. For example, the system may receive a voice command “lower Sally's window” from a first sensor such as a microphone. This particular voice command has a vehicular operation command to lower a window, but the vehicle may not immediately know which window to lower as it has not yet determined which window is Sally's window. The system may receive second sensor data to aid in the determination of the correct vehicular operation to execute in response to the received command. For example, the system may receive data indicative of an image of the interior of the car from an interior camera sensor mounted above the rear-view mirror. From this image, the system may determine that Sally is sitting beside the rear passenger window.

The system may retrieve authentication data (e.g., from a database) for the occupants of the vehicle. Using this authentication data, along with the first and second sensor data, the system may utilize a neural network to authenticate the occupant to authorize a vehicular operation command. For example, the system may retrieve data from the database indicating the primary operator of the vehicle and a visual indication of the primary operator. The system may then determine, based on the image of the interior, which includes the primary operator, and on the voice signature of the occupant, that the occupant who provided the voice command is the primary operator of the vehicle. The primary operator of the vehicle has an assigned permissions level that authorizes lowering the rear passenger window. The system may then, responsive to the authentication, authorize the operation to be performed in the vehicle based on the vehicular operation command. For example, in response to the system authenticating the primary operator to issue the command to lower Sally's window, the system provides an instruction to lower the rear passenger window (which is proximate to where Sally is sitting).
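The permissions-level check in this example can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the role names, operation names, and the `PERMISSIONS` table are hypothetical, and the multi-modal authentication step is assumed to have already produced the occupant's role.

```python
# Hypothetical permission table: role -> operations that role may authorize.
PERMISSIONS = {
    "primary_operator": {"lower_window", "raise_window", "set_climate"},
    "passenger": {"set_climate"},
}


def may_authorize(role: str, operation: str) -> bool:
    """Return True if the authenticated role is permitted to authorize the operation."""
    return operation in PERMISSIONS.get(role, set())


print(may_authorize("primary_operator", "lower_window"))  # True
print(may_authorize("passenger", "lower_window"))         # False
```

An unknown role falls back to an empty permission set, so the check denies by default.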

In some embodiments, the authentication data may include audio fingerprints of the occupants of the vehicle. For example, a database containing audio fingerprints of various occupants may be used for comparative analysis against voice commands received from an occupant in the vehicle.

In some embodiments, the voice command may include a vehicular operation command and a reference to an object within the interior of the vehicle. The reference to the object within the interior of the vehicle may be a descriptor. In some embodiments, the descriptor may include at least one of a name of the object, a colloquial name of the object, a shorthand name of the object, and a related descriptor of the object in a different language than that of the voice command interface. For example, a voice command may be received stating “turn up Sally's AC.” In this example, “AC” is a shorthand name for air-conditioning.

In some embodiments, a neural network may be trained with a data set including historical association with the vehicle. For example, sensor data (e.g., microphone sensor data and image data by one or more camera sensors) used during operations of the vehicle may be used as a training data set to identify occupants and objects surrounding the occupants within the vehicle (e.g., chairs, windows, etc.). For example, the neural network may determine that Sally has sat in the rear passenger seat for over 90% of trips in this vehicle.
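As a rough illustration of the historical association in the Sally example, the sketch below replaces the neural network with a plain frequency count over past trips. The function name `usual_seat`, the seat labels, and the 0.9 cutoff are assumptions made for this example only.

```python
from collections import Counter


def usual_seat(trip_seats, min_share=0.9):
    """Return (seat, share) if one seat accounts for at least min_share of trips.

    trip_seats is a list with one seat label per recorded trip.
    Returns None when there is no history or no seat is dominant enough.
    """
    if not trip_seats:
        return None
    seat, count = Counter(trip_seats).most_common(1)[0]
    share = count / len(trip_seats)
    return (seat, share) if share >= min_share else None


# Hypothetical history: 19 of 20 trips in the rear passenger seat.
history = ["rear_passenger"] * 19 + ["front_passenger"]
print(usual_seat(history))  # ('rear_passenger', 0.95)
```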

The corresponding figure depicts an example scenario of a top-down interior view of a vehicle with seated occupants and multimodal sensors, in accordance with some embodiments of the disclosure. The vehicle has several occupants seated within the vehicle at distinct locations. The vehicle includes a variety of multimodal sensors. For example, the vehicle includes two interior cameras and two interior microphones. The vehicle may include any other types of sensors including, but not limited to, global navigation satellite systems (“GNSS”) sensor(s) (e.g., Global Positioning System sensor(s)); RADAR sensor(s); ultrasonic sensor(s); LIDAR sensor(s); inertial measurement unit (“IMU”) sensor(s) (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.); microphone(s); stereo camera(s); wide-view camera(s) (e.g., fisheye cameras); infrared camera(s); surround camera(s) (e.g., 360 degree cameras); long-range cameras; mid-range camera(s); speed sensor(s); vibration sensor(s); steering sensor(s); brake sensor(s) (e.g., as part of brake sensor system); temperature sensor(s); scent recognition sensor(s); and/or other sensor types. The orientation and/or positioning of the sensors may be of any configuration allowing the sensors to receive respective data.

The corresponding figure depicts an example scenario of a top-down interior view of a vehicle with a system receiving a voice command, in accordance with some embodiments of the disclosure. The vehicle may include processing circuitry to process data received from the multimodal sensors. In some embodiments, the processing circuitry may receive, from one or more sensors, first sensor data indicative of a voice command from an occupant in the vehicle. For example, the occupant in the driver's seat of the vehicle may issue a voice command received by the microphone sensor. In some embodiments, the microphone sensor may be positioned within the vehicle to receive voice commands from any occupant, regardless of whether the occupant specifically directs their speech at the microphone. For example, the type of microphone sensor may be an omnidirectional microphone capable of accurately receiving sound data from any occupant within the interior of the vehicle. In some embodiments, the first sensor may be a camera sensor. The sensor data received from the camera sensor may then be analyzed by the processing circuitry for lip activity of the occupant to determine a voice command from the occupant. The processing circuitry may parse the lip activity into speech using various lip-activity techniques. The processing circuitry may then determine the voice command from the parsed speech. For example, a camera may be used solely, or in combination with the microphone sensor, to corroborate the parsed voice command captured by the microphone sensor. Alternatively, the processing circuitry may utilize the microphone sensor data to corroborate the parsed voice command captured by the camera sensor.
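One way to picture the corroboration step is to compare the transcript parsed from the microphone with the transcript parsed from lip activity. The sketch below assumes both transcripts are already available as text; the function names and the 0.8 similarity threshold are illustrative assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever comparison the system would actually use.

```python
import difflib
import re


def normalize(transcript: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", transcript.lower())
    return " ".join(cleaned.split())


def corroborated(audio_text: str, lip_text: str, threshold: float = 0.8) -> bool:
    """Return True when the two transcripts are similar enough to agree."""
    a, b = normalize(audio_text), normalize(lip_text)
    if not a or not b:
        return False  # nothing to corroborate against
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold


print(corroborated("Lower Sally's window.", "lower sallys window"))  # True
print(corroborated("Lower Sally's window.", "turn on the radio"))    # False
```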

In some embodiments, the processing circuitry may perform speech recognition algorithms to parse the received first sensor data into recognizable words in a specific language. In other embodiments, the processing circuitry may implement automatic speech recognition techniques to retrieve words in association with the first sensor data indicative of a voice command. In some embodiments, the first sensor data may be non-lexical utterances. For example, a received voice command may contain an audio signature similar to a sneeze. In certain configurations, the processing circuitry may associate this non-lexical utterance with a lexical utterance of “lower my window,” where the “my” is in association with the occupant who sneezed. In some embodiments, this association may be implemented by a lookup table. In other embodiments, the association may be created by a machine learning model (e.g., a neural network) that is trained on non-lexical utterances and corresponding actions following in short temporal proximity.
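The lookup-table variant of this association can be sketched as below. The table contents, the utterance label `"sneeze"` (assumed to come from an upstream audio classifier), and the function name are hypothetical; the sketch only shows how “my” could be bound to the occupant who produced the utterance.

```python
# Hypothetical table: non-lexical utterance label -> lexical command template.
NON_LEXICAL_COMMANDS = {
    "sneeze": "lower my window",
}


def resolve_utterance(label: str, speaker: str):
    """Map a non-lexical utterance to a command, binding "my" to the speaker."""
    template = NON_LEXICAL_COMMANDS.get(label)
    if template is None:
        return None
    words = [f"{speaker}'s" if word == "my" else word for word in template.split()]
    return " ".join(words)


print(resolve_utterance("sneeze", "Sally"))  # lower Sally's window
```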

The processing circuitry may receive, from one or more sensors, second sensor data. The second sensor data may be from any type of sensor associated with the vehicle including, but not limited to, global navigation satellite systems (“GNSS”) sensor(s) (e.g., Global Positioning System sensor(s)); RADAR sensor(s); ultrasonic sensor(s); LIDAR sensor(s); inertial measurement unit (“IMU”) sensor(s) (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.); microphone(s); stereo camera(s); wide-view camera(s) (e.g., fisheye cameras); infrared camera(s); surround camera(s) (e.g., 360 degree cameras); long-range cameras; mid-range camera(s); speed sensor(s); vibration sensor(s); steering sensor(s); brake sensor(s) (e.g., as part of brake sensor system); temperature sensor(s); scent recognition sensor(s); and/or other sensor types. For example, the vehicle may receive data from a camera mounted in the interior of the vehicle. The data from the camera sensor may include a visual representation of an interior of the vehicle. In this example, the data includes, at least, a visual representation of the occupant, as shown by the triangular region extending from the position of the camera sensor. The data may also include interior aspects of the vehicle such as windows, seats, buttons, vents, interior configurations, seat belts, light exposure, objects surrounding the occupant, and other vehicular conditions detected within the interior of the vehicle. In some embodiments, the second sensor data may include data from a pressure sensor on a seat within the vehicle such that, upon a passenger sitting on the seat, the pressure sensor may determine that a threshold weight is met (e.g., weight of an average human) and thus the seat is occupied.
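The seat-pressure check at the end of this paragraph amounts to a threshold comparison. In the sketch below, the seat labels, the kilogram readings, and the 30 kg default threshold are all assumed values chosen for illustration; the disclosure does not specify a number.

```python
def occupied_seats(readings_kg, threshold_kg=30.0):
    """Return the sorted seat labels whose pressure reading meets the threshold.

    readings_kg maps a seat label to the weight reported by its pressure sensor.
    """
    return sorted(seat for seat, kg in readings_kg.items() if kg >= threshold_kg)


# Hypothetical readings: a bag on the rear-left seat stays below the threshold.
readings = {"driver": 72.0, "rear_left": 3.5, "rear_right": 41.0}
print(occupied_seats(readings))  # ['driver', 'rear_right']
```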

The processing circuitry may retrieve, from a database, authentication data for one or more occupants. In some embodiments, the database that stores authentication data for occupants of the vehicle may be local to the vehicle.

In other embodiments, the database may be remote from the vehicle. The database may interface with the processing circuitry via coupling of a communications network (e.g., wireless network, 4G/5G data network, or similar network). For example, the vehicle, by processing circuitry, communicates with a wireless cloud-based database to retrieve the authentication data for one of the occupants. In some embodiments, the authentication data includes an audio fingerprint of one or more occupants. An audio fingerprint may be a condensed digital summary deterministically generated from audio data that may be used to identify an audio sample or quickly locate similar items in a set of audio data. In some embodiments, the processing circuitry may generate an audio fingerprint of one or more occupants based on speech captured from a microphone sensor within the vehicle.

The processing circuitry may authenticate the occupant to authorize a vehicular operation command based on at least one of the first sensor data and the second sensor data. In some embodiments, the processing circuitry may implement a machine learning model to authenticate the occupant. The machine learning model may be a neural network that is trained with a data set including various multimodal data for respective occupants to learn specific audio authentication and vehicular command preferences. In some embodiments, the multimodal data for training may be historical multimodal data of the vehicle. In some embodiments, the multimodal data for training may be based on preexisting multimodal data for the specific one or more sensors utilized as the first sensor and the second sensor. Any one of the first sensor data, second sensor data, and authentication data may be used as input for the neural network.

The neural network may output an authorization value. The processing circuitry determines, using the authorization value, whether the occupant is authenticated to authorize the vehicular operation command. The authorization value may be any type of value (e.g., Boolean, numeric, floating-point, fuzzy logic, etc.) that allows for the processing circuitry to determine authentication for the occupant. In some embodiments, the processing circuitry authenticates the voice command as being from the operator of the vehicle and determines that the authenticated operator is authorized to cause the operation to be performed in the vehicle.
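Because the authorization value may be Boolean or numeric, the decision step can be sketched as a small function that accepts either. The function name and the 0.5 threshold are assumptions for illustration; a real system would choose its own decision rule.

```python
def is_authorized(value, threshold=0.5):
    """Interpret an authorization value that may be a Boolean or a numeric score."""
    if isinstance(value, bool):
        return value  # Boolean output is used directly
    return float(value) >= threshold  # numeric or fuzzy score is thresholded


print(is_authorized(True))   # True
print(is_authorized(0.91))   # True
print(is_authorized(0.2))    # False
```

The Boolean case is checked first because Python treats `bool` as a subtype of `int`.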

The processing circuitry may, responsive to the authentication, authorize the operation to be performed in the vehicle based on the vehicular operation command. For example, the processing circuitry may receive a voice command to “turn on the AC” from the passenger in the vehicle. The processing circuitry receives an image of the passenger as second sensor data. The processing circuitry receives a voice fingerprint of the passenger. Using the voice fingerprint, the voice command, and the image of the passenger, the processing circuitry authenticates the passenger using a neural network as authorized to engage the AC setting within the vehicle to “on.” The processing circuitry then turns on the AC setting within the vehicle.

In some embodiments, the processing circuitry may receive, from one or more sensors, data indicative of an image of the interior of the vehicle. For example, the processing circuitry may receive an image from one or more camera sensors within the vehicle. The corresponding figure depicts an example scenario of a top-down interior view of a vehicle with a system determining a vehicular operation command, in accordance with some embodiments of the disclosure. The processing circuitry of the vehicle receives data from a front-mounted interior camera and a rear-mounted interior camera.

In some embodiments, the processing circuitry may locate, using processing circuitry that implements at least one neural network, a positional region of an object within the interior of the vehicle based on the received image. For example, the neural network may determine various objects within the vehicle, including various windows of the vehicle such as a rear driver-side window. Other examples of the processing circuitry locating objects may include locating one occupant in the driver's seat and another occupant seated in the middle row behind the driver. As stated earlier, a neural network may be trained with a data set including historical association with a vehicle to determine and/or detect objects within the vehicle. For example, sensor data (e.g., microphone sensor data and image data by one or more camera sensors) used during operations of the vehicle may be used as a training data set to identify occupants and surrounding objects (e.g., chairs, windows, etc.). For example, the neural network may determine that an occupant has sat at the middle row left seat for over 85% of their trips in this vehicle.

The processing circuitry may locate the positional region of an object. For example, an occupant has a specific associated positional region. The processing circuitry may determine the positional region based on an equal distribution of space starting from the center of gravity of the object (or any other position within the object). In other embodiments, the positional region may be based on other factors such as the immediate environment of the object. For example, if an object (e.g., occupant of the vehicle) is sitting close to the door of the vehicle, the positional region may include only the interior region and have an unequal amount of positional space towards the interior of the vehicle. The positional region may be any measure of position such as the six degrees of freedom. In some embodiments, the positional region may be learned for the specific object by the neural network over time. For example, a particular occupant may only interact with a subset of features that is learned over time, and the positional region may extend to cover only this subset of features used by the particular occupant. In some embodiments, the positional region is preprogrammed for an object type which may be applied to a plurality of objects. For example, any occupant of a plurality of occupants may be designated a specific positional region.
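A simplified two-dimensional version of the equal-distribution region, clipped at the cabin boundary for an occupant seated near a door, might look like the sketch below. The coordinate convention (metres in a top-down cabin frame), the function name, and the example numbers are assumptions; the disclosure allows any measure of position, including six degrees of freedom.

```python
def positional_region(center, half_extent, cabin):
    """Return an axis-aligned box (x_min, y_min, x_max, y_max) around center.

    The box extends half_extent equally in each direction, then is clipped to
    the cabin bounds so it never reaches outside the vehicle interior.
    """
    cx, cy = center
    x0, y0, x1, y1 = cabin
    return (
        max(x0, cx - half_extent),
        max(y0, cy - half_extent),
        min(x1, cx + half_extent),
        min(y1, cy + half_extent),
    )


# Hypothetical cabin 1.5 m wide and 3.0 m long; occupant seated 0.25 m from the left door.
cabin = (0.0, 0.0, 1.5, 3.0)
print(positional_region((0.25, 1.0), 0.5, cabin))  # (0.0, 0.5, 0.75, 1.5)
```

In the printed result the region is cut off at the door (x = 0.0) and therefore extends further toward the interior than toward the door, matching the unequal case described above.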

In some embodiments, the processing circuitry may receive, from one or more sensors, data indicative of a voice command, wherein the voice command comprises a vehicular operation command and a reference to the object within the interior of the vehicle. For example, the processing circuitry may receive a voice command from an occupant via the microphone sensor reciting “Lower Sally's window.” In this example, the vehicular operation command is to lower a window of the vehicle. The reference to an object within the vehicle is the term “Sally,” which the neural network determines corresponds to a particular occupant. Sally is seated in the middle row of the vehicle, and she is within a positional region next to objects such as a window. In some embodiments, the voice command is of an authenticated operator of the vehicle (e.g., a driver, or the owner of the vehicle, etc.). In some embodiments, the authenticated operator is authorized to cause the operation to be performed in the vehicle. In some embodiments, the reference to the object comprises a descriptor associated with the object. The descriptor associated with the object may be a synonym of the object, a colloquial phrase of the object, a shorthand name of the object, and/or a related descriptor of the object in a different language than that of the voice command interface. For example, the voice command may be “Lower Sal's window.” Sal may be a nickname for Sally. The processing circuitry may associate this nickname by means of a lookup table where Sal is looked up in a database and the corresponding object name “Sally” is returned. In other embodiments, the processing circuitry may implement a neural network that learns how objects are commanded and/or referred to by analyzing sensor information (e.g., data from vehicle microphones) to determine which objects may have multiple names/aliases. For example, Sally may be referred to as Sal, Sal-Sal, honey, sweetie-pie, Sizzy, and/or Lee as determined from two months of microphone sensor input from the vehicle.
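The lookup-table approach to nicknames can be sketched together with a very simple parser for commands of the form “operation owner's target.” The regular expression, the `ALIASES` table, and the function name are hypothetical and cover only this possessive pattern; the learned-alias variant described above is not modelled.

```python
import re

# Hypothetical alias table: lowercase nickname -> canonical occupant name.
ALIASES = {"sal": "Sally", "sal-sal": "Sally", "sizzy": "Sally"}

# Matches "<operation> <owner>'s <target>", accepting straight or curly apostrophes.
COMMAND = re.compile(
    r"^(?P<operation>.+?)\s+(?P<owner>[\w-]+)['’]s\s+(?P<target>.+?)\.?$"
)


def parse_command(text):
    """Split a possessive voice command into (operation, occupant, target)."""
    match = COMMAND.match(text.strip())
    if match is None:
        return None
    owner = match.group("owner")
    owner = ALIASES.get(owner.lower(), owner)  # resolve nickname if known
    return match.group("operation").lower(), owner, match.group("target").lower()


print(parse_command("Lower Sal's window."))  # ('lower', 'Sally', 'window')
print(parse_command("turn up Sally's AC"))   # ('turn up', 'Sally', 'ac')
```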

In some embodiments, the processing circuitry may cause the vehicular operation to be performed in the vehicle based on the vehicular operation command at the positional region of the object. For example, the processing circuitry may lower the window that is within a positional region of the object (Sally, the occupant). The processing circuitry may implement machine learning (e.g., a neural network) to determine which operation is to be performed given the positional region of the object and the voice command. For example, there may be ambiguity that can be resolved either by having the neural network select the operation with the highest predictive likelihood or, alternatively, by requesting further information from the occupant regarding the proposed vehicular operation command. For example, there may be a window directly parallel and adjacent to the sitting position of the occupant, while a second window may exist behind the occupant in the rear left corner of the vehicle. The processing circuitry may determine that the positional region covers both windows. The processing circuitry may determine, optionally based on a neural network, that only the directly parallel window will be lowered, while leaving the second window closed. In some embodiments, if ambiguity exists in the instruction of the voice command, the processing circuitry may cause the operation to be performed in the vehicle based on historical information associated with the particular object and/or based on an aggregate set of data for the vehicular operation. For example, if 90% of the voice commands of the object have been to lower the rear left window, the processing circuitry may select this as the vehicular operation over another potential operation within the positional region of the object that may be relevant given the voice command.
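The history-based tie-break can be sketched with a frequency count in place of the neural network. The component labels, the function name, and the 0.6 confidence cutoff are assumptions; returning `None` stands for the alternative path of asking the occupant for more information.

```python
def choose_component(candidates, history, min_share=0.6):
    """Pick the candidate most often used in past commands, or None if unclear.

    candidates: components inside the occupant's positional region.
    history: component chosen for each past matching command.
    """
    counts = {component: history.count(component) for component in candidates}
    total = sum(counts.values())
    if total == 0:
        return None  # no relevant history: ask the occupant
    best = max(candidates, key=lambda component: counts[component])
    return best if counts[best] / total >= min_share else None


candidates = ["side_window", "rear_corner_window"]
history = ["side_window"] * 9 + ["rear_corner_window"]
print(choose_component(candidates, history))  # side_window
```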

The corresponding figure depicts an example scenario of a top-down interior view of a vehicle with a system determining another vehicular operation command, in accordance with some embodiments of the disclosure. The processing circuitry receives two images of the interior of the vehicle from the camera sensors. The processing circuitry determines that the interior of the vehicle includes numerous occupants (including a driver and a rear right passenger), seats, windows, and other objects. The processing circuitry determines that seat objects have a plurality of respective vehicular operations, including heating and cooling. The processing circuitry locates the rear right seat and determines a positional region of the seat. The processing circuitry then receives a voice command from the rear right occupant via the microphone sensors reciting “Turn on my seat warmers.” The voice command includes a vehicular operation command to engage seat warmers, which are a function of the seat object, and a reference to the object, namely the occupant, using the pronoun “my.” The processing circuitry determines that the seat is within the positional region of the occupant and engages the seat warmers.
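Resolving the pronoun “my” to the speaker and then to a seat function can be sketched as two lookups. All names here (`SEAT_FEATURES`, the seat labels, the occupant identifiers) are hypothetical, and the speaker is assumed to have been identified already by the multi-modal steps described earlier.

```python
# Hypothetical map of seat -> functions available at that seat.
SEAT_FEATURES = {
    "driver": {"seat_warmer", "window"},
    "rear_right": {"seat_warmer", "window"},
    "rear_left": {"window"},
}


def resolve_target(owner_word, speaker, seat_of, feature):
    """Return (seat, feature) for the referenced occupant, or None if unavailable.

    owner_word: the reference in the command, e.g. "my" or an occupant name.
    seat_of: map of occupant identifier -> seat label from the interior image.
    """
    occupant = speaker if owner_word.lower() == "my" else owner_word
    seat = seat_of.get(occupant)
    if seat is None or feature not in SEAT_FEATURES.get(seat, set()):
        return None
    return seat, feature


seat_of = {"driver_occupant": "driver", "rear_right_occupant": "rear_right"}
print(resolve_target("my", "rear_right_occupant", seat_of, "seat_warmer"))
# ('rear_right', 'seat_warmer')
```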

The corresponding figure is an illustration of an example autonomous vehicle, in accordance with some embodiments of the present disclosure. The autonomous vehicle (alternatively referred to herein as the “vehicle”) may include, without limitation, a passenger vehicle, such as a car, a truck, a bus, a first responder vehicle, a shuttle, an electric or motorized bicycle, a motorcycle, a fire truck, a police vehicle, an ambulance, a boat, a construction vehicle, an underwater craft, a drone, and/or another type of vehicle (e.g., that is unmanned and/or that accommodates one or more passengers). Autonomous vehicles are generally described in terms of automation levels, defined by the National Highway Traffic Safety Administration (NHTSA), a division of the US Department of Transportation, and the Society of Automotive Engineers (SAE) “Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles” (Standard No. J3016-201806, published on Jun. 15, 2018, Standard No. J3016-201609, published on Sep. 30, 2016, and previous and future versions of this standard). The vehicle may be capable of functionality in accordance with one or more of Level 3-Level 5 of the autonomous driving levels. For example, the vehicle may be capable of conditional automation (Level 3), high automation (Level 4), and/or full automation (Level 5), depending on the embodiment.

The vehicle may include components such as a chassis, a vehicle body, wheels (e.g., 2, 4, 6, 8, 18, etc.), tires, axles, and other components of a vehicle. The vehicle may include a propulsion system, such as an internal combustion engine, hybrid electric power plant, an all-electric engine, and/or another propulsion system type. The propulsion system may be connected to a drive train of the vehicle, which may include a transmission, to enable the propulsion of the vehicle. The propulsion system may be controlled in response to receiving signals from the throttle/accelerator.

A steering system, which may include a steering wheel, may be used to steer the vehicle (e.g., along a desired path or route) when the propulsion system is operating (e.g., when the vehicle is in motion). The steering system may receive signals from a steering actuator. The steering wheel may be optional for full automation (Level 5) functionality.

The brake sensor system may be used to operate the vehicle brakes in response to receiving signals from the brake actuators and/or brake sensors.

Controller(s), which may include one or more CPU(s), system on chips (SoCs) and/or GPU(s), may provide signals (e.g., representative of commands) to one or more components and/or systems of the vehicle. For example, the controller(s) may send signals to operate the vehicle brakes via one or more brake actuators, to operate the steering system via one or more steering actuators, and/or to operate the propulsion system via one or more throttle/accelerators. The controller(s) may include one or more onboard (e.g., integrated) computing devices (e.g., supercomputers) that process sensor signals, and output operation commands (e.g., signals representing commands) to enable autonomous driving and/or to assist a human driver in driving the vehicle. The controller(s) may include a first controller for autonomous driving functions, a second controller for functional safety functions, a third controller for artificial intelligence functionality (e.g., computer vision), a fourth controller for infotainment functionality, a fifth controller for redundancy in emergency conditions, and/or other controllers. In some examples, a single controller may handle two or more of the above functionalities, two or more controllers may handle a single functionality, and/or any combination thereof.

The controller(s) may provide the signals for controlling one or more components and/or systems of the vehicle in response to sensor data received from one or more sensors (e.g., sensor inputs). The sensor data may be received from, for example and without limitation, global navigation satellite systems sensor(s) (e.g., Global Positioning System sensor(s)), RADAR sensor(s), ultrasonic sensor(s), LIDAR sensor(s), inertial measurement unit (IMU) sensor(s) (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), microphone(s), stereo camera(s), wide-view camera(s) (e.g., fisheye cameras), infrared camera(s), surround camera(s) (e.g., 360 degree cameras), long-range and/or mid-range camera(s), speed sensor(s) (e.g., for measuring the speed of the vehicle), vibration sensor(s), steering sensor(s), brake sensor(s) (e.g., as part of the brake sensor system), and/or other sensor types.

One or more of the controller(s) may receive inputs (e.g., represented by input data) from an instrument cluster of the vehicle and provide outputs (e.g., represented by output data, display data, etc.) via a human-machine interface (HMI) display, an audible annunciator, a loudspeaker, and/or via other components of the vehicle. The outputs may include information such as vehicle velocity, speed, time, map data (e.g., the HD map), location data (e.g., the location of the vehicle, such as on a map), direction, location of other vehicles (e.g., an occupancy grid), information about objects and status of objects as perceived by the controller(s), etc. For example, the HMI display may display information about the presence of one or more objects (e.g., a street sign, caution sign, traffic light changing, etc.), and/or information about driving maneuvers the vehicle has made, is making, or will make (e.g., changing lanes now, taking exit B in two miles, etc.).

The vehicle further includes a network interface, which may use one or more wireless antenna(s) and/or modem(s) to communicate over one or more networks. For example, the network interface may be capable of communication over LTE, WCDMA, UMTS, GSM, CDMA2000, etc. The wireless antenna(s) may also enable communication between objects in the environment (e.g., vehicles, mobile devices, etc.), using local area network(s), such as Bluetooth, Bluetooth LE, Z-Wave, ZigBee, etc., and/or low power wide-area network(s) (LPWANs), such as LoRaWAN, SigFox, etc.

The corresponding figure is an example of camera locations and fields of view for the example autonomous vehicle, in accordance with some embodiments of the present disclosure. The cameras and respective fields of view are one example embodiment and are not intended to be limiting. For example, additional and/or alternative cameras may be included and/or the cameras may be located at different locations on the vehicle.

The camera types for the cameras may include, but are not limited to, digital cameras that may be adapted for use with the components and/or systems of the vehicle. The camera(s) may operate at automotive safety integrity level (ASIL) B and/or at another ASIL. The camera types may be capable of any image capture rate, such as 60 frames per second (fps), 120 fps, 240 fps, etc., depending on the embodiment. The cameras may be capable of using rolling shutters, global shutters, another type of shutter, or a combination thereof. In some examples, the color filter array may include a red clear clear clear (RCCC) color filter array, a red clear clear blue (RCCB) color filter array, a red blue green clear (RBGC) color filter array, a Foveon X3 color filter array, a Bayer sensor (RGGB) color filter array, a monochrome sensor color filter array, and/or another type of color filter array. In some embodiments, clear pixel cameras, such as cameras with an RCCC, an RCCB, and/or an RBGC color filter array, may be used in an effort to increase light sensitivity.
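As a rough illustration of why clear-pixel arrays may increase light sensitivity, the sketch below counts the share of clear (unfiltered) pixels in one 2x2 tile of several of the arrays named above. The tile contents follow the letter names given in the text; the position of each letter inside the tile, and the names `CFA_TILES` and `clear_fraction`, are assumptions made only for this example.

```python
from collections import Counter

# One 2x2 tile per color filter array, flattened to four entries.
# R = red, G = green, B = blue, C = clear (no color filter).
# The ordering within each tile is assumed for illustration only.
CFA_TILES = {
    "RCCC": ("R", "C", "C", "C"),
    "RCCB": ("R", "C", "C", "B"),
    "RBGC": ("R", "B", "G", "C"),
    "RGGB": ("R", "G", "G", "B"),  # Bayer pattern
}


def clear_fraction(name: str) -> float:
    """Return the fraction of pixels in the named tile that are clear."""
    tile = CFA_TILES[name]
    return Counter(tile)["C"] / len(tile)


if __name__ == "__main__":
    for name in CFA_TILES:
        print(name, clear_fraction(name))
```

Under these assumptions an RCCC tile leaves three of four pixels unfiltered, while a Bayer (RGGB) tile leaves none, which is consistent with the text's statement that clear-pixel cameras may be chosen for light sensitivity.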

In some examples, one or more of the camera(s) may be used to perform advanced driver assistance systems (ADAS) functions (e.g., as part of a redundant or fail-safe design). For example, a Multi-Function Mono Camera may be installed to provide functions including lane departure warning, traffic sign assist and intelligent headlamp control. One or more of the camera(s) (e.g., all of the cameras) may record and provide image data (e.g., video) simultaneously.

One or more of the cameras may be mounted in a mounting assembly, such as a custom-designed (3-D printed) assembly, in order to cut out stray light and reflections from within the car (e.g., reflections from the dashboard reflected in the windshield mirrors) which may interfere with the camera's image data capture abilities. With reference to wing-mirror mounting assemblies, the wing-mirror assemblies may be custom 3-D printed so that the camera mounting plate matches the shape of the wing-mirror. In some examples, the camera(s) may be integrated into the wing-mirror. For side-view cameras, the camera(s) may also be integrated within the four pillars at each corner of the cabin.

Cameras with a field of view that includes portions of the environment in front of the vehicle (e.g., front-facing cameras) may be used for surround view, to help identify forward-facing paths and obstacles, as well as to aid, with the help of one or more controllers and/or control SoCs, in providing information critical to generating an occupancy grid and/or determining the preferred vehicle paths. Front-facing cameras may be used to perform many of the same ADAS functions as LIDAR, including emergency braking, pedestrian detection, and collision avoidance. Front-facing cameras may also be used for ADAS functions and systems including Lane Departure Warnings (LDW), Autonomous Cruise Control (ACC), and/or other functions such as traffic sign recognition.

A variety of cameras may be used in a front-facing configuration, including, for example, a monocular camera platform that includes a CMOS (complementary metal oxide semiconductor) color imager. Another example may be a wide-view camera(s) that may be used to perceive objects coming into view from the periphery (e.g., pedestrians, crossing traffic or bicycles). Although only one wide-view camera is illustrated, there may be any number of wide-view cameras on the vehicle. In addition, long-range camera(s) (e.g., a long-view stereo camera pair) may be used for depth-based object detection, especially for objects for which a neural network has not yet been trained. The long-range camera(s) may also be used for object detection and classification, as well as basic object tracking.

One or more stereo cameras may also be included in a front-facing configuration. The stereo camera(s) may include an integrated control unit comprising a scalable processing unit, which may provide a programmable logic (e.g., FPGA) and a multi-core micro-processor with an integrated CAN or Ethernet interface on a single chip. Such a unit may be used to generate a 3-D map of the vehicle's environment, including a distance estimate for all the points in the image. An alternative stereo camera(s) may include a compact stereo vision sensor(s) that may include two camera lenses (one each on the left and right) and an image processing chip that may measure the distance from the vehicle to the target object and use the generated information (e.g., metadata) to activate the autonomous emergency braking and lane departure warning functions. Other types of stereo camera(s) may be used in addition to, or alternatively from, those described herein.
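The disclosure does not say how the stereo unit computes its distance estimates. One common approach, shown here only as a sketch, is the rectified pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two lenses, and d is the disparity of a point between the left and right images. The function name and the numbers in the usage note are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Estimate distance (meters) to a point from its stereo disparity.

    Assumes a rectified stereo pair and a pinhole camera model:
    depth = focal_length * baseline / disparity.
    A non-positive disparity is treated as a point at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 0.2 m baseline.
    print(depth_from_disparity(20.0, 1000.0, 0.2))
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated distance, so range resolution degrades for far objects; a wider baseline offsets this, which may be one reason a long-view stereo pair is used for long-range detection.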

Cameras with a field of view that includes portions of the environment to the side of the vehicle (e.g., side-view cameras) may be used for surround view, providing information used to create and update the occupancy grid, as well as to generate side impact collision warnings. For example, surround camera(s) (e.g., four surround cameras) may be positioned around the vehicle. The surround camera(s) may include wide-view camera(s), fisheye camera(s), 360-degree camera(s), and/or the like. For example, four fisheye cameras may be positioned on the vehicle's front, rear, and sides. In an alternative arrangement, the vehicle may use three surround camera(s) (e.g., left, right, and rear), and may leverage one or more other camera(s) (e.g., a forward-facing camera) as a fourth surround-view camera.

Cameras with a field of view that includes portions of the environment to the rear of the vehicle (e.g., rear-view cameras) may be used for park assistance, surround view, rear collision warnings, and creating and updating the occupancy grid. A wide variety of cameras may be used including, but not limited to, cameras that are also suitable as a front-facing camera(s) (e.g., long-range and/or mid-range camera(s), stereo camera(s), infrared camera(s), etc.), as described herein.

Cameras with a field of view that includes portions of the interior or cabin of the vehicle may be used to monitor one or more states of drivers, passengers, or objects in the cabin. Any type of camera may be used including, but not limited to, cabin camera(s), which may be any type of camera described herein, and which may be placed anywhere on or in the vehicle that provides a view of the cabin or interior thereof. For example, cabin camera(s) may be placed within or on some portion of the vehicle dashboard, rear view mirror, side view mirrors, seats, or doors and oriented to capture images of any drivers, passengers, or any other object or portion of the vehicle.

is a block diagram of an example system architecture for the example autonomous vehicle, in accordance with some embodiments of the present disclosure. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, groupings of functions, etc.) may be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.

Each of the components, features, and systems of the vehicle is illustrated as being connected via a bus. The bus may include a Controller Area Network (CAN) data interface (alternatively referred to herein as a “CAN bus”). A CAN may be a network inside the vehicle used to aid in control of various features and functionality of the vehicle, such as actuation of brakes, acceleration, braking, steering, windshield wipers, etc. A CAN bus may be configured to have dozens or even hundreds of nodes, each with its own unique identifier (e.g., a CAN ID). The CAN bus may be read to find steering wheel angle, ground speed, engine revolutions per minute (RPMs), button positions, and/or other vehicle status indicators. The CAN bus may be ASIL B compliant.
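To make the idea of reading vehicle status from a CAN bus concrete, the following sketch dispatches on a frame's CAN ID and decodes its payload. The disclosure does not give any identifiers or signal encodings, so the two CAN IDs, the big-endian 16-bit layouts, and the scale factors below are all hypothetical; real vehicles define these in manufacturer-specific signal databases.

```python
import struct
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical CAN IDs (not taken from the disclosure).
STEERING_ANGLE_ID = 0x025
GROUND_SPEED_ID = 0x0B4


@dataclass
class CanFrame:
    can_id: int   # identifier of the sending node/message
    data: bytes   # payload, up to 8 bytes on classic CAN


def decode(frame: CanFrame) -> Optional[Tuple[str, float]]:
    """Decode a frame into (signal_name, value), or None if the ID is unknown."""
    if frame.can_id == STEERING_ANGLE_ID:
        # Assumed encoding: signed 16-bit big-endian, 0.1 degree per count.
        (raw,) = struct.unpack_from(">h", frame.data, 0)
        return ("steering_wheel_angle_deg", raw / 10.0)
    if frame.can_id == GROUND_SPEED_ID:
        # Assumed encoding: unsigned 16-bit big-endian, 0.01 km/h per count.
        (raw,) = struct.unpack_from(">H", frame.data, 0)
        return ("ground_speed_kph", raw / 100.0)
    return None


if __name__ == "__main__":
    print(decode(CanFrame(STEERING_ANGLE_ID, struct.pack(">h", -150))))
    print(decode(CanFrame(GROUND_SPEED_ID, struct.pack(">H", 5000))))
```

Keying the decoder on the CAN ID mirrors the text's point that each node has its own unique identifier, so a listener can pick out steering, speed, or other status messages from shared bus traffic.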

Although the bus is described herein as being a CAN bus, this is not intended to be limiting. For example, in addition to, or alternatively from, the CAN bus, FlexRay and/or Ethernet may be used. Additionally, although a single line is used to represent the bus, this is not intended to be limiting. For example, there may be any number of busses, which may include one or more CAN busses, one or more FlexRay busses, one or more Ethernet busses, and/or one or more other types of busses using a different protocol. In some examples, two or more busses may be used to perform different functions, and/or may be used for redundancy. For example, a first bus may be used for collision avoidance functionality and a second bus may be used for actuation control. In any example, each bus may communicate with any of the components of the vehicle, and two or more busses may communicate with the same components. In some examples, each SoC, each controller, and/or each computer within the vehicle may have access to the same input data (e.g., inputs from sensors of the vehicle), and may be connected to a common bus, such as the CAN bus.

The vehicle may include one or more controller(s), such as those described herein. The controller(s) may be used for a variety of functions. The controller(s) may be coupled to any of the various other components and systems of the vehicle and may be used for control of the vehicle, artificial intelligence of the vehicle, infotainment for the vehicle, and/or the like.

The vehicle may include a system(s) on a chip (SoC). The SoC may include CPU(s), GPU(s), processor(s), cache(s), accelerator(s), data store(s), and/or other components and features not illustrated. The SoC(s) may be used to control the vehicle in a variety of platforms and systems. For example, the SoC(s) may be combined in a system (e.g., the system of the vehicle) with an HD map which may obtain map refreshes and/or updates via a network interface from one or more servers.

The CPU(s) may include a CPU cluster or CPU complex (alternatively referred to herein as a “CCPLEX”). The CPU(s) may include multiple cores and/or L2 caches. For example, in some embodiments, the CPU(s) may include eight cores in a coherent multi-processor configuration. In some embodiments, the CPU(s) may include four dual-core clusters where each cluster has a dedicated L2 cache (e.g., a 2 MB L2 cache). The CPU(s) (e.g., the CCPLEX) may be configured to support simultaneous cluster operation enabling any combination of the clusters of the CPU(s) to be active at any given time.

The CPU(s) may implement power management capabilities that include one or more of the following features: individual hardware blocks may be clock-gated automatically when idle to save dynamic power; each core clock may be gated when the core is not actively executing instructions due to execution of WFI/WFE instructions; each core may be independently power-gated; each core cluster may be independently clock-gated when all cores are clock-gated or power-gated; and/or each core cluster may be independently power-gated when all cores are power-gated. The CPU(s) may further implement an enhanced algorithm for managing power states, where allowed power states and expected wakeup times are specified, and the hardware/microcode determines the best power state to enter for the core, cluster, and CCPLEX. The processing cores may support simplified power state entry sequences in software with the work offloaded to microcode.
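The cluster-level gating conditions above can be written as two simple predicates, sketched below. The power-state selection function is a separate illustration: the disclosure says only that the hardware/microcode determines the best state from the allowed states and the expected wakeup time, so the rule used here (lowest-power allowed state whose exit latency fits within the expected wakeup time) and the example state table are assumptions, not the patented algorithm.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class CoreState(Enum):
    ACTIVE = "active"
    CLOCK_GATED = "clock_gated"
    POWER_GATED = "power_gated"


def cluster_may_clock_gate(cores: Iterable[CoreState]) -> bool:
    """A cluster may be clock-gated when every core is clock-gated or power-gated."""
    return all(c in (CoreState.CLOCK_GATED, CoreState.POWER_GATED) for c in cores)


def cluster_may_power_gate(cores: Iterable[CoreState]) -> bool:
    """A cluster may be power-gated only when every core is power-gated."""
    return all(c is CoreState.POWER_GATED for c in cores)


# (state name, exit latency in microseconds, relative power draw); hypothetical values.
PowerState = Tuple[str, float, float]


def pick_power_state(allowed: List[PowerState], expected_wakeup_us: float) -> Optional[str]:
    """Assumed heuristic: choose the lowest-power allowed state that can be
    exited within the expected wakeup time; return None if none qualifies."""
    candidates = [s for s in allowed if s[1] <= expected_wakeup_us]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s[2])[0]


if __name__ == "__main__":
    dual_core = [CoreState.CLOCK_GATED, CoreState.POWER_GATED]
    print(cluster_may_clock_gate(dual_core), cluster_may_power_gate(dual_core))
    states = [("clock_gated", 1.0, 0.4), ("power_gated", 100.0, 0.05)]
    print(pick_power_state(states, expected_wakeup_us=50.0))
```

In the usage example, a dual-core cluster with one clock-gated and one power-gated core qualifies for cluster clock gating but not cluster power gating, matching the two conditions in the text.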

The GPU(s) may include an integrated GPU (alternatively referred to herein as an “iGPU”). The GPU(s) may be programmable and may be efficient for parallel workloads. The GPU(s), in some examples, may use an enhanced tensor instruction set. The GPU(s) may include one or more streaming microprocessors, where each streaming microprocessor may include an L1 cache (e.g., an L1 cache with at least 96 KB storage capacity), and two or more of the streaming microprocessors may share an L2 cache (e.g., an L2 cache with a 512 KB storage capacity). In some embodiments, the GPU(s) may include at least eight streaming microprocessors. The GPU(s) may use computer-based application programming interface(s) (API(s)). In addition, the GPU(s) may use one or more parallel computing platforms and/or programming models (e.g., NVIDIA's CUDA).

The GPU(s) may be power-optimized for best performance in automotive and embedded use cases. For example, the GPU(s) may be fabricated on a Fin field-effect transistor (FinFET). However, this is not intended to be limiting, and the GPU(s) may be fabricated using other semiconductor manufacturing processes. Each streaming microprocessor may incorporate a number of mixed-precision processing cores partitioned into multiple blocks. For example, and without limitation, 64 FP32 cores and 32 FP64 cores may be partitioned into four processing blocks. In such an example, each processing block may be allocated 16 FP32 cores, 8 FP64 cores, 16 INT32 cores, two mixed-precision NVIDIA TENSOR COREs for deep learning matrix arithmetic, an L0 instruction cache, a warp scheduler, a dispatch unit, and/or a 64 KB register file. In addition, the streaming microprocessors may include independent parallel integer and floating-point data paths to provide for efficient execution of workloads with a mix of computation and addressing calculations. The streaming microprocessors may include independent thread-scheduling capability to enable finer-grain synchronization and cooperation between parallel threads. The streaming microprocessors may include a combined L1 data cache and shared memory unit in order to improve performance while simplifying programming.
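The per-block figures in the example above follow from dividing each core pool evenly across the four processing blocks. The short check below reproduces that arithmetic; the helper name `cores_per_block` is hypothetical, and an even split is assumed because the per-block counts in the text are identical for every block.

```python
def cores_per_block(total_cores: int, num_blocks: int) -> int:
    """Evenly divide a pool of cores across processing blocks.

    Raises ValueError if the pool does not divide evenly, since the
    example in the text gives every block the same allocation.
    """
    if num_blocks <= 0 or total_cores % num_blocks != 0:
        raise ValueError("cores must divide evenly across a positive number of blocks")
    return total_cores // num_blocks


if __name__ == "__main__":
    print(cores_per_block(64, 4))  # single-precision floating-point cores per block
    print(cores_per_block(32, 4))  # double-precision floating-point cores per block
```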

Patent Metadata

Filing Date

Unknown

Publication Date

October 2, 2025

Inventors

Unknown
