8856371

Video conferencing over IP networks

Published: October 7, 2014
Assignee: not available in the USPTO data we have
Patent Claims
33 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for communication, comprising: establishing communication links over a packet network between a server and plurality of client computers that are to participate in a video teleconference; receiving at the server from the client computers uplink audio packets and uplink video packets, which respectively contain audio and video data captured by each of the client computers; mixing the audio data from the uplink audio packets at the server so as to create respective streams of mixed audio data for transmission to the client computers; transmitting from the server to the client computers downlink audio packets containing the respective streams of mixed audio data; relaying the video data from the server to the client computers in downlink video packets; analyzing on the server relative time differences between mixed audio data and each relayed video data stream; generating at least one corresponding synchronization packet based on the analyzed relative time differences between mixed audio data and each relayed video data stream; and, transmitting from the server to the client computers the at least one corresponding synchronization packet containing synchronization information for synchronizing the relayed video data in the downlink video packets with the downlink audio packets containing the respective streams of mixed audio data in addition to transmitting the downlink video packets and the downlink audio packets.

Plain English Translation

A video conferencing system uses a server to manage audio and video streams between multiple clients over a packet network (like the internet). Each client sends its audio and video to the server in separate packets. The server mixes the audio from all clients into a single audio stream for each client. The server also relays video from each client to the other clients. Crucially, the server analyzes timing differences between the mixed audio and relayed video for each client, creates synchronization packets containing timing information, and sends these synchronization packets to each client to help them align the audio and video streams, ensuring proper lip-sync and overall synchronization.
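As a rough illustration of the server-side steps in claim 1, the Python sketch below mixes 16-bit audio samples for each recipient and computes one relative time difference per relayed video stream. The claim does not say how the mix is formed or how timing is encoded, so the mix-minus policy (leaving a client's own audio out of its mix), the millisecond timestamps, and the names `mix_for` and `build_sync_packet` are all assumptions made for illustration.

```python
from typing import Dict, List

def mix_for(recipient: str, uplink_audio: Dict[str, List[int]]) -> List[int]:
    """Mix every other client's samples for one recipient (assumed mix-minus)."""
    sources = [s for cid, s in uplink_audio.items() if cid != recipient]
    if not sources:
        return []
    length = min(len(s) for s in sources)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in sources)
        # Clamp to the signed 16-bit range so loud speakers do not overflow.
        mixed.append(max(-32768, min(32767, total)))
    return mixed

def build_sync_packet(mixed_audio_ts_ms: int, video_ts_ms: Dict[str, int]) -> Dict[str, int]:
    """Relative time difference (video minus mixed audio) per relayed video stream."""
    return {cid: ts - mixed_audio_ts_ms for cid, ts in video_ts_ms.items()}

# Hypothetical data: three clients, two samples each.
uplink = {"a": [100, 200], "b": [10, 20], "c": [30000, 30000]}
mix_a = mix_for("a", uplink)                       # audio of b and c only
sync = build_sync_packet(1000, {"a": 1040, "b": 990})
```

In this sketch the synchronization packet is just a mapping from video source to offset; a real wire format would be a separate design decision.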

Claim 2

Original Legal Text

2. The method according to claim 1 , wherein establishing the communication links comprises establishing respective first and second communication links between first and second client computers and a server over the packet network using different, respective first and second transport layer protocols.

Plain English Translation

In the video conferencing system, the server establishes connections with different clients using different transport layer protocols. For example, one client might connect using TCP, while another uses UDP. This allows the system to accommodate diverse network conditions and client capabilities, optimizing for latency or reliability as needed. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality described previously remains the same.

Claim 3

Original Legal Text

3. The method according to claim 1 , wherein receiving the uplink video packets comprises controlling a quality of the video data conveyed to the server by the client computers by transmitting instructions from the server to the client computers.

Plain English Translation

The video conferencing system can dynamically adjust the quality of the video sent by each client to the server. The server sends instructions to the clients, telling them to increase or decrease video quality. This allows the server to manage bandwidth consumption and maintain a smooth conferencing experience, even when network conditions fluctuate. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality described previously remains the same.

Claim 4

Original Legal Text

4. The method according to claim 3 , wherein transmitting the instructions comprises receiving messages from the client computers that are indicative of downlink bandwidth availability for transmission from the server to the client computers, and determining the quality of the video data responsively to the downlink bandwidth availability.

Plain English Translation

To determine the appropriate video quality, the server monitors the available bandwidth to each client. Clients send messages to the server indicating how much bandwidth they have available. The server then uses this information to instruct the clients to adjust their video quality accordingly. This feedback loop helps ensure that each client receives the best possible video quality without overloading its network connection. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality described previously remains the same.
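One possible way to turn the reported downlink bandwidth into a quality decision is sketched below in Python. It assumes every client receives the video of every other client plus one audio stream, and it caps each sender's video bitrate so that the most constrained receiver can still take all streams. The function name `uplink_bitrate_cap`, the kbps units, and the 64 kbps audio reservation are hypothetical; the claim only requires that quality be determined responsively to downlink bandwidth availability.

```python
from typing import Dict

def uplink_bitrate_cap(reported_downlink_kbps: Dict[str, int], audio_kbps: int = 64) -> int:
    """Per-sender video bitrate that fits the tightest reported downlink."""
    n = len(reported_downlink_kbps)
    if n < 2:
        raise ValueError("need at least two clients")
    tightest = min(reported_downlink_kbps.values())
    # Reserve room for the mixed audio stream, then split the rest
    # across the (n - 1) video streams each client receives.
    budget = max(0, tightest - audio_kbps)
    return budget // (n - 1)

# Hypothetical reports from three clients.
cap = uplink_bitrate_cap({"a": 2064, "b": 1064, "c": 4000})
```

A uniform cap is the simplest policy; a server could equally assign different caps per sender.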

Claim 5

Original Legal Text

5. The method according to claim 4 , wherein receiving the messages comprises detecting, at one of the client computers, a delay in receiving one or more of the downlink audio and video packets, and informing the server of the delay.

Plain English Translation

Clients detect delays in receiving audio or video packets from the server. If a client experiences a delay, it sends a message to the server informing it of the problem. This provides the server with real-time information about network conditions and allows it to take corrective action, such as reducing video quality. The core video conferencing architecture, audio mixing, video relaying, quality adjustment and sync packet functionality described previously remains the same.
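The client-side detection step might look like the following Python sketch. It assumes each downlink packet carries a sender timestamp and compares every packet's transit time with that of the first packet, so a fixed clock offset between server and client cancels out. The 150 ms threshold, the report format, and the name `detect_delay` are assumptions, not details from the claim.

```python
from typing import List, Optional, Tuple

def detect_delay(packets: List[Tuple[int, int]], threshold_ms: int = 150) -> Optional[int]:
    """packets holds (send_ts_ms, arrival_ts_ms) pairs in arrival order.

    Returns the worst growth in transit time relative to the first packet
    if it exceeds the threshold, otherwise None.
    """
    if not packets:
        return None
    baseline = packets[0][1] - packets[0][0]
    worst = max((arrival - sent) - baseline for sent, arrival in packets)
    return worst if worst > threshold_ms else None

def delay_report(client_id: str, delay_ms: int) -> dict:
    """Hypothetical message the client would send to the server."""
    return {"type": "delay_report", "client": client_id, "delay_ms": delay_ms}

delay = detect_delay([(0, 50), (20, 70), (40, 300)])
report = delay_report("a", delay) if delay is not None else None
```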

Claim 6

Original Legal Text

6. The method according to claim 5 , wherein transmitting the instructions comprises instructing the clients to reduce the quality of the video data transmitted in the uplink video packets responsively to detecting the delay at the one of the clients.

Plain English Translation

If a client reports a delay in receiving audio or video, the server instructs the clients to reduce the quality of the video they send in their uplink packets. Because the server relays that video downstream, lower uplink quality can help relieve congestion at the delayed client and improve the conferencing experience for all participants. The core video conferencing architecture, audio mixing, video relaying, client-reported delay, quality adjustment and sync packet functionality described previously remain the same.

Claim 7

Original Legal Text

7. The method according to claim 3 , wherein controlling the quality comprises instructing the client computers to increase or decrease at least one quality parameter selected from a group of quality parameters consisting of an image resolution, a degree of image compression, a frame rate and a bandwidth.

Plain English Translation

The server controls video quality by instructing clients to adjust various video parameters. These parameters include image resolution, the degree of image compression, the video frame rate, and the overall bandwidth used for video transmission. This allows fine-grained control over video quality and bandwidth consumption. The core video conferencing architecture, audio mixing, video relaying, quality adjustment based on specific parameters and sync packet functionality described previously remains the same.
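The four quality parameters named in claim 7 can be modeled as a small record that a client updates when an instruction arrives. The Python sketch below is one hypothetical shape for this: the `VideoQuality` fields, the step sizes, and the instruction form (a parameter name plus a direction of +1 or -1) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VideoQuality:
    width: int
    height: int
    compression: int      # assumed scale: higher means stronger compression
    frame_rate: int       # frames per second
    bandwidth_kbps: int

# Hypothetical step size for each adjustable parameter.
STEP = {"compression": 10, "frame_rate": 5, "bandwidth_kbps": 128}

def apply_instruction(q: VideoQuality, parameter: str, direction: int) -> VideoQuality:
    """Return a new setting with one parameter increased (+1) or decreased (-1)."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if parameter == "resolution":
        if direction > 0:
            return replace(q, width=q.width * 2, height=q.height * 2)
        return replace(q, width=max(1, q.width // 2), height=max(1, q.height // 2))
    step = STEP[parameter]
    # Never drop below one step, so a parameter cannot reach zero.
    value = max(step, getattr(q, parameter) + direction * step)
    return replace(q, **{parameter: value})

base = VideoQuality(640, 480, 30, 30, 512)
slower = apply_instruction(base, "frame_rate", -1)
smaller = apply_instruction(base, "resolution", -1)
```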

Claim 8

Original Legal Text

8. The method according to claim 1 , wherein receiving the uplink packets comprises detecting, at the server, a delay in the audio data, and eliminating an interval of silent audio data in order to compensate for the delay.

Plain English Translation

The server detects delays in the incoming audio streams from clients. To compensate for these delays, the server eliminates intervals of silence in the audio. This helps to keep the audio synchronized and improves the overall quality of the conference. The core video conferencing architecture, audio mixing, video relaying, delay compensation and sync packet functionality described previously remains the same.

Claim 9

Original Legal Text

9. The method according to claim 8 , wherein eliminating the interval comprises marking, at one or more of the client computers, at least one block of the audio data as a silent block, and eliminating the silent block from the mixed audio data.

Plain English Translation

To eliminate silent intervals in the audio, clients mark blocks of audio data as "silent." The server then removes these silent blocks from the mixed audio stream. This avoids transmitting unnecessary data and reduces the impact of audio delays. The core video conferencing architecture, audio mixing, video relaying, client-marked silence removal and sync packet functionality described previously remains the same.
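A minimal Python sketch of this silence-removal idea follows. It assumes each audio block arrives with a client-set `silent` flag and a known duration, and that the server drops flagged blocks only until the measured backlog has been absorbed. The `AudioBlock` type, the backlog measure, and the name `drop_silence` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AudioBlock:
    samples: List[int]
    silent: bool          # set by the client that captured the block
    duration_ms: int

def drop_silence(queue: List[AudioBlock], backlog_ms: int) -> Tuple[List[AudioBlock], int]:
    """Drop client-marked silent blocks until the backlog is absorbed.

    Returns the blocks to mix and the delay that is still uncompensated.
    """
    kept = []
    remaining = backlog_ms
    for block in queue:
        if block.silent and remaining > 0:
            remaining -= block.duration_ms
            continue
        kept.append(block)
    return kept, max(0, remaining)

voice = AudioBlock([5, -5], False, 20)
quiet = AudioBlock([0, 0], True, 20)
kept, left = drop_silence([voice, quiet, quiet, voice, quiet], backlog_ms=40)
```

Speech blocks are never dropped in this sketch, so the compensation only shortens pauses.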

Claim 10

Original Legal Text

10. The method according to claim 1 , wherein each of the downlink video packets contains the video data captured by a respective one of the client computers.

Plain English Translation

In the video conference, each downlink video packet carries the video captured by one particular client. The server relays each participant's video to the other clients as its own stream, allowing the participants to see each other. The core video conferencing architecture, audio mixing, video relaying of individual streams and sync packet functionality described previously remain the same.

Claim 11

Original Legal Text

11. The method according to claim 10 further comprising the steps of: receiving and synchronizing the video data with the mixed audio data at the client computers based on synchronization data from the received synchronization packet; and outputting the synchronized video and mixed audio data to a respective user of each of the client computers wherein outputting the synchronized video and mixed audio data comprises displaying the video data captured by the respective one of the client computers in a respective window among multiple windows displayed by each of the client computers.

Plain English Translation

Clients receive the video and mixed audio, synchronizing the two based on synchronization data from the server's sync packets. The synchronized audio and video are then output to the user. The video from each participant is displayed in a separate window on the client's screen. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality and individual video window display described previously remains the same.
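On the client, synchronization could be sketched in Python as follows. The example assumes the sync packet supplies, for each video source, an offset equal to the video timestamp minus the mixed-audio timestamp for the same instant, and that the audio playback clock drives display. For each participant window it picks the newest frame whose adjusted timestamp has been reached. The function name `frames_due` and the data shapes are illustrative assumptions.

```python
from typing import Dict, List

def frames_due(audio_clock_ms: int,
               sync_offsets_ms: Dict[str, int],
               pending: Dict[str, List[int]]) -> Dict[str, int]:
    """Pick, per participant window, the latest frame that should be on screen.

    pending maps a source client to its buffered frame timestamps, sorted ascending.
    """
    due = {}
    for source, frames in pending.items():
        offset = sync_offsets_ms.get(source, 0)
        # Shift each frame onto the audio timeline before comparing.
        ready = [ts for ts in frames if ts - offset <= audio_clock_ms]
        if ready:
            due[source] = ready[-1]
    return due

shown = frames_due(
    audio_clock_ms=1000,
    sync_offsets_ms={"a": 40, "b": -10},
    pending={"a": [1000, 1033, 1066], "b": [957, 990, 1023]},
)
```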

Claim 12

Original Legal Text

12. The method according to claim 11 , wherein synchronizing the video data comprises controlling, at the client computer, the multiple windows so that the video data conveyed from each of the client computers are synchronized with the mixed audio data.

Plain English Translation

The client software controls the multiple video windows so that the video from each participant is synchronized with the mixed audio, based on the synchronization data in the sync packets. This ensures that the audio and video are properly aligned for each participant. The core video conferencing architecture, audio mixing, video relaying, synchronized window control and sync packet functionality described previously remain the same.

Claim 13

Original Legal Text

13. The method according to claim 10 , wherein relaying the video data comprises passing the video data from the uplink video packets to the downlink video packets without transcoding of the video data at the server.

Plain English Translation

The server relays the video data without transcoding it. The video data from each client is passed directly to the other clients without being re-encoded or modified by the server, reducing server processing load and latency. The core video conferencing architecture, audio mixing, video relaying without transcoding and sync packet functionality described previously remains the same.
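Relaying without transcoding can be pictured as copying the encoded payload untouched while only the per-recipient packet header changes. The Python sketch below assumes a simple `VideoPacket` record and per-recipient sequence counters, and it skips sending a client its own video; all of these are illustrative assumptions rather than details from the claim.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass(frozen=True)
class VideoPacket:
    source_id: str   # client that captured the video
    seq: int
    payload: bytes   # encoded video, treated as opaque by the server

def relay(uplink: VideoPacket,
          recipients: Iterable[str],
          next_seq: Dict[str, int]) -> Dict[str, VideoPacket]:
    """Build one downlink packet per recipient, reusing the payload as-is."""
    out = {}
    for recipient in recipients:
        if recipient == uplink.source_id:
            continue  # assumed: a client is not sent its own video
        seq = next_seq.get(recipient, 0)
        next_seq[recipient] = seq + 1
        # The payload is passed through unchanged: no decode, no re-encode.
        out[recipient] = VideoPacket(uplink.source_id, seq, uplink.payload)
    return out

counters: Dict[str, int] = {}
pkt = VideoPacket("a", 7, b"\x00\x01\x02")
downlink = relay(pkt, ["a", "b", "c"], counters)
```

Each downlink packet here still carries video from a single source, which matches the per-client packets described in claim 10.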

Claim 14

Original Legal Text

14. The method according to claim 1 , wherein receiving the uplink audio and video packets comprises receiving at the server synchronization data from each of the client computers, and comprising generating synchronization information at the server based on the synchronization data, and transmitting the synchronization information from the server to the client computers for use in synchronizing the video data with the mixed audio data.

Plain English Translation

Clients send synchronization data to the server along with their audio and video. The server uses this data to generate synchronization information which the server sends back to the clients. The clients then use this synchronization information to synchronize the received video with the mixed audio, ensuring lip-sync. The core video conferencing architecture, audio mixing, video relaying, server sync data generation and sync packet functionality described previously remains the same.

Claim 15

Original Legal Text

15. The method according to claim 1 , wherein the plurality of client computers comprises at least three client computers that participate in the video teleconference.

Plain English Translation

The video conferencing system supports teleconferences with at least three participants. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality described previously remain the same.

Claim 16

Original Legal Text

16. A communication apparatus, comprising: a plurality of client computers, which are connected to communicate over a packet network and are configured to capture audio and video data and to transmit over the packet network uplink audio packets and uplink video packets, which respectively contain the audio and video data; and a server, which is coupled to establish communication links over the packet network with the client computers that are to participate in a video teleconference and to receive the uplink audio packets and uplink video packets over the communication links; wherein the server is configured to mix the audio data from the uplink audio packets so as to create respective streams of mixed audio data for transmission to the client computers, and to generate at least one synchronization packet containing synchronization information, and to transmit to the client computers downlink audio packets containing the respective streams of mixed audio data in addition to the at least one synchronization packet while relaying the video data from the uplink video packets to the client computers in downlink video packets, wherein the client computers are configured to synchronize the video data with the mixed audio data based on the synchronization information, and to output the synchronized video and mixed audio data to a respective user of each of the client computers.

Plain English Translation

A video conferencing apparatus consists of client computers and a central server. The client computers capture audio and video and send them to the server. The server mixes the audio streams and relays the video streams to the clients. The server generates sync packets that allow each client to synchronize the received video with the mixed audio, ensuring proper audio-visual alignment when presenting to the users. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality are implemented in the server/client apparatus.

Claim 17

Original Legal Text

17. The apparatus according to claim 16 , wherein the server is configured to establish respective first and second communication links with first and second client computers over the packet network using different, respective first and second transport layer protocols.

Plain English Translation

In the video conferencing apparatus, the server can establish communication links with different clients using different transport layer protocols. For example, one client might connect using TCP, while another uses UDP. This allows the system to accommodate diverse network conditions and client capabilities, optimizing for latency or reliability as needed. The core video conferencing architecture, audio mixing, video relaying and sync packet functionality described previously remains the same in the server/client apparatus.

Claim 18

Original Legal Text

18. The apparatus according to claim 16 , wherein the server is configured to control a quality of the video data conveyed to the server by the client computers by transmitting instructions to the client computers.

Plain English Translation

The server in the video conferencing apparatus can control the quality of the video sent by the client computers. The server sends instructions to the clients, telling them to increase or decrease video quality. This allows the server to manage bandwidth consumption and maintain a smooth conferencing experience, even when network conditions fluctuate. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality and video quality adjustment described previously remains the same in the server/client apparatus.

Claim 19

Original Legal Text

19. The apparatus according to claim 18 , wherein the server is coupled to receive messages from the client computers that are indicative of downlink bandwidth availability for transmission from the server to the client computers, and to determine the quality of the video data responsively to the downlink bandwidth availability.

Plain English Translation

In the video conferencing apparatus, the server determines video quality based on available bandwidth. Client computers send messages to the server indicating how much bandwidth they have available. The server then uses this information to instruct the clients to adjust their video quality accordingly. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, video quality adjustment and bandwidth monitoring described previously remains the same in the server/client apparatus.

Claim 20

Original Legal Text

20. The apparatus according to claim 19 , wherein the client computers are configured to determine the downlink bandwidth availability by detecting a delay in receiving one or more of the downlink audio and video packets, and to inform the server of the delay.

Plain English Translation

The client computers in the video conferencing apparatus determine bandwidth availability by detecting delays in receiving audio or video packets from the server. If a client experiences a delay, it sends a message to the server informing it of the problem. This provides the server with real-time information about network conditions and allows it to take corrective action, such as reducing video quality. The core video conferencing architecture, audio mixing, video relaying, client-side delay detection, sync packet functionality, video quality adjustment and bandwidth monitoring described previously remains the same in the server/client apparatus.

Claim 21

Original Legal Text

21. The apparatus according to claim 20 , wherein the server is configured to instruct the clients to reduce the quality of the video data transmitted in the uplink video packets responsively to detecting the delay at one or more of the clients.

Plain English Translation

In the video conferencing apparatus, the server instructs the clients to reduce the quality of their uplink video when a delay is detected at one or more of the clients, which helps to alleviate congestion and improve the overall experience. The core video conferencing architecture, audio mixing, video relaying, client-side delay detection, server-side video quality control and sync packet functionality described previously remain the same in the server/client apparatus.

Claim 22

Original Legal Text

22. The apparatus according to claim 18 , wherein the client computers are configured to control the quality by increasing or decreasing at least one quality parameter selected from a group of quality parameters consisting of an image resolution, a degree of image compression, a frame rate and a bandwidth.

Plain English Translation

The client computers in the video conferencing apparatus control video quality by adjusting parameters like image resolution, compression, frame rate, and bandwidth. This provides fine-grained control over video quality and bandwidth consumption. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality and client-side video parameter control described previously remains the same in the server/client apparatus.

Claim 23

Original Legal Text

23. The apparatus according to claim 16 , wherein the server is configured to detect a delay in the audio data, and to eliminate an interval of silent audio data in order to compensate for the delay.

Plain English Translation

The server in the video conferencing apparatus detects delays in the audio data. To compensate, it eliminates silent intervals, helping keep audio synchronized and improve the conference quality. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, server-side delay detection and silence removal described previously remains the same in the server/client apparatus.

Claim 24

Original Legal Text

24. The apparatus according to claim 23 , wherein the client computers are configured to mark at least one block of the audio data as a silent block, and wherein the conference is configured to eliminate the silent block from the mixed audio data.

Plain English Translation

In the video conferencing apparatus, the client computers mark blocks of audio data as silent, allowing the server to remove them, reducing unnecessary data and minimizing delay impacts. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, server-side delay detection, client-side silence marking and server-side silence removal described previously remains the same in the server/client apparatus.

Claim 25

Original Legal Text

25. The apparatus according to claim 16 , wherein each of the downlink video packets contains the video data captured by a respective one of the client computers.

Plain English Translation

In the video conferencing apparatus, each downlink video packet carries the video captured by one particular client. The server relays each individual video feed separately, allowing the participants to see each other. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, and individual video streams described previously remain the same in the server/client apparatus.

Claim 26

Original Legal Text

26. The apparatus according to claim 25 , wherein the client computers are configured to display the video data captured by the respective one of the client computers in a respective window among multiple windows displayed by each of the client computers.

Plain English Translation

The client computers in the video conferencing apparatus display each participant's video in a separate window on their screen, providing a multi-view conference experience. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, individual video streams and multi-window display described previously remains the same in the server/client apparatus.

Claim 27

Original Legal Text

27. The apparatus according to claim 26 , wherein the client computers are configured to control the multiple windows so that the video data conveyed from each of the client computers are synchronized with the mixed audio data.

Plain English Translation

The client computers in the video conferencing apparatus control the multiple windows to ensure video is synchronized with audio, using timing information, for proper audio-visual alignment. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality, individual video streams, multi-window display and synchronized display described previously remains the same in the server/client apparatus.

Claim 28

Original Legal Text

28. The apparatus according to claim 25 , wherein the server is configured to pass the video data from the uplink video packets to the downlink video packets without transcoding of the video data at the server.

Plain English Translation

The server relays the video data without transcoding it, passing the original stream directly to clients. This reduces server processing and latency. The core video conferencing architecture, audio mixing, video relaying without transcoding, sync packet functionality and individual video streams described previously remain the same in the server/client apparatus.

Claim 29

Original Legal Text

29. The apparatus according to claim 16 , the client computers are configured to transmit synchronization data to the server, and wherein the server is configured to generate the synchronization information based on the synchronization data, and to transmit the synchronization information to the client computers for use in synchronizing the video data with the mixed audio data.

Plain English Translation

In the video conferencing apparatus, client computers send synchronization data to the server, which uses it to generate synchronization information sent back to clients to synchronize video and audio streams, achieving proper lip-sync. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality and server-side sync data generation based on client data described previously remains the same in the server/client apparatus.

Claim 30

Original Legal Text

30. The apparatus according to claim 16 , wherein the plurality of client computers comprises at least three client computers that participate in the video teleconference.

Plain English Translation

The video conferencing apparatus supports teleconferences with at least three client computers, allowing for multi-party video communication. The core video conferencing architecture, audio mixing, video relaying, sync packet functionality and multiple participants (3 or more) described previously remains the same in the server/client apparatus.

Claim 31

Original Legal Text

31. A server, comprising: a network interface, which is coupled to establish communication links over a packet network with a plurality of client computers that are to participate in a video teleconference, and to receive from the client computers uplink audio packets and uplink video packets, which respectively contain audio and video data captured by each of the client computers; and a processor, which is configured to mix the audio data from the uplink audio packets so as to create respective streams of mixed audio data for transmission to the client computers and to generate at least one synchronization packet containing synchronization information, and to transmit to the client computers via the network interface downlink audio packets containing the respective streams of mixed audio data in addition to the at least one synchronization packet while relaying the video data from the uplink video packets to the client computers in downlink video packets.

Plain English Translation

A video conferencing server includes a network interface and processor. The network interface establishes communication with client computers and receives audio/video data. The processor mixes audio streams, generates synchronization packets, and transmits mixed audio and relayed video data to the clients using the network interface. This server performs audio mixing, video relaying and sync packet functionality.

Claim 32

Original Legal Text

32. A physical computer program product comprising a computer-readable medium having executable computer-readable program code embodied therein, the executable computer-readable program code for implementing a method for communication, the method comprising: establishing communication links over a packet network between a server and a plurality of client computers that are to participate in a video teleconference; receiving from the client computers uplink audio packets and uplink video packets, which respectively contain audio and video data captured by each of the client computers; mixing the audio data from the uplink audio packets so as to create respective streams of mixed audio data for transmission to the client computers; generating at least one synchronization packet containing synchronization information; and, transmitting to the client computers via the network interface downlink audio packets containing the respective streams of mixed audio data in addition to the at least one synchronization packet, while relaying the video data from the uplink video packets to the client computers in downlink video packets.

Plain English Translation

A computer program product stored on a computer-readable medium implements video conferencing. The program establishes connections between a server and client computers, receives audio/video data, mixes audio streams, generates sync packets, and transmits mixed audio and relayed video data to the clients. This software implements audio mixing, video relaying and sync packet functionality.

Claim 33

Original Legal Text

33. A client computer, comprising: a user interface; and a processor, which is configured to establish a communication link over a packet network with a server so as to participate in a video teleconference, and to transmit uplink audio packets and uplink video packets, which respectively contain audio and video data captured by the client computer, wherein the processor is configured to receive from the server downlink audio packets containing the a stream of mixed audio data generated by the server, to receive synchronization packets containing synchronization information generated by the server, and to receive downlink video packets containing the video data transmitted by other client computers in the video teleconference, and to synchronize the video data with the mixed audio data based on the synchronization information for output via the user interface.

Plain English Translation

A video conferencing client computer includes a user interface and processor. The processor connects to a server, transmits audio/video, receives mixed audio and video data, and synchronizes video with mixed audio based on received synchronization packets, for output via the user interface.

Patent Metadata

Filing Date

Unknown

Publication Date

October 7, 2014

Inventors

Eran Kariti
Sergey Pesherov
Vladislav Gelfer

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Video conferencing over IP networks” (8856371). https://patentable.app/patents/8856371

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/8856371. See llms.txt for full attribution policy.