Systems, apparatuses, and methods related to extended memory communication subsystems for performing extended memory operations are described. An example apparatus can include a plurality of computing devices. Each of the computing devices can include a processing unit configured to perform an operation on a block of data, and a memory array configured as a cache for each respective processing unit. The example apparatus can further include a first communication subsystem coupled to a host and to each of a plurality of second communication subsystems, which are in turn coupled to each of the plurality of computing devices. Each of the plurality of computing devices can be configured to receive a request from the host, send a command to at least one hardware accelerator to execute at least a portion of the operation, and receive a result of performing the operation from the at least one hardware accelerator.
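The request flow in the abstract (a host request, a command to an accelerator, a result returned and cached) can be modeled as a minimal Python sketch. The names `HardwareAccelerator`, `ComputingDevice`, and `host_dispatch`, the round-robin choice of computing device, and the use of a dictionary to stand in for the cache are all illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HardwareAccelerator:
    """Hypothetical stand-in for 'the at least one hardware accelerator'."""
    op: Callable[[bytes], bytes]

    def execute(self, block: bytes) -> bytes:
        return self.op(block)


@dataclass
class ComputingDevice:
    """Hypothetical computing device: a processing unit plus a cache."""
    accelerator: HardwareAccelerator
    cache: Dict[int, bytes] = field(default_factory=dict)  # models the memory array used as a cache

    def handle_request(self, block_id: int, block: bytes) -> bytes:
        # Send a command to the accelerator to execute the operation on the block.
        result = self.accelerator.execute(block)
        # Receive the result and keep a copy in the local cache.
        self.cache[block_id] = result
        return result


def host_dispatch(devices: List[ComputingDevice], blocks: List[bytes]) -> List[bytes]:
    """Host side: one request per block, devices chosen round-robin (an assumption)."""
    return [devices[i % len(devices)].handle_request(i, block) for i, block in enumerate(blocks)]


if __name__ == "__main__":
    devices = [ComputingDevice(HardwareAccelerator(bytes.upper)) for _ in range(2)]
    print(host_dispatch(devices, [b"ab", b"cd", b"ef"]))  # [b'AB', b'CD', b'EF']
```

Here `bytes.upper` is only a placeholder for whatever operation the accelerator actually performs; the point of the sketch is the order of the steps, not the operation itself.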
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
2. The apparatus of claim 1, wherein the plurality of second communication subsystems comprises a plurality of interconnect interfaces.
3. The apparatus of claim 1, wherein the first communication subsystem is a peripheral component interconnect express (PCIe) interface.
The invention relates to a communication apparatus that transfers data between a host system and a peripheral device. It addresses the need for efficient, reliable communication in computing systems that require high-speed data transfer. The apparatus includes a first communication subsystem that interfaces with the host system and a second communication subsystem that interfaces with the peripheral device. These subsystems convert data between communication protocols so that the two sides can exchange data. In this claim, the first communication subsystem is a peripheral component interconnect express (PCIe) interface, the high-speed serial expansion bus standard widely used to connect peripheral devices in computing systems. PCIe provides low-latency, high-bandwidth links, which suits applications that require rapid data transfer. The apparatus may also include a controller that manages data conversion and transfer, keeping the host system and the peripheral device compatible. The overall design aims to make data transfer efficient while maintaining system stability and performance.
4. The apparatus of claim 1, wherein a particular one of the plurality of second communication subsystems is a controller and the controller is coupled to a memory device.
5. The apparatus of claim 4, wherein the memory device comprises at least one of a double data rate (DDR) memory, a three-dimensional (3D) cross-point memory, a NAND memory, or any combination thereof.
6. The apparatus of claim 1, wherein the accelerator is on-chip and is coupled to a static random access device (SRAM).
7. The apparatus of claim 1, wherein the accelerator is on-chip and is coupled to an arithmetic logic unit (ALU) configured to perform an arithmetic operation or a logical operation, or both.
8. The apparatus of claim 1, wherein the processing unit of each of the plurality of computing devices is configured with a reduced instruction set architecture.
10. The apparatus of claim 1, wherein each of the plurality of computing devices is configured as reduced instruction set computer (RISC) compliant.
11. The apparatus of claim 1, wherein the at least one hardware accelerator is configured to perform the extended memory operation by accessing a non-volatile memory device coupled to the plurality of second communication subsystems.
12. The apparatus of claim 1, wherein the at least one hardware accelerator is configured to send a request for an additional hardware accelerator to perform an additional portion of the extended memory operation.
The invention relates to a computing system with hardware accelerators for extended memory operations. It addresses memory operations whose size or performance demands exceed what a single hardware accelerator can handle. The apparatus includes at least one hardware accelerator that performs memory-intensive work on behalf of the computing devices. When an operation exceeds the capability of that accelerator, the accelerator is configured to request an additional hardware accelerator to perform a portion of the extended memory operation. Splitting the work this way keeps any single accelerator from becoming a bottleneck. The system may include multiple hardware accelerators, each capable of specialized memory tasks such as data compression, encryption, or caching. The apparatus also coordinates communication between the accelerators and the computing devices. This design improves scalability and performance in systems that require high-speed memory access and processing.
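The hand-off in claim 12 can be sketched in Python under two loudly stated assumptions: each accelerator has a fixed per-request capacity in bytes, and the operation applies independently to each byte range, so partial results can simply be concatenated. The `Accelerator` class, its `capacity` parameter, and the ordered pool of helper accelerators are hypothetical.

```python
from typing import Callable, List


class Accelerator:
    """Hypothetical accelerator that processes at most `capacity` bytes per request."""

    def __init__(self, name: str, capacity: int, op: Callable[[bytes], bytes]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.name = name
        self.capacity = capacity
        self.op = op  # assumed to be applicable to any byte range independently
        self.handled: List[int] = []  # sizes of the portions this accelerator processed

    def perform(self, data: bytes, pool: List["Accelerator"]) -> bytes:
        """Process the portion that fits, then request a helper for the rest."""
        portion, rest = data[: self.capacity], data[self.capacity :]
        self.handled.append(len(portion))
        result = self.op(portion)
        if not rest:
            return result
        if not pool:
            raise RuntimeError(f"{self.name}: no additional accelerator for remaining {len(rest)} bytes")
        # Request an additional accelerator for the additional portion;
        # it may in turn delegate to the accelerators after it in the pool.
        helper, remaining = pool[0], pool[1:]
        return result + helper.perform(rest, remaining)


if __name__ == "__main__":
    a0 = Accelerator("a0", 4, bytes.upper)
    a1 = Accelerator("a1", 4, bytes.upper)
    a2 = Accelerator("a2", 8, bytes.upper)
    print(a0.perform(b"abcdefghij", [a1, a2]))  # b'ABCDEFGHIJ'
    print(a0.handled, a1.handled, a2.handled)   # [4] [4] [2]
```

Passing the shrinking pool down the chain is one simple way to guarantee that the delegation terminates; a real design might instead select helpers by load or locality.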
14. The system of claim 13, wherein the plurality of computing devices, the first communication subsystem, and the second plurality of communication subsystems are configured on a field programmable gate array (FPGA) and the non-volatile memory device is external to the FPGA.
15. The system of claim 13, wherein the plurality of computing devices are each configured as a reduced instruction set computer (RISC)-V compliant.
A system for distributed computing includes multiple computing devices that are interconnected to perform processing tasks in parallel. It addresses scalability, efficiency, and compatibility challenges in distributed computing. Each computing device executes tasks independently while coordinating with the others toward a common computational goal, and the devices share data and distribute tasks over the system's communication subsystems. The system may include a central controller to allocate tasks, monitor performance, and synchronize the devices, and it may incorporate fault-tolerance mechanisms so that operation continues if a device or link fails. In this claim, each computing device is RISC-V compliant, meaning it implements the open-standard RISC-V instruction set architecture. Because RISC-V is an open standard with optional extensions, implementers can tailor the instruction set to specific workloads while code written for the base instruction set remains compatible across compliant implementations. The overall design aims to provide an efficient, adaptable platform for high-performance computing, data processing, and other computationally intensive applications.
18. The system of claim 17, wherein the peripheral port is coupled to an off-chip serial port through the first communication subsystem and through at least one of the second plurality of communication subsystems.
A system for managing data communication in an integrated circuit includes multiple communication subsystems that carry different types of data traffic. It addresses the challenge of routing data between on-chip components and off-chip devices with low latency and little congestion. A peripheral port connects to an off-chip serial port through the first communication subsystem and at least one of the second communication subsystems: data passes through the first communication subsystem and is then routed by a second communication subsystem to the off-chip serial port. This arrangement allows flexible, scalable routing between internal and external components. Other communication subsystems may carry other traffic, such as memory accesses or inter-processor communication, so that no single path becomes a bottleneck. The result is more reliable and faster data movement in data-intensive applications.
19. The system of claim 13, wherein the first communication subsystem is directly coupled to at least one of the second plurality of communication subsystems.
A system for managing communication between multiple subsystems addresses the challenge of exchanging data efficiently and reliably among separate components. The system includes a first communication subsystem and a second plurality of communication subsystems, each able to transmit and receive data. The first communication subsystem is directly coupled to at least one of the second communication subsystems, so data can move between them without passing through intermediate components such as the computing devices. Removing those extra hops lowers latency and improves communication efficiency. The system may also include a control module that coordinates data flow, synchronization, and error handling across the subsystems. In the related claims, the direct coupling is an on-chip interconnect such as an AXI interconnect, and the configuration is useful wherever low-latency communication matters, for example in distributed computing. The system may further include redundancy mechanisms to preserve communication if a subsystem fails. By shortening the communication paths, the system improves overall performance and reliability.
20. The system of claim 19, wherein the at least one of the second plurality of communication subsystem is configured to transfer the block of data from the non-volatile memory device to the first communication subsystem and to the host, wherein the transfer of the block of data bypasses the plurality of computing devices.
21. The system of claim 19, wherein an AXI interconnect that directly couples the first communication subsystem to at least one of the second plurality of communication subsystem is a faster AXI interconnect than an AXI interconnect that couples the plurality of computing devices to the first communication subsystem and to the at least one of the second plurality of communication subsystems.
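Claim 21 compares AXI interconnect speeds without giving figures. As a worked example under stated assumptions, theoretical peak AXI bandwidth can be estimated as data-bus width in bytes multiplied by clock frequency, assuming one full-width data beat per cycle and ignoring handshake and burst overhead. The bus widths, clock rates, and block size below are hypothetical.

```python
def peak_bandwidth_bytes_per_s(data_width_bits: int, clock_hz: float) -> float:
    """Theoretical peak: one full-width data beat per clock cycle, no protocol overhead."""
    if data_width_bits <= 0 or data_width_bits % 8:
        raise ValueError("data width must be a positive whole number of bytes")
    return (data_width_bits // 8) * clock_hz


def transfer_time_s(block_bytes: int, data_width_bits: int, clock_hz: float) -> float:
    """Lower-bound time to move a block over one link at its theoretical peak."""
    return block_bytes / peak_bandwidth_bytes_per_s(data_width_bits, clock_hz)


if __name__ == "__main__":
    block = 1 << 20  # 1 MiB block (hypothetical size)
    direct = transfer_time_s(block, 512, 250e6)  # hypothetical direct first-to-second link
    shared = transfer_time_s(block, 64, 100e6)   # hypothetical link through the computing devices
    print(f"direct: {direct * 1e6:.1f} us, shared: {shared * 1e6:.1f} us, speedup: {shared / direct:.0f}x")
```

With these assumed figures the direct link peaks at 16 GB/s and the shared link at 0.8 GB/s, so the 1 MiB block moves about 20 times faster over the direct path. Real throughput will be lower on both links once protocol overhead is included.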
23. The method of claim 22, wherein the reduced size block of data is transferred to the host via a PCIe interface coupled to the first communication subsystem.
24. The method of claim 22, further comprising causing, using a memory controller, the block of data to be transferred from the memory device to the second communication subsystem and subsequently to the first communication subsystem, wherein the block of data bypasses the plurality of computing devices.
This invention relates to data transfer in computing systems, specifically the inefficiency of moving data between memory devices and communication subsystems. When data must pass through multiple computing devices during a transfer, it consumes their processing resources and adds latency. The claimed method avoids this by bypassing the computing devices: a memory controller transfers a block of data from a memory device to a second communication subsystem, which forwards it to a first communication subsystem. Because the block never traverses the computing devices, processing overhead drops and transfer efficiency improves. The invention may also include related features, such as choosing a transfer path based on system conditions or prioritizing certain transfers. This approach is particularly useful in high-performance computing environments where minimizing latency and maximizing throughput are critical.
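The bypass route in claims 20 and 24 can be illustrated as a path search over a small topology graph. The node names and edges are hypothetical and reflect only the couplings named in the claims: a memory device behind a memory controller, a second communication subsystem directly coupled to the first, the first coupled to the host, and computing devices attached to both subsystems. Breadth-first search is used here only to find a shortest route, optionally excluding the computing devices; it is not presented as the patent's routing mechanism.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical topology; node names are illustrative, not taken from the patent.
TOPOLOGY: Dict[str, Set[str]] = {
    "host": {"first_comm"},
    "first_comm": {"host", "second_comm", "computing_device_0", "computing_device_1"},
    "second_comm": {"first_comm", "memory_controller", "computing_device_0", "computing_device_1"},
    "computing_device_0": {"first_comm", "second_comm"},
    "computing_device_1": {"first_comm", "second_comm"},
    "memory_controller": {"second_comm", "memory_device"},
    "memory_device": {"memory_controller"},
}


def find_path(topology: Dict[str, Set[str]], src: str, dst: str,
              avoid_prefix: Optional[str] = None) -> Optional[List[str]]:
    """Breadth-first search for a shortest path, skipping nodes whose name starts with avoid_prefix."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(topology.get(node, ())):  # sorted for deterministic results
            if nxt in seen or (avoid_prefix and nxt.startswith(avoid_prefix)):
                continue
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no route satisfies the constraint


if __name__ == "__main__":
    print(find_path(TOPOLOGY, "memory_device", "host", avoid_prefix="computing_device"))
    # ['memory_device', 'memory_controller', 'second_comm', 'first_comm', 'host']
```

If the direct first-to-second coupling is removed from the graph, the bypass becomes impossible: the search returns `None` when computing devices are excluded and otherwise routes through `computing_device_0`. This is the dependency that claims 19 and 20 express.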
June 26, 2020
October 25, 2022