Legal claims defining the scope of protection, as filed with the USPTO.
1. A computing system, comprising: a memory; a slave computing device, comprising: a plurality of computing units, wherein each computing unit is configured to perform multiple computations in parallel according to a single instruction multiple data (SIMD) manner; and a first translation lookaside buffer (TLB), configured to store a plurality of virtual address entries; and a master computing device, comprising: a memory controller, configured to perform a read operation and a write operation to the memory; at least one processing unit, configured to access the memory via the memory controller to execute a program; and an input-output memory management unit (IOMMU), comprising a second translation lookaside buffer, configured to store a plurality of virtual address entries, wherein each virtual address entry is configured to store a virtual address requested by the slave computing device, a physical address corresponding to the virtual address, a recent use time and a dependent workload, wherein the virtual address is used in a specific instruction performed by a specific computing unit of the plurality of computing units, and the dependent workload of the virtual address is an amount of virtual address translations requested by the specific computing unit to perform the specific instruction; wherein: when the plurality of computing units access a first virtual address, the plurality of computing units transfer the first virtual address to the first translation lookaside buffer to obtain a first physical address corresponding to the first virtual address; when the first translation lookaside buffer does not store a virtual address entry comprising the first virtual address, the first translation lookaside buffer is configured to send a first translation request to the input-output memory management unit to obtain the first physical address corresponding to the first virtual address; when the input-output memory management unit receives the first translation request, and the second translation lookaside buffer does not store a virtual address entry comprising the first virtual address, the input-output memory management unit is configured to traverse a plurality of page tables of the memory controller to obtain the first physical address corresponding to the first virtual address, select a first virtual address entry from the plurality of virtual address entries according to a recent use time and a dependent workload of each of the plurality of virtual address entries, and clear the first virtual address entry to store the first virtual address and the first physical address.
2. The computing system of claim 1, wherein the first virtual address entry is a virtual address entry having a longest recent use time and a largest dependent workload among the plurality of virtual address entries.
3. The computing system of claim 1, wherein the input-output memory management unit further comprises: a translation request cache, configured to store a plurality of translation requests issued by the first translation lookaside buffer; and a page table walker, configured to, while being idle, select at least one translation request from the plurality of translation requests, and traverse the plurality of page tables of the memory controller to obtain at least one physical address corresponding to at least one virtual address of the translation request.
4. The computing system of claim 1, wherein the first translation lookaside buffer is further configured to calculate a dependent workload of the first virtual address, and the first translation request comprises the first virtual address and the dependent workload of the first virtual address.
5. The computing system of claim 4, wherein when the input-output memory management unit receives a second translation request comprising a second virtual address and a second dependent workload, and the second translation lookaside buffer has stored a second virtual address entry comprising the second virtual address, the second translation lookaside buffer is further configured to change a dependent workload previously recorded in the second virtual address entry to the second dependent workload.
6. The computing system of claim 5, wherein the second translation lookaside buffer is further configured to reset a recent use time of the second virtual address entry, after changing the dependent workload previously recorded in the second virtual address entry to the second dependent workload.
7. The computing system of claim 1, wherein the first translation lookaside buffer is further configured to set a first dependent work number of the first virtual address according to a computing unit and instruction corresponding to the first virtual address, and the first translation request comprises the first virtual address and the first dependent work number.
8. The computing system of claim 7, wherein the second translation lookaside buffer is further configured to: store the first dependent work number in the first virtual address entry; increase a dependent workload of at least one virtual address entry that also comprises the first dependent work number in the second translation lookaside buffer by one, to obtain a first dependent workload; and store the first dependent workload in the first virtual address entry.
9. The computing system of claim 7, wherein, when the input-output memory management unit receives a second translation request comprising a second virtual address and a second dependent work number, and the second translation lookaside buffer has stored a second virtual address entry comprising the second virtual address, the second translation lookaside buffer is further configured to: change a dependent work number of the second virtual address entry to the second dependent work number; increase a dependent workload of at least one virtual address entry that also comprises the second dependent work number in the second translation lookaside buffer by one, to obtain a second dependent workload; and store the second dependent workload in the second virtual address entry.
10. The computing system of claim 1, wherein the master computing device is a central processing unit, and the slave computing device is a graphic processing unit.
11. A master computing device, comprising: a memory controller; at least one processing unit; and an input-output memory management unit of claim 1.
12. A slave computing device, comprising: a plurality of computing units, wherein each computing unit is configured to perform multiple computations in parallel according to a single instruction multiple data (SIMD) manner; and a first translation lookaside buffer, configured to store a plurality of virtual address entries; wherein, when the first translation lookaside buffer receives a request to obtain a first physical address corresponding to a first virtual address and the first translation lookaside buffer does not store a virtual address entry comprising the first virtual address, the first translation lookaside buffer calculates a dependent workload of the first virtual address, and sends a first translation request comprising the first virtual address and the dependent workload of the first virtual address to an input-output memory management unit.
13. A computing system method, wherein the computing system comprises a slave computing device and a master computing device, wherein the slave computing device comprises a first translation lookaside buffer, the master computing device comprises an input-output memory management unit, and the input-output memory management unit comprises a second translation lookaside buffer, and the method comprises: storing a plurality of virtual address entries in the second translation lookaside buffer, wherein, in each virtual address, a virtual address requested by the slave computing device, a physical address corresponding to the virtual address, a recent use time and a dependent workload are stored, the virtual address is used in a specific instruction performed by a specific computing unit of the slave computing device, and the dependent workload of the virtual address is an amount of virtual address translations requested by the specific computing unit to perform the specific instruction; the slave computing device accessing a first virtual address; looking up a first physical address corresponding to the first virtual address in the first translation lookaside buffer; the first translation lookaside buffer not storing a virtual address entry comprising the first virtual address; using the first translation lookaside buffer to send a first translation request to the input-output memory management unit to obtain the first physical address corresponding to the first virtual address; the input-output memory management unit receiving the first translation request, wherein the second translation lookaside buffer of the input-output memory management unit does not store a virtual address entry comprising the first virtual address; traversing a plurality of page tables of the memory controller to obtain the first physical address corresponding to the first virtual address; selecting a first virtual address entry from the plurality of virtual address entries according to a recent use time and a dependent workload of each of the plurality of virtual address entries; and clearing the first virtual address entry to store the first virtual address and the first physical address.
14. The method of claim 13, wherein the first virtual address entry is a virtual address entry having a longest recent use time and a largest dependent workload among the plurality of virtual address entries.
15. The method of claim 13, wherein the input-output memory management unit further comprises a translation request cache and a page table walker, and the method further comprises: storing a plurality of translation requests issued by the first translation lookaside buffer to the translation request cache; the page table walker being idle; selecting at least one translation request from the plurality of translation requests; and using the page table walker to traverse the plurality of page tables to obtain at least one physical address corresponding to at least one virtual address of the translation request.
16. The method of claim 13, further comprising: using the first translation lookaside buffer to calculate a dependent workload of the first virtual address; the input-output memory management unit receiving a second translation request comprising a second virtual address and a second dependent workload, wherein the second translation lookaside buffer has stored a second virtual address entry comprising the second virtual address; and changing a dependent workload previously recorded in the second virtual address entry to the second dependent workload; wherein the first translation request comprises the first virtual address and the dependent workload of the first virtual address.
17. The method of claim 16, further comprising: changing the dependent workload previously recorded in the second virtual address entry to the second dependent workload; and resetting a recent use time of the second virtual address entry.
18. The method of claim 13, further comprising: setting a first dependent work number of the first virtual address according to a computing unit and an instruction corresponding to the first virtual address; wherein the first translation request comprises the first virtual address and the first dependent work number.
19. The method of claim 18, further comprising: storing the first dependent work number in the first virtual address entry to the second translation lookaside buffer; increasing a dependent workload of at least one virtual address entry that also comprises the first dependent work number in the second translation lookaside buffer by one, to obtain a first dependent workload; and storing the first dependent workload in the first virtual address entry.
20. The method of claim 18, further comprising: the input-output memory management unit receiving a second translation request comprising a second virtual address and a second dependent work number, wherein the second translation lookaside buffer has stored a second virtual address entry comprising the second virtual address; changing a dependent work number of the second virtual address entry to the second dependent work number; adding a dependent workload of at least one virtual address entry that also comprises the second dependent work number in the second translation lookaside buffer by one, to obtain a second dependent workload; and storing the second dependent workload in the second virtual address entry.
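The replacement behavior recited in claims 1, 2, 5 and 6 (and their method counterparts 13, 14, 16 and 17) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `Entry`, `SecondTLB`, `translate` and `select_victim` are hypothetical, the page-table walk is replaced by a dictionary lookup, time is a caller-supplied integer, and the claims do not say how "longest recent use time" and "largest dependent workload" are combined, so the sketch assumes the longest-idle entry is evicted first with ties broken by the largest dependent workload.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    physical_address: int
    last_used: int           # time of the most recent use (caller-supplied tick)
    dependent_workload: int  # translations the requesting instruction needs


class SecondTLB:
    """Toy model of the IOMMU-side TLB (hypothetical names throughout)."""

    def __init__(self, capacity: int, page_table: Dict[int, int]):
        self.capacity = capacity
        self.page_table = page_table  # stand-in for a real page-table walk
        self.entries: Dict[int, Entry] = {}

    def select_victim(self, now: int) -> int:
        # Assumption: compare by idle time first, then by dependent workload.
        return max(
            self.entries,
            key=lambda va: (now - self.entries[va].last_used,
                            self.entries[va].dependent_workload),
        )

    def translate(self, virtual_address: int, dependent_workload: int,
                  now: int) -> int:
        entry = self.entries.get(virtual_address)
        if entry is not None:
            # Hit: overwrite the recorded workload, then reset the use time.
            entry.dependent_workload = dependent_workload
            entry.last_used = now
            return entry.physical_address
        # Miss: "walk" the page table, evict a victim if the TLB is full,
        # and store the new translation in the freed slot.
        physical_address = self.page_table[virtual_address]
        if len(self.entries) >= self.capacity:
            del self.entries[self.select_victim(now)]
        self.entries[virtual_address] = Entry(
            physical_address, now, dependent_workload)
        return physical_address


# Usage: two entries idle for equally long; the larger workload is evicted.
page_table = {0x1000: 0xA000, 0x2000: 0xB000, 0x3000: 0xC000}
tlb = SecondTLB(capacity=2, page_table=page_table)
tlb.translate(0x1000, dependent_workload=1, now=1)
tlb.translate(0x2000, dependent_workload=4, now=1)
tlb.translate(0x3000, dependent_workload=2, now=5)
print(sorted(hex(va) for va in tlb.entries))
```

In this run the entry for `0x2000` is the one replaced, because both resident entries have been idle equally long and it carries the larger dependent workload. A weighted score over the two fields would be another reading of the same claim language.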
March 11, 2025