Based on a predetermined number of available processor sockets, a plurality of candidate matrix decompositions corresponding to a multiplication of matrices are identified. A given candidate matrix decomposition is selected based on a first comparative relationship of a variation of first sizes of the plurality of candidate matrix decompositions along a first dimension and a second comparative relationship of a variation of second sizes of the plurality of candidate matrix decompositions along a second dimension. Processing of the multiplication is distributed among the processor sockets based on the given candidate matrix decomposition.
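The three steps in the abstract (identify candidates, select one, distribute the work) can be illustrated with a minimal Python sketch. This is not the claimed method: the abstract does not define the "comparative relationships", so the scoring below (smallest spread between block sizes along each dimension, with ties broken in favor of squarer tiles) is an assumption made for illustration. All names (`Decomposition`, `candidate_decompositions`, `select_decomposition`, `distribute`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    row_parts: int  # partitions along the first (row) dimension
    col_parts: int  # partitions along the second (column) dimension


def candidate_decompositions(num_sockets):
    """All (row_parts, col_parts) pairs whose product equals num_sockets."""
    return [Decomposition(r, num_sockets // r)
            for r in range(1, num_sockets + 1) if num_sockets % r == 0]


def block_sizes(extent, parts):
    """Split `extent` into `parts` near-equal block sizes."""
    base, extra = divmod(extent, parts)
    return [base + 1] * extra + [base] * (parts - extra)


def size_variation(extent, parts):
    """Spread between the largest and smallest block along one dimension."""
    sizes = block_sizes(extent, parts)
    return max(sizes) - min(sizes)


def select_decomposition(m, n, num_sockets):
    """Pick the candidate with the least size variation (assumed criterion)."""
    def score(d):
        variation = size_variation(m, d.row_parts) + size_variation(n, d.col_parts)
        # Assumed tie-breaker: prefer tiles whose two extents are closest.
        squareness = abs(m / d.row_parts - n / d.col_parts)
        return (variation, squareness)
    return min(candidate_decompositions(num_sockets), key=score)


def tile_ranges(extent, parts):
    """Half-open (start, stop) index ranges for each block along one dimension."""
    ranges, start = [], 0
    for size in block_sizes(extent, parts):
        ranges.append((start, start + size))
        start += size
    return ranges


def distribute(m, n, d):
    """Map each socket index to the (row_range, col_range) tile it computes."""
    rows = tile_ranges(m, d.row_parts)
    cols = tile_ranges(n, d.col_parts)
    tiles = [(r, c) for r in rows for c in cols]
    return dict(enumerate(tiles))


if __name__ == "__main__":
    chosen = select_decomposition(10, 9, 6)
    print(chosen)
    print(distribute(10, 9, chosen))
```

Here `m` and `n` stand for the row and column extents of the product matrix, and each socket receives one rectangular tile of it. A real implementation would also weigh the cache and NUMA metrics that the claims mention, which this sketch leaves out.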
Legal claims defining the scope of protection, as filed with the USPTO.
3. The apparatus of claim 2, wherein the third load balancing metrics comprise at least one of a cache block size and a processor core per last level cache number.
4. The apparatus of claim 2, wherein the processing nodes comprise non-uniform memory access (NUMA) nodes.
5. The apparatus of claim 1, wherein the first load balancing metrics comprise metrics to bias the selection of the given candidate decomposition to favor vertical partitioning.
March 24, 2023
August 13, 2024