System-in-packages (SiPs) having vertically integrated processing units and combined high-bandwidth memory (HBM) devices, and associated devices and methods, are disclosed herein. In some embodiments, the SiP includes a processing unit and a combined HBM device carried by the processing unit. The combined HBM device can include one or more volatile memory dies and one or more non-volatile memory dies. The combined HBM device can also include a shared through silicon via (TSV) bus that is electrically coupled to each of the processing unit, the one or more volatile memory dies, and the one or more non-volatile memory dies to establish communication paths therebetween.
Legal claims defining the scope of protection, as filed with the USPTO.
1. A system-in-package (SiP) device, comprising: a processing unit; and a combined high-bandwidth memory (HBM) device carried by the processing unit, wherein the combined HBM device comprises: one or more volatile memory dies; one or more non-volatile memory dies; and a through silicon via (TSV) bus electrically coupled to each of the processing unit, the one or more volatile memory dies, and the one or more non-volatile memory dies.
2. The SiP device of claim 1, wherein the combined HBM device further comprises an interface die carried by the processing unit, and wherein the interface die includes: a first adapter electrically coupled to the one or more volatile memory dies; a first controller electrically coupled between the first adapter and the processing unit, wherein the first controller is configured to manage operation of the one or more volatile memory dies; a second adapter electrically coupled to the one or more non-volatile memory dies; and a second controller electrically coupled between the second adapter and the processing unit, wherein the second controller is configured to manage operation of the one or more non-volatile memory dies.
3. The SiP device of claim 2, wherein the interface die further includes: a first three-dimensional TSV input-and-output (3D TSV I/O) interface electrically coupled between the one or more volatile memory dies and the first adapter, wherein the first 3D TSV I/O interface is coupled to the one or more volatile memory dies via a first set of TSVs of the TSV bus; a second 3D TSV I/O interface electrically coupled between the first controller and the processing unit; a third 3D TSV I/O interface electrically coupled between the one or more non-volatile memory dies and the second adapter, wherein the third 3D TSV I/O interface is coupled to the one or more non-volatile memory dies via a second set of TSVs of the TSV bus, and wherein the second set of TSVs passes through the one or more volatile memory dies; and a fourth 3D TSV I/O interface electrically coupled between the second controller and the processing unit.
4. The SiP device of claim 1, wherein the combined HBM device further comprises: a controller die carried by the processing unit, wherein the controller die is electrically coupled between the processing unit and the one or more volatile memory dies by the TSV bus, and wherein the controller die is configured to manage operation of the one or more volatile memory dies.
5. The SiP device of claim 1, wherein the combined HBM device further comprises: a controller die carried by the one or more volatile memory dies, wherein the controller die is electrically coupled between the processing unit and the one or more non-volatile memory dies by the TSV bus, and wherein the controller die is configured to manage operation of the one or more non-volatile memory dies.
6. The SiP device of claim 1, wherein the processing unit includes a controller electrically coupled to the one or more volatile memory dies by the TSV bus, wherein the controller is configured to manage operation of the one or more volatile memory dies.
7. The SiP device of claim 1, wherein the processing unit includes a controller electrically coupled to the one or more non-volatile memory dies by the TSV bus, wherein the controller is configured to manage operation of the one or more non-volatile memory dies.
8. The SiP device of claim 1, wherein the SiP device does not include an interposer die electrically coupled to the processing unit.
9. The SiP device of claim 1, wherein the combined HBM device does not include an interface die between the processing unit and the one or more volatile memory dies.
10. A method, comprising: generating a request for a subset of a set of data stored in a plurality of non-volatile memory dies in a combined high-bandwidth memory (HBM) device; writing a copy of the subset to a plurality of volatile memory dies in the combined HBM device, wherein the plurality of non-volatile memory dies is carried by the plurality of volatile memory dies; reading the subset from the plurality of volatile memory dies into a processing unit, wherein the plurality of volatile memory dies is carried by the processing unit; processing, at the processing unit, the subset; and writing a result of processing the subset to the plurality of volatile memory dies.
11. The method of claim 10, wherein writing the copy of the subset comprises writing the copy of the subset via a through silicon via (TSV) bus electrically coupled to each of the processing unit, the plurality of volatile memory dies, and the plurality of non-volatile memory dies.
12. The method of claim 10, wherein writing the copy of the subset comprises writing the copy of the subset via a through silicon via (TSV) bus electrically coupled to each of the processing unit, the plurality of volatile memory dies, and the plurality of non-volatile memory dies.
13. The method of claim 10, wherein generating the request for the subset is performed by a controller included in an interface die in the combined HBM device, wherein the interface die is carried by the processing unit, and wherein the plurality of volatile memory dies is carried by the interface die.
14. The method of claim 10, wherein generating the request for the subset is performed by a controller die included in the combined HBM device, wherein the controller die is carried by the processing unit, and wherein at least one of the plurality of volatile memory dies or the plurality of non-volatile memory dies is carried by the controller die.
15. The method of claim 10, wherein generating the request for the subset is performed by a controller included in the processing unit.
16. The method of claim 10, further comprising writing the result of processing the subset to the plurality of non-volatile memory dies.
17. A method, comprising: writing a set of data to one or more volatile memory dies in a combined high-bandwidth memory (HBM) device through a through silicon via (TSV) bus; receiving, at a processing unit carrying the combined HBM device, a power down or idle request; and in response to the power down or idle request, controlling the combined HBM device to write the set of data from the one or more volatile memory dies to one or more non-volatile memory dies in the combined HBM device through the TSV bus.
18. The method of claim 17, further comprising writing a copy of the set of data to the one or more non-volatile memory dies, through the TSV bus, before receiving the power down or idle request to store a backup of the set of data in the one or more non-volatile memory dies.
19. The method of claim 17, further comprising: receiving, at the processing unit, a power up or wake up request; and in response to the power up or wake up request, controlling the combined HBM device to write, through the TSV bus, the set of data from the one or more non-volatile memory dies back to the one or more volatile memory dies.
20. The method of claim 17, further comprising: reading, through the TSV bus, the set of data from the one or more volatile memory dies to use at least a portion of the set of data in a computer processing operation; and writing, through the TSV bus, a result of the computer processing operation to the one or more volatile memory dies.
Complete technical specification and implementation details from the patent document.
The present application claims priority to U.S. Provisional Patent Application No. 63/669,083, filed Jul. 9, 2024, the disclosure of which is incorporated herein by reference in its entirety.
The present technology is generally related to computing and memory systems and more specifically to vertically integrated computing and memory systems for semiconductor packages.
Microelectronic devices, such as memory devices, microprocessors, and other electronics, typically include one or more semiconductor dies mounted to a substrate and encased in a protective covering. The semiconductor dies include functional features, such as memory cells, processor circuits, imager devices, interconnecting circuitry, etc. To meet continual demands for decreasing size, wafers, individual semiconductor dies, and/or active components are typically manufactured in bulk, singulated, and then stacked on a support substrate (e.g., a printed circuit board (PCB) or other suitable substrates). The stacked dies can then be coupled to the support substrate (sometimes also referred to as a package substrate) through bond wires in shingle-stacked dies (e.g., dies stacked with an offset for each die) and/or through through-substrate vias (TSVs) between the dies and the support substrate.
The drawings have not necessarily been drawn to scale. Further, it will be understood that several of the drawings have been drawn schematically and/or partially schematically. Similarly, some components and/or operations can be separated into different blocks or combined into a single block for the purpose of discussing some of the implementations of the present technology. Moreover, while the technology is amenable to various modifications and alternative forms, specific implementations have been shown by way of example in the drawings and are described in detail below. The intention, however, is not to limit the technology to the particular implementations described.
High data reliability, high speed of memory access, lower power consumption, and reduced chip size are features that are demanded from semiconductor memory. In recent years, three-dimensional (3D) memory devices have been introduced. Some 3D memory devices are formed by stacking memory dies vertically, and interconnecting the dies using through-silicon (or through-substrate) vias (TSVs). Benefits of the 3D memory devices include shorter interconnects (which reduce circuit delays and power consumption), a large number of vertical vias between layers (which allow wide bandwidth buses between functional blocks, such as memory dies, in different layers), and a considerably smaller footprint. Thus, the 3D memory devices contribute to higher memory access speed, lower power consumption, and chip size reduction. Example 3D memory devices include Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM). For example, HBM is a type of memory that includes a vertical stack of dynamic random-access memory (DRAM) dies and an interface die (which, e.g., provides the interface between the DRAM dies of the HBM device and a host device).
In a system-in-package (SiP) configuration, HBM devices may be integrated with a host device (e.g., a graphics processing unit (GPU) and/or central processing unit (CPU)) using a base substrate (e.g., a silicon interposer, a substrate of organic material, a substrate of inorganic material, and/or any other suitable material that provides interconnection between the GPU/CPU and the HBM device and/or provides mechanical support for the components of a SiP device), through which the HBM devices and the host communicate. Because traffic between the HBM devices and the host device stays within the SiP (e.g., using signals routed through the silicon interposer), a higher bandwidth may be achieved between the HBM devices and the host device than in conventional systems. In other words, the TSVs interconnecting DRAM dies within an HBM device, and the silicon interposer integrating HBM devices and a host device, enable the routing of a greater number of signals (e.g., wider data buses) than is typically found between packaged memory devices and a host device (e.g., through a printed circuit board (PCB)). The high bandwidth interface within a SiP enables large amounts of data to move quickly between the host device (e.g., GPU/CPU) and HBM devices during operation. For example, the high bandwidth channels can be on the order of 1000 gigabytes per second (GB/s). It will be appreciated that such high bandwidth data transfer between a GPU/CPU and the memory of HBM devices can be advantageous in various high-performance computing applications, such as video rendering, high-resolution graphics applications, artificial intelligence and/or machine learning (AI/ML) computing systems and other complex computational systems, and/or various other computing applications.
FIG. 1 is a schematic diagram illustrating an environment 100 that incorporates a high bandwidth memory architecture. As illustrated in FIG. 1, the environment 100 includes a SiP device 110 having one or more processing devices 120 (one illustrated in FIG. 1, sometimes also referred to herein as one or more “hosts”) and one or more HBM devices 130 (one illustrated in FIG. 1), integrated with a silicon interposer 112 (or any other suitable base substrate). The environment 100 additionally includes a storage device 140 coupled to the SiP device 110. The processing device(s) 120 can include one or more CPUs and/or one or more GPUs, referred to as a CPU/GPU 122, each of which may include a register 124 and a first level of cache 126. The first level of cache 126 (also referred to herein as “L1 cache”) is communicatively coupled to a second level of cache 128 (also referred to herein as “L2 cache”) via a first communication path 152. In the illustrated embodiment, the L2 cache 128 is incorporated into the processing device(s) 120. However, it will be understood that the L2 cache 128 can be integrated into the SiP device 110 separately from the processing device(s) 120. Purely by way of example, the processing device(s) 120 can be carried by a base substrate (e.g., an interposer that is itself carried by a package substrate) adjacent to the L2 cache 128 and in communication with the L2 cache 128 via one or more signal lines (or other suitable signal route lines) therein. The L2 cache 128 may be shared by one or more of the processing devices 120 (and the CPU/GPU 122 therein). During operation of the SiP device 110, the CPU/GPU 122 can use the register 124 and the L1 cache 126 to complete processing operations, and attempt to retrieve data from the larger L2 cache 128 whenever a cache miss occurs in the L1 cache 126. As a result, the multiple levels of cache can help reduce the average time it takes for the processing device(s) 120 to access data, thereby accelerating the overall processing rates.
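The register, L1, and L2 lookup order described above can be sketched as a simple inclusive multi-level lookup. This is a minimal illustrative model, not the disclosed hardware. The `CacheLevel` class and the `load` function are hypothetical names, and each level is a plain dictionary. The sketch ignores line sizes, associativity, eviction, and sharing between processing devices. The final fallback (`memory`) stands in for whatever lies beyond the caches, such as the HBM devices or the storage device.

```python
from typing import Dict, List, Tuple


class CacheLevel:
    """One level of a toy (hypothetical) cache hierarchy: a name plus an address -> value map."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lines: Dict[int, int] = {}


def load(addr: int, levels: List[CacheLevel], memory: Dict[int, int]) -> Tuple[int, str]:
    """Look up addr in each level in order (e.g., L1 then L2), falling back to memory.

    The value is filled into every level that missed, so a repeat access to the
    same address hits in the first (smallest, fastest) level.
    """
    missed: List[CacheLevel] = []
    for level in levels:
        if addr in level.lines:
            value, source = level.lines[addr], level.name
            break
        missed.append(level)
    else:
        # Missed in every cache level: read from the backing memory.
        value, source = memory[addr], "memory"
    for level in missed:
        level.lines[addr] = value
    return value, source


l1, l2 = CacheLevel("L1"), CacheLevel("L2")
memory = {0x40: 7}
print(load(0x40, [l1, l2], memory))  # (7, 'memory'): misses in L1 and L2
print(load(0x40, [l1, l2], memory))  # (7, 'L1'): value was filled into L1
```

Filling every level that missed reflects the purpose of the hierarchy. The first access pays the full lookup cost, and repeat accesses are served from the fastest level.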
As further illustrated in FIG. 1, the L2 cache 128 is communicatively coupled to the HBM device(s) 130 through a second communication channel 154. As illustrated, the processing device(s) 120 (and the L2 cache 128 therein) and the HBM device(s) 130 are carried by and electrically coupled to (e.g., integrated by) the silicon interposer 112. The second communication channel 154 is provided by the silicon interposer 112 (e.g., the silicon interposer includes and routes the interface signals forming the second communication channel, such as through one or more redistribution layers (RDLs)). As additionally illustrated in FIG. 1, the L2 cache 128 is also communicatively coupled to a storage device 140 through a third communication channel 156. As illustrated, the storage device 140 is outside of the SiP device 110 and uses signal routing components that are not contained within the silicon interposer 112 (e.g., between a packaged SiP device 110 and a packaged storage device 140). For example, the third communication channel 156 may be a peripheral bus used to connect components on a motherboard or PCB, such as a Peripheral Component Interconnect Express (PCIe) bus. As a result, during operation of the SiP device 110, the processing device(s) 120 can read data from and/or write data to the HBM device(s) 130 and/or the storage device 140 through the L2 cache 128.
In the illustrated environment 100, the HBM devices 130 include one or more stacked volatile memory dies 132 (e.g., DRAM dies, one illustrated schematically in FIG. 1) coupled to the second communication channel 154. As explained above, the HBM device(s) 130 can be located on the silicon interposer 112, on which the processing device(s) 120 are also located. As a result, the second communication channel 154 can provide a high bandwidth (e.g., on the order of 1000 GB/s) channel through the silicon interposer 112. Further, as explained above, each HBM device 130 can provide a high bandwidth channel (not shown) between the volatile memory dies 132 therein. As a result, data can be communicated between the processing device(s) 120 and the HBM device(s) 130 (and the volatile memory dies 132 therein) at high speeds, which can be advantageous for data-intensive processing operations. Although the HBM device(s) 130 of the SiP device 110 provide relatively high bandwidth communication, their integration on the silicon interposer 112 suffers from certain shortcomings. For example, each HBM device 130 may provide a limited amount of storage (e.g., on the order of 16 GB each), and the total storage provided by all of the HBM devices 130 may be insufficient to hold the working data set of an operation to be performed by the SiP device 110. Additionally, or alternatively, the HBM device(s) 130 are made up of volatile memory (e.g., each requires power to maintain the stored data, and the data is lost once the HBM device is powered down and/or suffers an unexpected power loss).
In contrast to the characteristics of the HBM devices 130, the storage device 140 can provide a large amount of storage (e.g., on the order of terabytes and/or tens of terabytes). The greater capacity of the storage device 140 is typically sufficient to hold the working data set of the complex operations to be performed by the SiP device 110. Additionally, the storage device 140 is typically non-volatile (e.g., made up of NAND-based storage, such as NAND flash, as illustrated in FIG. 1), and therefore retains stored data even after power is lost. However, as discussed above, the storage device 140 is located external to the SiP device 110 (e.g., not placed on the silicon interposer 112), and is instead coupled to the SiP device 110 through a communication channel (e.g., PCIe) routed over a motherboard, system board, or other form of PCB. As a result, the third communication channel 156 can have a relatively low bandwidth (e.g., on the order of 8 GB/s), significantly lower than the bandwidth of the second communication channel 154. Consequently, processing operations involving large amounts of data (e.g., graphics rendering, AI/ML processes, and the like) that do not fit within the storage capacities of the HBM devices 130 are bottlenecked by the low bandwidth of the third communication channel 156 as data moves between the storage device 140 and the SiP device 110. Additionally, power-down/power-up operations that require data to move between the storage device 140 and the SiP device 110 are bottlenecked by the relatively low bandwidth of the third communication channel 156.
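The size of the bottleneck can be estimated directly. In an idealized model, the time to move a data set is its size divided by the sustained channel bandwidth. The sketch below uses the order-of-magnitude figures given above: about 8 GB/s for the third communication channel and about 1000 GB/s for the second communication channel. The 2 TB data set size is a hypothetical example. The model ignores protocol overhead, latency, and contention.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: size divided by sustained bandwidth (no overhead modeled)."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


LOW_BW_GB_S = 8.0       # order of magnitude for the external (e.g., PCIe) channel
HIGH_BW_GB_S = 1000.0   # order of magnitude for the in-package channel
DATA_SET_GB = 2000.0    # hypothetical 2 TB working data set

print(transfer_seconds(DATA_SET_GB, LOW_BW_GB_S))   # 250.0
print(transfer_seconds(DATA_SET_GB, HIGH_BW_GB_S))  # 2.0
```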
Vertically integrated computing and memory systems, and associated devices and methods, that address the shortcomings discussed above are disclosed herein. A vertically integrated computing and memory system can include a host device and an HBM device. The HBM device can include one or more volatile memory dies (e.g., DRAM dies) and one or more non-volatile memory dies (e.g., NAND dies, NOR dies, PCM dies, FeRAM dies, MRAM dies, and/or any other suitable dies). The HBM device can optionally include a controller die for the one or more volatile memory dies and/or a controller die for the one or more non-volatile memory dies. The vertically integrated computing and memory system can also include one or more TSVs that electrically couple the host device to the volatile memory dies and to the non-volatile memory dies to establish communication paths therebetween. As described herein, the TSVs can provide a wide communication path (e.g., on the order of 1024 I/Os) between the volatile memory dies, the non-volatile memory dies, and the host device, enabling high bandwidth therebetween. In other words, the disclosed HBM device combines both volatile memory and non-volatile memory (referred to herein as a “combined HBM device”), while providing high-bandwidth communication between the memories within the combined HBM device as well as between the combined HBM device and the host device. As explained herein, embodiments of the combined HBM device may be vertically integrated with the host device. For example, combined HBM devices may be vertically stacked on top of the host device.
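Simple arithmetic shows how bus width sets bandwidth. Peak bandwidth in bytes per second equals the number of data I/Os times the per-I/O data rate, divided by 8 bits per byte. The per-I/O rate used below (8 Gb/s) is an assumed, illustrative value and does not come from this disclosure. With 1024 I/Os it gives a figure on the order of 1000 GB/s, consistent with the figures above. The narrow 16-I/O link is a hypothetical comparison point.

```python
def peak_bandwidth_gb_per_s(num_ios: int, gbit_per_s_per_io: float) -> float:
    """Peak bandwidth in GB/s: I/O count x per-I/O rate (Gb/s), converted from bits to bytes."""
    return num_ios * gbit_per_s_per_io / 8


# 1024 I/Os matches the order of magnitude stated above; 8 Gb/s per I/O is an assumption.
print(peak_bandwidth_gb_per_s(1024, 8.0))  # 1024.0
# Hypothetical narrow 16-I/O link at the same assumed per-I/O rate.
print(peak_bandwidth_gb_per_s(16, 8.0))    # 16.0
```

At an equal per-I/O rate, bandwidth scales linearly with the number of I/Os. This is why the many vertical connections available through TSVs translate directly into a wider, faster path.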
Advantageously, vertically integrating memories and host devices and creating communication paths therebetween using TSVs as opposed to, for example, a SiP bus with routes extending through an interposer die, can provide a higher bandwidth communication channel between the combined HBM devices and the host device. Additionally, vertically integrating memories and host devices can eliminate the need for certain components included in conventional SiPs, such as interposer dies and interface dies. Moreover, because multiple combined HBM devices can be stacked on top of a single host device, vertically integrated computing and memory systems provide significant space savings for valuable substrate real estate. Accordingly, embodiments of the present technology provide improved functionality, cost savings, and size reduction.
Furthermore, large sets of data can be loaded into the non-volatile memory dies (e.g., from an external storage component) through a low bandwidth communication path (e.g., PCIe) during an initialization phase. Then, during processing, portions of the large data set may be transferred between the non-volatile memory dies and the volatile memory dies via a high bandwidth communication path (e.g., a TSV bus) coupled therebetween, based on which portions of the large data set are being processed at a given time (e.g., the working data set). In this example, the volatile memory dies of the combined HBM device can provide functionality similar to the HBM device 130 discussed above with reference to FIG. 1. That is, for example, the volatile memory dies can provide DRAM-based storage of a working data set, accessible to the host devices via a high bandwidth interface (e.g., the TSV bus). Once a first portion of the data set has been processed, a result can be saved to the non-volatile memory dies, and a second portion of the data set can be loaded into the volatile memory dies, through the high bandwidth communication path, from the data set in the non-volatile memory dies. The process can then be repeated for the first, second, etc., portions of the data set to use the data set in any number of computations at the host device without needing to load the data set through the low bandwidth communication path again.
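The staging flow described above can be sketched as a loop. The data set is loaded once into the non-volatile tier. One working portion at a time is then moved into the volatile tier, processed, and the result is saved back. This is a minimal illustrative model, not a controller implementation. The two tiers are plain Python lists, `run_staged` and `process` are hypothetical names (`process` stands in for host computation), and `chunk_size` is an assumed parameter standing in for the capacity of the volatile memory dies.

```python
from typing import Callable, List


def run_staged(
    nvm_data: List[int],
    chunk_size: int,
    process: Callable[[List[int]], List[int]],
) -> List[int]:
    """Process a data set held in the non-volatile tier, one portion at a time.

    Each portion is copied into a volatile working buffer (standing in for the
    DRAM dies), processed, and the result is appended to a results area in the
    non-volatile tier.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    nvm_results: List[int] = []
    for start in range(0, len(nvm_data), chunk_size):
        dram_buffer = nvm_data[start:start + chunk_size]  # NVM -> DRAM over the high-bandwidth path
        result = process(dram_buffer)                     # host reads the buffer and computes
        nvm_results.extend(result)                        # result saved to the NVM tier
    return nvm_results


data = list(range(10))  # stands in for a data set loaded once over the low-bandwidth path
print(run_staged(data, chunk_size=4, process=lambda xs: [2 * x for x in xs]))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The low-bandwidth path appears only where `data` is created. Every later movement in the loop stays within the combined device and the host.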
In a specific, non-limiting example, the data set can include training data for an artificial intelligence and/or machine learning (AI/ML) model that needs to be accessed and/or processed hundreds, thousands, tens of thousands, or more times to train the AI/ML model. In this example, the vertically integrated computing and memory system can significantly reduce the processing time. The data set needs to be communicated to the combined HBM device through the low bandwidth channel only once, during an initialization phase. During a subsequent processing phase, the system provides high bandwidth transfer of the data set (or portions thereof) between the volatile memory dies and the non-volatile memory dies of the combined HBM device, and between the host device and the combined HBM devices stacked thereon (e.g., reducing the processing time by hundreds of seconds, thousands of seconds, tens of thousands of seconds, or more).
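A rough, idealized comparison shows why loading the data set only once matters when it is reused many times. The sketch assumes a hypothetical 500 GB data set, 1000 training passes, about 8 GB/s for the low-bandwidth channel, and about 1000 GB/s for the TSV bus. It also assumes no overlap between transfer and computation. It totals the data-movement time for two cases: reloading the set over the low-bandwidth channel on every pass, and loading it once before streaming it over the high-bandwidth path on each pass.

```python
def reload_every_pass_s(size_gb: float, passes: int, low_bw: float) -> float:
    """Data set re-fetched over the low-bandwidth channel on every pass."""
    return passes * size_gb / low_bw


def load_once_s(size_gb: float, passes: int, low_bw: float, high_bw: float) -> float:
    """Data set loaded once over the low-bandwidth channel, then streamed over the TSV bus each pass."""
    return size_gb / low_bw + passes * size_gb / high_bw


SIZE_GB, PASSES = 500.0, 1000   # hypothetical workload
LOW_BW, HIGH_BW = 8.0, 1000.0   # GB/s, orders of magnitude from the text above

print(reload_every_pass_s(SIZE_GB, PASSES, LOW_BW))   # 62500.0
print(load_once_s(SIZE_GB, PASSES, LOW_BW, HIGH_BW))  # 562.5
```

Under these assumptions, data movement drops from about 62,500 seconds to about 563 seconds. That saving is in the tens-of-thousands-of-seconds range mentioned above. Real savings depend on the workload and on how much transfer overlaps with computation.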
Embodiments of the present technology can also improve the performance of AI/ML models compared to conventional SiPs by providing increased memory capacity, since available memory capacity typically limits the precision and batch size of such models. For example, the batch size has a critical impact on the convergence of the training process and the resulting accuracy of the trained model. Typically, there exists an optimal value or range of batch sizes for a given neural network and data set. If the batch size is too large, the trained model can exhibit poor generalization (or even get stuck at a local minimum). In other words, the trained model can exhibit overfitting and consequently perform poorly on samples outside the training set. Conversely, if the batch size is too small, the trained model can exhibit poor (slow) convergence speed. Fewer samples used at each training step can lead to noisier and less accurate gradient estimates. In other words, a small batch size gives each individual sample an excessively large impact on the applied variable updates, thereby extending the time it takes for the model to converge.
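Memory capacity bounds both batch size and numeric precision, and a back-of-the-envelope estimate makes the relationship visible. For illustration only, the sketch assumes that memory use is a fixed model footprint plus a per-sample activation footprint proportional to the bytes per value. Real frameworks also hold gradients, optimizer state, and workspace buffers, so actual limits will be lower. The function name and all numbers are hypothetical.

```python
def max_batch_size(
    capacity_bytes: int,
    model_bytes: int,
    values_per_sample: int,
    bytes_per_value: int,
) -> int:
    """Largest batch whose per-sample activations fit beside the model (simplified model)."""
    free = capacity_bytes - model_bytes
    per_sample = values_per_sample * bytes_per_value
    if free <= 0 or per_sample <= 0:
        return 0
    return free // per_sample


GB = 10**9
# Hypothetical model: 10 GB of weights, 50 million activation values per sample.
for capacity_gb in (16, 256):
    for bytes_per_value in (2, 4):  # e.g., 16-bit vs 32-bit values
        print(capacity_gb, bytes_per_value,
              max_batch_size(capacity_gb * GB, 10 * GB, 50_000_000, bytes_per_value))
```

With these made-up numbers, raising capacity from 16 GB to 256 GB raises the batch limit from 60 to 2460 samples at 2 bytes per value. At either capacity, halving the bytes per value doubles the limit, which shows the trade-off between batch size and precision.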
Additionally, or alternatively, the non-volatile memory dies can provide non-volatile storage for the data stored in the combined HBM device (e.g., the non-volatile memory dies operate as a non-volatile DRAM). In said embodiments, the non-volatile memory dies may not be usable by a host device (e.g., they may not increase the memory capacity that is made available to the host device and/or may not be used for their increase of memory capacity). In said embodiments, the non-volatile memory dies operating as non-volatile DRAM can save data from and restore data to the volatile memory dies in response to certain events, such as power-down and/or power-up. For example, in response to a power-down or idle request, data from the volatile memory dies and/or any of the caches can be stored in the non-volatile memory dies to preserve a present state of the SiP device. Because the non-volatile memory dies are reachable through the high bandwidth communication path, the request can be satisfied much faster than by communicating the data to a separate storage component (e.g., on the order of tens of milliseconds instead of several seconds). Similarly, when a power-up or wake-up request is received, the data can be moved back to the volatile memory dies and/or cache(s) through the high bandwidth communication paths. As a result, the saved state of the SiP device can be restored, and the power-up request can be answered, within tens of milliseconds instead of the several seconds required when data must be loaded from the separate storage component.
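The save-on-power-down and restore-on-wake behavior described above can be modeled as a small state machine. This sketch is hypothetical. The `CombinedHBM` class, its method names, and the dictionary-based tiers are illustrative stand-ins, not the disclosed controller logic. The `transfer_ms` estimate divides the number of bytes by an assumed channel bandwidth: about 1000 GB/s for the TSV bus and about 8 GB/s for an external storage link, following the orders of magnitude given earlier. The 16 GB state size is also hypothetical.

```python
from typing import Dict


class CombinedHBM:
    """Toy model: volatile dies (dram) plus non-volatile dies used only as a save area."""

    def __init__(self) -> None:
        self.dram: Dict[int, bytes] = {}
        self.nvm_backup: Dict[int, bytes] = {}
        self.powered = True

    def write(self, addr: int, data: bytes) -> None:
        if not self.powered:
            raise RuntimeError("device is powered down")
        self.dram[addr] = data

    def power_down(self) -> int:
        """Save DRAM contents to the NVM dies, then drop DRAM contents; returns bytes saved."""
        self.nvm_backup = dict(self.dram)
        saved = sum(len(v) for v in self.dram.values())
        self.dram.clear()  # volatile contents do not survive loss of power
        self.powered = False
        return saved

    def wake_up(self) -> None:
        """Restore the saved state from the NVM dies back into DRAM."""
        self.powered = True
        self.dram = dict(self.nvm_backup)


def transfer_ms(num_bytes: int, gb_per_s: float) -> float:
    """Idealized transfer time in milliseconds (1 GB/s = 1e6 bytes per ms)."""
    return num_bytes / (gb_per_s * 1e6)


hbm = CombinedHBM()
hbm.write(0, b"model state")
hbm.power_down()
hbm.wake_up()
print(hbm.dram[0])  # b'model state'
# Hypothetical 16 GB of DRAM state: TSV bus (~1000 GB/s) vs external link (~8 GB/s).
print(transfer_ms(16 * 10**9, 1000.0), transfer_ms(16 * 10**9, 8.0))  # 16.0 2000.0
```

The two printed times, about 16 ms versus about 2 seconds, match the "tens of milliseconds instead of several seconds" contrast above under these assumptions.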
Additional details on the vertically integrated computing and memory systems, and associated devices and methods, are set out below. For ease of reference, semiconductor packages (and their components) are sometimes described herein with reference to front and back, top and bottom, upper and lower, upwards and downwards, and/or horizontal plane, x-y plane, vertical, or z-direction relative to the spatial orientation of the embodiments shown in the figures. It is to be understood, however, that the semiconductor assemblies (and their components) can be moved to, and used in, different spatial orientations without changing the structure and/or function of the disclosed embodiments of the present technology. Additionally, signals within the semiconductor packages (and their components) are sometimes described herein with reference to downstream and upstream, forward and backward, and/or read and write relative to the embodiments shown in the figures. It is to be understood, however, that the flow of signals can be described in various other terminology without changing the structure and/or function of the disclosed embodiments of the present technology.
Further, although the memory device architectures disclosed herein are primarily discussed in the context of expanding memory capacity to improve artificial intelligence and machine learning models and/or to create non-volatile memory in a dynamic random-access memory (DRAM) component, one of skill in the art will understand that the scope of the technology is not so limited. For example, the systems and methods disclosed herein can also be deployed to expand the available high bandwidth memory for various other applications that process significant volumes of data (e.g., video rendering, decryption systems, and the like).
FIG. 2 is a schematic diagram illustrating an environment 200 that incorporates an HBM architecture in accordance with some embodiments of the present technology. Similar to the environment 100 discussed above, the environment 200 includes a SiP device 210 having one or more processing devices 220 (one illustrated in FIG. 2) and one or more combined HBM device(s) 230 (one illustrated in FIG. 2). As schematically shown, the combined HBM device(s) 230 can be integrated on the processing device(s) 220 (e.g., carried by the processing device(s) 220). Further, the processing device(s) 220 is integrated on an interposer 212 (e.g., a silicon interposer, an organic interposer, an inorganic interposer, and/or any other suitable base substrate). In some embodiments, however, the SiP device 210 does not include the interposer 212. The processing device(s) 220 is driven by a CPU/GPU 222 that includes a register 224 and an L1 cache 226. The L1 cache 226 is communicatively coupled to an L2 cache 228 via a first communication channel 252. The L2 cache 228 is communicatively coupled to a stack of one or more volatile memory dies 232 (e.g., DRAM dies) and a stack of one or more storage dies 262 (e.g., NAND dies, NOR dies, or other suitable non-volatile memory dies) in the combined HBM device(s) 230 through a second communication channel 254. The storage dies 262 can provide a relatively large storage capacity (e.g., on the order of hundreds of gigabytes and/or a terabyte), as well as non-volatile storage within the SiP device 210. The second communication channel 254 can comprise a TSV bus. The L2 cache 228 is also communicatively coupled to a storage device 240 through a third communication channel 256. The second communication channel 254 can have a relatively high bandwidth (e.g., on the order of 1000 GB/s), while the third communication channel 256 can have a relatively low bandwidth (e.g., on the order of 8 GB/s).
Accordingly, the combined HBM device(s) 230 provide the SiP device 210 with high bandwidth access to a large amount of non-volatile storage, rather than needing to access the storage device 240 through the third communication channel 256. Although FIG. 2 illustrates an embodiment in which the storage dies 262 are coupled to the volatile memory dies 232 via the second communication channel 254, in some embodiments, the second communication channel 254 can additionally or alternatively couple the storage dies 262 to the processing device(s) 220, and/or an additional communication channel (not shown) can couple the storage dies 262 to the processing device(s) 220.
The combination of volatile memory and non-volatile memory (e.g., via the combined HBM device(s) 230) within the SiP device 210 can provide various advantages. For example, volatile memory such as DRAM typically provides accesses (e.g., reads and writes) that are relatively faster than those of non-volatile memory such as NAND, but at a lower density (e.g., storage capacity within a die footprint). In contrast, non-volatile memory such as NAND typically provides a high storage density, but can be relatively slow to access and can incur certain overheads (e.g., wear-leveling). As a result, the volatile memory dies 232 can provide low-latency access, making data quickly available to the processing device(s) 220 of the SiP device 210 as needed. The non-volatile memory dies 262 can provide a relatively large memory capacity that is “closer” to the processing devices 220 (e.g., accessible within the SiP device 210 through high bandwidth buses, such as the second communication channel 254, and/or other communication channels not shown) as compared to the storage device 240 (e.g., accessible through the slower third communication channel 256, such as PCIe). Additionally, the non-volatile memory dies 262 can provide non-volatile memory capacity that is closer to the processing devices 220 and/or the volatile memory dies 232 as compared to the storage device 240 and/or other non-volatile memory capacity.
Furthermore, because the combined HBM device(s) 230 are integrated directly on the processing device(s) 220 (e.g., carried by the processing device(s) 220, as opposed to providing a communication channel therebetween through the interposer 212), the combined HBM device(s) 230 can provide volatile and non-volatile memory capacity that is “closer” to the processing devices 220 (e.g., accessible through high bandwidth buses, such as the second communication channel 254, and/or other communication channels not shown). As a result, for example, a relatively large data set can be communicated from the storage device 240 to the non-volatile memory dies 262 in the combined HBM device(s) 230 to initiate a processing operation (e.g., to run an AI/ML algorithm). For example, an entire data set needed for an AI/ML operation can be copied from the storage device 240 to the non-volatile memory dies 262. Subsets of the data set can then be rapidly communicated from the non-volatile memory dies 262 to the volatile memory dies 232, then to the processing device(s) 220 via the high bandwidth of the second communication channel 254 (sometimes also referred to herein as a “high bandwidth communication path”).
When the processing device(s) 220 finish processing the subset, a new subset can be quickly written into the volatile memory dies 232 from the non-volatile memory dies 262, without needing to retrieve the data from the storage device 240 through the attendant bottleneck in the third communication channel 256 (sometimes also referred to herein as a “low bandwidth communication path”). Further, the processing operation can be iteratively executed (e.g., for the hundreds, thousands, tens of thousands, or more iterations often used for an AI/ML algorithm) without requiring the large data set to be communicated through the bottleneck multiple times. Thus, (i) the inclusion of the combined HBM device(s) 230 and (ii) the vertical integration of the combined HBM device(s) 230 on the processing device(s) 220 can increase the processing speed of the SiP device 210, thereby increasing the functionality of the environment 200. Further, because communicating data through high bandwidth channels is more efficient than communicating data through low bandwidth channels, the inclusion of the non-volatile memory dies 262 in the SiP device 210 can reduce the overall power consumption of the environment 200 and/or reduce the heat generated by the environment 200.
Additionally, or alternatively, the non-volatile memory die(s) 262 in the combined HBM device(s) 230 can save a copy of the data being processed and/or an overall state of the SiP device 210 in a non-volatile component. As a result, for example, the state of the HBM device(s) 230 does not need to be written between the volatile memory dies 232 and the non-volatile dies 262 to power down and/or power up. Instead, the state can be written to the non-volatile memory dies 262 in the combined HBM device(s) 230. Thus, a power-down operation (sometimes also referred to herein as a “sleep operation” and/or an “idle operation”) can be completed almost instantly (e.g., by saving a copy through the high bandwidth of the second communication channel 254). Similarly, a power-up operation (sometimes also referred to herein as a “wake up operation”) can write the state back to the volatile memory dies 232 from the non-volatile memory dies 262 in the combined HBM device(s) 230 via the second communication channel 254, instead of from the storage device 240 via the third communication channel 256. As a result, the power-down and/or power-up operations can be accelerated from several seconds to much less than one second (e.g., tens of milliseconds). Additionally, or alternatively, the combined HBM device(s) 230 can protect against a loss of power and/or other processing errors in the environment 200. For example, because the combined HBM device(s) 230 can save a current state of the SiP device 210 (e.g., a current state of the combined HBM device(s) 230 and/or the processing device(s) 220) to the non-volatile dies 262 in milliseconds, the combined HBM device(s) 230 can save a current state of the SiP device 210 to the non-volatile dies 262 after a predetermined period (e.g., every ten seconds, minute, five minutes, thirty minutes, hour, two hours, twelve hours, day, and/or any other suitable period) and/or after various processing milestones without significantly delaying processing at the SiP device 210.
As a result, after a loss of power and/or other error, the SiP device 210 can return to the last state saved before the loss of power and/or error, thereby losing less processing time and/or less data (e.g., resuming a half-completed processing operation rather than needing to start over).
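The checkpoint-and-restore behavior described above can be illustrated with a toy Python model. This is a sketch under simplifying assumptions, not the disclosed implementation: the class `CombinedHBMModel` and the function `run` are hypothetical, plain dictionaries stand in for the volatile and non-volatile dies, and checkpoints are taken after every `checkpoint_every` processing steps (a processing milestone) rather than on a wall-clock timer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CombinedHBMModel:
    """Hypothetical toy model: dicts stand in for the volatile and non-volatile dies."""
    volatile: Dict[str, Any] = field(default_factory=dict)
    non_volatile: Dict[str, Any] = field(default_factory=dict)

    def save_state(self, state: Dict[str, Any]) -> None:
        # Snapshot the current state into the non-volatile dies.
        self.non_volatile["checkpoint"] = dict(state)

    def power_loss(self) -> None:
        # Volatile contents are lost; non-volatile contents survive.
        self.volatile.clear()

    def restore_state(self) -> Optional[Dict[str, Any]]:
        # Write the last checkpoint back into the volatile dies, if one exists.
        snapshot = self.non_volatile.get("checkpoint")
        if snapshot is not None:
            self.volatile.update(snapshot)
        return snapshot


def run(hbm: CombinedHBMModel, total_steps: int, checkpoint_every: int,
        fail_at: Optional[int] = None) -> Dict[str, Any]:
    """Run a toy iterative job, checkpointing every `checkpoint_every` steps."""
    state = dict(hbm.restore_state() or {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            hbm.power_loss()           # simulated power failure
            return state
        state["acc"] += state["step"]  # stand-in for real processing work
        state["step"] += 1
        hbm.volatile.update(state)
        if state["step"] % checkpoint_every == 0:
            hbm.save_state(state)      # milestone-based checkpoint
    return state


hbm = CombinedHBMModel()
run(hbm, total_steps=10, checkpoint_every=3, fail_at=7)  # interrupted at step 7
print(hbm.non_volatile["checkpoint"])                    # {'step': 6, 'acc': 15}
print(run(hbm, total_steps=10, checkpoint_every=3))      # {'step': 10, 'acc': 45}
```

After the simulated power loss, only the work after the last checkpoint (step 6) is repeated, which mirrors returning to the last saved state rather than starting over.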
The environment 200 can be configured to perform any of a wide variety of suitable computing, processing, storage, sensing, imaging, and/or other functions. Representative examples of systems that include the environment 200 (and/or components thereof, such as the SiP device 210) include, without limitation, computers and/or other data processors, such as desktop computers, laptop computers, Internet appliances, hand-held devices (e.g., palm-top computers, wearable computers, cellular or mobile phones, automotive electronics, personal digital assistants, music players, etc.), tablets, multi-processor systems, processor-based or programmable consumer electronics, network computers, and minicomputers. Additional representative examples of systems that include the environment 200 (and/or components thereof) include lights, cameras, vehicles, etc. With regard to these and other examples, the environment 200 can be housed in a single unit or distributed over multiple interconnected units, e.g., through a communication network, in various locations on a motherboard, and the like. Further, the components of the environment 200 (and/or any components thereof) can be coupled to various other local and/or remote memory storage devices, processing devices, computer-readable storage media, and the like. Additional details on the architecture of the environment 200, the SiP device 210, the combined HBM device(s) 230, and processes for operation thereof are set out below with reference to FIGS. 3-9B.
FIG. 3 is a partially schematic cross-sectional diagram of a SiP device 300 configured in accordance with some embodiments of the present technology. As illustrated in FIG. 3, the SiP device 300 includes a base substrate 310 (e.g., a silicon interposer, an organic substrate, an inorganic substrate, and/or any other suitable material), a processing unit or host device 320 integrated with an upper surface 312 of the base substrate 310, and one or more combined HBM devices 330 integrated with an upper surface 322 of the host device 320. As discussed in more detail below, the host device 320 and individual dies included in the combined HBM devices 330 are communicatively coupled by a TSV bus 340 extending therethrough and therebetween. The number of combined HBM devices 330 integrated with the upper surface 322 of a single host device 320 can be one, two, three, four, five, six, seven, eight, or more.
In the illustrated embodiment, the host device 320 is shown as a single component. However, as discussed above with reference to FIG. 2, the host device 320 can include a CPU/GPU component, a register, an L1 cache, an L2 cache, and/or various other suitable components integrated into a single package.
The combined HBM device 330 includes a stack of semiconductor dies. The stack of semiconductor dies in the combined HBM device 330 can include a first controller die 332, one or more volatile memory dies 334 (three illustrated in FIG. 3), a second controller die 352, and one or more non-volatile memory dies 354 (three illustrated in FIG. 3). The first controller die 332 can be operably coupled to manage operation of the volatile memory dies 334, and the second controller die 352 can be operably coupled to manage operation of the non-volatile memory dies 354. The first controller die 332, the volatile memory dies 334, the second controller die 352, and the non-volatile memory dies 354 can be integrated, or stacked, on one another. In the illustrated embodiment, the non-volatile memory dies 354 are stacked on top of the second controller die 352, the second controller die 352 is stacked on top of the volatile memory dies 334, the volatile memory dies 334 are stacked on top of the first controller die 332, and the first controller die 332 is stacked on top of the host device 320. In other embodiments, however, the dies of the combined HBM device 330 can be stacked in a different order or arrangement. In some embodiments, the combined HBM device 330 omits the first controller die 332 and/or the second controller die 352, and functionalities found in and/or operations performed by the first controller die 332 and/or the second controller die 352 can be included elsewhere, such as in the host device 320.
The dies of each combined HBM device 330 are coupled to one another and to the host device 320 via the TSV bus 340, which includes one or more TSVs 338 (four illustrated schematically in each combined HBM device 330 in FIG. 3). The TSVs 338 (sometimes also referred to herein as part of (or forming) the TSV bus 340) extend from the host device 320 through each of the first controller die 332, the volatile memory dies 334, the second controller die 352, and the non-volatile memory dies 354. The TSVs 338 allow each of the dies to communicate data within the combined HBM device 330 (e.g., between the volatile memory dies 334 (e.g., DRAM dies) and the non-volatile memory dies 354 (e.g., NAND dies)) at a relatively high rate (e.g., on the order of 100 GB/s, 1000 GB/s, or greater).
In some embodiments, as discussed in greater detail below with reference to FIGS. 4 and 6, each combined HBM device 330 can also include an interface die (not shown in FIG. 3). In such embodiments, functionalities found in and/or operations performed by the first controller die 332 and/or the second controller die 352 can be included in the interface die as opposed to forming separate dies (as illustrated in FIG. 3). In some embodiments, the functionalities found in and/or operations performed by the first controller die 332 and/or the second controller die 352 are included in the host device 320. This allows the interface die and/or the host device 320 to control the volatile memory dies 334 and/or the non-volatile memory dies 354 of the combined HBM device 330 in response to various read and write requests. The non-volatile memory dies 354 provide a relatively large non-volatile storage capacity (e.g., on the order of hundreds of gigabytes, a terabyte, and/or the like) within the SiP device 300. As a result, relatively large data sets and/or the like can be stored fully within the SiP device 300, reducing the need to retrieve data from an external storage.
For example, as discussed in more detail below, during operation of the SiP device 300, the host device 320 can send a request for a subset of a large data set to the combined HBM device 330 through the TSVs 338 of the TSV bus 340. The first controller die 332 can check whether the subset is stored in the volatile memory dies 334 and, if not, forward the request and/or generate a new request for the data to the second controller die 352 through the TSVs 338. The non-volatile memory dies 354 can then write a copy of the subset of the data to the volatile memory dies 334 through the TSVs 338, thereby allowing the combined HBM device 330 to send the subset of the data to the host device 320 for processing through the TSVs 338. Once the subset has been processed (and/or at various times during the processing), the host device 320 can write a result of the processing into the combined HBM device 330 through the TSVs 338. More specifically, the host device 320 can write the result to the volatile memory dies 334 which, in turn, can write the result to the non-volatile memory dies 354 through the TSVs 338. The host device 320 can then send a request for another subset of the data set to the combined HBM device 330, and so on. In some embodiments, the process can be repeated, as necessary, any number of times (e.g., when iteratively training a machine learning model on a data set). As a result, when a data set is available in the combined HBM device 330, the SiP device 300 is able to complete any number of iterations of a processing operation without communicating with an external storage component (e.g., via a PCI bus), thereby avoiding (or reducing the number of passes through) the bottleneck discussed in more detail above and increasing an overall processing speed of the SiP device 300.
In some embodiments, the volatile memory dies 334 act as a buffer for the non-volatile memory dies 354 to increase a response speed of the combined HBM device 330. For example, as discussed in more detail below, the combined HBM device 330 can receive a first request instructing the first controller die 332 and/or the second controller die 352 to load a subset of data into the volatile memory dies 334 from the non-volatile memory dies 354 for an upcoming request (e.g., when the host device 320 knows which data it will need next), then receive a second request instructing the first controller die 332 to send the data to the host device 320 from the volatile memory dies 334. By loading the subset of the data into the volatile memory dies 334 in response to the first request, the combined HBM device 330 can help reduce a response time to the second request, thereby further increasing the overall processing speed of the SiP device 300.
In some embodiments, the TSVs 338 directly couple the non-volatile memory dies 354 to the host device 320. The direct coupling between the non-volatile memory dies 354 and the host device 320 can allow a new subset of data to be loaded directly into the host device 320 at the start of a new operation (e.g., avoiding a buffer time associated with loading the subset into the volatile memory dies 334 and then loading the subset into the host device 320). Additionally, or alternatively, the direct coupling between the host device 320 and the non-volatile memory dies 354 can allow the host device 320 to periodically save a state of the host device 320 directly to the non-volatile memory dies 354 to create a non-volatile backup of the current state (e.g., after a predetermined amount of time, after a processing milestone, and/or the like).
As further illustrated in FIG. 3, the host device 320 can be connected to the upper surface 312 of the base substrate 310 via solder balls, micro bumps, posts (e.g., copper posts), metal-metal bonds, and/or any other suitable conductive bonds. The SiP device 300 also includes interconnects 362 extending from the upper surface 312 of the base substrate 310 to a lower surface 314 of the base substrate 310. The interconnects 362 can provide an external connection for the host device 320. For example, the interconnects 362 can couple the host device 320 to an external component (e.g., a PCI bus coupled to an external storage, an external controller, and/or the like). Additionally, or alternatively, the interconnects 362 can couple the host device 320. Additionally, or alternatively, the interconnects 362 can couple the host device 320 to a testing pin on the lower surface 314 of the base substrate 310 (e.g., to allow the host device 320 and the combined HBM device 330 to be evaluated after the SiP device 300 is assembled).
However, because the combined HBM devices 330 are integrated on the upper surface 322 of the host device 320, in some embodiments, the SiP device 300 does not include the base substrate 310 (e.g., an interposer die) or additional components traditionally associated with the base substrate 310, such as routing lines including metallization layers formed in one or more redistribution layers (RDLs) of the base substrate 310 and/or one or more vias interconnecting the metallization layers and/or traces. Omitting the base substrate 310 can simplify construction of the SiP device 300 by limiting the number of different components, thereby reducing cost.
FIG. 4 is a simplified schematic diagram of a SiP 400 configured in accordance with some embodiments of the present technology. The SiP 400 can include a host device 402, an interface die 410, one or more volatile memory dies 434, one or more non-volatile memory dies 454, and a TSV bus 440. The interface die 410 can include a first set of components 431 coupling the volatile memory dies 434 to the host device 402 via first TSVs 442a included in the TSV bus 440, and a second set of components 451 coupling the non-volatile memory dies 454 to the host device 402 via second TSVs 442b included in the TSV bus 440. The first set of components 431 can include a first three-dimensional TSV input-and-output (3D TSV I/O) interface 433 coupled to the volatile memory dies 434, a first adapter 435 coupled to the first 3D TSV I/O interface 433, a first controller 432 coupled to the first adapter 435, and a second 3D TSV I/O interface 437 coupled between the first controller 432 and the host device 402. The second set of components 451 can include a third 3D TSV I/O interface 453 coupled to the non-volatile memory dies 454, a second adapter 455 coupled to the third 3D TSV I/O interface 453, a second controller 452 coupled to the second adapter 455, and a fourth 3D TSV I/O interface 457 coupled between the second controller 452 and the host device 402.
In the illustrated embodiment, the volatile memory dies 434 are stacked between the interface die 410 and the non-volatile memory dies 454. Therefore, the first set of components 431 can be directly coupled to the volatile memory dies 434 via the first TSVs 442a, and the second set of components 451 can be coupled to the non-volatile memory dies 454 via the second TSVs 442b that pass through the volatile memory dies 434.
In operation, the first controller 432 (e.g., a DRAM controller) can manage data transfer between the volatile memory dies 434 and the host device 402 in response to read and write requests. Similarly, the second controller 452 (e.g., a NAND controller) can manage data transfer between the non-volatile memory dies 454 and the host device 402 in response to read and write requests.
In some embodiments, the first controller 432 and/or the second controller 452 are instead included in the host device 402. In such embodiments, for example, the first adapter 435 and/or the second adapter 455 can remain in the interface die 410, and the first controller 432 and/or the second controller 452 can be coupled to the first adapter 435 and/or the second adapter 455 via the first TSVs 442a and/or the second TSVs 442b, respectively.
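The coupling order described above for the interface die 410 can be summarized as a lookup table. The Python sketch below is illustrative only: the `CHAINS` and `TARGETS` tables and the `route` helper are hypothetical, and the chains simply list the components named for FIG. 4 in order from the host device 402 toward the memory dies, for the case in which both controllers remain in the interface die 410.

```python
from typing import Dict, List

# Components of the interface die 410 (FIG. 4), ordered from the host
# device 402 toward the memory dies, as described in the text.
CHAINS: Dict[str, List[str]] = {
    "volatile": [
        "second 3D TSV I/O interface 437",
        "first controller 432",
        "first adapter 435",
        "first 3D TSV I/O interface 433",
        "first TSVs 442a",
    ],
    "non_volatile": [
        "fourth 3D TSV I/O interface 457",
        "second controller 452",
        "second adapter 455",
        "third 3D TSV I/O interface 453",
        "second TSVs 442b",
    ],
}

TARGETS: Dict[str, str] = {
    "volatile": "volatile memory dies 434",
    "non_volatile": "non-volatile memory dies 454",
}


def route(target: str) -> List[str]:
    """Return the hops between the host device and the selected memory dies."""
    return ["host device 402", *CHAINS[target], TARGETS[target]]


for hop in route("non_volatile"):
    print(hop)
```

Because each chain has its own controller and adapter, requests to the two die types can be handled independently, which is the point of separating the first and second sets of components 431, 451.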
FIG. 5 is a simplified schematic diagram of a SiP 500 configured in accordance with some embodiments of the present technology. The SiP 500 can include a host device 502, a first controller die 532, one or more volatile memory dies 534, a second controller die 552, one or more non-volatile memory dies 554, and a TSV bus 540. Notably, unlike the SiP 400, the SiP 500 does not include an interface die (e.g., the interface die 410). In the illustrated embodiment, the first controller die 532 and the volatile memory dies 534 are stacked between (i) the host device 502 and (ii) the second controller die 552 and the non-volatile memory dies 554. Therefore, the first controller die 532 can be directly coupled to the volatile memory dies 534 and the host device 502 via first TSVs 542a included in the TSV bus 540. Also, the second controller die 552 can be directly coupled to the non-volatile memory dies 554 and indirectly coupled to the host device 502 via second TSVs 542b included in the TSV bus 540. As shown, the second TSVs 542b pass through the first controller die 532 and the volatile memory dies 534.
In operation, the first controller die 532 (e.g., a DRAM controller) can manage data transfer between the volatile memory dies 534 and the host device 502 in response to read and write requests. Similarly, the second controller die 552 (e.g., a NAND controller) can manage data transfer between the non-volatile memory dies 554 and the host device 502 in response to read and write requests.
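Which dies a given set of TSVs must pass through follows directly from the stacking order. The Python sketch below uses a hypothetical helper, `dies_passed_through`, that is not from the source: it treats each stack as a bottom-to-top list (starting at the interface die 410 for FIG. 4 and at the host device 502 for FIG. 5) and returns the dies strictly between two endpoints.

```python
from typing import List

# Bottom-to-top stacking orders described for FIGS. 4 and 5.
STACK_FIG4 = ["interface die 410", "volatile memory dies 434",
              "non-volatile memory dies 454"]
STACK_FIG5 = ["host device 502", "first controller die 532",
              "volatile memory dies 534", "second controller die 552",
              "non-volatile memory dies 554"]


def dies_passed_through(stack: List[str], source: str, target: str) -> List[str]:
    """Return the dies a vertical TSV between source and target must traverse."""
    i, j = stack.index(source), stack.index(target)
    lo, hi = sorted((i, j))
    return stack[lo + 1:hi]


# Second TSVs 442b: interface die 410 -> non-volatile memory dies 454.
print(dies_passed_through(STACK_FIG4, "interface die 410", "non-volatile memory dies 454"))
# Second TSVs 542b: host device 502 -> second controller die 552.
print(dies_passed_through(STACK_FIG5, "host device 502", "second controller die 552"))
```

The two results, `['volatile memory dies 434']` and `['first controller die 532', 'volatile memory dies 534']`, match the pass-through relationships stated for the second TSVs 442b and 542b.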
FIG. 6 is a partially schematic exploded view of a combined HBM device 600 configured in accordance with some embodiments of the present technology. The combined HBM device 600 can be an example of the combined HBM devices 330 discussed above with reference to FIG. 3. In the illustrated embodiment, the combined HBM device 600 comprises a stack of dies that includes an interface die 610, a static random access memory (SRAM) die 620, one or more volatile memory dies 630 (four illustrated in FIG. 6), and one or more non-volatile memory dies 650 (four illustrated in FIG. 6). Further, the combined HBM device 600 includes a shared TSV bus 640 communicatively coupling the interface die 610, the SRAM die 620, the volatile memory dies 630, and the non-volatile memory dies 650. The shared TSV bus 640 can include one or more individual TSVs 642 extending through the dies.
The interface die 610 can be a physical layer (“PHY”) that establishes electrical connections between the other dies and other components (e.g., the host device 320 of FIG. 3) through the shared TSV bus 640. The SRAM die 620 can provide volatile memory capacity in addition to the volatile memory dies 630. As shown, in addition to the TSVs 642 of the shared TSV bus 640, the SRAM die 620 can be coupled to the interface die 610 via TSVs 644 that do not extend to the volatile memory dies 630 or the non-volatile memory dies 650. In some embodiments, the vertical integration of the host device and the memory dies allows the interface die 610 to be omitted entirely, as discussed above with reference to FIG. 5.
In some embodiments, the combined HBM device 600 additionally includes one or more controller dies for controlling the volatile memory dies 630 and/or the non-volatile memory dies 650, as discussed above with reference to FIG. 5. The controller dies can be stacked adjacent to the volatile memory dies 630 and/or the non-volatile memory dies 650. In some embodiments, the interface die 610 can include one or more controllers for controlling the volatile memory dies 630 and/or the non-volatile memory dies 650, as discussed above with reference to FIG. 4. In some embodiments, the combined HBM device 600 does not include controllers (or controller dies), which can instead be included in a host device (not shown in FIG. 6).
The volatile memory dies 630 can be DRAM dies that provide low latency memory access to the combined HBM device 600 (e.g., acting as buffer dies for the combined HBM device 600). In contrast, the non-volatile memory dies 650 (sometimes referred to herein as a “secondary memory die,” “memory extension,” a “memory extension die,” and the like) can provide a non-volatile storage device (e.g., a NAND flash device) for the combined HBM device 600. Further, the non-volatile memory dies 650 can provide a significant extension of the available memory (e.g., two times, three times, four times, five times, ten times, or any other suitable multiple of the memory capacity of the volatile memory dies 630). In a specific, non-limiting example, each of the volatile memory dies 630 can provide 4 GB of memory while each of the non-volatile memory dies 650 can provide 64 GB of memory. In this example, a SiP device (e.g., the SiP device 300 of FIG. 3) including the combined HBM device 600 can avoid the latency of loading memory from an external storage component (through a low bandwidth communication channel) into the volatile memory dies 630 for each round of processing through the 256 GB of data (four dies at 64 GB each) that can be stored in the non-volatile memory dies 650.
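The capacity figures in this example can be checked with simple arithmetic. The short Python snippet below assumes four dies of each type, as illustrated in FIG. 6, together with the example per-die capacities given above; the variable names are hypothetical, and `refills` assumes the entire volatile capacity is used as a staging buffer.

```python
import math

NUM_VOLATILE_DIES = 4       # four illustrated in FIG. 6
NUM_NON_VOLATILE_DIES = 4   # four illustrated in FIG. 6
DRAM_GB_PER_DIE = 4         # example value from the text
NAND_GB_PER_DIE = 64        # example value from the text

volatile_gb = NUM_VOLATILE_DIES * DRAM_GB_PER_DIE            # 16 GB
non_volatile_gb = NUM_NON_VOLATILE_DIES * NAND_GB_PER_DIE    # 256 GB
extension = non_volatile_gb / volatile_gb                    # 16x extension
# Minimum number of full volatile-tier fills needed to stream the NV contents once.
refills = math.ceil(non_volatile_gb / volatile_gb)

print(volatile_gb, non_volatile_gb, extension, refills)  # 16 256 16.0 16
```

In this configuration, each of those fills is served over the internal TSV bus rather than over the external low bandwidth channel.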
FIG. 7 is a flow diagram of a process 700 for operating a SiP device in accordance with some embodiments of the present technology. The process 700 can be completed by a controller in communication with the SiP device (e.g., a package controller) and/or on board the SiP device (e.g., the host device 320 of FIG. 3, within the combined HBM device 330 of FIG. 3, and/or the like) to load, manage, and/or process data in the SiP device.
The process 700 begins at block 702 with writing data into one or more non-volatile memory dies (e.g., the non-volatile memory dies 354 of FIG. 3) of a combined HBM device (e.g., the combined HBM device 330 of FIG. 3). In some embodiments, the data is written from an external storage component into the non-volatile memory dies (e.g., via a PCI bus such as the third communication channel 256 of FIG. 2). In some such embodiments, the capacity of the non-volatile memory dies is large enough to store an entire data set for a complex computational operation (e.g., image and/or video rendering, AI/ML algorithms, and/or the like). In such embodiments, the data set needs to pass through the bottleneck (e.g., the PCI bus) only once to be loaded into the SiP device. Afterward, the entire data set is available at a single location via a high bandwidth communication channel (e.g., the TSV bus 340 of FIG. 3) for any suitable number of iterations of the computational operation. In some embodiments, the SiP device includes multiple combined HBM devices, and the data written into each combined HBM device is a partition of a larger data set. For example, a larger data set can be partitioned into two, three, four, and/or any other suitable number of parts for a corresponding number of combined HBM devices in a SiP and/or a corresponding number of SiP devices having one or more combined HBM devices. Additionally, or alternatively, the data set can be partitioned according to an external requirement (e.g., according to a desired batch size for data in an AI/ML process, to maximize resource utilization during a computational process, and the like). In a specific, non-limiting example, a SiP device can include four combined HBM devices similar to the combined HBM devices 330 discussed above with reference to FIG. 3, each having a stack of non-volatile memory dies that provides 256 GB of memory.
In this example, a data set with 1024 GB of data can be partitioned into four partitions of 256 GB, each of which can be loaded into a corresponding combined HBM device to be accessed during the AI/ML process. In some embodiments, the data is written from another suitable external component (e.g., a bus component coupled to another electronic device, a data capture device, an input/output device, and/or the like), for example when the combined HBM device stores a primary (and/or the only) copy of data used by an electronic device that includes the SiP device.
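The partitioning step can be sketched as dividing a data set into contiguous, near-equal ranges, one per combined HBM device. This Python sketch is an assumption for illustration: the source does not specify a partitioning scheme, and `partition_ranges` is a hypothetical helper that also checks each partition against a per-device capacity.

```python
from typing import List, Tuple


def partition_ranges(total_gb: int, num_devices: int,
                     capacity_gb: int) -> List[Tuple[int, int]]:
    """Split [0, total_gb) into num_devices contiguous, near-equal ranges."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    base, remainder = divmod(total_gb, num_devices)
    ranges, start = [], 0
    for i in range(num_devices):
        size = base + (1 if i < remainder else 0)  # spread any remainder
        if size > capacity_gb:
            raise ValueError(f"partition {i} ({size} GB) exceeds {capacity_gb} GB")
        ranges.append((start, start + size))
        start += size
    return ranges


# Example from the text: 1024 GB across four devices with 256 GB each.
print(partition_ranges(1024, 4, 256))
# [(0, 256), (256, 512), (512, 768), (768, 1024)]
```

A batch-size-driven split, as also mentioned above, would replace the near-equal sizing rule with boundaries aligned to the batches.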
In some embodiments, the write operation at block 702 includes determining a role for the one or more non-volatile memory dies in the combined HBM device. For example, a first subset of the non-volatile memory dies can be assigned as core dies, a second subset of the non-volatile memory dies can be assigned as spare dies, and a third subset of the non-volatile memory dies can be assigned as error correction code (ECC) dies.
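The role assignment at block 702 might look like the Python sketch below. The source does not say how dies are selected for each role, so this is a hypothetical contiguous split: `assign_roles`, and the choice to take core dies first, then spare dies, then ECC dies, are assumptions.

```python
from typing import Dict, List


def assign_roles(die_ids: List[int], num_spare: int,
                 num_ecc: int) -> Dict[str, List[int]]:
    """Hypothetically assign non-volatile dies to core, spare, and ECC roles."""
    if num_spare < 0 or num_ecc < 0:
        raise ValueError("role counts must be non-negative")
    num_core = len(die_ids) - num_spare - num_ecc
    if num_core < 1:
        raise ValueError("at least one core die is required")
    return {
        "core": die_ids[:num_core],
        "spare": die_ids[num_core:num_core + num_spare],
        "ecc": die_ids[num_core + num_spare:],
    }


print(assign_roles(list(range(8)), num_spare=1, num_ecc=1))
# {'core': [0, 1, 2, 3, 4, 5], 'spare': [6], 'ecc': [7]}
```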
Because the write operation at block 702 requires data to move from an external storage component and/or another external device into the combined HBM device, the write operation can require the data to move through a relatively low bandwidth bus (e.g., on the order of 8 GB/s in the bottleneck described above with reference to FIG. 1). Consequently, the write operation can take several seconds to complete. However, as discussed in more detail below, the data is then available via a high bandwidth communication path within the SiP device, allowing the data to be used any number of times without going through the bottleneck again.
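The benefit of paying the low bandwidth cost only once can be estimated with a bandwidth-only model. The Python sketch below is a rough illustration under stated assumptions: it uses the approximately 8 GB/s bottleneck figure above, the 1000 GB/s TSV figure given for FIG. 3, and the 256 GB per-device example, assumes the whole data set is streamed once per iteration, and ignores compute time, latency, and protocol overhead. The `transfer_seconds` helper is hypothetical.

```python
def transfer_seconds(data_gb: float, iterations: int, slow_gb_s: float,
                     fast_gb_s: float, staged: bool) -> float:
    """Bandwidth-only estimate of total data-movement time.

    staged=False: every iteration re-reads the data over the slow channel.
    staged=True: one pass over the slow channel into the non-volatile dies,
    then every iteration streams the data over the fast TSV path.
    """
    if staged:
        return data_gb / slow_gb_s + iterations * data_gb / fast_gb_s
    return iterations * data_gb / slow_gb_s


DATA_GB, ITERATIONS = 256, 10
print(transfer_seconds(DATA_GB, ITERATIONS, 8, 1000, staged=False))            # 320.0
print(round(transfer_seconds(DATA_GB, ITERATIONS, 8, 1000, staged=True), 2))   # 34.56
```

Under these assumptions, staging cuts estimated data-movement time for ten iterations from 320 s to about 35 s, and the gap widens as the iteration count grows.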
At block 704, the process 700 includes receiving (or generating) a request for a subset of the data in the combined HBM device. The request can be received from, for example, a host device (e.g., CPU/GPU) in a SiP device and/or any other suitable controller. Additionally, or alternatively, the request can be generated by a controller in the combined HBM device (e.g., by the interface die 410 of FIG. 4) in anticipation of the data being needed by an external component and/or based on a previous request from the external component. In some embodiments, receiving the request causes the combined HBM device (e.g., via a controller in the interface die) to check whether the requested subset of the data is stored in a volatile memory die in the combined HBM device. When the requested subset is found in a volatile memory die (e.g., because the subset was written to the volatile memory die in anticipation of the request), the process 700 can continue to block 708; otherwise, the process 700 continues to block 706.
At block 706, the process 700 includes writing a copy of the subset of the data (or causing the subset of the data to be written) from the non-volatile memory dies into one or more volatile memory dies in the combined HBM device. The write operation can use a portion of a TSV bus (e.g., the TSV bus 340 of FIG. 3) between the non-volatile memory dies and the volatile memory dies to write the requested subset via a high bandwidth communication path. As a result, the write operation at block 706 can be executed in a timeframe on the order of tens of microseconds, such that the subset is available almost instantly. Once stored in the volatile memory dies, the subset of the data is available for typical use by a controller and/or processing unit via a high bandwidth communication path.
At block 708, the process 700 includes reading the subset of the data from the volatile memory dies. The read operation can move a copy of the subset (and/or a portion of the subset) into a host device (e.g., the host device 320 of FIG. 3) via the TSV bus, which can extend between the host device and the combined HBM device. In some embodiments, the subset of the data set requested at block 704 is accessed directly from the non-volatile memory dies. Therefore, in such embodiments, blocks 704 and 706 can be replaced by reading the subset of the data from the non-volatile memory dies.
At block 710, the process 700 includes processing the read subset of the data (e.g., at the host device 320 of FIG. 3). At block 712, the process 700 can write a result of the processing (done at block 710) to the volatile memory dies through the high bandwidth communication path (e.g., the TSV bus 340 of FIG. 3). Because the read/write operations at blocks 708 and 712 can communicate the data using the high bandwidth communication path, the subset of the data is available for processing within tens of microseconds, and/or the result of the processing is saved within tens of microseconds, such that the processing at block 710 is usually the limiting factor on the speed of the process 700 through blocks 708-712. After writing a result of the processing to the volatile memory dies at block 712, the process 700 can return to block 708 to repeat blocks 708-712 any suitable number of times (e.g., when the processing at block 710 is part of an AI/ML algorithm that iteratively processes the subset of the data), and/or can return to block 704 to receive (or generate) a request for a second subset of the data in the non-volatile memory dies and write the second subset of the data to the volatile memory dies for processing.
Additionally, or alternatively, at block 714, the process 700 includes writing a result of the processing to the non-volatile memory dies. In some embodiments, the write at block 714 writes the result of the processing from the host device directly to the non-volatile memory dies. In some such embodiments, the write at block 714 can occur simultaneously (or generally simultaneously) with the write at block 712. Additionally, or alternatively, the write at block 714 can be executed instead of the write at block 712. In some embodiments, the write at block 714 writes the result of the processing from the volatile memory dies to the non-volatile memory dies (e.g., through the TSV bus 340 of FIG. 3). After writing a result of the processing to the non-volatile memory dies at block 714, the process 700 can return to block 708 to repeat blocks 708-712 any suitable number of times (e.g., when the processing at block 710 is a part of an AI/ML algorithm that iteratively processes the subset of the data, when the write at block 714 saves an intermediate result of the processing during a long processing operation, and/or the like), and/or can return to block 704 to receive (or generate) a request for a second subset of the data in the non-volatile memory dies and write the second subset of the data to the volatile memory dies for processing.
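Blocks 702-714 of the process 700 can be traced with a toy Python model. This is a minimal sketch, not the disclosed controller logic: `HBMSim` and `process_700` are hypothetical names, dictionaries stand in for the volatile and non-volatile memory dies, `step` stands in for the host-side processing at block 710, and results are kept under a separate key so that iterations refine a running result rather than overwrite the input subset.

```python
from typing import Any, Callable, Dict, List


class HBMSim:
    """Hypothetical toy combined HBM device: dicts stand in for the dies."""

    def __init__(self) -> None:
        self.nv: Dict[str, Any] = {}    # non-volatile memory dies
        self.dram: Dict[str, Any] = {}  # volatile memory dies
        self.nv_to_dram_copies = 0      # count of block 706 copies


def process_700(hbm: HBMSim, data: Dict[str, List[int]], requests: List[str],
                iterations: int, step: Callable[[List[int], int], int]) -> Dict[str, int]:
    hbm.nv.update(data)                          # block 702: one-time load into NV dies
    for key in requests:                         # block 704: request a subset
        if key not in hbm.dram:                  # not already in the volatile dies
            hbm.dram[key] = list(hbm.nv[key])    # block 706: NV -> DRAM over the TSV bus
            hbm.nv_to_dram_copies += 1
        result = hbm.dram.get(key + ":result", 0)
        for _ in range(iterations):
            subset = hbm.dram[key]               # block 708: read into the host
            result = step(subset, result)        # block 710: process at the host
            hbm.dram[key + ":result"] = result   # block 712: write result to DRAM
        hbm.nv[key + ":result"] = result         # block 714: persist result to NV dies
    return {k: v for k, v in hbm.nv.items() if k.endswith(":result")}


hbm = HBMSim()
out = process_700(hbm, {"a": [1, 2, 3], "b": [4, 5]}, ["a", "b", "a"],
                  iterations=2, step=lambda subset, acc: acc + sum(subset))
print(out, hbm.nv_to_dram_copies)  # {'a:result': 24, 'b:result': 18} 2
```

The second request for `a` is served from the volatile dies without another block 706 copy, so only two non-volatile-to-volatile copies occur for three requests.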
In various specific, non-limiting examples, the process 700 can be part of an AI/ML algorithm, a video rendering process, a high-resolution graphics rendering process, various complex computer simulations, and/or any other suitable computing application. In such embodiments, the CPU/GPU will typically call and/or refer to each subset of the data more than once. As a result, the SiP architectures discussed above with reference to FIGS. 2-6 allow the process 700 to avoid reading the data from a storage component (and through a low bandwidth communication channel) multiple times. Instead, the data is written into the non-volatile memory dies in the combined HBM device(s) once, then written to the volatile memory dies in the combined HBM device(s), and read any suitable number of times. While the initial write operation is subject to the bottleneck constraints of the low bandwidth communication path from the storage component, each subsequent access of the subset of the data (and/or each sequential access of the subsets) uses a high bandwidth path. As a result, each subsequent use of the data can take tens of microseconds instead of one or more seconds, potentially increasing the speed of the processing operations by orders of magnitude.
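The benefit of paying the low bandwidth cost only once can be made concrete with a back-of-the-envelope calculation. The timing constants below are assumptions chosen to match the orders of magnitude named in the text ("one or more seconds" versus "tens of microseconds"); they are not measured values, and `total_access_time` is a hypothetical helper that ignores the (fast) transfer from the non-volatile dies to the volatile dies.

```python
def total_access_time(accesses: int, slow_load_s: float, fast_access_s: float, staged: bool) -> float:
    """Total time for `accesses` uses of one data subset (simplified model).

    staged=False: every use re-reads the subset over the low-bandwidth storage path.
    staged=True:  one slow load into the combined HBM device, then every use goes
                  over the high-bandwidth path.
    """
    if staged:
        return slow_load_s + accesses * fast_access_s
    return accesses * slow_load_s


SLOW_S = 1.0     # assumption: "one or more seconds" per read from the storage component
FAST_S = 50e-6   # assumption: "tens of microseconds" per access over the TSV bus

baseline = total_access_time(100, SLOW_S, FAST_S, staged=False)
with_hbm = total_access_time(100, SLOW_S, FAST_S, staged=True)
print(f"{baseline:.3f} s vs {with_hbm:.3f} s (about {baseline / with_hbm:.0f}x faster)")
```

With these assumed numbers, each individual reuse is roughly four orders of magnitude faster, but the one-time slow load dominates the staged total, so the overall gain grows roughly with the number of times the subset is reused.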
FIG. 8 is a flow diagram of a process 800 for operating a combined HBM device in accordance with some embodiments of the present technology. The process 800 can be implemented by a controller within an interface die of a combined HBM device (e.g., the interface die 410 of FIG. 4), a controller die stacked within the combined HBM device, a controller included in a host device (e.g., the host device 320 of FIG. 3), and/or another suitable controller in a SiP device.
The process 800 begins at block 802 with receiving (or generating) a first request for a subset of the data in the combined HBM device. The first request can be received from, for example, a CPU/GPU in a processing unit of a SiP device and/or any other suitable controller in anticipation of the data being needed by an external component (e.g., needed by the CPU/GPU) in the future. Purely by way of example, the first request can be received 10 cycles, 100 cycles, 1000 cycles, or any other suitable number of cycles before the anticipated need for the data. The first request allows the combined HBM device to check whether the requested subset of the data is available in the volatile memory dies in the combined HBM device (e.g., the volatile memory dies 334 of FIG. 3). If not, at block 804, the process 800 includes writing the subset of the data from the non-volatile memory dies (e.g., any of the non-volatile memory dies 354 of FIG. 3) to the volatile memory dies in the combined HBM device. As a result, the subset of the data is available in a faster component in response to the anticipated future need.
At block 806, the process 800 includes receiving (or generating) a second request for the subset of the data in the combined HBM device. The second request corresponds to the anticipated need for the subset of the data and can be received from, for example, a CPU/GPU in the processing unit of the SiP device. Responsive to receiving the second request, at block 808, the process 800 includes writing the subset of the data from the volatile memory dies in the combined HBM device to a host device (e.g., the host device 320 of FIG. 3) in the SiP (e.g., the SiP device 300 of FIG. 3). The write at block 808 can generally correspond to the read at block 708 of FIG. 7 to make the subset of the data available for processing at the host device.
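The two-request prefetch flow of process 800 can be sketched as a toy two-tier store. This assumes dicts model the non-volatile and volatile dies; the class `CombinedHBMModel`, its methods, and the fallback taken when no prefetch occurred are hypothetical and not described in the disclosure.

```python
from typing import Dict


class CombinedHBMModel:
    """Toy model of the two memory tiers in a combined HBM device (hypothetical class)."""

    def __init__(self, non_volatile: Dict[str, bytes]) -> None:
        self.non_volatile = dict(non_volatile)
        self.volatile: Dict[str, bytes] = {}
        self.staged_from_nvm = 0  # number of non-volatile -> volatile copies performed

    def first_request(self, key: str) -> None:
        """Blocks 802-804: stage the subset in the volatile dies ahead of the anticipated need."""
        if key not in self.volatile:
            self.volatile[key] = self.non_volatile[key]
            self.staged_from_nvm += 1

    def second_request(self, key: str) -> bytes:
        """Blocks 806-808: serve the subset from the volatile dies to the host device."""
        if key not in self.volatile:
            # Assumption (not in the source): stage on demand if no prefetch happened.
            self.first_request(key)
        return self.volatile[key]


hbm = CombinedHBMModel({"layer0": b"weights"})
hbm.first_request("layer0")   # issued some cycles before the data is needed
hbm.first_request("layer0")   # already staged, so no second copy is made
data = hbm.second_request("layer0")
print(data, hbm.staged_from_nvm)
```

The check in `first_request` reflects the text's point that the first request only triggers a write from the non-volatile dies when the subset is not already in the volatile dies.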
FIGS. 9A and 9B are flow diagrams of processes 900, 920 for powering a system-in-package device down and powering a system-in-package device up, respectively, using a combined high-bandwidth memory device in accordance with some embodiments of the present technology. The processes 900, 920 can be completed by a controller in communication with the SiP device (e.g., a package controller) and/or on board the SiP device (e.g., included in the host device 320 or the combined HBM device 330 of FIG. 3).
The process 900 of FIG. 9A begins at block 902 by receiving (or generating) a command to write a set of data to one or more volatile memory dies in a combined HBM device (e.g., from a storage device separate from the SiP device). The command can be issued, for example, in response to a user's request to launch a computing application with the SiP device.
At block 904, the process 900 writes the set of data to the volatile memory dies (e.g., DRAM dies) in the combined HBM device, such that a portion (or all) of the set of data is available for typical processing. Because the non-volatile memory die and the volatile memory dies are both coupled to a shared TSV bus in the combined HBM device, the process 900 at block 904 can optionally write the set of data to the non-volatile memory die in the combined HBM device at the same time. By writing the data to the non-volatile memory die, the process 900 can protect against data loss during a blackout or other sudden loss of power (e.g., damage to a power connection).
The process 900 can complete blocks 902 and 904 (collectively, block 906) any number of times during operation of the SiP device to support typical processing in a semiconductor device. During the processing at block 906, the read/write operations can use the high bandwidth communication path to quickly communicate sets of data back and forth between the volatile memory dies and the processing components, such that the read/write operations do not impose significant time constraints on the processing. Further, in some embodiments, the process 900 includes writing to the non-volatile memory die at block 906 to save a result of various processing operations and/or a current state of the SiP device, the combined HBM device, and/or any related semiconductor device. Because the non-volatile memory die is coupled to a high bandwidth communication path (e.g., the shared TSV bus 340 of FIG. 3), the saves can protect against a blackout or other loss of power without requiring a significant time investment and/or pause in processing operations. Further, in some embodiments, because any write operation on the volatile memory dies automatically creates a save in the non-volatile memory dies by virtue of their mutual connection to TSVs in the shared TSV bus, the saves may not require any additional time.
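The optional mirrored write at block 904 and the explicit saves during block 906 can be contrasted in a short sketch. It assumes dicts model the two die types and that a power loss clears only the volatile dies; the class `SharedBusStack` and its methods are hypothetical illustrations, not an interface from the disclosure.

```python
from typing import Dict


class SharedBusStack:
    """Hypothetical model: volatile and non-volatile dies sharing one TSV bus."""

    def __init__(self, write_through: bool = True) -> None:
        self.volatile: Dict[str, bytes] = {}
        self.non_volatile: Dict[str, bytes] = {}
        self.write_through = write_through  # the optional mirrored write at block 904

    def write(self, key: str, value: bytes) -> None:
        self.volatile[key] = value           # block 904: data lands in the volatile (e.g., DRAM) dies
        if self.write_through:
            self.non_volatile[key] = value   # optionally mirrored to the non-volatile die

    def checkpoint(self, key: str) -> None:
        """Explicit save during block 906 (e.g., a processing result or current state)."""
        self.non_volatile[key] = self.volatile[key]

    def lose_power(self) -> None:
        """A blackout clears only the volatile dies in this model."""
        self.volatile.clear()


mirrored = SharedBusStack(write_through=True)
mirrored.write("input", b"abc")
mirrored.lose_power()

unmirrored = SharedBusStack(write_through=False)
unmirrored.write("input", b"abc")
unmirrored.write("result", b"xyz")
unmirrored.checkpoint("result")
unmirrored.lose_power()

print(mirrored.non_volatile, unmirrored.non_volatile)
```

With write-through enabled, every write survives the simulated power loss without a separate save step; without it, only explicitly checkpointed data survives.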
At block 908, the process 900 includes receiving a power-down request (sometimes also referred to herein as an idle request). The power-down request can be received in response to an input from a user and/or another component of a system using the SiP device (e.g., to conserve power when an electronic device is running low on battery power and/or in response to a loss of power).
At block 910, the process 900 includes writing a state of the volatile memory dies (and/or any other suitable component of the semiconductor device, such as the L1 and L2 caches 226, 228 of FIG. 2) to the non-volatile memory die in the combined HBM device. Because the non-volatile memory die is coupled to the high bandwidth communication path, the write operation can complete within tens of microseconds (e.g., as opposed to one or more seconds to write the data to a traditional storage device, such as the storage device 140 of FIG. 1). As a result, the SiP device can comply with the power-down request within tens of microseconds, allowing the semiconductor device to save power, reduce losses of data when power is lost, and/or otherwise shut off quickly when requested.
Relatedly, the process 920 of FIG. 9B can begin at block 922 by receiving a power-up request (sometimes also referred to herein as a wake-up request). The power-up request can be received in response to an input from a user and/or another component of a system using the SiP device (e.g., another controller in a semiconductor device). At block 924, the process 920 can read/write a previous state of the SiP device from the non-volatile memory die to the volatile memory dies and/or any other suitable components (e.g., the L1 and L2 caches 226, 228 of FIG. 2). As discussed above, because the non-volatile memory die is coupled to the high bandwidth communication path, the SiP device (and the corresponding semiconductor device) can respond to a power-up request within tens of microseconds (e.g., instead of the one or more seconds required to read/write from a traditional storage component). As a result, the SiP device (and the corresponding semiconductor device) can be ready for computational activities significantly faster than a conventional device.
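The power-down save (blocks 908-910) and power-up restore (blocks 922-924) can be sketched as a snapshot/restore pair. This is a minimal model under assumptions: dicts stand in for the volatile dies, the L1/L2 caches, and the non-volatile die; the dataclass `SiPState`, the `"snapshot"` key, and the cold-boot behavior when no snapshot exists are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SiPState:
    """Hypothetical container for the state the text says is saved and restored."""
    volatile: Dict[str, Any] = field(default_factory=dict)
    caches: Dict[str, Dict[str, Any]] = field(default_factory=lambda: {"L1": {}, "L2": {}})
    non_volatile: Dict[str, Any] = field(default_factory=dict)


def power_down(sip: SiPState) -> None:
    """Blocks 908-910: save volatile-die and cache state to the non-volatile die, then drop it."""
    sip.non_volatile["snapshot"] = {
        "volatile": copy.deepcopy(sip.volatile),
        "caches": copy.deepcopy(sip.caches),
    }
    sip.volatile.clear()
    for cache in sip.caches.values():
        cache.clear()


def power_up(sip: SiPState) -> None:
    """Blocks 922-924: restore the previous state from the non-volatile die."""
    snapshot = sip.non_volatile.get("snapshot")
    if snapshot is None:
        return  # Assumption: nothing saved, so this is a cold boot.
    sip.volatile.update(copy.deepcopy(snapshot["volatile"]))
    for name, contents in snapshot["caches"].items():
        sip.caches[name] = copy.deepcopy(contents)


sip = SiPState(volatile={"model": [1, 2]}, caches={"L1": {"a": 1}, "L2": {"b": 2}})
power_down(sip)
was_empty = not sip.volatile and all(not c for c in sip.caches.values())
power_up(sip)
print(was_empty, sip.volatile, sip.caches)
```

The deep copies keep the snapshot independent of later changes to the live state, so clearing the caches at power-down does not also erase the saved copy.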
From the foregoing, it will be appreciated that specific embodiments of the technology have been described herein for purposes of illustration, but well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the technology. To the extent any material incorporated herein by reference conflicts with the present disclosure, the present disclosure controls. Where the context permits, singular or plural terms may also include the plural or singular term, respectively. Moreover, unless the word “or” is expressly limited to mean only a single item exclusive from the other items in reference to a list of two or more items, then the use of “or” in such a list is to be interpreted as including (a) any single item in the list, (b) all of the items in the list, or (c) any combination of the items in the list. Furthermore, as used herein, the phrase “and/or” as in “A and/or B” refers to A alone, B alone, and both A and B. Additionally, the terms “comprising,” “including,” “having,” and “with” are used throughout to mean including at least the recited feature(s) such that any greater number of the same features and/or additional types of other features are not precluded. Further, the terms “generally,” “approximately,” and “about” are used herein to mean at least within 10 percent of a given value or limit. Purely by way of example, an approximate ratio means within ten percent of the given ratio.
Several implementations of the disclosed technology are described above in reference to the figures. The computing devices on which the described technology may be implemented can include one or more central processing units, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), storage devices (e.g., disk drives), and network devices (e.g., network interfaces). The memory and storage devices are computer-readable storage media that can store instructions that implement at least portions of the described technology. In addition, the data structures and message structures can be stored or transmitted via a data transmission medium, such as a signal on a communications link. Thus, computer-readable media can comprise computer-readable storage media (e.g., “non-transitory” media) and computer-readable transmission media.
It will also be appreciated that various modifications may be made without deviating from the disclosure or the technology. For example, the dies in the HBM device can be arranged in any other suitable order (e.g., with the non-volatile memory die(s) positioned between the interface die and the volatile memory dies; with the volatile memory dies on the bottom of the die stack; and the like). Further, one of ordinary skill in the art will understand that various components of the technology can be further divided into subcomponents, or that various components and functions of the technology may be combined and integrated. In addition, certain aspects of the technology described in the context of particular embodiments may also be combined or eliminated in other embodiments. For example, although discussed herein as using a non-volatile memory die (e.g., a NAND die and/or NOR die) to expand the memory of the HBM device, it will be understood that alternative memory extension dies can be used (e.g., larger-capacity DRAM dies and/or any other suitable memory component). While such embodiments may forgo certain benefits (e.g., non-volatile storage), such embodiments may nevertheless provide additional benefits (e.g., reducing the traffic through the bottleneck, allowing many complex computation operations to be executed relatively quickly, etc.).
Furthermore, although advantages associated with certain embodiments of the technology have been described in the context of those embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the technology. Accordingly, the disclosure and associated technology can encompass other embodiments not expressly shown or described herein.
July 7, 2025
January 15, 2026