Systems and methods are provided for removing debris from a heatsink. A debris detection and dislodging module receives one or more sensor readings that exceed a configurable threshold at a heatsink of a component to be cooled. In response to the received sensor readings, the debris detection and dislodging module enables a controller to activate a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled. The dislodging apparatus sweeps the debris in the fins of the component to be cooled until the one or more sensor readings return to an expected level below the threshold. Coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.
Legal claims defining the scope of protection, as filed with the USPTO.
. A method for removing debris from a heatsink comprising:
. The method of, wherein the debris detection and dislodging module generates a notification when a filter requires cleaning.
. The method of, wherein a duration of the sweeping is a configurable count of jobs completed within a configurable window amount of time, wherein a first event triggers starting a timer and accumulating a counter of jobs, and wherein the counter and timer are reset to zero when the time ends without the configurable count of jobs completing.
. The method of, wherein a sweeping is triggered based on extracted sensor data and utilization data of the component to be cooled exceeding a threshold percentage.
. The method of, wherein a utilization threshold is defined as a percent deviation from configured preset sensor readings.
. The method of, wherein the dislodging apparatus performs at least one sweep of the component to be cooled, each of the sweepings beginning at a position with an action of rotating downward to position parallel members between the fins of the heatsink and drawing the dislodging apparatus through the component to be cooled before returning to the beginning position.
. The method of, wherein the dislodging apparatus performs at least one sweep of the component to be cooled, each of the sweepings being an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.
. The method of, wherein the debris detection and dislodging module detects a difference between a baseline value and a configurable threshold to determine debris is present and a dislodge event is required.
. A computer program product for removing debris from a heatsink, the computer program product comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising:
. The computer program product of, wherein the debris detection and dislodging module generates a notification when the filter requires cleaning.
. The computer program product of, further comprising one or more sensors, wherein the sensors include different sensors that measure temperature, power, pressure, or flow.
. The computer program product of, wherein a duration of the sweeping is a preset amount of time.
. The computer program product of, wherein the dislodging apparatus performs more than one sweep of the component to be cooled.
. The computer program product of, wherein each of the sweepings is an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.
. The computer program product of, wherein the debris detection and dislodging module detects a difference between a baseline value and the threshold to determine debris is present and a dislodge event is required.
. A computer system removing debris from a heatsink, comprising:
. The computer system of, wherein the debris detection and dislodging module generates a notification when the filter requires cleaning.
. The computer system of, further comprising one or more sensors, wherein the sensors include different sensors that measure temperature, power, pressure, or flow.
. The computer system of, wherein each of the sweepings is an action of moving through the component to be cooled from an initial position until the full area between each of the fins of the component to be cooled is swept before returning to the initial position.
. The computer system of, wherein the debris detection and dislodging module detects a difference between a baseline value and the threshold to determine debris is present and a dislodge event is required.
Complete technical specification and implementation details from the patent document.
This invention relates generally to computer systems, and more particularly to heatsink debris detection and dislodging.
Heatsinks within water-cooled systems represent some of the narrowest channels through which water must flow. Fins are the primary location for blockages in water-cooled systems, leading to failures in the field and decreased product reliability. These narrow channels drastically increase the surface area that contacts chilled water, enabling the effective transfer of heat away from the chip. However, systems undergo an expected amount of wear and tear over the course of normal operation. Metal shavings, small components of the system, and other random bits of debris can detach and begin circulating in the system before collecting in the heatsink channels and eventually causing blockages. These blockages increase water pressure, decrease flow rate, and can increase operating temperatures, leading to failures in the field.
It would be advantageous to provide a system to dislodge small metal pieces and components thereby proactively preventing the blockages before a clog can form.
A method is provided for removing debris from a heatsink. A debris detection and dislodging module receives, through one or more sensors, readings that exceed a configurable threshold at a heatsink of a component to be cooled. In response to the received sensor readings, the debris detection and dislodging module enables a motor controller to activate a motor actuator of a dislodging apparatus to remove debris in fins of the heatsink in the component to be cooled. The dislodging apparatus sweeps the debris in the fins of the component to be cooled until the one or more sensor readings return to an expected level below the threshold. The coolant flowing through a coldplate assembly in the heatsink carries the debris out of the fins of the heatsink.
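The sweep-until-clear behavior summarized above can be sketched as a simple control loop. This is only an illustrative sketch, not the claimed implementation: the names `sweep_until_clear`, `read_sensor`, and `sweep_once` are hypothetical, and the `max_sweeps` cap is an added assumption so that the loop always terminates.

```python
from typing import Callable


def sweep_until_clear(
    read_sensor: Callable[[], float],
    sweep_once: Callable[[], None],
    threshold: float,
    max_sweeps: int = 10,
) -> int:
    """Sweep while the sensor reading exceeds the threshold.

    read_sensor and sweep_once are hypothetical callbacks standing in for the
    sensors and the dislodging apparatus. max_sweeps is an assumed safety cap
    that is not part of the source description.
    Returns the number of sweeps performed.
    """
    sweeps = 0
    # Re-read the sensor before every sweep; stop once it is at or below threshold.
    while sweeps < max_sweeps and read_sensor() > threshold:
        sweep_once()
        sweeps += 1
    return sweeps


if __name__ == "__main__":
    # Simulated temperature readings that fall as debris is cleared.
    readings = iter([80.0, 75.0, 60.0])
    count = sweep_until_clear(lambda: next(readings), lambda: None, threshold=70.0)
    print(f"sweeps performed: {count}")
```

In this sketch the reading is re-checked before each sweep, so the apparatus stops as soon as the reading returns below the configured threshold.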
Heatsinks within water-cooled systems represent some of the narrowest channels through which water must flow. These heatsink channels increase the surface area that contacts the chilled water, enabling transfer of heat away from the chip. The fins of the heatsinks, shown in the figures, form the heatsink channels and are primarily where blockages in the water-cooled systems occur. Coldplate heatsink assemblies use both rigid piping and flexible hose systems. The coldplate heatsink assemblies include a constant-flow liquid cooling loop, such as that illustrated in the figures, to continuously pull heat away from the component to be cooled. Systems experience an expected amount of wear during normal operation. Metal shavings, small components of the system, and other random debris can detach and begin circulating in the system before collecting in the heatsink channels and creating blockages.
These blockages increase water pressure and decrease flow rate, which contribute to increases in operating temperatures. As a result, the chip can fail in the field, causing the system in which it is installed to fail as well.
Embodiments of the present invention increase the lifespan of the chip in the system by dislodging small metal pieces and similar debris (debris) before blockages occur in the heatsink channels. The various embodiments include a dislodging apparatus that automatically removes the debris from the heatsink, thereby contributing to the lifespan of the chip being increased by preventing overheating. Employed within a coldplate heatsink, the debris is dislodged from between the fins in-situ, without opening the system. Embodiments include a method to detect a blockage and cause the dislodging apparatus to eject the blockage such that heatsink cooling is optimized. These novel structural changes are possible without significantly altering flow rate through the system and are controlled via a novel method of blockage remediation without opening the closed system.
Embodiments of the present invention can be implemented generally in environments where heatsinks are present, for example, heatsink manufacturers, electronics manufacturers, and coldplate manufacturers. While embodiments of the present invention are primarily directed to coldplate heatsinks, they can apply to air cooled heatsinks, as long as there is a device downstream to catch any debris. In that case, air from fans would be flowing instead of liquid flowing between the fins of the heatsink.
For example, a representative cooling loop in which embodiments of the present invention can be practiced is shown at a high level. This embodiment shows a closed liquid cooling loop. Here, the pumps drive hot liquid up through the glycol-to-air heat exchanger. The liquid is hot, having drawn heat away from the processors using the coldplate heatsinks in the processor drawers (not shown). Fans blow across the glycol-to-air heat exchanger to cool the liquid, after which the cooled liquid is pumped up through supply manifold hoses (hoses not shown) to the coldplate heatsinks in the processor drawers. In some embodiments, there are four coldplate heatsinks, but other embodiments may have any number of coldplate heatsinks depending on the implementation details. The return path between the processor drawers and the pumps is a preferred location for a filter, although the filter can be elsewhere in the loop.
An illustration is presented of the operating environment of a networked computer, according to an embodiment of the present invention.
Computing environment contains an example of an environment for the execution of at least some of the computer code involved in performing the inventive methods, such as a system for heatsink debris detection and dislodging (system), embodied in the debris detection and dislodging module and the dislodging apparatus of the coldplate assembly. In addition to this block, computing environment includes, for example, computer, wide area network (WAN), end user device (EUD), remote server, public cloud, and private cloud. In this embodiment, computer includes processor set (including processing circuitry and cache), communication fabric, volatile memory, persistent storage (including operating system and the block, as identified above), peripheral device set (including user interface (UI) device set, storage, and Internet of Things (IoT) sensor set), and network module. Remote server includes remote database. Public cloud includes gateway, cloud orchestration module, host physical machine set, virtual machine set, and container set.
COMPUTER may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network, or querying a database, such as remote database. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment, detailed discussion is focused on a single computer to keep the presentation as simple as possible. Computer may be located in a cloud, even though it is not shown in a cloud in the figure. On the other hand, computer is not required to be in a cloud except to any extent as may be affirmatively indicated.
PROCESSOR SET includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry may implement multiple processor threads and/or multiple processor cores. Cache is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer to cause a series of operational steps to be performed by processor set of computer and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set to control and direct performance of the inventive methods. In computing environment, at least some of the instructions for performing the inventive methods may be stored in the block in persistent storage.
COMMUNICATION FABRIC is the signal conduction paths that allow the various components of computer to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
VOLATILE MEMORY is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, the volatile memory is characterized by random access, but this is not required unless affirmatively indicated. In computer, the volatile memory is located in a single package and is internal to computer, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer.
PERSISTENT STORAGE is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer and/or directly to persistent storage. Persistent storage may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid-state storage devices. Operating system may take several forms, such as various known proprietary operating systems or open-source Portable Operating System Interface type operating systems that employ a kernel. The code included in the block typically includes at least some of the computer code involved in performing the inventive methods.
PERIPHERAL DEVICE SET includes the set of peripheral devices of computer. Data communication connections between the peripheral devices and the other components of computer may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage may be persistent and/or volatile. In some embodiments, storage may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer is required to have a large amount of storage (for example, where computer locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
NETWORK MODULE is the collection of computer software, hardware, and firmware that allows computer to communicate with other computers through WAN. Network module may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer from an external computer or external storage device through a network adapter card or network interface included in network module.
WAN is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.
END USER DEVICE (EUD) is any computer system that is used and controlled by an end user (for example, an administrator that operates computer), and may take any of the forms discussed above in connection with computer. For example, EUD can be the external application by which an end user connects to the control node through WAN. In some embodiments, EUD may be a client device, such as a thin client, heavy client, mainframe computer, desktop computer and so on.
REMOTE SERVER is any computer system that serves at least some data and/or functionality to computer. Remote server may be controlled and used by the same entity that operates computer. Remote server represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer. For example, in a hypothetical case where computer is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer from remote database of remote server.
PUBLIC CLOUD is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economies of scale. The direct and active management of the computing resources of public cloud is performed by the computer hardware and/or software of cloud orchestration module. The computing resources provided by public cloud are typically implemented by virtual computing environments that run on various computers making up the computers of host physical machine set, which is the universe of physical computers in and/or available to public cloud. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set and/or containers from container set. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway is the collection of computer software, hardware, and firmware that allows public cloud to communicate through WAN.
Some further explanation of virtualized computing environments (VCEs) will now be provided. VCEs can be stored as “images.” A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.
PRIVATE CLOUD is similar to public cloud, except that the computing resources are only available for use by a single enterprise. While private cloud is depicted as being in communication with WAN, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud and private cloud are both part of a larger hybrid cloud.
A network diagram is shown for a system for detecting and dislodging heatsink debris.
The system includes a debris detection and dislodging module, one or more sensors, and a printed circuit board (PCB), which are all interconnected via a wired and/or wireless network. The system also contains at least one motor/actuator, a coldplate assembly, and a component to be cooled. In this context, an actuator is a device that moves an object to a different position, typically in a linear motion, thereby actuating the motion. A motor translates an energy source into rotational motion. In different embodiments, a linear or rotational motor can be used. The actuator can be combined with the motor to move the dislodging apparatus. Additionally, embodiments of the present invention can be practiced using one or more motors, one or more actuators, or a combination of motors and actuators.
The network may use any wired and/or wireless communication protocol that allows data to be transferred between components of the system (e.g., PCIe, I2C, Bluetooth, Wi-Fi, Cellular (e.g., 3G, 4G, 5G), Ethernet, fiber optics, etc.).
The debris detection and dislodging module reads in data from the sensors to determine when debris is caught in the heatsink and controls the motor/actuator, via the motor controller, connected to the dislodging apparatus. The debris detection and dislodging module is contained in a memory module (not shown) on the PCB or in another computing device (not shown). The method executed by the debris detection and dislodging module is described further with reference to the figures.
The sensors include temperature, flow, and pressure sensors, or a combination thereof, located on the component to be cooled (e.g., processor, integrated circuit (IC) module). The sensors detect a change in condition, e.g., an increase in temperature or a change in flow, under similar past utilization/conditions, thereby indicating that the heatsink is not performing as efficiently as it had previously.
The system and the component to be cooled, either separately or together, can track data from similar past utilization/conditions. This data includes utilization, logs, metrics, events, telemetry data, and environmental conditions. Much of this data is already being tracked on computer systems such as the computing environment and is accessible through the computer's single point of control, such as the support element (SE) or baseboard management controller (BMC). The data from the sensors are tracked in addition to this utilization/condition data. In some embodiments, additional sensors may not be needed, as there may already be thermal diodes on the component being cooled to track temperature. Additionally, there may already be flow sensors in the cooling loop that produce data for other purposes, but that the system can capture and incorporate.
When the sensors detect such a change in condition, it is possible that the heatsink has debris caught between its fins. In one or more embodiments, the sensors may be flow sensors in the cooling path of the coldplate, and a change in flow under similar past conditions indicates that the heatsink is not performing as efficiently as it has previously and may have debris caught between fins.
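The comparison against readings from similar past conditions can be illustrated with a small percent-deviation check. This is a minimal sketch under stated assumptions: the function names and the example numbers are hypothetical, and the source does not specify how the baseline for "similar past utilization/conditions" is selected, so here it is simply passed in.

```python
def percent_deviation(reading: float, baseline: float) -> float:
    """Return the absolute deviation of a reading from its baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return abs(reading - baseline) / abs(baseline) * 100.0


def debris_suspected(reading: float, baseline: float, threshold_pct: float) -> bool:
    """Flag possible debris when the reading deviates from the baseline by more
    than a configurable percentage (threshold_pct is an assumed parameter)."""
    return percent_deviation(reading, baseline) > threshold_pct


if __name__ == "__main__":
    # Hypothetical values: baseline temperature recorded at a similar past utilization.
    baseline_temp_c = 60.0
    current_temp_c = 66.0
    print(debris_suspected(current_temp_c, baseline_temp_c, threshold_pct=5.0))
```

Using the absolute deviation lets the same check cover a temperature that rises and a flow rate that falls; an implementation could instead use a signed deviation per sensor type.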
The PCB is the electronics board where signals exist and where components are mounted that control and power the motor/actuator connected to the dislodging apparatus. Other embodiments may not use a PCB directly. For example, a coldplate manifold can have a heatsink, but also provide cooling for components downstream in the airflow or cooling loop path. A heatsink may also be connected to a component not mounted to a board, such as for a component mounted directly off the wafer, or for a component not mounted to a PCB, such as a motor.
In one or more embodiments, the component to be cooled is mounted on the PCB. In one or more embodiments, one or more components are separate components that are not connected to a PCB. For example, a coldplate manifold could have a heatsink, but also provide cooling for components downstream in the airflow or cooling loop path. A heatsink may also be connected to a component not mounted to a board (e.g., directly off the wafer or another component not mounted to a PCB such as a motor or battery pack).
The PCB contains the motor controller, a motor enable signal, power, and a ground.
The motor controller receives program instructions from the debris detection and dislodging module to generate a motor enable signal for the motor/actuator. The power and ground provide the required power to the motor/actuator when required. In one or more embodiments, the power required is dependent upon the weight of the dislodging apparatus and the fluid's flow rate through the coldplate assembly.
The motor/actuator that controls the dislodging apparatus must overcome the gravitational and frictional forces acting on the dislodging apparatus. There is also force against the dislodging apparatus from the fluid flowing past it. The motor/actuator should be sized to overcome these forces. If the heatsink is small with few fins and the flow rate is low, a small motor with less output power suffices. A large heatsink with many fins and a high flow rate needs more power to operate efficiently, so that the dislodging apparatus does not overload the motor/actuator and the opposing forces do not stop the dislodging apparatus.
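As a rough illustration of this sizing consideration, the opposing forces can be summed and scaled by a margin. This is a back-of-the-envelope sketch, not a design procedure from the source: it assumes the apparatus is lifted vertically against its full weight, that the friction and fluid forces are already known as inputs, and that a `safety_factor` (an added assumption) is applied. All names and values are hypothetical.

```python
STANDARD_GRAVITY = 9.81  # m/s^2


def required_actuator_force(
    mass_kg: float,
    friction_n: float,
    fluid_force_n: float,
    safety_factor: float = 1.5,
) -> float:
    """Estimate the force (in newtons) the motor/actuator should deliver.

    Assumes a vertical lift, so the full weight opposes the motion, and adds
    the friction and fluid forces supplied by the caller. safety_factor is an
    assumed design margin, not a value taken from the source.
    """
    if min(mass_kg, friction_n, fluid_force_n) < 0 or safety_factor <= 0:
        raise ValueError("inputs must be non-negative and safety_factor positive")
    weight_n = mass_kg * STANDARD_GRAVITY
    return safety_factor * (weight_n + friction_n + fluid_force_n)


if __name__ == "__main__":
    # Hypothetical small heatsink with low flow versus large heatsink with high flow.
    small = required_actuator_force(mass_kg=0.02, friction_n=0.1, fluid_force_n=0.1)
    large = required_actuator_force(mass_kg=0.10, friction_n=0.5, fluid_force_n=1.0)
    print(f"small: {small:.2f} N, large: {large:.2f} N")
```

The example only shows the qualitative point made above: a heavier apparatus and a higher flow rate both raise the force the motor/actuator must supply.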
In one or more embodiments, the motor/actuator may operate off power planes (e.g., 3.3Vdc, 5Vdc, 12Vdc) through a connector mounted to the PCB or may be wired directly to pins/vias on the PCB. The motor/actuator controls the dislodging apparatus upon receiving a motor enable signal from the motor controller. In one or more embodiments, a motor is used to move the dislodging apparatus. In one or more embodiments, a linear or rotational actuator is used to move the dislodging apparatus. For example, an actuator is effective where more power is required to move the dislodging apparatus along the linear path along the standoffs. A linear motor would be more beneficial if low force is sufficient in the given application.
In one or more embodiments, a plurality of motors/actuators can be used to ensure that the dislodging apparatus remains on a desired track (e.g., does not tilt, cantilever, or break).
The figures illustrate a dislodging apparatus. One view illustrates the dislodging apparatus in its rest position during normal operation. Another view illustrates the dislodging apparatus in its uppermost position during a dislodging action. The dislodging apparatus can be at an acute angle to the base of the heatsink, for example 1 mm. In this case, one side of the dislodging apparatus raises up slightly before the other side to form the acute angle, then the entire dislodging apparatus raises up and down while maintaining the acute angle.
The coldplate assembly includes the dislodging apparatus, the heatsink, a filter, guide rails, and bumpers. The return path between the processor drawers and the pumps is a preferred location for the filter, although the filter can be elsewhere in the cooling loop.
The guide rails ensure that the dislodging apparatus remains straight when activated. The bumpers ensure that the dislodging apparatus and the heatsink do not collide and cause damage to one another. The guide rails and bumpers are each optional. In some embodiments, the linear arm of the motor/actuator can be used in place of these standoffs. When using the guide rails and the bumpers, the arm(s) of the motor/actuator can attach to the dislodging apparatus at any point on the dislodging apparatus to move it up and down.
The heatsink is located in the liquid flow path of the coldplate assembly and helps to spread heat from the component to be cooled for optimized cooling.
The filter is located downstream of the heatsink within the coldplate assembly. The filter catches any debris that was dislodged from the heatsink by the dislodging apparatus.
The component to be cooled (e.g., processor, IC package) is the entity that requires liquid cooling by the coldplate assembly. In one or more embodiments, the component to be cooled is mounted to the PCB.
A further view illustrates the dislodging apparatus in its uppermost position, halfway through a dislodging action, with debris removed from the system. The dislodging action ends with the return of the dislodging apparatus to its rest position.
In response to a dislodge event being triggered when the sensors detect a change in condition beyond a threshold, indicating debris, the debris detection and dislodging module sends instructions to the dislodging apparatus to start in the rest position, rise to the uppermost position, and return to the rest position. During the movement of the dislodging apparatus, debris that was lodged between the heatsink fins is jarred free, such that liquid flowing through the coldplate assembly over the heatsink carries the debris to the filter.
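The dislodging action described here, from the rest position to the uppermost position and back, can be sketched as a short command sequence issued when a reading crosses the threshold. The sketch below is illustrative only: `Position`, `run_dislodge_action`, `handle_reading`, and the `move_to` callback are hypothetical names, and the real motor controller interface is not specified in the source.

```python
from enum import Enum
from typing import Callable, List


class Position(Enum):
    REST = "rest"            # position during normal operation
    UPPERMOST = "uppermost"  # position halfway through a dislodging action


def run_dislodge_action(move_to: Callable[[Position], None]) -> List[Position]:
    """Perform one dislodging action starting from rest: rise, then return.

    move_to is a hypothetical callback standing in for the motor controller.
    Returns the positions commanded, in order.
    """
    sequence = [Position.UPPERMOST, Position.REST]
    for target in sequence:
        move_to(target)
    return sequence


def handle_reading(
    reading: float, threshold: float, move_to: Callable[[Position], None]
) -> bool:
    """Trigger a dislodge event only when the reading is beyond the threshold."""
    if reading <= threshold:
        return False
    run_dislodge_action(move_to)
    return True


if __name__ == "__main__":
    commanded: List[Position] = []
    triggered = handle_reading(82.0, threshold=75.0, move_to=commanded.append)
    print(triggered, [p.value for p in commanded])
```

Keeping the motion sequence separate from the trigger check means the same sequence could be reused for repeated sweeps.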
A further figure illustrates an alternate embodiment of a dislodging apparatus. In the figure, the dislodging apparatus rotates down into position as shown by the arrow. The dislodging apparatus comprises at least one perpendicular member positioned adjacent to and perpendicular to the fins of the heatsink. The dislodging apparatus is located at the end of the heatsink opposite the end that is attached to the component to be cooled.
At least one straight parallel member for each channel is attached to the perpendicular member such that the parallel member is positioned in the channel formed between two fins and parallel to the fins (x-axis). Upon activation, the perpendicular member of the dislodging apparatus rotates downward in the direction shown by the arrow, causing the attached parallel members to change position from parallel to the fins to perpendicular to the fins (y-axis). The dislodging apparatus is drawn through the fins of the heatsink (either by actuator, motor, or combined actuator/motor) in the indicated direction, thereby causing the parallel members to dislodge and remove the debris. It should be noted that in some implementations, the dislodging apparatus can be drawn in direction A as well or can be repeatedly drawn through the fins of the heatsink as needed.
A further figure is a view of hook-like features that are implemented in some embodiments on the parallel members and the curved members. Views a and c are profile views of the hook-like features, and view b is a front view. These hook-like features are sized to occupy the channels between the fins of the heatsink to catch and sweep away debris. While the hook-like features are shown as barbs, other shapes are possible, depending on the implementation.
October 2, 2025