US-12596861-B2

Efficient delay calculations in replicated designs

Published: April 7, 2026

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Disclosed is an improved approach to implement sharing of delay calculations for replicated portions of a design, where input slews may be different between those replicated design portions. This allows the system to experience runtime improvements for timing analysis of electronic designs.

Patent Claims


. A method for processing an electronic design, comprising:

. The method of, wherein a default input slew is applied to the second sibling portion of the electronic design to perform timing analysis.

. The method of, further comprising performing a re-calculation for the timing analysis with actual slew values.

. The method of, wherein the re-calculation is performed for two or three stages of the second sibling portion.

. The method of, wherein the dependency is broken by performing for a group of the sibling portions:

. The method of, wherein a graph-based scheduler is used to schedule computing resources to process the electronic design.

. The method of, wherein the graph-based scheduler performs:

. A computer program product that includes a non-transitory computer readable medium, the non-transitory computer readable medium comprising a plurality of computer instructions which, when executed by a processor, cause the processor to execute performing a process for processing an electronic design, the process comprising:

. The computer program product of, wherein a default input slew is applied to the second sibling portion of the electronic design to perform timing analysis.

. The computer program product of, further comprising performing a re-calculation for the timing analysis with actual slew values.

. The computer program product of, wherein the re-calculation is performed for two or three stages of the second sibling portion.

. The computer program product of, wherein the dependency is broken by performing for a group of the sibling portions:

. The computer program product of, wherein a graph-based scheduler is used to schedule computing resources to process the electronic design.

. The computer program product of, wherein the graph-based scheduler performs:

. A system for processing an electronic design, comprising:

. The system of, wherein a default input slew is applied to the second sibling portion of the electronic design to perform timing analysis.

. The system of, further comprising performing a re-calculation for the timing analysis with actual slew values.

. The system of, wherein the dependency is broken by performing for a group of the sibling portions:

. The system of, wherein a graph-based scheduler is used to schedule computing resources to process the electronic design.

. The system of, wherein the graph-based scheduler performs:

Detailed Description


An integrated circuit (IC) has a large number of electronic components, such as transistors, logic gates, diodes, wires, etc., that are fabricated by forming layers of different materials and of different geometric shapes on various regions of a silicon wafer. Many phases of physical design may be performed with computer aided design (CAD) tools or electronic design automation (EDA) systems. To design an integrated circuit, a designer first creates high level behavior descriptions of the IC device using a high-level hardware design language. An EDA system typically receives the high level behavior descriptions of the IC device and translates this high-level design language into netlists of various levels of abstraction using a computer synthesis process. A netlist describes, for example, interconnections of nodes and components on the chip and includes information of circuit primitives such as transistors and diodes, their sizes and interconnections.

An integrated circuit designer may use a set of layout EDA application programs to create a physical integrated circuit design layout from a logical circuit design. The layout EDA application uses geometric shapes of different materials to create the various electrical components on an integrated circuit and to represent electronic and circuit IC components as geometric objects with varying shapes and sizes. Typically, geometric information about the placement of the nodes and components onto the chip is determined by a placement process and a routing process. The placement process is a process for placing electronic components or circuit blocks on the chip and the routing process is the process for creating interconnections between the blocks and components according to the specified netlist. After an integrated circuit designer has created the physical design of the circuit, the integrated circuit designer then verifies and optimizes the design using a set of EDA testing and analysis tools.

Rapid developments in the technology and equipment used to manufacture semiconductor ICs have allowed electronics manufacturers to create smaller and more densely packed chips in which the IC components, such as wires, are located very close together. When electrical components are spaced close together, the electrical characteristics or operation of one component may affect the electrical characteristics or operation of its neighboring components, which may negatively affect the timing characteristics of the circuit design. Therefore, one of the key steps in the modern circuit design process is to perform “timing closure” and/or “signoff”, to ensure that the timing characteristics of the circuit design will meet expected operating requirements.

As electronic designs move towards lower process technologies having a significantly higher number of components within the design, the process to perform timing closure has become quite challenging. The process of performing timing closure typically also includes the calculation of delays for the design, where these delay calculations are often very expensive in terms of computational resources and time. As such, it is very desirable to be able to reduce the amount of resources and time needed to perform delay calculations for the design.

One possible approach to make the timing analysis process more efficient is to share delay calculations for portions of the design that are repeated over and over again within the design. In a hierarchical design, it is likely that the same design portions (e.g., design “blocks” or “instances”) are replicated many times within the hierarchical structure of the design. By sharing the delay calculations, this allows the system to avoid the cost and expense of having to separately perform the delay calculations for each of the same replicated design portions that are repeated throughout the design. However, while theoretically a good idea, it is unfortunately the case that conventional timing analysis techniques are unable to effectively and efficiently share delay calculations across many of the replicated design blocks within an electronic design given the fact that many of these replicated design blocks have different input slews. To explain, consider that the input slews to the replicated design blocks implemented within the design realistically may be different from one another. This situation may occur for example, if a first copy of a replicated design block is inserted inline with a second copy of the same replicated design block, which means that the input slew for the first copy is likely going to be quite different from the input slew for the second copy. This difference in input slews makes conventional timing analysis techniques unable to share delay calculations between these affected copies of the replicated blocks in the design.

Therefore, there is a need for an improved approach to implement sharing of delay calculations for electronic designs.

According to some embodiments of the invention, the present disclosure provides an improved approach to implement sharing of delay calculations for replicated portions of a design, even where input slews may be different between those replicated design portions.

Other and additional objects, features, and advantages of the invention are described in the detailed description, figures, and claims.

Various embodiments will now be described in detail with reference to the drawings, which are provided as illustrative examples of the invention so as to enable those skilled in the art to practice the invention. Notably, the figures and the examples below are not meant to limit the scope of the present invention. Where certain elements of the present invention may be partially or fully implemented using known components (or methods or processes), only those portions of such known components (or methods or processes) that are necessary for an understanding of the present invention will be described, and the detailed descriptions of other portions of such known components (or methods or processes) will be omitted so as not to obscure the invention. Further, various embodiments encompass present and future known equivalents to the components referred to herein by way of illustration.

Some embodiments of the invention provide an improved approach to implement sharing of delay calculations for replicated portions of a design, even where input slews may be different between those replicated design portions.

An accompanying figure provides a high-level illustration of a system to implement some embodiments of the invention. The system may include one or more users at one or more user station(s) that operate the system to design or verify the electronic design. Such users include, for example, design engineers or verification engineers. The user station comprises any type of computing station that may be used to operate, interface with, or implement EDA applications or devices, such as EDA (electronic design automation) tools within a computing system. Examples of user stations include, for example, workstations, personal computers, or remote computing terminals. The user station comprises a display device, such as a display monitor, for displaying electronic design analysis results to users at the user station. The user station also comprises one or more input devices for the user to provide operational control over the activities of the system, such as a mouse or keyboard to manipulate a pointing object in a graphical user interface. The computing system comprises any suitable type of platform or system that is capable of performing computing activities within a system. For example, the computing system may be implemented as a personal computer, a workstation, or a computing server. The computing system may be located at any suitable location, such as, for example, in the form of a portable computing device, a workstation, a server within an on-premises server room, or as a cloud-accessible server within a cloud computing environment.

The electronic design may be stored in a computer readable storage device. The electronic design corresponds to any form of electrical design data that needs to be analyzed by the EDA tool(s). For example, the electronic design may include data in the form of view definitions, MMMC (multi-mode multi-corner) configurations, timing properties, Verilog data, LEF/DEF (Library Exchange Format/Design Exchange Format) data files, and/or scripts. The computer readable storage device includes any combination of hardware and/or software that allows for ready access to the data that is located at the computer readable storage device. For example, the computer readable storage device could be implemented as computer memory operatively managed by an operating system. The computer readable storage device could also be implemented as an electronic database system having storage on persistent and/or non-persistent storage, e.g., which implements a database according to the specification, API (Applications Programming Interface), and standards compliant with the OpenAccess database reference promulgated by the Silicon Integration Initiative.

One or more EDA tools may be used by users at a user station to design and/or analyze the electronic design data and to perform timing signoff and optimization upon that design data. The EDA tools may include multiple EDA modules to perform various EDA-related functions relative to the electronic design, such as design and verification activities, e.g., to perform timing analysis functions.

Static timing analysis (“STA”) is one particular approach that is often used to assess the timing of any given digital circuit using software techniques and certain models that provide relevant characteristics of the digital circuit. Some of these input models may include netlists, library models, parasitic models, timing derates, standard delay format, system level constraints, delay calculation (“DC”), worst slack, timing reports, and MMMC analysis.

A “netlist” may refer to a model that defines the digital circuit that is being envisioned. Generally, a gate level netlist is provided as an input model to define the desired functionality. Various kinds of library models are required to perform static timing analysis. Some standard library models include Liberty format specified (“.lib”) library models for defining the delays of standard digital gates (e.g., AND, OR, NOT, FLOP, LATCH, etc.) and MACROS, advanced on-chip variation (“AOCV”) models for performing advanced STA, models for performing SI analysis, etc. Similar to gates, for interconnects, there exist parasitic models which are generally specified in the standard parasitic exchange format (“SPEF”). Timing derates may be used to model the impact of variation during STA. Standard delay format is another approach which may be used to specify the input delays of gates and interconnects. System level constraints may refer to a set of input constraints that define the desired timing that is envisioned from the digital circuit under consideration. After reading inputs, the first step that may occur is delay calculation. During this step, an STA tool receives the user inputs provided through SPEF/Library/Timing Constraints for each netlist object and generates the best and worst propagation delay of the signal flowing through each particular stage in the design. After the delay calculation step, the STA tool may calculate the worst slack of the design. The worst slack represents the timing state of the design. It generally refers to the amount of time by which the design is meeting or violating the timing requirements specified by the user. Using the delays computed at the delay calculation step, the timing tool may internally create a timing graph for the given netlist and then propagate the worst signal across each node of the timing graph. This worst signal is the arrival time needed by the signal to reach that particular node. The arrival time at each sequential register may then be compared with the design clock to check whether the signals can reach the capturing registers within the stipulated clock period. If yes, then the design is considered to be compliant from a timing perspective; otherwise it may be reported as a timing violation. One output format of STA software is a set of timing reports that classify the entire design into a number of paths (e.g., subsections of digital circuits) and then identify whether each path is meeting the set constraints.
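The arrival-time propagation and slack check described above can be sketched as follows. This is a minimal illustration, not the STA tool's implementation: the graph encoding, the function names, and the simplified slack formula (clock period minus arrival time, ignoring setup time and clock skew) are all assumptions made for the example.

```python
from collections import defaultdict, deque

def worst_arrival_times(edges, sources):
    """Propagate the latest (worst) arrival time through a timing graph.

    edges:   list of (src, dst, delay) tuples (hypothetical encoding)
    sources: dict mapping start nodes to their launch time
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set(sources)
    for src, dst, delay in edges:
        succ[src].append((dst, delay))
        indeg[dst] += 1
        nodes.update((src, dst))

    arrival = {n: sources.get(n, float("-inf")) for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    while ready:  # topological traversal
        node = ready.popleft()
        for dst, delay in succ[node]:
            arrival[dst] = max(arrival[dst], arrival[node] + delay)
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    return arrival

def slack(arrival_time, clock_period):
    """Simplified slack: a negative value indicates a timing violation."""
    return clock_period - arrival_time

edges = [("in", "a", 2.0), ("in", "b", 1.0), ("a", "reg", 3.0), ("b", "reg", 1.5)]
arrival = worst_arrival_times(edges, {"in": 0.0})
print(arrival["reg"], slack(arrival["reg"], clock_period=4.0))
```

The worst path through `a` dominates the arrival time at `reg`, so the example reports a violation against the assumed 4.0 clock period.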

In some implementations, timing closure and signoff correspond to a two-corner (best and worst) analysis. Due to an increased number of process variations in lower technologies, a designer may need to sign off on various process, voltage, temperature (“PVT”) conditions. Different combinations of PVT may result in a large number of corners that need to be analyzed for each design. Another set of variations comes from the design modes in which a particular chip is expected to run. For example, the same wireless phone chip may operate differently while receiving a call than when in stand-by mode. Each mode may be represented through a different set of input timing constraints. The same mode may again show variation across different PVT conditions. These different mode and corner runs form the MMMC setup for a designer who needs to ensure that timing is intact for each of these combinations. One possible approach involves combining all of these MMMC runs into a single run, and then generating and reviewing the timing of each MMMC setup. However, as the number of corners continues to increase, the delay calculation cost also keeps increasing.
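To make the growth in analysis views concrete, the following sketch enumerates every mode and corner combination. The mode names and PVT values are hypothetical and chosen only for illustration.

```python
from itertools import product

def mmmc_views(modes, corners):
    """Return one analysis view per (mode, corner) combination."""
    return [
        {"mode": mode, "process": process, "voltage": voltage, "temp_c": temp_c}
        for mode, (process, voltage, temp_c) in product(modes, corners)
    ]

# Hypothetical modes and PVT corners (not taken from the source).
modes = ["receive_call", "standby"]
corners = [("slow", 0.72, 125), ("typical", 0.80, 25), ("fast", 0.88, -40)]

views = mmmc_views(modes, corners)
print(len(views))  # number of modes multiplied by number of corners
```

Because each view needs its own delays, the delay calculation cost scales with the product of the two lists.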

In a hierarchical design, it is likely that the same design portions (e.g., design “blocks”) are replicated many times within the hierarchical structure of the design. In fact, most modern designs, such as ASICs, GPUs, FPGAs, and/or multi-core CPUs, contain multiple instances of the same block. All of these repeated instances share the same SPEF (RC). The timing for these blocks may be very close, especially deeper inside the block.

To explain, consider the example design shown in the accompanying figure. Here, the same block is repeated three times as blocks S, S, and S. Each of these blocks includes instances of the same cells within the block. For example, each of the blocks S, S, and S includes the same sequence of inverters Inv, Inv, Inv, and Inv. In particular, block S includes inverters Inv, Inv, Inv, and Inv. Similarly, block S includes inverters Inv, Inv, Inv, and Inv, and block S includes inverters Inv, Inv, Inv, and Inv.

As shown in the figure, hierarchical instances of the same SPEF cell may be considered as “hierarchical sibling instances”. Two leaf instances are “leaf sibling instances” if: (a) they are instances of the same cell; (b) they belong to different hierarchical sibling instances; and (c) their positions inside their hierarchical instances are the same. Terminals of two sibling instances are “sibling terminals”, and nets connected to sibling terminals are “sibling nets”.
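The three conditions for leaf sibling instances can be expressed as a grouping key. The sketch below is an assumption-laden illustration: the `Leaf` record, its field names, and the instance paths are hypothetical, and `hier_cell` stands in for "the SPEF cell that the hierarchical instance instantiates".

```python
from collections import defaultdict
from typing import NamedTuple

class Leaf(NamedTuple):
    hier_instance: str  # hierarchical instance containing the leaf
    hier_cell: str      # cell (with its own SPEF) that instance instantiates
    position: str       # path of the leaf inside the hierarchical instance
    leaf_cell: str      # library cell of the leaf instance

def leaf_sibling_groups(leaves):
    """Group leaves that satisfy conditions (a), (b), and (c)."""
    groups = defaultdict(list)
    for leaf in leaves:
        # (a) same leaf cell, (c) same position, inside the same hierarchical cell
        groups[(leaf.hier_cell, leaf.position, leaf.leaf_cell)].append(leaf)
    # (b) keep only groups spanning different hierarchical sibling instances
    return [g for g in groups.values()
            if len({leaf.hier_instance for leaf in g}) > 1]

leaves = [
    Leaf("top/blk_a", "BLOCK", "u1", "INV"),
    Leaf("top/blk_b", "BLOCK", "u1", "INV"),
    Leaf("top/blk_a", "BLOCK", "u2", "INV"),
]
for group in leaf_sibling_groups(leaves):
    print([leaf.hier_instance + "/" + leaf.position for leaf in group])
```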

It is desirable to be able to share delay calculations for portions of the design that are repeated over and over again within the electronic design. By sharing the delay calculations, this allows the system to avoid the cost and expense of having to separately perform the delay calculations for each of the same replicated design blocks that are repeated throughout the design.

However, conventional timing analysis techniques are unable to effectively and efficiently share delay calculations across many of the replicated design blocks within an electronic design if the replicated design blocks have different input slews. This situation may occur, for example, if a first copy of a replicated design block is in the fan-in/fan-out cone of another instance of the replicated design block. In the example design, it can be seen that one design block S is in the fan-in cone of another design block S. As such, it is likely that the input slew for the downstream block is going to be different from the input slew for the upstream block. This difference in input slews makes conventional timing analysis techniques unable to share delay calculations between these affected copies of the replicated blocks in the design. In addition, this also means that the exact input slew for the downstream block will not be known until timing calculations have already been completed for the upstream block.

Returning to the system illustration, embodiments of the invention solve this problem by implementing an improved replicated sharing module that is capable of sharing delay calculations among replicated siblings, even when inputs are different between the siblings. In some embodiments, this occurs by breaking the dependencies between the siblings to generate a modified version of the design, and then permitting sharing on the version of the design without the dependencies. At a later point in time, additional calculations can then be performed to reconcile and correct for the input slews.

In this way, runtime improvements can be achieved for most designs. This permits advanced STA optimization based on repeatable patterns in a design. In addition, out-of-order calculations can be performed without loss of accuracy even with inter-dependent blocks.

As a practical matter, users of the EDA systems will experience significant runtime improvements for large designs. In addition, runtime of STA will become less sensitive to the number of multi-instantiated blocks.

From a computing perspective, this approach can dramatically improve the operation and efficiency of the computing system. For example, sharing the delay calculation across siblings means that delay calculations no longer need to be performed individually for each and every instance of a repeated design portion within the design. This reduces the amount of memory consumed by the system to perform the timing analysis activities. Moreover, this approach improves the real-world performance of the processor, since fewer calculations need to be performed by the processor to perform analysis over the entire design.

Some embodiments also provide a scheduler that implements an improved approach to schedule computing resources to process the work for the timing analysis. In some embodiments, the scheduler constructs a graph-based scheduler to process the workload, which can effectively schedule workloads for worker entities even where siblings are from different topological levels of the design. Examples of worker entities which can be scheduled using the scheduler include threads (such as threads T, T, . . . Tn), processes, tasks, containers, etc.

An accompanying figure shows a flowchart of some embodiments of the invention. Initially, an electronic design is received for analysis. For example, the design may be received to perform timing closure analysis involving a delay calculation for portions of the design. The design may refer to, but is not limited to, an integrated circuit design, or any other suitable type of electronic design, such as those associated with electronic design automation tools. For example, an electronic design may refer to a combination of hardware (e.g. described by a hardware description language) and software to implement a range of functions.

Next, dependencies are identified for the design. If such dependencies are found, then they are broken in a subsequent step.

At the next step, timing analysis/delay calculations may be performed, where the delay calculations are shared across siblings. An accompanying figure shows an example of input/output flow that may be associated with static timing analysis. STA may receive a number of models, some of which may include, but are not limited to, netlist, library, parasitic, SIF, constraint, multi voltage data, and timing derate models. STA may generate a number of outputs as well, such as timing reports and timing models.

With the paradigm of hierarchical designs, a full chip functionality may be hierarchically divided into different sub-functional requirements and then multiple design teams work together on modeling the specific requirements. Each hierarchical block may include its own netlist, constraints and SPEF information which may then be stitched together at the top level. If there are multiple instances of the same block, the internal constraints and SPEF may be exactly the same across all instances. The interface level netlist of these instances may be receiving different inputs that may depend upon their adjacent blocks and top level netlist/constraints. However, for accurate modeling and close correlation between blocks and chip level timing, designers typically attempt to ensure a similar set of inputs that are within a certain threshold or error tolerance. Significant replication of multiple hierarchical blocks may occur at the top level netlist. These replications may be coming from reusable sub-components shared across multiple blocks and/or due to multi-instantiation of hierarchical blocks.

The delay calculations may be performed for each of the plurality of sibling nets, even for ones where a dependency has been broken for that net. A default input slew may be utilized to perform the delay calculation. Any suitable default input slew may be used as appropriate for the specific application to which the invention is applied. In some embodiments, a default input slew of 5 picoseconds is used to perform the delay calculations.

In effect, the connections between sibling nets are broken to “pretend” that they do not have a common path. By doing so, the system can then calculate them all together with the shared delay calculations. Therefore, a stored delay calculation (“DC”) can be shared among the plurality of sibling nets. In this way, embodiments of the invention provide a significant performance enhancement for STA on hierarchical designs, particularly in a C-MMMC environment, as it may be configured to reduce the number of hours of STA flow runtime.

Embodiments of the system may include infrastructure to facilitate sharing of delay calculations, e.g., with respect to netlist modeling, SPEF parsing, and delay calculation operations. Netlist modeling may involve storing the sibling objects so that iteration over all the netlist sibling nodes may be performed efficiently. SPEF parsing may involve identifying and storing all of the hierarchical cells that have their own SPEF information. In some embodiments, the delay calculation may involve a preliminary step of recursive iteration over the netlist to mark SPEF siblings as well as the actual delay calculation itself, which may be configured to efficiently reuse the delay calculation across siblings.

In some embodiments, netlist level siblings may be implicitly built by building connectivity across all sibling hierarchical cells during netlist creation. This information may be stored in a searchable database and efficient iterators may be provided that can review the sibling hierarchy to access the sibling object information. Some embodiments may include a mechanism for generating SPEF level sibling information. In addition to netlist siblings, SPEF level filtered sibling information may be generated by, for example, identifying hierarchical instances of cells that have the same SPEF information.

The sharing of delay calculations may be applied to any suitable type of delay calculation technology, including, but not limited to, base and signal integrity “SI” for slope based delay calculation, non-linear delay modeling “NLDM”, effective current source model “ECSM”, and statistical on-chip variation “SOCV”. The process to share delay calculations may include iterating over the siblings to check whether the delay calculation has been performed for any of those sibling nets. If the DC has been performed for a sibling, the process may include comparing the stage's input slew and/or constraint information with that of the sibling. If not, the process may include performing a typical, full stage delay calculation. If sharable, the delay calculation may be skipped for a given sibling and the delays/output slew from the sibling stage may be copied/shared.
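The sharing check described above can be sketched as follows. This is a minimal illustration under stated assumptions: `full_stage_delay` is a made-up linear stand-in for a real delay model, the stage names and the `tol` slew tolerance are hypothetical, and constraint comparison is omitted.

```python
def full_stage_delay(input_slew, load):
    """Placeholder for a full stage delay calculation (NOT a real delay model)."""
    delay = 10.0 + 0.5 * input_slew + 2.0 * load
    output_slew = 4.0 + 0.25 * input_slew + 1.0 * load
    return delay, output_slew

def calc_with_sharing(stages, sibling_group, tol=0.0):
    """stages: dict name -> (input_slew, load); sibling_group: list of names.

    Returns (results, number_of_full_calculations).
    """
    results, full_calcs = {}, 0
    for name in sibling_group:
        slew, load = stages[name]
        # Look for a sibling that is already calculated with a matching input slew.
        donor = next(
            (d for d in sibling_group
             if d in results and abs(stages[d][0] - slew) <= tol),
            None,
        )
        if donor is not None:
            results[name] = results[donor]  # share a reference, no recalculation
        else:
            results[name] = full_stage_delay(slew, load)
            full_calcs += 1
    return results, full_calcs

# Siblings share the same parasitics, so only the input slew differs here.
stages = {"blk_a/u1": (5.0, 1.0), "blk_b/u1": (5.0, 1.0), "blk_c/u1": (9.0, 1.0)}
results, full_calcs = calc_with_sharing(stages, list(stages))
print(full_calcs)  # two full calculations for three stages
```

Storing a reference to the donor's result rather than a copy also reflects the memory-saving idea of pointing at the original block's data.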

Some embodiments may be configured to optimize slew/delay storage in the timing graph for multi-instantiated blocks. At the time of storing the slew and delays of the stage, it may be known whether the stage is going to share the delays with its sibling, as such, memory optimization may also be performed. For example, by storing the reference to the original block instead of duplicating the data this may reduce the memory requirement.

At the next step, the delay calculation for any blocks corresponding to a broken dependency undergoes a re-calculation. This step is needed because a default input slew value (rather than the actual input slew value) was applied in the previous steps to allow sharing of the delay calculations. As such, the result for these blocks would be incorrect. To fix this situation, re-calculations of a certain number of stages are performed. In some embodiments, a re-calculation of approximately 2-3 stages can be performed, since in many situations the slew stabilizes after 2-3 stages. The results from the previous iteration for the block upstream of the breakpoint are used to perform the re-calculations.
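The following sketch illustrates why re-calculating only a few stages can be sufficient. It assumes a toy stage model in which the output slew depends only weakly on the input slew; the coefficients, the 30 ps "actual" slew, and the function names are hypothetical, and real convergence depends on the cell library and loading.

```python
DEFAULT_SLEW_PS = 5.0  # default input slew mentioned for some embodiments

def stage(input_slew):
    """Toy stage model (an assumption): weak dependence on input slew."""
    delay = 20.0 + 0.4 * input_slew
    output_slew = 8.0 + 0.1 * input_slew
    return delay, output_slew

def propagate(input_slew, n_stages):
    """Return [(delay, output_slew), ...] for a chain of identical stages."""
    results, slew = [], input_slew
    for _ in range(n_stages):
        delay, slew = stage(slew)
        results.append((delay, slew))
    return results

def recalc(shared, actual_slew, n_recalc):
    """Redo only the first n_recalc stages with the actual input slew."""
    return propagate(actual_slew, n_recalc) + shared[n_recalc:]

shared = propagate(DEFAULT_SLEW_PS, 6)   # shared result using the default slew
exact = propagate(30.0, 6)               # full calculation with the actual slew
patched = recalc(shared, 30.0, n_recalc=3)

# Residual delay error at the first stage that was not re-calculated.
print(abs(patched[3][0] - exact[3][0]))
```

In this toy model the slew difference shrinks by a factor of ten per stage, so the stages left untouched after three re-calculated stages carry only a small residual error.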

Thereafter, the analysis results are generated. The analysis results may be displayed to the user on a display device. Alternatively, the analysis results may be stored in the computer readable storage medium.

An accompanying figure shows a flowchart of an approach to implement dependency breaking according to some embodiments of the invention. First, the system identifies the hierarchical siblings in the design. Next, an assumption is made regarding inputs/outputs for the hierarchical instance. In particular, an assumption is made that each hierarchical instance has the fan-in/fan-out cones for all its siblings.

Next, a set of steps is performed for each group of hierarchical sibling instances. A traversal is performed from all the output terminals to try to reach input terminals. Once an input terminal is reached, the system breaks that connection. This continues until no connections remain between the output and input terminals. At this point, the for-each loop is exited, and the process then ends.
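The traversal-and-break loop for one sibling group can be sketched as below. The terminal names and the edge-set encoding are hypothetical; the sketch assumes `edges` holds only connectivity outside the sibling instances themselves and places each breakpoint at the sibling's input terminal, which is one of the placements the description allows.

```python
from collections import defaultdict

def break_dependencies(edges, group_outputs, group_inputs):
    """Cut every connection through which an output terminal of the sibling
    group reaches an input terminal of the same group.

    edges: set of (driver, load) pairs outside the sibling instances.
    Returns (remaining_edges, broken_edges).
    """
    succ = defaultdict(set)
    for driver, load in edges:
        succ[driver].add(load)

    broken, seen = set(), set()
    stack = list(group_outputs)
    while stack:  # forward traversal from all output terminals
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in succ[node]:
            if nxt in group_inputs:
                broken.add((node, nxt))  # breakpoint at the sibling's input
            else:
                stack.append(nxt)
    return set(edges) - broken, broken

edges = {("pi", "A.in"), ("A.out", "buf"), ("buf", "B.in"), ("B.out", "po")}
remaining, broken = break_dependencies(
    edges, group_outputs={"A.out", "B.out"}, group_inputs={"A.in", "B.in"}
)
print(sorted(broken))
```

After the call, no path remains from any group output to any group input, so the siblings can be calculated together with a default input slew at each breakpoint.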

This process is illustrated in the figures. One figure re-creates the example design that was previously described. Another provides an illustration of a traversal of the terminals, showing a traversal that occurs from the output terminal of one block S to the input terminal of another block S.

As shown in the figures, upon reaching the input terminal of block S, the connection is broken at that location. In the current embodiment, the breakpoint is placed right at the input terminal to the block S, after inverter Inv. However, it is noted that the breakpoint may be inserted at any suitable location between the output of a previous block and the input of a subsequent block. For example, in the current example, the breakpoint could have alternatively been inserted between the output terminal of block S and the inverter.

Assuming that the breakpoint is inserted at that location, the default input slew would be imposed from that breakpoint location to perform the shared delay calculation step that was described above. Thereafter, when re-calculations are performed, more accurate input slew values are applied, and re-calculations may be performed for nets at a number of stages from that point. For example, re-calculations may be performed in this example at two stages, covering two nets from that point. At this point, the slew should stabilize and the updated delay calculation values should be accurate. However, it is noted that the exact number of stages to re-calculate may be adjusted depending upon the particular application to which the invention is applied.

Some embodiments provide an improved approach to implement a scheduler for computing resources/entities within the system. The reason for using an improved scheduler is to avoid inefficiencies when scheduling computing resources/entities (such as computing threads) to perform the work for the delay calculations. There may be many stages of workloads, with sibling instances at the different hierarchical levels that correspond to the different stages, and it is possible to unintentionally create unbalanced stages such that certain stages over-utilize the allocated threads (too much work for the threads), while other stages under-utilize the allocated threads (not enough work for the threads).

Therefore, some embodiments provide an improved graph-based scheduler that solves these problems. The graph-based scheduler operates efficiently even in circumstances where siblings may be from different topological levels.

An accompanying figure shows a flowchart of an approach to implement an improved graph-based scheduler according to some embodiments of the invention. By way of illustration, the steps in the flowchart will be described in conjunction with a graph shown in the figures. The graph was constructed in correspondence to the example design.

First, nodes of the graph are identified. In some embodiments, when representing a group of sibling instances, a single node is used to represent the entire group. The inputs to the instances are also represented as nodes in the graph. As shown in the graph, two nodes correspond to inputs in the design. In particular, one node corresponds to input In that pertains to the input to inverter Inv in blocks S and the input to inverter Inv in block S. The other node corresponds to the input to inverter Inv in block S, where this input is from inverter Inv.

Four further nodes represent groups of sibling instances in the design. In particular, one node represents sibling inverters Inv, Inv, and Inv in the design. Similarly, a second node represents sibling inverters Inv, Inv, and Inv in the design, a third node represents sibling inverters Inv, Inv, and Inv in the design, and a fourth node represents sibling inverters Inv, Inv, and Inv in the design.

Next, edges are identified within the graph. In some embodiments, the nets from the design are represented as the edges within the graph. In the current example, one edge represents the group of nets that connect from inverters Inv to Inv in the design, a second edge represents the group of nets that connect from inverters Inv to Inv in the design, and a third edge represents the group of nets that connect from inverters Inv to Inv in the design. A fourth edge corresponds to the nets that connect from input In to inverters Inv and Inv. A fifth edge corresponds to the net that connects from inverter Inv to Inv.

Next, the input nodes for nodes in the graph are processed. In the current example, this means that the two input nodes, which are input nodes for a sibling-group node, are processed. The scheduler will schedule resources to process the input In corresponding to the first input node, which is the input to the sibling inverters Inv and Inv. The scheduler will also schedule resources to process the input to Inv, which is the second input node corresponding to inverter Inv.
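A dependency-driven scheduler over such a graph can be sketched as follows. The source describes identifying nodes and edges and processing input nodes first; how the remaining nodes are dispatched is not spelled out here, so the ready-as-soon-as-all-inputs-finish policy, the node names, and the `run_graph` helper are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_graph(nodes, edges, work, max_workers=4):
    """Run work(node) for every node, submitting a node as soon as all of
    its input nodes have finished. Returns the completion order."""
    preds_left = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        preds_left[dst] += 1

    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Input nodes have no predecessors, so they are scheduled first.
        pending = {pool.submit(work, n): n for n in nodes if preds_left[n] == 0}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                node = pending.pop(future)
                future.result()  # re-raise any worker exception
                order.append(node)
                for nxt in succ[node]:
                    preds_left[nxt] -= 1
                    if preds_left[nxt] == 0:
                        pending[pool.submit(work, nxt)] = nxt
    return order

# Hypothetical graph: two input nodes feed a chain of sibling-group nodes.
nodes = ["in_a", "in_b", "group_1", "group_2", "group_3"]
edges = [("in_a", "group_1"), ("in_b", "group_1"),
         ("group_1", "group_2"), ("group_2", "group_3")]
print(run_graph(nodes, edges, work=lambda node: node))
```

Because work is released per node rather than per topological level, worker threads are not forced to idle at level boundaries when one level has little work.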

