Power Management for High Performance

July 1, 2004
You are responsible for the card-level power design in your product. You have determined what power rails you need on each card. You have chosen the best power converters to meet your needs with high efficiency. You have made certain they are properly derated and have adequate cooling. You have the right fuses and noise filtering. What else do you need to think about? Optimal power performance requires an effective power management strategy to tie all these major components together. In this article, we'll discuss the various aspects of power management and show you how to meet the needs of your product.

Before you can decide on the appropriate level of power management, you must first decide what that means for your product. Are you designing power for an optical switching system that will handle millions of dollars of traffic every minute and must meet “five nines” (99.999%) or “six nines” (99.9999%) availability? The expectations for the power system will not be the same as for a router or a desktop PC. Do you need redundancy and hot-swap capability? How much monitoring and protection do you need? How can the power system contribute to the overall goals for product performance?

Power Management for Reliability

For a high-performance application, reliability and availability are of paramount importance, and the power system design must support the reliability objectives of the product. Reliability describes how long the system operates without a failure and is commonly expressed as mean time between failures (MTBF). Availability is the proportion of the time for which the system fulfills its end purpose, and is often expressed as a number of nines, such as five nines or six nines. Both are extremely important to end users, with the focus being on availability for large, fault-tolerant systems and on reliability for smaller, less complex products.
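To put the "nines" in perspective, the short sketch below converts an availability figure into expected downtime per year. It also shows the common steady-state relationship between availability, MTBF and mean time to repair (MTTR). MTTR and the function names are assumptions introduced here for illustration; they do not appear in the article.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year for an availability given as a fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR).

    MTTR (mean time to repair) is an assumed extra input, not from the article.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for label, a in (("five nines", 0.99999), ("six nines", 0.999999)):
        print(f"{label}: {downtime_minutes_per_year(a):.2f} min/year")
```

Five nines works out to a little over five minutes of downtime per year, and six nines to about half a minute, which is why a single nuisance shutdown can consume the whole budget.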

The basic failure rate of the on-card power system itself is largely determined by the power converters, although it can be worsened seriously by the way in which they are used in a given card (for example, if the cooling is inadequate). The power converters act as building blocks that must also function together as a system to meet the needs of the product. The power management system integrates these building blocks and is a key part of the product reliability equation. Features such as startup and shutdown sequencing for ASICs improve reliability by preventing possible latch-up or damage. Monitoring for overvoltage (OV) and undervoltage (UV) is particularly important in high-availability systems, where any failure in a redundant module must be detected and flagged to allow rapid repair and avoid downtime. The power management system also can provide margining and trimming to adjust the voltages precisely, and can communicate power status to the overall product control system.

DC and AC Systems

Fig. 1 illustrates a typical on-card power system in a 48-V application, showing two isolated dc-dc converters (usually called “bricks”) and three nonisolated dc-dc converters (commonly referred to as Point-of-Load converters, or POLs). Different cards use different combinations of bricks and POLs to suit the card voltage requirements and power level.

The power management system links into all the dc-dc converters and can also link into the card processor to provide status information. Most 48-V cards require high-voltage isolation between input and output, shown as a dashed line in Fig. 1. In this case, some part of the power management function is on the primary side, although the majority is on the secondary side. Communication between the primary and secondary portions may use one or more optocouplers to transmit on-off status or commands. Alternatively, an isolated datalink using an isolation transformer can allow transmission of more detailed information such as primary voltage and current.

In this type of system, power sequencing is implemented by the power management system, making it critical to the product reliability. The power converters themselves may not fail, but if they start up or shut down in the incorrect sequence, the performance of the components on the card will be directly affected. In some cases, components can be damaged catastrophically by an incorrect power sequence. The immediate cause of the failure is the chip failing, but the root cause is a poorly designed power system.

Fig. 2 illustrates a typical ac-powered card that might be used in a server or desktop PC. In this example, an ac-dc power supply is used to produce the voltage rails that require the highest power, with POL converters to generate the lower power rails at the point where they are needed. Usually, either 3.3 V or 5 V is required in the product and can be conveniently used as the input to the POL converters. Alternatively, the ac power supply may produce an intermediate voltage such as 12 V with all output rails derived from this intermediate voltage through POL devices.

In this example, the power management is shown on the secondary side. Major functions include sequencing of the POL converters, voltage adjustment for margining, and monitoring of all outputs for OV and UV. Again, it may include an interface to the system processor for status readout or control. Management functions on the primary side are contained within the ac power supply itself. These include startup and shutdown, and may include additional functions such as temperature monitoring, fan control or primary voltage monitoring for brownout.

Sequencing

One basic function of the power management system is to provide sequencing. Many ICs that use more than one power rail require that these rails be brought up in sequence at startup and often also taken down in sequence at shutdown. In some cases, incorrect sequencing causes latch-up; in other cases, it can cause permanent damage to the IC. Some ICs also specify a maximum voltage difference between rails, sometimes with a maximum time limit. Carefully check the specs for the ICs you plan to use and, if necessary, confirm the details with the IC vendor.

To optimize the power system for high performance, it's not sufficient to just provide rudimentary power sequencing. For example, sequencing based on time delays alone may not function correctly under worst-case conditions (such as during a momentary power interruption or when one power converter fails to start correctly). For optimum performance, the sequencing technique must guarantee the rails always follow the proper sequence even under worst-case conditions of startup, shutdown and restart.

The best way to guarantee this is to use full interlocking between rails, where each rail voltage must reach a specific threshold value before the next rail starts. Fig. 3 shows an example of sequencing with three voltage rails. At initial startup, there is a delay before the first rail (rail No. 1) starts. When rail No. 1 voltage reaches its startup interlock threshold, it initiates an interlock delay after which rail No. 2 starts, and so on.

Similarly at shutdown, rail No. 3 must reach its shutdown interlock threshold before the shutdown of rail No. 2 is initiated. For best performance and flexibility, all interlock threshold voltages and time delays should be independently adjustable so they can be optimized for the application. Similarly, an independent threshold should be used to determine when the rail voltages have decayed sufficiently to allow a restart.
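The startup half of this interlocking scheme can be sketched as a small state machine. This is a minimal illustration, not the method of any particular device: the class and field names, the millisecond time base and the threshold values in the usage note are all assumptions. Shutdown would follow the same pattern in reverse order with its own thresholds and is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    start_threshold_v: float  # rail must reach this before the next rail may start
    interlock_delay_ms: int   # wait after the threshold is reached


class StartupSequencer:
    """Enables rails one at a time, each gated by the previous rail's voltage."""

    def __init__(self, rails, initial_delay_ms=0):
        self.rails = rails
        self.enabled = 0                          # number of rails enabled so far
        self.next_enable_ms = initial_delay_ms    # None while waiting on a voltage

    def step(self, now_ms, voltages):
        """Advance the sequence; voltages maps rail name to measured volts."""
        if self.enabled < len(self.rails):
            if self.next_enable_ms is None:
                prev = self.rails[self.enabled - 1]
                # Interlock: start the delay only once the previous rail is up.
                if voltages[prev.name] >= prev.start_threshold_v:
                    self.next_enable_ms = now_ms + prev.interlock_delay_ms
            if self.next_enable_ms is not None and now_ms >= self.next_enable_ms:
                self.enabled += 1
                self.next_enable_ms = None
        return [r.name for r in self.rails[: self.enabled]]
```

Unlike a sequence based on time delays alone, this sketch never enables rail No. 2 if rail No. 1 stalls below its threshold, no matter how much time passes.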

Some power management approaches use a separate comparator to set each voltage threshold, whereas others use a single analog-to-digital converter (ADC) with programmable threshold settings. The second approach makes it easy to provide multiple threshold values, because each one is simply a programmed number rather than an additional comparator circuit, and it can provide well-controlled sequencing.

Fault Protection

A second main function of the power management system is to provide protection against faults, primarily OV and UV. For a high reliability application, an independent overvoltage protection (OVP) is normally used, separate from any protection provided within the power converter module. This allows the OVP threshold voltage and detection time to be optimized for each rail. It also ensures the protection circuits are effective against a failure of the power converter's control circuit.

The shutdown time delay for OV conditions is extremely important and requires careful consideration. On one hand, the time should be as short as possible to offer the best protection. On the other hand, the time delay must be long enough to avoid unwanted shutdowns (nuisance trips) due to very short transient spikes, such as ESD. A false shutdown will seriously impact the system availability, making it critical to be certain the OV condition is real before shutting down. In many applications, the optimum OV detection time is on the order of 1 ms.

An automatic recovery capability allows the system to restart after a shutdown caused by a transient condition. But in the case of a hard failure, there is no benefit in continually attempting to restart. If the power management system restricts the number of restart attempts, particularly after an OV shutdown, it can reduce the likelihood of extensive damage to the card.
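A restart limit of this kind amounts to a small counter. The sketch below is one possible policy, with hypothetical names: the controller asks permission before each restart and latches off once the limit is reached. Clearing the count after a period of fault-free operation is an added assumption, not something the article specifies.

```python
class RestartLimiter:
    """Allows a bounded number of automatic restarts after fault shutdowns."""

    def __init__(self, max_attempts: int):
        self.max_attempts = max_attempts
        self.attempts = 0

    def may_restart(self) -> bool:
        """Call after a fault shutdown; False means stay latched off."""
        if self.attempts >= self.max_attempts:
            return False
        self.attempts += 1
        return True

    def note_stable_operation(self) -> None:
        """Assumed policy: clear the count after a fault-free running period."""
        self.attempts = 0
```

A stricter limit could be applied after an OV shutdown than after a UV shutdown by keeping one limiter per fault type.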

Traditionally, an OVP circuit is implemented using an RC time delay with a suitable time constant. However, this approach is affected by the amplitude of the OV, as well as the time duration, and therefore cannot discriminate precisely between a real OV condition and a nuisance spike.

A better approach is to use an ADC to measure the voltage and look at multiple samples to ensure that the OV condition is real before taking action. Of course, the ADC sampling rate must be fast enough to allow multiple samples within the intended detection time (on the order of 1 ms in many applications).
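As a rough sketch of the multiple-sample idea, the detector below declares an OV only after enough consecutive samples exceed the threshold to span the detection time. The class name, the 100-µs sample period and the 3.63-V threshold used in the usage note are illustrative assumptions.

```python
import math


class OvDetector:
    """Declares OV only after consecutive over-threshold ADC samples."""

    def __init__(self, threshold_v, detect_time_us, sample_period_us):
        self.threshold_v = threshold_v
        # Number of back-to-back samples needed to cover the detection time;
        # at least two, so a single sample can never cause a trip.
        self.required = max(2, math.ceil(detect_time_us / sample_period_us))
        self.run = 0  # current count of consecutive over-threshold samples

    def update(self, sample_v) -> bool:
        """Feed one ADC sample; returns True when the OV is considered real."""
        self.run = self.run + 1 if sample_v > self.threshold_v else 0
        return self.run >= self.required
```

With `OvDetector(3.63, 1000, 100)`, ten consecutive samples above 3.63 V are required, so a spike lasting a few hundred microseconds resets the count without tripping.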

Fig. 4 illustrates the comparison between analog and digital OV detection, assuming the designed OV detection time is 1 ms. Both analog and digital detection can give good results for a normal OV, although the analog detection typically has wider tolerances. However, for a short high-voltage spike, the analog RC detection is likely to result in a false OV trip, whereas the digital detection is unaffected by spike amplitude.
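The difference can be reproduced with a simple numerical model. In the sketch below, the analog detector is modeled as a first-order RC filter feeding a comparator, and the digital detector counts consecutive over-threshold samples. All values (a 3.3-V rail, a 3.63-V threshold, a 1.5-ms time constant, 100-µs samples) are assumptions chosen so that both detectors respond to a sustained OV in roughly 1 ms; they are not taken from Fig. 4.

```python
import math


def rc_detector_trips(samples_v, dt_s, tau_s, threshold_v):
    """Analog model: RC low-pass filter followed by a comparator."""
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # exact per-step response of the RC
    filtered = samples_v[0]
    for v in samples_v:
        filtered += alpha * (v - filtered)
        if filtered > threshold_v:
            return True
    return False


def digital_detector_trips(samples_v, threshold_v, required_consecutive):
    """Digital model: trip only after N consecutive over-threshold samples."""
    run = 0
    for v in samples_v:
        run = run + 1 if v > threshold_v else 0
        if run >= required_consecutive:
            return True
    return False


DT_S, TAU_S, THRESHOLD_V = 100e-6, 1.5e-3, 3.63
sustained_ov = [3.3] * 5 + [4.0] * 20     # real fault: rail sits at 4.0 V
short_spike = [3.3] * 5 + [12.0] + [3.3] * 20  # one 100-us spike to 12 V

if __name__ == "__main__":
    for name, wave in (("sustained OV", sustained_ov), ("short spike", short_spike)):
        print(name,
              "analog:", rc_detector_trips(wave, DT_S, TAU_S, THRESHOLD_V),
              "digital:", digital_detector_trips(wave, THRESHOLD_V, 10))
```

In this model the RC output depends on the area under the spike, so a tall enough spike crosses the comparator threshold even though it is brief, whereas the sample counter only cares how long the rail stays above the threshold.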

It's also useful to have an alarm threshold for OV and UV, separate from the corresponding fault shutdown threshold. This can give advance warning of a partial overload condition or imminent failure, and allow the system controller to take action before a shutdown occurs. Warning thresholds are much easier to implement with a digital controller, because they only require an additional threshold value for the ADC. In an analog controller, each warning threshold would require another comparator circuit.
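In a digital controller, the extra thresholds are just extra comparisons on the same ADC reading, as this sketch suggests. The 5% warning and 10% fault bands and the status strings are hypothetical values chosen for illustration.

```python
def classify_rail(measured_v, nominal_v, warn_pct=5.0, fault_pct=10.0):
    """Classify one ADC reading against warning and fault bands.

    The band widths are assumed defaults, not values from the article.
    """
    deviation_pct = (measured_v - nominal_v) / nominal_v * 100.0
    if deviation_pct >= fault_pct:
        return "OV_FAULT"
    if deviation_pct <= -fault_pct:
        return "UV_FAULT"
    if deviation_pct >= warn_pct:
        return "OV_WARNING"
    if deviation_pct <= -warn_pct:
        return "UV_WARNING"
    return "OK"
```

A warning result would typically be reported to the system controller, while a fault result would feed the shutdown logic after the usual detection-time qualification.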

Note that (even for high reliability applications) overcurrent protection isn't normally implemented separately in the power management system. An overcurrent condition isn't caused by a failure of the power system itself, but by a short circuit or overload on the card. The protection circuit must react fast enough to avoid damage to the power converter and typically uses a pulse-by-pulse current limit within the pulse-width modulation (PWM) controller. This type of current limit provides good overload protection, supplemented by fuses to protect against catastrophic failures.

Margining

The third major function of the power management system is to provide margining. Margining refers to the capability to temporarily shift the rail voltage up or down by a few percent on request, and can be used in several ways to improve product performance and reliability.

  • During initial product verification, margin testing can be used to simulate extreme conditions and confirm that the performance has adequate margins to cover expected production variations.

  • During production testing, the rail voltages can be set to the high or low margins for a short period while the card performance is monitored. Any performance degradation during the margin test (such as an increase in bit error rate) can indicate a card with marginal performance that shouldn't be shipped to a customer.

  • A similar test of a card returned by a customer can help troubleshoot marginal performance that might otherwise be undetected, reducing the incidence of “no fault found.”

  • A margin test can be implemented as part of a routine self-test procedure to detect cards with degrading performance before they fail permanently. It also may be possible to use a margin test to assist with remote diagnosis, potentially avoiding an unnecessary maintenance call to a remote site.

During product verification, margining can be done using the trim input provided on almost all standard off-the-shelf power converters. A trim resistor can be temporarily connected from the trim pin to ground or to the output voltage. The same approach can be used in normal production, for example, by using a bed-of-nails fixture, although this may not be convenient.

A better method is to include margin capability in the power management system itself. The power management device can drive each trim input pin from a controlled voltage (usually generated by a DAC within the device) through a suitable trim resistor. The resistor value should be chosen so that the required margin setting can be achieved using close to the full voltage range of the DAC, giving good resolution and avoiding potential noise issues. If a custom-designed power converter is used, it's usually a simple matter to tie into the feedback portion of the control circuit in a similar manner.
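The resistor choice can be estimated with a simple model. The sketch below assumes the trim resistor connects the DAC output to the feedback node of a resistor divider that the converter regulates to a reference voltage. The internal trim circuit of a real off-the-shelf converter may differ, so the datasheet trim equations take precedence; all names and component values here are hypothetical.

```python
def output_voltage(v_dac, v_ref, r_top, r_bottom, r_trim):
    """Output of a regulator whose feedback node is held at v_ref.

    Summing currents at the feedback node gives:
    (Vout - Vref)/Rtop = Vref/Rbottom + (Vref - Vdac)/Rtrim
    """
    return v_ref * (1.0 + r_top / r_bottom) + (v_ref - v_dac) * r_top / r_trim


def trim_resistor_for_span(r_top, dac_span_v, output_span_v):
    """Rtrim that maps the chosen DAC span onto the required output span."""
    return r_top * dac_span_v / output_span_v


if __name__ == "__main__":
    # Assumed example: 3.3-V rail, 1.25-V reference, +/-5% margin (0.33-V span),
    # DAC swinging 0.25 V to 2.25 V (2.0-V span centered on the reference).
    r_top, r_bottom, v_ref = 10e3, 6097.56, 1.25
    r_trim = trim_resistor_for_span(r_top, 2.0, 0.33)
    for v_dac in (0.25, 1.25, 2.25):
        print(v_dac, round(output_voltage(v_dac, v_ref, r_top, r_bottom, r_trim), 4))
```

In this model, raising the DAC voltage lowers the output, and a DAC voltage equal to the reference leaves the nominal output unchanged. Spreading the margin range across most of the DAC span keeps each DAC step small in terms of output voltage.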

Furthermore, if the power management device includes a high-accuracy ADC, the trim capability can be used to improve voltage regulation of the output rails, referred to as closed-loop trimming. Essentially, the device measures each output voltage rail periodically and adjusts the trim slightly, if required. To avoid instability, this must be done as a relatively slow accuracy correction, perhaps once per second or slower, and multiple measurements should be averaged to avoid reacting to short voltage spikes. The trim adjustment at each correction must be limited to a few millivolts to avoid overreacting to measured voltage variations due to ripple and noise.
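One correction step of such a loop might look like the sketch below: average a batch of readings, compute the error and clamp the adjustment. The function name and the 2-mV step limit are illustrative assumptions; the caller is assumed to invoke this no more often than about once per second.

```python
def trim_correction_mv(measurements_v, target_v, max_step_mv=2.0):
    """Return a bounded trim adjustment in millivolts.

    Averaging suppresses ripple and short spikes; the clamp keeps any single
    correction small so the slow loop cannot overreact.
    """
    average_v = sum(measurements_v) / len(measurements_v)
    error_mv = (target_v - average_v) * 1000.0
    return max(-max_step_mv, min(max_step_mv, error_mv))
```

Even if one reading in the batch is badly corrupted by a spike, the output moves by at most the step limit, and later corrections bring it back.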

For the power management system to provide margining, an interface to the card processor is necessary (the control and status link shown in Figs. 1 and 2). This allows the processor to initiate margining as required. For power management devices that use an ADC, the same interface can allow the processor to read out the rail voltages and power system status on demand.

The interface can even allow some power system parameters (for example, voltage thresholds or time delays) to be changed by downloading new configuration data during normal operation. Through this interface, the power system can be fully integrated into the overall management system of the product.

Design Tools

A sophisticated power management system using a digital controller has a large number of configurable parameters, particularly when there are many output rail voltages. Each rail has many variables for sequencing, voltage thresholds, time delays and margining parameters that must be set correctly. Other parameters relate to the input voltage for startup and shutdown, and to additional functions such as card keying or seating.
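To give a sense of how these parameters interact, the sketch below groups a few of them per rail and checks for some obvious inconsistencies. The field names and the checks are hypothetical and far from complete; a real configuration tool would cover many more parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RailConfig:
    name: str
    nominal_v: float
    uv_fault_v: float
    ov_fault_v: float
    start_threshold_v: float  # startup interlock threshold
    interlock_delay_ms: int
    margin_pct: float         # margin-high/low offset, in percent of nominal


def config_errors(cfg: RailConfig) -> list:
    """Return a list of human-readable problems; empty means no issue found."""
    errors = []
    if not cfg.uv_fault_v < cfg.nominal_v < cfg.ov_fault_v:
        errors.append("fault thresholds must bracket the nominal voltage")
    if not 0.0 < cfg.start_threshold_v <= cfg.nominal_v:
        errors.append("startup threshold must be between 0 and nominal")
    if cfg.interlock_delay_ms < 0:
        errors.append("interlock delay cannot be negative")
    if cfg.nominal_v * (1.0 + cfg.margin_pct / 100.0) >= cfg.ov_fault_v:
        errors.append("margin-high setting would trip the OV fault")
    if cfg.nominal_v * (1.0 - cfg.margin_pct / 100.0) <= cfg.uv_fault_v:
        errors.append("margin-low setting would trip the UV fault")
    return errors
```

Checks like the last two illustrate why the parameters cannot be set in isolation: widening the margin range may require widening the fault thresholds as well.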

It's essential to have a good design tool to ensure that all these parameters are set correctly and to minimize design time. The design tool should be simple to use, and should easily accommodate changes during design and prototyping without costly board respins or new software development.

Fig. 5 shows an example of a graphical configuration tool designed for Potentia power control devices. This example shows the topology entry screen set up for the power system illustrated in Fig. 1, with two isolated bricks and three POLs. In addition, the same configuration tool provides a sequence definition screen for the power system. After entering these two screens, all required interlock thresholds, timing, and OV and UV parameters are set automatically as defaults by the tool, which dramatically reduces the design time and effort. Of course, all defaults can be changed easily when required.

Optimum power system performance isn't just about power conversion — the volts and amps, the size and efficiency — it's also about meeting the product objectives for reliability, availability and maintenance, which requires a system approach to the design. The power converters are the brawn, and the power management adds the brains. A power system that is fully integrated into the overall management system of the product can bring real benefits to the end customer.
