Over the last few years, supply voltages on typical equipment cards have dropped dramatically, in many cases to 1 V or lower, while total card power continues to soar. The growing number of distinct rail voltages has also added complexity to the power system, in the form of sequencing and tracking between rails. Meanwhile, expectations for reliability and availability keep rising, driven by the ongoing push to reduce equipment downtime.
There are several ways to meet the added power-system design requirements without compromising reliability. High-reliability power converters are a key part of these solutions, but they need the support of a well-chosen overall equipment architecture. Attention must also be paid to the details of power-system integration.
ON-CARD POWER SYSTEM
Today's products no longer rely on a simple 5-V power-distribution system. It's now common to have six or more voltages on a single card, and some high-end systems have 20 or more separate power rails, most of them below 2 V. These very low voltages must be delivered efficiently at high current, and they must meet increasingly tight regulation, ripple, and transient specifications. Consequently, distributed power systems are now the norm, with multiple dc-dc converters on each card generating the low voltages very close to the load.
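The reason for generating low voltages close to the load is easy to quantify. The Python sketch below compares delivering the same power at 5 V and at 1 V through the same distribution path; the 50-W load and the 1-mΩ path resistance are illustrative assumptions, not values from any particular card.

```python
def distribution_loss(power_w: float, rail_v: float, path_ohm: float):
    """Return (load current in A, I^2*R loss in W, IR drop as a fraction of the rail)."""
    current_a = power_w / rail_v              # I = P / V
    loss_w = current_a ** 2 * path_ohm        # conduction loss in the distribution path
    drop_fraction = current_a * path_ohm / rail_v
    return current_a, loss_w, drop_fraction


# Same hypothetical 50-W load through the same assumed 1-mOhm path.
for rail_v in (5.0, 1.0):
    i, loss, drop = distribution_loss(50.0, rail_v, 0.001)
    print(f"{rail_v:.1f} V rail: {i:.0f} A, {loss:.2f} W lost, {drop:.1%} IR drop")
```

Dropping the rail from 5 V to 1 V raises the current fivefold (10 A to 50 A), so the I²R loss grows 25 times (0.1 W to 2.5 W) and the IR drop grows from 0.2% to 5% of the rail voltage, far outside a tight regulation budget. Keeping the high-current path short is what makes the rail practical.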
In addition to the need for very low-voltage rails, many ICs impose requirements for sequencing and tracking between power rails during startup and shutdown. The power rails must be controlled so that the difference between them doesn't exceed the specified voltage and/or time limits, even under short-term transient conditions. Combine these requirements with the need to monitor all rails for overvoltage (OV) and undervoltage (UV) protection, and it's plain to see that card power systems have moved out of the "simple-to-build" realm.
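A tracking requirement combines a voltage limit with a time limit. As a minimal sketch, assuming both rails are sampled at the same instants and the time limit has already been converted into a number of consecutive samples, a check might look like the following. The function name and the ramp data are hypothetical.

```python
def tracking_violation(v_a, v_b, max_diff_v, max_excess_samples):
    """Return True if two rails, sampled at the same instants, differ by more
    than max_diff_v for more than max_excess_samples consecutive samples."""
    run = 0  # length of the current run of out-of-limit samples
    for a, b in zip(v_a, v_b):
        if abs(a - b) > max_diff_v:
            run += 1
            if run > max_excess_samples:
                return True
        else:
            run = 0  # back inside the limit, so restart the count
    return False


# Two hypothetical 1.8-V rails ramping together; rail_b has a one-sample dip.
rail_a = [0.0, 0.4, 0.8, 1.2, 1.6, 1.8, 1.8]
rail_b = [0.0, 0.4, 0.8, 0.5, 1.6, 1.8, 1.8]

print(tracking_violation(rail_a, rail_b, max_diff_v=0.3, max_excess_samples=0))  # True
print(tracking_violation(rail_a, rail_b, max_diff_v=0.3, max_excess_samples=1))  # False
```

With a zero-sample allowance, even the brief dip counts as a violation, which is exactly the short-term transient case that the specification must cover.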
The figure illustrates an example of a card power system. Here, the product is powered from 48 V dc, as is typical of communications systems and high-end compute servers. The dc-dc converters supply the voltage rails needed by the card and maintain the required isolation between the 48-V input and the logic outputs. In this example, a single isolated dc-dc converter (usually called a brick) generates a 5-V intermediate bus that feeds a number of non-isolated point-of-load (POL) converters.
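Because every POL draws from the intermediate bus, the brick must be sized for the sum of the POL input powers, not just the output loads. The sketch below works through that budget; the rail list, the POL efficiencies, and the 92% brick efficiency are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PolRail:
    name: str
    volts: float
    amps: float
    efficiency: float  # assumed POL efficiency at this load, 0..1


def bus_budget(rails, bus_v=5.0, brick_eff=0.92, input_v=48.0):
    """Return (intermediate-bus current A, brick input power W, 48-V input current A)."""
    bus_power_w = sum(r.volts * r.amps / r.efficiency for r in rails)  # total POL input power
    input_power_w = bus_power_w / brick_eff                             # brick input power
    return bus_power_w / bus_v, input_power_w, input_power_w / input_v


rails = [
    PolRail("VCORE", 1.0, 20.0, 0.80),   # 20 W out, 25 W in
    PolRail("VAUX", 1.8, 5.0, 0.90),     # 9 W out, 10 W in
    PolRail("VIO", 3.3, 3.0, 0.90),      # 9.9 W out, 11 W in
]
bus_a, p_in_w, i_in_a = bus_budget(rails)
print(f"5-V bus: {bus_a:.1f} A, 48-V input: {p_in_w:.1f} W ({i_in_a:.2f} A)")
```

In this example, 38.9 W of load becomes about 46 W on the 5-V bus (9.2 A) and about 50 W at the 48-V input, so conversion losses at both stages have to be included when choosing the brick.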
A wide range of manufacturers offer bricks and POL converters as standard products in many output voltage and current combinations. They can conveniently function as building blocks in a card power system. A high degree of commonality exists among manufacturers, both for the electrical performance and the physical details (e.g., dimensions and pinouts). Though the figure shows a single brick, two or more bricks are often used to generate the rails that require the highest power, with POLs for the lower-power rails. Many combinations of power converters will meet the specific needs of any particular card.
To coordinate the operation of the dc-dc converters, the card power system requires an overall management function. Some degree of management also is necessary on both the primary and secondary sides of the isolation, as shown. While details vary, power-management functions typically include some or all of the following:
Thus, a high-reliability power system requires careful attention to the power-management design, which is just as important as the choice of dc-dc converters.
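Power reliability can be seen in two very different ways: as the predicted failure rate (MTBF) of the power-system components, and as the system-level reliability of the equipment in its real operating environment, which depends on how the components are applied and how thoroughly the system is qualified.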
It's important to consider both of these aspects in your design. A good predicted MTBF is necessary, but not sufficient. MTBF itself offers little value to the customer if the power system shuts down for every thunderstorm in the area.
SYSTEM RELIABILITY IMPROVEMENT
Most of the power reliability problems that customers actually experience can be traced back to weaknesses in system-level reliability (the component application and system qualification) rather than to the fundamental MTBF of the components themselves. For example:
Although these types of problems can occasionally occur even in a well-designed system, the likelihood can be reduced significantly through careful design and thorough qualification testing. The table takes a closer look at these specific problems and offers tips on how they can be avoided.
Obviously, good power-system design is a complex, multifaceted subject that touches on the entire product and its environment. Don't underestimate the task's complexity. Furthermore, although the initial focus is usually on efficient power conversion, remember that the power-management functions are equally important in achieving good power-system performance.
Three fundamental methods can improve the MTBF of any system: use fewer components, make the components more reliable, and make the system keep functioning even when components fail. Each can play a part in improving power-system reliability, together with comprehensive qualification testing.
Often, component count can be reduced in the power-management system. A dedicated power-management IC can replace a large number of discrete components used for monitoring and control, such as comparators, op amps, optocouplers, and RC time delays. At the same time, a power-management IC can offer much better performance than a discrete solution, improving system reliability by accurately reporting marginal performance while avoiding nuisance trips.
For example, the Potentia PS-2610 measures each output rail voltage every 40 µs using an 8-bit analog-to-digital converter. The PS-2610 employs digital filtering to allow for fast response to a real OV condition while preventing false OV or UV shutdown due to voltage spikes.
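The PS-2610's actual filtering algorithm isn't described here, so the following is only a generic illustration of the idea, written as a hypothetical consecutive-sample filter: a fault is declared only when several successive 8-bit readings fall outside the allowed window. At the 40-µs sample interval quoted above, requiring three readings gives a detection delay of roughly 120 µs while ignoring a single-sample spike. The class name, the ADC codes, and the sample count are all assumptions.

```python
class RailMonitor:
    """Hypothetical OV/UV detector operating on 8-bit ADC codes (0..255).

    A fault is reported only after `confirm` consecutive out-of-window
    samples, so an isolated spike cannot cause a false shutdown.
    """

    SAMPLE_PERIOD_US = 40  # one reading per rail every 40 us, as in the text

    def __init__(self, uv_code: int, ov_code: int, confirm: int = 3):
        if not 0 <= uv_code < ov_code <= 255:
            raise ValueError("need 0 <= uv_code < ov_code <= 255")
        self.uv_code, self.ov_code, self.confirm = uv_code, ov_code, confirm
        self.ov_count = 0
        self.uv_count = 0

    def detection_delay_us(self) -> int:
        # Approximate filter delay once a genuine fault is present.
        return self.confirm * self.SAMPLE_PERIOD_US

    def update(self, code: int):
        """Feed one ADC sample; return "OV", "UV", or None."""
        if not 0 <= code <= 255:
            raise ValueError("8-bit ADC code expected")
        self.ov_count = self.ov_count + 1 if code > self.ov_code else 0
        self.uv_count = self.uv_count + 1 if code < self.uv_code else 0
        if self.ov_count >= self.confirm:
            return "OV"
        if self.uv_count >= self.confirm:
            return "UV"
        return None


# Assume the nominal rail reads as code 128; the window is roughly +/-10%.
spike = RailMonitor(uv_code=115, ov_code=141)
print([spike.update(c) for c in (128, 200, 128, 128)])    # [None, None, None, None]
real_ov = RailMonitor(uv_code=115, ov_code=141)
print([real_ov.update(c) for c in (128, 200, 201, 202)])  # [None, None, None, 'OV']
```

Raising `confirm` rejects longer spikes but slows the response to a real fault, so in a design of this kind the count is a tradeoff between nuisance-trip immunity and protection speed.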
A typical POL contains fewer internal components than an isolated brick, so its failure rate can be significantly lower. The manufacturer's quoted failure rate for a typical POL is about 200 FITs (failures per 10⁹ device-hours), equivalent to an MTBF of 5 million hours, whereas a typical brick is about 500 FITs, an MTBF of 2 million hours. On the other hand, a POL usually has lower output power than a brick, so you may need more of them to meet your total power requirement. Of course, reliability is only one of many factors when choosing power converters, but by considering it early in the design, you can make the best tradeoff for your application.
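The FIT and MTBF figures above are related by MTBF = 10⁹ / FIT hours. Assuming constant failure rates and a series model, in which every converter must work for the card to work, the failure rates of the individual converters simply add. The short sketch below (function names are illustrative) uses the typical figures quoted above to compare one brick with a group of POLs.

```python
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10**9 device-hours


def mtbf_hours(fits: float) -> float:
    """Convert a failure rate in FITs to MTBF in hours."""
    return FIT_HOURS / fits


def series_fits(*part_fits: float) -> float:
    """Constant failure rates in series add: any single failure stops the card."""
    return sum(part_fits)


print(mtbf_hours(200))              # 5000000.0 (typical POL)
print(mtbf_hours(500))              # 2000000.0 (typical brick)
print(series_fits(200, 200))        # 400: two POLs total less than one brick
print(series_fits(200, 200, 200))   # 600: three POLs total more than one brick
```

Whether POLs or bricks come out ahead therefore depends on how many converters are needed to cover the total load.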
MORE RELIABLE COMPONENTS
Component reliability is influenced primarily by the qualification and quality-control processes used in manufacturing, as well as by the stresses applied in the application. Power-conversion reliability can be improved with a modular approach, using standard off-the-shelf dc-dc converters as components in your design. These units are built in high volume on automated lines with full quality control, and they offer excellent performance and reliability. You also avoid having to calculate component stresses inside the power converter, because the manufacturer has already optimized the design during its in-house qualification.
Similarly, plan your power-management design around a dedicated power-management IC rather than a general-purpose device such as a gate array or microcontroller (MCU). A power-management design based on an MCU or gate array requires extensive testing under both normal operation and fault conditions to ensure that logic or programming errors don't cause incorrect behavior. In contrast, the behavior of a dedicated power-management device has already been fully tested and qualified by the manufacturer. Only the operating parameters (voltage levels, time delays) need to be programmed.
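To illustrate what parameter-only configuration can look like, here is a hypothetical parameter table of the kind a dedicated device might be set up from. The field names, the percentage-based thresholds, and the rule that shutdown simply reverses the startup order are all assumptions for this sketch; real devices define their own parameter sets, and some ICs require a different shutdown order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RailParams:
    name: str
    nominal_v: float
    uv_pct: float       # UV threshold, percent below nominal
    ov_pct: float       # OV threshold, percent above nominal
    on_delay_ms: int    # delay from global enable to this rail's turn-on

    def __post_init__(self):
        # Reject obviously inconsistent settings before they reach hardware.
        if self.nominal_v <= 0 or not 0 < self.uv_pct < 100 or self.ov_pct <= 0:
            raise ValueError(f"{self.name}: invalid voltage or margin")
        if self.on_delay_ms < 0:
            raise ValueError(f"{self.name}: delay must be non-negative")

    def thresholds(self):
        """Return (UV threshold, OV threshold) in volts."""
        return (self.nominal_v * (1 - self.uv_pct / 100),
                self.nominal_v * (1 + self.ov_pct / 100))


def sequence(rails):
    """Startup order by on_delay_ms; in this sketch, shutdown is the reverse."""
    up = [r.name for r in sorted(rails, key=lambda r: r.on_delay_ms)]
    return up, up[::-1]


rails = [
    RailParams("VCORE_1V0", 1.0, 5, 5, on_delay_ms=10),
    RailParams("VIO_3V3", 3.3, 8, 8, on_delay_ms=0),
    RailParams("VAUX_1V8", 1.8, 5, 5, on_delay_ms=5),
]
startup, shutdown = sequence(rails)
print(startup)   # ['VIO_3V3', 'VAUX_1V8', 'VCORE_1V0']
print(shutdown)  # ['VCORE_1V0', 'VAUX_1V8', 'VIO_3V3']
```

All of the application-specific behavior lives in the table, which is far easier to review and test than custom control logic.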
To dramatically improve system reliability, design the system to be fault-tolerant. In the ideal case, an available backup instantly takes over for any component failure, leaving system performance unaffected. The term availability expresses the proportion of time for which the system performs as expected. The provision of backup components is called redundancy. In a practical system, there are limits to the degree of redundancy that can be achieved, and availability can never reach 100%. Through careful design, redundancy can provide almost complete protection against any single fault, and it can achieve 99.999% (five nines) availability or better.
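Availability is commonly computed as A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. The sketch below converts availability into annual downtime and shows the effect of a 1-of-2 redundant pair. It assumes independent failures and prompt detection and repair of each failure; the 100,000-h MTBF and 4-h MTTR are illustrative values, not data for any real card.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state fraction of time a unit is working."""
    return mtbf_h / (mtbf_h + mttr_h)


def redundant_pair(a: float) -> float:
    """1-of-2 redundancy with independent failures: down only when both are down."""
    return 1 - (1 - a) ** 2


def downtime_min_per_year(a: float) -> float:
    return (1 - a) * HOURS_PER_YEAR * 60


print(f"five nines: {downtime_min_per_year(0.99999):.2f} min/yr")  # about 5.26
single = availability(100_000, 4)
print(f"single card: {single:.6f}, {downtime_min_per_year(single):.1f} min/yr")
pair = redundant_pair(single)
print(f"redundant pair: {downtime_min_per_year(pair):.4f} min/yr")
```

Five nines allows only about 5 minutes of downtime per year. The redundant-pair figure holds only if a failed unit is noticed and replaced long before its partner fails, which is why prompt failure reporting matters.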
Most redundant systems achieve redundancy by duplicating entire cards. For example, a shelf can hold two identical control-processor cards, either of which can take control if the other fails. The 48-V distribution system is also duplicated, with dual 48-V feeds to each card from independent circuit breakers. If any individual circuit breaker trips, the cards still receive uninterrupted power through the second feed. In most cases, duplicating the on-card power system itself isn't considered beneficial, since any card failure (power or otherwise) is handled simply by replacing the card.
For effective redundancy, it's vital to report every component failure to the operator immediately, so that it can be repaired before the backup fails. In the power system, this implies not only comprehensive monitoring of all output-voltage rails, but also monitoring of fuses and power feeds to detect any loss of redundancy. Additional monitoring, such as input-current measurement and thermal sensing, can provide advance warning of overload conditions and further improve reliability.
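A minimal sketch of this monitoring logic follows, assuming the card can read each feed's voltage and fuse status as well as its own input current and temperature. The data layout, names, and thresholds are hypothetical placeholders, not values from a standard or a particular product.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str          # e.g. "FEED-A"
    voltage_v: float   # measured feed voltage magnitude
    fuse_ok: bool      # fuse or breaker status from a hypothetical sense input


def check_power_health(feeds, input_a, temp_c,
                       min_feed_v=40.0, warn_a=5.0, warn_temp_c=85.0):
    """Return (card_powered, alerts). All thresholds are placeholders."""
    alerts = []
    good_feeds = 0
    for f in feeds:
        if f.fuse_ok and f.voltage_v >= min_feed_v:
            good_feeds += 1
        else:
            # The card may still run, but redundancy is gone: report it now.
            alerts.append(f"LOSS OF REDUNDANCY: {f.name} failed")
    if input_a > warn_a:
        alerts.append(f"WARNING: input current {input_a:.1f} A")
    if temp_c > warn_temp_c:
        alerts.append(f"WARNING: temperature {temp_c:.0f} C")
    return good_feeds > 0, alerts


powered, alerts = check_power_health(
    [Feed("FEED-A", 48.0, True), Feed("FEED-B", 48.0, False)],
    input_a=2.0, temp_c=50.0)
print(powered, alerts)  # True ['LOSS OF REDUNDANCY: FEED-B failed']
```

The key point is that a blown fuse on one feed raises an alarm even though the card keeps running, which is exactly the silent loss of redundancy that must not go unnoticed.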
While today's power systems are more complex, high reliability is achievable. Minimizing component count can improve the failure rate and yield a high calculated MTBF. Also, with effective power management, you can implement features that improve overall equipment reliability. Remember that reliability is much more than just MTBF. Carry out thorough qualification testing of your power system to ensure it meets equipment requirements under all conditions.
|POWER-SYSTEM PROBLEMS AND SOLUTIONS|
|Power problem|Suggested solutions|
|Card draws more current than expected| |
|Cards are returned NFF (no fault found)| |
|Sequencing depends on component tolerances| |
|Incorrect shutdown sequencing|Review IC specs to determine whether shutdown sequencing is required. If so, include it as part of the power-management system.|
|Cannot deliver full power at extremes| |
|Overheating in system| |