IEEE 1625 Helps Promote Safety and Reliability

April 1, 2004
The IEEE standard's proven design methodologies help designers of Li-ion battery-powered systems achieve safety and reliability in their designs despite time-to-market pressures.

As the number and variety of battery-powered products increase, the marketplace becomes an attractive target for design and manufacturing companies diversifying their product portfolios. With this diversification come tight time-to-market demands and engineers who are inexperienced with the new applications. This, in turn, increases pressure on the system designer to “just get the job done.”

In response to these conditions, the IEEE 1625 standard was developed by manufacturers of Li-ion/Li-ion polymer cells, battery packs, battery and power management semiconductors, and portable computing systems. The standard aims to offer design methodologies to facilitate development of reliable and safe battery-powered systems that still provide the desired features and functions.

The design methodologies described in the standard are based on lessons learned throughout the industry. As a result, IEEE 1625 can help newcomers to Li-ion and Li-ion polymer battery-powered applications ensure the safety and reliability of their designs even when the speed of design is paramount.

What is IEEE 1625?

The standard covers design approaches that ensure reliable operation and that minimize the occurrence of faults leading to hazards in portable computing devices and other rechargeable battery-operated systems. From the beginning of the design, it is necessary to examine all facets of the portable design to ensure the reliability of the entire system, as well as that of its individual components, functions or subsystems.

The standard guides the system and subsystem designers through five major areas — system integration, cell, pack, host device and total system reliability. Also covered are the critical operational parameters and how they change with time and environment, the effects of extremes in temperature, and the management of component failure.

Gaining overall compliance requires conformance with each and every subsection of the standard. A portable computing device cannot achieve compliance without consideration of all the related subsystems — including the user (Fig. 1). To achieve compliance, designers of each subsystem must thoroughly review their designs both individually and in conjunction with other subsystems to identify faults that could propagate hazards. Once it has been ascertained that the subsystems all conform to their particular standard requirements, a further analysis is needed to assess the overall system compliance to ensure the design does not allow two faults of any type to propagate a hazard.

Applying IEEE 1625 to Battery Design

Although the designers of the overall system and its various subsystems are affected by this standard, here we focus on the designer of the battery to highlight where the standard is used and the benefits it can bring to safety and reliability of the overall system. A Design Failure Modes and Effects Analysis (DFMEA) approach is a typical example of an industry-wide methodology that can be used to highlight and prioritize the possible roots of faults and hazards.

The architecture and component selection of the battery design can be driven by factors other than the results of the DFMEA. Such factors include price, availability and size. This situation forces the DFMEA to be a living document that changes as the design unfolds.

Consider the design of an SBS1.1-compliant, 4-series by 1-parallel Li-ion battery. The architecture chosen is a new one, involving new ideas that could reduce size, components and overall cost. The architecture change makes close scrutiny of the design analysis more important because the ideas are new to all engineers involved in the design, which is where the standard provides the greatest benefit. Fig. 2 shows the change in architecture and helps clarify the DFMEA analysis.

The existing architecture has the battery management unit (BMU), first-level protection and second-level protection elements monitoring the cell stack to ensure a double level of coverage if one protection element fails. However, the new architecture has the BMU and analog front end (AFE) monitoring each other as well as the cell stack. Theoretically, this allows the second-level protection to be integrated with the first-level protection device.
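Mutual monitoring of this kind is often built around a watchdog, and Table 2 refers to an AFE watchdog fed by a WDI signal from the BMU. The following Python sketch is a toy model of that idea only. The class name, the timeout value and the latching behavior are assumptions made for illustration, not details taken from the article or from any particular device.

```python
from dataclasses import dataclass


@dataclass
class AfeWatchdog:
    """Toy model of an AFE-side watchdog supervising the BMU (hypothetical)."""

    timeout_s: float                   # assumed timeout, not a value from the article
    last_kick_s: float = 0.0           # time of the most recent WDI pulse from the BMU
    fets_enabled: bool = True          # True while the CHG/DSG FETs may conduct
    bmu_reset_requested: bool = False  # set when the watchdog asks for a BMU reset

    def kick(self, now_s: float) -> None:
        """Record a WDI pulse from a healthy BMU."""
        self.last_kick_s = now_s

    def poll(self, now_s: float) -> None:
        """On timeout, turn the FETs off and request a BMU reset.

        The FETs stay off afterwards; any re-enable policy is left out.
        """
        if now_s - self.last_kick_s > self.timeout_s:
            self.fets_enabled = False
            self.bmu_reset_requested = True


wd = AfeWatchdog(timeout_s=1.0)
wd.kick(0.5)
wd.poll(1.0)                 # 0.5 s since the last pulse: no timeout
healthy = wd.fets_enabled
wd.poll(2.0)                 # 1.5 s since the last pulse: timeout
print(healthy, wd.fets_enabled, wd.bmu_reset_requested)  # True False True
```

In this sketch a stalled BMU simply stops calling `kick`, so the AFE side can act without any cooperation from the failed device.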

The new architecture primarily consists of a BMU, an AFE integrated circuit (IC), current-sense resistor, safety FETs and a chemical fuse (Fig. 3). The BMU consists of a 3.3-V microcontroller with high-performance measurement capabilities and field-programmable flash. The AFE provides a high-voltage interface to the cell stack for voltage measurement along with cell balancing control, overcurrent protection and a low dropout regulator (LDO) to power the BMU.

Table 1. Discrete component DFMEA table.

| Problem | Problem Level | SEV | Issue | OCC | Protection Features | DET | Score |
|---|---|---|---|---|---|---|---|
| DSG FET short (including FET drive) | Moderate customer experience | 3 | Discharge current cannot be stopped, so the battery can be continuously deeply discharged, which degrades the usable cell capacity. | 3 | BMU will detect DSG failure and can optionally blow the fuse. Current, temperature and second-level voltage protection unaffected where the fuse is blown. | 3 | 27 |
| DSG FET open (including FET drive) | Severe customer experience | 5 | Discharge is not possible, but charging is possible through the body diode, which causes very limited current and increased temperature. | 3 | All protection features are functional. Temperature-based protection can blow fuse if necessary due to FET heating. | 3 | 45 |
| CHG FET short (including FET drive) | Moderate safety | 8 | Charge current cannot be stopped without blowing the fuse. | 3 | BMU will detect CHG failure and can optionally blow the fuse. Current, temperature and second-level voltage protection unaffected where the fuse is blown. | 1 | 24 |
| CHG FET open (including FET drive) | Severe customer experience | 5 | Charging is not possible, but discharging is possible through the body diode, which causes very limited current and increased temperature. | 3 | All protection features are functional. Temperature-based protection can blow fuse if necessary. | 3 | 45 |
| Fuse short (including fuse drive) | Low safety | 6 | Fuse cannot be blown. | 1 | All protection features are functional, so the FETs can protect the system. | 2 | 12 |
| Fuse open (including fuse drive) | Severe customer experience | 5 | PACK+ not connected to cells. | 1 | All protection features are functional, so the FETs can protect the system. | 2 | 10 |
| Sense resistor open | Severe customer experience | 5 | PACK- not connected to cells. | 1 | BMU and AFE will operate normally, but no current can flow to or from the system. | 2 | 10 |
| Sense resistor short | Severe safety | 9 | BMU and AFE will always measure zero current. | 1 | All current-based protection is lost, but both first- and second-level voltage and temperature protection is still fully functional. Voltage vs. current anomalies can be detected by BMU and fuse blown. | 2 | 18 |
| Thermistor short | Severe customer experience | 5 | Temperature will always read very hot. | 3 | BMU will turn the CHG and DSG FETs off due to overtemperature detection and optionally blow the fuse. | 2 | 30 |
| Thermistor open | Severe customer experience | 5 | Temperature will always read cold. | 3 | BMU will turn the CHG and DSG FETs off due to undertemperature detection. | 2 | 30 |

Other components are needed to enable the necessary features and reach the level of reliability required. The pack and cell sections of the standard are directly relevant in this example analysis, although these sections, in turn, can be divided into subsections to aid design focus. The DFMEA tables show the possible fault modes of key discrete components in the battery electronics (Table 1) and BMU and AFE interaction issues (Table 2).

Table 2. BMU and AFE IC interaction DFMEA table.

| Problem | Problem Level | SEV | Issue | OCC | Protection Features | DET | RPN |
|---|---|---|---|---|---|---|---|
| BMU to AFE I2C latch up | Low safety | 6 | BMU voltage measurements, RAM verification and AFE control nonfunctional. | 2 | BMU will detect and count I2C failures and optionally blow the fuse. Fail counter needs to be able to be periodically cleared based on normal operating conditions. First-level current protection still available. | 1 | 12 |
| AFE to BMU VCC < VCC(min) | See Table 3 | | | | | | |
| AFE RST output latched high | Severe customer experience | 4 | BMU is held in reset. | 1 | BMU no longer functions, so the WDI input will stop and the AFE will turn off the FETs and optionally blow the fuse. | 1 | 4 |
| AFE RST output latched low | Severe customer experience | 5 | BMU will never be reset on POR or AFE watchdog fault. | 2 | BMU will not be reset on watchdog fault, but watchdog will still turn off FETs and optionally blow the fuse. | 5 | 50 |
| AFE TOUT output latched high | Low customer experience | 1 | Thermistor will always be powered. | 1 | AFE will consume extra current, but there is no safety issue. | 7 | 7 |
| AFE TOUT output latched low | Severe safety | 9 | Thermistor will never be powered. | 1 | BMU will measure out of range temperature, which will cause the fuse to blow. All current and voltage protection methods are still fully functional. | 1 | 9 |
| BMU to AFE CLK < CLK(min) | Severe customer experience | 5 | AFE will not function correctly. | 1 | The AFE watchdog will timeout, causing a BMU reset, the FETs to turn off and optionally blow the fuse. | 2 | 10 |
| BMU sense of CELL latched high | Moderate safety | 7 | BMU voltage measurements will be 0 V. | 2 | First-level voltage protection is not valid, but second-level overvoltage is a completely separate circuit so it is fully operational. All current- and temperature-based protections are fully functional. Additional measurement validation checks can allow the BMU to blow the fuse. | 2 | 28 |
| BMU sense of CELL latched low | Severe customer experience | 5 | BMU voltage measurements will be full scale (high). | 2 | The BMU will read all cells at overvoltage so pack is not able to be charged but can be discharged. Current and temperature protections still functioning. Additional measurement validation checks can allow the BMU to blow the fuse. | 2 | 20 |
| AFE XALERT latched high | No issues | 1 | XALERT will never activate. | 2 | BMU polls STATUS for updates and checks XALERT, so no issues. | 2 | 4 |
| AFE XALERT latched low | Low customer experience | 2 | XALERT will always be activated. | 2 | BMU will always try to check status and clear XALERT, so will consume more power. All protection functions are functioning. | 2 | 8 |

ICs and discrete components are an important focus for the overall design. The selection of suitably rated components is not covered in this example, but the DFMEA should be extended to include the possible failure modes of the different components, including the printed circuit board (PCB) traces, vias and connection points. To be compliant with IEEE 1625, the design analysis also should be extended beyond this example to include at least second-level faults (that is, two faults occurring independently), and even third-level faults if the fault is deemed severe.
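As a rough illustration of what a second-level analysis involves, the following Python sketch enumerates the two-fault combinations that would need review for the ten single faults listed in Table 1. The function name and the rule that one component cannot be both shorted and open are assumptions made for this sketch. It only lists candidate pairs and does not score them.

```python
from itertools import combinations

# Single-fault modes taken from Table 1 as (component, mode) pairs.
FAULTS = [
    ("DSG FET", "short"), ("DSG FET", "open"),
    ("CHG FET", "short"), ("CHG FET", "open"),
    ("Fuse", "short"), ("Fuse", "open"),
    ("Sense resistor", "open"), ("Sense resistor", "short"),
    ("Thermistor", "short"), ("Thermistor", "open"),
]


def double_faults(faults):
    """Return all pairs of faults that sit on different components."""
    return [
        (a, b)
        for a, b in combinations(faults, 2)
        if a[0] != b[0]  # assumption: a component is not short and open at once
    ]


pairs = double_faults(FAULTS)
print(len(pairs))  # 40 candidate pairs to review
for a, b in pairs[:3]:
    print(f"{a[0]} {a[1]} + {b[0]} {b[1]}")
```

Even this small list grows from 10 single faults to 40 pairs, which suggests why a severity-based cutoff for third-level analysis is practical.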

In the tables, each item is evaluated and scored for the following criteria:

  • Severity of Fault (SEV)
  • Probability of Occurrence (OCC)
  • Difficulty of Detection (DET).

Typically, evaluation and scoring are based on experience and statistical reliability data. However, such data are not always available. In such cases, the guidance found in the IEEE 1625 standard is beneficial because it forces important areas of the system to have specific protection levels and features.

From the tables, it is easy to see that some design issues are more critical than others. Elements with a higher Risk Priority Number (RPN) or Score value need to be scrutinized to ensure that a fault does not lead to a hazard. In the example analysis, the use of n-channel safety FETs rather than the traditional p-channel FETs is indicated (Table 3). This is a result of the DFMEA and its living nature: a high RPN value exists where p-channel FETs are used, but the RPN value is drastically reduced when n-channel FETs are substituted.
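The scores in the tables are consistent with the common FMEA convention of multiplying the three criteria, RPN = SEV × OCC × DET. The following Python sketch, under that assumption, recomputes the Table 1 scores and ranks the fault modes. The `FaultMode` class and its field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultMode:
    name: str
    sev: int  # severity of fault
    occ: int  # probability of occurrence
    det: int  # difficulty of detection

    @property
    def rpn(self) -> int:
        """Risk score as the product of the three criteria."""
        return self.sev * self.occ * self.det


# SEV, OCC and DET values copied from Table 1.
TABLE_1 = [
    FaultMode("DSG FET short", 3, 3, 3),
    FaultMode("DSG FET open", 5, 3, 3),
    FaultMode("CHG FET short", 8, 3, 1),
    FaultMode("CHG FET open", 5, 3, 3),
    FaultMode("Fuse short", 6, 1, 2),
    FaultMode("Fuse open", 5, 1, 2),
    FaultMode("Sense resistor open", 5, 1, 2),
    FaultMode("Sense resistor short", 9, 1, 2),
    FaultMode("Thermistor short", 5, 3, 2),
    FaultMode("Thermistor open", 5, 3, 2),
]

# Highest score first; ties keep their Table 1 order.
ranked = sorted(TABLE_1, key=lambda f: f.rpn, reverse=True)
for fault in ranked:
    print(f"{fault.rpn:3d}  {fault.name}")
```

Ranking by the product alone places a high-severity item such as the sense resistor short (SEV 9, score 18) below several lower-severity ones, so a review may also want to look at severity on its own.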

Table 3. DFMEA analysis forcing change.

| | P-Channel Safety FETs | N-Channel Safety FETs |
|---|---|---|
| Problem | AFE voltage supply to the BMU is below normal operating limits. | AFE voltage supply to the BMU is below normal operating limits. |
| Issue | BMU is non-functional as power to BMU is lost. This could indicate that the AFE is no longer functional (worst case). | BMU is non-functional as power to BMU is lost. This could indicate that the AFE is no longer functional (worst case). |
| Problem level | Severe safety | Moderate customer experience |
| Protection status | A) If the AFE has not failed, then the AFE watchdog will turn the FETs off and attempt to reset the BMU. If the BMU does not recover, then the FETs will remain off and the fault will not propagate to a hazard. B) However, if the AFE is non-functional, the BMU and AFE protection is non-functional, and the safety FETs will be on due to absence of gate drive. | A) If the AFE has not failed, then the AFE watchdog will turn the FETs off and attempt to reset the BMU. If the BMU does not recover, then the FETs will remain off and the fault will not propagate to a hazard. B) However, if the AFE is non-functional, the BMU and AFE protection is non-functional, and the safety FETs will be off due to absence of gate drive. |
| SEV score | 10 | 3 |
| OCC score | 2 | 2 |
| DET score | 4 | 4 |
| RPN score | 80 | 24 |

When a full-system DFMEA is completed, the prioritization of tasks and risk assessment can be made with greater confidence. Design changes also can be easily evaluated and rolled back into the DFMEA to ensure the change has not adversely affected other areas of the design.
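To illustrate how a design change might be rolled back into the DFMEA, the following Python sketch compares the Table 3 scores before and after the switch from p-channel to n-channel safety FETs. It assumes the score is the product SEV × OCC × DET, which matches the table values. The dictionary layout and variable names are illustrative.

```python
def rpn(sev: int, occ: int, det: int) -> int:
    """Risk score as the product SEV x OCC x DET."""
    return sev * occ * det


# Scores for the "AFE supply to BMU below limits" fault, from Table 3.
before = {"sev": 10, "occ": 2, "det": 4}  # p-channel safety FETs
after = {"sev": 3, "occ": 2, "det": 4}    # n-channel safety FETs

rpn_before = rpn(**before)
rpn_after = rpn(**after)

# Which criteria did the design change actually move?
changed = sorted(k for k in before if before[k] != after[k])

print(rpn_before, rpn_after, changed)  # 80 24 ['sev']
print(f"reduction: {100 * (rpn_before - rpn_after) / rpn_before:.0f}%")  # reduction: 70%
```

Here only the severity changes, because with n-channel FETs a loss of gate drive leaves the FETs off rather than on. Occurrence and detection are unaffected.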

References

Standard for Rechargeable Batteries for Portable Computing, IEEE 1625.

