Reliability of High-Power Mechatronic Systems 1: Aerospace and Automotive Applications: Simulation, Modeling and Optimization [1] 1785482602, 9781785482601

This first volume of a set dedicated to the reliability of high-power mechatronic systems focuses specifically on simula


English Pages 312 [295] Year 2017


Table of contents:
Cover
Reliability of High-Power Mechatronic Systems 1: Aerospace and Automotive Applications: Simulation, Modeling and Optimization
Copyright
Foreword 1
Foreword 2
Preface
1 Reliability and Innovation: Issues and Challenges
2 Reliability in the Automotive World
3 Reliability in the World of Aeronautics
4 Reliability in the World of Defense
5 The Objectives of Reliability
6 “Critical” Components
7 Estimated Reliability Prediction
8 Simulation of Degradation Phenomena in Semiconductor Components in order to Ensure the Reliability of Integrated Circuits
9 Estimation of Fatigue Damage of a Control Board Subjected to Random Vibration
10 Study on the Thermomechanical Fatigue of Electronic Power Modules for Traction Applications in Electric and Hybrid Vehicles (IGBT)
11 Exploration of Thermal Simulation Aimed at Consolidating the Reliability Approach of Mechatronic Components
Appendix
List of Authors
Index
Back Cover


Reliability of High-Power Mechatronic Systems 1
Aerospace and Automotive Applications: Simulation, Modeling and Optimization

Series Editor: Abdelkhalak El Hami

Edited by Abdelkhalak El Hami, David Delaux and Henri Grzeskowiak

First published 2017 in Great Britain and the United States by ISTE Press Ltd and Elsevier Ltd

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Press Ltd, 27-37 St George’s Road, London SW19 4EU, UK
www.iste.co.uk

Elsevier Ltd, The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK
www.elsevier.com

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

For information on all our publications visit our website at http://store.elsevier.com/

© ISTE Press Ltd 2017

The rights of Abdelkhalak El Hami, David Delaux and Henri Grzeskowiak to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
A catalog record for this book is available from the Library of Congress

ISBN 978-1-78548-260-1

Printed and bound in the UK and US

Foreword 1

Predicting and then guaranteeing the reliability of an electronic system is a major challenge for manufacturers in the automotive, aerospace and defense sectors, as well as in railway, telecommunications, nuclear and health, amongst others. Above all, however, it matters to us, the daily users of such equipment, who must have absolute confidence in the information being transmitted and the decisions made in real time. The increasing development of connected objects (autonomous vehicles, home automation, etc.) will lead to a drastic reduction of human intervention in favor of intervention by mechatronic systems. These systems will only be deployed if users have absolute confidence in the reliability of the equipment.

This equipment can be broken down into its two major parts: firstly, the hardware, which is mainly composed of electronic boards (coupled with mechanical systems), and secondly, the real-time software that operates the equipment and carries out the tasks expected of it.

Predicting and ensuring the reliability of electronic equipment is a task that is both immense and without end. On the one hand, the number and diversity of components used to build these boards is very high; on the other hand, the new features of such innovative equipment require multiple tests of their inherent reliability and robustness. Faced with this immense task, a few large industrial companies (such as Thales Air Systems, Valeo, Safran, NXP) and SMEs (the likes of Areelis, MB Electronique, Ligeron, statXpert, Lescate, Serma, PAK), supported by private laboratories (including CEVAA, Analyses et Surface) and public entities (including LNE, GPM, LAMIPS, INSA Rouen), have embarked on a
process of setting up resources and skills dedicated to the reliability of high-power mechatronic components and systems. This association of complementary partners made its debut in the framework of the first program dedicated to reliability, AUDACE or “Analyse des caUses de DéfaillAnce des Composants des systèmes mécatronique Embarqués” (in English: analysis of the causes of failure of components in embedded mechatronic systems). This initial project was a great success. It made it possible to create strong links between the various partners and to set up analysis and measurement methods that perform extremely well. At the same time, however, it also highlighted the immense scale of the task and the diversity of components and technologies to be mastered.

At the end of the first contract, the collective decided to continue this groundbreaking work through a second program, FIRST-MFP or “Fiabiliser et Renforcer des Systèmes Technologiques mécatroniques de forte puissance” (in English: improving the reliability of, and strengthening, high-power mechatronic technological systems), in order to address the components specific to power electronics. At these power levels (ranging from a few kW to several hundred kW), the electronics must be able to cope with stresses that can lead to fatigue failures not commonly encountered in low-power electronics. Digital modeling, multi-physical testing and the consideration of multiple uncertain variables led to the development of this follow-up research program, FIRST-MFP. With the support of the competitiveness clusters Astech and MOV'EO, the Aéronautique Normande NAE, the regions of Normandy and Ile de France, as well as the Chambers of Commerce of Rouen and Versailles, this program was implemented and has since achieved exceptional results.
In order to share these results not only with the economic actors involved in system reliability, but also with students in electronics, mechanics and materials research, it was decided to record all of the results of this program and publish them in book form. Given the richness of the results, two books proved better suited to the task. With this in mind, I would like to warmly thank Messrs. Abdelkhalak EL HAMI, David DELAUX and Henri GRZESKOWIAK for their remarkable work in producing these two volumes, as well as all the participants in the FIRST-MFP program who
spent many hours collating their results into a format that could be more easily presented in this publication. As a result, the essential information presented here does not remain the property of a few, but is shared with numerous engineers, technicians, researchers and students.

Volume 1 is devoted to the presentation of the various issues and deals with the modeling and simulation aspects that are essential to predicting the reliability of future electronic systems. Volume 2 compiles the aggravated and accelerated tests carried out on different types of components and high-power subsystems. Together, these two volumes provide information that is indispensable for the innovation of future equipment that will be integrated into the cars, planes and helicopters of tomorrow.

I would like to thank all the contributors to this program as well as the funders (both national and regional) without whom this project could not have succeeded. It is my deepest wish that the solid alliance which came about as a result of these two programs, AUDACE and FIRST-MFP, continues in view of the many emerging technologies whose reliability must be evaluated.

Philippe EUDELINE

Foreword 2

The World of Harsh Environments and High Reliability Demands: Challenges and Solutions

The importance of quality and reliability to a system can hardly be disputed. Product failures in the field inevitably lead to losses in the form of repair costs, warranty claims, customer dissatisfaction, product recalls, lost sales and, in extreme cases, loss of life. Along with the continuously increasing electronic content in vehicles, airplanes, trains, appliances and other devices, electronic and mechanical systems are becoming more complex, with added functions and capabilities. Needless to say, this trend is making the jobs of design and reliability engineers increasingly challenging, which is confirmed by the growing number of automotive safety recalls. These recalls are triggering a growing number of preventative changes, with OEMs and government regulators producing a number of functional safety standards and other government and industry regulations, all demanding unprecedented levels of quality, reliability and safety in future electronic systems. Besides the human life aspect of safety recalls, these automotive campaigns cost millions or sometimes billions of dollars, which can eventually put a company out of business.

The present book, Reliability of High Power Mechatronic Systems, edited by A. EL HAMI, D. DELAUX and H. GRZESKOWIAK, is intended to expand our knowledge in the field of reliability in general and in automotive and aeronautical applications in particular. New developments in the automotive industry are focusing on three major directions: vehicle autonomy, connectivity and mobility. This brings forward
further challenges and the need for further advancements in the areas of software reliability, automotive vision systems, vehicle prognostics, driver behavior, cyber security, advanced driver assistance systems, sensor fusion, machine learning and other related fields. On top of that, the ever-increasing demand for “intelligent” safety features and improved comfort in vehicles has led to a corresponding boom in mechatronics. The mechatronic systems (fusion of mechanical, electronic and computer systems) presented in this book are revolutionizing the automotive industry. Application of these devices in the automotive, aerospace, defense and other industries, where products are expected to be subjected to harsh environments such as vibration, mechanical shock, high temperatures, thermal cycling, high humidity, corrosive atmosphere and dust, adds another layer of complexity to the product design and validation process. The goal of meeting the product specifications and the need to assess the future product’s reliability even before the hardware is built, brings forward the importance of understanding the physics of how devices work and especially how they fail. Physics of Failure (PoF) is a necessary approach to the design of critical components which often utilizes accelerated tests based on validated models of degradation. This understanding of failure modes and failure mechanisms is critical to a successful Design for Reliability (DfR) process as opposed to a more conventional test-analyze-and-fix approach which is still often practiced in many industries. DfR is the process of building reliability into the design using the best available science-based methods, which is quickly becoming a must in the age of relentless cost cutting and development cycle time reduction. In the quest to reduce carbon emissions and save energy, the production of hybrid and electric vehicles has been continuously growing, accelerating further development of power electronics systems. 
This, combined with the development of self-driving vehicles, will require more powerful advanced integrated circuits (ICs). The large packages and higher power dissipation of these advanced ICs present thermal and thermo-mechanical expansion-contraction fatigue challenges. The continuing reduction in IC feature size potentially brings a reverse trend in the reliability and longevity of these devices. Smaller and faster circuits with an increasing number of transistors lead to higher current densities, lower voltage tolerances and higher electric fields, making ICs more susceptible to wear-out type failure mechanisms. In the variable frequency motor drives commonly used in hybrid and electric vehicles, Insulated Gate Bipolar Transistor
(IGBT) modules are widely used power semiconductor devices. Their fast switching characteristics make IGBT-based converters more and more attractive for a variety of power electronics applications. The severe environmental conditions and the stringent requirements in terms of system availability and maintainability impose high reliability levels on individual IGBT modules. An important requirement covered in this book (Volume 1: Simulation, Modeling and Optimization) is the ability to withstand power cycles. Hybrid and electric vehicles experience a large number of power cycles (up to a million) during their lifetime, with high voltage and/or high current and heavy transient loadings that cause temperature changes, leading to mechanical stresses that can result in failure. Hence IGBTs are susceptible to thermo-mechanically activated failure mechanisms, in particular bond wire lift-off, leaving room for reliability improvement of IGBTs in a number of applications.

The reliability of individual components was brought back into focus with the proliferation of functional safety standards, where reliability prediction based on the failure rates of the individual components is required to assess the Safety Integrity Level (SIL), or ASIL in automotive standards. It is important to note that meeting these SIL requirements often requires failure rates in single- or double-digit FIT (failures per billion hours), which is 1–2 orders of magnitude lower than what was expected a couple of decades ago.

Despite continuous growth in automotive electronics, the consumer electronics industry is still the main driver of the IC market. This presents an additional challenge to the “harsh environment industries”, creating situations where it is often difficult to find automotive-grade parts able to withstand high temperatures, vibration and other environmental stresses.
This forces design engineers to search for new solutions, often adding air or liquid cooling of high power dissipating ICs, hence increasing the complexity of the system and making it difficult to accelerate the testing of these systems. In some of the applications the maximum operating temperatures are now approaching maximum allowable temperatures for the operation of silicon circuits, thus making test acceleration more difficult and sometimes impossible. Therefore, degradation analysis and prognostics may become the way of product testing and validation in the future. Overall, the limitation of physical testing will lead to more reliance on modeling, requiring better understanding of internal design, failure modes and failure mechanisms, many of which are covered in volume 2 of this book (Volume 2: Issues, Testing and Analysis).
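
To make the FIT unit mentioned above more concrete, the following minimal Python sketch converts a failure rate expressed in FIT into an equivalent MTBF and into a probability of failure over a given operating time. It assumes a constant failure rate (exponential model), which is an assumption of this sketch and not a statement made in the foreword; the function names and the 8,000-hour operating time are hypothetical and chosen only for illustration.

```python
import math

HOURS_PER_BILLION = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_rate_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT to failures per hour."""
    return fit / HOURS_PER_BILLION


def fit_to_mtbf_hours(fit: float) -> float:
    """Mean time between failures in hours (assumes a constant failure rate)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


def failure_probability(fit: float, hours: float) -> float:
    """Probability of at least one failure within `hours`.

    Illustrative exponential model: P = 1 - exp(-lambda * t).
    expm1 keeps precision when lambda * t is very small.
    """
    lam = fit_to_rate_per_hour(fit)
    return -math.expm1(-lam * hours)


if __name__ == "__main__":
    operating_hours = 8000.0  # hypothetical operating time
    for fit in (1.0, 10.0, 100.0):
        print(
            f"{fit:6.1f} FIT -> MTBF {fit_to_mtbf_hours(fit):.3g} h, "
            f"P(failure in {operating_hours:.0f} h) = "
            f"{failure_probability(fit, operating_hours):.3e}"
        )
```

Under this model, lowering the failure rate by one order of magnitude raises the MTBF by the same factor, which gives a sense of what a requirement "1–2 orders of magnitude lower" implies for component selection.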

Nevertheless, these apparent difficulties and the ever-growing list of engineering challenges also make the job of a reliability professional more motivating and exciting. They also give reliability professionals growing influence on the decision-making process during a product development cycle. However, despite its obvious importance and the continuous expansion of engineering knowledge, quality and reliability education is paradoxically lacking in today’s engineering curriculum. Very few engineering schools offer degree programs or even a sufficient variety of courses in quality or reliability methods. Therefore, the majority of reliability and quality practitioners receive their professional training from their colleagues on the job, professional seminars, journal publications and technical manuscripts like this one.

We hope that readers will find this book helpful in exploring the expanding field of mechatronics and power devices, understanding how they work and how they fail, and ultimately in meeting the numerous reliability and design challenges their industries will face for years to come.

Andre KLEYNER

Preface

In the perpetual search for improved industrial competitiveness, developing methods and tools for product design is a strategic necessity, driven by the crucial need for cost reduction. Nevertheless, a decrease in the cost of design should not impair the reliability of the new systems proposed, which also needs to progress significantly. This book proposes new methods that allow breakthrough mechatronic devices for the automotive and aerospace industries to be designed more quickly and at lower cost, while guaranteeing their increased reliability. On the basis of applications to new, innovative products (“high-power components and systems”), the reliability of these critical elements is validated digitally through new multi-physical and probabilistic models that could ultimately lead to new design standards and reliability forecasting methods.

As such, this book belongs to the field of embedded mechatronics, a key element in the competitiveness of companies in the automotive and aeronautical sectors. This technology combines mechanics, electronics, software and control systems; their combination results in mechatronic systems. A system is a complex set of functions subject to random events (triggering systematic errors, bit flips, hardware failures), which must provide a defined service regardless of its internal state, the state of its environment and the level of stress applied.

The functional structure of systems (software and hardware) has become complex and variable. Preventing and eliminating errors is an expected part of the development and verification processes. The potential causes of failure are manifold: they relate to hardware, software and development environments. Inconsistencies and combinations of latent or dormant errors, which depend on the state of the system and the complexity of the applications, make analysis difficult. Error processing (detection and recovery) improves the likelihood that the system behaves correctly, at the cost of increased system complexity.

Evaluating the reliability performance (part of the RAMS performance of a product) of complex embedded systems requires the development of new approaches. In systems that integrate software, the reliability structure of functions depends on the software. The search for event sequences leading to system failure must therefore involve both software and hardware. The method should contribute to the qualitative and quantitative analysis of the safety of these systems and microsystems.

To date, the failure modes of critical components and mechatronic systems are still far from being mastered. To improve the current level of knowledge, we will focus on the study of five mechatronic systems representative of the industrial participants: two inverters, two converters and one tuner.

To increase the competitiveness of their mechatronic devices, automotive and aeronautical equipment manufacturers need to innovate in both design and assembly processes in order to reduce product development time. In addition, these innovative products must combine excellent functional and operational performance, including reliability, in order to fully meet global market expectations.
Car manufacturers’ reliability expectations will certainly increase with the strong market penetration of electric and hybrid vehicles, projected to reach 10% of the market by 2020–2025. In addition to these expectations of operational reliability, there is a need to quickly remove the risks of immaturity associated with product innovations. This need is strongly linked to the minimization of vehicle development times.

In the field of aerospace, the requirements primarily relate to the forecasting and control of costs resulting from failures that occur during the commissioning, warranty period and operation of the aircraft. In future, new contracts for the sale of aeronautical equipment will increasingly focus on sales at the time of operation. Although the aerospace sector has relatively low production volumes compared to the automobile sector (in terms of the number of units per product type), the financial stakes are higher and, in practice, aeronautical manufacturers oversize their mechatronic components to leave some margin in case the true nature of the problem has been misunderstood. Better prediction of the associated failures and risks would help to address the three main challenges facing the aerospace industry:
– improving reliability, as represented by a decrease in the rate of removal, and reducing maintenance costs, as represented by the Direct Maintenance Cost (DMC);
– improving the detection of reliability problems, in order to avoid problems being detected too late in the life cycle of the mechatronic systems, with the associated risk of having to repeat some of the equipment design in the case of structural weakness;
– preparing for the future, with expert mastery of reliability, service life and equipment replacement.

The intensification of competition in all industrial sectors, particularly in the automotive sector, leads to ever-increasing and increasingly complex technological content in products, and thus to the acceptance of greater risk-taking in terms of reliability. In the automotive world, several vehicle launches have been compromised as a result. Although recall decisions are becoming increasingly commonplace within the industry, they are nevertheless a relatively recent phenomenon.
This is evidenced by the events of the 1970s concerning Ford and General Motors, an era in which manufacturers were still very reluctant to recall vehicles suspected of having major defects. The cases of the Ford Pinto and the Chevrolet Malibu, whose fuel tanks had serious design flaws that could cause vehicle fires in the event of a collision, even at very low speeds, finally seem to be a thing of the past. The decision not to proceed with massive recalls after the discovery of the defects followed a detailed cost-benefit analysis [SCH 91], which weighed the potential technical cost of the recall against the potential cost of judicial
damages. The terms of such trade-offs have now changed, due to better consumer information and more severe court decisions that favor victim compensation, which in turn can lead to heavy penalties against an unscrupulous manufacturer. These changes, combined with a surge in vehicle malfunctions (in particular due to the massive introduction of electronics), have made recall campaigns commonplace. For example:
– in April 2013 alone, some 3.39 million Japanese cars, from brands including Toyota, Nissan, Honda and Mazda, were recalled worldwide because of a potentially faulty passenger airbag;
– the American manufacturer Chrysler also announced the recall of 30,000 SUVs dating from 2012;
– furthermore, the Japanese car manufacturer Mitsubishi announced the recall of 4,000 of its electric and hybrid vehicles.

The cost of such recall campaigns is difficult to estimate. As an indication, Volkswagen has calculated that recalling 384,000 vehicles equipped with DSG gearboxes could cost around $1,500 per vehicle, a total of roughly $600 million (384,000 × $1,500 ≈ $576 million), without even taking into account the harm to the group’s reputation. Faced with the potential costs of such campaigns and their consequences, the precautionary solution is to increase upstream investment in the production of systems and components in order to improve their reliability and, as a result, the reliability of the cars.

A mechatronic system is a physical action controlled by an intelligent electronic “black box”. A simple and well-known example is the Electronic Stability Program (ESP) or EBV (Elektronische Bremskraftverteilung), which is offered by all car manufacturers.
The range of components that fall within its scope of application is broad:
– low- and high-power autonomous actuators;
– various types of sensors (pressure, temperature, imaging, etc.);
– energy conversion, storage and management;
– active and passive components;
– control laws and embedded software;
– communication systems, including wireless technologies, etc.

The undeniable challenge for these two major strategic industrial sectors is thus:
– for the automobile market, according to a study published in the Grandes Ecoles Magazine no. 54, “the world market for carbon-free vehicles is rapidly expanding. With 4.5 million vehicles to be produced by 2025, France is expected to generate €12 billion a year, and reduce its CO2 emissions by 3% and its imports of fossil fuels by 4 million tons of oil.” The experts at Mov’eo estimate that the components and systems directly impacted by the FIRST-MFP research program will constitute 10% of the cost of an electric vehicle, or €1.2 billion per year;
– concerning the aeronautical market, according to Safran estimates, the global market for aeronautical electronics could, in the long term, reach 4 to 5 billion dollars. The firm Decision details this impact through a study of the weight of electronics in aircraft: “Electronics represents 6% of the cost of an A320 civil aircraft (€3.7 million) and 10% of the cost of an aircraft like the A380 (€20 million), and the average annual growth of electronics would be more than 6%.” A low-end estimate of the impact of the FIRST-MFP program on aeronautical electronics is 5%, that is, 15 to 20 million euros.

These two volumes are dedicated to the Reliability of High Power Mechatronic Systems; Volume 1 covers Aerospace and Automotive Applications: Simulation, Modeling and Optimization.

Chapter 1 sets out the issues and challenges of reliability and innovation. Innovation is today a strategic activity for French and European companies in key industrial sectors such as transport or defense, which are increasingly competitive and demanding.
Considered one of the main levers of corporate competitiveness and, in particular, one of the best responses to growing competition from emerging countries, innovation is today at the heart of European territorial issues. In the last 10 years, public financial support for innovation has increased considerably, irrespective of the political powers in place. However, for these key sectors, technological innovation is particularly complex, in that the risk inherent in the innovation process is of major importance in terms of reliability, the management and reduction of operational malfunctions, the achievement of target performance, etc. Innovation may under no circumstances come at the expense of core values such as the reliability and safety of the products concerned.

Chapter 2 is dedicated to reliability in the automotive world. Reliability and the automotive world are intimately linked through their histories, their stakes and their futures. This chapter highlights these three aspects, taking into account the global context of an ever-changing industry. There are many publications on reliability, each shedding a different light on the topic. Meanwhile, the automotive world is constantly in search of innovation, production, mobility, autonomy and speed. In 1900, the United States had only 8,000 cars and a few hundred kilometers of road. Currently, there are more than one and a half billion motor vehicles and millions of miles of good roads worldwide. This industry uses and develops techniques and methods that are increasingly important in assuring vehicle owners of total safety and comfort.

Chapter 3 presents the topic of reliability in the world of aeronautics. Improved aircraft reliability has facilitated the development of the civil aeronautics industry. These gains were made possible by several factors, including the use of feedback, the implementation of new technologies to limit the impact of human factors, and the improvement of development and maintenance processes. The demand for ever safer products motivates manufacturers to develop new concepts and methods aimed at increasing the reliability of systems.

Chapter 4 focuses on reliability in the world of defense. The defense industry is known for its characteristically long development cycles: for example, five to seven years for a new missile. This requires the use of highly innovative technologies, or else running the risk of international competitors achieving these breakthroughs at your expense.
A new design, new technologies (components and the processes for mounting these components on PCBs) or new conditions of use give the product the character of a “product in revolution” (as opposed to a “product in evolution”). The maturation of such a product in revolution requires validation through experimentation, by testing at different stages of the development process as well as at the different assembly levels of the product.

Chapter 5 presents the main objectives of reliability. The predictive reliability of an electronic device must be quantified as precisely as possible. This involves formulating reliability targets according to the functional and environmental mission profile assigned to the device and its technological structure. This formulation step is important because it forces the designer to collect and quantify all the theoretical considerations needed to accurately estimate and guarantee the level of reliability of the product during
its life cycle. In the preliminary design stage, the problem posed is thus not soluble in its generality given the heterogeneity of the devices that are comprised of an assortment of degradable and non-degradable components. In this chapter, we propose a methodology that is simple to implement, based on the simplification of assumptions whose validity is therefore justified. Chapter 6 looks at the discourse surrounding critical components. The notion of criticality conceptually refers to electronic and mechatronic devices whose heterogeneous technological structure includes some components whose reliability is insufficient. In order to not degrade the reliability assigned to the device, these components must be replaced periodically during its lifetime. A typology aimed at identifying a “critical” component, combined with a methodological approach, allows us to understand its influence on the reliability of the device. A concrete study of the expected lifetime of a “critical” component shows the theoretical formulation of models, the physical analysis of failures and the experimental implementation and interpretation of accelerated test results. Chapter 7 is dedicated to predictive reliability. It presents the standards regularly used in the estimation of failure rates for electronic components in the industry. After a brief historical background to cover the issues at the time of the creation of these four guides, we will study the different models that compose the four types of critical components: aluminum electrochemical capacitor, film capacitor, IGBT and power choke. In the second part, we will utilize mission profiles taken from real applications across different industrial sectors. We will therefore be able to compare the estimates provided and explain both the differences between the guides and the points of comparison. 
Chapter 8 presents a simulation of degradation phenomena in semiconductor components in order to ensure the inherent reliability of integrated circuits. Until now, the design cycle only guaranteed the electrical performance of integrated circuits (ICs) before aging. Accelerated aging trials, lasting several months, then complement the design work to guarantee reliability. In the event of degradation, a reactive design correction is necessary, which delays market release. A design cycle that is subject to such delays is therefore not under control. This study aims to ensure the required level of reliability for an integrated circuit at the design phase. The new design methodology requires a component aging simulation tool. Reliability tests on manufactured circuits will, in this part of the study, consolidate the results of the reliability simulation, up to the refinement of individual models for each component. Once the components of a technology are properly modeled, the number of reliability incidents should tend towards zero. The market release is brought forward and the design cycle comes under control. In the long term, it becomes possible to hope that a technology can be put on the market as soon as the first silicon is available, without waiting for accelerated qualification tests. The circuits are guaranteed to be reliable by design. This chapter aims to establish the state of progress in terms of:
– the evolution of the performance of integrated circuits, derived from the study of the aging of the transistors that constitute them;
– the advantage of integrating the study of integrated circuit reliability into the design phase using dedicated tools.

Chapter 9 presents a study of a control board subjected to random vibration and an estimation of the resulting fatigue damage. On-board electronic systems are often exposed to different types of loading due to their operating environment. These loadings are mainly thermal and vibratory. In this chapter, we limit ourselves to vibration loads and the effect of random vibrations on an industrial application. The S97 Valeo demonstrator is examined within the framework of the FIRST-MFP project. The behavior of the control board of the DC-DC converter inverter of a hybrid vehicle is studied in order to estimate the fatigue damage of brazed joints. The random stresses, as well as the properties of the materials chosen, produce a wide range of uncertainties. Experimental tests and numerical simulations make it possible to identify the propagation of the uncertainties that influence the behavior of the structure.

Chapter 10 presents a study on the thermomechanical fatigue of electronic power modules (IGBT, or Insulated Gate Bipolar Transistor) for traction applications in electric and hybrid vehicles.
The current trend in the field of rail transport is to integrate increasingly powerful power modules into progressively smaller volumes. This poses several problems, particularly in terms of reliability, given that during their operating cycles the semiconductor switches and their immediate environments are subjected to severe thermomechanical stresses. These stresses can lead to their destruction and therefore to the failure of the energy conversion function. The main objective of this chapter is to describe the numerical approach used to simulate the electrothermomechanical behavior of an IGBT power module and to characterize these stresses. This numerical model then serves as the basis of a reliability study to estimate the lifetime (in terms of thermomechanical fatigue) of these electronic components.


Finally, Chapter 11 is dedicated to the fluidic and thermal characterization of a fan and cooling radiator for a converter inverter. The objective of this study is to develop and exploit numerical codes aimed at consolidating the approach to the reliability and robustness of mechatronic components, specifically their thermal aspects. The work is based on the study of the thermal behavior of a component (power converter inverter) kept within thermal limits by a forced air flow generated by a fan coupled with a radiator. One hundred and fifty-two configurations were simulated to assess the influence of aging or failure of certain elements of the device on the thermal behavior of the component. From the CFD-3D numerical calculations, reliability laws have been obtained to predict the temperature of the components according to the operating scenario.

Abdelkhalak EL HAMI, David DELAUX and Henri GRZESKOWIAK
June 2017

1 Reliability and Innovation: Issues and Challenges

Innovation is today a strategic activity for French and European companies in key industrial sectors such as transport or defense which are becoming increasingly competitive and demanding. Technological innovation can be seen to be particularly complex in that the risk inherent in the innovation process is of major importance in terms of the reliability, management and reduction of operational malfunctions, the achievement of target performances, etc. These latest innovations may under no circumstances be at the expense of reference values such as the reliability and safety of the products concerned.

1.1. Introduction

As a theme that has been predominant for a few years now, innovation is widely used in corporate and government communications. Considered one of the spearheads of business competitiveness, and in particular one of the best responses to the growing competition from developing countries, innovation is today at the heart of European territorial issues. Over the last ten years, public financial support for innovation has increased considerably, irrespective of the political powers in place. An inventory of public support by the National Commission for the Evaluation of Innovation Policies (CNEPI or “Commission nationale d’évaluation des politiques d’innovation”) showed that €10 billion, half a percentage point of France’s GDP, is now devoted to supporting innovation by the principal public actors, for the most part the state, the regions and Europe.

Chapter written by Claire LARIVOIRE, Fabien MARTY and David DELAUX.

This desire to push our system even more emphatically towards an economy that is increasingly dependent on innovation is complemented by an institutional structuring that has given rise to major contributors who ensure its implementation. These include:
– the Office of the Commissariat-General for Investment (CGI), which implements the “Programme d’investissements d’avenir” or PIA (an investment program for the future);
– the Banque Publique d’Investissement (Bpifrance), which assists companies in all phases of their development by financing their innovation efforts.

According to the financial statement on public funding delivered by France Stratégie in January 2016, the innovation programs of the PIA represent 57% of direct government support, with Bpifrance financing a further 37% (including the actions of the PIA managed by Bpifrance). The main purpose of this considerable financial support is to increase the capacities of private research. Indirectly, it also promotes increased economic spin-offs for public research through collaborative projects with key industrial players.

1.2. Innovation: spearheading competitiveness

It was the Austrian economist Joseph Schumpeter (1883–1950) who in the 1930s introduced the notion of innovation for the first time, which he believed to be the basis of economic cycles and of what he would later term “creative destruction”. The economist defines innovation as “the execution of new combinations” carried out by an “entrepreneur”, a person with initiative, authority and foresight. Today, beyond the entrepreneurial frame of reference, innovation plays a significant role throughout a company and even beyond it [KOR 09].
However, the definition of the term “innovation” varies according to the point of view and context within which it is being applied. In 1980, Barreyre distinguished three contexts in which the term innovation is employed [BAR 80]:


– the global process of creation in which a combination of given elements is used to create a new configuration;
– the process of adopting an innovation in a society;
– innovation itself.

Defining the term innovation requires an interest in the different typologies of this last context. From the three contexts mentioned above, further diverse definitions can be extrapolated:
– Schumpeter [SCH 39]: “The introduction of new goods (...), new methods of production (...), the opening of new markets (...) and the establishment of a new organization.”
– Barnett [BAR 53]: “Innovation is the set of thoughts, behaviors or elements considered as new because they are qualitatively different from other pre-existing forms.”
– Barreyre [BAR 87]: “Innovation [is] a process whose outcome is an original realization which has attributes that create value.”
– Urabe [URA 88]: “Innovation consists of generating new ideas and putting them into place through a new product, process or service, leading to a growth of the national economy and an increase in professional activity as well as the creation of a real profit for the innovative activity of the company.”
– Porter and Stern [POR 99]: “...the transformation of knowledge into new products, processes, services – involves more than science and technology. This involves the discernment and satisfaction of customers’ needs.”
– Rogers [ROG 03]: “An innovation is an idea, practice, or object perceived as new by an individual or a unit of adoption.”
– Drucker [DRU 07]: “a change that creates a new dimension of performance.”
– Koriajnova [KOR 09]: “Innovation is a result, a novelty and a success; that is to say, a new product of industrial quantity manufactured and marketed whose implemented economic value has reached the threshold of profitability.”
– Berkun [BER 10]: “a considerable and positive change.”
– Kaplan and Warren [KAP 10]: “The use of new technological knowledge, and/or new knowledge on the market, employed in a business model that can deliver a new product and/or service to customers who will buy it at a price that will generate profits.”

Over the past few decades, and as a consequence of ever-changing contexts, this definition has evolved, with some adopting a realistic account of innovation as a distinct offering with a higher added value than that of the existing product. This differentiation from what already exists is accompanied by a strategic willingness on the part of companies to stand out from their competitors. In this way, competitiveness stimulates the innovative activity of companies, which are ever more anxious to occupy the cutting edge of the industrial scene. Innovation is in fact usually generated by a multitude of triggers: societal trends, company strategy, proposals from internal and external collaborators, competing solutions, emerging technologies, etc. The last of these corresponds to the model commonly called “techno-push”, often initiated by “Research & Development” departments. It is the object of all the secrecy and intellectual property protection that ensure maturation under the best conditions so as to meet a strong need: regulatory, economic and environmental changes, and divergences in trends, needs and usages. All these developments justify industrial investment in ever more innovative technologies. Innovation is thus a strategic activity for French and European companies in key industrial sectors such as the transport sector, which is becoming increasingly competitive and demanding. Guillaume Devauchelle, Vice President of Innovation and Scientific Development of the French automotive supplier Valeo, explained in a radio program broadcast by Sud Radio in October 2016 that 45% of the group's turnover comes from sales “that did not exist three years ago”.
Nevertheless, there are different strategies for implementing the major concepts of innovation, closely linked to the preferred form: technological innovation, the business model, user experience, etc. Technological product innovation is defined by the development and marketing of a product with a visibly different value for its user or consumer. Even if, in this regard, we have recently seen the emergence of “low-cost” approaches through “frugal innovation”, these can in no case be to the detriment of reference values such as the “reliability” and “safety” of the products concerned.

1.3. Reliability: a major issue

In the automotive, aeronautic and aerospace sectors, technological innovation is particularly complex in the sense that the risk-taking inherent in the innovation process raises major challenges in terms of safe operation, management and reduction of operational failures, achievement of target performances, etc. Moreover, the competitiveness of industrial companies in these sectors is increasingly dependent on the reliability of their mechatronic systems. The increasing use of embedded electronic systems in the transport systems of tomorrow, coupled with the increasingly high demands of contractors, has reinforced reliability as a strategic element. In order to ensure the sustainability of their business relationships, suppliers must be irreproachable in terms of quality, reliability and fault tolerance [HAM 13]. By definition, the reliability of a component or system is the probability that it will perform its functions without failure for a specified period of time under specified conditions. Since the 1940s, major automotive and aerospace organizations have been sensitive to this issue: General Motors, NASA, Airbus, Air Force, Citroën, etc. [VIL 88]. Driven by a desire to reduce the operating costs associated with failures, companies have raised their reliability requirements. The need for manufacturers to increase the reliability of their onboard power electronic systems is particularly pressing in order to meet strong market demand, especially in the aeronautics sector, which demands a constantly higher production rate. The multiplicity of such high-reliability applications in these strategic industrial sectors justifies the implementation of new tools and methods to ensure a high level of reliability. While most solutions for increasing reliability and industrial quality are scientific and technical in nature, it is relevant to include here both organizational and managerial components. The latter are especially important when designing an innovative solution both effectively and efficiently.
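The definition of reliability given above is often written compactly as a survival probability. In the expression below, T denotes the random time to failure of the component or system under the specified conditions and F its cumulative distribution function; this notation is introduced here for illustration and is not taken from the chapter.

\[
R(t) = P(T > t) = 1 - F(t), \qquad t \ge 0
\]

With this notation, R(0) = 1 for an item that works when it is put into service, and R(t) can only decrease or stay constant as the specified period t grows.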
The evolution of our society has been particularly affected by the phenomenon of globalization, which is characterized by the possibility of simultaneously producing and selling a similar good anywhere on Earth. As shown in Figure 1.1, this phenomenon began to rise dramatically during the 1960s, a trend which continued through to the 1990s, since when it has been increasing even more sharply each year. This new borderless map of industry reinforces the problem of hypercompetition between goods manufacturers, countries and populations. Extremely efficient means of transport, as demonstrated by the curves in Figure 1.1, now allow for greater geographical flexibility in the management of production sites. This new paradigm requires manufacturers to continue optimizing development, manufacturing and associated costs. This can obviously affect the reliability of the equipment being designed. This observation is particularly acute due to a sharp increase in consumer access to information. Indeed, more and more information is available in real time via the Internet. A consumer is therefore able to judge the reliability, or potential problems, of a product in near real time and adapt his or her purchasing choices accordingly.

Figure 1.1. People and cargo transported by plane in the world (scale in billions of passengers-kilometers and billions of tons-kilometers) [ECO 02]

This cannot occur without influencing the evolution of our society, which is obviously sensitive to our attitudes as consumers. Indeed, the consumer of the twenty-first century is a true consumerist. The English word “consumerist”, recorded as early as 1915 according to the Oxford English Dictionary, literally derives from “consume”. However, the word now tends to be applied to our society in the sense of a real ideology in which the consumption of goods is of capital importance, a phenomenon that extends to the point of excess. The consumer thus seeks to have new products sooner and more often, with ever more functions or technologies integrated. The case of mobile phones is a perfect illustration of this “consumerist” phenomenon.


Finally, the reliability issue known as “safety risk” must also be considered as technology takes an increasingly important place in our everyday lives, offering an ever-improved level of comfort. The technological phenomenon of the “autonomous vehicle” characterizes this assistance framework, that is to say, assisted decision-making in terms of driving. However, this technology raises the question of the safety of people, both in the immediate perimeter of the automobile and in the wider environment. A car radio failure will not have the same impact as the failure of an airbag system (Chapter 2 will address this issue in more detail). In other words, reliability in innovation and technology becomes ever more important as the consequences of failure, or the need for “safety”, grow. This being established, we understand that, while there will always be issues of reliability, a system needs to take on new functions, new technologies and new innovations while retaining optimal functionality over a long period of time. The question naturally arises: how, from the point of view of reliability, can a company survive and grow in this very aggressive universe, driven by constant market competitiveness, ever higher profit margin targets and development costs to be contained, all the while continuously offering better innovations? The constraints that have just been described place this book in a context that is both critical and strategic.

1.4. Reliability in the innovation process

1.4.1. The management of innovation, the criterion of success for the innovation-reliability parallelism

In a sector where fault minimization is paramount, manufacturers must use specific methods and tools that foster innovation while securing complex and innovative projects. Innovation management allows for an organized implementation of these tools, including within the design process itself.
In this way, it helps to minimize failures in the prototyping phase of the product design process upstream. Indeed, fault analysis methods, such as the FMECA method (Failure Mode, Effects and Criticality Analysis), are more frequently used in the preliminary conceptualization phases. Reliability issues are today considered throughout the product development process and no longer only in its testing phases.
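The chapter does not detail how an FMECA ranks failure modes. As a minimal sketch, one widespread convention is assumed here: each failure mode is scored for severity, occurrence and detection on a 1 to 10 scale, and the product of the three scores (often called a risk priority number) is used to sort the modes. The class, the function and the failure modes with their scores are hypothetical illustrations, not data from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One line of a simplified FMECA worksheet (scores from 1 to 10)."""
    name: str
    severity: int    # how serious the effect of the failure is
    occurrence: int  # how likely the failure is to occur
    detection: int   # how hard the failure is to detect (10 = hardest)

    def rpn(self) -> int:
        """Risk priority number: product of the three scores."""
        return self.severity * self.occurrence * self.detection


def rank_by_criticality(modes):
    """Return the failure modes sorted from most to least critical."""
    return sorted(modes, key=lambda mode: mode.rpn(), reverse=True)


# Hypothetical failure modes for an embedded power electronics board.
MODES = [
    FailureMode("capacitor drift", severity=5, occurrence=6, detection=3),
    FailureMode("solder joint crack", severity=8, occurrence=4, detection=6),
    FailureMode("connector corrosion", severity=6, occurrence=3, detection=7),
]

RANKED = rank_by_criticality(MODES)

if __name__ == "__main__":
    for mode in RANKED:
        print(f"{mode.name}: RPN = {mode.rpn()}")
```

Running such a ranking early in the design phase is one way to decide where design or test effort should go first; the scoring scales themselves are a matter of company practice.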


In order to take reliability into account within the innovation process, its main contributors are involved early on in what is called the co-construction process.

1.4.2. Participatory innovation between creatives and technicians

Innovation management promotes and supports the emergence and maturation of innovations, in particular by stimulating creativity at all levels. It thus provides an environment conducive to multidisciplinary exchange and the sharing of knowledge. Collaboration is an approach favored by cutting-edge industrialists. Different methods of participatory innovation allow co-construction to permeate a company, involving stakeholders in the ecosystem who are both close to and far from the innovation process. The creative intelligence of employees is valued whatever their department, area of expertise or hierarchical level. Strong proposals of new ideas for organizational or technical improvements are challenged according to the strategic themes of the company. These can take the form of creative challenges, idea boxes or seminars dedicated to innovative themes. Many of these initiatives foster collective adherence to the business project and the diffusion of a culture of innovation linked to reliability, which is too often neglected in innovation. Moreover, these approaches raise overall awareness of these subjects, which at first glance may appear to curb creativity. Beyond internal involvement at all levels, a company is also able to draw on ideas, proposals for solutions and expertise from outside its direct ecosystem via open innovation.

1.4.3. Supporting open innovation through collaborative projects

Collaborative projects are a form of open innovation in terms of the acquisition of knowledge or technology through the sharing and production of a common goal that benefits all the various stakeholders.
Innovation in the world of research is increasingly financed by such projects and less and less by recurrent state funding. In France, in order to support this innovation process, public authorities ensure that a general empowering framework is maintained and to this end have appointed specific institutions to manage major research and innovation funding programs over the last fifteen years. The increasing role of local and regional authorities in innovation policies led the French government to reinforce this policy in 2012 by creating incentives for collaborative innovation, including those driven by competitiveness clusters. Since 2005, the calls for projects involving these clusters have supported 1,625 projects representing nearly 6.8 billion euros of R&D expenditure, of which 2.7 billion euros was publicly funded, including more than 1.6 billion euros from the state, mainly through the Fonds Unique Interministériel (FUI).

In this respect, the Haute-Normandie region has adopted a research strategy specializing in the reliability of embedded systems and electronic components for the period 2014–2020. This strategic choice is not without cause, as the region is already broadly represented in the industrial sectors concerned with reliability, such as aeronautics and the automobile industry. With 40,000 jobs, these sectors account for 35% of industrial employment in the Haute-Normandie region, according to a press release from the NAE. While the region is obviously recognized nationally for the presence of these activities (the third largest region for aeronautics and sixth largest region for automobiles), it owes its European visibility to its excellence in two specific areas: energy-efficient propulsion systems and the reliability of systems and components in embedded systems. On-board electronics occupy an ever larger place in industrial systems and equipment. The reliability of electronics in embedded systems, particularly those common to the aeronautic and automotive sectors, is absolutely crucial, especially when it comes to ensuring the safety of passengers.
To meet this challenge, industrialists and researchers from Haute-Normandie have grouped themselves around different research projects.

1.5. Conclusion

The increasing importance of technological systems in the everyday life of the population puts reliability at the heart of safety considerations. The authors of this book therefore seek to share pragmatic elements for designers in order to further develop the reliability of complex systems such as “high power mechatronic systems”. Their ambition is to share with readers, whether specialists or not, clear processes and methodologies of reliability to best address the technical and economic issues that concern them professionally and personally.


1.6. Bibliography

[BAR 53] BARNETT H.G., Innovation: The Basis of Cultural Change, McGraw-Hill, 1953.
[BAR 80] BARREYRE P.Y., “Typologies des innovations”, Revue Française de Gestion, January/February 1980.
[BAR 87] BARREYRE P.Y., LENTREIN D., “La participation des services achat à l'innovation dans les grandes entreprises industrielles : approche organisationnelle et problématique managériale”, Futur et Gestion de l’Entreprise, IAE de Poitiers, 1987.
[BER 10] BERKUN S., The Myths of Innovation, 2nd ed., O’Reilly, Farnham, 2010.
[DRU 07] DRUCKER P.F., People and Performance: the Best of Peter Drucker on Management, Harvard Business School Press, 2007.
[ECO 02] ECONOLOGIE, http://www.econologie.com, United Nations, Civil Aviation, 2002.
[HAM 13] EL HAMI A., RADI B., Incertitudes, optimisation et fiabilité des structures, Hermes Sciences-Lavoisier, Paris, 2013.
[KAP 10] KAPLAN J.M., WARREN A.C., Patterns of Entrepreneurship Management, 4th ed., Wiley, 2010.
[KOR 09] KORIAJNOVA E., Aide au management de l’activité d’innovation par l’approche des réseaux de problèmes. Application au problème d’intégration des services Marketing et R&D, PhD thesis, University of Strasbourg, 2009.
[POR 99] PORTER M.E., STERN S., The Innovation Index – New Challenges to America’s Prosperity, US Council on Competitiveness, Washington, 1999.
[ROG 03] ROGERS E.M., Diffusion of Innovations, 5th ed., Free Press, New York, 2003.
[SCH 39] SCHUMPETER J.A., Business Cycles: A Theoretical, Historical, and Statistical Analysis of the Capitalist Process, McGraw-Hill, New York, 1939.
[URA 88] URABE K., CHILD J., KAGONO T., Innovation and Management: International Comparisons, De Gruyter, Berlin, 1988.
[VIL 88] VILLEMEUR A., Sûreté de fonctionnement des systèmes industriels: fiabilité, facteurs humains, informatisation, Eyrolles, Paris, 1988.

2 Reliability in the Automotive World

“Reliability is, after all, engineering in its most practical form.”
James R. Schlesinger (1929–2014), US Secretary of Defense.

Reliability and the automotive world are intimately linked by their histories, their current challenges, and their futures. This chapter highlights these three aspects, taking into account the global context of this ever-changing industry. There are a great many publications on reliability, each one casting a different light on the theme. Nonetheless, the automotive industry is in a permanent search for innovation, production, mobility, autonomy and speed. In 1900, the United States had only 8,000 cars and a few hundred kilometers of road. Currently, there are more than one and a half billion motor vehicles and millions of kilometers of good roads worldwide. This industry uses and develops techniques and methods that are increasingly important in assuring vehicle owners of total safety and comfort.

2.1. Introduction: a history of reliability in the automotive world

The phenomenon of globalization and the intense need for innovation in our daily lives apply particularly well to the automotive world. In order to fully understand the issues and challenges facing this atypical industrial sector, it is necessary to review the evolution of the very concepts of the automobile and the notion of reliability over time.

Chapter written by David DELAUX.


Figure 2.1. The first car – the 1886 Benz [WIK 17]

The automobile quickly became a reliable means of transporting people and goods. Figure 2.1 shows the car at its birth more than 150 years ago¹. The technology of the time was essentially mechanical and designed for a mission profile, in other words a use, oriented primarily towards urban transport. As early as 1907, there were already nearly 250,000 motorists; by 1914 there were approximately 500,000, following the advent of the famous Model T Ford, and there were more than 50 million cars before the Second World War. The car of today has evolved and seeks to inspire emotion in its driver and passengers, and to provide multiple functions: information, transport, connectivity, discovery and many other services. In spite of its explosive evolution since the beginning of the 19th century, it is quite remarkable that the fundamental way of driving has not changed. An extract from the 1930 Commercial Vehicle Batteries Contract [BOG 30] states that a “normal” driver in the United States (that is, a driver representative of 50% of drivers) did not travel in excess of 1,000 miles per month, that is 19,305 km per year. Today, almost 90 years later, a “normal” driver in the USA drives about 22,673 km per year [NHT 06].

¹ The first car with a two-stroke internal combustion engine “with expanded air” was created by Etienne Lenoir in 1860.
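The mileage comparison above can be checked with a short calculation. The sketch below assumes the international mile (1.609344 km), so its result differs by a few kilometers from the figure quoted in the text; the function and variable names are illustrative.

```python
KM_PER_MILE = 1.609344  # international mile, exact by definition


def annual_km_from_monthly_miles(miles_per_month: float) -> float:
    """Convert a monthly mileage into kilometers driven per year."""
    return miles_per_month * 12 * KM_PER_MILE


# 1930 figure quoted from [BOG 30]: at most 1,000 miles per month.
annual_km_1930 = annual_km_from_monthly_miles(1_000)

# Recent figure quoted from [NHT 06], already expressed in km per year.
annual_km_recent = 22_673

# Relative increase between the two "normal" annual mileages.
relative_increase = (annual_km_recent - annual_km_1930) / annual_km_1930

if __name__ == "__main__":
    print(f"1930: {annual_km_1930:.0f} km/year")
    print(f"Increase: {relative_increase:.1%}")
```

Under this assumption the 1930 figure is about 19,312 km per year and the recent figure is roughly 17% higher, which is consistent with the remark that typical usage has changed little in almost 90 years.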


Reliability was quickly recognized as an intrinsic attribute of the automobile. It soon became necessary to develop methods and tests (physical or virtual) to assess the reliability performance of vehicles. The methods of reliability testing, which estimate the probability of a system surviving under certain conditions, draw in particular on the statistical theories of the 18th, 19th and 20th centuries. These include the probabilistic approach of stochastic diffusion [PRA 65] (the normal law), developed in 1777 by Pierre-Simon Laplace and in 1809 by Carl Friedrich Gauss; the exponential distribution, given in explicit form by, among others, Leonhard Euler in 1748 in his book Introductio in analysin infinitorum; and the law of proportional effect put forward by R. Gibrat in 1930 [GIB 30], [AIT 57] (the log-normal law). The probabilistic formulations of estimation reappear in the school of Karl Pearson, with Ronald Aylmer Fisher (1890–1962) and William Sealy Gosset, alias Student (1876–1937), who were all confronted with statistical studies based on insufficient sample sizes. Ronald Fisher is generally presented as the father of estimation theory according to the principle of “maximum likelihood” (introduced in 1912 and developed until 1922–1925) [DOD 93]. The basic statistical functions for developing the reliability of a system will be discussed in greater detail in Chapter 5. Much later, in 1939 and then again in 1951, the professor and engineer Waloddi Weibull published [WEI 51] a statistical formulation that would become the most popular probability distribution for the processing of survival data, namely the Weibull distribution. It is not surprising that this statistical tool is still widely used in the automotive world, given that Professor Weibull worked for major manufacturers such as SAAB.
It was noted that the purpose of Professor Weibull’s work was not to develop a theoretical statistical model or basic research. Rather, he sought to solve real physical problems, such as problems related to the resistance of materials, fatigue, and breaking points [LAN 08]. Here we find the very nature of reliability, namely a multiple overlap of various sciences such as chemistry, physics, electrics, electronics, fatigue, thermo-mechanics, electromagnetism, pyrotechnics, etc.


In the automotive world, as in other industries, the evolution of the statistical methods and tools dedicated to reliability was accompanied by a parallel evolution of the test methods used to demonstrate the reliability performance of the vehicle experimentally. Figure 2.2 illustrates the varied and extreme uses of the Model T Ford in 1915 and 1926. Today, this type of use would be described as a robustness test.

Figure 2.2. Model T Ford in a forest in 1915 and on the snow in 1926 [WIK 15]


The first car makers, especially the Americans, were ingenious in the way that they rapidly introduced track testing programs (speed ovals), deep-water tests and runway sections that are progressively degraded to represent different environments. Reliability validation testing became quicker and more complex thanks to increasingly sophisticated processes. However, safety and reliability reached an unavoidable crossroads once it became apparent that there were a considerable number of fatalities per car accident in the late 1930s [ERR 03]. Crash test dummies, or anthropomorphic test devices, started appearing as of 1968 [MYR 05] with the creation of the first mannequin by Samuel W. Alderson. As of 1971, General Motors and Ford would use these in their studies of reliability under shock. This brings us to the conclusion that the reliability engineers at the beginning of the car adventure were true "heroes" of ingenuity, as they themselves tried out safety systems such as seatbelts, improving the overall comfort and safety of automobiles.

2.2. The challenges of automotive reliability: complexity of systems and organizations

Even though the first electric cars existed as early as 1834, they were only to experience a real boom as of the 2000s, thanks to a formidable revolution in energy storage technologies accompanied by better performance of electric propulsion systems (complemented, it must also be noted, by a change in the mindset of motorists themselves). The Mercedes concept car shown in Figure 2.3, known by its codename F015 and unveiled in 2015 in Los Angeles, is a fine example of ultra "electronized" and computerized high technology. This concept would allow for an all-electric propulsion system with a range of 1,000 km and for autonomous driving.

Even though electric vehicles represent only 0.2% of today's market (see Figure 2.6), the ambition behind such a concept car (imagined by practically all car manufacturers in the world) is far from a utopian dream. Many actors foresee the deployment of electric and autonomous vehicles by 2020–2025. The McKinsey study [ADV 16] puts a figure on this, estimating that nearly 15% of vehicles in the world will make use of autonomous vehicle technology.


Figure 2.3. F015 Mercedes concept car of 2015 [WIK 16]

Let us not deceive ourselves: this sophistication is exclusive neither to concept cars nor to luxury vehicles. The vehicles currently on offer (2014–2016) are impressively complex systems. For example, most current models have no fewer than 100 Electronic Control Units (ECUs), each with a computing power of more than 20 computers. The ECUs manage the entire vehicle, from its comfort (e.g. air conditioning, lighting) to the management of the engine (e.g. fuel consumption, pollution) as well as certain safety features (e.g. ESP, airbags). The number of ECUs in modern cars is equivalent to the number found in the first Airbus! The high level of safety involved in the development of vehicles has given rise to new international standards for reliability, such as ISO 26262 on automobile safety, brought about by advances made in the years leading up to 2008.

Another indicator of the sophistication of the automotive world is the growing importance of software. For example, current vehicles have about 100 million lines of code, which command all the sensors and process information. This number is four times higher than for the latest American F35 combat aircraft (which had approximately 24 million lines of code) ... and to be clear, we are still talking about the cars that we use in our everyday lives.


The installation of numerous sensors (camera, radar, temperature, pressure, speed, force, deformation, gyroscope, accelerometers, etc.) in mechanical systems inspired, from the year 2000, a new wave of reliability science called mechatronics (a combination of mechanical and electronic components). This science tries to bring together two "religions" of reliability science that have historically been very much opposed to one another: the "religion" of electronics and the "religion" of mechanics. In the automotive world, the intensive multiplication of systems (connectivity, mobility, comfort, safety, etc.) at the service of motorists forces the science of reliability to immerse itself in "Big Data Analysis", that is, to become a "Data Science" (a science that handles very large quantities of data). According to a study by Toyota [TOY 04], there are estimated to be more than 30,000 components in a standard vehicle these days. It is understandable that applying Reliability Block Diagram (RBD) or Fault Tree Analysis (FTA) methodology to such a complex system is impractical. All components of the vehicle, be they electrical, electronic, mechanical or mechatronic, are likely to become defective at some point in the life of the vehicle. In 2013, the German association ADAC reported 2.6 million failures across 93 vehicle models, from a sample of 500,000 cars studied (2005–2012). The breakdown of the 2.6 million failures is shown in Figure 2.4.

Figure 2.4. Distribution of failures
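
To give a feel for why a component-by-component block diagram becomes unwieldy at this scale, the following minimal sketch treats the vehicle as a pure series system of independent, identical components. This is a deliberate simplification for illustration and not the method used by Toyota or ADAC. The function names, the 90% system target and the 0.9999 component reliability are assumptions; only the order of magnitude of 30,000 components comes from the text.

```python
def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of n independent, identical components in series.

    Assumption: the system fails as soon as any one component fails.
    """
    return component_reliability ** n_components


def required_component_reliability(system_target: float, n_components: int) -> float:
    """Per-component reliability needed to reach a series-system target."""
    return system_target ** (1.0 / n_components)


N_COMPONENTS = 30_000   # order of magnitude quoted from [TOY 04]
SYSTEM_TARGET = 0.90    # hypothetical system-level reliability target

# Unreliability each component may contribute under the series assumption.
allowed_unreliability = 1.0 - required_component_reliability(SYSTEM_TARGET, N_COMPONENTS)

# System reliability if every component were "only" 99.99% reliable (hypothetical).
system_if_four_nines = series_reliability(0.9999, N_COMPONENTS)

print(f"allowed unreliability per component: {allowed_unreliability:.2e}")
print(f"system reliability with 0.9999 components: {system_if_four_nines:.3f}")
```

Under these assumptions, each component may contribute only a few parts per million of unreliability, which suggests why analyses at this scale lean on field data rather than on an exhaustive block-by-block model.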


The study found that all components are affected by failure, at a direct cost to the end customer. According to the study [AUT 13a], owners spend an average of 498 euros per year to replace defective parts. This figure represents the cost of the unreliability of the vehicle's constituent systems. This financial concept will be developed further in section 2.3. The intensification of global automobile production adds to the steady increase in the complexity of motor vehicles, further reinforcing the need for reliability research. Figure 2.5 illustrates the evolution of automobile production worldwide between 1900 and 2013.

Figure 2.5. Evolution of the world production [AUT 13b]

The trend of the curve clearly has the appearance of an exponential law, starting from a few thousand cars produced annually in 1900 and rising to more than 70 million in 2006. This evolution takes into account the production of cars, trucks and buses according to the "Organisation Internationale des Constructeurs d'Automobiles" or OICA (the International Organization of Motor Vehicle Manufacturers). One will note the fall between 2006 and 2009, which is explained by the global crisis (essentially the banking crisis) that took place during this period. According to the OICA, the year 2015 set a world production record of over 90 million cars. This equates to the creation of about three complete cars every second for one year! It should be noted in passing that half of world production in 1907 was in France ([JAC 16], which produced 25,000 units out of the 45,000 produced worldwide).

Figure 2.6 shows the distribution of automobile production around the world. Europe accounts for only one-fifth of world production, whilst the Asian platform (Asia-Pacific and Eastern Europe) covers more than half (53.6%). This is why the "India-China-Russia" block is often regarded as the lungs of automobile production and growth.

From the point of view of reliability, this new map illustrates the dispersion of industrial organizations and technical expertise. In the past, reliability centers (i.e. test centers, simulation centers, statistical teams, etc.) were concentrated in France, Germany and the United States. Today this is no longer the case in the automotive industry. It is thus necessary to adapt to this change and to deploy the reliability skills and knowledge that these new geographic platforms require, by setting up new organizations and strategies for nurturing talent.

Figure 2.6. Worldwide distribution of automobile production in 2013 [THE 14]

This intensification of production leads to a temporal compression of vehicle development programs, which are generally estimated at between 6 and 24 months. The introduction of highly innovative technologies within an extremely short period of time creates a highly constraining environment for reliability engineers. Aggravated testing techniques such as HALT (Highly Accelerated Life Test) are increasingly used to rapidly precipitate the latent defects inherent to these new technologies. In addition, predictive reliability methodologies, such as FIDES, are used to predict the reliability performance of the system. It must also be recognized that this high annual output is designed around global vehicle platforms. In other words, the same car model can be used in China, Europe or the United States. This directly impacts the famous "mission profiles", which become more and more complex. Increasingly short development times, complex environments (in the mechanical, climatic or other sense), constraining costs and many other factors have led to the emergence of new methodologies such as "trial customization" (often called test tailoring). This incredible evolution of automobile reliability is also linked to powerful economic issues.

2.3. The economic stakes in automobile reliability

The economic stakes of the automotive industry, as seen through the prism of reliability, are essentially focused on the costs of development and warranty. Here we focus on the warranty cost. Let us first get acquainted with the notion of warranty. As stated in the European directive 1999/44/EC of 25 May 1999, the warranty is a legal protection for the consumer (the owner of the vehicle) against all malfunctions for a limited period of time. In the automotive industry, this period is often expressed in years and kilometers. It is surprising to discover that one of the first traces of a "warranty" can be found in a Babylonian codex shown in Figure 2.7, the Hammurabi Codex, which dates back to the 18th century BC. It contains a guarantee addressed to the owner of a property for the repair of any defective item (such as a wall) of a house built by the builder.


Amongst the guarantees contained in this codex, one even went so far as to sanction the death of the builder in the event that the occupants of the house died as a result of its collapse! This notion of guarantee is clearly in the genes of mankind, and is especially present in the automobile industry.

Figure 2.7. Hammurabi Codex [COD 04]

The warranty is part of the vehicle's reliability. Indeed, we often talk about reliability as covering the first 15 years of the vehicle (or 300,000 km), associated with a tolerance of defective parts (for example 10% failures at 15 years, or 90% reliability). A three-year warranty on a car, for example, covers the first three years of the vehicle's life, during which the vehicle is expected to be reliable. The difference is that the guarantee is a financial commitment, while reliability is a "moral" commitment on the part of the designer. If a part is defective before the end of the warranty, then the part will be changed according to the terms of the contract. However, if the same part fails beyond this period, the replacement will be at the expense of the user of the vehicle (an exception is made for any item deemed safety-related by the automobile manufacturer).

Of course, the design of a product or system is based on reliability commitments (i.e. 15 years or 300,000 km) and not only on the warranty period. The study of products failing during the guarantee provides interesting data on reliability performance, even though one must keep in mind that this is a restricted view covering only a few years (often one to three years) in the field.

At the same time, this notion is financially important because the stakes for manufacturers and equipment suppliers are considerable. BearingPoint [BEA 07] estimated that the warranty cost paid by manufacturers worldwide is between $45 and $50 billion.

The US monitoring agency (NHTSA) carries out surveillance of alerts concerning quality and reliability, shown in Figure 2.8. First of all, one can note a certain consistency in vehicle recalls between 1966 and 1995, reflecting the random appearance of defects on vehicles (with the exception of the 1981 Ford crisis). However, a break with this trend can be seen in the late 1990s. It is well known that this disruption is directly related to the extensive integration of electronic components.

Figure 2.8. Evolution of automotive recalls in the United States [EDW 10]
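
The link between a 15-year reliability target and what is observed during a three-year warranty can be sketched with the two-parameter Weibull distribution introduced in section 2.1, for which the reliability is R(t) = exp(-(t/eta)^beta). The sketch below is an illustration only: the 90% reliability at 15 years and the three-year warranty are the example figures quoted in the text, whereas the shape parameters beta and the function names are hypothetical choices made here.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Two-parameter Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def eta_from_target(t_target: float, r_target: float, beta: float) -> float:
    """Scale parameter eta such that R(t_target) equals r_target."""
    return t_target / (-math.log(r_target)) ** (1.0 / beta)


T_TARGET_YEARS = 15.0   # reliability horizon quoted in the text
R_TARGET = 0.90         # 90% reliability (10% failures) at 15 years
WARRANTY_YEARS = 3.0    # example warranty period from the text

# Hypothetical shape parameters: beta < 1 (early failures),
# beta = 1 (constant failure rate), beta > 1 (wear-out).
for beta in (0.7, 1.0, 2.0):
    eta = eta_from_target(T_TARGET_YEARS, R_TARGET, beta)
    failed_in_warranty = 1.0 - weibull_reliability(WARRANTY_YEARS, eta, beta)
    print(f"beta={beta}: eta={eta:.1f} years, "
          f"fraction failed within warranty={failed_in_warranty:.4f}")
```

For the same 15-year target, the fraction of parts seen failing under warranty depends strongly on the assumed shape parameter, which is one way to read the caution above that warranty data offer only a restricted view of reliability.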


What reliability information can we derive from the warranty data? This is what we will discuss in the next section.

2.4. An analysis of reliability through the analysis of warranty

Let us remember that the study of warranty data only provides us with a view of reliability that is focused on the first portion of a vehicle's life. Nevertheless, it is worth paying close attention to these data as they are drawn from real life. Of course, not all the data or information obtained through warranty claims is directly related to the reliability of the product or system. For example, [WU 11] has shown that for more than 10% of defective parts, either the part has been misused (e.g. high-pressure Kärcher cleaning of under-bonnet components) or various other human factors are involved. The Japanese Ministry of Transport carried out an investigation into the country's recalls and tried to identify the share of failures originating in reliability. After analyzing 1,211 recalls (corresponding to several thousand vehicle failures) over the period 2006–2010, 54% of the defects were found to be due to the reliability of the product. One therefore understands the importance of analyzing the information arising during the guarantee period both quantitatively and qualitatively. Industrialists, manufacturers and equipment producers have set up services to analyze and monitor the development of failures during the warranty period, as well as to carry out the physical analysis of the parts. This information allows them to provide crucial feedback to product development upstream and, if necessary, to rectify the standards for reliability validation tests.

2.5. Conclusion: the future of reliability in the automotive world

We have seen the evolution and the explosive progress of motor vehicles, with increasing complexity and sophistication. As Jean-Luc Brossard, R&D Director of the French Automotive Platform (PFA), reminds us, this complexity involves "no fewer than 25 sensors to enable driving aids, that includes multifunction cameras, radar, lidar, ultrasonic sensors, antennas, and other future systems that will come about with the arrival of 5G" [ESS 16]. Experts in the field of reliability have had to develop, enrich and innovate new methods in order to anticipate and appreciate the risks


associated with the vehicle. The future of the automobile is clearly going to be built on three pillars: connectivity, mobility and autonomy. The near future will of course push these boundaries further, as it must!

2.6. Bibliography

[ADV 16] ADVANCED INDUSTRIES, "Automotive revolution – perspective towards 2030", McKinsey & Company, 2016.
[AIT 57] AITCHISON J., BROWN J.A.C., The Lognormal Distribution, Cambridge University Press, Cambridge, 1957.
[AUT 13a] AUTOS INFOS, Estimation du coût moyen de réparation d'un véhicule (estimated average cost for vehicle repair), available at: http://autoinfos.fr/Vehicules-connectes-une,7192, 2013.
[AUT 13b] AUTO FOREVER, Production mondiale automobile depuis 1900 (automobile production since 1900), available at: http://www.auto-forever.com/productionautomobile-depuis-1900/, 2013.
[BEA 07] BEARINGPOINT, Global Automotive Warranty Survey Report, Report, 2007.
[BOG 30] BOGART G.G., FINK E.E., "Business practice regarding warranties in the sale of goods", Illinois Law Review, vol. 25, pp. 400–417, 1930.
[BRA 16] BRASSEUL J., Petite Histoire des faits économiques – De nos origines à nos jours, Armand Colin, 2016.
[COD 04] CODE OF HAMMURABI, available at: https://commons.wikimedia.org/wiki/File:CodexOfHammurabi.jpg?uselang=fr, 2004.
[DOD 93] DODGE Y., Statistique – Dictionnaire Encyclopédique, Dunod, Paris, 1993.
[EDW 10] NIEDERMEYER E., "What's wrong with this picture: total recall edition", Recall on the rise, available at: http://www.thetruthaboutcars.com/2010/07/whatswrong-with-this-picture-total-recall-edition/, 2010.
[ERR 03] ERROL M., The Fog of War: Eleven Lessons from the Life of Robert S. McNamara, American documentary, 2003.
[ESS 16] Essais & Simulations, no. 126, p. 46, Oct-Nov. 2016.
[GIB 30] GIBRAT R., "Une loi des répartitions économiques : l'effet proportionnel", Bull. Stat. Gen. Fr., vol. 19, pp. 469–513, 1930.
[LAN 08] LANNOY A., Maîtrise des risques et sûreté de fonctionnement, Repères historiques et méthodologiques, Tec & Doc, Paris, 2008.
[MYR 05] MYRNA O., "Samuel Alderson, 90: Inventor of Dummies used to test Car Safety", Los Angeles Times, 17 February 2005.
[NHT 06] NHTSA, Vehicle survivability and travel mileage schedules, Technical Report, DOT HS 809-952, 2006.
[PRA 65] PRABHU N.U., Stochastic Processes, Macmillan, London, 1965.
[THE 14] THEMAVISION VEHICULES & MOBILITES, PwC Autofacts 2014, available at: http://www.themavision.fr/jcms/rw_438175/infographie-le-marche-automobilemondial-en-2020-quelle-repartition-de-la-production, 2014.
[TOY 04] TOYOTA, estimate of the number of components in a vehicle, available at: http://www.toyota.co.jp/en/kids/faq/d/01/04/, 2004.
[WEI 51] WEIBULL W., "A statistical distribution function of wide applicability", J. Appl. Mech. Trans. ASME, vol. 18, no. 3, pp. 293–297, 1951.
[WIK 15] WIKI COMMONS, available at: https://commons.wikimedia.org/wiki/File:Unidentified_rural_letter_carrier_with_modified_Model-T_Ford.jpg, https://commons.wikimedia.org/wiki/File:Model_T_on_a_New_Road_(8113428270).jpg, 2015.
[WIK 16] WIKI COMMONS, Motor show Geneva, available at: https://commons.wikimedia.org/wiki/File:2016-03-1_Geneva_Motor_Show_0999.JPG?uselang=fr, 2016.
[WIK 17] WIKI COMMONS, https://commons.wikimedia.org/wiki/File:790751_1449829_800_530_benz_patent_motorwagen_1886.jpg, 2017.
[WU 11] WU S., "Warranty claim analysis considering human factors", Reliability Engineering and System Safety, vol. 96, pp. 131–138, 2011.

3 Reliability in the World of Aeronautics

Improved aircraft reliability has enabled the development of the civil aeronautics industry. These gains were made possible by several factors: the use of experience feedback, the implementation of new technologies to limit the impact of human factors, and the improvement of development and maintenance processes. The demand for ever safer products motivates manufacturers to develop new concepts and methods to increase the reliability of systems.

Chapter written by Tony LHOMMEAU, Régis MEURET and Agnès MATHEVET.

3.1. Introduction

The development of aviation stems quite simply from man's desire to fly, the dream of Icarus to soar higher and higher. However, as soon as aviation became possible, the military quickly realized that, this romanticism aside, the feat had other implications and that aircraft needed to occupy a central position in their theater of operations. Aircraft could be used for reconnaissance missions to locate enemy positions, and were all the more effective because they were difficult to target, which opened up many possibilities. Indeed, the weaponization of aircraft made it possible to inflict incredible damage and to sow chaos deep behind enemy lines. The aircraft quickly became a strategic and decisive weapon.

The two world wars of the 20th century contributed greatly to the development of aircraft technologies, particularly in terms of engines (the rotary engine and the turbojet) and the use of lighter materials (aluminum and nickel-based alloys). The aircraft became more reliable, faster and more capable of carrying large payloads, often covering a range of several thousand kilometers, thus paving the way for civilian transport.

In the late 1950s, with the onset of globalization, safe and fast passenger transport over long distances was a major development. Aviation very quickly found its place in the market, which gave rise to the civil aeronautical industry. Initially reserved for the wealthy classes, air transport gradually opened up to the mass market thanks to an increase in the size of aircraft. This was the era of the jet. The Boeing 747, nicknamed the Jumbo Jet, symbolizes this period, transporting on average 550 passengers ever since the late 1960s.

Figure 3.1. Evolution of aircraft since 1925 as a function of their maximum take-off weight (MTOW)

Economics is a major concern for airlines and shapes the development of new aircraft. Lighter materials have been introduced, as have more efficient engines. For example, the dual-flow (turbofan) engine helped reduce fuel consumption from 4.5 liters per passenger per 100 km on the B747 of the 1970s to less than 3 liters on the A380.


However, this increase in the airline market would not have been possible without improvements in the way passengers felt about comfort and, above all, safety. Efforts to improve safety have been just as important as those to reduce fuel consumption. Between 1960 and 2015, the accident rate per million departures decreased by a factor of 100. This gain was the result of technical improvements, such as steering assistance and preventing pilots from maneuvering outside the flight envelope, but was also due to the robustness of development processes, the emergence of digital tools to help with the dimensioning of parts, the taking into account of feedback from entire fleets and, in particular, the improvement of more explicit processes.

Figure 3.2. Rate of accidents per million departures from 1959 to 2015 [BOE 15]

The actors in the aeronautics sector agree that the increase in air transport cannot continue without constantly improving how passengers feel about safety. The development of aircraft and their systems is governed by the regulations of the ICAO (International Civil Aviation Organization) and then by local authorities, such as the FAA for the United States and EASA for Europe. This regulation defines not only the certification process of the aircraft and its systems, but also the set of requirements to be respected and the means of demonstration.


To these requirements should be added all the commercial requirements which may arise, for example, from an air force in terms of the standards required of a military aircraft, or from an aircraft manufacturer with regard to a specific engine. We can therefore distinguish two cases:
– the response to the regulatory requirements for the certification of the product vis-à-vis the EASA and/or the FAA;
– the response to the business requirements or technical specifications related to constraints specified by the customer. In this case, one speaks of qualification, which validates the declaration that a particular product complies with the technical specifications defined by the customer.

Certification imposes a strict development process to ensure the safety of passengers and crew, but also that of maintenance personnel and people on the ground. This process involves monitoring all the specifications and their variations, the traceability of the configuration management and all the documentation. The documentation that defines the performance status of the product originates in the design phase and continues through serial production.

The regulatory requirements for certification are listed in the "Certification Specifications" (CS) available on the EASA website. In addition to these documents, international working groups agree on the development of guidelines for the implementation of regulations. These guidelines include:
– ARP 4754A for the development of civilian aircraft and associated systems [SOC 11];
– ARP 4761 describing the methods for the evaluation of safety [SOC 15];
– DO-254 [RAD 00] and DO-178 [RAD 12] for the certification of electronic hardware and software.

The evolution of technologies requires these guidelines to be updated in order to define a relevant framework for the demonstration provided by the design office that safety requirements are met.


Taking certification requirements into account influences technological choices beyond the definition of the system architectures. These choices are motivated by the idea that risk taking is not an option. This explains the conservatism of the architectures and technological choices used for the physical realization of products. It is the reason why the aeronautical industry favors an evolution of existing solutions that have benefited from feedback over technological revolutions that have not yet attained the validation gained only through multiple stages of maturity.

The product development phase begins with an analysis of the functions performed by the product. At the end of this analysis, each function is categorized into a class of severity according to the impact of the function failure on the complete system (usually the aircraft or the engine). Depending on the severity class obtained, a maximum failure occurrence rate is allocated. For a function failure leading to a catastrophic event, such as the loss of the plane and all its passengers, the maximum rate is one failure per 10^9 hours of flight. A decline in these rates will result from improvements to the equipment used to carry out the function in question. At the same time, a Design Assurance Level (DAL) is allocated to denote the level of development assurance for each piece of equipment and/or function, related to the identified severity. This study classifies equipment into five categories known as DAL A, B, C, D and E.

To verify that the requirements specified above are met, a "Failure Modes, Effects and Criticality Analysis" (FMECA) may be conducted. This allows each system and/or equipment failure to be traced and associated with a failure rate. The failure rate of the parts is estimated from experience feedback on individual parts, drawn from service data or from handbooks such as MIL-HDBK-217F, RDF 2000 and FIDES for electronic components, or RADC TR 85-194 for mechanical parts. These handbooks define the failure rate for a unit component in order to estimate the probability of a subsystem encountering a previously identified failure case. If the rate is higher than the requirement, several solutions are possible:
– change the technological solution (use solutions that are more mature, but which generally offer lower performance);
– increase the reliability of the technological elements through extended qualification and better control of the development process;
– implement a redundant architecture that provides an alternative in the event of a part failure.
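
The check described above (estimate a failure rate from part-level rates, compare it with the allocated maximum, and fall back on redundancy if it is too high) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the handbooks cited: failure rates are taken as constant, parts within a channel are in series, redundant channels are independent (no common-cause or latent failures), and the part rates, exposure time and function names are hypothetical. Only the limit of one failure per 10^9 flight hours comes from the text.

```python
import math

CATASTROPHIC_LIMIT_PER_HOUR = 1e-9  # maximum rate quoted in the text


def series_failure_rate(part_rates):
    """Constant-rate parts in series: the channel rate is the sum of part rates."""
    return sum(part_rates)


def failure_probability(rate_per_hour, exposure_hours):
    """Probability of at least one failure over the exposure time."""
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)


def redundant_average_rate(channel_rate, exposure_hours, n_channels=2):
    """Average per-hour probability that all independent channels fail
    within the exposure time (common-cause failures are ignored)."""
    p_all = failure_probability(channel_rate, exposure_hours) ** n_channels
    return p_all / exposure_hours


# Hypothetical part failure rates (per flight hour) for a single channel.
part_rates = [2e-7, 5e-7, 3e-7]
channel_rate = series_failure_rate(part_rates)

# A single channel does not meet the catastrophic limit...
single_ok = channel_rate <= CATASTROPHIC_LIMIT_PER_HOUR

# ...whereas two independent channels over a hypothetical 10-hour exposure do.
dual_rate = redundant_average_rate(channel_rate, exposure_hours=10.0)
dual_ok = dual_rate <= CATASTROPHIC_LIMIT_PER_HOUR

print(f"single channel: {channel_rate:.1e}/h, meets limit: {single_ok}")
print(f"dual redundant: {dual_rate:.1e}/h, meets limit: {dual_ok}")
```

In practice the independence assumption is the weak point of such a calculation, which is one reason the text pairs the quantitative rate with a development assurance level rather than relying on the numbers alone.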


Each solution has its own set of constraints and benefits, which engineers study at length before comparing the various trade-offs between performance, cost and mass.

DAL level | Failure rate | Severity of the high-level failure
A | 10^-9 /hour | Catastrophic – Failure may cause a crash. Error or loss of critical function required to safely fly and land aircraft.
B | 10^-7 /hour | Hazardous – Failure has a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers.
C | 10^-5 /hour | Major – Failure is significant, but has a lesser impact than a Hazardous failure (for example, leads to passenger discomfort rather than injuries) or significantly increases crew workload (safety related).
D | 10^-3 /hour | Minor – Failure is noticeable, but has a lesser impact than a Major failure (for example, causing passenger inconvenience or a routine flight plan change).
E | – | No effect – Failure has no impact on safety, aircraft operation, or crew workload.

Table 3.1. Definitions and criteria associated with DAL levels

3.2. Safety and reliability

The classical approach to the dimensioning of parts includes a margin (normalized or not) applied to the case of maximum loading. Today, design offices are moving towards stress-strength ("constraint-resistance") approaches that require a probabilistic estimation of the constraints (stresses) and the dimensioning of the system's resistance to a specified degree of confidence. This approach bases the dimensioning level on a statistical criterion rather than on a fixed margin. From the assumptions made about the dispersion of resistance, it also allows the level of quality and the production acceptance controls for parts to be fixed.
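
As an illustration of the difference between the two approaches, the sketch below compares a fixed-margin check with a stress-strength interference calculation. It assumes, purely for illustration, that resistance R and stress S are independent and normally distributed, in which case the probability of failure is P(R < S) = Phi(-(mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2)); the text itself does not specify any distribution. The deterministic criterion Rmin >= k * S0, the function names and all numerical values are likewise hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def deterministic_margin_ok(r_min: float, k: float, s0: float) -> bool:
    """Fixed-margin check: minimum resistance covers k times the nominal stress.

    Assumed form of the deterministic criterion, for illustration only.
    """
    return r_min >= k * s0


def interference_failure_probability(mean_r: float, std_r: float,
                                     mean_s: float, std_s: float) -> float:
    """P(R < S) for independent, normally distributed resistance and stress."""
    reliability_index = (mean_r - mean_s) / math.hypot(std_r, std_s)
    return normal_cdf(-reliability_index)


# Hypothetical values, e.g. in MPa.
margin_ok = deterministic_margin_ok(r_min=450.0, k=1.25, s0=350.0)
p_fail = interference_failure_probability(mean_r=500.0, std_r=25.0,
                                          mean_s=350.0, std_s=30.0)

print(f"deterministic margin satisfied: {margin_ok}")
print(f"probability of failure (normal assumption): {p_fail:.2e}")
```

The fixed-margin check returns only a yes or no, whereas the probabilistic form ties the result to the dispersion of both quantities, which is what makes it usable for setting quality levels in production.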


See Figure 3.3.

Figure 3.3. Deterministic approach and constraint-resistance approach of a design [WCO12]. Deterministic design approach: Rmin, minimum resistance; k, safety factor; S0, nominal constraint. Probabilistic design approach: R, average resistance; S, mean stress.

Besides mastering the design of parts, this approach provides the criteria for manufacturing and dispersion. It is based on the principle that the part is defined not only by its drawing but also by the mastery of the entire development process, since this contributes to its physical characteristics.

This principle is applied to superstructures, such as bridges, under the European design and calculation standards known as the Eurocodes. A semi-probabilistic approach consists of defining loading levels according to their occurrence over a given period. For climatic loads this period is equal to 50 years, corresponding to a probability of exceedance of 2% per year; for earthquakes it is 475 years (10% over 50 years); and for road traffic it is 1,000 years (10% in 100 years).

Experience feedback is widely exploited in the aeronautics sector, not only for reasons of contractual warranty but also for reasons of safety. It is put into practice through the follow-up procedures applied to a fleet of products and systems in service. This is done at the behest of the authorities, who need to follow events related to safety and, if necessary, have improvements made to the entire product line (as with recalls by automotive manufacturers).

This provides the manufacturer with feedback on the experience of individual technological parts. The level of maturity and reliability of solutions is therefore known, allowing future developments to optimize the design of the system's architectures. This activity is an important competitive lever for any company.

3.3. Maintainability/availability

Today, operators seek not only to optimize system performance, mass and consumption, but also to improve the return on their investment by mastering the overall operating costs of the product.

The overall cost of the product includes the cost of acquisition, the cost of the operations necessary for its maintenance, the cost of its non-availability, as well as the expenditure necessary to keep it in operation. In aeronautics, operating expenses include, but are not limited to, the cost of the kerosene needed to transport the part and the energy it draws from the aircraft in order to function properly.

The maintenance of an aircraft accounts for 8–10% of the total cost of operating it. Optimizing maintenance operations can potentially impact availability, as the aircraft can only fly if its equipment is at the required level. Operators therefore want to reduce maintenance costs while preserving aircraft availability. Failure prognosis is a lever for responding to this need, by informing operators of the state of health of the systems and identifying the affected part before it fails.

Figure 3.4. Typical cost of operating an aircraft. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip


The anticipation of breakdowns makes it possible to carry out maintenance interventions just before they are needed, rather than in a systematic preventive way, which can lead to extra cost linked to over-maintenance, or in a curative way, which results in extra cost linked to the potential immobilization of the aircraft. Such immobilization can sometimes lead to delay fines and/or cancellation penalties payable to irate passengers (currently €600 per passenger in Europe, excluding accommodation expenses).

3.4. Tomorrow’s challenges

One of the issues facing aeronautics in the future is how to respond to the increasing need for mobility. The demands on aircraft are being pushed even further by new players in the market, the “low-cost” airlines. These operators have opened up air travel to a greater number of passengers whilst increasing the performance of the airplane and reducing the waiting time at boarding gates, resulting in more daily rotations. Their economic success is based on offering tickets at more competitive prices, sometimes to the detriment of the level of comfort offered by more traditional companies, but nonetheless maintaining an equivalent level of safety.

The specifications of these new players require a greater availability of aircraft: 350 days a year, with rotations of up to 10 flights per day, for a total lifetime of 25 years. This demand for economic profitability requires aircraft to have greater reliability. The choice of architectures, systems and technologies used in the design of these systems is critical because the reliability of an aircraft (in the sense of operational readiness) is a selling point scrutinized by the airlines.

Projections show a doubling of the fleet in service over the next 25 years. This trend requires the reliability of products to be improved by a factor of two in order to keep the same number of accidents and thus maintain the users’ feeling of safety. Today the accident rate is 0.15 accidents per million departures. The goal is to divide it by two in order to achieve 0.07. As aeronautics has shown in the past, technological breakthroughs have increased reliability in service, which in turn has made it possible to respond to these challenges.
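The arithmetic behind this objective can be sketched as follows. The accident rate and the aircraft usage figures come from the text; the annual number of departures is a hypothetical value, and the sketch assumes that departures grow in proportion to the fleet.

```python
def expected_accidents(rate_per_million: float, departures: float) -> float:
    """Expected number of accidents for a given rate and number of departures."""
    return rate_per_million * departures / 1_000_000


current_rate = 0.15            # accidents per million departures (from the text)
departures_today = 30_000_000  # hypothetical annual departures, illustration only

baseline = expected_accidents(current_rate, departures_today)

# Assumption: departures double when the fleet doubles.
unchanged_rate = expected_accidents(current_rate, 2 * departures_today)
halved_rate = expected_accidents(current_rate / 2, 2 * departures_today)

# Upper bound on the flights of one aircraft under the low-cost specification:
# 350 days per year, up to 10 flights per day, 25 years.
flights_per_aircraft = 350 * 10 * 25

print(baseline, unchanged_rate, halved_rate, flights_per_aircraft)
```

Under this assumption, keeping the current rate would double the expected number of accidents, whereas halving the rate (0.075, quoted as 0.07 in the text) keeps it at today's level.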


Figure 3.5. Global aircraft fleet and forecasts over the next 20 years [CMO14]. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

Within the scope of this book, there are two strong trends in the technical evolution of aircraft. The first concerns the increasing diffusion of composite materials, such as those developed by SAFRAN and embedded in the LEAP® family of engines. These breakthrough 3D composite technologies, used on the fan blades, allow for a mass gain of 450 kg per aircraft. The second concerns the concept of a more electric airplane, which has been implemented in the very latest aircraft developments, such as the A380 or, more particularly, the B787. The objective here is to replace three centralized power distribution networks (electric, pneumatic, hydraulic) by a single electric network, with localized secondary networks that could be hydraulic, allowing for an overall simplification of the system and therefore a new possibility of improving its reliability.

Aircraft that are more composite and more electric are confirmed trends because they not only meet the need to reduce aircraft consumption but also bring new advantages and functionalities such as:
– decreased maintenance time through Line Replaceable Units (LRU) that can be replaced in less than 15 minutes, a requirement that electrical equipment can fulfill;
– gain in ground at the system level;
– decrease (potentially elimination) of leaks of hydraulic oil and fluids, which are generally harmful pollutants, thanks to the Electro-Mechanical Actuator (EMA);
– location and anticipation of breakdowns thanks to more intelligent equipment that diagnoses itself and can operate in degraded mode.

Figure 3.6. Capacity of electrical generation (kVA, kilovolt-ampere)

The second challenge facing the aeronautics industry is the change in the contractual use of products. Indeed, airlines wishing to have greater control over the overall cost of their aircraft sign so-called “per flight hour” contracts with equipment manufacturers. These contracts are no longer the traditional combination of an equipment purchase and a maintenance contract, but a single contract that aggregates the whole. The airline pays the equipment manufacturer (OEM) for each hour of operation, and it is up to the OEM to take over the maintenance and bear its costs, in addition to ensuring the availability of the equipment as negotiated.

This mode of operation requires that the equipment manufacturer not only takes particular care of its equipment but also balances the cost of manufacture, the quality and choice of technologies, the cost of demonstration, and the maintainability and durability of materials in order to ensure the profitability of its contract.

With this need for control, solutions for monitoring and prognosticating the state of health of the equipment have become a necessity in order to allow for the individualized monitoring of the equipment operated by the customers, through new features such as:
– the monitoring of the state of health of the parts;
– the location of the fault;
– the anticipation of degradation and failures in order to better plan the maintenance operations and prepare the spare parts in the workshop, thereby allowing for a better allocation of the “spare” stock (replacement equipment available);
– the monitoring of critical features.

The equipment manufacturer or the system provider does not operate its own product. As such, monitoring how the equipment or system is used also allows them to check that the product is being used normally. The follow-up of strong and weak signals may allow them to perform maintenance operations just when they are needed, while still maximizing availability.

3.5. Conclusion

The aviation industry’s desire to improve the reliability of aircraft has been rewarded by a significant reduction in the number of accidents, by a ratio of 100 in 50 years. This effort has resulted in the airplane becoming the safest means of transport, a vital prerequisite for mass transport. Consequently, air traffic has increased steadily over the last 50 years, which has meant a corresponding steady increase in the global aircraft fleet. Over the next 20 years, it is estimated that the number of aircraft will double.

This estimated doubling has encouraged manufacturers to increase their efforts to improve the reliability of aircraft in order to maintain the same perception of risk. These investments are oriented towards the improvement of current systems, such as the replacement of hydraulic actuators with electrical ones, but also towards new principles such as the prediction of breakdowns, which allows experts to anticipate the maintenance of the products.

3.6. Bibliography

[BOE 15] BOEING COMMERCIAL AIRPLANES, Statistical Summary of Commercial Jet Airplane Accidents Worldwide Operations: 1959–2015, Report, 2016.


[CMO 14] BOEING COMMERCIAL AIRPLANES, Current Market Outlook: 2014–2033, Report, 2014.
[RAD 00] RADIO TECHNICAL COMMISSION FOR AERONAUTICS, Design Assurance Guidance for Airborne Electronic Hardware, 2000.
[RAD 12] RADIO TECHNICAL COMMISSION FOR AERONAUTICS, Software Considerations in Airborne Systems and Equipment Certification, 2012.
[SOC 11] SOCIETY OF AUTOMOTIVE ENGINEERS, Aerospace Recommended Practice: Guidelines for Development of Civil Aircraft and Systems, 2011.
[SOC 15] SOCIETY OF AUTOMOTIVE ENGINEERS, Aerospace Recommended Practice: Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment, 2015.
[WIK 12] WIKI COMMONS, available at: https://commons.wikimedia.org/wiki/File:Fiabilite_approche_deterministe_et_probabiliste.svg?uselang=fr, 2012.

4 Reliability in the World of Defense

The defense industry is characterized by long development cycles: for example, it takes five to seven years to develop a new missile. This requires the application of very innovative technologies; otherwise, international competition will prevail, at a strategic cost to the defense in question. These long development cycles give rise to new designs, new technologies (components and processes for mounting on PCBs) or even new conditions of use, which revolutionize products (as opposed to the evolution of a product). The maturation of a revolutionary product requires validation through experimentation, with tests at the different stages of development and at the different levels of assembly of the product.

Chapter written by Henri GRZESKOWIAK.

4.1. Introduction

This chapter examines the evolution of reliability in the world of defense. It puts into perspective the role of environmental engineering as fundamental for raising a product to a high degree of maturation. This maturity is a prerequisite that reliability engineering experts take for granted. However, operational reliability will not be immediately apparent at the beginning of product deployment. As outlined in the text below, it is necessary to go through a learning phase, from the implementation of the product to the attention paid to the functional and environmental conditions of its use (so that, as far as possible, they match the conditions taken into account in development).

Trials have always played a very important role in “maturing” the product under development, with reliability growth tests and reliability demonstrations giving way essentially to accelerated testing; aggravated tests are not widely used in the field of defense.

Experts in reliability engineering in France have promoted an original approach to reliability prediction, in the form of a guide that also covers the control of the reliability process and the audit thereof (see section 4.7.1.2). However, this guide, like other existing international guides, has several weak points:
– the aging and the wear and tear of components are not taken into account (an exponential mortality law is generally assumed);
– recent components such as IGBTs and MOSFETs, and some less recent components such as electrolytic capacitors or film capacitors, are not present in these guides or, if they are, the corresponding data are not sufficiently reliable. They thus become critical components.

4.2. Operational dependability in the world of defense

The term “sûreté de fonctionnement”, invented in France 30 years ago to encompass several concepts, has no exact equivalent in English. In France, “sûreté de fonctionnement” refers to four concepts: reliability, maintainability, availability and safety. In English, the term dependability covers the concepts of reliability, availability and maintainability; however, it does not include the safety aspect, which is therefore treated separately. The word dependability is often inappropriately equated with “sûreté de fonctionnement”. As such, the term RAMS (Reliability, Availability, Maintainability and Safety) is preferred.

The abbreviation RAM is used to designate the triptych of reliability, availability and maintainability, with the following associated indicators:
Reliability: failure rate = 1 / MTBF;
Maintainability: repair rate = 1 / MTTR;
Availability = MTBF / (MTTR + MTBF).
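The quantity 1 / MTBF above is a constant failure rate, which corresponds to the exponential mortality law mentioned among the weak points of the prediction guides. A minimal sketch of what this assumption leaves out is given below. The Weibull law with a shape parameter greater than 1 is used here as one common way of representing wear-out; this choice and all numerical values are assumptions made for illustration, not taken from the guides.

```python
import math


def reliability_exponential(t: float, mtbf: float) -> float:
    """Survival probability at time t for a constant failure rate 1 / mtbf."""
    return math.exp(-t / mtbf)


def reliability_weibull(t: float, scale: float, shape: float) -> float:
    """Survival probability at time t for a Weibull law."""
    return math.exp(-((t / scale) ** shape))


def hazard_weibull(t: float, scale: float, shape: float) -> float:
    """Instantaneous failure rate of a Weibull law (increasing if shape > 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


mtbf = 1000.0                # hours, hypothetical
scale, shape = 1000.0, 3.0   # hypothetical wear-out parameters

for t in (250.0, 500.0, 1000.0, 1500.0):
    print(
        f"t = {t:6.0f} h  "
        f"R_exp = {reliability_exponential(t, mtbf):.3f}  "
        f"R_weibull = {reliability_weibull(t, scale, shape):.3f}  "
        f"hazard_weibull = {hazard_weibull(t, scale, shape):.5f} per hour"
    )
```

The exponential law keeps the same failure rate (1 / MTBF) at any age, whereas the wear-out model has a failure rate that grows with time, which is the aging behavior that a constant-rate prediction cannot capture.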


Figure 4.1. Performance architecture contributing to RAMS

– Reliability: the ability of a system to remain constantly operational for a given time and under a specific set of conditions of use. The MTBF (mean time between failures) refers to the average time between consecutive failures:
MTBF = Sum of Good Operation Time / Number of Failures.
The Sum of Good Operation Time covers all the time spent outside failure events. Expressed in practical units, this indicator can be very informative to operators, for example as a number of faults per 100 hours of production.

– Maintainability: the ability of a system to be quickly returned to operational status. Thus, systems with easily removable components benefit from better maintainability than other systems. For an entity used under given conditions, maintainability is the probability that a given maintenance operation can be performed within a given time interval, when maintenance is provided under given conditions and according to prescribed procedures and processes.


The indicator MTTR (Mean Time To Repair) is the average time it takes to repair a component or system, that is, the average time taken up by repair tasks. It is calculated by adding the active maintenance times and the ancillary maintenance times, the total of which is divided by the number of interventions:
MTTR = Total Stop Time / Number of Stops.
Active time includes all the time taken up by:
- fault location;
- diagnosis;
- intervention;
- checks and tests.
Ancillary time includes the time taken up by:
- the detection;
- the call for maintenance;
- the arrival of maintenance;
- the logistics of intervention.

– Availability: the ability of a system to be operational as and when requested. This is an important concept for a safety device, such as a circuit breaker. High availability is compatible with low reliability if the device can be repaired very quickly. The notion of availability expresses the probability that an entity is in a state of “availability” under given conditions at a given time, assuming that the provision of external means is assured. The availability, or availability rate, is the ratio Actual Time of Availability / Required Time, or the ratio Operating Time / (Operating Time + Unavailability Time). Availability is expressed as a function of the previous indicators in the following way:
Availability = MTBF / (MTTR + MTBF).
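The three indicators defined above can be computed from a simple operating record, as in the following sketch. The function names and the operating figures are hypothetical and serve only to show that the two expressions of availability agree.

```python
def mtbf(good_operation_time: float, number_of_failures: int) -> float:
    """MTBF = sum of good operation time / number of failures."""
    return good_operation_time / number_of_failures


def mttr(total_stop_time: float, number_of_stops: int) -> float:
    """MTTR = total stop time / number of stops."""
    return total_stop_time / number_of_stops


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTTR + MTBF)."""
    return mtbf_hours / (mttr_hours + mtbf_hours)


# Hypothetical record: 1,000 hours required, 5 failures, 50 hours of stops.
operating_hours = 950.0
stop_hours = 50.0
failures = 5

mtbf_h = mtbf(operating_hours, failures)
mttr_h = mttr(stop_hours, failures)
a_from_indicators = availability(mtbf_h, mttr_h)

# Same quantity from the ratio operating time / (operating + unavailability).
a_from_times = operating_hours / (operating_hours + stop_hours)

# Operator-oriented form: number of faults per 100 hours of production.
faults_per_100_hours = 100.0 / mtbf_h

print(mtbf_h, mttr_h, a_from_indicators, a_from_times, faults_per_100_hours)
```

In this example each stop is assumed to correspond to one failure, so the number of stops equals the number of failures; when stops also include scheduled maintenance, the two counts differ and the two availability expressions no longer coincide exactly.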

[Figure 4.2 elements. Manufacturer: characteristics of the system (reliability, maintainability), intrinsic availability. User: characteristics of usage, maintenance policy, maintenance logistics. Result: operational availability, usage cost, maintenance cost, ownership cost.]

Figure 4.2. Performance architecture leading to operational availability [NFX 85]

The NFX 60 503 standard gives the following definitions:
- intrinsic availability: a value determined by the conditions of maintenance and supposedly ideal operation;
- operational availability: a value determined by the conditions of maintenance and the operating data.

– Safety: the ability of a system not to experience failures that are considered catastrophic (involving the safety of people and/or property) during a given time period.

It should be noted that good RAMS performance assumes that the product operates in, and endures, the normal and limit environments (as specified). The maximum (or extreme stress) values and the endurance values, the latter reflecting the cumulative damage in normal and limit environments, are derived from the environmental mission profile (the number of occurrences and the environmental conditions characterizing each occurrence: duration, temperature, hygrometry, etc.). A hypothesis concerning the damage law that links some of these parameters with time should also be included, in order to be able to accelerate time by increasing, for example, the amplitude of the environmental stresses. In these degradation models the functional aspect is sometimes taken into account as an input parameter. In the (more frequent) cases where it is not, it will be necessary to take into account the operating conditions (currents, voltages and their respective temporal evolutions, etc.) so that they interact with the environmental conditions of the accelerated test.

Figure 4.3 shows the main stages in the maturation of a product in terms of its ability to function in the maximum normal and limit environments and its ability to endure the normal or cumulative environments of its life profile. The term “reliability growth” could also be used, because any correction of a defect observed during development trials results in an increase in the intrinsic reliability of the product. Since reliability engineering, represented by reliability experts, does not manage and does not interact with these activities at all (this is the most frequent experience, but there are exceptions), it is preferable to speak of “maturation growth”. This term usually reflects the fact that the product is approaching its final stage of development, including its production; it is not necessary to recall each time that it is a matter of maturation with respect to the product's ability to function in the maximum normal or limit environment and its ability to endure the cumulative environments of its lifetime profile.

The product will therefore be said to be mature (in the sense defined above) after the implementation of the actions belonging to the process of engineering in consideration of the environment. These actions generally include accelerated, aggravated and even dedicated reliability tests, which are partly called for by a methodology that takes the environment into account (according to AFNOR pr XP-X 50144-1 to 6) and which results in a mature product at the end of development. Stress screening operations in production are usually implemented in order to reveal early defects (induced by production processes that are not yet sufficiently controlled), potentially leading to modifications of the design. It is on a supposedly mature product that reliability engineering will implement the analyses specific to the profession.

Reliability growth trials are not cited here because they are long and generally inconsistent with current cost and delay requirements (development cycles have steadily decreased), although they remain relevant in specific cases of mass production (dashboards for cars [CÊT 97]).

It should be noted, however, that despite this supposed state of maturation of the product after the application of the actions described in Figure 4.3, experience has shown that the reliability predicted by reliability engineering is not observed immediately (see Figure 4.4). A period of time is necessary for the end-user to be trained in the use of the product, to correctly assimilate the instructions for use and to adapt to the ergonomics of the product, especially with regard to the conditions of use taken into account in development and recalled in the precautions for use. It is only at the end of this learning period that the product will have the degree of reliability predicted by reliability engineering.

[Figure 4.3 elements. Processes in the field of environmental engineering, plotted against maturation growth:
– taking into account the environment: implementation of the set of six standards NF pr XP-X 50144-1 to 6;
– elimination of the product weak points: accelerated tests, aggravated tests;
– qualification tests;
– precipitation of latent defects in manufacturing: stress screening, severe stress screening.
*A drop in the reliability of the product is usually observed at the beginning of production.]

Figure 4.3. The engineering of environmental consideration

[Figure 4.4 elements. Axes: failure rate versus time, with the predictions shown between an upper (sup.) and a lower (inf.) bound. Defect distributions at the beginning of life:
– 15%: component defects + manufacturing;
– 85%: defects associated with usage outside of the specified domain, poorly studied ergonomics, and poorly applied instructions (misunderstood and/or inadequate training).]

Figure 4.4. Evolution of the failure rate as a function of time found on a system at the beginning of its deployment [MAT 90]


4.3. History of reliability in the world of defense

4.3.1. History of reliability in the US world of defense

4.3.1.1. Highlights relating to reliability in the US world of defense

In 1951, Weibull published a statistical function that subsequently became famous under the name of its creator: the Weibull distribution. In 1952, failure rates were analyzed by referring to the exponential distribution and compared to the normal distribution.

Also in 1952, the Advisory Group on the Reliability of Electronic Equipment (AGREE) of the Department of Defense (DoD) took up concerns regarding reliability and recommended the following three guidelines for the creation of reliable systems:
– develop reliable components;
– establish the quality and reliability requirements for suppliers.

In 1954, a national symposium on reliability and quality control took place for the first time in the United States, and in the following year the Institute of Electrical and Electronics Engineers (IEEE) formed an organization called the Association for Reliability and Quality Control. During the next two years, several important documents concerning reliability were created.

In 1980, the DoD updated the MIL-STD-785A military standard by releasing MIL-STD-785B. The updated standard includes reliability growth and development testing as a recommended practice.

In 1981, the DoD published the MIL-HDBK-189 handbook on Reliability Growth Management, which addressed the principles of reliability growth, its benefits and the guidelines and procedures to be used in managing the evolution of reliability.

MIL-HDBK-781 (reliability test methods, plans and environments for engineering development, qualification and production) provided test methods, test plans and test environmental profile factors that could be used in reliability testing during the development, qualification and production of systems and equipment; version D of the standard, published in 1986, was replaced in 1996 by MIL-HDBK-781.


The standard of reliability demonstration tests, MIL-STD-785B (Military Standard: Reliability Program for Systems and Equipment Development and Production), was released on 15 September 1980; its latest revision is Notice 3, issued in 1998. Its date of withdrawal as an active standard for the DoD is not known to the author.

In 1983, the Naval Surface Warfare Center published the Statistical Modeling and Estimation of Reliability Functions for Software (SMERFS) to assess the reliability of software.

In 1984, the US Air Force published an action plan, R&M 2000, to integrate the tasks of reliability and maintainability (R&M) into development processes.

In 1988, a manual on reliability prediction procedures for mechanical equipment was issued by the Carderock Naval Research Center, Maryland. This manual was more commonly known as the Carderock Manual, and the Navy published it as NSWC-92/L01 in 1992.

Within the field of reliability testing, the United States has been a driving force in the western world. Sufficiently convincing proof of this can be found by referring to the list of technical documents published in the US Reliability and Maintainability Symposia, the IEEE Transactions on Reliability or the IEST publications of the Product Reliability (PR) division.

4.3.1.2. Stress screening

The stress screening of equipment, sub-assemblies and systems has been practiced for decades in the United States, and significant research has been conducted to optimize stress screening profiles. This optimization consists in defining the types of stress screening profiles (the nature of the stimulations, their amplitude, their durations of application, etc.) according to the nature of the equipment, its complexity and its usage profile. There are many publications, such as:
– NAVMAT P9492, still relevant in 2016 for electronic equipment;
– ESSEH 1981, “Environment Stress Screening Guidelines”;
– “Stress Screening of Electronic Equipment”, in the Proceedings of the Annual Reliability and Maintainability Symposium held on 21 September 1981;
– “Stress Screening of Electronic Hardware”, a report of the National Technical Information Service from May 1982;
– report Ref. TR-81-87 of the Rome Air Development Center (RADC) of March 1981: “Which environmental stress screens should be used?”;
– the report of a study released in 2000 by the National Center for Manufacturing Sciences entitled “Environment Stress Screening 2000”.

A great many American companies have acquired significant experience in the field of stress screening electronic equipment. For these companies, the stress screening of electronic equipment is an important factor in the reduction of costs in the general context of the “life cycle costs of the equipment”. For most of them, their experience in the matter is such that the correlation between the practice of appropriate stress screening and the resulting financial gain could be evaluated for different product lines. The corresponding figures are presented in numerous publications on the subject.

4.3.1.3. Reliability demonstration tests in the US

It appears that the practice of reliability demonstration tests at the qualification stage, or even the production stage, dates back to the end of the 1950s. In general, these tests were applicable to dominant equipment for which a failure rate assumption could reasonably be made. The test plans can take the form of Fixed Length Tests (FLT) or Probability Ratio Sequential Tests (PRST).

Reliability demonstration tests seem to have had some success up until the end of the 1960s insofar as, in the field of armaments for example, the DoD could expect to obtain a good guarantee on the reliability of the equipment being evaluated (that is to say, at fairly low risk for the customer) and at a reasonable price.
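The link between the customer's risk, the MTBF to be demonstrated and the length of a demonstration test can be illustrated with a deliberately simplified plan. The sketch below assumes a constant failure rate (exponential law) and a fixed-length test that is passed only if no failure occurs; it is not one of the standardized FLT or PRST plans, and the function names and figures are hypothetical.

```python
import math


def zero_failure_test_duration(mtbf_to_demonstrate: float,
                               customer_risk: float) -> float:
    """Cumulative test time T such that equipment whose true MTBF is only
    mtbf_to_demonstrate passes a zero-failure test with probability
    customer_risk, i.e. exp(-T / MTBF) = customer_risk."""
    return -mtbf_to_demonstrate * math.log(customer_risk)


def acceptance_probability(true_mtbf: float, test_duration: float) -> float:
    """Probability of observing no failure during test_duration."""
    return math.exp(-test_duration / true_mtbf)


risk = 0.10  # hypothetical customer risk of 10%

t_100 = zero_failure_test_duration(100.0, risk)
t_1000 = zero_failure_test_duration(1000.0, risk)

print(f"MTBF   100 h: {t_100:7.0f} cumulative test hours")
print(f"MTBF 1,000 h: {t_1000:7.0f} cumulative test hours")
```

Under these assumptions the required test time is proportional to the MTBF to be demonstrated (about 2.3 times the MTBF for a 10% customer risk), so multiplying the MTBF requirement by 10 multiplies the test time by 10 for the same risk.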
However, since the 1970s, with the increase in MTBF levels by ratios of about 10 due to the constant improvement of technologies, the demonstration tests needed to verify a requirement with a sufficiently low customer risk have become too expensive. For this reason, reliability demonstration tests, both at the qualification and production stages, have given way to more active and less expensive reliability growth tests, which are themselves integrated into development tests. Then, gradually, it became more efficient to seek product reliability as early as possible in development by ensuring good design practices, and as such even reliability growth tests gave way to accelerated tests, which are considered to this day to be the experimental approach best attuned to modern cost and delay requirements.

Directive 5000.40, issued by the DoD in July 1980 on “reliability and maintainability”, recommends that demonstration tests should be carried out only if the expenditure incurred is matched by a benefit that is at least equal.

However, during the period from 1974 to the early 2000s, there were certain materials to which reliability tests in production were applied continuously and systematically: air missiles of the US Air Force and the US Navy were subjected to systematic reliability testing at the Pacific Missile Test Center in Point Mugu, near Santa Barbara. The tests were applied to four to six missiles taken from a production batch, and consisted of mechanical vibration, acoustic vibration and rapid temperature variation (30°C/min) tests representing the mission profile. These tests lasted an average of one to two weeks, until the first equipment failures in operation (including the seeker function) were observed. The validity of the results obtained by these tests was regularly confirmed by the results obtained with missiles carried under an aircraft during real flights.

4.3.1.4. Reliability growth tests in the US

Here, the term “reliability growth tests” means all the types of tests whose principal objective is to reveal latent defects in the design of new equipment and therefore, through the corrective actions made possible, to improve the reliability of this equipment.
Such tests were generally carried out at the development stage, more or less integrated with other development tests; they were sometimes also conducted at the preproduction stage in the case of large series.

Tests of this nature have been very successful in the United States, particularly in the field of armaments, for which the assurance of good reliability during operational life is a major objective. It is therefore not surprising that the military sector has served as a melting pot for investigations into the art and manner of conducting such tests.

In order to trace the evolution of reliability growth tests in the United States, we can roughly distinguish between two successive periods:

– From the end of the 1960s up to about 1975, the experiments being pursued essentially consisted in expanding the classical development tests so as to model the evolution of reliability observed through these tests and to verify the agreement of the experimental results with the best known mathematical models of growth (in particular that of Duane). These experiments have given rise to an abundant literature, which can be consulted in particular in the archives of the US reliability conferences and those of the Institute of Environmental Sciences and Technology (IEST). Although the results obtained from these tests were not always homogeneous, the general trend observed was an increase in actual reliability over the course of the tests (not always in conformity with the classical logarithmic model of Duane), together with the demonstration of an indisputable financial gain in the general context of the items’ life cycle costs.

– From the mid-1970s to the 2000s, reliability growth tests were investigated further through the pursuit of financial optimization in its broadest sense. In light of the experience gained (mainly in the military and avionic disciplines), it appeared that:
- control over hardware reliability before beginning production represented a very important factor in reducing the costs likely to be caused by the discovery of latent defects during the operational life of these products;
- the best way to accelerate reliability growth and control it, along with other characteristics of the product, was to subject the product to cyclic testing in combined environments that simulated its future environmental profile, with applied stress levels equal to or more severe than the actual levels in order to reveal defects earlier.

These types of tests, known as Combined Environmental Reliability Testing (CERT), have been very much favored in the United States for many years and have been the subject of numerous articles in the specialized literature of the Institute of Environmental Sciences and Technology (IEST). In general, these CERT tests were an integral part of the reliability growth tests performed during hardware development.


In some cases, for example in large markets with extended series, these tests could be carried out during the preproduction phase so that any corrective actions could be incorporated before they became more expensive under the established production regime. Most of the papers devoted to experiments in reliability growth tests, and in particular CERT, emphasized the financial benefits of conducting these tests. Certain article titles on this subject were evocative in themselves, such as that of V.A. Mosca (United Technologies Corporation) published at the Aerospace Congress & Exposition (Oct. 1983): “Extraordinary Benefits of CERT on Digital Electronic Engine Controls” [MOS 83]. It is therefore clear from the US literature on the subject that reliability growth tests have become as widely accepted as other development tests. They appeared to the client as a means of ensuring that reliability growth would take place according to the model originally envisaged and that this model would make it possible to ensure the best resource management. According to a study by the Rome Air Development Center (RADC), TR 84-20/AD-A141232, the average cost of reliability growth tests represented 3% of the total cost of development. This expense was in general far lower than the costs resulting from insufficient reliability over the life cycle of the equipment. It was in conjunction with this dual need that the draft standard MIL-STD-781D was prepared. This marked a significant step forward with respect to version C which is primarily concerned with reliability in the development process.
The example of the AMRAAM missile clearly illustrated the importance of the means dedicated to reliability growth tests as well as the value of the results obtained: six missiles were subjected to reliability growth tests totaling 12,500 cumulative hours; in actual testing, a 1,000-hour MTBF was attained from an initial MTBF estimated at 100 hours, with a planned growth rate of 0.5 (Duane coefficient). In addition, these six missiles had previously undergone stress screening. Finally, a 10-missile reliability demonstration was scheduled, to be carried out once the reliability growth tests had been completed.
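The AMRAAM figures quoted above can be checked against the Duane model with a short calculation. The sketch below assumes the usual power-law form, in which the cumulative MTBF grows as mtbf0 * (t / t0) ** alpha. The reference time t0 at which the initial MTBF applies is not given in the text, so the code backs out the value implied by the quoted numbers. The function names are illustrative, and reading the 1,000-hour figure as a cumulative MTBF is an assumption.

```python
import math

def duane_cumulative_mtbf(t, t0, mtbf0, alpha):
    """Cumulative MTBF at test time t under the Duane power law."""
    return mtbf0 * (t / t0) ** alpha

def duane_instantaneous_mtbf(t, t0, mtbf0, alpha):
    """Instantaneous MTBF, larger than the cumulative value by 1 / (1 - alpha)."""
    return duane_cumulative_mtbf(t, t0, mtbf0, alpha) / (1.0 - alpha)

def implied_reference_time(t, mtbf_t, mtbf0, alpha):
    """Reference time t0 for which mtbf0 grows to mtbf_t after t test hours."""
    return t / (mtbf_t / mtbf0) ** (1.0 / alpha)

# Values quoted in the text for the AMRAAM growth tests
TOTAL_HOURS = 12_500.0   # cumulative test hours
INITIAL_MTBF = 100.0     # hours
FINAL_MTBF = 1_000.0     # hours
ALPHA = 0.5              # Duane growth rate

# Assumption: FINAL_MTBF is read as a cumulative MTBF
t0 = implied_reference_time(TOTAL_HOURS, FINAL_MTBF, INITIAL_MTBF, ALPHA)
check = duane_cumulative_mtbf(TOTAL_HOURS, t0, INITIAL_MTBF, ALPHA)
print(f"implied reference time: {t0:.0f} h")
print(f"cumulative MTBF at {TOTAL_HOURS:.0f} h: {check:.0f} h")
```

Under these assumptions, a tenfold MTBF improvement at a growth rate of 0.5 requires the test time to increase a hundredfold, so the quoted figures are mutually consistent if the initial 100-hour estimate refers to roughly the first 125 hours of testing.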


In conclusion, it appears that significant resources were devoted to reliability tests in the United States. As far as the armaments sector was concerned, these efforts resulted from both the requirements of the DoD and the awareness among the industrialists concerned of the impact that the early maturity of their products’ reliability has on medium- and long-term costs. In this context, the MIL-STD-781D standard played an important role, as evidenced by the many articles written about it. According to W.E. Wallace [WAL 85], the evolution envisaged favored a better knowledge of the quantitative relations between the stresses applied and the MTBF obtained. The corresponding probabilistic models, once validated, could lead to a better controlled use of accelerated tests and therefore to a significant reduction in the costs and delays brought about by reliability growth tests. A partial list of literature illustrating the US experience in reliability testing is given in [CRO 75, CRO 78, SCH 75, COX 76, JRF 79, HBC 84, MUR 78].

4.4. Reliability in the field of defense in the US today

4.4.1. Some observations

In recent years, there has been a dramatic increase in the number of systems that did not meet Reliability, Availability and Maintainability (RAM) requirements. Figure 4.5 shows a collection of defense systems, including those that met the reliability specifications as well as those that did not. Research into the reasons for this has brought to light:
– a reduction in the personnel responsible for the acquisition of new equipment over the past 15 years, which has had a negative impact on the implementation of the practices associated with RAM requirements;
– that, with few exceptions, the practice of reliability growth methods was discontinued during system design and development;
– that the relevant military specifications, standards and other directives are not used;
– that the benefits of RAM-related improvements, such as reducing life cycle costs and the demand on logistics systems, are no longer widely appreciated.

Demonstrated vs. Specified Reliability

Figure 4.5. Proportion of defense programs not respecting reliability requirements in the US [DEF 08]

4.4.2. Recommendations for improving this situation

The most important point for correcting high failure rates is to ensure that RAM requirements are robustly formulated and taken into consideration by integrating them into the development process. No test or set of tests will compensate for deficiencies in the formulation of RAM requirements. To this end, the following actions related to the consideration of RAM requirements have been deemed necessary:
– identify and define RAM requirements and integrate them into the tender as a mandatory contractual requirement;
– when selecting the bid, assess the bidders’ approaches to meeting RAM requirements;
– ensure the top-down flow of RAM requirements to the subcontractors;
– require the development of leading indicators to ensure that RAM requirements are monitored;
– include a robust reliability growth program as a mandatory contractual requirement and document progress at each program progress review;
– ensure that a credible reliability assessment is conducted during the various stages of the technical review process and that reliability criteria are achievable in an operational environment;
– strengthen the program manager’s ability to assess RAM-related achievements.

4.5. Importance of taking into account influential environments in the product life profile

Figure 4.6 presents proposals for the improvement of equipment that has already been deployed and which has encountered failures in operation in the three Services of the DoD. Initially, three-quarters of these failures were attributed to innovative design or the use of new technology, and a quarter appeared to be related to the influence of environmental agents. After further analysis, it was confirmed that environmental agents played a major role in three-quarters of these failures. The origins of the failures were primarily linked to:
– sand and dust;
– low temperatures;
– high temperatures;
– rain;
– storage conditions.
How could these failures have been avoided? By taking better account of the distribution of their occurrences and of their impact in the design phase, and by conducting more representative tests. This is the purpose of the series of six AFNOR standards X 50-144, Parts 1 to 6:
– NF X 50-144-1:2000, Test Design And Realization – Test In Environment – Part 1: Bases Of The Process;

– XP X 50-144-2:2013, Demonstration Of The Resistance To Environmental Factors – Design And Execution Of Environmental Tests – Part 2: Guidelines For The Customisation Process In Environment;
– XP X 50-144-3:2014, Demonstration Of The Resistance To Environmental Factors – Design And Execution Of Environmental Tests – Part 3: Implementation Of The Customisation Process In Mechanical Environment;
– prXP X 50-144-4, Demonstration Of The Resistance To Environmental Factors – Design and Execution of Environmental Tests – Part 4: Application of the Personalization Approach in Climatic Environments;
– XP X 50-144-5, Demonstration Of The Resistance To Environmental Factors – Design and Execution of Environmental Tests – Part 5: Guaranty Coefficient;
– prXP X 50-144-6, Tests – Conception and Execution of Tests – Environmental Tests – Part 6: Test Factors.

Figure 4.6. Leading causes of failure on newly deployed systems


4.6. Websites dedicated to the dissemination of good reliability practices in the United States

The leading association in the field of reliability in the United States is the Institute of Environmental Sciences and Technology (IEST) (http://www.iest.org), in which many representatives of the DoD have been actively involved for decades. Each year it organizes a technical symposium called ESTECH. Several working groups within the IEST deal with questions relating to product reliability:
– PR001: Management and Technical Guidelines for the ESS Process;
– PR002: Automotive Issues;
– PR003: HALT and HASS;
– PR781: Reliability Testing Handbook (MIL-HDBK-781).
The IEST also disseminates Recommended Practices: http://www.iest.org/Standards-RPs/Recommended-Practices. Those relating to the field of reliability are entitled:
– MIL-STD-781;
– Environmental Stress Screening of Electronic Hardware.
The IEST also organizes the annual RAMS symposium, dedicated to discussing reliability, availability, maintainability and safety. The site address is: http://www.iest.org/Meetings/RAMS-2017.

The Society of Automotive Engineers (SAE) organizes a major congress every year in Detroit. Many articles related to reliability are presented there, as well as a large number of tutorials. However, this is not the reference symposium for US defense officials.

Another important player in the United States in the field of electronic component reliability is the Center for Advanced Life Cycle Engineering (CALCE) at the University of Maryland. CALCE began as a National Science Foundation Center in 1985 and has evolved into a center funded by more than 150 major international electronics companies and organizations, which together contribute over $6 million per year for research. Dependence on modern technologies and complex electronics poses significant and growing challenges to life cycle risk management. From the irritated customer at the ATM to the vexed business traveler delayed by a flight, or from the dangerous malfunction of automobile equipment to a $100 million loss for satellites and airliners, “the inability to properly manage the life cycle of electronic systems affects everyone”, says George Dieter Chair Professor Michael Pecht, founder and director of CALCE, an IEEE Fellow, ASME Fellow and IMAPS Fellow, who recently received the highest honor in the field of reliability, the “Lifetime Achievement Award” from the IEEE Reliability Society. CALCE has made significant contributions in:
– design for reliability and physics-based reliability evaluation;
– accelerated test methods;
– the development of standards for the industry;
– the management of the electronic supply chain;
– reliability and qualification analysis;
– life cycle management and obsolescence management;
– prognostics and health management of equipment.
CALCE has been a leader in the development of a number of IEEE, JEDEC, GEIA and IEC best practice standards. In addition, CALCE publications have influenced a number of industry standards, notably:
– IEEE 1332: Standard Reliability Program for the Development and Production of Electronic Components;
– IEEE 1624: Standard for Organizational Reliability Capability;
– IEEE 1413: Standard Methodology for Reliability Prediction and Assessment for Electronic Systems and Equipment;
– JEP 148: Reliability Qualification of Semiconductor Devices Based on Physics of Failure Risk and Opportunity Assessment;
– GEIA-STD-0005-2: Standard for Mitigating the Effects of Tin Whiskers in Aerospace and High Performance Electronic Systems;
– IEC/PAS 62240 (also published as GEIA 4900): Use of Semiconductor Devices Outside Manufacturers’ Specified Temperature Ranges.


In the area of reliability and qualification analysis, CALCE processes and models have become the standard for analyses of electronic systems based on the Physics of Failure (PoF). CALCE’s accomplishments in reliability and qualification analysis also cover the evaluation of organizational capability as a measure of the effectiveness of an organization in meeting customer requirements. A wide range of industries, including the aerospace, automotive, household appliance, medical and telecommunications industries, actively use CALCE software and accelerated testing approaches. Among the organizations that apply CALCE methods, the United States Army is leading the way, having created and maintained a PoF analysis group that uses calcePWA software and techniques to evaluate electronic designs for US defense programs. NASA also uses CALCE PoF models in the planning of manned missions to the Moon and Mars. International companies such as Boeing, Daimler, General Electric, General Motors and Vestas use CALCE PoF models to incorporate electronic power modules into systems such as aircraft or hybrid cars.
CALCE is at the forefront of research in prognostics and health management (PHM) and has formed a research collaboration to meet the demand for PHM applications. Thanks to the work of its PHM consortium, CALCE has developed a new paradigm for reliability prediction based on prognostics, in which sensor data are integrated into the models, thus enabling an in situ assessment of the deviation of a specific product from its expected normal operating state. In the field of electronic parts supply chain management, CALCE has developed the concept of “uprating”, a process that mitigates the risks associated with the use of semiconductor devices outside the manufacturer’s specifications.
CALCE has also developed the first quantitative analysis of the life span of electronic components, enabling electronic integrators to reduce their inventory of spare parts and thus save money. CALCE’s electronic component management methodology forms the core of the electronic components management plan for the commercial avionics industry. CALCE has developed a number of methods, including obsolescence prediction algorithms, which are used in prominent tools and databases (i2, Qtec, PartMiner and SiliconExpert). CALCE is also a leading provider of tools and methodologies for the strategic management of long-life systems. CALCE has developed the most widely used method for managing the obsolescence of electronic components, which is used by organizations such as Motorola, Northrop Grumman and the US Department of Defense. For example, Motorola said that it saved $33 million by applying the CALCE methods. Northrop Grumman was able to predict the replacement dates of the F-22’s radar components (due to obsolescence) five years earlier than previously.
The researchers and professors of CALCE have also created a graduate program on Electronic Systems and Products at the University of Maryland, which has trained several hundred engineers. In addition to the graduate program, CALCE provides professional development courses and web-based seminars for engineers working in the industry.
More information on CALCE can be obtained on their website http://www.calce.umd.edu. The mission of CALCE is to provide industry support so as to help develop competitive electronic products 5 to 10 times more quickly. This support includes:
– the provision of design and manufacture methods;
– simulation techniques and models;
– experimental methods;
– guides, handbooks and training materials.
Multidisciplinary researchers are trained to identify, evaluate and update methods and practices for the design, manufacture and evaluation of reliable and competitive electronic products through collaborative studies with manufacturers. These carefully selected studies include:
– modeling of component failures;
– accelerated tests;
– characterization methods;
– components in high temperatures;
– software, modeling and automation;
– connectors and electrical contacts;


– modeling and characterization of manufacturing processes;
– optoelectronic circuits;
– components micro-encapsulated in plastic.
The technological innovations resulting from these studies are transferred between universities and manufacturers. This research has led to significant advances in flexible printed circuits, printed circuit boards, microwave circuits, monolithic integrated circuits, plastic components, the control of metal migration, high density electrical and optical connectors, conventional connectors and interconnections. Advanced thermo-mechanical fatigue models have been developed, especially for surface-mount technology (SMT). Knowledge gained from research on the physics of failure, the characterization of materials and the effects of temperature on electronics is used to develop guides and accelerated test procedures. In addition, finite element analysis is used to estimate the reliability of advanced interconnection technologies and the latest component transfer techniques. This research has been successfully transformed into industrial processes, methods and recommended practices through the implementation of software on industrial sites. Some of these companies integrate CALCE software into their own tools. Future developments will be aimed at reducing the impact of production costs while maintaining the quality and reliability of electronics. The scope covers multi-chip modules (MCM), high temperature electronics and optoelectronics, with a special focus on the development of innovative electronic products.

The Reliability Analysis Center (RAC) was a US Department of Defense Information Analysis Center (IAC). From its inception in 1968, the RAC worked to improve the reliability, maintainability and quality of components and manufactured items for the US government and the industries supplying the DoD.
To this end, the RAC collected, analyzed, archived in computerized databases and published data on the quality and reliability of equipment and systems, including microcircuits, discrete semiconductors, and electromechanical and mechanical components. The RAC evaluated and published information on engineering techniques and methods. The information was distributed in the form of data compilations, application guides and software tools. Training courses and consulting services were also provided. The US Air Force Research Laboratory was responsible for the technical management of the RAC.
The RAC subsequently became the RIAC (Reliability Information Analysis Center), which has since ceased its activities; these have been taken up by an existing private company, Quanterion (https://www.quanterion.com/products-services/tools/). Quanterion now offers related services to the government and to industry through the RMQSI knowledge center (Reliability, Maintainability, Quality, Safety, Interoperability). Quanterion’s experience in performing a multitude of analytical activities over the decades has given it an in-depth understanding of specific processes, including the identification of tasks that can be simplified and/or automated to reduce the time and resources required by many processes. The company has developed a set of software tools to provide this simplification, supplemented by informative ground rules and many other practical tips. It offers a variety of software tools to improve productivity:
– Nonelectronic Parts Reliability Data NPRD-2016. The 2016 edition of the reliability data for non-electronic components and parts (NPRD-2016) presents failure rates for a wide variety of electrical assemblies and electromechanical/mechanical parts and assemblies. Compared to its predecessor NPRD-2011, NPRD-2016 adds 138,000 new parts and more than 370 billion component hours, representing an increase in content of about 400%.
– Electronic Parts Reliability Data EPRD-2014. This document contains reliability data on commercial and military electronic components for use in reliability analyses. It contains failure rate data on integrated circuits, discrete semiconductors (diodes, transistors, optoelectronic devices), resistors, capacitors and inductors/processors, all of which were obtained from these electronic components in operation.


– Failure Mode/Mechanism Distribution FMD-2016. The FMD-2016 database contains data on failure modes and their distribution for a variety of electrical, mechanical and electromechanical parts and assemblies. These data can be used to support reliability analyses such as Failure Modes, Effects and Criticality Analysis (FMECA) and Fault Tree Analysis (FTA).
– Quanterion Automated Database (NPRD-2016, FMD-2016, EPRD-2014). This interactive software tool provides an efficient search and retrieval function for all the information contained in the EPRD, NPRD and FMD, which also exist in paper form. Using an integrated search engine, information can be searched by part number, part type, manufacturer or National Stock Number (NSN). Additional filters are available to specify the application environment and quality levels.
– Quanterion Automated Text Reader (NPRD-2016, FMD-2016, EPRD-2014). This interactive software tool provides access to all the information contained in the individual sections of the paper versions of the EPRD, NPRD and FMD. In each section, the tool provides basic search and navigation capabilities for locating information of interest. Individual pages can be printed for archival or reporting purposes as and when required.
– Reliability Online Automated Databook System (ROADS) – All Databooks (NPRD, EPRD, FMD). ROADS is a subscription service that provides personalized access to all or some of the following Databooks: NPRD, FMD, EPRD. ROADS Databooks are periodically updated and their configurations controlled so as to ensure that the data presented are always traceable to a specific version of the paper edition.
– 217Plus™: 2015 (Spreadsheet Calculator). 217Plus™: 2015 is the latest iteration of the popular tool for the reliability prediction of electronics and includes several new and updated templates in addition to a number of improved features.
– Warranty Calculator. The Warranty Calculator provides calculations and comparisons for ten different types of warranty currently in use. The tool also includes data, information and references which provide additional guidance for more than 30 supplementary types of warranty.
– Derating Calculator (NAVSEA SD-18 Parts Requirements and Application Manual). This Excel® spreadsheet tool determines whether the components of a design meet the derating requirements of NAVSEA SD-18.
– QuART (Reliability Toolkit). QuART is an automated version of the popular RADC reliability “toolbox”, which includes a number of tools ranging from prediction and design analysis to test planning and statistical analysis of failure data.
– Total Life Cycle Cost Benefits calculator (TLCC). This tool calculates the benefits of analyzing the root causes of failures in order to assist decision-making. It aims to justify (or not) the financial investment in a root cause analysis by comparing its cost with the long-term savings that would result from the improved reliability.

4.7. French websites dedicated to the dissemination of good practices in the field of reliability in France today

4.7.1. France’s main actors in the field of reliability in interaction with the world of defense

4.7.1.1. The IMdR (The Institute for the Control of Risks)

In the field of reliability, numerous projects have been carried out under the aegis of the IMdR. The results of these projects are sold by the IMdR; see: http://www.imdr.eu/Menu/ACTIVITES/Publications+IMdR/Consultation-commande+de+documents+en+vente/page_retour/390/p-386.html. These projects, although not carried out under the auspices of the DGA, the procurement agency of the French Ministry of Defense, were more or less inspired by the needs of the DGA.


For example:
– Project n°3/96: “Taking into account the impact of the environment in the construction of the RAMS of a product” describes the link between the profession of reliability engineering and the environmental engineering profession;
– Project n°4/99: “Recommendations for the industrial use of highly accelerated tests”;
– Project n°9/2003: “Approach to characterizing the life profile of a product”;
– Project 1/2005: “Optimization of the design and dimensioning of a product using an extension of the resistance/stress approach”;
– Project 1/2007: “Modeling of structural degradation and optimization of their inspections”;
– Project 3/2008: “Methods and tools for the detection of aging/rejuvenation of maintained equipment”.

4.7.1.2. FIDES

The DGA is one of the founders of the FIDES Guide (www.fidesreliability.org/), an overall methodology for reliability engineering in electronics, consisting of two parts:
– an assessment guide for reliability prediction;
– a guide to mastering and auditing the reliability process.
The objectives of the FIDES Guide are, on the one hand, to allow a realistic assessment of the reliability of electronic products, including systems that encounter severe or very benign environments (storage) and, on the other hand, to provide a concrete tool for the construction and control of this reliability. Its main characteristics are:
– the existence of models for electrical, electronic and electromechanical components as well as electronic boards, or PCBAs (Printed Circuit Board Assemblies), and certain sub-assemblies;
– the identification and consideration of all technological and physical factors that have an identified role in reliability;


– an accurate account of the life profile;
– the taking into account of accidental electrical, mechanical and thermal overloads (or overstress);
– the taking into account of failures related to the development, production, operation and maintenance processes;
– the possibility of distinguishing between several suppliers of the same component.
Through the identification of contributors to reliability, whether technological, physical or manufacturing-related, the FIDES Guide allows engineers to act on definitions throughout the product life cycle so as to better control and improve reliability. For the record, the other existing reliability data handbooks are:
– IEC TR 62380:2004, Reliability data handbook – Universal model for reliability prediction of electronics components, PCBs and equipment;
– the “217 Plus” handbook of reliability data (universal model for the reliability prediction of electronic components, PCBs and equipment), which is the continuation of MIL-HDBK-217F by a private company;
– MIL-HDBK-217F, notice 2, which has been officially canceled. The “217 Plus” handbook has in a sense taken over in another form, under the aegis of a commercially oriented company.
The estimated reliability of equipment is still largely based on the constant failure rates provided by the above handbooks. This is despite many opinions that have questioned the validity of predictions derived in this way. This is particularly important for plastic components, which are still assigned very pessimistic values (in the case of MIL-HDBK-217) related to their problematic behavior at extreme temperatures (below 0°C and above 85°C). Conversely, when used in the 0°C to 85°C temperature range, they can be regarded as reliable and similar to metal components. A promising alternative to this approach based on mortality rates and statistical laws has developed over the past few years. Supporters of this school consider that faults in components arise as the deterministic consequence of physico-chemical phenomena, which are accessible by modeling. This school is represented by CALCE (see above).

4.7.1.3. The ASTE (Association for the Development of Environmental Sciences and Techniques)

The ASTE in France is the sister association of the IEST in the US. It is a crossroads bringing together experts from various fields and sectors of industry and from the DGA. Among its publications linked to reliability, which are still available for purchase, are:
– the reliability of electronic components;
– the stress screening of electronic equipment (1986 edition);
– the severe stress screening of electronic equipment (2006 edition);
– the role of tests in the mastery of product reliability.
It also offers several training modules in connection with reliability and testing.

4.7.1.4. Workshop 10 of the CEN (European Committee for Standardization)

Workshop 10, organized in the framework of the CEN with the assistance of experts from the DGA and the French industries involved in the development of arms, has produced a reference document for a European defense common market. Twenty disciplines were selected and, for each one, an expert group was established. The aims of the various expert groups were:
1) to identify the standards and procedures used today by national procurement agencies;
2) to assist future defense materiel specialists in working with a shorter and more coherent set of standards and best practices to use in the selection of standards.


A second version of the manual developed by “Expert Group 8” in the field of Environmental Engineering was published in April 2011 and is available on the CEN WS 10 website (see link below). EG 17 deals with reliability and safety. The various expert group reports can be downloaded from the following website: https://edstar.eda.europa.eu/best-practice-recommendations
Below is a list of the JMC EDSTAR expert group reports. They make best practice recommendations for the corresponding specific technical areas:
– Nuclear, Biological and Chemical Defense – EG 01 – Final Report;
– Energy Equipment – EG 02 – Final Report;
– Fuels and Lubricants – EG 03 – Final Report;
– Batteries – EG 04 – Final Report;
– Packaging – EG 05 – Final Report;
– Electrical Interfaces – EG 06 – Final Report;
– Electromagnetic Environment – EG 07 – Final Report;
– Environmental Engineering – EG 08 – Final Report;
– Armored Vehicle Technology – EG 09 – Final Report;
– Ammunition – EG 10 – Final Report;
– Paints and Coatings – EG 11 – Final Report;
– Fluid Handling Systems – EG 12 – Final Report;
– Management of the Project Lifecycle – EG 13 – Final Report;
– Life Cycle Technical Documentation – EG 14 – Final Report.

4.8. Reminder of a few real-life examples in the world of defense

4.8.1. Case study 1: the KURSK accident

The KURSK (Russian nuclear submarine) accident, in which all crew members perished, was the result of the disastrous effects of contamination by fluids. The torpedo fuel contained hydrogen peroxide, which is in itself a low-sensitivity liquid (a torch flame does not ignite it), but which releases a large volume of gas upon contact with certain metals. A torpedo was accidentally turned on (empty and still in its compartment, that is to say the torpedo had not been launched) and the unexpected heat it generated caused hydrogen peroxide to leak: the gases then spread throughout the pressurized envelope and eventually caused the torpedo to explode. This small explosion started a fire in the torpedo compartment which, within a few minutes, resulted in the explosion of all the torpedoes and the loss of the submarine. Both explosions, the first barely visible on the recording and the second much larger, were recorded by a seismograph in Great Britain. When the recorded transient of the first explosion is scaled up, it is almost superimposed on that of the second, and for good reason: both reflect the transient response of the same structure, excited by two successive shocks of very different magnitudes. This observation made it possible to invalidate the hypothesis of an attack by an enemy submarine put forward by Russian officials.

4.8.2. Case study 2: missile reliability tests at Point Mugu

At the request of the US Navy, Method 523, “Vibro-Acoustic Temperature”, was added to MIL-STD-810D (March 10, 1975). This came about as a consequence of the failures encountered in the operational use of different types of missiles fired from combat aircraft during the Vietnam War. A significant number of NO-GOs had been reported (a NO-GO corresponds to a missile failing to fire after the pilot has given the firing order); a committee of experts had prepared a proposal for corrective measures recommending that reliability tests be carried out on these missiles.
A typical reliability test requires the Unit Under Test (UUT) to be exposed to a combination of influential environmental factors for a sufficiently long period of time in order to obtain information on the relevant failures. These tests were deployed on several (approximately seven) parallel programs and were applied as part of the Reliability Development Growth Test (RDGT), the Reliability Qualification Test (RQT) or the Production Reliability Acceptance Test (PRAT). Production tests (PRAT) consisted of applying vibration tests (low frequency with electro-hydraulic vibration generator) on four or even up to six missiles, all taken from the same production batch, in addition to acoustic vibration tests (controlled at 1 kHz) with temperature variations (30°C/min, where the missile is placed in a
transparent plastic envelope) at the same time, together representing the profile of the mission. The duration of these PRAT tests is one to two weeks with the objective of observing failures during the period. The validity of these test results is regularly confirmed by the results obtained from the payload during combat flights.

The DGA considered building such an installation in France in the early 1980s. The project was well advanced but was halted at the study stage, firstly because of the operating cost of such an installation (200€/hr for 5,000 hours, approximately 1M€, in addition to the cost of the missile(s) to be tested). Another argument made against this installation was that the average MTBF of the American missiles was approximately 150 hours before these tests and 1,500 hours after the corrections resulting from the application of the RDGT tests. By comparison, the French missiles showed an operational reliability estimated at 1,500 hours without having been subjected to these tests.

In the early 2000s, test activity at Point Mugu gradually decreased until it was abandoned. The message that accompanied the cessation of these tests was that all the experience gained throughout the 25 years of their application, together with the feedback from operational experience, was now being taken into account as early as possible in the design stages.

4.8.3. Case study 3: reliability tests at the CEAT (Toulouse, France)

In the 1970s, a French test center, the CEAT (Toulouse Aeronautical Test Center), built a chamber capable of simultaneously applying some of the following environmental agents: vibrations, temperature, humidity and low pressure (altitude). After performing several thousand hours’ worth of reliability testing on avionics equipment, the center wrote a “guide to operating a reliability testing laboratory”.
Finally, the experiment was stopped, as illustrated by a scenario where a piece of equipment for a combat aircraft was subjected to one year of combined reliability tests, after which a failure still occurred. The failure was appraised in order to identify the causes with the aim of finding a correction; a correction which had to be validated by the application of numerous combined tests over several months. After almost two years of application, the equipment in question underwent a modification of its definition for reasons independent of the test.

The conclusion of the CEAT was that this kind of test on avionics equipment that had a high MTBF was irrelevant. However, this type of test was more relevant for avionics equipment with a more stable definition, such as Pitot tubes.

4.8.4. Case study 4: the Falklands War

During this war, ground equipment from a ground-to-air missile system was subjected to increasing moisture vapor penetration from the outside to the inside of the equipment, reflecting the usual breathing phenomenon of equipment subjected to a diurnal temperature cycle. Condensation occurs inside the equipment when the relative humidity reaches the saturation point and the internal temperature of the equipment is equal to the dew point; the resulting water droplets flow along the inner wall, partially filling the bottom of the housing with water. After a few days, there was enough water to fill a glass, and the personnel servicing the anti-aircraft system came up with an adequate solution to the problem by drilling a hole in the casing, which allowed the water to drain.

It should be noted that the usual standards describing moisture tests do not produce condensation. The test specifier seeking to reproduce this effect must therefore define a preliminary application on the unit under test (UUT) in order to fill it with moist air, whilst applying a sufficiently rapid temperature variation test that will reproduce the phenomena of condensation and runoff. The simple application of a standard wet heat test is not capable of producing such effects.

4.8.5. Case study 5: air missile and buffeting on a combat aircraft during captive flight

The various environmental tests that are applied to equipment during the qualification phase should not precipitate a failure at a higher level of assembly but rather at a lower level of assembly; otherwise, these tests would not be applied at the end of product development.
A failure at a high level of assembly would lead to a significant overhaul of the equipment in question, involving high costs and long delays. By contrast, static mechanical and fatigue tests, which validate the structural dimensioning, are carried out at the outset of development. What happened on this medium-range air-to-air missile is that structural degradations occurred during captive flight on a combat aircraft
in specific phases of the flight, known as buffeting. This phenomenon corresponds to a fluid (air)/structure interaction leading to high amplitudes of low frequency vibrations. There are several types of buffeting: one that solely affects the aircraft and another affecting the missile mounted under the wing of the aircraft. The surrounding conditions (altitude, speed, incidence, etc.) are not the same for the different types of buffeting. The aircraft manufacturer is naturally familiar with the aircraft itself: the electrical controls are located so as to avoid the aircraft’s buffeting zone, as this would be too dangerous for the aircraft, and many tests are carried out towards the end of development. By comparison, less is known about the buffeting that affects the missile, because many missile flights would be required in order to test the various attachment positions and flight points (altitude, speed, incidence = angle of the missile to the air flow). This explains why the problem affecting the program in question was only revealed at a very late stage. The cost of design modifications and the delays induced were very significant.

4.8.6. Case study 6: incorrectly taking into account the inrush current variability of a cold start diode

A sub-assembly had just been qualified when, after a few weeks, one of its components failed during a cold start test at -40°C. It was found, after analysis, that the diode exhibited significant variability when confronted with the inrush current occurring during a cold start, which was inhibiting the electronic function of the diode.

4.8.7. Lessons learned from these six case studies

The cause of the Kursk catastrophe (case study 1) was an insufficient analysis of failures and their criticality (FMECA, or Failure Modes, Effects and Criticality Analysis) associated with the use of hydrogen peroxide, which is well known for having caused other catastrophes.
The question as to the consequences of a hydrogen peroxide leakage should have been discussed, with particular focus on whether the situation could remain under control in all cases. In the case studies of Point Mugu (case study 2) and the CEAT (case study 3), there were dedicated reliability tests (i.e. no acceleration process could be implemented, other than the elimination of non-binding values).

These very long and costly tests are no longer (with the odd exception) fashionable and have been gradually abandoned.

In the case studies of the Falklands (case study 4) and the buffeting of the air-to-air missile (case study 5), the real values of the environmental agents and their interactions had not been taken into account correctly. In the case of the Falklands, the analysis of climatic environmental conditions should have focused specifically on the risk of condensation runoff on the skin of the electronic box of the firing station. Furthermore, the laboratory simulation required more than just a simple humidity heat test at constant temperature.

In case study 6, it was the variability in the performance of an electronic component in cold conditions which had escaped the designer, who therefore did not seek the corresponding means of mitigation.

In general, when the causes of environment-related failures are analyzed, the following list of causes can be determined:
– actual environmental conditions different from the conditions taken into account under development (case studies 4 and 5);
– poor consideration of environmental variability and/or resistance to the environment (case study 6);
– inadequate transmission of information and of the experience acquired by the users in the field from one project team to another (case study 4).

4.9. Conclusion

In the defense sector, the challenges facing reliability require a preliminary maturation of the system through experimentation. The aim is to adapt the test specifications of a system in such a way as to satisfy its most realistic operational performance requirements. This is economically justifiable for systems produced in very small series, such as defense systems.
In keeping with this approach is the idea of entrusting a supervisor with the responsibility of achieving the desired operational performance for the whole field of employment, rather than for a limited area, through a number of agreed tests. Therefore, it is the prime contractor who defines the test program at all levels of the system’s assembly and who is fully committed to this test program, in order to demonstrate the good performance of the system over its entire operational field of application. The role of the client is to specify and translate, in technical terms, the areas of
application and indicate the expression of requirements. For example, when this document indicates that a fighter aircraft must withstand the landing shock, it is the client who will specify this level of aggression.

5 The Objectives of Reliability

The reliability prediction of an electronic device must be quantified as precisely as possible. This involves formulating reliability targets according to the functional and environmental aspects of the mission profile assigned to the device and its technological structure. This formulation step is important because it forces the designer to collect and quantify all the theoretical considerations needed to estimate and guarantee the level of reliability required during the life cycle. At the preliminary design stage, the generality of the problem thus posed is not soluble in view of the heterogeneity of the devices, which are themselves comprised of a mixture of degradable and non-degradable components. In this chapter, we propose a methodology that is simple to implement, based on simplifying assumptions whose validity is justified.

5.1. Introduction and objectives

For technical and economic reasons, the use of electronic devices is becoming widespread in many fields of application. Under the influence of technical progress, two essential factors shape reliability requirements: the increase in performance associated with a reduction in costs and, correspondingly, the lifetime assigned to these devices. Depending on the fields of application concerned, this implies guaranteeing the highest possible level of reliability up until the end of their life cycle.

Chapter written by Lambert PIERRAT.

The usual normative definition states that “Reliability is a characteristic of a device expressed as the probability of it performing a required function under specific conditions over a given period of time” [AFN 88]. In other words, reliability must be defined and guaranteed by considering additional criteria: technical (maintenance of functional performance) and/or economic (an increase in operating costs, obsolescence, etc.). It is therefore a general formulation that requires clarification when it is to be used in a particular situation.

Although the makeup of current mechatronic devices goes beyond the strict framework of electronics, in the sense that mechanical components are associated with it, there is no need to discriminate between the reliability objectives of these two types of applications [PIE 15]. Indeed, the mechanisms of physical degradation affecting the electronic and mechanical components are quite often similar.

This chapter is organized in the following manner. Firstly, we will reflect on the problem that underlies the development of current electronic devices. Next, we will present the fundamental notions behind the formulation of reliability objectives, focusing in particular on the most used reliability laws (exponential and Weibull). Then, we will demonstrate how these can be formulated explicitly and simply, both for the whole of a system and for some of its components deemed as “critical”. Finally, a case study will summarize the proposed methodological approach.

5.2. Genesis and problem of reliability

5.2.1. The genesis of reliability

Different aspects of reliability are taken into account during the design and development process of an electronic device. Reliability first takes on a qualitative form once the engineer has designed the device according to the specifications of the concept brief.
It is defined once the topology of the device has been defined; its components are dimensioned with functional margins that take into account the technical characteristics as guaranteed by their manufacturers (reliability allocation).

In general, this leads to the realization of a prototype which can be subjected to various tests designed to characterize its design margins and robustness (aggravated tests), improve its performance (functional tests) and verify its ability to meet the requirements of its function (qualification tests). At this stage, it is considered that the device is non-repairable, mature and a priori sufficiently reliable, although this reliability has not yet been precisely quantified. The last step is to define and quantify the so-called “predictive” reliability objectives imposed by the design brief. As will be seen, such an objective is sufficiently general and self-sufficient. It is separate from any other considerations relating to the various stages of the previous design and development processes (aggravated tests, reliability growth, qualification, RAMS, etc.).

5.2.2. Predictive problems

Quantitatively estimating the reliability of a device that is not yet functional is based on a predictive approach, which is all the more difficult to implement when the assigned life span can reach several decades and the corresponding level of reliability to be guaranteed is very high. For various reasons, information available prior to this last step is affected by uncertainties related to the device's technology and to its operational and environmental conditions (mission profile). The predictive process therefore involves propagating these uncertainties as a function of time up until the end-of-life moment, which will inevitably lead to an increase in final uncertainty. The definition of reliability objectives must therefore aggregate and reduce all initial uncertainties in order to arrive at a prediction with a quantifiable measure of risk. This problem is part of a probabilistic context, the inevitable extension of a deterministic context that is clearly too simplistic and incapable of achieving the predicted objectives.
In practice, this involves taking into account and using reliability models of a system and its components, based on statistical laws representative of the processes of failure or degradation.

To this effect, we briefly recall below the concepts and notions of reliability that will be used.

5.3. Concepts and notions of reliability

In order to move away from the general perspective, it is necessary to make additions to the above considerations. The definition and quantification of reliability targets are based on concepts summarized below.

5.3.1. Qualitative approach to the life cycle

The so-called bathtub-shaped tubular curve is commonly employed to illustrate the life cycle of an electronic device (Figure 5.1):

Figure 5.1. The tubular “Bathtub” curve

This graph represents the qualitative evolution of a unit’s rate of instantaneous failure, noted indifferently $\lambda(t)$ or $h(t)$, as a function of time or of any other relevant variable: for example the distance traveled by a vehicle, the number of switch maneuvers, the charge-discharge of a battery, revolutions of a bearing, etc. The corresponding temporal trajectory is made up of three successive periods:
– the relatively short period of youth in which the rate of failure decreases as a result of the elimination of the most fragile components remaining in a population (stress screening);

– the longest useful life period in which the failure rate is virtually constant as a result of mixing a large number of mature components within the system in question;
– the so-called aging period, during which the failure rate grows more or less rapidly, as a result of the degradation that affects certain components subject to operating constraints.

Until relatively recently, the reliability of electronic devices was generally considered to be limited to the useful life at a constant rate. That is, the aging period was disregarded, on the assumption that it would not be reached before the end of the estimated life cycle. Before analyzing the limits of this constant failure rate hypothesis, let us recall that there are two essential advantages linked to its simplicity:
– on the one hand, a constant failure rate has a direct counterpart in the form of an average time: the Mean Time To Failure (MTTF), which relates to non-repairable components (this concept will be further developed later);
– on the other hand, if a device has a fairly large number of components with different failure rates, its failure rate tends to a constant.

Historically, this approach has been based on the creation of data banks compiling failure rates obtained from field feedback on many electronic components [USD 91, FID 10]. At the stage of a preliminary design, their use allows designers to carry out reliability studies oriented towards the estimation of an MTTF, a simple but imperfect indicator whose simplicity makes it easy to initiate an iterative process.

As for the aging period, this was originally more concerned with the reliability of mechanical components, which are subject to wear and tear, a process that can start prematurely. In this case, it is necessary to substitute the constant rate with an increasing rate, depending on the kinetics of degradation taking place.
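The three periods described above can be visualized numerically. The following is only an illustrative sketch: it assumes that the bathtub shape can be approximated by adding a decreasing power-law rate, a constant rate and an increasing power-law rate (the power-law form is the Weibull failure rate given in [5.3]). The function names and all numerical parameters are hypothetical and chosen purely to produce a readable curve.

```python
def power_law_rate(t, beta, eta):
    """Weibull-type failure rate (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_rate(t, useful_rate=1.0e-5):
    """Illustrative bathtub curve: youth + useful life + aging contributions.

    All parameter values are assumptions made for this sketch only.
    """
    youth = power_law_rate(t, beta=0.5, eta=1.0e6)  # decreasing rate (beta < 1)
    aging = power_law_rate(t, beta=4.0, eta=1.0e5)  # increasing rate (beta > 1)
    return youth + useful_rate + aging


# Evaluate the rate early in life, in mid-life and late in life (hours).
for hours in (10.0, 2.0e4, 2.0e5):
    print(f"t = {hours:>9.0f} h  ->  h(t) = {bathtub_rate(hours):.2e} per hour")
```

With these assumed values the rate is high at the start, close to the constant term in mid-life, and rises again once the aging term dominates.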
Today, this notion of wear and tear extends to electronic devices (a fortiori mechatronics) in which the reliability of certain components is clearly insufficient, and which are therefore considered as “critical”
(electrolytic capacitors, batteries, power semiconductors subjected to pulse regimes, etc.) (see Chapter 6).

5.3.2. Notions of reliability

In the following paragraphs, we shall limit ourselves to a few of the simplest definitions and mathematical indicators that make it possible to grasp the rest of the methodological approach. Reliability is usually based on statistico-probabilistic notions, that is, the occurrence of a failure can be defined from a probability law. We will consider the Weibull law [WEI 51], most commonly used to represent a temporally increasing failure rate, of which one form naturally degenerates into an exponential law with a constant failure rate. Note that this is far from the only existing model, but as we will see, it at least has the merit of leading to physical interpretations that are both understandable and usable by engineers.

To fix ideas, first consider a population with a very high number $N$ of identical components that cannot be repaired and are subjected to physical constraints. Over time, we observe a number of faulty components $n(t)$, which leads us to define the empirical survival function $S(t)$ (reliability function) as the fraction of survivors, namely:

$$S(t) = 1 - \left[\frac{n(t)}{N}\right] \tag{5.1}$$
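The empirical survival function [5.1] can be computed directly from a list of observed failure times. The sketch below is only an illustration: the function name `empirical_survival` and the sample data are hypothetical, and a failure is counted in $n(t)$ as soon as its time is less than or equal to $t$.

```python
from bisect import bisect_right


def empirical_survival(failure_times, population_size, t):
    """Return S(t) = 1 - n(t) / N, where n(t) counts the failures observed up to time t."""
    ordered = sorted(failure_times)
    failed = bisect_right(ordered, t)  # n(t): number of failure times <= t
    return 1.0 - failed / population_size


# Hypothetical data: N = 10 components, 4 of which have failed so far (hours).
observed = [120.0, 340.0, 560.0, 910.0]
for t in (0.0, 340.0, 1000.0):
    print(t, empirical_survival(observed, 10, t))
```

Components that have not yet failed simply do not appear in the list of failure times; they contribute to $N$ only.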

Hypothetically, all components are sound at the moment of origin ($S(0) = 1$) and gradually fail as a function of time, until population exhaustion ($S(\infty) \to 0$). If we assimilate the discrete process of failure moments (a common statistical approach called a “frequentist” inference) to a continuous temporal flow, this empirical survival function tends towards a probability of survival, which in the case of a Weibull law with two parameters, is written as:

$$S_w(t) = \mathrm{Prob}(\tau > t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \tag{5.2}$$

For a device that cannot be repaired, as in the case of a discrete approach, the survival function decreases monotonically from the initial moment to the end-of-life moment. The Weibull law, devoid of any offset (location parameter), has two parameters whose interpretation is as follows:
– a scale parameter ($\eta$), sometimes referred to as a “characteristic lifetime”: this is the time at which the probability of survival, which was equal to unity at the origin time ($t = 0$), is equal to $S(\eta) = 1/e \approx 0.368$;
– a dimensionless shape parameter ($\beta$), which statistically characterizes the relative dispersion of the law and can be assimilated physically with the kinetics of a degradation if $\beta > 1$.

In direct relation to the problem of reliability prediction, the survival function makes it possible to define the rate of instantaneous failure, as:

$$h_w(t) = -\frac{\partial}{\partial t}\left[\ln S_w(t)\right] = \left(\frac{\beta}{\eta}\right)\cdot\left(\frac{t}{\eta}\right)^{\beta-1} \tag{5.3}$$

This is interpreted as the probability, per unit of time, that a component that has worked correctly up to the moment $t$ will suffer a failure before the time $t + \Delta T$, as $\Delta T \to 0$. Statistically, this is not a probability in the strict sense of the term, that is to say, dimensionless and belonging to the interval $(0, 1)$ (bounded by the impossible event and the certain event), but a conditional probability of failure per unit of time, having the inverse dimension of time, that is: $1/t$.
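Equations [5.2] and [5.3] translate directly into code. The following minimal sketch, in which the function names and the numerical values are assumptions made for illustration, evaluates the Weibull survival function and failure rate and shows the two properties mentioned above: the survival probability at the characteristic lifetime and the behavior of the rate for $\beta > 1$.

```python
import math


def weibull_survival(t, eta, beta):
    """Survival function [5.2]: exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, eta, beta):
    """Instantaneous failure rate [5.3]: (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


eta, beta = 1000.0, 2.5  # hypothetical characteristic lifetime (hours) and shape

# At t = eta the survival probability equals 1/e whatever the value of beta.
print(weibull_survival(eta, eta, beta))

# For beta > 1 the failure rate grows with time (aging period).
for t in (100.0, 500.0, 1000.0):
    print(t, weibull_hazard(t, eta, beta))
```

Setting `beta = 1.0` in the same functions gives a constant rate equal to `1 / eta`.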

It can be seen that if $\beta > 1$ the failure rate is temporally increasing, which can be interpreted as a physical process of progressive degradation, corresponding to the aging period of the “bathtub” curve. Another statistical indicator often used is MTTF (Mean Time to Failure) defined as an integral average:

$$MTTF = \int_{0}^{\infty} S(t)\cdot dt \tag{5.4}$$

The explicit solution of this definite integral, which involves the complete gamma function $\Gamma(x)$ (or generalized factorial function) [JAH 60], is as follows:

$$MTTF_w = \eta\cdot\Gamma\left[1 + \frac{1}{\beta}\right] = \eta\cdot\left(\frac{1}{\beta}\right)! \tag{5.5}$$
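As a sanity check, the closed form [5.5] can be compared with a direct numerical evaluation of the integral [5.4]. The sketch below assumes a simple trapezoidal rule truncated at a multiple of $\eta$; the function names and parameter values are hypothetical. It also evaluates the probability of failure reached at $t = MTTF_w$, which depends on $\beta$.

```python
import math


def weibull_mttf(eta, beta):
    """Closed form [5.5]: eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def mttf_by_integration(eta, beta, steps=100_000, upper_factor=20.0):
    """Trapezoidal estimate of [5.4], truncating the integral at upper_factor * eta."""
    upper = upper_factor * eta
    dt = upper / steps
    total = 0.5 * (1.0 + math.exp(-((upper / eta) ** beta)))  # end points, S(0) = 1
    for i in range(1, steps):
        total += math.exp(-((i * dt / eta) ** beta))
    return total * dt


def failure_probability_at_mttf(beta):
    """Probability of failure 1 - S(MTTF) for a Weibull law (independent of eta)."""
    return 1.0 - math.exp(-(math.gamma(1.0 + 1.0 / beta) ** beta))


print(weibull_mttf(1000.0, 2.0), mttf_by_integration(1000.0, 2.0))
print(failure_probability_at_mttf(1.0), failure_probability_at_mttf(2.0))
```

The truncation is adequate here because the survival function is negligible at twenty characteristic lifetimes for $\beta \geq 1$; for small values of $\beta$ the tail is heavier and a larger upper limit would be needed.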

This is an indicator to which a probability of failure is attached, depending on the parameters $(\eta, \beta)$ of the Weibull law being considered. It should be noted that these indicators (survival function, failure rate and MTTF) are not independent and that the knowledge of any one of these leads to that of the others, via adequate transformations.

For $\beta = 1$, the Weibull law defined by two parameters naturally degenerates into an exponential law with only one scale parameter, namely:

$$S_e(\eta, t) = S_w(\eta, 1) = \exp\left[-\left(\frac{t}{\eta}\right)\right] \tag{5.6}$$

Therefore, the preceding indicators derived from the Weibull law take very simple forms and are explicitly connected as reciprocals of one another:

$$h_e(t) = -\frac{\partial}{\partial t}\left[\ln S_e(t)\right] = \frac{1}{\eta} \tag{5.7}$$

$$MTTF_e = \eta \equiv \frac{1}{h_e(t)} \tag{5.8}$$
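As a quick check, and assuming only that the general definition [5.4] is applied to the exponential survival function [5.6], the mean time to failure can be worked out in one line:

$$MTTF_e = \int_{0}^{\infty} \exp\left[-\left(\frac{t}{\eta}\right)\right] dt = \Big[-\eta\cdot\exp\left(-t/\eta\right)\Big]_{0}^{\infty} = \eta = \frac{1}{h_e}$$

This agrees with the Weibull result [5.5] taken at $\beta = 1$, since $\Gamma(2) = 1$.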

As mentioned above, the extreme popularity of the exponential law is historical: it is based on the hypothesis of the constant rate, which naturally corresponds to electronic devices in terms of the long useful life portion of the “bathtub” curve. Note that if the time flux of failures is governed by a homogeneous Poisson process with a rate of ( he ) , the distribution of the time interval between two successive events is an exponential law with the scale parameter (η ) .
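The link between a homogeneous Poisson process and the exponential law can be illustrated by simulation. The sketch below relies on a standard property: conditional on their number, the events of a homogeneous Poisson process over a window are uniformly distributed in that window. For simplicity the number of events is fixed at its expected value, which is an approximation; the function name, the seed and the numerical values are hypothetical.

```python
import math
import random


def poisson_gaps(rate, horizon, seed=0):
    """Intervals between successive events scattered uniformly over [0, horizon].

    The event count is set to its expected value rate * horizon (approximation).
    """
    rng = random.Random(seed)
    n_events = round(rate * horizon)
    times = sorted(rng.uniform(0.0, horizon) for _ in range(n_events))
    return [later - earlier for earlier, later in zip(times, times[1:])]


rate = 1.0e-3     # assumed constant failure rate h_e (per hour)
eta = 1.0 / rate  # scale parameter of the exponential law
gaps = poisson_gaps(rate, horizon=5.0e7)

mean_gap = sum(gaps) / len(gaps)
share_longer_than_eta = sum(gap > eta for gap in gaps) / len(gaps)
print(mean_gap, eta)                          # the mean interval is close to eta
print(share_longer_than_eta, math.exp(-1.0))  # compare with S_e(eta) = 1/e
```

The fraction of intervals longer than $\eta$ should be close to $1/e$, as expected from [5.6].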

While the Weibull law corresponds to a nonhomogeneous process whose rate depends on time, the link between the exponential law and the homogeneous Poisson flow is often interpreted in terms of random failures, given the absence of memory of the law and its corresponding process [THO 88]. As will be seen, the Weibull (increasing failure rate) and exponential (constant failure rate) laws can be applied to components and systems, respectively. Applied to degradable components, ($MTTF_e$) is an indicator which, because of its integral nature, is not necessarily very informative since its power for discrimination is rather low. We will be able to verify this later through an example based on a coherent comparison between exponential and Weibull laws that are defined by the same characteristic lifetime. On the other hand, for a system, it is a simple indicator that can be used as a first approach.

5.4. Components and system

Today’s electronic devices incorporate increasingly numerous and diverse functionalities. A logical approach to reliability prediction is not to consider the system from the outset, but first and foremost in terms of its constituent components. This is contrary to the approach used in the field of RAMS, which is essentially at the systems level and considers that the failure rates of the components are known a priori.

In terms of structures, the topology of the technological device makes it possible to generate a topology of a different nature (reliability network or fault tree). This approach, which makes it possible to precisely estimate the reliability of a system, requires the use of simulation software [BAR 75]. In fact, it will be seen that it is possible, as a first approximation, to approach the question in a much more direct way, considering that all the elementary components intervene within a series topological structure comparable to the “weak link” notion.
In the case of constant elementary failure rates, this approach makes it possible to calculate an upper bound for the overall failure rate through the addition of elementary rates. A more general asymptotic property has been formulated by [DRE 60].
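The series (“weak link”) approximation with constant rates amounts to a simple sum. The following sketch uses hypothetical component names and failure rates; it assumes independent components, so that the system survives only if every component survives.

```python
import math


def series_failure_rate(rates):
    """Failure rate of a series structure: the sum of the elementary constant rates."""
    return sum(rates)


def series_survival(rates, t):
    """Survival of a series structure: product of the elementary exponential survivals."""
    survival = 1.0
    for rate in rates:
        survival *= math.exp(-rate * t)
    return survival


# Hypothetical elementary failure rates (per hour), for illustration only.
component_rates = {"resistor": 2.0e-9, "capacitor": 5.0e-9, "diode": 1.0e-8, "ic": 5.0e-8}

total_rate = series_failure_rate(component_rates.values())
print(total_rate)                                      # overall (upper bound) rate
print(1.0 / total_rate)                                # corresponding system MTTF, hours
print(series_survival(component_rates.values(), 1.0e5))  # survival after 100,000 hours
```

Any redundancy in the real topology can only improve on this figure, which is why the sum acts as an upper bound on the overall failure rate.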

The network topology equivalent to the real network is based on two successive levels of complexity:
– that of the elementary components: considered in the usual sense of the term in electronics, that is to say, technologically indivisible, with failure rates that are inherently low and generally well established on the basis of sufficient feedback: passive components (resistors, capacitors, etc.) and active components (diodes, transistors, integrated circuits, memories, etc.);
– that of the system: integrating various elementary components within a topological structure whose objective is to ensure the function assigned to the device.

These two levels are the subject of specific approaches, each corresponding to their very different natures. Among the elementary components, those subject to degradation are called “critical” and their generic reliability model obeys a Weibull law (or, if necessary, any other bi- or even tri-parametric law). As for the system, its modeling follows from the combinatorial aspects induced by its topological organization and, if its size is sufficient (inclusion of a large number of “non-critical” elementary components), its generic reliability model tends to an exponential law with a constant rate [DRE 60]. These distinctions may lead to different reliability indicators for each of these two entities.

5.5. Objectives of reliability

Reliability is predictive in the sense that the probability of failure of a component results from the exploitation of increasingly complex models, built on physico-mathematical bases: statistical (independent of time and based on sufficient numbers), probabilistic (the number of samples available can be reduced to a single unit), or even stochastic models (temporal evolutionary processes). Under these conditions, we will always be confronted with more or less dispersed estimators, depending on the structural representativeness of the model and its parametric uncertainties.

In order to be able to express the reliability guarantee of a component or a system in an exploitable form, it is necessary to define and quantify a set of more general contractual specifications in advance.

5.5.1. Functional characteristics

These characteristics, which are specific to the device under consideration, will make it possible to quantify the reliability objectives, taking into account its structure and the performances assigned to it. These essentially are:
– the lifetime assigned to the device (or possibly a shorter one, that of a “critical” component that could be repaired one or more times), defined by the fulfilment of an adequate criterion: technical failure, economic profitability, obsolescence, etc.;
– the mission profile of the device, as defined by a set of physical, environmental (generally climatic) and functional (performance) constraints relating to the whole device and/or its “critical” components;
– the reliability model of the device, resulting from a combination of the reliability models of its components, characterized by a constant failure rate (whole system), and/or by an increasing failure rate (kinetics of degradation of “critical” components).

5.5.2. Objectives of guaranteed reliability

Reliability objectives can be formulated independently of the reliability model envisaged at the system level, whether simple (elementary components and “critical” equivalents at a constant rate) or composite (association of elementary components at a constant rate and “critical” degradables at increasing rates). Considering the Weibull and exponential models mentioned above, it is necessary to specify:
– a failure criterion: the occurrence of a sudden technological failure, purely random (in the case of a constant failure rate), or the crossing of a level of degradation or parametric drift (increasing failure rate), any faults deemed to be functionally inadmissible;


Reliability of High-Power Mechatronic Systems 1

– at a minimum, an MTTF, possibly accompanied by a confidence interval or a one-sided bound (of minimal value), given that regardless of the reliability model being considered, the various indicators (MTTF, failure rate, probability of failure) are univocally related quantities (the knowledge of any one of them leads to that of the other two);
– ideally, the probability of failure at the end of the lifetime (or possibly at the end of the replacement period for a “critical” component); note that for sufficiently reliable devices whose reliability must be guaranteed for a long service life, it is preferable to consider the probability of failure, equal to the complement of the survival function: for example, if $S = 0.999$, it is preferable to consider its complement to unity, that is, a probability of failure $P = 1 - S = 10^{-3}$;
– optionally, a one-sided bound of the confidence interval associated with this probability; this bound is a measure of the risk of exceeding the value that has been decided, given that for a long-lifetime device, the probability of failure is affected by a variability which increases with the extent of the prediction interval.

5.6. Adequacy of specifications

5.6.1. Current limitations

In practice, it is frequently observed that reliability targets for devices known to be reliable (electronic or mechatronic) are not always formulated with the necessary rigor. Thus, life-cycle duration and final failure probability are not always specified, and inconsistencies sometimes appear between several time-domain formulations, such as MTTF and lifetime, in the absence of an indication of the probability of failure that should be associated with them. Unfortunately, Mean Time Between Failures (MTBF) is still frequently misused in place of Mean Time To Failure (MTTF).
The MTTF is the one that is of interest in the case of non-repairable systems, whereas the MTBF implies extending the notion of reliability to that of availability, taking into account the repair times. In the case of electronic devices, a modular design makes it possible to neglect the periodic replacement time of “critical” components that have a significantly lower expected lifetime than that assigned to the device as a whole. Under these conditions, the reliability of the system can be considered independently of its “critical” components, which avoids the need to undertake an availability study whose framework exceeds that of reliability. It should be noted that the concept of availability, more particularly used in the case of industrial processes which are subjected over the course of their life cycle to numerous successive maintenance operations, is irrelevant in the case of a highly reliable and long-lasting electronic device. In the field of application that concerns us, it is therefore preferable to avoid the term MTBF altogether. At the preliminary design stage, $MTTF_e$ is probably the most intuitive indicator, its simple formulation corresponding to the exponential law with a constant failure rate. In the case of degradable components, $MTTF_w$ corresponds to the Weibull law with an increasing failure rate, but its formulation is less simple because it requires a priori knowledge of the two characteristic parameters of this law. However, as will be seen below, the substitution of $MTTF_w$ with $MTTF_e$ calls for certain precautions, the relevance of the latter being conditioned by the portion of lifetime being considered.

5.6.2. Relevance of the MTTF

The MTTF (Mean Time To Failure) characterizes the average of the survival function of a non-repairable component. We show, on the basis of a quantitative example, that $MTTF_e$ is an irrelevant integral measure, given its weak discriminatory power with respect to $MTTF_w$. To do this, let us compare two components, both defined by their survival functions, failure rate and MTTF. One is a degradable component obeying a Weibull law (thus with a time-increasing failure rate), the other a component obeying an exponential law (thus a constant failure rate).


In order to simplify the comparison, the influence of the scale parameters is eliminated by setting their characteristic lifetimes to the same unit value, that is to say, $\eta_w = \eta_e \equiv 1$. In this example, the shape parameter chosen for the Weibull law, that is $\beta = 2$, corresponds to a linearly increasing failure rate.

By way of comparison, the table below combines the values of the selected parameters as well as those of the MTTF and the corresponding survival probabilities.

| Law         | η | β | h(t) | MTTF         | S(η = 1)      | S(η/2 = 0.5)     | S(2η = 2)     |
|-------------|---|---|------|--------------|---------------|------------------|---------------|
| Weibull     | 1 | 2 | 2t   | √π/2 ≈ 0.886 | e^−1 ≈ 0.368  | e^−0.25 ≈ 0.779  | e^−4 ≈ 0.018  |
| Exponential | 1 | 1 | 1    | 1.000        | e^−1 ≈ 0.368  | e^−0.5 ≈ 0.607   | e^−2 ≈ 0.135  |
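The figures in this comparison can be checked numerically. The following is a minimal sketch, assuming the standard two-parameter Weibull forms $S(t) = \exp(-(t/\eta)^{\beta})$ and $MTTF = \eta \, \Gamma(1 + 1/\beta)$, with the exponential law treated as the special case $\beta = 1$; the function names are illustrative and not taken from the text.

```python
import math

def weibull_survival(t, eta, beta):
    """Survival function S(t) = exp(-(t/eta)**beta); beta = 1 gives the exponential law."""
    return math.exp(-((t / eta) ** beta))

def weibull_mttf(eta, beta):
    """Mean time to failure of a Weibull law: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

# Parameters of the example: unit characteristic lifetime for both laws.
laws = {"Weibull": (1.0, 2.0), "exponential": (1.0, 1.0)}

for name, (eta, beta) in laws.items():
    mttf = weibull_mttf(eta, beta)
    s_half, s_eta, s_double = (
        weibull_survival(t, eta, beta) for t in (0.5 * eta, eta, 2.0 * eta)
    )
    print(f"{name:12s} MTTF={mttf:.3f}  S(eta/2)={s_half:.3f}  "
          f"S(eta)={s_eta:.3f}  S(2*eta)={s_double:.3f}")
```

The two MTTF values stay close to each other while the survival probabilities at half and at double the characteristic lifetime diverge, which is the point made by the comparison.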

The probabilities of survival and the MTTF for these two reliability laws can be compared as follows:
– with a timeframe equal to the characteristic lifetime, their probabilities of survival are identical (about 37%);
– with a timeframe equal to half of the characteristic lifetime, the probability of survival of the non-degradable component (approximately 61%) is well below (–22%) that of the degradable component (approximately 78%);
– on the other hand, with a timeframe equal to double the characteristic lifetime, the probability of survival of the non-degradable component (about 14%) is significantly higher than that of the degradable component (about 2%);
– their MTTFs are far from revealing these disparities, since in relative values they do not differ as clearly (1.00 and 0.89), their ratio being close to unity (approximately 0.94 ± 6%).
In conclusion, although the MTTFs remain comparable in the vicinity of the characteristic lifetime, as soon as one moves away from it, the probabilities of survival differ noticeably, all the more so as the observation horizon increases. The probabilities of failure are the complements to unity of the probabilities of survival; their ratio is even more accentuated (of the order of 7.5 at double the characteristic lifespan!).

5.7. Methodological approach

With the aim of summarizing the above considerations more precisely, we shall use an example to demonstrate simply how to separate the reliability prediction study into two parts: one relative to the system (based on an exponential law with a constant failure rate); the other relating to a “critical” component (based on a Weibull law with an increasing failure rate). In order to ensure that a common reliability objective is satisfied, both entities are subject to the same probability of failure, at the end of the expected lifetime (for the case of the system) and at the end of the replacement period (for the case of the “critical” component). A policy of preventive periodic replacement with a new component intervenes when the probability of failure reaches a critical threshold. This determines the replacement period that meets this requirement. In general, this replacement periodicity constitutes important information in terms of the life cycle management of the device. It makes it possible to determine the optimum size of the replacement stock associated with the device on a technico-economic basis: a compromise between the cost of the stock and the probability of a stock-out. In addition, the procedure for estimating the failure rate of an essentially deterministic device can be improved by introducing a statistical variability which gives it the status of a random variable [PIE 16].

5.7.1. Formulation of the example problem

Let us consider an electronic device in the form of a board carrying a rather large number of very reliable small components, as well as an electrolytic filtering capacitor, treated as a “critical” component. The latter, which is far less reliable, will have to be replaced periodically with a new component.
The calendar lifetime assigned to the device is equal to 10 years.


The operating rate of the device is 86%, which corresponds to an effective lifetime of $T = 75336$ h (10 years × 8760 h × 0.86). We want to estimate the probability of failure at the end of life for all the components mounted on the board (except the “critical” component). For the sake of consistency, this “critical” component is required to have the same probability of failure after a replacement period that remains to be determined. These two types of components will therefore be considered separately, with an acceptable separation hypothesis making it possible to simplify the procedure. As an alternative, an improved estimate of the failure rate, obtained by associating it with a measure of statistical variability, will be proposed.

5.7.2. All of the on-board components

The $MTTF_e$ of the system is the result of the combination of the failure rates of its constituent components. If these are characterized by constant failure rates, treating the real topology of the system as a simple serial topology allows one to estimate its failure rate by using the additivity of elementary component failure rates. Thus, we obtain an upper bound of the system failure rate, whose deviation from the exact (unknown) value can only be calculated if one takes into account the topology of the real system. However, it will be seen how to improve this estimate by considering the variability of elementary failure rates. For all the small passive components (resistors, capacitors) and active components (transistors, integrated circuits, memories, etc.), there are databases derived from empirical feedback, allowing one to estimate their supposedly constant failure rates [USD 91, FID 10]. The simple additivity hypothesis of constant elementary rates results in a global rate which is consistent with the asymptotic tendency which results from mixing a sufficient number of different Weibull (and/or exponential) laws [DRE 60]. In the context of this example, the addition of the elementary failure rates results in a constant rate of $h \approx 115 \cdot 10^{-9}$ /h. The probability of failure in terms of the exponential law at the end of life is written as:

$$P(h, T) = 1 - \exp(-h \cdot T) \approx h \cdot T \qquad [5.9]$$
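As a numerical illustration of [5.9], the sketch below evaluates the exact probability and its linear approximation with the values of the example (10 calendar years, 86% operating rate, $h \approx 115 \cdot 10^{-9}$ /h). The 8760-hour year and the function name are assumptions made for this illustration.

```python
import math

HOURS_PER_YEAR = 8760      # assumption: 365-day year
CALENDAR_YEARS = 10        # calendar lifetime of the device
OPERATING_RATE = 0.86      # fraction of time the device is operating
H = 115e-9                 # summed constant failure rate, per hour

def failure_probability(h, t):
    """Exact failure probability of the exponential law, 1 - exp(-h*t)."""
    # expm1 keeps precision when h*t is very small
    return -math.expm1(-h * t)

T = CALENDAR_YEARS * HOURS_PER_YEAR * OPERATING_RATE   # effective lifetime, hours
exact = failure_probability(H, T)
approx = H * T                                         # first-order approximation
rel_error = (approx - exact) / exact

print(f"T = {T:.0f} h")
print(f"exact  P = {exact:.6f}")
print(f"approx P = {approx:.6f}")
print(f"relative error of the approximation = {rel_error:.2%}")
```

With an argument h·T of just under 1%, the linear approximation overestimates the exact value by less than half a percent in relative terms, which is negligible compared with the uncertainty of the failure-rate data.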

The approximation is justified on the basis of a limited expansion of the exponential law, given the low value of the argument ( h ⋅ T ≈ 1% [...]

[...] 1.2V for EOT = 0.9 nm (Equivalent Oxide Thickness) [ELI 13]. For oxide thicknesses of less than 5 nm, HBD is often preceded by SBD (Soft-BD). An SBD is observed as a partial loss of dielectric properties, resulting in a slight increase in the gate current and a significant increase in the gate current noise. Finally, in ultra-thin oxides (about 2.5 nm), the SBD is followed by a “Progressive-SBD” (PBD), a gradual loss of the dielectric

Simulation of Degradation Phenomena in Semiconductor Components


properties up to the final HBD. The PBD is detected as a slow increase in gate current over time. It has been shown that the degradation process before the BD [MAR 07] and the location of the BD spot [FER 07] vary widely from one transistor to another of the same size, and this has a strong influence on the effect of the degradation on the transistors, hence the difficulty of modeling the TDDB [KAC 02]. The implementation of this degradation mechanism in reliability simulators makes it possible to warn the designer of the risks of oxide breakdown in the circuit during the design phase according to the stress being applied. The simulation of the TDDB is not quantitative but probabilistic.

8.2.2. Degradation in bipolar transistors

Several reliability studies have been carried out in order to discover the degradation modes of these components, in particular the physical origin of the degradation mechanisms. The impact of the degradation of bipolar transistors on the performance of Radio Frequency (RF) and millimeter-wave (mmW) circuits remains little known. In this section we recall the physics of the best-known degradation phenomena in bipolar transistors, which are mainly:
– the mechanism of Mixed Mode Degradation (MMD);
– the degradation mechanism of Reverse Base-Emitter Bias (RVBE).

8.2.2.1. The principle of operation for a bipolar transistor

A bipolar transistor consists of two PN junctions positioned head-to-tail, sharing a common region that is the base. The juxtaposition of these two junctions leads to an NPN or PNP junction transistor in which the two types of carriers intervene. A representation of a bipolar transistor is shown in Figure 8.19 below. The bipolar transistor is a device that is traversed by a vertical current “carried” by bulk charges and controlled by its base voltage. The main current is controlled by the Base-Emitter junction. This depends on the electron gradient in the base.
In an ideal transistor, the current should not vary when the bias of the Base-Collector junction varies. In order to guarantee this effect, it is necessary to ensure a base doping greater than that of the collector.

Figure 8.19. Simplified diagram of a conventional bipolar transistor [MOV 18]

The bipolar transistor can operate in:
– direct (normal) operation: where the base-emitter junction is forward-biased (Vbe > 0) and the base-collector junction is reverse-biased (Vbc < 0);
– inverse operation: where the base-emitter junction is reverse-biased (Vbe < 0) and the base-collector junction is forward-biased (Vbc > 0);
– saturated operation: where both the Base-Emitter junction and the Base-Collector junction are forward-biased (Vbe > 0 and Vbc > 0);
– blocked operation: where both the Base-Emitter junction and the Base-Collector junction are reverse-biased (Vbe < 0 and Vbc < 0).
In order to achieve a better performance in very high frequency applications, the bipolar transistors are subjected to bias conditions such that the collector-emitter voltage is close to or higher than the open-base breakdown voltage BVceo.

8.2.2.2. MMD degradation

8.2.2.2.1. The phenomenon of MMD degradation

Degradation in mixed mode [ZHA 02] occurs when the device is simultaneously subjected to a high collector-base voltage Vcb and a high current IC. This bias mode is similar to the direct mode of operation of the transistor, except that the collector-emitter voltage Vce is biased beyond BVceo, which represents the breakdown voltage of the emitter-collector junction


in the open base. At high voltage Vcb, electrons from the emitter through the base receive sufficient energy to create an avalanche in the base-collector junction.

Figure 8.20. Avalanche and hole creation due to mixed mode degradation in SiGe HBT transistors

8.2.2.2.2. The MMD degradation model in the simulator

We present here the model for MMD degradation of bipolar transistors implemented in the reliability simulation tool. This degradation model [8.8] represents the lifetime as a function of the collector-base stress voltage Vcb_stress, the emitter current density JE, and temperature.


$$\tau_{lifetime} = A \cdot e^{\mu V_{be\_read}} \cdot e^{\alpha / V_{cb\_stress}} \cdot |J_E|^{-\beta} \cdot \left( \frac{P_E}{A_E} \right)^{-\gamma} \cdot e^{-E_a / (KT)} \qquad [8.8]$$

with:
– τlifetime: the lifetime of the component according to a degradation criterion;
– Vbe_read: the base-emitter voltage during the characterization phase;
– Vcb_stress: the collector-base voltage during stress;
– JE: the current density of the emitter;
– PE/AE: the effective perimeter of the emitter/effective area of the emitter;
– A, µ, α, β and γ: process-dependent constants which are also determined from the accelerated aging tests.

8.2.2.2.3. The effect of MMD degradation on the main electrical parameters of the NPN

The MMD degradation of a bipolar transistor generates an increase in the base current due to an increase in interface traps in the base-emitter spacer oxide and in the oxide region of the Shallow-Trench Isolation (STI) (Figure 8.20). This degradation is easily observed on the “Gummel” curve plotted before and after the application of a 10-year stress period. The model of the MMD degradation of the base current Ib is presented by [8.9] below:

Δlb = Fstat ( AE + PE ) ⋅ t

n stress

⋅e

−α Vcb _ stress

b

⎛ (VK beT _ stress)/q ⎞ ⎛ mV(KbeT_ read)/q ⎞ m(K .−ΔTEa )/q 2 stress 2 read 2 read ⋅⎜e − 1⎟ ⋅ ⎜ e − 1⎟ ⋅ e ⎜ ⎟ ⎜ ⎟ ⎝ ⎠ ⎝ ⎠ with: – t: the duration of the stress, – α, b and n: constants that depend on the process.

[8.9]


Figure 8.21. The collector current Ic and the base current Ib before and after an MMD stress (W = 0.3 μm, L = 1 μm, number of emitters = 2, Vcb_stress = 1.2V, stress temperature = 25°C). Curves shown: Ic before stress, Ic after stress, Ib before stress, Ib after stress

In general, the collector current IC does not degrade because of the MMD mechanism, whereas Ib increases and consequently the current gain β decreases. The relative drift of β is thus deduced by the following equation:

$$\frac{\Delta \beta}{\beta} = \frac{\Delta I_b}{I_b + \Delta I_b} \qquad [8.10]$$
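The relation [8.10] follows from β = Ic/Ib when Ic is unchanged by the stress. The sketch below checks this numerically; the current values are hypothetical and chosen only for illustration, they are not simulator results from this chapter.

```python
def relative_gain_drift(i_b, delta_i_b):
    """Relative drift of the current gain, delta_beta/beta = dIb / (Ib + dIb).

    Assumes the collector current Ic is not affected by the stress.
    """
    return delta_i_b / (i_b + delta_i_b)

# Hypothetical values, for illustration only.
i_c = 1.0e-6         # collector current (A), unchanged by the stress
i_b = 10.0e-9        # base current before stress (A)
delta_i_b = 20.0e-9  # excess base current after stress (A)

beta_before = i_c / i_b
beta_after = i_c / (i_b + delta_i_b)

# Direct computation from the gains, to compare with the closed form.
drift_from_gains = (beta_before - beta_after) / beta_before
drift_from_formula = relative_gain_drift(i_b, delta_i_b)

print(f"beta before = {beta_before:.1f}, beta after = {beta_after:.1f}")
print(f"relative drift (from gains)   = {drift_from_gains:.4f}")
print(f"relative drift (from formula) = {drift_from_formula:.4f}")
```

Both computations give the same value, which confirms that the gain drift depends only on the ratio of the excess base current to the initial base current.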

– Effect of the collector-base stress voltage Vcb_stress: Figure 8.22 shows the simulation results for the increase in the current Ib and the degradation of the current gain after a stress duration of 10 years as a function of the collector-base stress voltage Vcb_stress. As can be seen from the curves, the simulation results obtained using the reliability simulator are in good agreement with the degradation models. It is clear that the degradation of the transistor’s performance increases with the stress voltage Vcb_stress and that this degradation becomes very significant for values between 1V and 1.5V. For a stress of 1.5V the current gain is totally degraded. In order to prevent the degradation of this transistor’s performance from affecting the reliability of the circuits, it is essential to operate it at Vcb values which do not exceed 1V.


Figure 8.22. Degradation of the current Ib and the gain β of the transistor after a MMD stress as a function of the voltage Vcb_stress (W = 0.3μm, L = 1μm, number of emitters = 2, Vbe_stress = 0.7V, stress temperature = 40°C )

– The effect of temperature: In this section, we study the effect of temperature stress on the degradation of Ib and β.

Figure 8.23. Degradation of the current Ib and the gain β of the transistor after a MMD stress as a function of the stress temperature (W = 0.3µm, L = 1µm, number of emitters = 2, Vcb_stress = 1.2V, Vbe_stress = 0.7V)

As in almost all degradation mechanisms, temperature stress has a very significant influence on the drift of the transistor’s parameters. We note that at 25°C, the MMD produces a gain degradation of 67%, but by increasing the temperature to 100°C, the degradation reaches 100%.
– Influence of the emitter geometry: Figure 8.24 is an illustration of the drift of Ib and of β after aging as a function of the emitter surface. We conclude, in a first step, that the gain degradation does not depend on the transistor geometry. Moreover, the increase in ∆Ib after aging as a function of the emitter surface is due to the increase in the base current before aging as a function of the emitter’s perimeter. This increase is related to extensions of depletion zones.

Figure 8.24. Degradation of the current Ib and the gain β of the transistor after a stress type MMD as a function of the emitter surface (Vcb_stress = 1.2V, Vbe_stress = 0.7V, stress temperature = 40°C)

8.2.2.3. RVBE degradation

8.2.2.3.1. The phenomenon of RVBE degradation

As its name indicates, RVBE degradation occurs when the device is subjected to a high reverse base-emitter voltage. The effect is attributed to hot carriers [PET 15, REV 01] in the Base-Emitter depletion region, which create interface traps at the junction edges in the spacer zone, as shown in Figure 8.25. These traps, in turn, lead to an additional base current. The generation-recombination (G-R) current in the base of the transistor increases, leading to an increase in the base current Ib at low Vbe values and a decrease in the current gain β.


In general, the collector current Ic does not degrade by the RVBE mechanism, whereas Ib increases and consequently the current gain β decreases. Due to the specific dependence of the voltage Vbe and the base current Ib, this degradation of the current gain β is mainly considered at low base-emitter voltages Vbe. Recognized as a fundamental problem of reliability, this mode of degradation has been extensively studied and modeled over the past 20 years.

Figure 8.25. Mechanism of RVBE degradation in bipolar transistors: hot carrier injection and creation of interface traps

8.2.2.3.2 The RVBE degradation model in the simulator We present the RVBE degradation model in bipolar transistors implemented in the reliability simulation tool. This degradation model [8.11] represents the lifetime as a function of the base-emitter voltage stress Veb_stress.

$$\tau_{lifetime} = A \cdot e^{\mu V_{be\_read}} \cdot e^{\alpha / V_{eb\_stress}} \cdot \left( \frac{P_E}{A_E} \right)^{-\gamma} \qquad [8.11]$$

with:
– τlifetime: the lifetime according to a well-defined degradation criterion;
– Veb_stress: the base-emitter voltage stress;
– PE/AE: the ratio of the effective perimeter to the effective area of the emitter;
– A, µ, α and γ: process-dependent constants which are also determined from the accelerated aging tests.

8.2.2.3.3. The effect of RVBE degradation on the main electrical parameters of NPN

The degradation of transistor performance by the RVBE mechanism is easily observed on the “Gummel” curve. Figure 8.26 represents the plot of the curve before and after a 10-year period of RVBE type stress obtained using the reliability simulator. The RVBE degradation has a direct effect only on the base current Ib at low Vbe. There are different models of RVBE degradation; [8.12] presents the one that is taken into account by the simulator.

Δlb = Fstat .PE .t

n stress

.e

−α Vcb _ stress

Ea ⎛ − K2Tstress / q . ⎜ (1− c) + c. e ⎜ ⎝

read 2 ⎞ ⎛ m .(VKbe2T_read ⎞ m .( K2 −. ΔVTread )/ q )/ q ⎟⋅⎜e − 1⎟ . e ⎟ ⎟ ⎜ ⎠ ⎠ ⎝

[8.12]

with: – t: the duration of stress; – A, c and n: constants that depend on the process.

Figure 8.26. The currents Ic and Ib before and after an RVBE stress (W = 0.3μm, L = 1μm, number of emitters = 2, Vbe_stress = -1.7V, stress temperature = 25°C). Curves shown: Ic before stress, Ic after stress, Ib before stress, Ib after stress


The degradation of Ib follows the model proposed by Burnett and Hu [BUR 17]. It describes the evolution of the excess base current as a power law of stress time. As mentioned above, the excess base current is a consequence of the generation-recombination current which occurs at the Si-SiO2 interface. We have previously stated that the reverse voltage Vbe does not degrade the collector current Ic. The degradation of the current gain is thus a consequence of the increase in the base current Ib. The relative drift of β is thus calculated from this equation:

$$\frac{\Delta \beta}{\beta} = \frac{\Delta I_b}{I_b + \Delta I_b} \qquad [8.13]$$

– The effect of reverse base-emitter stress voltage Vbe_stress: Figure 8.27 shows the simulation results of an increase in the current Ib and the degradation of the gain in current β after a stress duration of 10 years as a function of the reverse base-emitter stress voltage Vbe_stress.

Figure 8.27. Degradation of the current Ib and the gain β of the transistor after a RVBE stress as a function of the reverse base-emitter voltage Veb_stress (W = 0.3μm, L = 1μm, number of emitters = 2, stress temperature = 40°C)


The curves show that the simulation results obtained using the reliability simulator are consistent with the degradation models. We clearly see that the degradation of Ib and β increases as a function of the reverse voltage Vbe and that the degradation becomes significant for values between -1.5V and -2.3V. To prevent the degradation of this transistor’s performance from affecting the reliability of the circuits, it is necessary to avoid using it in this voltage range under reverse bias.
– The effect of temperature: We note from Figure 8.28 that the degradation of β is less significant at high temperatures than at low temperatures, unlike other degradation mechanisms.

Figure 8.28. Degradation of the current Ib and the gain β of the transistor after a RVBE type stress as a function of the stress temperature (W = 0.3μm, L = 1μm, number of emitters = 2, Vbe_stress = -2V)

– Influence of the emitter geometry: Figure 8.29 is an illustration of the drift of Ib and of β after aging as a function of the emitter surface. Here again, we conclude that the geometry of the bipolar transistors has no influence on the degradation of the gain β.


Figure 8.29. Degradation of the current Ib and the gain β of the transistor after an RVBE type stress as a function of the emitter surface of the transistor (Vbe_stress = -2V, stress temperature = 40°C)

8.2.3. Conclusion In this section, we conducted a general study on the various degradation mechanisms of MOS and bipolar transistors. The results presented are obtained using a reliability simulator that integrates the different models of degradation mechanisms. These results highlight the degradation of component performances such as drain current, threshold voltage of MOS transistors as well as the base current and current gain of bipolar transistors. As mentioned in the introduction to this chapter, the study on the degradation of the electrical parameters of components according to the geometry and nature of the stress is very important in defining the limits of a technology’s use. This investigation makes it possible to design circuits which are both efficient and reliable. In this chapter, we have only presented the degradation of the static parameters of components. However, in the case of RF and millimetric applications, it is essential to pay particular attention to the degradation of the transistors’ dynamic parameters, such as the evolution of the transition frequency Ft and the S parameters. These parameters play an essential role in the performance of these circuits.


8.3. Study on the degradation of a ring oscillator

8.3.1. Introduction

In this section, we evaluate the degradation of a ring oscillator to illustrate reliability at the level of the circuit. This study relates to the degradation of the oscillation frequency under the impact of the degradation of the MOS transistors. Firstly, we shall present the circuit under study. Next, we shall define the stresses to be applied to the active components in order to attain the desired levels of degradation. And finally, we shall present the reliability results for the circuit being tested.

8.3.2. Presentation of the oscillator

In many digital systems the ring oscillator is an essential element in evaluating the technology in question. The ring oscillator studied in this chapter is shown in Figure 8.30. The circuit is designed with CMOS technology. The oscillation frequency is 80MHz with a supply voltage of 1.8V. This is then divided by 128 using seven serial D flip-flops.

Figure 8.30. Structure of the ring oscillator

The oscillator is composed of 40 “OR” gates in series looped on a “NAND” gate. The circuit is activated and deactivated by means of a control input “enb (B)”. In this example, we are interested in the study of the degradation of the circuit oscillation frequency Fosc. The latter is inversely proportional to the propagation time Tp, defined by the propagation delay in the chain of “OR” gates:

$$F_{osc} = \frac{1}{40 \cdot T_p} \qquad [8.14]$$

We define ∆Fosc% as the percentage degradation of the circuit oscillation frequency:

$$\Delta F_{osc\%} = \frac{F_{osc}(before\_stress) - F_{osc}(after\_stress)}{F_{osc}(before\_stress)} \cdot 100 \qquad [8.15]$$
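The two relations [8.14] and [8.15] can be combined in a short numerical sketch. It uses the 80 MHz oscillation frequency, the 40-gate chain and the divide-by-128 stage described for this circuit; the 2% increase in propagation time is a hypothetical value chosen for illustration, not a result from this study.

```python
N_GATES = 40            # number of "OR" gates in the chain
F_OSC_NOMINAL = 80e6    # nominal oscillation frequency (Hz)
N_FLIP_FLOPS = 7        # serial D flip-flops, each dividing by 2

def oscillation_frequency(t_p, n_gates=N_GATES):
    """Oscillation frequency as written in [8.14]: Fosc = 1 / (n_gates * Tp)."""
    return 1.0 / (n_gates * t_p)

def frequency_degradation_percent(f_before, f_after):
    """Percentage degradation of the oscillation frequency, as in [8.15]."""
    return (f_before - f_after) / f_before * 100.0

# Propagation time implied by [8.14] for the nominal frequency.
t_p_before = 1.0 / (N_GATES * F_OSC_NOMINAL)

# Hypothetical aging effect: the propagation time increases by 2%.
t_p_after = t_p_before * 1.02

f_before = oscillation_frequency(t_p_before)
f_after = oscillation_frequency(t_p_after)
divider = 2 ** N_FLIP_FLOPS                     # 128

# The same relative drift is observed at the divider output.
deg_core = frequency_degradation_percent(f_before, f_after)
deg_divided = frequency_degradation_percent(f_before / divider, f_after / divider)

print(f"Tp before stress     = {t_p_before * 1e12:.1f} ps")
print(f"divider output       = {f_before / divider / 1e3:.1f} kHz")
print(f"degradation (core)   = {deg_core:.3f} %")
print(f"degradation (output) = {deg_divided:.3f} %")
```

Because the divider scales both frequencies by the same factor, the percentage degradation measured at the divider output equals that of the oscillator itself.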

To do this, the circuit is subjected to two different types of stress:
– for the first stress, the oscillator is in standby mode (Venb = 0);
– for the second stress, the oscillator is in ringing mode (Venb = 1).
Since the circuit is designed with CMOS technology, its performance is altered by the HCI and BTI degradation mechanisms.

8.3.3. Study on the aging of the circuit according to these modes of operation

8.3.3.1. First mode: standby mode

8.3.3.1.1. The stress and extraction conditions of the oscillation frequency Fosc

Initially, reliability simulations for different stress times (168, 336, 504… 840 hours) were performed on the circuit in standby mode. These stress conditions are presented in Table 8.1.

|                                | Venb (V) | T (°C) | VDD (V) |
|--------------------------------|----------|--------|---------|
| Conditions of stress           | 0        | 150    | 1.8     |
| Conditions of characterization | 1.8      | 150    | 1.8     |

Table 8.1. Stress conditions and measurement conditions of the frequency Fosc


To validate the simulation results, we also conducted experimental tests on the circuit. The oscillator underwent a High-Temperature Operating Life (HTOL) stress for 840 hours under the same conditions as the simulations. After every 168 hours of stress, the circuit is activated to measure the oscillation frequency Fosc. This measurement is carried out at the same temperature as that used during the stress (150°C) in order to avoid recovery phenomena.

8.3.3.1.2. Presentation and discussion of the results

Figure 8.31 shows the results for the degradation of the Fosc frequency at the output of the divider.

Figure 8.31. Degradation of the oscillation frequency Fosc after stress in standby mode

As we can see from the curves (Figure 8.31), the measurement results are consistent with the simulations carried out using the simulator. After 168 hours of stress, the frequency of the oscillator decreased by about 1.75%. For longer stress times, the frequency deteriorated gradually, reaching a decrease of 2.25% from its initial value after 840 hours of stress. Furthermore, the drift of the frequency corresponds to an increase in the propagation time in the chain of the “OR” gates.


The reliability simulator is able to identify most of the devices susceptible to degradation which have led to a decrease in the oscillation frequency of the circuit. In Figures 8.32 (a) and (b), the transistors reported by the simulator are shown to be major contributors to the degradation of the performance under the aforementioned stress conditions.

Figure 8.32. The degraded transistors of the “NAND” gate (a) and the “OR” gate (b) during stress in standby mode

The increase in the threshold voltage of the PMOS (MP3) transistors in the “OR” gates under NBTI stress is the main cause of the decrease in frequency. The increase in the threshold voltage of the MP1 and MP2 transistors of the “NAND” gate also contributes, to a lesser extent, to the degradation of the frequency. Moreover, the NMOS transistors are not degraded since there is no current flowing in the circuit in standby mode. Degradation of NMOS by hot carriers occurs only if there is electron circulation within the transistors.

8.3.3.2. Second mode: ringing mode

8.3.3.2.1. The stress and extraction conditions of the oscillation frequency Fosc

In this section, the circuit is placed under stress in activated mode. The oscillator is thus subjected to stress at a high temperature in order to accelerate the degradation. The stress conditions as well as the characterization conditions are presented in Table 8.2.


For this mode of stress, only the simulation results are presented in this chapter, as we have not performed accelerated aging tests for this mode.

                                  Venb (V)    T (°C)    VDD (V)
Conditions of stress              0           150       1.8
Conditions of characterization    1.8         150       1.8

Table 8.2. Stress conditions and measurement conditions for the frequency Fosc

8.3.3.2.2. Presentation and discussion of the results

Figure 8.33 shows the degradation of the frequency Fosc at the output of the divider after stress in activated mode.

Figure 8.33. Degradation of oscillation frequency Fosc after stress in the activated mode

The oscillation frequency Fosc decreases by about 2.3% after 168 hours of stress. For longer stress times, the frequency degrades gradually, reaching a decrease of 3.1% from its initial value after 840 hours of stress. The degradation after stress in activated mode is thus slightly higher than in standby mode. The degradation of the circuit performance is due to the increase in the threshold voltage of the PMOS transistors, linked to the NBTI degradation mechanism, and to the increase in the threshold voltage of the NMOS transistors, caused by hot carriers during the dynamic switching periods.

To identify the components sensitive to degradation, we consider two bias situations for the circuit transistors, which depend on the looped signal at the input of the “NAND” control gate:
– the input loop signal changes from 1 to 0 and is held at 0 until the next feedback;
– the input loop signal changes from 0 to 1 and is held at 1 until the next feedback.

Figure 8.34. The degraded transistors of the “NAND” gate (a) and the “OR” gate (b) under stress in activated mode

Figure 8.35. The degraded transistors of the “NAND” gate (a) and the “OR” gate (b) under stress in activated mode


In summary, the degradation of the circuit oscillation frequency is due, on the one hand, to the degradation of the NMOS transistors during the dynamic switching periods and, on the other hand, to the degradation of the PMOS transistors by the NBTI mechanism, which appears as soon as a negative gate-source voltage Vgs is present (Figure 8.36). It should nevertheless be noted that the simulated circuit degradation under activated-mode stress would be more accurate if the simulator took into account the degradation of the PMOS transistors by hot carriers (HCI). This effect occurs during the dynamic switching periods and has a significant effect on the reliability of the oscillator. For a reliability simulator to be as accurate as possible, it must include models of all the degradation mechanisms that occur within the components, and these degradation models must themselves be accurate.

Figure 8.36. Different degradation mechanisms that cause a degradation of the ring oscillator during stress in activated mode
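As a rough way to compare the two stress modes, the frequency drifts quoted in this section can be reduced to a single time exponent. The sketch below assumes that the relative drift follows a power law, drift = A·t^n, a form often used for NBTI-type degradation but not stated in this chapter. The function name and the two-point fit are illustrative only.

```python
import math

def power_law_exponent(t1, drift1, t2, drift2):
    """Exponent n of drift = A * t**n passing through two (time, drift) points.

    Assumption: the drift follows a power law in stress time; this form is
    not given in the chapter and is used here only for illustration.
    """
    return math.log(drift2 / drift1) / math.log(t2 / t1)

# Relative frequency drifts (in %) quoted in the text at 168 h and 840 h.
standby = power_law_exponent(168.0, 1.75, 840.0, 2.25)
activated = power_law_exponent(168.0, 2.3, 840.0, 3.1)

print(f"standby mode:   n = {standby:.3f}")
print(f"activated mode: n = {activated:.3f}")
```

With only two points per mode, the exponent is an indication rather than a fit. A larger exponent in activated mode would be consistent with the additional hot-carrier contribution described above.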

To conclude, after analyzing the degradation of the oscillation frequency, the designer must modify the circuit in order to reduce the frequency drift over time while still meeting the performance requirements of the design specification.

8.4. Conclusion

In this chapter, we have highlighted the physical phenomena that explain the impact of transistor aging on the performance of integrated circuits, and the value of integrating the reliability study of circuits into the design phase using dedicated tools. Progress in microelectronic integration weakens the robustness of circuits and consequently aggravates the effects of unreliability. This has forced designers to allow greater empirical margins during the design phase in order to ensure that the circuit operates properly over time. One solution for optimizing circuit design while guaranteeing reliability is to take the degradation mechanisms into account at the design stage. Doing so at this point in the development flow allows an optimization that cannot be obtained with the current methodology. This methodology can be transposed to any system built from unit elements whose degradation mechanisms over time are known.

9 Estimation of Fatigue Damage of a Control Board Subjected to Random Vibration

Chapter written by Mayssam JANNOUN, Younes AOUES, Emmanuel PAGNACCO, Abdelkhalak EL HAMI and Philippe POUGNET.

9.1. Introduction

On-board electronic systems are often exposed to different types of loading due to their operating environment. These loads are mainly thermal and vibratory. In this chapter, the study is limited to vibration loads. Within the framework of the FIRST-MFP project, the effect of random vibrations is measured on an industrial application, the Valeo S97 demonstrator. The subject of this study is the behavior of the control board for the DC–DC converter inverter of a hybrid vehicle. The objective of this chapter is to estimate the fatigue damage at the soldered joints. The random stresses and the properties of the materials present a wide range of uncertainties. Experimental tests as well as numerical simulations make it possible to study the uncertainties that influence the behavior of the structure.

9.2. Description of the methodology

The proposed methodology consists of developing numerical models and carrying out experimental tests. The objective is to compare the numerical results with the experimental ones, in order to validate the numerical model and to establish a more efficient and economical numerical approach. This methodology will make it possible to avoid, in the future, very expensive experimental tests that are sometimes not feasible on microstructures.

A finite element model of the mechanical system is developed in order to perform a spectral analysis of random vibrations. Numerical and experimental modal analyses are carried out in order to recalibrate the mechanical properties of the materials, to calculate the modal basis and to validate the numerical model. After the spectral analysis of the random vibrations, the numerical results are compared with the experimental results of the Highly Accelerated Life Test (HALT). HALT is an accelerated life test that subjects products to all-axis random vibration levels in order to find the weak points of the design and fabrication processes.

The comparison between numerical and experimental results is carried out in two steps. The first step is to apply the loads generated by the Qualmark Typhoon 3 table, used in the HALT tests, in order to verify the behavior of the finite element model in random vibration spectral analysis. The second step is to apply the loads as an equivalent stationary Gaussian vibration profile representing the vibration profiles measured in the first step. This equivalent profile makes it possible to carry out a vibration fatigue damage study using spectral methods.

9.3. Finite element modeling

Many complex dynamic problems cannot be solved by analytical methods. The Finite Element Method (FEM) is the numerical method applied in such cases. This method, also called Finite Element Analysis (FEA), yields an approximate solution by discretizing the complex system into a number of finite elements. For random excitations, stochastic processes are necessary to solve the dynamic system.
The stochastic module of finite element resolution is implemented in several advanced software packages, which can handle the random nature of the input data in order to ensure a correct solution. The behavior of a mechanical system is described by its differential equation of motion, expressed in terms of the time-dependent dynamic variables:

$$M\ddot{X}(t) + C\dot{X}(t) + KX(t) = W(t)$$ [9.1]

where X and W are, respectively, the random displacement process and the external forces applied to the system, and M, C and K are the mass, damping and stiffness matrices.
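To make the role of M and K in [9.1] concrete, the sketch below solves the undamped free-vibration problem (W = 0, C = 0) for a hypothetical two-mass chain. The system, its parameter values and the function name are assumptions chosen for illustration; they are unrelated to the FE model of the control board, which has many more degrees of freedom and is solved by dedicated software.

```python
import math

def two_dof_natural_frequencies(m1, m2, k1, k2):
    """Undamped natural angular frequencies (rad/s) of a two-mass chain.

    Hypothetical system: ground --k1-- m1 --k2-- m2, i.e.
    M = diag(m1, m2) and K = [[k1 + k2, -k2], [-k2, k2]].
    Setting W = 0 and C = 0 in the equation of motion and looking for
    harmonic solutions gives det(K - w^2 M) = 0, a quadratic in lam = w^2:
        m1*m2*lam^2 - (m1*k2 + m2*(k1 + k2))*lam + k1*k2 = 0
    """
    a = m1 * m2
    b = -(m1 * k2 + m2 * (k1 + k2))
    c = k1 * k2
    disc = math.sqrt(b * b - 4.0 * a * c)
    lam_low = (-b - disc) / (2.0 * a)
    lam_high = (-b + disc) / (2.0 * a)
    return math.sqrt(lam_low), math.sqrt(lam_high)

# Illustrative values only (unit masses and stiffnesses).
w1, w2 = two_dof_natural_frequencies(1.0, 1.0, 1.0, 1.0)
print(f"w1 = {w1:.4f} rad/s, w2 = {w2:.4f} rad/s")
```

The closed-form quadratic is used only because the example has two degrees of freedom; for larger models a generalized eigenvalue solver plays the same role.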


In a spectral analysis of random vibrations, the mechanical system is assumed to behave linearly. Consequently, the related differential equation of motion uses a constant stiffness matrix. The linearization of the global equation of motion gives a set of linear equations of motion corresponding to the finite elements.

9.3.1. Geometry, boundary conditions and mechanical properties of materials

The mechanical structure of the system being studied consists of an electronic control board (PCB), its electronic components, the bracket, which is an aluminum stiffening plate, and the assembly screws. This system is mounted in an aluminum housing and attached at the bracket plate with six mounting screws. The complexity of the system is mainly due to its geometry and the diversity of its materials, which lead to a complex equation of motion. Advanced structural analysis software applies the finite element method to this problem by simplifying the system and representing it in terms of masses and stiffnesses.

The PCB is a laminate composed of several layers of copper and FR4 composite material made from polyimide and glass fibers. In order to simplify the modeling of the PCB, the Equivalent Single Layer (ESL) method is used: the complex heterogeneous laminate is represented by a single statically equivalent lamina through homogenization, assuming that the displacement field across the thickness of the laminate is continuous [GUG 04].

Figure 9.1. Geometry of the embedded system (labeled parts: PCB, bracket, PCB-to-bracket assembly screws, bracket-to-casing assembly screws, casing, DC/DC converter inverter)


Figure 9.2. a) Bracket; b) PCB equipped with components; c) assembly system

The control PCB is equipped with electronic components and attached to the bracket by 13 mounting screws. In turn, the bracket is attached to the converter inverter housing with six assembly screws (see Figure 9.1). Two types of boundary conditions are applied in this study. The structure is first studied under free-free boundary conditions for the modal analysis. For the random vibration analysis, it is then fastened to the housing at each of the six assembly screw holes, with translation and rotation blocked in all three directions. The loads applied to the system are random and located at the attachment points (the six mounting screws holding the casing in place). The electronic components that are considered heavy are modeled as lumped masses at their positions on the PCB (see Figure 9.3). Table 9.1 gives the mechanical properties of the materials used in the mechanical system before calibration of the FE model. These values were collected at Valeo, in Cergy, from industrial data and internal reports based on standards and standard specifications.

Parts             Material               Young’s modulus E (MPa)   Density (kg/m3)   µ (Poisson’s ratio)
Bracket           Aluminum               75000                     2650              0.3
PCB equivalent    Copper compound+FR4    21000                     2300              0.35
Assembly screws   Steel                  200000                    8000              0.3
Solder            Sn-3.5Ag               49000                     7500              0.4

Table 9.1. Characteristics of the materials before calibration of the FE model

Figure 9.3. Heavy components and their masses (components labeled in the figure: Lem transformer, electrolytic aluminium capacitor (EAC), L.p.H.C. IHPL inductor, SMD power inductor, pulse transformer, microcontroller, other components; mass values listed in the figure, in g: 25.5, 26.22, 5.6, 1.9, 4.4, 4.5, 3.1, 42.38)

9.3.2. Modal analysis

Modal analysis identifies the natural frequencies of the model as well as its mode shapes. To carry out a modal analysis, the equation of motion of the mechanical system is solved with no external forces applied.

9.3.2.1. Experimental modal analysis with free-free boundary conditions

The experimental modal analysis was carried out to verify the mode shapes obtained from the FE model in the numerical modal analysis. The objective of this step is to validate the numerical results by comparing the experimental natural frequencies with the numerical eigenfrequencies. These tests were carried out at CEVAA using a PSV-400 3D laser vibrometer and a piezoelectric ceramic (PICMA type PL055.30¹). The objective of these tests is to calibrate the numerical model by adjusting the mechanical properties of the materials. In this analysis, free-free boundary conditions were applied in order to observe the simplest mechanical behavior of the model. The test part is first covered with a layer of talc, which improves the optical measurement by laser. The laser beams generate a mesh on the measured surface and capture the velocity of the mesh nodes in response to the artificial excitation produced by the piezoelectric ceramic.

1 See: http://www.pi-usa.us/products/PDF_Data/PI_PICMA_Piezo_Chips_and_Piezo_Stacks_Cofired_Multilayer_Technology.pdf.


Figure 9.4. View of the vibrometer on the configuration of the board equipped with components

9.3.2.1.1. Results of the modal analysis

The experimental modal analysis tests were carried out in four steps in order to successively recalibrate the mechanical properties of the materials. The first step identifies the mechanical properties of the equivalent section of the PCB by testing the bare board without any electronic components. The second step tests the bracket by itself. The third step measures the control board equipped with all its electronic components in order to adjust the mechanical properties of the soldered joints. The fourth step tests the entire PCB, equipped with its electronic components and assembled to the bracket, in order to identify the mechanical properties of the assembly screws. FE models were developed in order to perform a numerical modal analysis for each of these steps. The mechanical properties of the materials after calibration of the FE models are given in Table 9.2.


Parts             Material               Young’s modulus E (MPa)   Density (kg/m3)   µ (Poisson’s ratio)
Bracket           Aluminum               75000                     2500              0.3
PCB equivalent    Copper compound+FR4    25000                     2300              0.35
Assembly screw    Steel                  210000                    8000              0.3
Soldered joints   Sn-3.5Ag               49000                     7500              0.4

Table 9.2. Characteristics of the materials after calibration of the numerical model
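To give a feel for what the calibration changes, the sketch below compares the values of Tables 9.1 and 9.2 for the PCB equivalent material and the bracket. It relies on an assumption that is not made in the chapter: for a fixed geometry and a single homogeneous material, natural frequencies scale with the square root of E/ρ. The real assembly mixes several materials, so the result is only an order-of-magnitude indication, and the function name is illustrative.

```python
import math

def frequency_scale(e_before, rho_before, e_after, rho_after):
    """Ratio f_after / f_before assuming f is proportional to sqrt(E / rho)."""
    return math.sqrt((e_after / rho_after) / (e_before / rho_before))

# PCB equivalent material: E in MPa, density in kg/m3 (Tables 9.1 and 9.2).
pcb_scale = frequency_scale(21000.0, 2300.0, 25000.0, 2300.0)
# Bracket (aluminum): only the density changes in the calibration.
bracket_scale = frequency_scale(75000.0, 2650.0, 75000.0, 2500.0)

print(f"PCB frequencies scale by     {pcb_scale:.3f}")
print(f"bracket frequencies scale by {bracket_scale:.3f}")
```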

Figure 9.5. Example of the results for a modal analysis with free-free boundary conditions of a) the PCB by itself and b) the PCB equipped with components and assembled to the stiffening plate. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

9.3.2.1.2. Validation of the results for the modal analysis

The FE model of the PCB control board equipped with electronic components and assembled to the bracket is now calibrated. An Anderson-Darling (AD) statistical test is carried out to check whether the sample formed by the differences between the numerical and experimental natural frequencies follows a normal distribution. The test computes the statistic AD of equation [9.2] and compares it with a critical value Acrit, which depends on the risk level α as given in Table 9.3. If AD is greater than Acrit, the assumption of normality is rejected. AD is calculated [RAK 11] as follows:

$$AD = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln(F_i) + \ln(1 - F_{n+1-i})\right]$$ [9.2]

where n is the size of the sample and F_i is the theoretical cumulative frequency of the standard (centered and reduced) normal distribution for the i-th value of the ordered sample.

α        0.10     0.05     0.01
Acrit    0.631    0.752    1.035

Table 9.3. Critical value Acrit according to the level of risk α
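The test of equation [9.2] can be sketched in a few lines. The sample below is hypothetical and does not reproduce the frequency differences measured in the study. The sketch also assumes that the sample is standardized with its own mean and standard deviation before F_i is evaluated, which the text does not specify. Function names are illustrative.

```python
import math
import statistics

def normal_cdf(z):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(sample):
    """AD statistic of equation [9.2] for a normality test.

    Assumption: the sample is standardized with its own mean and
    standard deviation before F_i is evaluated.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)
    f = [normal_cdf((x - mean) / std) for x in sorted(sample)]
    total = 0.0
    for i in range(1, n + 1):
        # f[i - 1] is F_i and f[n - i] is F_(n+1-i) (1-based indices).
        total += (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
    return -n - total / n

# Hypothetical frequency differences (not the measured values of the study).
deltas = [-1.64, -1.04, -0.67, -0.39, -0.13, 0.13, 0.39, 0.67, 1.04, 1.64]
A_CRIT = {0.10: 0.631, 0.05: 0.752, 0.01: 1.035}  # Table 9.3

ad = anderson_darling(deltas)
for alpha, a_crit in A_CRIT.items():
    verdict = "rejected" if ad > a_crit else "not rejected"
    print(f"alpha = {alpha:.2f}: AD = {ad:.3f}, normality {verdict}")
```

A sample containing a strong outlier gives a much larger statistic than this near-symmetric one, which is how the test separates the two cases.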

The normality test shows that the points of the tested sample lie close to the Henry line of the normal probability plot (see Figure 9.6), which supports the normality hypothesis and validates the FE model.

Figure 9.6. Normality test applied to differences between experimental and numerical natural frequencies noted Delta in graph

9.3.2.2. Modal analysis with boundary conditions applied to the HALT test bench

Numerical and experimental modal analyses are also carried out with different boundary conditions. In this section and the following one, the mechanical system, consisting of the PCB equipped with its electronic components and assembled to the casing, is mounted in the HALT test chamber (Qualmark). The Qualmark chamber generates the random vibrations, which are described in detail below. These tests were carried out by CEVAA in the MB electronics laboratories. The same scanning laser vibrometer measurement procedure and the same equipment as in the previous section were used, except that the artificial excitation was produced by an ENDEVCO 2301 impact hammer.

The PCB control board is assembled to the bracket by thirteen mounting screws. The bracket, in turn, is mounted to the housing with six further mounting screws. The housing, with all parts, is attached to the HALT test chamber. These conditions are applied as boundary conditions of the calibrated FE model. Comparisons between the numerical and experimental natural frequencies (see Figure 9.7) show differences of 1.72% for the first mode, 1.32% for the second mode, 0.96% for the third mode and 0.14% for the fourth mode. These very small differences confirm that the FE model is highly representative of the real behavior of the mechanical system.

Figure 9.7. Example of the results of the modal analysis on a PCB equipped with all components attached to the HALT test chamber. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip


9.4. Spectral analysis of random vibrations

After the FE model has been validated and the modal analysis performed, the spectral analysis of the random vibrations uses the calculated modal basis to solve the equation of motion. The modal superposition method transforms a multi-degree-of-freedom (MDOF) system into several single-degree-of-freedom (SDOF) systems, which makes the resolution of a complex random dynamic system simpler [JIA 14]. The objective of this section is to compare the acceleration power spectral densities (PSDs) calculated in the numerical analysis of random vibrations with the PSDs obtained by transforming the experimental acceleration time histories measured in the HALT tests.

9.4.1. Highly accelerated life tests (HALT)

Highly accelerated life tests (HALTs) are performed on many types of structures. In particular, they are used for the design of embedded electronic systems. In these tests, the parts are subjected to aggressive load conditions such as temperature, vibration and humidity. These conditions represent the severe multi-physical operating environment that affects the reliability of the structure.

In this study, the “Qualmark Chamber Typhoon 3” table was used. The vibratory movement of this table is controlled by a linear accelerometer, called the table control accelerometer, installed in the center of the underside of the vibrating plate. The table is equipped with ten jackhammers, hydraulic pistons that generate shocks in different directions on the vibrating plate. The vibrating plate rests on four springs that allow it to move while the jackhammers operate. This movement transfers the random signals generated by the successive shocks of the jackhammers to the test piece mounted on the vibrating plate [DOE 14].

The HALT tests were performed in combination with measurements from a 3D laser vibrometer. As this was the first time that such a combination had been used, the tests also served to assess the procedure and the efficiency of laser vibrometer scanning while the HALT test chamber is operating. These tests were carried out by CEVAA in the MB electronics laboratories. The HALT test chamber is operated with the back door open so that the 3D laser vibrometer can scan the parts being tested. Several configurations were tested, of which two are of interest here. The first configuration is the housing on its own, and the second is the control board assembled to the bracket and fixed to the housing. In the first configuration, the time signals generated by the table are measured at the six fixation points of the bracket to the casing. In the same configuration, the same points are also measured by PCB 356A02 triaxial accelerometers in order to compare the two types of results. In the second configuration, the time history signals are measured at several points of the mesh produced by the laser on the control board.

Figure 9.8. The Qualmark TYPHOON 3 Table and the distribution of the ten jackhammers below the vibrating plate

Figure 9.9. a) Qualmark chamber; b) 3D laser vibrometer measures configuration 1 operating via the open rear door of the HALT test bench

9.4.1.1. Measurements by 3D laser vibrometer and accelerometer measurements on the HALT test bench

Tests on time history signal measurements were carried out by CEVAA in the MB electronics laboratories. These tests consisted of measuring points on the Qualmark TYPHOON 3 table with two types of sensors, the PCB 356A02 triaxial accelerometers and the Polytec PSV-400 3D laser vibrometer, while the HALT test chamber was operating. The two types of measurement are almost equivalent when the vibration levels are limited. When the vibration levels are higher, however, non-negligible peaks appear in the laser vibrometer measurements (see Figure 9.10). It is therefore recommended not to use the laser vibrometer for measurements on the HALT test chamber while it is operating. In this case, and especially when the vibration levels are high, accelerometer measurements are better suited.

Figure 9.10. Example of measurement signals with the triaxial accelerometer and the 3D laser vibrometer. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

9.4.2. Numerical simulations

In the previous sections, an FE model calibrated in terms of the mechanical properties of the materials was developed, and a modal analysis was carried out in order to obtain the modal basis. In this section, we apply a spectral analysis of the random vibrations. The signals measured in the first configuration of the HALT tests are taken as the random loads applied to the FE model. The objective is to calculate the spectral response, as an acceleration PSD, at the level of the control board and to compare the numerical and experimental results.

9.4.2.1. Measured signals and implementation in the FE model

The measured signals are the acceleration time histories at the mesh points scanned by the laser vibrometer. For a spectral analysis of random vibrations, the acceleration time histories are transformed into acceleration power spectral densities using the discrete Fourier transform (DFT). In order to account for the correlation between the random PSD loads at the different measured points and to reproduce the random loads correctly, a numerical code has been developed which calculates the cross-spectral density between the different spectral densities generated.

9.4.2.1.1. Power spectral density

The power spectral density describes how the energy of a time series is distributed over frequency. For a stationary random process, the autocorrelation function of a signal X(t) is defined by equation [9.3] [PIT 01a]:

$$R_{xx}(\tau) = E\left[X(t)\,X(t+\tau)\right]$$ [9.3]

A stochastic process is said to be stationary in the strict sense if none of its statistical properties change over time. The PSD Φxx(ω) given in equation [9.4] is the Fourier transform of the autocorrelation function Rxx(τ) of the random variable X [PIT 01a]:

$$\Phi_{xx}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} R_{xx}(\tau)\,e^{-i\omega\tau}\,d\tau$$ [9.4]
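In practice, the PSD of a measured time history is estimated directly from its DFT rather than from the autocorrelation function. The sketch below shows one common estimate, the periodogram, on a hypothetical sine signal. It is not the code used in the study: the one-sided, per-hertz normalization, the rectangular window and the function name are assumptions of this example.

```python
import cmath
import math

def one_sided_psd(x, fs):
    """Periodogram estimate of the one-sided PSD of a real signal.

    x  : list of samples (e.g. acceleration in g), even length assumed
    fs : sampling frequency in Hz
    Returns (freqs, psd) with psd in (unit of x)^2 / Hz.
    Assumptions: rectangular window, no averaging; a naive O(N^2) DFT is
    used to keep the sketch dependency-free.
    """
    n = len(x)
    half = n // 2
    freqs, psd = [], []
    for k in range(half + 1):
        xk = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        # Double every bin except DC and Nyquist to fold negative frequencies.
        factor = 1.0 if k in (0, half) else 2.0
        freqs.append(k * fs / n)
        psd.append(factor * abs(xk) ** 2 / (fs * n))
    return freqs, psd

# Hypothetical test signal: 2 g sine at 50 Hz sampled at 800 Hz.
fs, n, amp, f0 = 800.0, 64, 2.0, 50.0
signal = [amp * math.sin(2.0 * math.pi * f0 * m / fs) for m in range(n)]
freqs, psd = one_sided_psd(signal, fs)

df = fs / n
rms = math.sqrt(sum(psd) * df)  # area under the PSD gives the mean square
print(f"peak at {freqs[psd.index(max(psd))]:.1f} Hz, rms = {rms:.3f} g")
```

The area under the estimated PSD equals the mean square of the signal, which is how levels expressed in Grms relate to an acceleration PSD.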

It is assumed that the random processes that excite the structure in this study are Gaussian and stationary. In order to carry out the fatigue damage study that follows, the following statistical properties of the PSD are needed:

1) The spectral moments, defined by:

$$m_i = \int_{-\infty}^{+\infty} |\omega|^{i}\,\Phi_{xx}(\omega)\,d\omega$$ [9.5]

where m_i is the spectral moment of order i of a stationary random process with PSD Φxx(ω).

2) The number of zero crossings with a positive slope per unit of time, ν0 [RIC 45]:

$$\nu_0 = \frac{1}{2\pi}\left(\frac{m_2}{m_0}\right)^{1/2}$$ [9.6]

3) The mean number of maxima per unit of time, ν1 [RIC 45]:

$$\nu_1 = \frac{1}{2\pi}\left(\frac{m_4}{m_2}\right)^{1/2}$$ [9.7]
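The quantities of [9.5] to [9.7] can be evaluated numerically for any sampled PSD. The sketch below does so for a hypothetical flat PSD between 0 and 100 Hz. It assumes that the two rates are expressed per second, hence the division by 2π when the moments are computed in angular frequency; the trapezoidal integration and the function name are choices of this example.

```python
import math

def spectral_moment(omega, phi, order):
    """m_i of [9.5] for a PSD that is symmetric in omega.

    omega : increasing non-negative angular frequencies (rad/s)
    phi   : two-sided PSD values at those frequencies
    The integral over negative frequencies equals the one over positive
    frequencies, hence the factor 2. Trapezoidal integration is an
    implementation choice of this sketch.
    """
    total = 0.0
    for a in range(len(omega) - 1):
        f_lo = omega[a] ** order * phi[a]
        f_hi = omega[a + 1] ** order * phi[a + 1]
        total += 0.5 * (f_lo + f_hi) * (omega[a + 1] - omega[a])
    return 2.0 * total

# Hypothetical flat PSD between 0 and 100 Hz (band-limited white noise).
n_pts = 2001
w_max = 2.0 * math.pi * 100.0
omega = [w_max * a / (n_pts - 1) for a in range(n_pts)]
phi = [1.0e-3] * n_pts

m0, m2, m4 = (spectral_moment(omega, phi, i) for i in (0, 2, 4))
nu0 = math.sqrt(m2 / m0) / (2.0 * math.pi)  # zero up-crossings per second
nu1 = math.sqrt(m4 / m2) / (2.0 * math.pi)  # maxima per second
print(f"nu0 = {nu0:.2f} /s, nu1 = {nu1:.2f} /s, nu0/nu1 = {nu0 / nu1:.3f}")
```

The ratio ν0/ν1 indicates how narrow-banded the process is: it approaches 1 for a narrow-band process and decreases as the bandwidth grows.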

PSDs are used to estimate the statistical properties of earthquake measurements, wind speed, dynamic loading from road traffic, etc. When the motion is stationary, the statistical properties do not vary with time and the spectral densities remain identical. For a stationary Gaussian random process W(t) whose PSD is sampled at regular frequency intervals starting from zero, a Gaussian time series x(t) [VEE 84] can be generated as:

$$x(t) = \bar{x} + \sum_{j=1}^{N}\left(A_j\sin(\omega_j t) + B_j\cos(\omega_j t)\right)$$ [9.8]

where the coefficients A_j and B_j are computed from S_j and φ_j; S_j is the amplitude of the PSD at the frequency ω_j; φ_j is a random variable uniformly distributed on the interval [0, 2π]; and x̄ is the average of the signal x(t).

9.4.2.1.2. Cross-spectral density

The cross-spectral density (CSD) S_ij is defined so as to correlate the power spectral densities S_ii and S_jj calculated at the different measured points i and j. It is expressed in terms of a coherence function [VEE 84], such that:

$$\mathrm{Coh}_{ij}(\omega) = \exp\left(-\frac{d\,\omega\,\Delta r}{\bar{S}}\right)$$ [9.9]

where d is the decay rate, which depends on the vertical and lateral distance between the points, Δr is the geometric distance between the two points, ω is the frequency and S̄ is the average of the PSD. The CSD [VEE 84] is given by:

$$S_{ij}(\omega) = \mathrm{Coh}_{ij}(\omega)\,\sqrt{S_{ii}(\omega)\,S_{jj}(\omega)}$$ [9.10]
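The two constructions of this section, the time series of [9.8] and the CSD of [9.9] and [9.10], can be sketched as follows. Several points are assumptions of this example rather than statements of the chapter: the PSD values are treated as one-sided, so each harmonic is given the amplitude sqrt(2·S_j·Δω); the coherence is taken as a decaying exponential in d·ω·Δr divided by the mean PSD; and all function names and numerical values are hypothetical.

```python
import math
import random
import statistics

def synthesize(psd, d_omega, duration, n_samples, mean=0.0, seed=0):
    """Time series with harmonics at omega_j = j * d_omega, j = 1..N.

    Assumption: psd[j - 1] is a one-sided PSD value S_j, so that each
    harmonic has amplitude sqrt(2 * S_j * d_omega); with a random phase
    phi_j this corresponds to A_j = amp * sin(phi_j), B_j = amp * cos(phi_j).
    """
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in psd]
    series = []
    for k in range(n_samples):
        t = duration * k / n_samples
        value = mean
        for j, (s_j, phi_j) in enumerate(zip(psd, phases), start=1):
            amp = math.sqrt(2.0 * s_j * d_omega)
            a_j, b_j = amp * math.sin(phi_j), amp * math.cos(phi_j)
            value += a_j * math.sin(j * d_omega * t) + b_j * math.cos(j * d_omega * t)
        series.append(value)
    return series

def cross_spectral_density(s_ii, s_jj, omega, delta_r, decay, s_mean):
    """CSD from two PSD values and an exponential coherence.

    Assumption: coherence = exp(-decay * omega * delta_r / s_mean).
    """
    coherence = math.exp(-decay * omega * delta_r / s_mean)
    return coherence * math.sqrt(s_ii * s_jj)

# Hypothetical flat PSD with 20 harmonics; one full period is sampled.
psd = [0.5] * 20
d_omega = 2.0 * math.pi * 5.0        # rad/s
period = 2.0 * math.pi / d_omega     # s
x = synthesize(psd, d_omega, period, n_samples=64)

print(f"target variance  = {sum(psd) * d_omega:.3f}")
print(f"sample variance  = {statistics.pvariance(x):.3f}")
print(f"CSD at zero distance = {cross_spectral_density(0.5, 0.5, 100.0, 0.0, 1.0, 0.5):.3f}")
```

Sampling exactly one period with more than twice as many samples as harmonics makes the sample variance match the area under the PSD regardless of the random phases, which gives a simple check of the amplitude convention.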

9.4.2.2. Results and interpretations

When the FE model is subjected to random loads expressed as PSDs (acceleration, velocity or displacement), the response of the system is also obtained in the form of a PSD. Several advanced structural analysis software packages offer a spectral resolution option for random vibrations. They calculate a transfer function H(ω) which describes the transition from an input PSD Φ_in(ω) to an output PSD Φ_out(ω), such that:

$$\Phi_{out}(\omega) = H^{*}(\omega)\,.\,H(\omega)\,.\,\Phi_{in}(\omega)$$ [9.11]
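For a single-degree-of-freedom system, the relation [9.11] can be evaluated by hand, which helps to see how strongly the damping coefficient controls the height of a resonance peak. The sketch below uses a hypothetical unit mass-spring system with viscous damping and a flat force PSD; the receptance form of H(ω), the parameter values and the function names are assumptions of this example and do not describe the FE software used in the study.

```python
import math

def sdof_transfer(omega, mass, stiffness, zeta):
    """Receptance H(omega) = 1 / (k - m*omega^2 + i*c*omega) of an SDOF system."""
    damping = 2.0 * zeta * math.sqrt(stiffness * mass)  # viscous coefficient c
    return 1.0 / complex(stiffness - mass * omega ** 2, damping * omega)

def output_psd(omega, input_psd, mass, stiffness, zeta):
    """Response PSD: H*(omega) . H(omega) . input PSD = |H|^2 . input PSD."""
    h = sdof_transfer(omega, mass, stiffness, zeta)
    return (h.conjugate() * h).real * input_psd

# Hypothetical unit system excited by a flat (white) force PSD.
mass, stiffness, flat_psd = 1.0, 1.0, 1.0
omega_n = math.sqrt(stiffness / mass)

peak_1_0 = output_psd(omega_n, flat_psd, mass, stiffness, zeta=0.010)
peak_1_4 = output_psd(omega_n, flat_psd, mass, stiffness, zeta=0.014)
print(f"resonant PSD, 1.0% damping: {peak_1_0:.1f}")
print(f"resonant PSD, 1.4% damping: {peak_1_4:.1f}")
print(f"ratio: {peak_1_4 / peak_1_0:.3f}")
```

At resonance the response PSD scales with the inverse square of the damping ratio, so a change from 1% to 1.4% roughly halves the peak while leaving the resonance frequency almost unchanged.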

The first step in this study is to introduce PSD and CSD input loads in a single direction with a 1% damping coefficient. This was done for two main vibration levels, 10 Grms and 20 Grms. The response of the structure is taken at a point on the PCB control board in the vicinity of the electrolytic aluminum capacitor (see Figure 9.3).

Figure 9.11. Numerical and experimental PSDs in the vicinity of the capacitor with a damping coefficient of 1% and a vibration level of 10 Grms in one direction only

These graphs (see Figure 9.11) show that the two curves are comparable in terms of resonance frequencies. The intensities of the PSDs cannot be compared, since the input measurements were not made at the same time as the output measurements. Each PSD therefore carries information on acceleration intensities that is not present in the other. Only the resonance frequencies can be observed in this case study. The resonance frequencies of the experimental measurements coincide with those of the numerical analysis, with a small shift. This shift may be caused by the calibration of the FE model in the modal analysis phase.


The second step in this approach is to introduce PSD and CSD loads in all three directions with a 1% damping coefficient. Note also that in this case (see Figure 9.12) the intensities of the PSD accelerations are not comparable. Only the resonance frequencies are to be compared for the reasons explained above.

Figure 9.12. PSD at one point in the vicinity of the capacitor by applying a damping coefficient of 1% with a vibration level of 10 Grms in three directions

Figure 9.13. PSD at a point in the vicinity of the capacitor by applying a damping coefficient of 1.4% with a vibration level of 10 Grms in three directions


The third step in this study is to introduce a damping coefficient of 1.4% (determined experimentally). With this value, the peaks of the numerical PSD accelerations move closer to the peaks of the experimental PSD accelerations. It can be deduced that adjusting the damping coefficient improves the results and brings the model closer to the behavior of the system during the experimental excitation. On the other hand, the information provided by the PSDs still differs; only the resonance frequencies are comparable. Calculations with random loadings in one direction are more advantageous in terms of calculation time and memory consumption, since they provide the same expected information as the calculations with loadings in three directions.

It can be seen from the above steps that the FE model gives a numerical PSD response comparable to the experimental PSD generated from time history measurements. In addition, not all numerically calculated resonance frequencies can be detected experimentally. Experimental measurements can sometimes miss PSD peaks at certain frequencies; hence the importance of always building FE models in order to correctly determine the behavior of the structure.

9.5. Application of a stationary Gaussian random load
In order to carry out a fatigue damage study using the Dirlik and Single-Moment spectral methods at the solder joints connecting the electronic components to the PCB control board, stationarity and normality tests are performed on the signals generated by the Qualmark table. It is found that 20% of the signals are not stationary and almost all are not Gaussian. Therefore, in order to have signals usable for this study, it was decided to change the loading signals applied to the structure. The design vibration profile is the equivalent PSD profile of the PSDs generated from the measured time history signals of the S97 demonstrator, applied as random loads.

This equivalent profile is compared with the measurements made on the housing by the accelerometers at a vibration level of 10 Grms. Although the equivalent profile does not provide all the information contained in the measured PSDs, it follows their overall trend (see Figure 9.14). For the sake of simplicity, it is assumed in the following that the equivalent profile represents the envelope of the measured PSDs.

The comparison presented in this study is based on the numerical simulations resulting from the equivalent PSD applied at eight vibration levels for 20 minutes each on the one hand, and the experimental results of a step test on the other. The step test consists of applying a random vibration loading with vibration levels varying from 5 Grms to 40 Grms, in 5 Grms steps, with a loading time of 20 minutes per level. It is observed that when the 40 Grms vibration level is reached, an electronic component, the aluminum electrolytic capacitor, breaks off after 10 minutes of loading (see Figure 9.15).

Figure 9.14. PSD measured by accelerometers and PSD specifications. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

Figure 9.15. System failure as a result of the HALT 'step test' system after 2 hours and 40 minutes of total loading
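The step test profile described above can be sketched as follows. The schedule (5 to 40 Grms in 5 Grms steps, 20 minutes per level) comes from the text; the helper names are hypothetical. The second function relies on the fact that the rms level is the square root of the area under the PSD, so scaling a fixed profile shape to another Grms level multiplies its ordinates by the square of the level ratio. Whether the equivalent profile was scaled exactly this way in the study is an assumption of this sketch.

```python
def step_test_schedule(first_grms=5, last_grms=40, step_grms=5, dwell_min=20):
    """Return (level in Grms, start minute, end minute) for each load stage."""
    levels = range(first_grms, last_grms + 1, step_grms)
    return [(g, i * dwell_min, (i + 1) * dwell_min) for i, g in enumerate(levels)]

def psd_scale_factor(target_grms, reference_grms):
    """Factor applied to PSD ordinates when a profile of fixed shape is
    rescaled from reference_grms to target_grms (assumed scaling rule)."""
    return (target_grms / reference_grms) ** 2

schedule = step_test_schedule()
total_minutes = schedule[-1][2]
hours, minutes = divmod(total_minutes, 60)
print(len(schedule), "levels,", hours, "h", minutes, "min in total")
print("PSD factor from 10 Grms to 40 Grms:", psd_scale_factor(40, 10))
```

Eight stages of 20 minutes give the 2 hours and 40 minutes of total loading quoted in the caption of Figure 9.15.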


9.5.1. FE model and sub-modeling technique
In order to estimate fatigue damage at the solder joints of an electronic component mounted on a PCB control board, a simplified FE model is developed so as to reduce the calculation time and to apply the sub-modeling technique. The electronic components are modeled by their masses only; the one area of interest is the zone of the aluminum electrolytic capacitor (see Figure 9.3), which is modeled in detail (mass of the component, pins and solder joints). The PCB control board is assembled with the housing at the bracket by six assembly screws. The random loads are applied directly at these mounting points.

The sub-modeling technique is used to study the fatigue damage of the critical zone concerned. It makes it possible to calculate the stresses at the solder joint precisely by passing from a global model with a coarse mesh (118,894 nodes) to a local model with a refined mesh (22,532 nodes). The procedure is as follows:
– a modal analysis is carried out to calculate the modal basis necessary for any spectral analysis;
– the vibration profiles are applied with eight different vibration levels;
– based on the calculated modal basis, a spectral random vibration analysis is performed to calculate the random response of the system in the form of a PSD;
– a cut is made in the global model around the component being studied; the displacement PSDs are calculated in the global model, in the three directions, at the nodes located on this cut boundary and are applied as boundary conditions to the local model;
– finally, the PSDs of normal and shear stresses are calculated at the solder joints of the local model.

This makes it possible to calculate the PSD of the Von Mises stress, assuming a biaxial stress state and based on the quadratic Von Mises criterion [PRE 94], as per the following equation:

$$\varphi_{vm}(\omega) = \varphi_{s_x s_x}(\omega) + \varphi_{s_y s_y}(\omega) - \varphi_{s_x s_y}(\omega) + 3\,\varphi_{s_{xy} s_{xy}}(\omega)$$ [9.12]

where $\varphi_{s_x s_x}(\omega)$ represents the PSD of the normal stresses in direction X, $\varphi_{s_y s_y}(\omega)$ the PSD of the normal stresses in direction Y, $\varphi_{s_x s_y}(\omega)$ the cross-spectral density between the normal stresses in directions X and Y, and $\varphi_{s_{xy} s_{xy}}(\omega)$ the PSD of the shear stresses XY.

In the results of the sub-modeling, the resonance frequencies of the global and local models coincide, with a more precise spectral response provided by the local model. The local model provides results with higher peaks than those observed in the global model. This comparison shows a good transfer of loads between the two models and validates the sub-modeling technique used in the spectral analysis of random vibrations. The Von Mises stress PSD computed by this technique is then used to calculate the fatigue damage of the solder joint.
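The combination of stress PSDs in relation [9.12] can be sketched as below, one frequency line at a time. The function and argument names are hypothetical. Because a cross-spectral density is complex in general, the sketch keeps only its real part; this is an assumption of the example, not something stated in the text.

```python
def von_mises_psd(psd_xx, psd_yy, csd_xy, psd_shear_xy):
    """Von Mises stress PSD for a biaxial stress state, line by line.

    psd_xx, psd_yy : PSDs of the normal stresses in directions X and Y
    csd_xy         : cross-spectral density between the two normal stresses
    psd_shear_xy   : PSD of the shear stress XY
    """
    result = []
    for sxx, syy, sxy, tau in zip(psd_xx, psd_yy, csd_xy, psd_shear_xy):
        # Assumption of this sketch: only the real part of the
        # cross-spectral density enters the combination.
        result.append(sxx + syy - complex(sxy).real + 3.0 * tau)
    return result

# Illustrative values only (MPa^2 per unit frequency), two frequency lines.
print(von_mises_psd([4.0, 2.0], [1.0, 2.0], [2.0, 0.5 + 0.3j], [0.0, 1.0]))
```

The first line of the example corresponds to two fully correlated normal stresses with amplitudes in a 2:1 ratio and no shear, for which the quadratic criterion gives 4 + 1 - 2 = 3.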

Figure 9.16. Example of PSD stresses calculated from the global and local model at the soldered joint corresponding to a vibration level of 40 Grms. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

Figure 9.17. Simplified global FE model of the PCB board with the critical component in 3D and a local FE model zooming the mesh at the critical component. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip


9.6. Estimated fatigue damage
In order to study the reliability and the lifetime of an electronic structure, it is recommended to subject it to different types of loads (vibration, thermal, moisture, etc.). In general, random vibration loads are the design-driving loads of embedded electronic systems. It therefore seems sufficient to subject the system to different levels of vibration in order to test its reliability.

9.6.1. Time domain study
In order to calculate the fatigue damage by a time domain calculation, it is necessary to have the stress time history response of the dynamic structure being studied. In this work, the time history responses of the equivalent Von Mises stress σ are studied:
– the stress cycles are counted by the Rainflow-counting algorithm, then each cycle with a non-zero mean is transformed into a cycle with zero mean and the number of cycles to failure is calculated by:

$$N_i = C\,\sigma_i^{-b}$$ [9.13]

– the elementary damage $d_i$ of the different cycles is accumulated by the Palmgren-Miner law; Monte Carlo simulations are then run in order to estimate the mean damage from n statistically independent realizations, such that:

$$D = \sum_i d_i \quad \text{with} \quad d_i = \frac{n_i}{N_i}$$ [9.14]

– $n_i$ being a half cycle of a stress $\sigma_i$, and b and C the coefficients of the S-N curve, which depend on the material.

9.6.2. Frequency domain study
The random response of the system being studied, represented by the stress PSD, is used to calculate fatigue damage. This study is restricted to two methods, the Dirlik method and the Single-Moment method.

The Dirlik method is a semi-empirical method that determines, directly from the PSD, the probability density of the cycles that Rainflow counting would extract. This method was derived from Monte Carlo simulations; it calculates the mean damage from four spectral moments $m_0$, $m_1$, $m_2$ and $m_4$ [PIT 01a], such that:

$$E[D] = \frac{E[P]\,T}{C}\int_0^{\infty} S^{b}\,p(S)\,dS \quad \text{with} \quad p(S) = \frac{\dfrac{D_1}{Q}\,e^{-Z/Q} + \dfrac{D_2 Z}{R^{2}}\,e^{-Z^{2}/(2R^{2})} + D_3 Z\,e^{-Z^{2}/2}}{2\sqrt{m_0}}$$ [9.15]

$$Z = \frac{S}{2\sqrt{m_0}};\quad \gamma = \frac{m_2}{\sqrt{m_0 m_4}};\quad x_m = \frac{m_1}{m_0}\sqrt{\frac{m_2}{m_4}};\quad D_1 = \frac{2\,(x_m - \gamma^{2})}{1+\gamma^{2}}$$

$$D_2 = \frac{1-\gamma-D_1+D_1^{2}}{1-R};\quad D_3 = 1 - D_1 - D_2;\quad Q = \frac{1.25\,(\gamma - D_3 - D_2 R)}{D_1};\quad R = \frac{\gamma - x_m - D_1^{2}}{1-\gamma-D_1+D_1^{2}}$$

where S is the stress range, T is the loading time and E[P] is the expected number of peaks per unit time. With spectral moments defined on the angular frequency, $m_k = \int_0^{\infty} \omega^{k}\,\varphi(\omega)\,d\omega$ for a one-sided stress PSD, $E[P] = \frac{1}{2\pi}\sqrt{m_4/m_2}$.

The Single-Moment method was developed by Lutes and Larsen. They proposed an empirical relation to calculate the mean fatigue damage from a single spectral moment of order 2/b [PIT 01a]. Preumont [PRE 94] uses this approach in FE analysis to identify and locate the level of damage in structures:

$$E[D] = \frac{T}{2\pi C}\,2^{3b/2}\,\Gamma\!\left(1+\frac{b}{2}\right)\left(m_{2/b}\right)^{b/2}$$ [9.16]
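The two spectral estimates can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the spectral moments are taken on the angular frequency of a one-sided stress PSD, the Dirlik density is written for the stress range, moments are integrated with the trapezoidal rule, and the flat PSD used as input is purely illustrative. All function names are hypothetical.

```python
import math

def spectral_moment(omega, psd, k):
    """m_k = integral of omega**k * psd(omega) d(omega) (trapezoidal rule)."""
    total = 0.0
    for i in range(len(omega) - 1):
        left = omega[i] ** k * psd[i]
        right = omega[i + 1] ** k * psd[i + 1]
        total += 0.5 * (left + right) * (omega[i + 1] - omega[i])
    return total

def dirlik_parameters(m0, m1, m2, m4):
    """Dirlik coefficients D1, D2, D3, Q and R from four spectral moments."""
    gamma = m2 / math.sqrt(m0 * m4)
    xm = (m1 / m0) * math.sqrt(m2 / m4)
    d1 = 2.0 * (xm - gamma ** 2) / (1.0 + gamma ** 2)
    r = (gamma - xm - d1 ** 2) / (1.0 - gamma - d1 + d1 ** 2)
    d2 = (1.0 - gamma - d1 + d1 ** 2) / (1.0 - r)
    d3 = 1.0 - d1 - d2
    q = 1.25 * (gamma - d3 - d2 * r) / d1
    return d1, d2, d3, q, r

def dirlik_pdf(s, m0, params):
    """Dirlik probability density of the rainflow stress range s."""
    d1, d2, d3, q, r = params
    z = s / (2.0 * math.sqrt(m0))
    return (d1 / q * math.exp(-z / q)
            + d2 * z / r ** 2 * math.exp(-z ** 2 / (2.0 * r ** 2))
            + d3 * z * math.exp(-z ** 2 / 2.0)) / (2.0 * math.sqrt(m0))

def dirlik_damage(omega, psd, b, c, duration, n_steps=4000, z_max=15.0):
    """Mean damage E[D] = E[P].T/C . integral of s**b . p(s) ds (midpoint rule)."""
    m0, m1, m2, m4 = (spectral_moment(omega, psd, k) for k in (0, 1, 2, 4))
    params = dirlik_parameters(m0, m1, m2, m4)
    peak_rate = math.sqrt(m4 / m2) / (2.0 * math.pi)  # peaks per second
    ds = 2.0 * math.sqrt(m0) * z_max / n_steps
    integral = sum(((i + 0.5) * ds) ** b * dirlik_pdf((i + 0.5) * ds, m0, params) * ds
                   for i in range(n_steps))
    return peak_rate * duration / c * integral

def single_moment_damage(omega, psd, b, c, duration):
    """Mean damage from the single spectral moment of order 2/b."""
    m = spectral_moment(omega, psd, 2.0 / b)
    return (duration / (2.0 * math.pi * c) * 2.0 ** (1.5 * b)
            * math.gamma(1.0 + b / 2.0) * m ** (b / 2.0))

# Illustrative input only: flat stress PSD between 10 Hz and 100 Hz.
omega = [2.0 * math.pi * (10.0 + 0.5 * i) for i in range(181)]
psd = [1.0] * len(omega)
b, c, duration = 6.93, 3.5828e12, 20 * 60.0  # S-N data quoted in 9.6.3, 20 min stage
print("Dirlik:", dirlik_damage(omega, psd, b, c, duration))
print("Single-Moment:", single_moment_damage(omega, psd, b, c, duration))
```

Both functions only need the PSD and the S-N coefficients, which is why they run in a fraction of the time needed to generate and rainflow-count time histories. Note that the Dirlik coefficients become ill-conditioned for a very narrow band process, where a narrow-band formula is usually preferred.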

9.6.3. Calculation of fatigue damage and comparison of methods
The Von Mises stress PSD calculated in the preceding paragraphs for the different vibration levels, applied to the structure successively in the form of 20-minute load stages, is processed by the time domain and frequency domain methods in order to estimate the mean fatigue damage. Referring to the work of Cinar et al. [CIN 13] and taking the stresses in MPa, the fatigue properties of the SAC305 solder joint material are b = 6.93 and C = 3.5828e12 (for stresses expressed in MPa). The mean damages calculated by the time domain method combined with the Monte Carlo simulations, as well as by the Dirlik and Single-Moment spectral methods, are shown in Table 9.4.
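As a worked illustration of how the S-N coefficients quoted above enter the damage calculation, the sketch below evaluates the number of cycles to failure N = C·σ^(-b) and accumulates damage with the Palmgren-Miner rule. The function names and the stress values are hypothetical; the cycle list stands in for the output of a rainflow count, which is not implemented here, and counts are expressed in cycles (0.5 for a half cycle).

```python
def cycles_to_failure(stress_mpa, b=6.93, c=3.5828e12):
    """Number of cycles to failure from the S-N curve, N = C * sigma**(-b)."""
    return c * stress_mpa ** (-b)

def miner_damage(counted_cycles, b=6.93, c=3.5828e12):
    """Palmgren-Miner sum of n_i / N_i.

    counted_cycles: iterable of (stress in MPa, number of cycles) pairs,
    e.g. the result of a rainflow count (hypothetical input here).
    """
    return sum(n / cycles_to_failure(s, b, c) for s, n in counted_cycles)

# Illustrative cycle counts only, not data from the study.
cycles = [(5.0, 2.0e5), (10.0, 1.0e4), (20.0, 0.5)]
print(miner_damage(cycles))
```

Failure is predicted when the accumulated damage reaches 1. Because of the high exponent b, the few largest stress cycles dominate the sum.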

Methods                                      Mean damage    Standard deviation of the damage
Time domain approach and MC simulations      1.55           0.0532
Dirlik method                                0.8            Not calculated
Single-Moment method                         0.9            Not calculated

Table 9.4. Mean damage estimated by different methods at a point in the critical area of the soldered joints

The numerical results estimate the cumulative damage at the solder joint corresponding to all the vibration levels applied experimentally. This estimation shows that the component studied undergoes crack initiation followed by failure. In effect, it no longer passes current to the control board, which was verified experimentally following the HALT tests.

Comparing the methods, the spectral methods are the most efficient in terms of computation time. The error between the methods is related to the material properties of the S-N curve. Larsen et al. calculate the error of estimation of fatigue damage for different spectral methods in their studies [LAR 15]. Concerning computation time, the time domain method takes one week of calculations, the Dirlik method 14.45 seconds and the Single-Moment method 1.61 seconds. This is to be expected, since the time domain method counts the cycles in combination with Monte Carlo simulations, whereas the Dirlik spectral method estimates fatigue damage from four spectral moments, and the Single-Moment spectral method uses a single spectral moment.

9.7. Conclusion
In the framework of the FIRST-MFP project, the study of the behavior of the control board, equipped with electronic components, of a DC-DC converter inverter for a hybrid vehicle was presented. An FE model was developed to be used in several phases of this study. A numerical modal analysis and an experimental modal analysis were carried out in order to calibrate the FE model in terms of the mechanical properties of the materials. A spectral analysis of the random vibrations and HALT tests were then carried out in order to validate the FE model under random loads. This study found that the FE model with random loading in a single direction provides the same required information as the FE model with random loading in all three directions, while being more advantageous in terms of memory consumption and computing time.

The methodology for estimating fatigue damage was then presented through two probabilistic approaches: the time domain method of Rainflow cycle counting combined with Monte Carlo simulations, and the Single-Moment and Dirlik spectral methods. A numerical application was presented by comparing the results of the HALT step test with the numerical simulations derived from the equivalent PSD vibration profile. The numerical calculations show a failure of the system, which is consistent with the experimental results. These numerical calculations also demonstrate the feasibility of the sub-modeling technique, which preserves the modes in the local model while calculating the Von Mises stress PSD in the critical zone. This PSD is used to estimate fatigue damage in order to test the reliability of the system by time domain and spectral methods. The spectral approach used is limited to stationary Gaussian processes when estimating the mean fatigue damage with the Dirlik and Single-Moment methods. Both time domain and spectral approaches estimate very close values of the mean damage. Spectral methods, especially the Single-Moment method, are more economical and efficient in terms of memory consumption and computation time.

9.8. Bibliography
[CHE 07] CHEN Y.S., WANG C.S., YANG Y.J., “Combining vibration test with finite element analysis for the fatigue life estimation of PBGA components”, Microelectronics Reliability, vol. 48, no. 4, pp. 638–644, 2007.
[CIN 13] CINAR Y., JANG J., JANG G. et al., “Effect of solder pads on the fatigue life of FBGA memory modules under harmonic excitation by using a global–local modeling technique”, Microelectronics Reliability, vol. 53, no. 12, pp. 2043–2051, 2013.
[DOE 14] DOERTENBACH N., “The Function & Purpose of Repetitive Shock vs. ElectroDynamic Vibration – Qualmark Corporation”, 2014 Annual Reliability and Maintainability Symposium (RAMS), Colorado Springs, United States, January 2014.
[GUG 04] GUGNONI J., Identification par recalage modal et fréquentiel des propriétés constitutives de coques en matériaux composites, PhD thesis, Ecole Polytechnique Fédérale de Lausanne, Switzerland, 2004.
[HAI 12] HAIJUN X., DEJIAN Z., ZHENGWEI L., “The Sub-Model Method for analysis of BGA Joint Stress and Strain During Random Vibration Loading”, 13th International Conference on Electronic Packaging Technology & High Density Packaging, pp. 1216–1221, Guangxi, China, August 2012.


[HOC 12] HOCK LYE PANG J., Lead Free Solder - Mechanics and reliability, Springer Heidelberg, New York, 2012.
[JIA 14] JIA J., Essentials of Applied Dynamic Analysis, Springer Heidelberg, New York, 2014.
[KIM 05] KIM Y., NOGUCHI H., AMAGAI M., “Vibration fatigue reliability of BGA-IC package with Pb-free solder and Pb-Sn solder”, Microelectronics Reliability, vol. 46, no. 2, pp. 459–466, 2005.
[LAL 06] LALL P., GUPTE S., CHOUDHARY P. et al., “Solder-joint reliability in electronics under shock and vibration using explicit finite-element sub-modeling”, Proceedings of 56th Electronic Components and Technology Conference, San Diego, United States, May 2006.
[LAR 15] LARSEN C.E., IRVINE T., “A review of spectral methods for variable amplitude fatigue prediction and new results”, 3rd International Conference on Material and Component Performance under Variable Amplitude Loading VAL 2015, Prague, Czech Republic, March 2015.
[MAD 03] MADENCI E., GUVEN I., KILIC B., Fatigue Life Prediction Of Solder Joints In Electronic Packages With Ansys, Springer, New York, 2003.
[PIT 01a] PITOISET X., Méthodes spectrales pour une analyse en fatigue des structures métalliques sous chargements aléatoires multiaxiaux, PhD thesis, ULB, Brussels, 2001.
[PIT 01b] PITOISET X., RYCHLIK I., PREUMONT A., “Spectral methods to estimate local multiaxial fatigue failure for structures undergoing random vibrations”, Fatigue & Fracture of Engineering Materials & Structures, vol. 24, pp. 715–727, 2001.
[PRE 94] PREUMONT A., PIEFORT V., “Predicting random high cycle fatigue life with finite elements”, ASME Journal of Vibration and Acoustics, vol. 16, pp. 245–248, 1994.
[RAK 11] RAKOTOMALALA R., Tests de normalité, Techniques empiriques et tests statistiques, Internal report, Université Lumière Lyon 2, France, 2011.
[RIC 45] RICE S.O., “Mathematical analysis of random noise”, Bell System Technical Journal, vol. 24, pp. 46–156, 1945.
[STE 00] STEINBERG D.S., Vibration Analysis For Electronic Equipment – Third edition, John Wiley & Sons, 2000.
[VEE 84] VEERS P.S., Modeling stochastic wind loads on vertical axis wind turbines, SANDIA report – SAND83-1909, Sandia National Laboratories, Albuquerque, New Mexico, 1984.

10 Study on the Thermomechanical Fatigue of Electronic Power Modules for Traction Applications in Electric and Hybrid Vehicles (IGBT)

The current trend in the field of rail transport is to integrate increasingly powerful power modules into increasingly smaller spaces. This poses problems, notably in terms of reliability, since during their operating cycles the semiconductor switches and their immediate environment are subjected to more severe thermomechanical stresses. This can lead to their destruction and thus to the failure of the energy conversion function. The main objective of this chapter is to describe the numerical approach to simulating the electro-thermomechanical behavior of an IGBT power module in order to characterize these stresses. This numerical modeling subsequently also served as the basis of a reliability study for estimating the thermomechanical fatigue lifetime of these electronic components.

10.1. Introduction
The reliability of the power module is mainly related to that of its electronic components (IGBT power chips and diodes) (see Figure 10.1). These components undergo severe and varied stresses. Indeed, the electrical pulses due to the internal operation of these components cause thermal loadings, inducing mechanical deformations which can exceed the desired thresholds. Furthermore, these electronic components are subjected to vibrations which can affect the state of the solder joints and the electrical wire connections, causing damage by thermomechanical fatigue. Knowledge of the mechanical response of the power module is therefore vital for the electronics industry, as these modules fulfill strategic functions, as is the case for electric and hybrid vehicles. Knowledge of the mechanical behavior of this power module requires the modeling of several physical phenomena. Multi-physical modeling aims at considering the interdependencies and interactions between different physical phenomena such as electrical, thermal and mechanical phenomena. In this study, multi-physical coupling is performed using ANSYS software.

Chapter written by Abderahman MAKHLOUFI, Younes AOUES, Abdelkhalak EL HAMI, Bouchaib RADI, Philippe POUGNET and David DELAUX.

Figure 10.1. Infineon power modules (VALEO demonstrator)

10.2. Presentation of the power module (IGBT)
The electronic power components are static converters (inverters for the control of alternating-current motors) built with MOS (Metal-Oxide-Semiconductor) or IGBT (Insulated Gate Bipolar Transistor) devices.

Study on Thermomechanical Fatigue


The IGBT modules (most often made of silicon) now cover a wide range of ratings, from about ten to a few thousand amperes and from 300 to 6,500 V (technological limit of silicon). IGBT modules are used, for example, in the control of electric motors, in electric and hybrid vehicles or in the management of energy sources.

Figure 10.2. Cross-section of the IGBT component

In one module, the IGBT chips are arranged in parallel in order to obtain the desired current while maintaining a satisfactory yield. Most often, the module comes in a plastic casing with a metallic base plate. The chips are connected to the metal connections by bond wires that rest upon a ceramic substrate (Figure 10.2). The latter is responsible for the electrical insulation from the base plate, which is itself (on its other face) in contact with the heatsink through a film or thermal grease. The module is secured to the heatsink by means of screws, the tightening torque of which must be respected. Solder joints are made between the chips and the insulating ceramic and between the ceramic and the metal layers. Conventionally, a DCB (Direct Copper Bonding) method is used to obtain good adhesion of the copper to the ceramic surface.

The strong currents which pass through this component when it is switched on lead to heating of the power chip. This heat is evacuated via the casing, which in turn heats up. Figure 10.2 shows the various materials used in the manufacture of a power component; during temperature variations, thermomechanical stresses are generated between these materials. These variations in temperature, and thus in stresses, are at the origin of the reliability challenges for the power chip or the interconnections within the casing. Numerous studies have been carried out to understand the physical mechanisms of degradation and aging of the power module, an overview of which is presented below.

10.3. Different modes of failure for power modules under the effect of thermal cycling
The thermal cycling and the high ambient temperature to which the power modules can be exposed are critical for both the active elements of the module and the assembly. In the following, we briefly describe the main failure modes that may affect power modules. For the component presented in section 10.2, experience has shown three critical zones. Two are at the solder joints, at the interfaces between the chip and the substrate and between the substrate and the base plate. The third zone is at the interface between the aluminum metallization and the electrical wires (Figure 10.3).

Figure 10.3. Cross-section of the component

10.3.1. Breakdown of ceramic substrates
IGBT modules are increasingly used in electrical energy conversion systems. Inside these modules, ceramic substrates are used to provide electrical insulation of the conductor tracks while contributing to the mechanical strength of the assembly and to the removal of the heat generated by the electronic components during their operation. They are metalized on both sides with copper or aluminum by direct bonding (direct copper bonding, direct aluminum bonding) or by brazing with metal alloys (active metal brazing). In a large number of embedded systems (e.g. avionics), large temperature variations (passive and active cycling) are responsible for severe thermomechanical stresses due to the differences in the coefficients of thermal expansion of the constituent materials, which lead to failure by fatigue. Under these extremely severe conditions, the lifetime of the DCB substrates can be greatly reduced. The large amplitudes of the temperature variations lead to a deformation of the copper of the DCB and a hardening of this metallization. As a result, as temperature cycles are repeated, the mechanical stresses undergone by the ceramic increase until, in the worst case, the ceramic fractures and the copper eventually separates from the ceramic [MIT 99, SCH 00].

To reduce the mechanical stresses at the edges of the copper and to delay its detachment, it is recommended to reduce the thickness of the metallization. Indeed, the thinner the metallization, the lower the stresses imposed on the ceramic. This reduction in thickness must in particular be carried out on the upper face of the DCB while respecting the current densities acceptable for these metallizations [DUP 06, SCH 03].

10.3.2. Solder fatigue: chip–substrate and substrate–base plate
Solder degradation is the most common failure of IGBT modules. The solder is used in two different places: at the interface between the chip and the substrate and at the interface between the substrate and the base plate. There are two types of failure associated with solders: delamination and the formation of voids. In this study, we are interested in the first phenomenon, which is related to aging under the stresses generated by thermal cycling. The second, which results from the soldering process, is not studied in this work.

The delamination phenomenon starts at the periphery of the solder and then propagates towards the center when the solder is subjected to thermal cycling [KHA 04, MIT 99]. Figure 10.4 shows an acoustic microscopy image of the state of the solder between an Al2O3 substrate and the copper base plate of a high power module after 200,000 power injection cycles (the dark parts correspond to the healthy areas and the lighter parts to degraded areas).


Figure 10.4. Acoustic analysis of the IGBT power module. For a color version of the figure, see www.iste.co.uk/elhami/mechatronic1.zip

10.3.3. Fatigue of the aluminum metallization of the component
The metallization as well as the connection wires of the power component are located near the active zone of the component which, during operation, dissipates the most heat. The failure of these elements results from thermomechanical fatigue phenomena which are due to differences in thermal expansion between silicon and aluminum. Indeed, the higher the temperature variations of the component, the more intense the shear stresses between the connecting wire and the metallization of the power component. Two main mechanisms lead to component failure: crack propagation between the connecting wire and the source metallization, and the reconstruction of the grains of the aluminum metallization [RAE 09, WEI 08].

10.4. The physical phenomena involved
10.4.1. Thermal phenomena
Thermal energy exchanges between two systems (or subsystems) at two different temperatures are carried out according to three modes of transfer:
– conduction;
– convection;
– radiation.


In the case of electronic circuits, the heat generated in the components is evacuated mainly by conduction (towards the rear surface) but also by convection and radiation on the upper or lateral parts. In the following, we present the general laws which govern two of these heat transfer modes (conduction and convection).

10.4.1.1. Heat equation of an isotropic medium
The first principle of thermodynamics can be written between two successive instants:

$$dE_p + dU = \delta Q + \delta W$$ [10.1]

with:
– $dE_p$ the transformation in the medium of potential energy into calorific energy (Joule effect);
– $dU$ the internal energy variation;
– $\delta Q$ the heat exchanges at the borders of the system;
– $\delta W$ the work exchanges at the borders of the system.

For a homogeneous element of volume $V$ bounded by a surface $S$, where $p(M,t)$ is the volumetric heat output of the internal sources, one can write:

$$dE_p = -\int_V p(M,t)\,dV\,dt$$ [10.2]

10.4.1.2. Heat transfer by conduction
The phenomenon of conduction can be described as the diffusion of heat within one or more solids, as long as there is no fluid between them. In the cases that we study, we assume that the different solids are perfectly bonded and that there is no air between the surfaces (Figure 10.5).

Figure 10.5. Heat transfer by conduction (solid 1 at temperature T1 in contact with solid 2 at temperature T2, with T1 > T2; the heat Q flows across the solid contact)


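The heat density can be written in the form:

$$\vec{\varphi} = -\lambda\,\overrightarrow{\mathrm{grad}}\,T$$ [10.3]

with:
– $\lambda$ the thermal conductivity of the material in W.m-1.K-1, called k in ANSYS;
– $T$ the temperature in K;
– $\vec{\varphi}$ the vector for heat density.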

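The first law of thermodynamics, written for a differential control volume, is given as:

$$\rho c\,\frac{\partial T}{\partial t} + \rho c\,\vec{v}\cdot\overrightarrow{\mathrm{grad}}\,T + \mathrm{div}\,\vec{\varphi} = Q$$ [10.4]

where $\rho$ is the density of the material, $c$ its specific heat, $\vec{v}$ the velocity vector for the mass transport of heat (zero in a solid at rest) and $Q$ the power dissipated per unit volume,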

with the equation for a temperature rise at constant pressure:

$$\delta Q = M\,c\,dT \quad \text{and} \quad \Delta Q = M\,c\,(T_f - T_i)$$ [10.5]

with:
– $\delta Q$ the elementary energy in joules ($\Delta Q$ for a finite temperature rise);
– $M$ the mass in kg;
– $c = \frac{1}{M}\frac{\delta Q}{dT}$ the specific heat in J.kg-1.K-1, $\delta Q$ being an infinitesimal quantity of heat in J and $dT$ an infinitesimal variation of the temperature (K or °C);
– $Q$ the power dissipated per unit volume (W.m-3);
– $T_f$ the temperature after heat exchange in K;
– $T_i$ the initial temperature in K.


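Fourier's law gives the amount of heat in steady state through a surface $S$:

$$\delta Q = -\lambda\,S\,\frac{dT}{dx}\,dt$$ [10.6]

It follows that the flux is:

$$\Phi = \frac{\delta Q}{dt} = -\lambda\,S\,\frac{dT}{dx}$$ [10.7]

and the heat density:

$$\varphi = \frac{\Phi}{S} = -\lambda\,\frac{dT}{dx}$$ [10.8]

with:
– $\lambda$ the thermal conductivity of the material in W.m-1.K-1, called k in ANSYS;
– $dt$ the elementary time in s;
– $\frac{dT}{dx}$ the temperature gradient along x in K.m-1;
– $\Phi$ the heat flux in watts;
– $\varphi$ the density of heat in the section in W.m-2;
– $S$ the area of the section in m2.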
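A short numerical sketch of Fourier's law for a plane layer in steady state follows. The linear temperature profile across the layer, the function names and all numerical values (conductivity, area, thickness, temperatures) are illustrative assumptions and do not describe the module studied in this chapter.

```python
def conduction_flux(conductivity, area, t_hot, t_cold, thickness):
    """Heat flux Phi in W through a plane layer: Phi = -lambda * S * dT/dx.

    A linear temperature profile across the layer is assumed, so that
    dT/dx = (t_cold - t_hot) / thickness along the direction of heat flow.
    """
    gradient = (t_cold - t_hot) / thickness   # K/m, negative along the flow
    return -conductivity * area * gradient

def heat_density(flux, area):
    """Heat density phi = Phi / S in W/m^2."""
    return flux / area

# Illustrative values only: a 0.5 mm layer of 1 cm^2 with a 10 K drop.
phi_total = conduction_flux(conductivity=24.0, area=1e-4,
                            t_hot=400.0, t_cold=390.0, thickness=5e-4)
print(phi_total, "W")
print(heat_density(phi_total, 1e-4), "W/m^2")
```

The minus sign in the law makes the flux positive in the direction of decreasing temperature, i.e. from the hot face to the cold face.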

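Combining [10.3] and [10.4] for a solid at rest, one obtains the equation of heat transfer:

$$\rho c\,\frac{\partial T}{\partial t} = \lambda\,\nabla^{2}T + Q$$ [10.9]

with:
– $Q$ the power dissipated per unit volume (W.m-3);
– $\lambda$ the coefficient of thermal conductivity (W.m-1.K-1);
– $\rho$ the density of the material (kg.m-3);
– $c$ the specific heat (J.kg-1.K-1).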

10.4.1.3. Heat transfer by convection
Convection is the heat exchange that takes place between a solid and a fluid. The phenomenon applies at the surface of the solid in contact with the fluid, unlike conduction, where everything happens inside the solid(s) (Figure 10.6). This transfer mechanism is governed by Newton's law:

$$\vec{\varphi}\cdot\vec{n} = h_f\,(T_s - T_b)$$ [10.10]

with:
– $h_f$ the coefficient for heat transfer;
– $T_b$ the temperature of the fluid in contact with the surface;
– $T_s$ the temperature of the model surface;
– $\vec{n}$ the unit vector normal to the surface.

Figure 10.6. Transfer of heat by convection (fluid at temperature Tb, solid surface at temperature Ts)

10.4.2. Electrothermal phenomena
Electrothermal phenomena correspond to the creation of heat by the flow of current through a body. This is the Joule effect. It is possible to draw a parallel between heat and electricity because the governing laws have the same form. There are several ways of electrically heating a part: by conduction, by induction or via electric arcs. As far as this study is concerned, we limit ourselves to the effects of electrical conduction, since the purpose of our model is to observe the behavior of the components when they are stressed. Moreover, since the electro-thermo-mechanical coupling requires a lot of calculation time, we do not include the effect of heat on voltage. The equations governing the Joule effect under electrical loading are shown below.


10.4.2.1. Ohm's law

According to Ohm's law:

$$U = R\,I$$ [10.11]

with $U$ the voltage (in volts), $R$ the resistance (in ohms), $I$ the current (in amperes) and:

$$P = U\,I = R\,I^2$$ [10.12]

with $P$ the power (in watts) for a purely resistive system supplied with direct current.

10.4.2.2. Electric conduction

Electric conduction can be of two types, direct or indirect. In both cases, it is the Joule effect that supplies the energy to heat the part.

– Direct conduction: the current flows through the part to be heated.
– Indirect conduction: the current passes through a heating resistor, which then transfers heat to the part to be heated.

In our case the voltage is applied directly to the part. This is a case of direct conduction, which is governed by the equations below. If we neglect the losses:

$$P\,dt = m\,c\,dT$$ [10.13]

so that:

$$T = T_0 + \frac{P}{m\,c}\,t$$ [10.14]

with $T_0$ the initial temperature of the part, $m$ its mass and $c$ its specific heat capacity.

If we do not neglect the losses:

$$P\,dt - \Phi\,dt = P\,dt - h\,S\,(T - T_0)\,dt = m\,c\,dT$$ [10.15]


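with $\Phi$ the heat flux in watts:

$$\Phi = h\,S\,(T - T_0)$$ [10.16]

therefore:

$$m\,c\,\frac{dT}{dt} + h\,S\,T = P + h\,S\,T_0$$ [10.17]

with:
– $h$ the exchange coefficient in W·m⁻²·K⁻¹;
– $S$ the exchange surface in m²;
– $dT$ the temperature increment in K;
– $\Phi$ the heat flux in watts;
– $dt$ the elementary time step in s.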
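The two heating regimes can be compared numerically. The sketch below assumes a lumped part of mass m and specific heat capacity c, heated by a constant Joule power P = R I² and cooled through a surface S with exchange coefficient h towards a medium at the initial temperature T0. Under these assumptions the balance with losses has the closed-form solution T(t) = T0 + (P / (h S)) (1 − exp(−t / τ)) with τ = m c / (h S). The function names and every numerical value are hypothetical and serve only as an illustration.

```python
import math


def joule_power(resistance, current):
    """Power (W) dissipated in a purely resistive part under direct current."""
    return resistance * current ** 2


def temperature_no_losses(t, t0, power, mass, c):
    """Temperature at time t when losses are neglected: linear rise."""
    return t0 + power * t / (mass * c)


def temperature_with_losses(t, t0, power, mass, c, h, area):
    """Temperature at time t with convective losses h*S*(T - T0).

    Closed-form solution of m*c*dT/dt + h*S*T = P + h*S*T0 with T(0) = T0.
    """
    tau = mass * c / (h * area)          # thermal time constant (s)
    return t0 + power / (h * area) * (1.0 - math.exp(-t / tau))


# Illustrative values only (hypothetical, not taken from the study)
R, I = 0.01, 100.0        # ohm, A
mass, c = 0.05, 385.0     # kg, J.kg-1.K-1 (order of magnitude for copper)
h, area = 50.0, 0.01      # W.m-2.K-1, m^2
t0 = 25.0                 # degC

P = joule_power(R, I)
for t in (0.0, 10.0, 60.0, 300.0):
    print(t,
          round(temperature_no_losses(t, t0, P, mass, c), 1),
          round(temperature_with_losses(t, t0, P, mass, c, h, area), 1))
```

Without losses the temperature grows without bound, whereas with losses it levels off at T0 + P / (h S), which is why the loss term matters as soon as the heating time is comparable to τ.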

10.4.3. Mechanical phenomena

The stresses undergone by the components being studied are essentially thermomechanical in origin. This section gives a succinct presentation of the principal laws of mechanical behavior.

10.4.3.1. Elastic deformation

Elastic behavior occurs when the stress does not exceed the elastic limit of the material (Re). The deformation is then linear in the applied stress and reversible: the material regains its initial state when the stress is removed and does not undergo fatigue. The stress-strain relation is described by Hooke's law:

$$\sigma = E\,\varepsilon$$

with $\sigma$ the stress (Pa), $E$ the modulus of elasticity (Pa) and $\varepsilon$ the elastic deformation (Figure 10.7).

10.4.3.2. Elasto-plastic deformation

When the stress exceeds the elastic limit, the material breaks if it is brittle (a good example being ceramic materials) or undergoes a plastic deformation if it is ductile. Plastic deformation is irreversible and independent of time. The material no longer regains its initial state after the stress has been removed. This state is characterized by a tangent modulus and a tensile strength limit Rm (Figure 10.7). Beyond this limit, the material exhibits a localized reduction of its cross-section (necking) where the plastic deformation is concentrated, leading to the complete failure of the solid at this point.

Figure 10.7. Characteristic behavior of a material in tension
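The elastic and elasto-plastic regimes can be illustrated with a simple bilinear stress-strain curve: Hooke's law up to the elastic limit Re, then a straight line whose slope is the tangent modulus. This bilinear form is an assumption made here for illustration and is not the power-law description used in the text. It is valid for monotonic loading only, and the material values below are hypothetical.

```python
def bilinear_stress(strain, young_modulus, yield_stress, tangent_modulus):
    """Stress (Pa) for a given strain under monotonic loading.

    Elastic branch : sigma = E * eps                  while |eps| <= Re / E
    Plastic branch : sigma = Re + Et * (|eps| - Re/E) beyond the elastic limit
    (assumed bilinear hardening; unloading is not modeled)
    """
    yield_strain = yield_stress / young_modulus
    if abs(strain) <= yield_strain:
        return young_modulus * strain
    sign = 1.0 if strain > 0 else -1.0
    return sign * (yield_stress + tangent_modulus * (abs(strain) - yield_strain))


# Illustrative values only (hypothetical, not taken from the study)
E = 70e9      # Pa, modulus of elasticity
Re = 100e6    # Pa, elastic limit
Et = 1e9      # Pa, tangent modulus

for eps in (0.0005, 0.001, 0.005, 0.01):
    print(eps, bilinear_stress(eps, E, Re, Et) / 1e6, "MPa")
```

The slope change at Re / E marks the transition from reversible to irreversible deformation; a failure criterion at Rm would have to be added separately.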

Plastic deformations are expressed by a stress power function, for example in the case of traction [CHA 05]: = = 0 if

if