A Study of the Frequent Hardware Failure,

Replacement and its Monetary Impact Analysis

A dissertation submitted in partial fulfillment of the requirements


for the award of

PGPM (PT)

by

Name: - Santosh Aman

Roll No.:- 11PT1-077

Part-time Post Graduate Programme in Management

Management Development Institute

Gurgaon 122 007

October, 2013

A Study of the Frequent Hardware Failure,
Replacement and its Monetary Impact Analysis

A dissertation submitted in partial fulfillment of the requirements


for the award of

PGPM (PT)

by

Name: Santosh Aman

Under the guidance of

Shri SUDEEP SAXENA
Designation: Regional Delivery Manager
Organization: IBM

Dr. NARAIN GUPTA
Designation: Assistant Professor
Organization: MDI, Gurgaon

Part-time Post Graduate Programme in Management

Management Development Institute

Gurgaon 122 007

October, 2013

Certificate of Approval

The following dissertation titled “A Study of the Frequent Hardware Failure, Replacement
and its Monetary Impact Analysis” is hereby approved as a substantive work/study in
management carried out and presented in a manner satisfactory to warrant its acceptance as a
prerequisite for the award of PGPM (PT) for which it has been submitted. It is understood that
by this approval the undersigned do not necessarily endorse or approve any statement made,
opinion expressed or conclusion drawn therein but approve the dissertation only for the
purpose it is submitted.

Dissertation Examination Committee for evaluation of dissertation

Name Signature

1. External Examiner _______________________ ___________________

Certificate from Dissertation Advisory Committee

This is to certify that Mr. SANTOSH AMAN, a participant of the Part-time Post Graduate
Programme in Management has worked under our guidance and supervision. He is submitting
this dissertation titled “A Study of the Frequent Hardware Failure, Replacement and its
Monetary Impact Analysis” in partial fulfillment of the requirements for the award of the
PGPM (PT).
This dissertation has the requisite standard and to the best of our knowledge no part of it has
been reproduced from any other dissertation, monograph, report or book.

Faculty Advisor: NARAIN GUPTA
Designation: Assistant Professor
Management Development Institute
Gurgaon
Date: 4/10/13

Organizational Advisor: Shri SUDEEP SAXENA
Designation: Regional Delivery Manager
Organization: IBM
Address: New Delhi
Date: 4/10/13

Abstract

A Study of the Frequent Hardware Failure,
Replacement and its Monetary Impact Analysis
by

SANTOSH AMAN

Today every industry is highly competitive. Since the dot-com boom of the late 1990s, markets
have become increasingly competitive and every organization is fighting for sustainability and
growth. Changes in the composition of the surrounding environment have become one of the key
challenges for data centre operation and, through it, business operation. Almost every
organization runs a data centre sized to its business needs and its degree of operational
automation, and in the current scenario no one can ignore the importance and benefit of
automated operation through the data centre. In this dissertation we study how the environment
affects the DC (data centre) and thereby directly affects business operations.

A data centre consists of information technology and electronic equipment. Non-stop data centre
operation and reliability depend completely on the smooth functioning of this equipment.
Every business needs 24x7 availability of and access to IT resources. When hardware fails
frequently because of environmental factors, customer service and the long-term business
suffer. Downtime of servers and critical business applications rises immediately, which has a
direct monetary impact on the business.

The key research objectives of the dissertation are:

1. To investigate the causes of failure and thereby the replacement of the parts in the client
server room.

2. To delve into the key causes further using the problem solving tools, and explore the
possible solutions to address the main causes.

3. To analyse the monetary impact of the parts failure on the client's business, and the impact
of parts replacements on the supplier's overall cost.

In this study we identified the key causes of hardware failure and their monetary impact on
the customer (DC owner) and the supplier (vendor/OEM). This is a live issue that I have
observed in the course of my job. The study was conducted on the premises of one of my
customers, ABC telecom, situated at Noida, and its vendor, XYZ Corp. A third organization, the
air purifier company Purafil Inc., also came into the picture to resolve the environmental
issue.

To address the issue I collected relevant live data, which was limited in nature, made some
assumptions, and carried out qualitative and quantitative analysis. The analysis starts by
finding the key causes and the share of failures attributable to each, and then estimates their
monetary impact from both perspectives (customer and vendor). The analysis shows that every
stakeholder will be in a win-win situation once the underlying issue, corrosion, is
successfully addressed.

In brief, the dissertation was intended to investigate the key causes of frequent parts
replacements and their monetary implications for the key stakeholders. The study was conducted
at the XYZ Corp and ABC telecom premises. The relevant data was collected, and an investigation
was made by means of the problem-solving tool ‘Pareto Analysis’. We observed that the majority
of the parts replacements happened because of corrosion. We further investigated the causes of
the corrosion by means of another problem-solving tool, ‘Root Cause Analysis’ (RCA). The RCA
revealed that corrosion happens in the server room because environmental air enters the server
room through the air conditioners. The air in the open environment is an external factor and
cannot be controlled, but the air intake of the air conditioners may be controlled using
advanced air-purifying equipment. The dissertation concludes that companies installing their
servers in such environments should implement solutions to control the server room air
composition by means of advanced air-purifying equipment.

We further extended the study to understand the monetary impact of the parts replacement and
corrosion on the supplier and client. Our analysis concluded that the supplier is offering an AMC
to the client with improper assessment of the total cost of the AMC. An analysis was made to
indicate what should be the price of the AMC if the environment under which the servers
operate is not controlled, and what should be the price of AMC if the air environment is a
controlled one. A separate analysis was made for the client to understand whether the AMC
amount paid by the client is worth it, what the monetary impact of not implementing the
air-purifying equipment solutions is on the client's overall service business, and what savings
can be obtained if the solutions are implemented.

ACKNOWLEDGEMENT

This dissertation is a part of our programme, the Post Graduate Diploma in Management at
Management Development Institute. It was written during July-September 2013. It was very
interesting to learn this way of study.

Thank God! We have done it. I would like to thank my faculty advisor, Dr. Narain Gupta;
without his guidance and supervision I would not have been able to write this dissertation in
time. He encouraged me and pushed me in the right direction. I learnt much from my guide during
the work and during supervision. I would also like to thank my industry guide, who supported
and guided me in every way whenever required.

I would like to say a special thanks to our chair, Dr. Neelu S. Bhullar, who guided us
throughout the academic session.

Management Development Institute, October 2013

Table of Contents
List of Figures
List of Tables
Abbreviations
Chapter 1. Introduction:
1.1 Problem statement
1.2 Research objective
1.3 Target industry
Chapter 2. Literature review
2.1 International organization and committees
2.2 Defined standards for data centres
2.3 Introduction of Purafil Inc.
2.4 Affected industries in all over world
Chapter 3. Framework & Methodology
3.1 Research approach
3.2 Research strategy
3.3 Data collection methods & source
3.4 Sample data and their report (Live data)
Chapter 4. Current study
4.1 Definition of data centre, classifications
4.2 Corrosion
4.2.1 What is electronic corrosion?
4.2.2 Types of corrosion
4.2.3 Causes of corrosion
4.2.4 Sign of corrosion
4.2.5 Sources of corrosive gaseous
Chapter 5 Qualitative and Quantitative analysis
5.1 Does any equipment survive in GX environment?
5.2 Why majorly copper and silver testing required (gold in some cases)
5.3 Substitute material other than copper/silver/Gold
5.4 Data available (ABC telecom)
5.5 Limitations of data
5.6 Symptoms of hardware failure
5.7 Causes of hardware failure
5.8 Cause and effect diagram (fishbone diagram)
5.9 Data analysis
5.10 Pareto chart analysis
5.11 Root cause analysis of corrosion (why-why)
5.12 Who affected directly & indirectly
5.13 Solution
5.14 Monetary comparison and cost expenses (ABC telecom perspective)
5.15 Monetary benefit and cost expenses (OEM perspective)
5.16 Win-Win condition (ABC telecom, XYZ Corp., Purafil Inc.)
Chapter 6. Measuring methodology (Purafil Inc.)
6.1 Assessment
6.2 Real time assessment
Chapter 7. Protection and control (By Purafil Inc.)
7.1 Protection methodology
7.2 Control process
Chapter 8. Findings / Outcomes
8.1 Causes & impact - controllable
8.2 Causes & impact - partial controllable
8.3 Causes - Uncontrollable
Chapter 9. Conclusion
Chapter 10. Recommendation
References

List of Figures
Figure No. Description

Figure 1 History of Institution
Figure 2 “Whisker” growth on circuit
Figure 3 Example of copper creep corrosion on a lead-free circuit board
Figure 4 One of the failed HBA card
Figure 5 Copper creep corrosion on Memory module
Figure 6 Hard Disk with Copper creep corrosion
Figure 7 Fishbone diagram
Figure 8 Pareto chart of cause of hardware failure
Figure 9 Failure rate matrix
Figure 10 why-why analysis
Figure 11 cost of downtime
Figure 12 Monetary analysis (ABC telecom perspective)
Figure 13 Pareto analysis cost of causes in hardware failure
Figure 14 Monetary analysis (XYZ Corp. perspective)

List of Tables
Table No. Description

Table 1 - ISA STANDARD S71.04-1985 WITH REVISED CHANGES
Table 2 DATA CENTRE CLASSIFICATIONS
Table 3 Live data ABC telecom
Table 4 showing result before Purafil Inc. equipment Installation into DC environment
Table 5 showing result after Purafil Inc equipment Installation into DC environment
Table 6 Below is the table showing cost incurred by OEM

Abbreviations
DC       data centre
OEM      original equipment manufacturer
IT       information technology
AMC      annual maintenance contract
ERP      enterprise resource planning
ISA      International Society of Automation
IEC      International Electrotechnical Commission
ASHRAE   American Society of Heating, Refrigerating and Air-Conditioning Engineers
RoHS     Restriction of Hazardous Substances
EU       European Union
Å        angstroms
ISO      International Organization for Standardization
RH       relative humidity
Ca       cause
C        corrosion
F        firmware
P        power
M        man
T        total
HBA      host bus adapter
CRAC     computer room air conditioning
AHU      air handling unit

Chapter 1. Introduction:
ABC Telecom Company is situated at Noida, near the Yamuna bank. Its corporate office and its DC
(data centre) are located on the same premises. The company uses all types of IT equipment,
such as storage, servers, network devices and libraries, together with ERP applications. Being
a non-IT company, it is highly dependent on hassle-free operation of the data centre. Corporate
staff, employees, business people, customers, service centres and outlets all over the country
use the ERP application to run business operations. The DC was implemented and automated by
XYZ Corp. (OEM and vendor), which takes care of all IT equipment supply, maintenance, warranty
and AMC.

1.1 Problem statement


ABC shifted from another location to the Noida location in April 2011. After some months of
operation, hardware failures started. From January 2012 I observed that hardware failure had
become very frequent and that the OEM was replacing electronic components again and again
within very short periods. Whether the replacement component was the same as or different from
the failed one made no difference to the failures. The reliability of business operation became
questionable. Availability of the ERP application was interrupted by repeated DC downtime (due
to hardware failure), which directly affected business operations, profit and customer loyalty.
On the other side, the OEM was facing the hardware failure issue with this organization as well
as with other customers in the region. I was the person interacting with both ABC telecom and
XYZ Corp. Discussions with the vendor and other sources indicated that corrosion, caused by the
surrounding environment, is the main issue responsible for the hardware failures and
replacements. The environment of every DC is crucial to the proper functioning of its
equipment. This live issue made it challenging for us (myself, the customer and the vendor) to
keep business operations running efficiently and smoothly.

1.2 Research objective


1. To investigate the causes of failure and thereby the replacement of the parts in the client
server room.

2. To delve into the key causes further using the problem solving tools, and explore the
possible solutions to address the main causes.

3. To analyse the monetary impact of the parts failure on the client's business, and the impact
of parts replacements on the supplier's overall cost.

1.3 Target industry
As we are discussing the data centre and the IT industry, our focus is limited to electronic
equipment of the information technology industry manufactured by major OEMs, for example Sun
Microsystems, HP, IBM, Cisco, Cray, EMC, Hitachi, Seagate, SGI and NetApp.

Chapter 2. Literature review


Some literature and research on this subject already exists globally. This chapter considers
the critical points of current knowledge, including substantive findings as well as theoretical
and methodological contributions on the topic.

2.1 International organization and committees


Several international organizations and committees reported this issue long ago and set
standards and guidelines.

ISA (ISA Standard 71.04-1985)


The International Society of Automation (www.isa.org), formerly known as the Instrument Society
of America, was founded in 1945. It is a leading, global, non-profit organization that sets the
standard for automation. ISA develops standards, certifies industry professionals, provides
education and training, publishes books and technical articles, and hosts conferences and
exhibitions for automation professionals.

IEC (Standard IEC 60654-4 (1987-07))


International Electrotechnical Commission (http://www.iec.ch). Millions of devices that contain
electronics, and use or produce electricity, rely on IEC International Standards and Conformity
Assessment Systems to perform, fit and work safely together. Founded in 1906, the IEC is the
world’s leading organization for the preparation and publication of International Standards for
all electrical, electronic and related technologies, known collectively as “electrotechnology”.
The IEC is one of three global sister organizations (IEC, ISO, ITU) that develop International
Standards for the world.

ASHRAE (ASHRAE Standard 127 (ASHRAE 2007)), https://www.ashrae.org/


ASHRAE traces its origins to 1894. The Society focuses on building systems, energy efficiency,
indoor air quality, refrigeration and sustainability within the industry. Through research,
standards writing, publishing and continuing education, ASHRAE shapes tomorrow’s built
environment today. ASHRAE was formed as the American Society of Heating, Refrigerating and
Air-Conditioning Engineers by the merger in 1959 of the American Society of Heating and
Air-Conditioning Engineers (ASHAE), founded in 1894, and the American Society of Refrigerating
Engineers (ASRE), founded in 1904.
History of standards followed by different institutions:

ISA (ISA Standard 71.04-1985), followed by IEC (Standard IEC 60654-4 (1987-07)), followed by
ASHRAE (ASHRAE Standard 127 (ASHRAE 2007))

Figure 1 History of Institution

European Union (Directive 2002/95/EC, RoHS)


RoHS, short for Directive on the restriction of the use of certain hazardous substances in
electrical and electronic equipment, was adopted in February 2003. The RoHS directive took
effect on 1 July 2006, and is required to be enforced and become law in each member state.

2.2 Defined standards for data centres


Under ISA Standard 71.04-1985, the optimum severity level is G1 (mild). At this level,
corrosion is not a factor in determining equipment reliability. As the corrosive potential of
an environment increases, the severity level is classified as G2, G3 or GX (the most severe).
The standard also quantifies the effects of humidity and temperature: high or variable relative
humidity and elevated temperatures may accelerate corrosion by gaseous contaminants. The
standard specifies a relative humidity of less than 50%.

Table 1 - ISA STANDARD S71.04-1985 WITH REVISED CHANGES

Class  Air quality  Copper corrosion  Silver corrosion  Comments
       severity     reactivity        reactivity

G1     Mild         <300 Å            <200 Å            An environment sufficiently well-controlled
                                                        such that corrosion is not a factor in
                                                        determining equipment reliability.

G2     Moderate     <1000 Å           <1000 Å           An environment in which the effects of
                                                        corrosion are measurable and corrosion may
                                                        be a factor in determining equipment
                                                        reliability.

G3     Harsh        <2000 Å           <2000 Å           An environment in which there is a high
                                                        probability that corrosive attack will
                                                        occur. These harsh levels should prompt
                                                        further evaluation resulting in
                                                        environmental controls or specially designed
                                                        and packaged equipment.

GX     Severe       >2000 Å           >2000 Å           An environment in which only specially
                                                        designed and packaged equipment would be
                                                        expected to survive. Specifications for
                                                        equipment in this class are a matter of
                                                        negotiation between user and supplier.

Note: Å = angstroms which is a unit of length equal to 1/10,000,000,000 (one ten billionth) of a meter.
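
The class boundaries in Table 1 can be applied mechanically to coupon readings. The following is
a minimal sketch in Python, not part of the ISA standard or of the study's tooling. The function
names are hypothetical, a reading of exactly 2000 Å is treated as GX, and the overall class is
assumed to be the worse of the copper and silver classes; the table itself does not state either
rule.

```python
# Upper bounds (exclusive) in angstroms, copied from Table 1.
COPPER_LIMITS = [("G1", 300), ("G2", 1000), ("G3", 2000)]
SILVER_LIMITS = [("G1", 200), ("G2", 1000), ("G3", 2000)]
SEVERITY_ORDER = ["G1", "G2", "G3", "GX"]


def classify_metal(reactivity_angstrom, limits):
    """Return the first class whose upper bound exceeds the reading, else GX."""
    if reactivity_angstrom < 0:
        raise ValueError("reactivity cannot be negative")
    for label, upper in limits:
        if reactivity_angstrom < upper:
            return label
    # Assumption: anything at or above the G3 bound counts as GX.
    return "GX"


def classify_environment(copper_angstrom, silver_angstrom):
    """Assumption: the overall class is the worse of the two metal classes."""
    copper = classify_metal(copper_angstrom, COPPER_LIMITS)
    silver = classify_metal(silver_angstrom, SILVER_LIMITS)
    return max(copper, silver, key=SEVERITY_ORDER.index)


if __name__ == "__main__":
    # Illustrative readings only, not measurements from the study.
    print(classify_environment(250, 150))
    print(classify_environment(250, 250))
```

With these illustrative readings, the second call is pushed out of G1 by silver alone, because
the G1 bound for silver (200 Å) is tighter than the bound for copper (300 Å).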

Recommended Operating Environment

Temperature: 18°C (64.4°F) to 27°C (80.6°F)
Low-end moisture: 5.5°C (41.9°F) dew point
High-end moisture: 60% relative humidity or 15°C (59°F) dew point

Gaseous contamination
Severity level G1 as per ISA 71.04 (ISA 1985), which states that the reactivity rate of copper
coupons shall be less than 300 Å/month and the reactivity rate of silver coupons shall be less
than 200 Å/month. Reactive monitoring of gaseous corrosivity should be conducted approximately
2 in. (5 cm) in front of the rack on the air inlet side, at one-quarter and three-quarter frame
height off the floor, or where the air velocity is much higher.
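
The temperature, moisture and gaseous limits listed above can be combined into one routine
check. The sketch below is illustrative only: the function name and its parameters are
hypothetical, the readings are assumed to be supplied already in °C, % RH and Å/month, and the
high-end moisture limit is read conservatively, so exceeding either 60% relative humidity or a
15°C dew point is flagged.

```python
def check_operating_environment(temp_c, dew_point_c, rh_percent,
                                copper_rate, silver_rate):
    """Return a list of violated limits; an empty list means all limits are met."""
    problems = []
    if not 18.0 <= temp_c <= 27.0:
        problems.append("temperature outside 18-27 C")
    if dew_point_c < 5.5:
        problems.append("dew point below 5.5 C")
    # Assumption: either high-end moisture limit being exceeded is a violation.
    if rh_percent > 60.0 or dew_point_c > 15.0:
        problems.append("moisture above 60% RH or 15 C dew point")
    # G1 limits for coupon reactivity, in angstroms per month.
    if copper_rate >= 300.0:
        problems.append("copper reactivity not below 300 A/month")
    if silver_rate >= 200.0:
        problems.append("silver reactivity not below 200 A/month")
    return problems


if __name__ == "__main__":
    # Illustrative readings only, not measurements from the study.
    print(check_operating_environment(22.0, 10.0, 45.0, 120.0, 80.0))
    print(check_operating_environment(30.0, 10.0, 45.0, 120.0, 80.0))
```

Returning the list of violations, rather than a single pass/fail flag, makes it clear which
limit a given server room breaches.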

Particulate contamination
1. Data centres with or without air-side economizers must meet the cleanliness level of ISO
class 8.
2. The deliquescent relative humidity of the particulate contamination should be more than 60%.
3. Data centres must be free of zinc whiskers.
4. For data centres without air-side economizers, ISO class 8 cleanliness may be met simply by
the following choice of filtration:
a. The room air may be continuously filtered with MERV 8 filters.
b. Air entering a data centre may be filtered with MERV 11 or preferably MERV 13 filters.
5. For data centres with air-side economizers, the choice of filters to achieve ISO class 8
cleanliness depends on the specific conditions present at that data centre. In general, air
entering a data centre may need to be filtered using MERV 11 or preferably MERV 13 filters.

RoHS (Restriction of Hazardous Substances Directive)

The EU’s RoHS Directive restricts the use of six substances in the manufacture of various types
of electronic and electrical equipment: mercury (Hg), lead (Pb), hexavalent chromium (Cr(VI)),
cadmium (Cd), polybrominated biphenyls (PBB) and polybrominated diphenyl ethers (PBDE).
Alternatives such as immersion silver (ImmAg) and organically coated copper (OCC) are currently
being used as replacement board finishes; however, ongoing research has shown that printed
circuit boards made using lead-free materials can be more susceptible to corrosion.
Companies selling a broad range of electrical goods in the EU must now conform to RoHS. These
rules are laid down at European level but are implemented through national law, so when
exporting to Europe it is essential to comply with the national law of each relevant country.
Classifying and subsequently monitoring gaseous contaminants is a proven way to improve air
quality and comply with ISA Standard S71.04-1985 and RoHS. The ISA standard characterizes
environments in terms of their overall corrosion potential.

2.3 Introduction of Purafil Inc.

ABOUT CORROSION CONTROL EXPERTS:


PURAFIL, INC. is a publicly held company owned by the Kaydon Corporation (NYSE:KDN) and
headquartered in Doraville, Georgia, United States of America. Purafil revolutionized the gas
phase air filtration industry in the early 1960s with the development of the world’s first
active oxidant-impregnated, air cleaning pellet called “Purafil.” 50 years later, Purafil
remains a world leader in the development of innovative monitoring and gas phase air filtration
technologies designed to control odorous, corrosive and toxic gases.

Purafil pioneered many current environmental standards. By maintaining a hands-on approach,
Purafil plays an active role in the development of regulations for contamination control in
industrial, commercial, water-wastewater, clean room, preservation, ethylene, high purity, data
centre and power generation markets.

Purafil is a single source manufacturer of gas phase air filtration media and equipment. Purafil
not only manufactures the products, but also offers technical services specific to each market
such as: air quality assessment, circuit board failure analysis, media life analysis, consultation,
on-site gas testing, system startup, and other on-going services. This focus and expertise makes
Purafil more knowledgeable about the applications of gas phase technology than anyone else in
the field, and results in the use of Purafil products in nearly every part of the world. Purafil is
supported by a network of representative firms throughout the United States and in over 60
countries.

HISTORY OF INDUSTRY FIRSTS
• First to engineer, manufacture, and patent potassium and sodium permanganate-
impregnated media for oxidation of pollutants.
• First to engineer and manufacture a UL Classified, synthesized carbon media for
neutralization of corrosive airborne pollutants.
• First to develop and patent OnGuard® quartz crystal microbalance technology for
continuous, real-time monitoring of air quality.
• First to develop the Purafilter® pleated filter with gas phase and particulate filtration
capabilities for retrofitting existing air handlers.
• First to develop and patent the Posi-Track™ technology for improving efficiency in gas
phase air filtration systems.
• First to develop MediaPIK™ software for selection of Purafil media type and quantity
based on unique application requirements.
• First to engineer, manufacture, and patent Purafil SP, the only pellet to contain 12%
sodium permanganate for the oxidation of pollutants.
• First to receive the prestigious Frost & Sullivan Award for Global Gas Phase Air
Filtration Product Line Strategy.

2.4 Affected industries in all over world


Our discussion focuses on the IT industry, but other industries all over the world are also
affected by environmental corrosion. Examples include banking and financial institutions,
communications/media, consulting/IT services companies, datacom/IT equipment manufacturers and
suppliers, government facilities, healthcare/medical, Internet data centres (IDC), Internet
service providers (ISP), manufacturers (non-datacom/IT), research and development laboratories,
retail goods/services, telecommunication companies, transportation and logistics companies, and
TV/broadcasting companies.

In the last year alone Purafil has received corrosion-related inquiries from almost 150 locations
throughout the Asia-Pacific region, Europe, and North America. For the inquiries received in
2010, over 90% involved corrosion-related equipment failure. Almost all were from locations
where the ambient (outside) air was high in sulfur compounds (e.g., sulfur oxides, active sulfur
species) and the failure mechanism was identified as sulfur creep corrosion. Average corrosion
rates for outdoor air (where monitored) from a number of locations confirmed what was
already suspected – those locations with the highest rate of corrosion failures also had some
of the highest copper and silver reactivity rates.

Chapter 3. Framework & Methodology
3.1 Research approach
Qualitative and Quantitative methods are two main research approaches to choose when
conducting research.

In this study we have used a qualitative approach, because it gives a complete and
comprehensive view and understanding of the phenomenon in its entirety. The study is aimed at
gaining a deeper understanding of the phenomenon under investigation and acquiring richer
knowledge of a complex situation, which requires assessing abundant information, so a
qualitative study is best suited for this.

The quantitative results in this study are based on the available live data and on some
assumptions, because the available data is limited in nature.

3.2 Research strategy


The research is based on a current industry issue. To arrive at outcomes and results, I reviewed
previous articles, journals, and earlier reported issues with similar characteristics across the
world. I also explored and studied international organisations and their guidelines in a real context.

This strategy was chosen because the purpose and objective of this dissertation closely resemble
the "HOW" and "WHY" forms of research question.

In addition to the theoretical base, I also want to show that there are impacts and losses in terms
of business operations from both perspectives, Customer and Vendor.

3.3 Data collection methods & source


Initial data collection was based on conversations and discussions with staff at ABC telecom and
XYZ corp. Later, I held discussions with a Purafil Inc. representative and communicated by email. All
the persons concerned shared their relevant data as available. I also searched for articles, journals,
and other documents on the internet.

Some Major sources of data:
ABC telecom
XYZ corp.
ISA
IEC
EU
ASHRAE
Purafil Inc.
Internet

3.4 Sample data and their report (Live data)


This research is based on live sample data, which the customer logged with the OEM
whenever a hardware failure was reported. Some data centre environment measurement reports,
measured by Purafil Inc., are also available.

Chapter 4. Current study


4.1 Definition of data centre, classifications
A data center is a facility used to house computer systems and associated components, such as
telecommunications and storage systems. It generally includes redundant or backup power
supplies, redundant data communications connections, environmental controls (e.g., air
conditioning, fire suppression) and security devices.

IT operations are a crucial aspect of most organizational operations around the world. One of
the main concerns is business continuity; companies rely on their information systems to run
their operations. If a system becomes unavailable, company operations may be impaired or
stopped completely. It is necessary to provide a reliable infrastructure for IT operations, in
order to minimize any chance of disruption. Information security is also a concern, and for this
reason a data center has to offer a secure environment which minimizes the chances of a
security breach.

Table 2 DATA CENTRE CLASSIFICATIONS

Tier Level    Requirements

1             • Single non-redundant distribution path serving the IT equipment
              • Non-redundant capacity components
              • Basic site infrastructure guaranteeing 99.671% availability

2             • Fulfils all Tier 1 requirements
              • Redundant site infrastructure capacity components guaranteeing 99.741%
                availability

3             • Fulfils all Tier 1 & Tier 2 requirements
              • Multiple independent distribution paths serving the IT equipment
              • All IT equipment must be dual-powered and fully compatible with the
                topology of a site's architecture
              • Concurrently maintainable site infrastructure guaranteeing 99.982%
                availability

4             • Fulfils all Tier 1, Tier 2 and Tier 3 requirements
              • All cooling equipment is independently dual-powered, including chillers and
                Heating, Ventilating and Air Conditioning (HVAC) systems
              • Fault tolerant site infrastructure with electrical power storage and
                distribution facilities guaranteeing 99.995% availability
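
To make the availability percentages in Table 2 easier to compare, they can be converted into the maximum downtime they allow per year. The following is a minimal sketch, assuming a year of 365 days with 24 hours of operation per day. This year length is an assumption made for illustration, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumed: 8760 hours, leap years ignored

# Availability percentages as listed in Table 2
TIER_AVAILABILITY = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}


def annual_downtime_hours(availability_percent):
    """Return the hours per year a site may be unavailable at this availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(availability)
    print(f"Tier {tier}: {availability}% availability -> {hours:.1f} hours of downtime per year")
```

Under this assumption a Tier 1 site may be down for roughly 29 hours in a year, while a Tier 4 site may be down for less than half an hour.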

4.2. Corrosion
Corrosion of metals is actually a chemical reaction caused primarily by attack of gaseous
contaminants and is accelerated by heat and moisture. Rapid shifts in either temperature or
humidity cause small portions of circuits to fall below the dewpoint temperature, thereby
facilitating condensation of contaminants. Relative humidity above 50% accelerates corrosion
by forming conductive solutions on a small scale on electronic components. Microscopic pools
of condensation then absorb contaminant gases to become electrolytes where crystal growth
and electroplating occur. Above 80% RH, electronic corrosive damage will occur regardless of
the levels of contamination.

4.2.1 What is electronic corrosion?


In the context of electronic equipment, corrosion is defined as the deterioration of a base metal
resulting from a reaction with its environment. More specifically, corrosive gases and water
vapour coming into contact with a base metal result in the build up of various chemical reaction
products. As the chemical reactions continue, these corrosion products can form insulating
layers on circuits which can lead to thermal failure or short-circuits. Pitting and metal loss can
also occur.

4.2.2 Types of corrosion


Corrosion can be thought of as two distinct types:
A- First - Whisker growth, in which microscopic metal crystals grow out of the surface
of the conductive metal. It is caused by the presence of sulphide molecules (e.g., silver
sulfide on a silver surface), which can migrate freely over the metallic surface and
collect at dendrite boundaries, where nucleation takes place and sulfide crystals
grow out of the surface of the metal.
B- Second - The more conventional corrosive attack, in which acid
gases react with the metals themselves to form non-conductive salts.

Figure 2 “Whisker” growth on circuit

Figure 3 Example of copper creep corrosion on a lead-free circuit board

4.2.3 Causes of corrosion

Corrosion has two causes: first, particulate contaminants and, second, gaseous
contaminants. Both are airborne contaminants.

Particulate contaminants - Failure modes due to dust include, but are not limited to, the
following. Mechanical effects: obstruction of cooling airflow, interference with moving
parts, abrasion, optical interference, interconnect interference, deformation of surfaces (e.g.,
magnetic media) and other similar effects. Chemical effects: dust settled on printed circuit
boards can lead to component corrosion and/or to the electrical short circuiting of closely
spaced features. Electrical effects: impedance changes and electronic
circuit conductor bridging.
Harmful dust in data centres is generally high in ionic content, such as sulphur and chlorine-
bearing salts. The source of this harmful dust is mainly outdoor dust. Coarse dust particles have
a mineral and biological origin, are formed mostly by wind-induced abrasion, and can remain
airborne for a few days.
One mechanism by which dust degrades the reliability of printed circuit boards involves the
absorption of moisture from the environment by the settled dust. The ionic contamination in the
wet dust degrades the surface insulation resistance of the printed circuit board and, in the
worst-case scenario, leads to electrical short circuiting of closely spaced features via ion
migration. In short, the dust absorbs moisture, becomes wet, and promotes corrosion and/or ion
migration, thereby degrading hardware reliability.

Gaseous contaminants - Sulfur-bearing gases, such as sulfur dioxide (SO2) and hydrogen
sulfide (H2S), are the most common gases causing corrosion of electronic equipment. It has been
shown that SO2 or H2S alone is not very corrosive to silver or copper, but the combination of
these gases with other gases such as nitrogen dioxide (NO2) and/or ozone (O3) is very
corrosive. The corrosion rate of copper is a strong function of relative humidity, while the
corrosion rate of silver has a lesser dependence on humidity. Reduced forms of nitrogen
(ammonia (NH3), amines, ammonium ions (NH4+)) occur mainly in fertilizer plants,
agricultural applications, and chemical plants. Copper and copper alloys are particularly
susceptible to corrosion in ammonia environments.
There are three types of gases that are the prime culprits in the corrosion of electronics:
acidic gases, such as hydrogen sulfide, sulfur and nitrogen oxides, chlorine, and hydrogen
fluoride; caustic gases, such as ammonia; and oxidizing gases, such as ozone and nitric acid.
Of the gases that can cause corrosion, the acidic gases are typically the most harmful.

4.2.4 Signs of corrosion
With copper, silver, or composite materials alike, the end result is the same: a disruption of the
contact point. The severity of the environment (i.e., the types and levels of gases, humidity, and
temperature) will determine the speed at which corrosion films are created and the level of
disruption of the flow of electrical current.
Specific components which are particularly sensitive to corrosion attack include: Edge
Connectors, Pin Connectors, Wire-wrap Connections, and Electrical Systems.

Figure 4 One of the failed HBA card

Figure 5 Copper creep corrosion on Memory module

Figure 6 Hard Disk with Copper creep corrosion

4.2.5 Sources of corrosive gases

• Oxidized forms of sulfur (SO2, SO3) are generated as combustion products of fossil fuels
and from motor vehicle emissions.
• The reaction with metals normally occurs when these gases dissolve in water to form
sulphurous and sulfuric acid (H2SO3 and H2SO4).
• Reactive nitrogen oxide compounds (NO, NO2, N2O4) are commonly formed as
combustion products of fossil fuels. In the presence of moisture, some of these gases
form nitric acid (HNO3).
• Active sulfur compounds: this refers to hydrogen sulfide (H2S), elemental sulfur (S), and
organic sulfur compounds such as mercaptans (R-SH). When present at low ppb levels,
they rapidly attack copper, silver, aluminum, and iron alloys. The presence of moisture
and small amounts of inorganic chlorine compounds and/or nitrogen oxides greatly
accelerates sulfide corrosion.
• Chlorine (Cl2), chlorine dioxide (ClO2), hydrogen chloride (HCl), etc.: in the presence of
moisture, these gases generate chloride ions that, in turn, attack most copper, tin, silver,
and iron alloys.
• Reactive species: in addition to ozone (O3), examples include the hydroxyl radical as well
as radicals of hydrocarbons, oxygenated hydrocarbons, nitrogen oxides, sulfur oxides,
and water. Ozone can function as a catalyst in sulfide and chloride corrosion of metals.
• Strong oxidants: this includes ozone plus certain chlorinated gases (chlorine, chlorine
dioxide). Ozone is an unstable form of oxygen that is formed from diatomic oxygen by
electrical discharge or by solar radiation in the atmosphere.

Chapter 5. Qualitative and Quantitative analysis
5.1 Does any equipment survive in GX environment?

In one study that looked at lead-free finishes, four alternate PCB finishes were subjected to an
accelerated mixed flowing gas corrosion test. Important findings can be summarized as follows:
• Immersion gold (ENIG) and immersion silver (ImmAg) surface finishes failed early in
the testing. These coatings are the most susceptible to corrosion failures and may make
the PCB the weak link with regard to the sensitivities of the electronic devices to
corrosion.
• None of the coatings can be considered immune from failure in an ISA Class G3
environment.
• The gold and silver coatings could not be expected to survive a mid to high Class G2
environment based on these test results.

5.2 Why mainly copper and silver testing is required (gold in some cases)

Copper, silver and gold are important functional materials found in many
electrical/electronic devices. Examination of both silver and gold corrosion data has shown
instances of an environment which is non-corrosive to copper being extremely corrosive to
silver and/or gold. It is because of results such as these that any testing which attempts to
predict electrical/electronic equipment reliability should incorporate copper, silver and gold
corrosion as determinants.

5.3 Substitute material other than copper/silver/gold
Copper and silver are the best conducting materials. If a supplier offers a different material, it
needs to ensure comparable conductivity, price, competitiveness, etc. A lot of R&D has been done on
this, but given the competitive market and the conductivity and communication requirements, it is
difficult to produce cards that are 100% free of copper and silver.

5.4 Data available (ABC telecom)


The table below shows live, real data on hardware failure occurrences at ABC Telecom
Company. Each call number represents a case that ABC telecom reported to XYZ Corporation and
logged to rectify the issue, because XYZ is the vendor and AMC provider of all IT equipment,
as already discussed earlier. The logged calls listed here relate purely to hardware failure
(non-hardware calls have been filtered out). From January 2012 to June 2013, a period of one and a
half years (18 months), 71 cases were logged with the OEM. Of these 71 calls, 54 were due to
corrosion, 9 to firmware issues, 7 to power issues, and 1 to
MAN error.
Table 3 Live data ABC telecom

Call Number   Platform   Date   Problem related to   Ca   C F P M T


P4F0SSY Storage 12 January 2012 netapp ext hard disk c 1 1
P4FJVMB Storage 16 January 2012 netapp ext hard disk c 1 1
P4FK85V XSeries 10 February 2012 Blade server HBA c 1 1
P4FK86J XSeries 10 February 2012 Blade server HBA c 1 1
P4FHLB4 Storage 22 February 2012 netapp ext hard disk c 1 1
P4FHH7B XSeries 26 February 2012 Blade server HBA c 1 1
P4FX11X XSeries 29 February 2012 Blade server HBA c 1 1
P4FW89R Storage 12 March 2012 netapp ext hard disk c 1 1
P4FWWTS XSeries 15 March 2012 blade server HBA c 1 1
P4FYBBN XSeries 27 March 2012 blade server RAM F 1 1
P4FYPL8 Storage 26 March 2012 netapp ext hard disk c 1 1
P4FB9ZL XSeries 28 March 2012 blade centre I/o module F 1 1
P4FCGB1 XSeries 12 April 2012 blade battery power lost p 1 1
P4FCXG6 Storage 12 April 2012 netapp ext hard disk c 1 1
P46DPZ7 Storage 30 April 2012 netapp ext hard disk c 1 1
P46LNRX XSeries 17 May 2012 blade server RAM F 1 1
P46LN52 XSeries 17 May 2012 blade battery power lost p 1 1
P46L0VF Storage 17 May 2012 netapp ext hard disk c 1 1
P4658SY Storage 13 June 2012 netapp ext hard disk C 1 1
P46N9XF XSeries 19 June 2012 blade server RAM F 1 1
P46N9HK XSeries 19 June 2012 Blade server HBA c 1 1
P460FYY Storage 02 July 2012 netapp ext hard disk c 1 1
P4606H5 XSeries 02 July 2012 netapp ext hard disk c 1 1
P46010H Storage 02 July 2012 SW power module failed p 1 1
P4606J7 XSeries 02 July 2012 Blade server HBA c 1 1

P460RPW Storage 05 July 2012 SW power module failed p 1 1
P46JH2Z XSeries 10 July 2012 Blade server HBA c 1 1
P462HJV XSeries 16 July 2012 blade server hba c 1 1
P46TBGY XSeries 10 September 2012 blade server HBA c 1 1
P46TBGS XSeries 10 September 2012 blade server HBA c 1 1
P46T0LV Storage 06 September 2012 Com. Net app replaced c 1 1
P46MDWV XSeries 28 September 2012 blade server HBA c 1 1
P46CZBJ Storage 27 September 2012 netapp ext hard disk c 1 1
P41DKKS XSeries 11 October 2012 blade server RAM M 1 1
P4187KP XSeries 18 October 2012 Mother board F 1 1
P417MCM Storage 31 October 2012 netapp ext hard disk c 1 1
P416ZVM XSeries 07 November 2012 blade server mother board c 1 1
P4151CB XSeries 17 November 2012 Blade server HBA c 1 1
P41NFJN Storage 22 November 2012 netapp ext hard disk c 1 1
P41JH2P XSeries 11 December 2012 Blade server HBA c 1 1
P41JH24 XSeries 11 December 2012 blade server HBA c 1 1
P41JH20 XSeries 11 December 2012 blade server HBA c 1 1
P41PCXS Storage 31 December 2012 DS3400 hard disk 1 TB C 1 1
P41ZDMB XSeries 05 January 2013 FIRMWARE Mother board F 1 1
P41ZDS7 XSeries 05 January 2013 FIRMWARE Mother board F 1 1
P41W8NS XSeries 25 January 2013 blade server mother board F 1 1
P41W0L3 XSeries 28 January 2013 Blade server HBA c 1 1
P41W079 XSeries 28 January 2013 blade battery power lost p 1 1
P41TG4L XSeries 04 February 2013 blade battery power lost P 1 1
P41YBP7 Storage 10 February 2013 netapp ext hard disk c 1 1
P41BT9C XSeries 14 February 2013 blade server HBA c 1 1
P41CLDF XSeries 21 February 2013 FIRMWARE Mother board F 1 1
P41CHG7 Storage 25 February 2013 netapp ext Hard disk c 1 1
P41MC9P XSeries 02 March 2013 Blade server HBA c 1 1
P459G8H XSeries 14 March 2013 Blade server mother board c 1 1
P459C48 XSeries 15 March 2013 blade battery power lost p 1 1
P458LWP Storage 18 March 2013 netapp ext hard disk c 1 1
P45L5LK Storage 22 March 2013 netapp ext hard disk c 1 1
P45LHHX Storage 25 March 2013 netapp ext Hard disk c 1 1
P457GYS XSeries 31 March 2013 blade server RAM c 1 1
P4565KW XSeries 06 April 2013 Blade server HBA c 1 1
P4563WX Storage 08 April 2013 netapp ext hard disk c 1 1
P451H1V XSeries 14 April 2013 blade server mother board c 1 1
P457XZS Storage 30 March 2013 Com. Net app replaced c 1 1
P45J74Z Storage 06 May 2013 netapp ext hard disk c 1 1
P45JZDM Storage 07 May 2013 DS3400 hard disk 1 tb c 1 1
P452XTN XSeries 13 May 2013 blade server RAM c 1 1
P4535Y1 Storage 15 May 2013 netapp ext hard disk c 1 1
P45HXY9 XSeries 11 June 2013 blade server RAM c 1 1
P45XPLW Storage 13 June 2013 netapp ext hard disk c 1 1
P45WHTD XSeries 25 June 2013 blade server HBA c 1 0 0 0 1
Total (T): C 54, F 9, P 7, M 1, T 71

Ca Cause
C Corrosion
F Firmware
P Power
M MAN
T Total

Along with the above data, two data centre test coupon (copper/silver reactivity) reports are
available: one shows the DC environment before installation of the Purafil Inc. air cleaning
equipment and the other shows it after installation. The Purafil air cleaning equipment came into
operation on 23 November 2012. The outcomes of both reports, one from before and one from after
this date, are given below.

Table 4 showing result before Purafil Inc. equipment Installation into DC environment

Sales Order #: 99547
CCC Panel #: P81268
Date In: 4/13/2012
Date Out: 5/19/2012
Days In Service: 36
ISA Class: G2 (Moderate)
Copper Corrosion: 738 Å/30 Days
Silver Corrosion: 582 Å/30 Days
Table 5 showing result after Purafil Inc equipment Installation into DC environment

Sales Order #: 1000879
CCC Panel #: P83846
CCC Coupon #: 83846
Date In: 7/15/2013
Date Out: 8/14/2013
Days In Service: 30
ISA Class: G1 (Mild)
Copper Corrosion: 120 Å/30 Days
Silver Corrosion: 160 Å/30 Days
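
The two coupon reports can be compared numerically. The sketch below classifies the copper reactivity values and computes the percentage reduction for both metals. The class limits used (below 300, 1000 and 2000 Å per 30 days for G1, G2 and G3) are the copper limits commonly quoted for ISA 71.04-1985; they are stated here as an assumption and should be checked against the standard itself. The function names are hypothetical.

```python
# Copper reactivity limits in angstroms per 30 days. These are the values
# commonly quoted for ISA 71.04-1985 and are an assumption of this sketch.
COPPER_CLASS_LIMITS = [(300, "G1 Mild"), (1000, "G2 Moderate"), (2000, "G3 Harsh")]


def isa_copper_class(copper_angstroms_per_30_days):
    """Return the severity class for a copper reactivity rate."""
    for upper_limit, label in COPPER_CLASS_LIMITS:
        if copper_angstroms_per_30_days < upper_limit:
            return label
    return "GX Severe"


def percent_reduction(before, after):
    """Return how much lower 'after' is than 'before', in percent."""
    return (before - after) / before * 100


# Reactivity rates from the reports in Table 4 (before) and Table 5 (after)
before = {"copper": 738, "silver": 582}
after = {"copper": 120, "silver": 160}

print("Before installation:", isa_copper_class(before["copper"]))
print("After installation:", isa_copper_class(after["copper"]))
for metal in before:
    reduction = percent_reduction(before[metal], after[metal])
    print(f"{metal}: {reduction:.1f}% lower after installation")
```

With these limits the copper readings fall into the same classes that the two reports state (G2 before, G1 after).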

5.5 Limitations of data


Very limited live data were available and received from industry. For analysis purposes I
will therefore make some assumptions, carry out the analysis, and derive outputs and results that
are applicable in a real scenario.

5.6 Symptoms of hardware failure


The symptoms of hardware failure are the signs by which we can identify and confirm that
hardware has actually failed.
The symptoms are that the hardware is not detected by the system, does not work
properly, and produces errors/alerts in the log file. We can check this after logging into the
system, and also via the BIOS list. There are also physical indicators, such as LED lights (an
orange LED) on the system, which show that there is some issue with the hardware.

5.7 Causes of hardware failure


I went through many articles and journals containing research and information about hardware
failure. But the question is "How many possible reasons are there for hardware to fail?". To
answer this, I held several conversations with OEM hardware engineers, who are
working professionals with many years of experience.

The conclusion was that only four reasons cause hardware failure.
These are:
1) Corrosion
2) Firmware
3) Power
4) Man

5.8 Cause and effect diagram (fishbone diagram)


Having identified the reasons for hardware failure, I analysed the causes and effects of each
reason.

[Figure: fishbone (cause and effect) diagram of hardware failure. Effect: Hardware Failure.
Main causes: MAN, Firmware, Power, Corrosion.]
MAN: Poor documentation; Not following standard; Not proper assembled practice; Inadequate
monitoring; Poor training / lack of technical knowledge
Firmware: Bug in chip; Bug in old firmware; Bug in board; Firmware upgrade error
Power: Multiple time power outage; Voltage fluctuation; Not proper earthing; Power module
failed; Battery power failed
Corrosion: Industry gases; Lack of cleanness; Working / storage practice; Gaseous contaminants
inside the DC (corrosive gases); Particulate contaminants (dust); Outdoor dust; Electrical /
electric gases; Humidity; Temperature fluctuation; Chloride presence; AC not proper functioning

Figure 7 Fishbone diagram

The above figure identifies all major causes and their sub-causes.

5.9 Data analysis


As we have the live data shown in section 5.4 above, I have classified all logged cases into the
four reasons. The classified causes can be seen in the fishbone diagram.

5.10 Pareto chart analysis
Here a Pareto chart is used to identify the top reason for hardware failure.

[Figure: Pareto chart of causes of hardware failure. Left axis: Count (0 to 80); right axis:
Percent (0 to 120). Bars: Corrosion 54, Firmware 9, Power 7, MAN 1. Cumulative percent: 76.06,
88.74, 98.6, 100.]

Figure 8 Pareto chart of cause of hardware failure
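
The figures behind the Pareto chart can be reproduced from the cause totals of Table 3. The following is a minimal sketch: the counts are those reported in section 5.4, and the function name is hypothetical.

```python
# Failure counts per cause, taken from the totals row of Table 3
failure_counts = {"Corrosion": 54, "Firmware": 9, "Power": 7, "MAN": 1}


def pareto_table(counts):
    """Return (cause, count, share %, cumulative %) rows, largest cause first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running_count = 0
    for cause, count in ordered:
        running_count += count
        share = round(100 * count / total, 2)
        cumulative = round(100 * running_count / total, 2)
        rows.append((cause, count, share, cumulative))
    return rows


for cause, count, share, cumulative in pareto_table(failure_counts):
    print(f"{cause:<10} count={count:>2}  share={share:>6}%  cumulative={cumulative:>6}%")
```

Because the cumulative column is computed from the running count rather than by adding rounded shares, its middle values may differ from the chart in the last decimal place.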

Responsible for 76 % of failures; the problem stems from 25 % of the reasons

CORROSION   Issue bears 25 %   Responsible 76.06 %
FIRMWARE    Issue bears 25 %   Responsible 12.68 %
POWER       Issue bears 25 %   Responsible 9.86 %
MAN         Issue bears 25 %   Responsible 1.41 %

Figure 9 Failure rate matrix

After studying the Pareto chart and the failure rate matrix, we can clearly identify corrosion as
the single reason responsible for the majority of downtime, operational loss and hardware failure.

5.11 Root cause analysis of corrosion (why-why)


We have now established that 76 % of the issues and problems arise from 25 % of the reasons, i.e.
corrosion. This leads us to a root cause analysis. The question is: what is the root
cause of corrosion?

WHY-WHY ANALYSIS - Corrosion


Q. Why did the hardware fail?
Ans. Because it is not working or is not detected by the system.
Q. Why is it not working?
Ans. Because of a corrosion issue.
Q. Why does the corrosion issue occur?
Ans. Because of environmental factors (gases & dust).
Q. Why does the environment cause corrosion?
Ans. Corrosive gases react with electronic equipment and make it faulty.
Q. Why are corrosive gases present in the data centre?
Ans. Because the data centre air quality does not meet the standard and gases enter
from external sources. Gaseous contaminants, particulate contaminants,
temperature control and humidity fluctuations are the responsible factors.

Figure 10 why-why analysis

5.12 Who is affected directly & indirectly
Corrosion directly affects the customer, who suffers a loss of operational profit and customer
loyalty because its automated IT system is not available around the clock (24x7). As
ABC is a telecom company, it needs 100% uptime to keep its business
running. Its ERP application is used throughout India by three types of business user:
1) Mobile VAS (value added service) services.
2) Service centres across the country
3) Employees (corporate users)

Indirectly, the OEM, XYZ corp., is also affected by the corrosion issue, as it is responsible for
maintaining and supporting its equipment. The OEM sells its equipment and signs an AMC (annual
maintenance contract), which is the business model that keeps it in the market. It is committed to
meeting the standard and maintaining its brand value, yet it is suffering a major loss of revenue
that should not be a necessary part of its losses.

5.13 Solution
We can neither change the entire environment overnight nor
shut down our data centre, because in the 21st century IT (Information technology) plays a role
like the heartbeat of a body. We cannot survive and sustain in any industry without this lifeline
(IT and the automated data centre).

Removing the revenue loss is essential from both the customer and the OEM perspective. A
solution is needed to control and reduce hardware failure. It will reduce the
operational loss of the customer and the replacement cost (new hardware) of the vendor.
To overcome this issue we need to clean and maintain the DC environment. This is the only option
for both parties. To maintain the DC environment we need to install new air purifier
equipment.

As analysed earlier, two reports are available, the second taken after the Purafil Inc. device
(PPU 500V outlet) came into operation. The device has controlled the environment and brought down
the reaction rates of copper and silver as per ISA Standard 71.04-1985. Once the environment is in
a controlled condition, it is beneficial from all perspectives.

5.14 Monetary comparison and cost expenses (ABC telecom perspective)


The mandatory objective of any organization is the least possible operational loss. Below is a
comparison, based on assumptions, of the cost incurred annually by ABC telecom.

Here we identify what will happen if the customer adopts the international DC standard "or"
continues with the current uncontrolled environment.

Before comparing the monetary impact of the controlled and uncontrolled environments,
we first need to convert the impact of downtime hours into a numeric figure
(cost of downtime). Only then can we calculate the benefit of the controlled environment.

Cost of Downtime Assumed (In one year)


No. of working days = 300
Operation hours in a day = 8
Total operation hours in a year = 2400
Downtime 1% = 2400 x 0.01 = 24 hours
Now,
Total Service Revenue of ABC telecom for Financial year 2012-2013 is Rs. 24019 (in lacs)
Source: ABC telecom
(Audited Consolidated Results for the Financial Year ended June 30, 2013)
• Supposing this service revenue was generated in 99% of operational hours = Rs.
24019 (in lacs)
• Estimated revenue generation in 100% of operational hours = 24019 * (2400/2376) = Rs.
24261.62 (in lacs)
Service Operation Loss at 1% downtime = Rs. 24261.62 - Rs. 24019 = Rs. 242.62 (in lacs)

The cost of downtime will differ from organization to organization. My
assumption here is that downtime of the ERP application directly affects service revenue and
the contribution of service profit to the organization. At the front end, the ERP application is
not available to end users and customers, which leads to loss, while at the back end this is caused
by hardware failure. The lifetime value of the downtime saving is yet to be considered.
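
The cost of downtime calculation above can be written as a short script. This is a minimal sketch under the same assumptions as the text: 300 working days of 8 hours, and the reported service revenue earned in 99% of the operating hours. The function name is hypothetical, and the linear scaling to higher downtime percentages follows the values plotted in Figure 11.

```python
WORKING_DAYS = 300
HOURS_PER_DAY = 8
SERVICE_REVENUE_LACS = 24019  # FY 2012-2013, assumed earned in 99% of operating hours


def downtime_loss_lacs(revenue_lacs, downtime_fraction, total_hours):
    """Revenue lost to downtime, given the revenue actually earned while up."""
    up_hours = total_hours * (1 - downtime_fraction)
    full_revenue = revenue_lacs * total_hours / up_hours
    return full_revenue - revenue_lacs


total_hours = WORKING_DAYS * HOURS_PER_DAY  # 2400 operating hours per year
loss_at_1_percent = downtime_loss_lacs(SERVICE_REVENUE_LACS, 0.01, total_hours)
print(f"Loss at 1% downtime: Rs. {loss_at_1_percent:.2f} lacs")

# Scale the 1% loss linearly for 1% to 10% downtime
for percent in range(1, 11):
    print(f"{percent:>2}% downtime -> Rs. {loss_at_1_percent * percent:.2f} lacs")
```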

[Figure: bar chart "Cost of Downtime: loss at every 1 % increase in downtime". Vertical axis:
Rs (in lacs), 0 to 2600. Values for 1 % to 10 % downtime: 242.62, 485.23, 727.85, 970.46,
1213.08, 1455.70, 1698.31, 1940.93, 2183.55, 2426.16.]

Figure 11 cost of downtime

The above calculation shows the possible service revenue loss for the current year of
operation if downtime rises to as much as 10% of the total working hours in the year.

Monetary comparison and cost expenses (ABC telecom perspective)
Cost incurred by Customer
Comparison -> Follow the ISA standard or continue with the current environment

Continue with current environment:
A  OEM: cost of replaced parts 'OR' AMC
   • If the cost of the supplied/running device is 100 lacs then AMC would be approx
     15% (15 lacs)
   • Total cost one year = 15 (in lacs)
+
B  Cost of Downtime
   • Operational loss = 242.62 (in lacs)
=
Total cost first year = A + B = 257.62 (in lacs)
Total cost second year = A + B = 257.62 (in lacs), and may be more.

Vs.

Follow ISA standard:
C  Air Purifier equipment
   • Installed equipment price would be around 10 lakh and AMC would be approx
     15% for a G2 level environment
   • Total cost first year = 10 (in lacs)
   • Second year AMC cost = 1.5 lakh
+
D  Cost of Downtime, since the equipment will cut out the corrosion downtime,
   which is 76.06 % of total downtime
   • Operational loss = 58.08 (in lacs)
=
Total cost first year = A + C + D = 83.08 (in lacs)
Total cost second year = A + C (AMC) + D = 74.58 (in lacs)

Figure 12 Monetary analysis (ABC telecom perspective)
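
The customer-side comparison in Figure 12 can be re-computed as follows. This is a sketch under the assumptions stated in the figure: OEM AMC of 15 lacs, purifier price of 10 lacs with a second-year AMC of 1.5 lacs, downtime loss of 242.62 lacs at 1% downtime, and a controlled environment that removes the 76.06% of downtime attributed to corrosion. The variable names are hypothetical.

```python
# All amounts are in lacs of rupees and are the assumed figures from Figure 12
OEM_AMC = 15.0            # A: annual maintenance charge paid to the OEM
DOWNTIME_LOSS = 242.62    # B: operational loss at 1% downtime
PURIFIER_PRICE = 10.0     # C: air purifier equipment, first year
PURIFIER_AMC = 1.5        # C: air purifier AMC, second year
CORROSION_SHARE = 0.7606  # share of downtime attributed to corrosion

# D: downtime loss that remains once corrosion-related downtime is removed
residual_loss = round(DOWNTIME_LOSS * (1 - CORROSION_SHARE), 2)

uncontrolled_per_year = round(OEM_AMC + DOWNTIME_LOSS, 2)
controlled_first_year = round(OEM_AMC + PURIFIER_PRICE + residual_loss, 2)
controlled_second_year = round(OEM_AMC + PURIFIER_AMC + residual_loss, 2)

print("Uncontrolled environment, each year:", uncontrolled_per_year)
print("Controlled environment, first year:", controlled_first_year)
print("Controlled environment, second year:", controlled_second_year)
print("First-year difference:", round(uncontrolled_per_year - controlled_first_year, 2))
```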

On the basis of the above calculation (assuming 1% downtime), the customer will benefit within
the current year, and the benefit will increase gradually in the following year. This will improve:
• customer loyalty
• brand name
• profit
• trust & relationship between company and consumer
• With this, the customer (ABC telecom) will be able to negotiate a lower AMC charge from
next year.

Here I assumed a fixed operational profit in the longer run, i.e. 48.45 lakh. In practice, if we
apply this theory, operational profit will increase every year as the company grows.
This will build brand name and customer loyalty. An implicit assumption here
is that this applies only to a steadily growing company, in a positive and ideal market, with no
global recession.

5.15 Monetary benefit and cost expenses (OEM perspective)

Here we identify the cost incurred by the OEM and what the OEM's strategy with its customer
could be in both cases (uncontrolled environment DC and controlled environment DC).
To reach an outcome, I assume two scenarios, which are resolved by the calculation
that follows.
With Uncontrolled environment:
XYZ corp. continues the AMC contract at the current price.
OR
XYZ corp. continues the AMC contract at an increased price.

AND in Controlled Environment:


XYZ corp. continues the AMC contract at a reduced price (if applicable)
OR
XYZ corp. continues the AMC contract in the controlled environment at a non-negotiated price.

Table 6 Cost incurred by OEM

Part           Approx      Failure count   Item based   C    C cost    F   F cost   P   P cost   M   M cost
               price Rs    in 18 months    total cost
Mother board   52000       8               416000       3    156000    5   260000   0   0        0   0
HBA card       40000       21              840000       21   840000    0   0        0   0        0   0
RAM            9000        7               63000        3    27000     3   27000    0   0        1   9000
controller     320000      2               640000       2    640000    0   0        0   0        0   0
disk 450 Gb    32000       23              736000       23   736000    0   0        0   0        0   0
disk 1 Tb      35000       2               70000        2    70000     0   0        0   0        0   0
SW switch      260000      2               520000       0    0         0   0        2   520000   0   0
Battery        500         5               2500         0    0         0   0        5   2500     0   0
I/O module     11000       1               11000        0    0         1   11000    0   0        0   0
Total                      71              3298500      54   2469000   9   298000   7   522500   1   9000
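
The totals row of Table 6 can be cross-checked by multiplying each part's approximate price by its failure count per cause. The sketch below uses the prices and counts exactly as listed in the table; the data structure and function name are hypothetical.

```python
# (part, approximate unit price in Rs, failures per cause) as listed in Table 6
PARTS = [
    ("Mother board", 52000, {"C": 3, "F": 5}),
    ("HBA card", 40000, {"C": 21}),
    ("RAM", 9000, {"C": 3, "F": 3, "M": 1}),
    ("controller", 320000, {"C": 2}),
    ("disk 450 Gb", 32000, {"C": 23}),
    ("disk 1 Tb", 35000, {"C": 2}),
    ("SW switch", 260000, {"P": 2}),
    ("Battery", 500, {"P": 5}),
    ("I/O module", 11000, {"F": 1}),
]


def totals_by_cause(parts):
    """Sum failure counts and replacement cost for each cause code."""
    counts = {"C": 0, "F": 0, "P": 0, "M": 0}
    costs = {"C": 0, "F": 0, "P": 0, "M": 0}
    for _, unit_price, failures in parts:
        for cause, count in failures.items():
            counts[cause] += count
            costs[cause] += unit_price * count
    return counts, costs


counts, costs = totals_by_cause(PARTS)
for cause in counts:
    print(f"{cause}: {counts[cause]} failures, Rs. {costs[cause]}")
print("All causes:", sum(counts.values()), "failures, Rs.", sum(costs.values()))
```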

Here the cost is calculated in two ways, i.e.
Total cost of hardware that failed or was replaced by the OEM.
Total cost for each of the four causes of hardware failure, summarized below.

Cause                     C cost Rs   P cost Rs   F cost Rs   Man Rs   Total Rs

Cost in 1.5 years in Rs   2469000     522500      298000      9000     3298500
Cost in 1 year in Rs      1646000     348333.33   198666.67   6000     2199000.00
Cost in lacs (1 year)     16.46       3.48        1.99        0.06     21.99
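
The conversion from the 18-month cost to a one-year cost in lacs, and each cause's share of the annual cost, can be sketched as below. It assumes the cost is spread evenly over the 18 months and uses 1 lac = 100,000 rupees. The function name is hypothetical.

```python
RUPEES_PER_LAC = 100_000

# Replacement cost per cause over the 18-month period (Rs), from the table above
COST_18_MONTHS = {"Corrosion": 2469000, "Power": 522500, "Firmware": 298000, "Man": 9000}


def annualise(cost, months=18):
    """Scale a cost observed over 'months' to 12 months, assuming an even spread."""
    return cost * 12 / months


annual_cost = {cause: annualise(cost) for cause, cost in COST_18_MONTHS.items()}
annual_total = sum(annual_cost.values())

for cause, cost in annual_cost.items():
    in_lacs = cost / RUPEES_PER_LAC
    share = 100 * cost / annual_total
    print(f"{cause:<10} {in_lacs:6.2f} lacs per year  ({share:5.2f}% of total)")
print(f"Total      {annual_total / RUPEES_PER_LAC:6.2f} lacs per year")
```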

Based on the cost incurred by the OEM due to corrosion, power, firmware, and Man, I
analyse the major cause with the help of a Pareto chart. This confirms the previous
finding: corrosion is the major issue and needs to be addressed thoroughly.

[Figure: Pareto chart, cost of causes in hardware failure. Left axis: Cost in lacs (0 to 20);
right axis: Percent (0 to 120). Bars: Corrosion cost 16.46, Power cost 3.48, Firmware cost 1.99,
Man cost 0.06. Cumulative percent: 74.85, 90.69, 99.73, 100.]

Figure 13 Pareto analysis cost of causes in hardware failure
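The cumulative percentages in the Pareto chart can be recomputed from the per-cause costs. The sketch below uses the 18-month rupee figures, since annualizing every cause by the same factor leaves the shares unchanged; the function name `pareto` is illustrative.

```python
# Cost per cause in Rs over 18 months (shares are identical for the annual figures).
cause_costs = {
    "Corrosion": 2469000,
    "Power": 522500,
    "Firmware": 298000,
    "Man": 9000,
}


def pareto(costs):
    """Rank causes by cost (descending) and attach the cumulative share in percent."""
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    running = 0
    rows = []
    for cause, cost in ranked:
        running += cost
        rows.append((cause, cost, round(100 * running / total, 2)))
    return rows


rows = pareto(cause_costs)
for cause, cost, cumulative in rows:
    print(f"{cause:<10} {cost:>8} Rs  cumulative {cumulative:.2f}%")
```

The first row carries the headline result: corrosion alone accounts for roughly three quarters of the replacement cost.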

The Pareto analysis again shows that 74.85% of the hardware failure cost comes from corrosion,
which is caused by the uncontrolled environment at the customer data centre. XYZ Corp. (the OEM)
therefore bears a major share of excess cost in the particular regions where such client data
centres are located and where it has been providing AMC support consistently.

On the basis of the cost data analyzed above, we now work out the profit and loss for XYZ Corp.
What should its AMC strategy be with a customer in a controlled or an uncontrolled
environment? How does the corrosion issue affect monetary value and business profit?

The comparison below addresses the scenarios discussed above.

Monetary benefit and cost (XYZ Corp. perspective)

Comparison of cost incurred by OEM: AMC continued in an uncontrolled vs. a controlled environment

Uncontrolled environment
A. AMC charged to customer
• If the cost of the supplied/running device is 100 lacs, the AMC would be approx. 15% (15 lacs)
• Total AMC charged in 1 year = 15 (in lacs)
B. Cost of replacement
• Replacement cost = 21.99 (in lacs)
Total business loss in a year = B - A = 21.99 - 15 = 6.99 (in lacs)

Controlled environment
C. AMC charged to customer
• If the cost of the supplied/running device is 100 lacs, the AMC would be approx. 15% (15 lacs)
• Total AMC charged in 1 year = 15 (in lacs)
D. Cost of replacement, reduced because a controlled environment removes the corrosion cost,
which is 74.85% of the total cost
• Replacement cost = 21.99 - 16.46 = 5.53 (in lacs)
Total business profit in a year = C - D = 15 - 5.53 = 9.47 (in lacs), and may be more.

Figure 14 Monetary analysis (XYZ Corp. perspective)
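The arithmetic of Figure 14 can be restated as a small calculation. This is a sketch under the figure's own assumptions (a device worth 100 lacs, an AMC rate of about 15%, and a controlled environment that removes the whole corrosion cost); the function name `annual_margin` is illustrative.

```python
DEVICE_VALUE_LACS = 100.0       # assumed device cost, as in Figure 14
AMC_RATE = 0.15                 # AMC charged as approx. 15% of device cost
REPLACEMENT_COST_LACS = 21.99   # annual replacement cost, all causes
CORROSION_COST_LACS = 16.46     # annual replacement cost due to corrosion


def annual_margin(amc_revenue, replacement_cost):
    """OEM margin for one year in lacs: positive is profit, negative is loss."""
    return round(amc_revenue - replacement_cost, 2)


amc_revenue = DEVICE_VALUE_LACS * AMC_RATE

# Uncontrolled environment: the OEM bears the full replacement cost.
uncontrolled = annual_margin(amc_revenue, REPLACEMENT_COST_LACS)

# Controlled environment: corrosion-related replacements are assumed to disappear.
controlled = annual_margin(amc_revenue, REPLACEMENT_COST_LACS - CORROSION_COST_LACS)

print(f"Uncontrolled: {uncontrolled} lacs")
print(f"Controlled:   {controlled} lacs")
```

The negative margin in the uncontrolled case corresponds to the 6.99 lacs business loss, and the positive margin in the controlled case to the 9.47 lacs business profit.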

On the basis of the above comparison of uncontrolled vs. controlled environments, we can say the
following.

First scenario, OEM strategy in an uncontrolled environment:

XYZ Corp. should continue the AMC contract at an increased price.

Second scenario, OEM strategy in a controlled environment:

XYZ Corp. can continue the AMC contract at a reduced price (if applicable).

After working through both scenarios, a leading question arises:

Q: What price should XYZ Corp. offer the customer for AMC in uncontrolled and controlled
environments?
Ans: The above outcome clearly shows that the price should differ from customer to customer.
The live data of ABC telecom shows a specific replacement pattern and cost history, so a
complete study of each customer is needed.
The pricing strategy would therefore be customized for each customer.
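One way to make the per-customer pricing concrete is to compute the AMC rate at which revenue just covers the expected replacement cost. The sketch below is an illustration only: `break_even_amc_rate` is a hypothetical helper, and it ignores labour, logistics and profit margin, which the study does not quantify.

```python
def break_even_amc_rate(annual_replacement_cost, device_value):
    """AMC rate (fraction of device value) at which revenue equals replacement cost."""
    if device_value <= 0:
        raise ValueError("device_value must be positive")
    return annual_replacement_cost / device_value


# Figures for ABC telecom from the analysis above, in lacs per year,
# against the assumed device value of 100 lacs.
uncontrolled_rate = break_even_amc_rate(21.99, 100.0)
controlled_rate = break_even_amc_rate(5.53, 100.0)

print(f"Break-even AMC rate, uncontrolled: {uncontrolled_rate:.2%}")
print(f"Break-even AMC rate, controlled:   {controlled_rate:.2%}")
```

Under these assumptions the current rate of about 15% sits below break-even in the uncontrolled case and well above it in the controlled case, which is consistent with the two scenarios stated above.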

5.16 Win-Win condition (ABC telecom, XYZ Corp., Purafil Inc.)


The three organizations operate in different business domains. ABC telecom and XYZ Corp. are
both directly affected by a single environmental factor, corrosion, and this links them to an
organization in a third domain, Purafil Inc.

XYZ Corp.: business based on IT solutions, automation and AMC
ABC telecom: business based on non-stop business operation (24x7)
Purafil Inc.: business based on clean-environment assurance and AMC

So, from every side's perspective, all three will be in a win-win situation if the one
environmental issue, CORROSION, is identified and rectified.

Chapter 6. Measuring methodology (Purafil Inc.)
6.1 Assessment
Air Quality Assessments:
• Outside Air Quality
• Inside Air Quality
• Air Handling Unit (AHU) Chemical Filtration
Air Handling Assessments:
Computer Room Air Conditioning (CRAC) Unit,
Makeup (Outdoor, Fresh) Air Handler and Air-Side Economizers
• Particulate Filtration Data
• Temperature Data
• Relative Humidity Data
Room Integrity Assessments:
• Door Fan Test
• Air Leak Test (Smoke Test)
• Room Differential Pressure Test
Electronic Equipment Assessments:
• Review of Manufacturer’s Specifications
• Electronic Board Analysis

6.2 Real time assessment


Purafil’s OnGuard® 3000 real-time reactivity monitor provides a continuous measure of
atmospheric corrosion levels, which can identify trends and enable preventative action before
severe damage occurs. OnGuard readings have been demonstrated to correlate accurately with
ISA’s air quality classification scheme. Place an OnGuard monitor at the outlet of your data
center or telecom facility’s HVAC (heating, ventilation, and air conditioning) system, or in the
room of concern, to establish air quality baselines and identify specific corrosion-related events.
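To illustrate how a reactivity reading maps onto the ISA classification, the sketch below assigns a severity class from a copper corrosion rate. The thresholds are the copper reactivity limits commonly quoted from ISA-71.04-1985 and are stated here as an assumption to be verified against the standard; the function `isa_severity_class` is hypothetical and does not represent any OnGuard interface.

```python
# Upper limits of copper reactivity in angstroms per 30 days, as commonly quoted
# from ISA-71.04-1985 (assumption: check against the standard before relying on it).
ISA_COPPER_LIMITS = [
    (300, "G1"),    # mild
    (1000, "G2"),   # moderate
    (2000, "G3"),   # harsh
]


def isa_severity_class(copper_angstroms_per_30_days):
    """Map a copper reactivity rate to an ISA severity class label (G1, G2, G3, GX)."""
    if copper_angstroms_per_30_days < 0:
        raise ValueError("reactivity rate cannot be negative")
    for upper_limit, label in ISA_COPPER_LIMITS:
        if copper_angstroms_per_30_days < upper_limit:
            return label
    return "GX"  # severe: at or above the highest listed limit


for reading in (150, 300, 1500, 2500):
    print(reading, "->", isa_severity_class(reading))
```

A data centre environment that is under control would be expected to fall in the G1 band.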

Chapter 7. Protection and control (By Purafil Inc.)
7.1 Protection methodology
CRAC, Makeup or Air-Side Economizer Retrofits:
• Combination Particulate and Gas Phase Filters
• Compact Gas Phase Filters
• Gas Phase Filtration Modules
Additional Equipment with Particulate and Gas Phase Filtration:
• Recirculation and/or Pressurization Cabinets
• Equipment for Outside Air Handling Systems
• Scrubbers (Compatible with Cabinets)
• Under-Floor Air Filtration

Assuming a data center’s HVAC system is already equipped with adequate particulate filtration,
gaseous air cleaning can be used in conjunction with the existing air handling systems. Purafil
gas phase air filters or filtration systems employing one or more of its adsorbent and/or
chemisorbent media can effectively reduce gaseous contaminants to well below specified levels.
Properly applied, gaseous air cleaning also has the potential for energy savings.

7.2 Control process

Chapter 8. Findings / Outcomes
When we started to investigate this topic, we had some idea of the issue but lacked deeper
knowledge of the subject and its several impacts. At this stage we can say that there is an
answer and a solution.

8.1 Causes & impact - controllable

• Corrosion is controllable. Section 5.4 presented two live reports: one taken before the
installation of the air purifier equipment, and another which clearly stated that the data
centre environment is under control and falls under ISA class G1.
• Temperature
• Humidity
• Power: this cause is also not discussed in this paper, but with redundant and multiple
sources of power supply it can also be treated as a controllable cause.
• Man: skilled resources follow standard practice.
• The monetary impact arising from corrosion, temperature, humidity, power and man can be
controlled if we address the major issue soundly.

8.2 Causes & impact - partially controllable

• Firmware is partially controllable. Its impact and control factors are not discussed in this
paper, although its sub-causes appear in the cause and effect diagram. Firmware updating is a
continuous activity: new hardware designs are developed all the time, and their code needs to
be upgraded from time to time. Eliminating 100% of the failure reasons is difficult, but the
failure rate and cost can be reduced by following precautionary measures.

• Monetary impact due to firmware: it is difficult to quantify or forecast.

8.3 Causes - Uncontrollable

• The external environment is totally uncontrollable unless environmental institutions and the
Government intervene strongly on this issue. A stronger policy is needed for urban areas.

Chapter 9. Conclusion
The dissertation set out to investigate the key causes of frequent parts replacement and its
monetary implications for the key stakeholders. The study was conducted at the premises of
XYZ Corp. and ABC telecom. The relevant data was collected and investigated by means of a
problem-solving tool, Pareto analysis. We observed that the majority of parts replacements
happened because of corrosion. We then investigated the causes of the corrosion by means of
another problem-solving tool, root cause analysis (RCA). The RCA revealed that corrosion
happens in the server room because outside air enters the room through the air conditioners.
The air in the open environment is an external factor and cannot be controlled, but the air
intake of the air conditioners can be controlled using advanced air-purifying equipment. The
dissertation concludes that companies installing their servers in such environments should
control the composition of the server room air by means of advanced air-purifying equipment.
We further extended the study to understand the monetary impact of parts replacement and
corrosion on the supplier and the client. Our analysis concluded that the supplier is offering
an AMC to the client without a proper assessment of the total cost of the AMC. An analysis was
made to indicate what the price of the AMC should be if the environment in which the servers
operate is not controlled, and what it should be if the air environment is controlled. A
separate analysis was made for the client to understand whether the AMC amount paid is worth
it, what the monetary impact of not implementing air-purifying equipment is on the client's
overall service business, and what savings can be achieved if the solutions are implemented.

Chapter 10. Recommendation


To Company/Client (Data centre owner)
We recommend that the client implement advanced air-purifying equipment solutions and take
note of the long-term savings that can be gained in the service business. The dissertation
supports the argument on the basis of a one-year analysis. The lifetime value of the service
business is much more than is visible in one year.

Companies that are setting up a new DC:
• Need awareness of corrosion facts.
• Should address the corrosion problem during their planning phase.
• Should take advice from industry experts.
• Should establish their DC in a non-corrosive region.
• If constraints force them to establish the DC in a corrosive region, should follow the ISA
standard and install air purifiers from the beginning.
Companies that already have a running DC:
• Need awareness of corrosion facts and should analyze the monetary impact.
• Should take advice from industry experts and explore solutions to control the server room
environment, such as those from Purafil Inc.
• Should move their DC to a non-corrosive region OR follow the ISA standard and install air
purifiers as a DC component.
• Should analyze which of the above steps is less costly and more viable for the company.

To OEM / Vendor / Supplier of electronic equipment

We recommend that the supplier re-assess the price of the AMC offered to the client and
conduct a detailed investigation of the total cost of the AMC.
In more detail:
• Make new and current customers aware of corrosion as a component of marketing.
• Develop individual CRM and accountability for every customer with respect to corrosion.
• Make a list of failed hardware and keep track records (number of failures, type, cost, reason,
frequency, repeat replacement of the same hardware, repetition interval, history, and
pattern).
• Prepare reports and mail them quarterly.
• Present the report at the time of AMC renewal.
• Structured reporting helps justify the price to the customer and leads to a sustainable AMC.
• Structured reporting supports sticking to the scenarios discussed in section 5.15.
First scenario, OEM strategy:
XYZ Corp. should continue the AMC contract in an uncontrolled environment at an increased
price.

Second scenario, OEM strategy:
XYZ Corp. could continue the AMC contract in a controlled environment at a reduced price or at
the same price.
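The track records recommended above for failed hardware could be kept in a simple structure such as the one sketched here. The field names, the `FailureRecord` class and the `summarize_failures` function are illustrative assumptions, and the sample dates are made up purely to show the repetition-interval calculation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailureRecord:
    customer: str
    part: str
    cause: str          # e.g. "corrosion", "firmware", "power", "man"
    cost_rs: int
    replaced_on: date


def summarize_failures(records):
    """Per part: number of failures, total cost, and days between repeat replacements."""
    by_part = defaultdict(list)
    for record in records:
        by_part[record.part].append(record)

    summary = {}
    for part, part_records in by_part.items():
        ordered = sorted(part_records, key=lambda r: r.replaced_on)
        intervals = [
            (later.replaced_on - earlier.replaced_on).days
            for earlier, later in zip(ordered, ordered[1:])
        ]
        summary[part] = {
            "failures": len(ordered),
            "total_cost_rs": sum(r.cost_rs for r in ordered),
            "repeat_intervals_days": intervals,
        }
    return summary


# Illustrative records only; the dates are hypothetical.
records = [
    FailureRecord("ABC telecom", "HBA card", "corrosion", 40000, date(2013, 1, 10)),
    FailureRecord("ABC telecom", "HBA card", "corrosion", 40000, date(2013, 2, 9)),
    FailureRecord("ABC telecom", "Battery", "power", 500, date(2013, 3, 1)),
]
report = summarize_failures(records)
for part, stats in report.items():
    print(part, stats)
```

A summary of this kind, produced per customer each quarter, would give the OEM the replacement pattern and cost history needed at AMC renewal.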

To air purifier company (e.g. Purafil Inc.)

Companies operating in the market for air-purifying equipment solutions are recommended to
conduct a detailed analysis of the monetary impact on both the supplier and the client. They
should run awareness programmes and pitch to suppliers and clients to increase their
awareness of the long-term benefits of implementing their solutions.

References:
• ISA (1985). Standard ISA-71.04-1985: changes required for protection of today’s process
control. Chris Muller, Grant Crosley.
• Muller, C.O. “Combination Corrosion Coupon Testing Needed for Today’s Control
Equipment”. Purafil Inc.
• Muller, Christopher O. Control of corrosive gases to avoid electrical equipment failure.
• ASHRAE (2011). 2011 gaseous and particulate contamination for data centre.
• EU (2003). Restriction of Hazardous Substances Directive (RoHS). European Union.
• Purafil Inc. (1987). “Humidity and Corrosion”.
• Samb, Maia. Corrosion: the unseen enemy. Purafil Inc.
• Purafil Inc. Data centre brochure.
• Corrosion control for mission critical facilities. Technical brochure TB-1800.
• Purafil Inc. Corrosion Monitoring.
• Klein, L.J., Singh, P.J., Schappert, M., Griffel, Marc, Hamann, H.F. Corrosion control for
data centres. IBM research report.
• Hienonen, Risto & Lahtinen, Reima (2007). Corrosion and climatic effects in electronics.
Espoo. VTT Publications.
