Dissertation-Index - Writeup-V2
PGPM (PT)
by
October, 2013
A Study of the Frequent Hardware Failure,
Replacement and its Monetary Impact Analysis
Certificate of Approval
The following dissertation titled “A Study of the Frequent Hardware Failure, Replacement
and its Monetary Impact Analysis” is hereby approved as a substantive work/study in
management carried out and presented in a manner satisfactory to warrant its acceptance as a
prerequisite for the award of PGPM (PT) for which it has been submitted. It is understood that
by this approval the undersigned do not necessarily endorse or approve any statement made,
opinion expressed or conclusion drawn therein but approve the dissertation only for the
purpose it is submitted.
Name Signature
Certificate from Dissertation Advisory Committee
This is to certify that Mr. SANTOSH AMAN, a participant of the Part-time Post Graduate
Programme in Management has worked under our guidance and supervision. He is submitting
this dissertation titled “A Study of the Frequent Hardware Failure, Replacement and its
Monetary Impact Analysis” in partial fulfillment of the requirements for the award of the
PGPM (PT).
This dissertation meets the requisite standard, and to the best of our knowledge no part of it
has been reproduced from any other dissertation, monograph, report or book.
Abstract
SANTOSH AMAN
In today’s world, almost every market is highly competitive. Since the dot-com boom of the late
1990s, industries have become increasingly competitive, and every company is fighting for
sustainability and growth. Changes in the composition of the surrounding environment, and the
problems they cause, are becoming one of the key challenges for data centre operations and,
through them, for business operations. Almost every organization now runs a data centre sized
to its business needs and level of operational automation, and no organization can ignore the
importance and benefits of automating its operations through a data centre. This dissertation
studies how the environment affects the DC (data centre) and, through it, business operations.
A data centre consists of information technology and electronic equipment, and its non-stop
operation and reliability depend entirely on the smooth functioning of that equipment.
Every business needs 24x7 availability of, and access to, IT resources. When hardware fails
frequently because of environmental factors, customer service and long-term business suffer.
The failures immediately increase the downtime of servers and critical business applications,
which has a direct monetary impact on the business.
1. To investigate the causes of failure, and hence of parts replacement, in the client's
server room.
2. To delve further into the key causes using problem-solving tools, and to explore possible
solutions that address the main causes.
3. To analyse the monetary impact of parts failure on the client's business, and the impact of
parts replacement on the supplier's overall cost.
In this study we identified the key causes of hardware failure and their monetary impact on
the customer (DC owner) and the supplier (vendor/OEM). This is a live issue that I encountered
in my job. The study was conducted at the premises of one of my clients, ABC telecom, situated
at Noida, and of its vendor, XYZ Corp. A third organization, the air-purification company
Purafil Inc., also came into the picture to resolve the environmental issue.
To address the issue, I collected the relevant live data, which was limited, made some
assumptions, and carried out qualitative and quantitative analysis. The analysis starts by
finding the key causes and the share of failures attributable to each, and then estimates their
monetary impact from both perspectives (customer and vendor). The analysis concluded that every
stakeholder would be in a win-win situation once the main issue, corrosion, is successfully
addressed.
In brief, the dissertation set out to investigate the key causes of frequent parts
replacement and their monetary implications for the key stakeholders. The study was conducted
at the XYZ Corp and ABC telecom premises. The relevant data was collected and investigated using
a problem-solving tool, ‘Pareto Analysis’, which showed that the majority of the parts
replacements happened because of corrosion. We then investigated the causes of the corrosion
using another problem-solving tool, ‘Root Cause Analysis’ (RCA). The RCA revealed that the
corrosion in the server room is caused by environmental air entering the server room through
the air conditioners. The air in the open environment is an external factor and cannot be
controlled, but the air taken in by the air conditioners can be controlled using advanced
air-purifying equipment. The dissertation concluded that companies installing their servers in
such environments should control the composition of the server room air by means of advanced
air-purifying equipment.
We further extended the study to understand the monetary impact of parts replacement and
corrosion on the supplier and the client. Our analysis concluded that the supplier offers the
client an AMC (annual maintenance contract) without properly assessing the total cost of the
AMC. An analysis was made to indicate what the price of the AMC should be if the air environment
in which the servers operate is not controlled, and what it should be if the air environment is
controlled. A separate analysis was made for the client to understand whether the AMC amount it
pays is worth it, what monetary impact not implementing the air-purifying equipment has on its
overall service business, and how much could be saved if the solution is implemented.
ACKNOWLEDGEMENT
This dissertation is a part of our programme, the Post Graduate Diploma in Management at
Management Development Institute. It was written during July-September 2013, and learning
through this kind of study was very interesting.
Thank God, we have done it! I would like to thank my faculty advisor, Dr. Narain Gupta, without
whose guidance and supervision I could not have written this dissertation in time. He
encouraged me and pushed me in the right direction, and I learnt a great deal from him
throughout the work. I would also like to thank my industry guide, who supported and guided me
in every way whenever required.
I would like to extend special thanks to our chair, Dr. Neelu S. Bhullar, who guided us
throughout the academic session.
Table of Contents
List of Figures
List of Tables
Abbreviations
Chapter 1. Introduction:
1.1 Problem statement
1.2 Research objective
1.3 Target industry
Chapter 2. Literature review
2.1 International organization and committees
2.2 Defined standards for data centres
2.3 Introduction of Purafil Inc.
2.4 Affected industries in all over world
Chapter 3. Framework & Methodology
3.1 Research approach
3.2 Research strategy
3.3 Data collection methods & source
3.4 Sample data and their report (Live data)
Chapter 4. Current study
4.1 Definition of data centre, classifications
4.2. Corrosion
4.2.1 What is electronic corrosion?
4.2.2 Types of corrosion
4.2.3 Causes of corrosion
4.2.4 Sign of corrosion
4.2.5 Sources of corrosive gaseous
Chapter 5. Qualitative and Quantitative analysis
5.1 Does any equipment survive in GX environment?
5.2 Why majorly copper and silver testing required (gold in some cases)
5.3 Substitute material other than copper/silver/Gold
5.4 Data available (ABC telecom)
5.5 Limitations of data
5.6 Symptoms of hardware failure
5.7 Causes of hardware failure
5.8 Cause and effect diagram (fishbone diagram)
5.9 Data analysis
5.10 Pareto chart analysis
5.11 Root cause analysis of corrosion (why-why)
5.12 Who affected directly & indirectly
5.13 Solution
5.14 Monetary comparison and cost expenses (ABC telecom perspective)
5.15 Monetary benefit and cost expenses (OEM perspective)
5.16 Win-Win condition (ABC telecom, XYZ Corp., Purafil Inc.)
Chapter 6. Measuring methodology (Purafil Inc.)
6.1 Assessment
6.2 Real time assessment
Chapter 7. Protection and control (By Purafil Inc.)
7.1 Protection methodology
7.2 Control process
Chapter 8. Findings / Outcomes
8.1 Causes & impact - controllable
8.2 Causes & impact - partial controllable
8.3 Causes - Uncontrollable
Chapter 9. Conclusion
Chapter 10. Recommendation
References:
List of Figures
Figure No. Description
List of Tables
Table No. Description
Abbreviations
DC: data centre
IT: information technology
EU: European Union
Å: angstroms
RH: relative humidity
Ca: cause
C: corrosion
F: firmware
P: power
M: man
T: total
Chapter 1. Introduction:
ABC Telecom is situated at Noida, near the bank of the Yamuna. Its corporate office and its DC
(data centre) are located on the same premises. The company uses all types of IT equipment,
such as storage, servers, network equipment and libraries, together with ERP applications.
Although it is not an IT company, it depends heavily on the hassle-free operation of its data
centre: corporate staff, employees, business partners, customers, service centres and outlets
all over the country use the ERP application to run business operations. The DC was implemented
and automated by XYZ Corp. (OEM and vendor), which also takes care of the supply, maintenance,
warranty and AMC of all IT equipment.
2. To delve further into the key causes using problem-solving tools, and to explore possible
solutions that address the main causes.
3. To analyse the monetary impact of parts failure on the client's business, and the impact of
parts replacement on the supplier's overall cost.
1.3 Target industry
As this study concerns data centres and the IT industry, its focus is limited to the electronic
equipment of the information technology industry, manufactured by major OEMs such as Sun
Microsystems, HP, IBM, Cisco, Cray, EMC, Hitachi, Seagate, SGI and NetApp.
writing, publishing and continuing education, ASHRAE shapes tomorrow’s built environment
today. ASHRAE was formed as the American Society of Heating, Refrigerating and Air-
Conditioning Engineers by the merger in 1959 of American Society of Heating and Air-
Conditioning Engineers (ASHAE) founded in 1894 and The American Society of Refrigerating
Engineers (ASRE) founded in 1904.
History of standards followed by different institutions.
Followed by IEC
Followed by ASHRAE
Table 1 - ISA STANDARD S71.04-1985 WITH REVISED CHANGES
Note: Å = angstroms which is a unit of length equal to 1/10,000,000,000 (one ten billionth) of a meter.
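The contents of Table 1 were lost in extraction. As a sketch of how an ISA S71.04 classification is applied, the function below maps a copper-coupon reactivity rate to a severity level using the copper bands as commonly published for the 1985 standard: G1 (mild) below 300 Å, G2 (moderate) below 1000 Å, G3 (harsh) below 2000 Å and GX (severe) at or above 2000 Å, all per 30 days. These thresholds are quoted from the standard as generally cited, not from this study, and the function name is illustrative.

```python
ANGSTROM_IN_METRES = 1e-10  # 1 Å = 10^-10 m, as in the note above

# Copper reactivity bands per 30 days, as commonly cited for ISA S71.04-1985.
# Each entry: (exclusive upper limit in Å, severity level, description).
COPPER_BANDS = [
    (300, "G1", "mild"),
    (1000, "G2", "moderate"),
    (2000, "G3", "harsh"),
]


def isa_severity(copper_angstroms_per_month: float) -> tuple[str, str]:
    """Return (level, description) for a copper-coupon reactivity rate."""
    if copper_angstroms_per_month < 0:
        raise ValueError("reactivity rate cannot be negative")
    for upper_limit, level, description in COPPER_BANDS:
        if copper_angstroms_per_month < upper_limit:
            return level, description
    # Anything at or above the last limit falls in the open-ended severe class.
    return "GX", "severe"


for rate in (250, 800, 1500, 2400):
    level, description = isa_severity(rate)
    film_nm = rate * ANGSTROM_IN_METRES * 1e9  # film growth expressed in nanometres
    print(f"{rate:>5} Å/month ({film_nm:.0f} nm) -> {level} ({description})")
```

Expressing the rate in nanometres shows how thin these films are: even the GX threshold corresponds to roughly 200 nm of corrosion product per month.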
Recommended Operating Environment
Gaseous contamination
Severity level G1 as per ISA 71.04 (ISA 1985), which states that the reactivity rate of copper
coupons shall be less than 300 Å/month. In addition, the reactivity rate of silver coupons shall
be less than 200 Å/month. Reactivity monitoring of gaseous corrosivity should be conducted
approximately 2 in. (5 cm) in front of the rack on the air inlet side, at one-quarter and
three-quarter frame height off the floor, or where the air velocity is much higher.
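The G1 criterion is stated per month, but coupons are not always exposed for exactly 30 days. A minimal sketch, assuming film growth is roughly linear over the exposure period (an approximation, since corrosion films often grow sub-linearly), scales each measured film thickness to a 30-day rate and checks both metals against the limits quoted above. The class and function names, and the sample readings, are hypothetical and are not ABC telecom data.

```python
from dataclasses import dataclass

COPPER_G1_LIMIT = 300.0  # Å per 30 days, from the text above
SILVER_G1_LIMIT = 200.0  # Å per 30 days, from the text above


@dataclass
class CouponReading:
    copper_angstroms: float  # total copper film thickness measured on the coupon
    silver_angstroms: float  # total silver film thickness measured on the coupon
    exposure_days: float     # how long the coupon was exposed


def per_30_days(thickness: float, days: float) -> float:
    """Scale a film thickness to a 30-day rate, assuming linear growth."""
    if days <= 0:
        raise ValueError("exposure must be positive")
    return thickness * 30.0 / days


def meets_g1(reading: CouponReading) -> bool:
    """Both metals must be under their limits; one failing metal is enough to fail."""
    cu = per_30_days(reading.copper_angstroms, reading.exposure_days)
    ag = per_30_days(reading.silver_angstroms, reading.exposure_days)
    return cu < COPPER_G1_LIMIT and ag < SILVER_G1_LIMIT


# Hypothetical 45-day readings.
print(meets_g1(CouponReading(330, 150, 45)))  # copper 220, silver 100 per 30 days
print(meets_g1(CouponReading(330, 330, 45)))  # silver 220 per 30 days exceeds 200
```

Checking silver separately matters because, as Section 5.2 notes, an environment that is benign to copper can still be highly corrosive to silver.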
Particulate contamination
1. Data centres with or without air-side economizers must meet the cleanliness level of ISO class
8.
2. The deliquescent relative humidity of the particulate contamination should be more than
60%.
3. Data centres must be free of zinc whiskers.
4. For data centres without air-side economizers, the ISO class 8 cleanliness may be met simply
by the choice of the following filtration:
a. The room air may be continuously filtered with MERV 8 filters.
b. Air entering a data centre may be filtered with MERV 11 or preferably MERV 13
filters.
5. For data centres with air-side economizers, the choice of filters to achieve ISO class 8
cleanliness depends on the specific conditions present at that data centre. In general, air
entering a data centre may need to be filtered using MERV 11 or preferably MERV 13 filters.
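The particulate criteria above read naturally as a checklist. The sketch below encodes points 1 to 5 for a site survey; the survey fields and function names are hypothetical, the ISO class is the measured ISO 14644-1 cleanliness class (where a lower number is cleaner), and the filter strings simply restate points 4 and 5 rather than add new guidance.

```python
from dataclasses import dataclass


@dataclass
class ParticulateSurvey:
    iso_class: int                  # measured ISO 14644-1 class; lower is cleaner
    deliquescent_rh_percent: float  # RH at which the settled dust absorbs moisture
    zinc_whiskers_found: bool
    has_air_side_economizer: bool


def particulate_findings(s: ParticulateSurvey) -> list[str]:
    """List the ways a survey falls short of points 1-3 above."""
    findings = []
    if s.iso_class > 8:
        findings.append("cleanliness is worse than ISO class 8")
    if s.deliquescent_rh_percent <= 60:
        findings.append("deliquescent RH of the dust is not above 60%")
    if s.zinc_whiskers_found:
        findings.append("zinc whiskers are present")
    return findings


def filter_guidance(s: ParticulateSurvey) -> str:
    """Restate the filtration options in points 4 and 5 above."""
    if s.has_air_side_economizer:
        return ("site-specific; in general filter incoming air with "
                "MERV 11 or preferably MERV 13")
    return ("filter room air continuously with MERV 8 and incoming air "
            "with MERV 11 or preferably MERV 13")


# Hypothetical survey of a room without an air-side economizer.
survey = ParticulateSurvey(iso_class=9, deliquescent_rh_percent=55,
                           zinc_whiskers_found=False, has_air_side_economizer=False)
print(particulate_findings(survey))
print(filter_guidance(survey))
```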
The EU’s RoHS Directives restrict the use of six substances in the manufacture of various types
of electronic and electrical equipment: mercury (Hg), lead (Pb), hexavalent chromium (Cr(VI)),
cadmium (Cd), polybrominated biphenyls (PBB) and polybrominated diphenyl ethers (PBDE).
Alternatives such as immersion silver (ImmAg) and organically coated copper (OCC) are
currently being used as replacement board finishes; however, ongoing research has shown that
printed circuit boards made using lead-free materials can be more susceptible to corrosion.
Companies selling a broad range of electrical goods in the EU must now conform to RoHS.
Although these rules are laid down at the European level, they are applied through national
law, so when exporting to Europe it is essential to comply with the national law of each
relevant country.
Classifying and then monitoring gaseous contaminants is a proven way to improve air quality
and to comply with ISA standard S71.04-1985 and RoHS. The ISA standard characterizes
environments in terms of their overall corrosion potential.
Purafil is a single source manufacturer of gas phase air filtration media and equipment. Purafil
not only manufactures the products, but also offers technical services specific to each market
such as: air quality assessment, circuit board failure analysis, media life analysis, consultation,
on-site gas testing, system startup, and other on-going services. This focus and expertise makes
Purafil more knowledgeable about the applications of gas phase technology than anyone else in
the field, and results in the use of Purafil products in nearly every part of the world. Purafil is
supported by a network of representative firms throughout the United States and in over 60
countries.
HISTORY OF INDUSTRY FIRSTS
• First to engineer, manufacture, and patent potassium and sodium permanganate-
impregnated media for oxidation of pollutants.
• First to engineer and manufacture a UL Classified, synthesized carbon media for
neutralization of corrosive airborne pollutants.
• First to develop and patent OnGuard® quartz crystal microbalance technology for
continuous, real-time monitoring of air quality.
• First to develop the Purafilter® pleated filter with gas phase and particulate filtration
capabilities for retrofitting existing air handlers.
• First to develop and patent the Posi-Track™ technology for improving efficiency in gas
phase air filtration systems.
• First to develop MediaPIK™ software for selection of Purafil media type and quantity
based on unique application requirements.
• First to engineer, manufacture, and patent Purafil SP, the only pellet to contain 12%
sodium permanganate for the oxidation of pollutants.
• First to receive the prestigious Frost & Sullivan Award for Global Gas Phase Air
Filtration Product Line Strategy.
In the last year alone Purafil has received corrosion-related inquiries from almost 150 locations
throughout the Asia-Pacific region, Europe, and North America. For the inquiries received in
2010, over 90% involved corrosion-related equipment failure. Almost all were from locations
where the ambient (outside) air was high in sulfur compounds (e.g., sulfur oxides, active sulfur
species) and the failure mechanism was identified as sulfur creep corrosion. Average corrosion
rates for outdoor air (where monitored) from a number of locations confirmed what was
already suspected: the locations with the highest rate of corrosion failures also had some of
the highest copper and silver reactivity rates.
Chapter 3. Framework & Methodology
3.1 Research approach
Qualitative and Quantitative methods are two main research approaches to choose when
conducting research.
In this study we have used a qualitative approach, as the study seeks a complete and
comprehensive view and understanding of the phenomenon in its entirety. Because the study aims
to gain a deeper understanding of the phenomenon under investigation and richer knowledge of a
complex situation, which requires assessing abundant information, a qualitative study is best
suited to it. The qualitative approach also fits because the purpose and objectives of this
dissertation take the form of “how” and “why” research questions.
The quantitative results in this study are based on the available live data together with some
assumptions, because the available data is limited.
In addition to building a theoretical base, the study aims to show that there are impacts and
losses for business operations from both the customer’s and the vendor’s perspective.
Some major sources of data:
ABC telecom
XYZ corp.
ISA
IEC
EU
ASHRAE
Purafil Inc.
Internet
IT operations are a crucial aspect of most organizational operations around the world. One of
the main concerns is business continuity; companies rely on their information systems to run
their operations. If a system becomes unavailable, company operations may be impaired or
stopped completely. It is necessary to provide a reliable infrastructure for IT operations, in
order to minimize any chance of disruption. Information security is also a concern, and for this
reason a data center has to offer a secure environment which minimizes the chances of a
security breach.
Table 2 DATA CENTRE CLASSIFICATIONS
4.2. Corrosion
Corrosion of metals is actually a chemical reaction caused primarily by attack of gaseous
contaminants and is accelerated by heat and moisture. Rapid shifts in either temperature or
humidity cause small portions of circuits to fall below the dewpoint temperature, thereby
facilitating condensation of contaminants. Relative humidity above 50% accelerates corrosion
by forming conductive solutions on a small scale on electronic components. Microscopic pools
of condensation then absorb contaminant gases to become electrolytes where crystal growth
and electroplating occur. Above 80% RH, electronic corrosive damage will occur regardless of
the levels of contamination.
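The humidity thresholds and the dew-point mechanism described above can be made concrete. The sketch below estimates the dew point with the Magnus approximation (a standard empirical formula that is not part of the source; the coefficients 17.62 and 243.12 °C are one common parameterisation) and bands RH using the 50% and 80% figures in the text. The function names and the sample room condition are hypothetical.

```python
import math

# Magnus-formula coefficients (one common parameterisation, roughly valid -45 to 60 °C).
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # °C


def dew_point_c(air_temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature and relative humidity."""
    if not 0 < rh_percent <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100) + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def humidity_risk(rh_percent: float) -> str:
    """Band RH using the 50% and 80% thresholds quoted in the text."""
    if rh_percent > 80:
        return "corrosive damage expected regardless of contamination"
    if rh_percent > 50:
        return "corrosion accelerated"
    return "below the 50% acceleration threshold"


def condensation_possible(surface_temp_c: float, air_temp_c: float, rh_percent: float) -> bool:
    """True if a board surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_percent)


# Hypothetical room condition: 25 °C air at 50% RH.
print(round(dew_point_c(25, 50), 1))      # dew point of that air, in °C
print(humidity_risk(65))
print(condensation_possible(12, 25, 50))  # a 12 °C surface is below that dew point
```

A rapid temperature drop that brings part of a board below the computed dew point is exactly the condensation case the paragraph describes, even when the room RH itself looks moderate.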
products. As the chemical reactions continue, these corrosion products can form insulating
layers on circuits which can lead to thermal failure or short-circuits. Pitting and metal loss can
also occur.
4.2.3 Causes of corrosion
Corrosion has two causes, particulate contaminants and gaseous contaminants, both of which are
airborne contaminants.
Particulate contaminants - Failure modes due to dust include, but are not limited to, the
following. Mechanical effects: obstruction of cooling airflow, interference with moving parts,
abrasion, optical interference, interconnect interference, deformation of surfaces (e.g.,
magnetic media) and other similar effects. Chemical effects: dust settled on printed circuit
boards can lead to component corrosion and/or to electrical short-circuiting of closely spaced
features. Electrical effects: impedance changes and electronic circuit conductor bridging.
Harmful dust in data centres is generally high in ionic content, such as sulphur and chlorine-
bearing salts. The source of this harmful dust is mainly outdoor dust. Coarse dust particles have
a mineral and biological origin, are formed mostly by wind-induced abrasion, and can remain
airborne for a few days.
One mechanism by which dust degrades the reliability of printed circuit boards involves the
absorption of moisture from the environment by the settled dust. The ionic contamination in the
wet dust degrades the surface insulation resistance of the printed circuit board and, in the
worst case, leads to electrical short-circuiting of closely spaced features via ion migration.
In short, the settled dust absorbs moisture, gets wet, and promotes corrosion and/or ion
migration, thereby degrading hardware reliability.
Gaseous contaminants - Sulfur-bearing gases, such as sulfur dioxide (SO2) and hydrogen
sulfide (H2S), are the most common gases causing corrosion of electronic equipment. It has been
shown that SO2 or H2S alone is not very corrosive to silver or copper, but the combination of
these gases with other gases, such as nitrogen dioxide (NO2) and/or ozone (O3), is very
corrosive. The corrosion rate of copper is a strong function of relative humidity, while the
corrosion rate of silver depends less on humidity. Reduced forms of nitrogen (ammonia (NH3),
amines, ammonium ions (NH4+)) occur mainly in fertilizer plants, agricultural applications,
and chemical plants. Copper and copper alloys are particularly susceptible to corrosion in
ammonia environments.
There are three types of gases that are the prime culprits in the corrosion of electronics:
acidic gases, such as hydrogen sulfide, sulfur and nitrogen oxides, chlorine, and hydrogen
fluoride; caustic gases, such as ammonia; and oxidizing gases, such as ozone and nitric acid.
Of the gases that can cause corrosion, the acidic gases are typically the most harmful.
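The three gas classes listed above, together with the earlier point that SO2 or H2S becomes much more corrosive in combination with NO2 and/or O3, can be encoded as a simple lookup. This is a hedged sketch for screening a list of detected gases; the mapping covers only the examples named in the text, and the names are illustrative.

```python
from enum import Enum


class GasClass(Enum):
    ACIDIC = "acidic"
    CAUSTIC = "caustic"
    OXIDIZING = "oxidizing"


# Examples of each class, taken from the paragraphs above.
GAS_CLASSES = {
    "hydrogen sulfide": GasClass.ACIDIC,
    "sulfur dioxide": GasClass.ACIDIC,
    "nitrogen dioxide": GasClass.ACIDIC,
    "chlorine": GasClass.ACIDIC,
    "hydrogen fluoride": GasClass.ACIDIC,
    "ammonia": GasClass.CAUSTIC,
    "ozone": GasClass.OXIDIZING,
    "nitric acid": GasClass.OXIDIZING,
}

SULFUR_GASES = {"sulfur dioxide", "hydrogen sulfide"}
ACCELERATING_GASES = {"nitrogen dioxide", "ozone"}


def assess_gas_mix(gases: set[str]) -> dict:
    """Classify detected gases and flag the sulfur plus NO2/O3 combination."""
    classes = {GAS_CLASSES[g] for g in gases if g in GAS_CLASSES}
    return {
        "classes": classes,
        "acidic_present": GasClass.ACIDIC in classes,  # acidic gases are typically the most harmful
        "synergistic_mix": bool(gases & SULFUR_GASES) and bool(gases & ACCELERATING_GASES),
        "unknown": sorted(gases - set(GAS_CLASSES)),   # gases the lookup does not cover
    }


print(assess_gas_mix({"sulfur dioxide", "ozone"}))
```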
4.2.4 Sign of corrosion
With either copper, silver, or composite materials, the end result is the same: a disruption of
the contact point. The severity of the environment (i.e., the types and levels of gases,
humidity, and temperature) will determine the speed at which these films are created and the
degree to which the flow of electrical current is disrupted.
Specific components which are particularly sensitive to corrosion attack include: Edge
Connectors, Pin Connectors, Wire-wrap Connections, and Electrical Systems.
Figure 6 Hard Disk with Copper creep corrosion
Oxidized forms of sulfur (SO2, SO3) are generated as combustion products of fossil fuels
and from motor vehicle emissions.
The reaction with metals normally occurs when these gases dissolve in water to form
sulfurous and sulfuric acids (H2SO3 and H2SO4).
Reactive nitrogen compounds (NO, NO2, N2O4) are commonly formed as combustion products
of fossil fuels. In the presence of moisture, some of these gases form nitric acid
(HNO3).
Active sulfur compound refers to hydrogen sulfide (H2S), elemental sulfur (S), and
organic sulfur compounds such as mercaptans (R-SH). When present at low ppb levels,
they rapidly attack copper, silver, aluminum, and iron alloys. The presence of moisture
and small amounts of inorganic chlorine compounds and/or nitrogen oxides greatly
accelerate sulfide corrosion.
Inorganic chlorine compounds include chlorine (Cl2), chlorine dioxide (ClO2) and hydrogen
chloride (HCl). In the presence of moisture, these gases generate chloride ions that, in
turn, attack most copper, tin, silver, and iron alloys.
In addition to ozone (O3), a list of examples would include the hydroxyl radical as well
as radicals of hydrocarbons, oxygenated hydrocarbons, nitrogen oxides, sulfur oxides,
and water. Ozone can function as a catalyst in sulfide and chloride corrosion of metals.
Strong oxidants: this group includes ozone plus certain chlorinated gases (chlorine,
chlorine dioxide). Ozone is an unstable form of oxygen that is formed from diatomic oxygen by
electrical discharge or by solar radiation in the atmosphere.
In one study of lead-free finishes, four alternative PCB finishes were subjected to an
accelerated mixed flowing gas corrosion test. The important findings can be summarized as
follows:
Immersion gold (ENIG) and immersion silver (ImmAg) surface finishes failed early in
the testing. These coatings are the most susceptible to corrosion failures and may make
the PCB the weak link with regard to the sensitivity of electronic devices to
corrosion.
None of the coatings can be considered immune from failure in an ISA Class G3
environment.
The gold and silver coatings could not be expected to survive a mid to high Class G2
environment based on these test results.
5.2 Why majorly copper and silver testing required (gold in some cases)
Copper, silver and gold are important functional materials found in many
electrical/electronic devices. Examination of silver and gold corrosion data has shown
instances of an environment that is non-corrosive to copper being extremely corrosive to
silver and/or gold. Because of results such as these, any testing that attempts to predict
electrical/electronic equipment reliability should include copper, silver and gold corrosion
as determinants.
5.3 Substitute material other than copper/silver/Gold
Copper and silver are the best conducting materials. A supplier that uses a different material
must make sure that its conductivity, price and competitiveness are acceptable. A lot of R&D
has been done on this, but in a competitive world, and given the conductivity needed for
communication, it is difficult to produce cards that are 100% free of copper and silver.
P460RPW Storage 05 July 2012 SW power module failed P 0 0 1 0 1
P46JH2Z XSeries 10 July 2012 Blade server HBA C 1 0 0 0 1
P462HJV XSeries 16 July 2012 Blade server HBA C 1 0 0 0 1
P46TBGY XSeries 10 September 2012 Blade server HBA C 1 0 0 0 1
P46TBGS XSeries 10 September 2012 Blade server HBA C 1 0 0 0 1
P46T0LV Storage 06 September 2012 Com. NetApp replaced C 1 0 0 0 1
P46MDWV XSeries 28 September 2012 Blade server HBA C 1 0 0 0 1
P46CZBJ Storage 27 September 2012 NetApp ext. hard disk C 1 0 0 0 1
P41DKKS XSeries 11 October 2012 Blade server RAM M 0 0 0 1 1
P4187KP XSeries 18 October 2012 Mother board F 0 1 0 0 1
P417MCM Storage 31 October 2012 NetApp ext. hard disk C 1 0 0 0 1
P416ZVM XSeries 07 November 2012 Blade server mother board C 1 0 0 0 1
P4151CB XSeries 17 November 2012 Blade server HBA C 1 0 0 0 1
P41NFJN Storage 22 November 2012 NetApp ext. hard disk C 1 0 0 0 1
P41JH2P XSeries 11 December 2012 Blade server HBA C 1 0 0 0 1
P41JH24 XSeries 11 December 2012 Blade server HBA C 1 0 0 0 1
P41JH20 XSeries 11 December 2012 Blade server HBA C 1 0 0 0 1
P41PCXS Storage 31 December 2012 DS3400 hard disk 1 TB C 1 0 0 0 1
P41ZDMB XSeries 05 January 2013 Mother board (firmware) F 0 1 0 0 1
P41ZDS7 XSeries 05 January 2013 Mother board (firmware) F 0 1 0 0 1
P41W8NS XSeries 25 January 2013 Blade server mother board F 0 1 0 0 1
P41W0L3 XSeries 28 January 2013 Blade server HBA C 1 0 0 0 1
P41W079 XSeries 28 January 2013 Blade battery power lost P 0 0 1 0 1
P41TG4L XSeries 04 February 2013 Blade battery power lost P 0 0 1 0 1
P41YBP7 Storage 10 February 2013 NetApp ext. hard disk C 1 0 0 0 1
P41BT9C XSeries 14 February 2013 Blade server HBA C 1 0 0 0 1
P41CLDF XSeries 21 February 2013 Mother board (firmware) F 0 1 0 0 1
P41CHG7 Storage 25 February 2013 NetApp ext. hard disk C 1 0 0 0 1
P41MC9P XSeries 02 March 2013 Blade server HBA C 1 0 0 0 1
P459G8H XSeries 14 March 2013 Blade server mother board C 1 0 0 0 1
P459C48 XSeries 15 March 2013 Blade battery power lost P 0 0 1 0 1
P458LWP Storage 18 March 2013 NetApp ext. hard disk C 1 0 0 0 1
P45L5LK Storage 22 March 2013 NetApp ext. hard disk C 1 0 0 0 1
P45LHHX Storage 25 March 2013 NetApp ext. hard disk C 1 0 0 0 1
P457GYS XSeries 31 March 2013 Blade server RAM C 1 0 0 0 1
P4565KW XSeries 06 April 2013 Blade server HBA C 1 0 0 0 1
P4563WX Storage 08 April 2013 NetApp ext. hard disk C 1 0 0 0 1
P451H1V XSeries 14 April 2013 Blade server mother board C 1 0 0 0 1
P457XZS Storage 30 March 2013 Com. NetApp replaced C 1 0 0 0 1
P45J74Z Storage 06 May 2013 NetApp ext. hard disk C 1 0 0 0 1
P45JZDM Storage 07 May 2013 DS3400 hard disk 1 TB C 1 0 0 0 1
P452XTN XSeries 13 May 2013 Blade server RAM C 1 0 0 0 1
P4535Y1 Storage 15 May 2013 NetApp ext. hard disk C 1 0 0 0 1
P45HXY9 XSeries 11 June 2013 Blade server RAM C 1 0 0 0 1
P45XPLW Storage 13 June 2013 NetApp ext. hard disk C 1 0 0 0 1
P45WHTD XSeries 25 June 2013 Blade server HBA C 1 0 0 0 1
T 54 9 7 1 71
Legend: Ca = cause; C = corrosion; F = firmware; P = power; M = man (human factor); T = total.
In addition to the above data, two data centre test-coupon (copper/silver reactivity) reports are
available: one shows the DC environment before installation of the Purafil Inc. air cleaning
equipment and the other shows it after installation. The Purafil air cleaning equipment came into
operation on 23 November 2012. The outcomes of both reports, before and after this date, are
shown below.
Table 4: Coupon test results before installation of the Purafil Inc. equipment in the DC environment
Table 5: Coupon test results after installation of the Purafil Inc. equipment in the DC environment
The analysis showed that hardware failure has only four causes:
1) Corrosion
2) Firmware
3) Power
4) Man
Figure: Cause-and-effect (fishbone) diagram for hardware failure
Effect: hardware failure
MAN: poor documentation; inadequate monitoring; poor training / lack of technical knowledge
Firmware: bug in chip; bug in old firmware; firmware upgrade error
Power: multiple power outages; voltage fluctuation; improper earthing; power module failed; battery power failed
Corrosion: industry gases; gaseous contaminants inside the DC (corrosive gases); electrical/electric gases; particulate contaminants (dust); outdoor dust; lack of cleanliness; working/storage practice; humidity; temperature fluctuation; chloride presence; AC not functioning properly
The figure above identifies all major causes and their sub-causes.
5.10 Pareto chart analysis
A Pareto chart is used here to identify the top cause of hardware failure.
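Figure: Pareto chart of hardware failures by cause (bars: failure count; line: cumulative percent)
Corrosion: 54 failures, cumulative 76.06%
Firmware: 9 failures, cumulative 88.74%
Power: 7 failures, cumulative 98.6%
MAN: 1 failure, cumulative 100%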
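The percentages in the Pareto chart follow directly from the cause totals in the failure table. The sketch below recomputes them; the only inputs are the four totals (54, 9, 7, 1), and the name `pareto_table` is hypothetical, used only for illustration.

```python
from itertools import accumulate

# Cause totals from the failure table (C = 54, F = 9, P = 7, M = 1; 71 failures in all).
FAILURE_COUNTS = {"Corrosion": 54, "Firmware": 9, "Power": 7, "Man": 1}


def pareto_table(counts):
    """Return (cause, count, percent, cumulative percent) rows, largest cause first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    running = accumulate(count for _, count in ordered)
    return [
        (cause, count, round(100 * count / total, 2), round(100 * cum / total, 2))
        for (cause, count), cum in zip(ordered, running)
    ]


if __name__ == "__main__":
    for row in pareto_table(FAILURE_COUNTS):
        print(row)
```

Computing the cumulative share from the counts gives 88.73% and 98.59% after Firmware and Power; the chart's 88.74% and 98.6% come from adding percentages that were already rounded. The same calculation gives 1.41% for Man (1 of 71 failures).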
Failure-rate matrix: corrosion, one of the four causes (25% of causes), is responsible for 76% of failures
Corrosion: 25% of causes, responsible for 76.06% of failures
Firmware: 25% of causes, responsible for 12.68% of failures
Power: 25% of causes, responsible for 9.86% of failures
MAN: 25% of causes, responsible for 1.41% of failures
The Pareto chart and the failure-rate matrix show clearly that corrosion is the principal cause of
downtime, operational loss and hardware failure, accounting for 76.06% of all failures.
5.12 Who is affected directly and indirectly
Corrosion directly affects the customer, whose operating profit and customer loyalty suffer when
its automated IT systems are not available in a timely manner (24x7). ABC is a telecom company
and needs 100% uptime to keep its business running. Its ERP application is used throughout India
by three types of business user:
1) Mobile VAS (value-added services)
2) Service centres across the country
3) Employees (corporate users)
The OEM, XYZ Corp., is also affected indirectly by the corrosion issue, because it is responsible
for maintaining and supporting its equipment. The OEM sells its equipment and signs an AMC
(annual maintenance contract), which is the business model that keeps it in the market. It is
committed to meeting service standards and maintaining its brand value, yet it suffers a major
loss of revenue that should not be an inherent part of the business.
5.13 Solution
We can neither change the entire environment overnight nor shut down our data centre, because in
the 21st century IT (information technology) plays the role of the heartbeat in a body: no
organisation can survive and sustain itself in any industry without this lifeline (IT and the
automated data centre).
Removing this revenue loss is essential from both the customer's and the OEM's perspective. A
solution is needed to control and reduce hardware failure; it will reduce the customer's
operational loss and the vendor's replacement cost (new hardware).
To overcome this issue, the DC environment must be cleaned and maintained; this is the only
option for both parties. Maintaining the DC environment requires installing new air-purifier
equipment.
As analysed above, two coupon reports are available, from before and after the Purafil Inc. PPU
500V device came into operation. The device has controlled the environment and brought down the
reaction rates of copper and silver as per ISA Standard 71.04-1985. Once the environment is under
control, it is beneficial from every perspective.
Here we identify what would happen if the customer adopted the international DC standard, or
continued with the current uncontrolled environment.
Before comparing the monetary impact of the controlled and uncontrolled environments, we first
need to convert the impact of downtime hours into a numeric figure (the cost of downtime). Only
then can the benefit of a controlled environment be calculated.
The cost of downtime varies from organisation to organisation. My assumption here is that
downtime of the ERP application directly reduces service revenue and the contribution profit from
services to the organisation: at the front end, the ERP application is unavailable to end users
and customers, which leads to loss, while at the back end the outage is caused by hardware
failure. The lifetime value of downtime savings has not yet been considered.
Figure: Cost of downtime, loss (Rs lakh) for every 1% increase in downtime
1%: 242.62; 2%: 485.23; 3%: 727.85; 4%: 970.46; 5%: 1213.08; 6%: 1455.70; 7%: 1698.31; 8%: 1940.93; 9%: 2183.55; 10%: 2426.16
The calculation above shows the possible loss of service revenue in the current year's operation
if downtime rises as high as 10% of the total working hours in the year.
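The chart assumes that loss scales linearly with downtime. A minimal sketch of that model, taking the 1% figure of Rs 242.62 lakh as its only input (the name `downtime_loss` is hypothetical):

```python
# Loss for each 1% of annual working hours lost to downtime, in Rs lakh
# (the 1% value from the cost-of-downtime chart).
LOSS_PER_PERCENT_LAKH = 242.62


def downtime_loss(downtime_percent: float,
                  loss_per_percent: float = LOSS_PER_PERCENT_LAKH) -> float:
    """Linear model: service-revenue loss grows in proportion to downtime."""
    if downtime_percent < 0:
        raise ValueError("downtime cannot be negative")
    return round(downtime_percent * loss_per_percent, 2)


if __name__ == "__main__":
    for pct in range(1, 11):
        print(f"{pct:>2}% downtime -> Rs {downtime_loss(pct):.2f} lakh")
```

Multiplying the rounded 242.62 gives Rs 2426.20 lakh at 10% rather than the chart's 2426.16; the chart appears to use an unrounded per-percent figure of about 242.616.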
Monetary comparison of costs (ABC telecom perspective)
Cost incurred by the customer
Comparison: follow the ISA standard, or continue with the current environment.
Current environment:
A. OEM: cost of replaced parts, i.e. the AMC. If the cost of the supplied/running equipment is
Rs 100 lakh, the AMC would be approx. 15% (Rs 15 lakh). Total cost for one year = 15 (Rs lakh).
B. Cost of downtime: operational loss = 242.62 (Rs lakh).
Total cost, first year = A + B = 257.62 (Rs lakh).
Total cost, second year = A + B = 257.62 (Rs lakh), and possibly more.
Vs. ISA standard (controlled environment):
C. Air-purifier equipment: the installed equipment price would be around Rs 10 lakh, and its AMC
approx. 15%, for a G2-level environment. Total cost, first year = 10 (Rs lakh); second-year AMC
cost = 1.5 (Rs lakh).
D. Cost of downtime: since the equipment cuts out the corrosion downtime, which is 76.06% of total
downtime, operational loss = 58.08 (Rs lakh).
Total cost, first year = A + C + D = 83.08 (Rs lakh).
Total cost, second year = A + C (AMC) + D = 74.58 (Rs lakh).
On the basis of the above calculation (assuming 1% downtime), the customer benefits within the
current year, and the benefit grows in the following year. This will improve:
customer loyalty
brand name
profit
trust and the relationship between company and consumer
With this, the customer (ABC telecom) can also negotiate a lower AMC charge from the next
year.
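The customer's totals can be re-derived from their components. The sketch below assumes, as the comparison does, 1% downtime and a purifier that removes all corrosion-related downtime; the class `CustomerCosts` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerCosts:
    """Yearly cost components for ABC telecom in Rs lakh, taken from the comparison above."""
    amc: float = 15.0               # A: approx. 15% AMC on Rs 100 lakh of equipment
    downtime_loss: float = 242.62   # B: operational loss at 1% downtime
    purifier_price: float = 10.0    # C, first year: installed air-purifier price
    purifier_amc: float = 1.5       # C, later years: AMC on the purifier
    corrosion_share: float = 0.7606  # share of downtime attributed to corrosion

    def uncontrolled(self) -> float:
        """Current environment: A + B."""
        return round(self.amc + self.downtime_loss, 2)

    def residual_downtime(self) -> float:
        """D: downtime loss left once corrosion downtime is removed."""
        return round(self.downtime_loss * (1 - self.corrosion_share), 2)

    def controlled(self, year: int) -> float:
        """Controlled environment: A + C + D (C is the price in year 1, the AMC after)."""
        purifier = self.purifier_price if year == 1 else self.purifier_amc
        return round(self.amc + purifier + self.residual_downtime(), 2)


if __name__ == "__main__":
    costs = CustomerCosts()
    for year in (1, 2):
        saving = round(costs.uncontrolled() - costs.controlled(year), 2)
        print(f"Year {year}: {costs.uncontrolled()} vs {costs.controlled(year)}, saving {saving}")
```

It reproduces Rs 257.62 lakh against Rs 83.08 lakh in the first year and Rs 257.62 lakh against Rs 74.58 lakh in the second, a saving of Rs 174.54 lakh and Rs 183.04 lakh respectively.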
Here I have assumed a fixed operational profit of Rs 48.45 lakh over the longer run. In practice,
applying this approach should increase operational profit year on year as the company grows,
which builds brand name and customer loyalty. This again rests on implicit assumptions: it applies
only to a steadily growing company, in a positive and ideal market, with no global recession.
Next, we identify the cost incurred by the OEM and what its strategy with the customer could be in
each case (uncontrolled-environment DC and controlled-environment DC). To reach an outcome, I
assume two scenarios, which the calculation below resolves.
With an uncontrolled environment:
XYZ Corp. continues the AMC contract at the current price,
OR
XYZ Corp. continues the AMC contract at an increased price.
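OEM replacement cost by part and cause (failure counts over approx. 18 months; prices and costs in Rs)
Part | Unit price | Failures | Item total cost | C | C cost | F | F cost | P | P cost | M | M cost
Mother board | 52000 | 8 | 416000 | 3 | 156000 | 5 | 260000 | 0 | 0 | 0 | 0
HBA card | 40000 | 21 | 840000 | 21 | 840000 | 0 | 0 | 0 | 0 | 0 | 0
RAM | 9000 | 7 | 63000 | 3 | 27000 | 3 | 27000 | 0 | 0 | 1 | 9000
Controller | 320000 | 2 | 640000 | 2 | 640000 | 0 | 0 | 0 | 0 | 0 | 0
Disk 450 GB | 32000 | 23 | 736000 | 23 | 736000 | 0 | 0 | 0 | 0 | 0 | 0
Disk 1 TB | 35000 | 2 | 70000 | 2 | 70000 | 0 | 0 | 0 | 0 | 0 | 0
SW switch | 260000 | 2 | 520000 | 0 | 0 | 0 | 0 | 2 | 520000 | 0 | 0
Battery | 500 | 5 | 2500 | 0 | 0 | 0 | 0 | 5 | 2500 | 0 | 0
I/O module | 11000 | 1 | 11000 | 0 | 0 | 1 | 11000 | 0 | 0 | 0 | 0
Total | | 71 | 3298500 | 54 | 2469000 | 9 | 298000 | 7 | 522500 | 1 | 9000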
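Because each part's failures split cleanly across the four causes, the column totals of the cost table can be checked mechanically. A sketch with the rows transcribed from the table; the data layout and the function name `totals_by_cause` are illustrative only.

```python
# Parts replaced over approx. 18 months: (part, unit price in Rs, failures by cause).
# Cause codes follow the legend: C corrosion, F firmware, P power, M man.
PARTS = [
    ("Mother board", 52000, {"C": 3, "F": 5}),
    ("HBA card", 40000, {"C": 21}),
    ("RAM", 9000, {"C": 3, "F": 3, "M": 1}),
    ("Controller", 320000, {"C": 2}),
    ("Disk 450 GB", 32000, {"C": 23}),
    ("Disk 1 TB", 35000, {"C": 2}),
    ("SW switch", 260000, {"P": 2}),
    ("Battery", 500, {"P": 5}),
    ("I/O module", 11000, {"F": 1}),
]


def totals_by_cause(parts):
    """Sum failure counts and replacement cost (Rs) for each cause code."""
    counts, costs = {}, {}
    for _part, price, failures in parts:
        for cause, n in failures.items():
            counts[cause] = counts.get(cause, 0) + n
            costs[cause] = costs.get(cause, 0) + n * price
    return counts, costs


if __name__ == "__main__":
    counts, costs = totals_by_cause(PARTS)
    print(counts, costs, sum(costs.values()))
```

It returns 54, 9, 7 and 1 failures and Rs 2,469,000, 298,000, 522,500 and 9,000 for C, F, P and M, summing to Rs 3,298,500, which matches the totals row.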
The costs are calculated in two ways:
the total cost of hardware failed or replaced by the OEM, and
the total cost for each of the four causes of hardware failure, as summarised in the table above.
Using the cost incurred by the OEM due to corrosion, power, firmware and man, the major cause is
analysed below with the help of a Pareto chart. This confirms the earlier finding: corrosion is
the major issue and needs to be addressed thoroughly.
Figure: Pareto chart of hardware-failure cost by cause (bars: cost in Rs lakh; line: cumulative percent)
Corrosion: 16.46, cumulative 74.85%
Power: 3.48, cumulative 90.69%
Firmware: 1.99, cumulative 99.73%
Man: 0.06, cumulative 100%
The Pareto analysis again shows that 74.85% of hardware-failure cost comes from the corrosion
issue, caused by the uncontrolled environment in the customer's data centre. XYZ Corp. (OEM)
therefore bears a major share of excess cost in the particular regions where client data centres
are located and where it has consistently provided AMC support.
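The bar values in the cost Pareto chart equal the 18-month cause totals from the cost table scaled by 12/18 (for example, Rs 24.69 lakh x 12/18 = Rs 16.46 lakh), so the chart appears to show annual cost; this also matches the Rs 21.99 lakh business loss used in the OEM comparison that follows. The sketch below reproduces the chart under that assumption; `annual_cost_pareto` is a hypothetical name.

```python
from itertools import accumulate

# Replacement cost per cause over approx. 18 months, in Rs (from the cost table).
COST_18_MONTHS = {"Corrosion": 2469000, "Power": 522500, "Firmware": 298000, "Man": 9000}
RS_PER_LAKH = 100_000


def annual_cost_pareto(costs, months=18):
    """Scale costs to 12 months, then rank causes with cumulative percent.

    Returns (rows, total): each row is (cause, annual cost in Rs lakh,
    cumulative percent); total is the annual cost of all causes in Rs lakh.
    """
    annual = {cause: cost * 12 / months / RS_PER_LAKH for cause, cost in costs.items()}
    total = sum(annual.values())
    ordered = sorted(annual.items(), key=lambda item: item[1], reverse=True)
    running = accumulate(value for _, value in ordered)
    rows = [(cause, round(value, 2), round(100 * cum / total, 2))
            for (cause, value), cum in zip(ordered, running)]
    return rows, round(total, 2)


if __name__ == "__main__":
    rows, total = annual_cost_pareto(COST_18_MONTHS)
    for row in rows:
        print(row)
    print("Total per year (Rs lakh):", total)
```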
On the basis of the analysed cost data, we now assess the benefit and loss to XYZ Corp.: what
should its AMC strategy with the customer be in a controlled or an uncontrolled environment, and
how does the corrosion issue affect monetary value and business profit?
Uncontrolled environment:
A. AMC charged to the customer: if the cost of the supplied/running equipment is Rs 100 lakh, the
AMC would be approx. 15% (Rs 15 lakh). Total AMC charged in one year = 15 (Rs lakh).
B. Cost of replacement: business loss = 21.99 (Rs lakh).
Total business result for the year = A - B = 15 - 21.99 = -6.99 (Rs lakh), i.e. a loss of
Rs 6.99 lakh, and possibly more.
Vs. controlled environment:
C. AMC charged to the customer: if the cost of the supplied/running equipment is Rs 100 lakh, the
AMC would be approx. 15% (Rs 15 lakh). Total AMC charged in one year = 15 (Rs lakh).
D. Cost of replacement: the controlled environment cuts out the corrosion cost, which is 74.85% of
total cost, so business loss = 5.53 (Rs lakh).
Total business profit for the year = C - D = 9.47 (Rs lakh).
and may be more.
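The OEM's yearly result is the AMC revenue less the replacement cost it absorbs. A sketch assuming, as above, that a controlled environment removes the whole corrosion share (74.85%) of the replacement cost; `oem_annual_result` is a hypothetical name.

```python
def oem_annual_result(amc_lakh: float, replacement_cost_lakh: float,
                      corrosion_share: float = 0.0) -> float:
    """AMC revenue minus the replacement cost that remains after removing
    the corrosion share of that cost (all values in Rs lakh).
    A negative result is a loss for the OEM."""
    remaining_cost = round(replacement_cost_lakh * (1 - corrosion_share), 2)
    return round(amc_lakh - remaining_cost, 2)


AMC_LAKH = 15.0            # approx. 15% of Rs 100 lakh of equipment
REPLACEMENT_LAKH = 21.99   # annual replacement cost from the cost analysis
CORROSION_SHARE = 0.7485   # corrosion share of replacement cost

if __name__ == "__main__":
    print("Uncontrolled (A - B):", oem_annual_result(AMC_LAKH, REPLACEMENT_LAKH))
    print("Controlled   (C - D):", oem_annual_result(AMC_LAKH, REPLACEMENT_LAKH, CORROSION_SHARE))
```

The uncontrolled case gives -6.99 (a loss) and the controlled case 9.47 (a profit). Rerunning the function with each customer's own replacement history is the kind of per-customer study that the pricing answer calls for.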
On the basis of the above analysis (uncontrolled vs. controlled environment):
Q: What price should XYZ Corp. offer the customer for AMC in an uncontrolled and in a controlled
environment?
Ans: The above outcome shows clearly that the price should differ from customer to
customer.
The live data calculated for ABC telecom shows a specific replacement pattern and cost history,
so each customer needs a complete study of its own.
The pricing strategy should therefore be customised for each customer.
So, from every perspective, all parties win if the key environmental issue, CORROSION, is
identified and rectified.
Chapter 6. Measuring methodology (Purafil Inc.)
6.1 Assessment
Air Quality Assessments:
• Outside Air Quality
• Inside Air Quality
• Air Handling Unit (AHU) Chemical Filtration
Air Handling Assessments:
Computer Room Air Conditioning (CRAC) Unit,
Makeup (Outdoor, Fresh) Air Handler and Air-Side Economizers
• Particulate Filtration Data
• Temperature Data
• Relative Humidity Data
Room Integrity Assessments:
• Door Fan Test
• Air Leak Test (Smoke Test)
• Room Differential Pressure Test
Electronic Equipment Assessments:
• Review of Manufacturer’s Specifications
• Electronic Board Analysis
Chapter 7. Protection and control (by Purafil Inc.)
7.1 Protection methodology
CRAC, Makeup or Air-Side Economizer Retrofits:
• Combination Particulate and Gas Phase Filters
• Compact Gas Phase Filters
• Gas Phase Filtration Modules
Additional Equipment with Particulate and Gas Phase Filtration:
• Recirculation and/or Pressurization Cabinets
• Equipment for Outside Air Handling Systems
• Scrubbers (Compatible with Cabinets)
• Under-Floor Air Filtration
Assuming a data center's HVAC system is already equipped with adequate particulate filtration,
gaseous air cleaning can be used in conjunction with the existing air handling systems. Purafil's
gas phase air filters, or filtration systems employing one or more of its adsorbent and/or
chemisorbent media, can effectively reduce gaseous contaminants to well below specified levels.
Properly applied, gaseous air cleaning also has the potential for energy savings.
7.2 Control process
Chapter 8. Findings / Outcomes
When we started to investigate this topic, we had some idea of the issue but no deep knowledge of
the subject or its various impacts. At this stage we can say that the cause of frequent hardware
failure (corrosion) has been identified and that a solution (controlling the data centre air
environment) is available.
Chapter 9. Conclusion
The dissertation was intended to investigate the key causes of frequent parts replacements, and
its monetary implications for the key stakeholders. The study was conducted at the premises of
XYZ Corp. and ABC telecom. The relevant data was collected, and an investigation was made by means
of a problem solving tool ‘Pareto Analysis’. We observed that the majority of the parts
replacements have happened because of corrosion. We further investigated the causes of the
corrosion by means of another problem solving tool ‘Root Cause Analysis’. The RCA revealed
that the corrosion happens in the server room because of the environmental air entering into
the server room through the air conditioners. The air in the open environment is an external
factor and cannot be controlled, but the air conditioners' air intake can be controlled using
advanced air-purifying equipment. The dissertation concluded that companies installing
their servers in such environments should implement solutions to control the server room air
composition by means of advanced air-purifying equipment. We further extended the study to
understand the monetary impact of the parts replacement and corrosion on the supplier and
client. Our analysis concluded that the supplier is offering an AMC to the client with improper
assessment of the total cost of the AMC. An analysis was made to indicate what should be the
price of the AMC if the environment under which the servers operate is not controlled, and what
should be the price of AMC if the air environment is a controlled one. A separate analysis was
made for client to understand if the AMC amount paid by the client is worth it, what is the
monetary impact of not implementing the air-purifying equipment solutions on the client's
overall service business, and what savings can be realised if the solutions are
implemented.
Companies setting up a new DC:
Need awareness of the facts about corrosion.
Should address the corrosion problem during the planning phase.
Should take advice from industry experts.
Should establish the DC in a non-corrosive region.
If constraints force them to establish the DC in a corrosive region, should follow the ISA
standard and install air purifiers from the beginning.
Companies already running a DC:
Need awareness of the facts about corrosion and should analyse its monetary impact.
Should take advice from industry experts and explore solutions, such as those of Purafil Inc.,
to control the server room environment.
Should either move the DC to a non-corrosive region OR follow the ISA standard and install air
purifiers as a DC component.
Should analyse which of these steps is less costly and more viable for the company.
We recommend that the supplier re-assess the price of the AMC offered to the client and conduct a
detailed investigation of the total cost of the AMC.
In more detail:
Raise awareness of corrosion among new and current customers as a component of
marketing.
Develop an individual CRM record and accountability for every customer with respect to corrosion.
Maintain a list of failed hardware and keep track records (number of failures, type, cost, reason,
frequency, replacement repetition with the same hardware, repetition interval, history and
pattern).
Prepare reports and mail them quarterly.
Present the report at the time of AMC renewal.
Structured reporting helps justify the AMC price to the customer and leads to a sustainable relationship.
Structured reporting supports adherence to the scenarios discussed in section 5.15.
First scenario, OEM strategy:
XYZ Corp. should continue the AMC contract in an uncontrolled environment at an increased
price.
Second scenario, OEM strategy:
XYZ Corp. could continue the AMC contract in a controlled environment at a reduced or the same
price.
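The record-keeping recommended above (a list of failed hardware with type, cost, cause and frequency, reported quarterly) maps naturally onto a small data structure. A minimal sketch, assuming calendar quarters and using the unit prices from the OEM cost table as each record's cost; `FailureRecord`, `quarterly_summary` and the field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailureRecord:
    record_id: str   # first column of the failure table, e.g. "P41JH2P"
    product: str     # "XSeries", "Storage", ...
    failed_on: date
    part: str
    cause: str       # C, F, P or M, as in the legend
    cost_rs: int     # replacement cost: the unit price from the cost table


def quarter_of(day: date) -> str:
    """Calendar quarter label, e.g. 2013-Q1."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def quarterly_summary(records):
    """Count failures and total replacement cost per (quarter, cause)."""
    counts, costs = Counter(), Counter()
    for rec in records:
        key = (quarter_of(rec.failed_on), rec.cause)
        counts[key] += 1
        costs[key] += rec.cost_rs
    return counts, costs


# Three rows from the failure table, priced with unit prices from the cost table.
SAMPLE = [
    FailureRecord("P41JH2P", "XSeries", date(2012, 12, 11), "Blade server HBA", "C", 40000),
    FailureRecord("P41ZDMB", "XSeries", date(2013, 1, 5), "Mother board", "F", 52000),
    FailureRecord("P41ZDS7", "XSeries", date(2013, 1, 5), "Mother board", "F", 52000),
]

if __name__ == "__main__":
    counts, costs = quarterly_summary(SAMPLE)
    for key in sorted(counts):
        print(key, counts[key], costs[key])
```

Further fields from the recommendation, such as repeat failures of the same hardware or the interval between them, could be added in the same way once an asset identifier is recorded for each failure.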
References:
Muller, C. and Crosley, G. Standard ISA-71.04-1985: changes required for protection of today's process control. ISA.
Muller, C.O. Combination corrosion coupon testing needed for today's control equipment. Purafil, Inc.
Muller, C.O. Control of corrosive gases to avoid electrical equipment failure.
ASHRAE TC 9.9 (2011). 2011 Gaseous and Particulate Contamination Guidelines for Data Centers. ASHRAE.
European Union (2003). Restriction of Hazardous Substances Directive (RoHS).
Purafil, Inc. (1987). Humidity and corrosion.
Samb, M. Corrosion: the unseen enemy. Purafil, Inc.
Purafil, Inc. Data centre brochure.
Corrosion control for mission critical facilities. Technical brochure TB-1800.
Purafil, Inc. Corrosion monitoring.
Klein, L.J., Singh, P.J., Schappert, M., Griffel, M. and Hamann, H.F. Corrosion control for data centres. IBM research report.
Hienonen, R. and Lahtinen, R. (2007). Corrosion and climatic effects in electronics. Espoo: VTT Publications.