
Reflecting on the Methods Used in KLD Research

Kenneth J. Hatten
Boston University

James P. Keeler
Kenyon College

William L. James
Hofstra University

Kyungho Kim
Ajou University

Authors Note

Correspondence for this article can be sent to James P. Keeler, Professor Emeritus of Economics,
Ascension Hall, Kenyon College, Gambier OH, 43022.

Contact: Phone (740) 501-5396


Email: keeler@kenyon.edu

* Corresponding Author: keeler@kenyon.edu


METHODS USED IN KLD RESEARCH

Bios

Kenneth J. Hatten is a Professor of Markets Public Policy and Law at the Questrom School of
Business, Boston University. He earned his PhD in Strategic Management from Purdue
University. His research interests are strategic management, technology change, and the financial
and managerial control of financial institutions. His research has been published in the Harvard
Business Review, the Journal of Industrial Economics, the Academy of Management Journal,
and the Strategic Management Journal.

James P. Keeler is a Professor Emeritus at Kenyon College. He received his PhD from Indiana
University. His primary research interest is cost conditions in the airline industry, and empirical
estimates of the Austrian business cycle theory. His research has appeared in the International
Journal of Transport Economics, Journal of Business and Economic Statistics and The Quarterly
Journal of Austrian Economics.

William L. James is the Robert E. Brockway Distinguished Professor of Marketing and
International Business at the Zarb School of Business at Hofstra University. He earned his PhD
in Marketing from the Krannert Graduate School of Management of Purdue University in 1981.
His research interests are primarily in the fields of advertising and strategic management. His
research has appeared in marketing and management journals including the Journal of Marketing
Research, the Strategic Management Journal and the Journal of Advertising Research.

Kyungho Kim is an Assistant Professor at Ajou University. He received his DBA from Boston
University. His primary research interests include corporate social responsibility, environmental
sustainability and international business strategy.


Abstract
The data published by Kinder, Lydenberg, Domini Inc. (KLD) has been seen as the de
facto standard in corporate social responsibility (CSR) research since the early 1990s. Herein we
ask: Have the methods used to analyze KLD data always been appropriate? We begin with the
methodological implications of research based on the KLD index (TS-TC) and its successors,
total strengths (TS) and total concerns (TC). We then address the fact that CSR research is
usually based on an implicit assumption that industry does not matter. The paper next assesses
the methodological constraints that should be observed in efforts to develop an ultimate CSR
index, a latent variable to explain KLD’s observations. Finally, we use quantile regression to
assess the power of nine KLD environmental variables as explicators of environmental
performance. The results suggest that KLD offers few consequential findings when appropriate
methodological protocols are observed. The paper concludes with suggestions for future
research.

Keywords: KLD index, corporate social responsibility, environment, quantile regression


Introduction
In research, the validity of any conclusion depends on the credibility of the data and the
analyst’s methodology. Did the analyst use the right data and methods? With these questions in
mind, we reflect on research on corporate social responsibility (CSR) and corporate social
performance (CSP) that for many years has relied largely on KLD indices. Chen and Delmas
(2011), in surveying the CSR/CSP literature, found that almost 80 percent of research between
1997 and 2009 relied on aggregations of KLD “strengths and concerns” rankings, in which
“concerns” refers to environmental threats and “strengths” to commitments made that promise to
ameliorate such threats. Furthermore, in general, CSR/CSP research aggregates industries
without testing whether the effects found are homogeneous from one industry to another.

This reliance on indices may stem from difficulties in addressing the question: Can a
corporation do good (for society) and do well (financially)? When investigating the relationship
between financial performance (FP) and CSP, researchers have used relatively straightforward
measures of FP such as the return on assets or return on equity (ROA and ROE) and market
value — whether their focus is short or long term. Perhaps researchers feel they need a similar
single measure of CSR? Reliance on multi-industry data without testing for industry differences
may be due to researchers’ wish to avoid the complexity of heterogeneity tests and a sense that in
the CSR arena, data are scarce.

At least until 2006, CSR researchers with an empirical bent relied almost exclusively on
“The KLD index,” that is, on TS minus TC (TS-TC). Barnett and Salomon’s (2012) study was
one of the last to use TS-TC. Recently, scholars have moved to using the two measures TS and
TC independently of each other. In this, they have followed Mattingly and Berman (2006) who
argued that combining social strengths and weaknesses in empirical research could obscure the
countervailing effects of each variable on a dependent variable. They argued that total strengths
and total concerns should be looked at independently, except when their convergence can be
demonstrated empirically. TS and TC have been examined in many recent papers as dependent
and control variables (Chatterji, Levine, & Toffel, 2009; McGuire, Dow, & Ibrahim, 2012;
Walls, Berrone, & Phan, 2012; Zyglidopoulos, Georgiadis, Carroll, & Siegel, 2012).

Of course, instead of one index we have two, but they may be ill-used. In this paper, we
assess the validity of the methods typically used to access the information carried by the KLD
data. Rather than echoing the well-argued criticisms articulated by Chatterji et al. (2009) and
Mattingly (2015), we simply note that few have deeply examined the structure and information
content of the KLD data or the methods that CSR researchers typically use to analyze it. The
KLD data have often been treated as if they were measured at the interval or ratio level when in
fact, they are measured at the nominal level, with binary 0/1 observations of particular strengths
and weaknesses. Essentially, we ask: have we appreciated the KLD data as they are and
recognized the constraints they impose on the methods used for analysis?

Reassessing the Methods: The Plan


Currently, researchers usually summarize the KLD raw data along with the KLD index
(TS-TC), as well as the two indices, TS and TC, in tables of means, standard deviations, and
correlations. We will discuss the implications of this common analysis for the choice of methods
in further research and particularly the use of the KLD, TS, and TC indices. We will then closely
examine KLD data noting the frequency of repeated patterns of observed variables (1s) and not-
observed variables (0s), and then test the degree of association of those variables. Each of these
steps is designed to reveal the characteristics of the KLD data and the methodological constraints
they put on further statistical analysis. In this light, we will assess the likely success of efforts
like Carroll, Primo, and Richter (2016) to develop an ultimate CSR index that might serve as a
universal proxy for CSR. Then, following strict research protocols, we will assess the effects of
nine KLD environmental variables as independent variables in a multi-industry exploratory study
explaining environmental performance. Finally, we suggest paths for future research that avoid
the problems and constraints highlighted above, problems we believe have compromised CSR
research to date.

KLD Environmental Data


Our research question is: Are the methods of analysis typically used appropriate for the
KLD data? When first used in CSR research in 1991, KLD data described approximately 650
publicly-traded U.S. companies in terms of several socially responsible attributes, labelled
concerns or strengths. KLD, a socially responsible investment fund manager, employed an
independent, trained research staff annually to tap public corporate documents (e.g., annual
reports, company websites, firms’ CSP reporting) and other data sources and to read, review, and
evaluate the data. Recently, the number of U.S. firms covered has risen to over 3,000. KLD
generates approximately 80 ratings of 0 or 1 as concern and strength factors at the end of each
calendar year for each company it monitors. Zero indicates a particular strength or concern was
not observed; 1 indicates it was observed.

To address our research questions, we will focus on a subset of the full KLD dataset, nine
environmental variables — five labelled “concerns” — hazardous waste, regulatory problems,
ozone-depleting chemical emissions, substantial emissions, and agricultural chemicals — and
four “strengths” — beneficial products/services, pollution prevention, recycling, and clean
energy. We have matched the KLD observations describing firms in three manufacturing
industries, Standard Industrial Classification (SIC) 31, 32, and 33, with EPA data on toxic
releases (TR) for each of these industries. Manufacturing industry SIC 31 includes
Food/Beverage/Tobacco, Textiles, Apparel, and Leather; SIC 32 includes Paper, Printing and
Publishing, Petroleum, Chemical, Plastics and Rubber, and Stone/Clay/Glass/Cement; and SIC
33 includes Primary Metals, Fabricated Metals, Machinery, Computers and Electronic Products,
Electric Equipment, Transportation Equipment, Furniture, and Miscellaneous Manufacturing.
These manufacturing sectors were chosen because they offer a wide range of environmental
“concerns” and “strengths.” They were among the few industries where the number of KLD
observations available was sufficient for estimation, so the choice of industries was limited by
degrees of freedom. In addition, we chose to focus on the relationship between the KLD
environmental variables and Environmental Performance, measured by toxic releases of chemicals
and particulates. These three industries are the largest contributors to total pollution, and we
expect that the production relationship would be substantially different for other sectors such
as Retail Trade; Finance, Insurance, and Real Estate; or Public Administration. The data used in
this study cover 18 years, 1991–2008, and comprise an unbalanced panel of approximately 2,500
firm years. KLD’s observations of the environmental concerns and strengths of specific
corporations do not change radically year by year in mature industries such as SIC 31, 32, and 33.

Basic Characteristics
Table 1 below lists the means, standard deviations, and minimum and maximum values
of variables referenced in this study, as well as the correlations between each of the nine KLD
strengths and concerns and the KLD index (TS-TC), and TS and TC, across our three industries
from 1991–2008.

Table 1: Correlation Matrix (Obs = 2412 Firm*Years)

Variable     Mean   s.d.   Min    Max    HW     RP     ODC    SE     AC     BPS    PP     RC     CE     TS     TC     TS-TC  TR
HW           .27    .44    0      1      1.00
RP           .32    .47    0      1      .34*   1.00
ODC          .02    .15    0      1      .13*   .08*   1.00
SE           .27    .44    0      1      .29*   .28*   .13*   1.00
AC           .06    .25    0      1      .18*   .17*   .47*   .13*   1.00
BPS          .12    .33    0      1      -.05*  -.08*  -.03   -.06*  -.05*  1.00
PP           .09    .28    0      1      .18*   .12*   .00    .12*   .09*   -.01   1.00
RC           .08    .28    0      1      -.05*  .04*   -.04   .08*   -.07*  -.07*  .01    1.00
CE           .09    .28    0      1      .15*   .12*   .02    .15*   .08*   .10*   .17*   -.01   1.00
TS           .38    .62    0      3      .10*   .09*   -.02   .12*   .02    .54*   .54*   .42*   .59*   1.00
TC           .95    1.12   0      5      .71*   .71*   .37*   .67*   .48*   -.09*  .19*   .01    .19*   .13*   1.00
TS-TC        -.56   1.21   -5     3      -.61*  -.61*  -.35*  -.56*  -.43*  .36*   .10*   .21*   .13*   .39*   -.86*  1.00
TR (MM lbs)  4.3    22     .007   568    .12*   .11*   .17*   .15*   .12*   -.04*  .05*   .07*   .00    .03    .20*   -.17*  1.00
Sales ($B)   11.3   28.2   0.5    425    .32*   .28*   .02    .27*   .02    -.02   .09*   .01    .23*   .14*   .36*   -.26*  .08*

Concerns: Hazardous Waste (HW), Regulatory Problems (RP), Ozone Depleting Chemicals (ODC), Substantial
Emissions (SE), Agricultural Chemicals (AC). Strengths: Beneficial Products/Services (BPS), Pollution
Prevention (PP), Recycling (RC), Clean Energy (CE). Total Strengths (TS), Total Concerns (TC), KLD Index (TS – TC),
Toxic Releases (TR, unit: lbs. MM), Sales (Billion US$), * p <0.05.

The most important fact here is that the correlation between TS and TC (r TS, TC) within
our three-industry sample is only 0.13. We can also note that the five KLD environmental
concerns and the four environmental strengths, the components of TC and TS in this study, are
themselves correlated at very low levels. Factor analysis and principal components methods
(Carroll, Primo and Richter, 2013, and Goss and Roberts, 2010) rely on high correlations among
the raw variables to make the constructed indexes interpretable. We believe such interpretation
will be difficult given the low correlations in the KLD data.

Moreover, as shown in Table 1, the single concerns and strengths are not only weakly
correlated, but some pairs have positive or negative signs which are at odds with the assumptions
of common practice. For example, the strength pollution prevention is positively correlated with
four of the five concerns (its correlation with ozone-depleting chemicals is .00); the strength
recycling is positively correlated with two concerns, and negatively with the strength beneficial
products and services. Such unexpected contrary indications are important because common
practice implicitly assumes that every strength is positively correlated with every other
strength. Thus, signs at odds with common practice point to the possibility that the untested
summing of KLD strengths or concerns (even separately), as commonly practiced, is likely to be
in error.

Because low correlations can have important implications for the choice of methods for
further research, we checked if the weak correlations of concerns and strengths in Table 1 were
unique to our sample or representative of other samples. To do so, we compared Table 1 with
Table IV of Chatterji et al. (2009) which reports the correlation of every KLD environmental
concern and strength with every other. The comparison yielded Table 2, affirming that low
correlations characterize the KLD environmental data.

Table 2. Characteristics of the Correlation Matrices

                               This Study (9*9)      Chatterji et al 2006 (14*14)
Sample size                    2412                  3831
No. of correlations > |0.30|   2                     4
No. of correlations > .40      1                     2
Max correlation                .47                   0.44
Min correlation                -0.08                 -0.04
Median correlation             between .05 and .10   between .05 and .10
Modal range (count)            .101 to .15 (8)       -.049 to .00 (10)
r TS, TC                       0.13                  .25

Note: Here, the Min, Max, Median, and Modal Correlations refer to correlations between particular KLD Concerns
and Strengths. TS, TC, and the KLD Index (TS-TC) were excluded from these counts.
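
The screening summarized for this study can be reproduced directly from the item-level correlations in Table 1. The sketch below is illustrative only: the values are transcribed by hand from the lower triangle of Table 1, and the variable names (`lower_triangle`, `summary`) are ours, not part of the original analysis.

```python
from statistics import median

# Pairwise correlations among the nine KLD items, transcribed from the
# lower triangle of Table 1 (row by row, diagonal omitted).
lower_triangle = {
    "RP":  [.34],
    "ODC": [.13, .08],
    "SE":  [.29, .28, .13],
    "AC":  [.18, .17, .47, .13],
    "BPS": [-.05, -.08, -.03, -.06, -.05],
    "PP":  [.18, .12, .00, .12, .09, -.01],
    "RC":  [-.05, .04, -.04, .08, -.07, -.07, .01],
    "CE":  [.15, .12, .02, .15, .08, .10, .17, -.01],
}
r_values = [r for row in lower_triangle.values() for r in row]

n_pairs = len(r_values)                           # 9 * 8 / 2 pairs
above_030 = sum(abs(r) > 0.30 for r in r_values)  # screening level of 0.30
above_040 = sum(abs(r) > 0.40 for r in r_values)  # stricter level of 0.40

summary = {
    "pairs": n_pairs,
    "above_0.30": above_030,
    "above_0.40": above_040,
    "max": max(r_values),
    "min": min(r_values),
    "median": median(r_values),
}
print(summary)
```

Only a handful of the 36 item pairs clear either threshold, which is the pattern the "This Study" column of Table 2 reports.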

Indexing in Practice
As Entine (2003) once observed, CSR research has long been tied to indices, primarily
sourced from KLD. The often-cited KLD index is TS-TC. To an increasing degree, after
Mattingly and Berman (2006), this once favored KLD index has become two-headed, TS and
TC. It is important to understand how practice changed and why; before Mattingly and Berman
(2006), most researchers used KLD with the separate observations equally weighted.

Mattingly and Berman (2006) commented on this early practice:


Virtually all of the prior research using the KLD data has subtracted the weaknesses from
the strengths to derive a composite indicator of a firm’s social action. In a significant
departure from prior research — constituting a major contribution of our project — we
sum separately the indicators of strength and weakness [emphasis added], leaving them
separate for the EFA (Exploratory Factor Analysis). We reject the assumption that
strengths and weaknesses necessarily covary in opposing directions and prefer, instead, to
examine empirically whether they do. (p. 28)

After 2006, it became common practice in CSP and EP research to follow their
recommendation to separately sum KLD strengths and concerns and split the traditional KLD
index. Unfortunately, researchers have since continued the field's long-established practice of
summing without testing whether the two sums, TS and TC, are valid. Here we cite an earlier
paper, by Rowley and Berman (2000), commenting on such practices:

Thus, by aggregating multiple dimensions into a composite measure, much of the
meaning and richness of the data is lost, and comparisons across firms (and
studies) are more difficult. Second, there is a question of whether all the
dimensions comprising the measure should receive the same weight. (p. 403)

In this light, consider what researchers are doing when they use the sums TS and TC in a
regression study. For example, take environmental performance and the effects of just two KLD
strengths on that performance. If the researcher treats each strength as unique, we would write
the following equation for a two-strength (Si) study of environmental performance (EP):

EP = b0 + b1*S1 + b2*S2 + e (1)

If the researcher followed what has become common practice since 2006, and aggregated the two
strengths, weighing them equally, equation 1 would become:

EP = a0 + a1(S1 + S2) + v (2)

Running model (2) with its single a1 coefficient means the researcher is imposing the
restriction b1 = b2 on equation 1. This ignores Rowley and Berman (2000). Whether in fact
b1 = b2 is, indeed, testable. Therefore, at a minimum, and independent of the problem of
attributing meaning to the estimated b values, an unrestricted regression, model (1), should be
run and the validity of the restriction b1 = b2 (or, in most KLD research, b1 = b2 = b3 = … = bn)
should be tested before using constructs such as TS and TC in further analyses. Imposing a
restriction that is not true introduces specification error.
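
One way to test such a restriction is a standard F-test comparing the residual sums of squares of the restricted and unrestricted regressions. The sketch below is a minimal illustration on simulated data, not the estimation used in this paper: the sample size, the "true" coefficients (2.0 and -0.5), and the helper `ols_rss` are all assumptions made for the example.

```python
import random

def ols_rss(X, y):
    """Fit y on the columns of X by OLS and return the residual sum of squares."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

# Simulated (hypothetical) data: two binary strengths with unequal effects.
random.seed(0)
n = 500
s1 = [int(random.random() < 0.3) for _ in range(n)]
s2 = [int(random.random() < 0.3) for _ in range(n)]
ep = [1.0 + 2.0 * x1 - 0.5 * x2 + random.gauss(0, 1) for x1, x2 in zip(s1, s2)]

X_unrestricted = [[1, x1, x2] for x1, x2 in zip(s1, s2)]  # equation (1)
X_restricted = [[1, x1 + x2] for x1, x2 in zip(s1, s2)]   # equation (2)

rss_u = ols_rss(X_unrestricted, ep)
rss_r = ols_rss(X_restricted, ep)
q, k = 1, 3   # one restriction (b1 = b2); three parameters in equation (1)
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
print(f"F({q}, {n - k}) = {f_stat:.1f}")
```

With one restriction and roughly 500 observations, the 5 percent critical value of F is about 3.86, so a statistic above that level would argue against summing the two strengths. In this simulation the effects were deliberately made unequal, so the restriction should be rejected.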

Recent research (Albuquerque, Koskinen, and Zhang, 2018; Bae, El Ghoul, Guedhami,
Kwok, and Zheng, 2018; Jo and Harjoto, 2012) has recognized a
problem caused by changes in the maximum number of KLD items over time. Those changes
could artificially increase the CSR score by adding a Strength variable or by eliminating a
Concern variable. These researchers have adjusted the sums of Strengths and of Concerns by
dividing by the maximum number of KLD items in the year of the observation. This puts each
measure on a common scale: “scaling alleviates the concern of a changing number of strengths
and concerns over time and across firms” (Albuquerque et al., 2018, p. 17). Their adjustment is
designed to address that particular problem. Because each of the three analyses then calculates
the CSR measure as the Strength sum minus the Concern sum, the measure continues to incur the
two problems we raise: the aggregate CSR measure is not a unique value but may correspond to
many different combinations of KLD item values, and it imposes a regression assumption of
equal coefficients.
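
The non-uniqueness of a net score can be made concrete by enumeration. The sketch below is an illustration only: it assumes the four environmental strengths and five environmental concerns used in this study and counts how many distinct 0/1 profiles collapse to each value of TS-TC.

```python
from collections import Counter
from itertools import product

N_STRENGTHS, N_CONCERNS = 4, 5   # the nine environmental items in this study

profiles_per_score = Counter()
for profile in product((0, 1), repeat=N_STRENGTHS + N_CONCERNS):
    ts = sum(profile[:N_STRENGTHS])   # total strengths in this profile
    tc = sum(profile[N_STRENGTHS:])   # total concerns in this profile
    profiles_per_score[ts - tc] += 1

for score in sorted(profiles_per_score):
    print(f"TS-TC = {score:+d}: {profiles_per_score[score]} distinct profiles")
```

Only the two extreme scores identify a single profile. Every other value of the index pools many different combinations of items, which a regression on the index then treats as identical.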

Goss and Roberts (2011) remind us that the KLD variables are binary (i.e., nominal
variables), indicating only the presence or absence of a condition or influence. An aggregate
score such as net CSR = Total Strengths – Total Concerns gives the appearance of a cardinal
measure (Goss and Roberts, 2011, p. 1797), but it is not a unique measure. One firm may have
more KLD Strengths than another firm. The other firm, with fewer Total Strengths, may commit
more resources to the Strengths it does have and in fact exhibit greater Corporate Social
Responsibility. Any index that represents multiple KLD variables assumes equal or at least
known weights, which we find unlikely. Adding and subtracting KLD variables commits a
composition error: what is true of a component (the presence of a recycling program indicates
greater social responsibility than the absence of one) is not necessarily true of the aggregate,
because the net CSR score is not a cardinal measure or even an ordinal measure of the extent of
resources devoted to socially responsible behavior. We believe that the aggregate scores such as
Total Strengths, Total Concerns and net CSR have low construct validity. When the original
KLD binary variables are used individually, the regression coefficient has the completely
legitimate interpretation as the effect of the presence (but not the extent) of an influence on the
dependent variable, ceteris paribus.

Factor Analysis and KLD’s Critical Characteristics

In Table 1, TS and TC were correlated at 0.13 but we note that the movement of many
researchers to two indices in lieu of the single KLD index was anchored in factor analysis. From
Mattingly and Berman (2006), it is clear, however, that these authors did not observe the
practical guidelines for the use of factor analysis as a data consolidation scheme.

Tabachnick and Fidell (2007) state that PCA (sometimes recognized as a form of factor
analysis) should not be attempted unless the data are generally correlated at 0.30 or above.
Lattin, Carroll, and Green (2003) suggest that researchers turn to alternate methods where
correlations in the ‘loadings matrix’ are less than 0.40. Experience has shown that low
correlations usually yield low loadings and make interpretation difficult. Others point out that
EFA only makes sense if there is a reasonable prima facie case that one or a few variables could
explain our observations (Bartholomew, Steele, Moustaki, & Galbraith, 2008).

Mattingly and Berman (2006) document their use of untested equally weighted strengths
and concerns using the KLD data. But there is more that warrants reexamination. They reduced
80 raw KLD variables to a set of twelve TS and TC indices (six sums of strengths and six of
weaknesses). Of the 66 TSi and TCi correlation coefficients, only seven were above 0.30: the
largest two, at 0.53, one at 0.45, another at 0.40, two at 0.38, and one at 0.32 (Mattingly and
Berman, 2006). Assessing these results and the other 59 lower correlation coefficients led
Mattingly and Berman to conclude that the separately summed indices were weakly related at most.

But this conclusion should now be reviewed in light of Tabachnick and Fidell (2007) and
Lattin et al. (2003). Since the TSi and TCi correlations reported by Mattingly and Berman (2006)
were very low, we can now conclude that their factor analysis (EFA) was ill advised, albeit it led
to a change in common practice: dropping the use of the KLD index and separately analyzing
the sums of strengths and concerns.

Of course, whenever TSi and TCi indices are used in exploratory factor analysis, the
untested assumption of equal weights for the constituent primary KLD variables holds and the
original data are restricted in use. Consider the indices that Mattingly and Berman (2006) used.
In fact, they selected sets of five or six different variables for each of their twelve indices
(si, strengths, or ci, concerns) and summed the observed 0 and 1 values of these variables for
each corporation i in their dataset, as illustrated in equation 3 below:

TSi = si1 + si2 + … + si6 (3)

Recognizing that EFA is closely related to regression analysis, we can consider EFA
from two points of view. First, following Mattingly and Berman (2006), consider the sum of a
particular set of strengths, TSi, in equation 3, but now expressed in terms of two factors, f1
and f2, and the factor loadings (aij). The following equation holds:

TSi = ai0 + ai1 f1 + ai2 f2 (4)

Second, if we now write out the equations that explain these two factors in terms of
several KLD indices (the TSi), we have the following:

f1 = c11 TS1 + c12 TS2 + … + c1n TSn (5)

f2 = c21 TS1 + c22 TS2 + … + c2n TSn (6)

where the cij are estimated factor-score coefficients.

Clearly in equations 4, 5, and 6, for every TSi and, thus, in every separately summed set
of strengths, every strength is restricted to an equal weight inside each sum no matter how it is
used in subsequent analyses, and no matter the investment required to develop any one strength
or the penalties borne because of any one concern. Even more telling, the factors and loadings
for each summed index must be considered uninterpretable, because the analyst cannot know
which of the many possible combinations of observed 1s and non-observed 0s produced any given
sum TSi, whether a sum of strengths or of concerns.

KLD’s Binary Observations and the Search for a Unified Index


There is yet another serious problem to be addressed when we move away from
Mattingly and Berman’s (2006) summed indices and attempt to use raw KLD data, that is,
corporation-by-corporation observations of each of the primary KLD variables si (or ci), not the
summed indices TSi (or TCi) discussed in the section above.

As almost every recently published paper notes, the KLD data are binary. Because the
KLD data are binary, we can argue that using metric models such as those specified in equations
(4), (5), and (6) to identify a latent factor or factors to ‘explain’ the raw observations of each
strength or concern is in error. In other words, assuming that the KLD data are metric and acting
as if metric models apply is an error because the use of a metric model is based on false
assumptions (Bartholomew, Steele, Moustaki, and Galbraith, 2008).

Metric computations will yield factors, here f1 and f2, that, like the error ei in
equation 7, can take any value, with factors and error independent of each other. As a
corollary, the raw si (and likewise ci) could also take any value.

si = ai0 + ai1 f1 + ai2 f2 + ei (7)

That would be all very well were the data metric, but the raw data (the observations for
each variable si and ci) are binary and can only take the value 1 or 0. Thus, Bartholomew et al.
(2008) explain, “Therefore, the linear factor model is invalid for categorical variables in
general and for binary variables in particular” (p. 211).
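
For comparison with equation 7, a response function of the kind often used for binary items can be sketched as follows. This is an illustration only. It assumes a logit link, which the discussion above does not specify; the symbols a_{i0}, a_{i1}, a_{i2}, f_1, and f_2 carry over from equation 7, and \pi_i is introduced here for the probability that s_i = 1.

```latex
\pi_i(f_1, f_2) = \Pr(s_i = 1 \mid f_1, f_2), \qquad
\log\frac{\pi_i(f_1, f_2)}{1 - \pi_i(f_1, f_2)} = a_{i0} + a_{i1} f_1 + a_{i2} f_2
```

In this form the linear combination of factors determines the log-odds of observing a 1 rather than the 0/1 value itself, so the left-hand side is free to take any real value while s_i remains binary.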

Given this fact, where KLD is the primary data source, researchers might set out to
identify a latent variable that could carry the full information carried by the KLD data — perhaps
an unobservable variable they could label ‘CSR’. Before moving to further analysis,
Bartholomew et al. (2008) suggest researchers first establish the case for ‘expecting’ that
common factors could account for all relationships between the original observations.

Is such an explanation possible in the context of KLD research? Let’s look more closely
at the data describing our three industries, SIC 31, 32, and 33. Table 3 shows that the number
of concerns observed is substantially greater than the number of strengths. Despite the fact that
the KLD readers were looking for evidence of five concerns and four strengths, they seem to
have found concerns roughly two and a half times as often as strengths (2,344 versus 952); the
data are skewed.

Table 3. KLD Observations of Concerns and Strengths
SIC 31, 32 and 33 (2,539 case/years)

Concerns                             Number Observed
Hazardous Waste                      670
Regulatory Problems                  794
Ozone Depleting Chemicals            55
Substantial Emissions                667
Agricultural Chemicals               158
Total                                2344

Strengths                            Number Observed
Beneficial Products and Services     300
Pollution Prevention                 220
Recycling                            205
Clean Energy                         227
Total                                952


To further assess the skewed character of the KLD data in our three-industry sample
(encompassing the KLD data on companies in SIC 31, 32, and 33), we sorted the data by
strengths, and then by concerns, and counted them. Our count, shown in Table 4, revealed that
in 34.6 percent of our three-industry KLD database neither a strength nor a concern was
observed. It also showed that 65.1 percent had either nothing observed (0) or only one observed
strength or concern. The high proportion of the dataset involving no or one observation suggests
that the KLD strengths and concerns are, in general, not closely associated with one another.
This lack of association makes it unlikely that any one latent variable could explain every KLD
observation in this sample. Only 17 percent of our observations were of corporations with at
least one strength and at least one concern. We labeled this group mixed (see Table 4).

Table 4. Character of the KLD Environmental data
SIC 31, 32 and 33 [2539 Case Years]

Concerns or Strengths Observed            % of cases   Cumulative %
Neither Strength nor Concern Observed     34.6%        34.6%
Only One Strength Observed                13.2%        47.8%
Only One Concern Observed                 17.3%        65.1%
Multiple Concerns                         17.8%        82.9%
Mix of Strengths and Concerns Observed    17.0%        99.9%

We then explored the patterns of the KLD observations firm by firm, and year by year,
again following Bartholomew et al. (2008). These nine variables have 2^9 (512) possible
combinations. To keep the task manageable and our report simple, we first looked at the
strengths (2^4, or 16, possible combinations), and then the concerns (2^5, or 32, possible
combinations). A one (1) indicates that a particular strength was observed. Zero (0) indicates it
was not. The observation pattern 1101 in Table 5 indicates that two strengths, beneficial products
and services and pollution prevention, were observed, as reported in columns one and two.
Recycling, a third strength, was not observed, as reported in column 3, while clean energy was
observed, as reported in column 4. The patterns of concerns in Table 6 have a similar structural
logic.
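
The pattern coding described above can be sketched in a few lines. The example is illustrative only: the firm-year rows are hypothetical, and the `decode` helper and the ordering of labels (BPS, PP, RC, CE) are assumptions that follow the column order described in the text.

```python
from collections import Counter

STRENGTHS = ("Beneficial Products and Services", "Pollution Prevention",
             "Recycling", "Clean Energy")

def decode(pattern, labels=STRENGTHS):
    """Return the labels whose position in the 0/1 pattern string holds a 1."""
    return [label for bit, label in zip(pattern, labels) if bit == "1"]

# Number of possible patterns for all nine items, the strengths, the concerns.
n_all, n_strengths, n_concerns = 2 ** 9, 2 ** 4, 2 ** 5
print(n_all, n_strengths, n_concerns)

# Hypothetical firm-year rows (BPS, PP, RC, CE); not taken from the KLD data.
rows = [(1, 1, 0, 1), (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 1), (0, 0, 1, 0)]
pattern_counts = Counter("".join(map(str, row)) for row in rows)

for pattern, count in pattern_counts.most_common():
    print(pattern, count, decode(pattern))
```

Counting patterns rather than sums keeps track of which items were observed together, which is exactly the information a total such as TS discards.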


Table 5. Repeated Observed Patterns: Strengths Only
(14 of 16 possible patterns) [2539 case/years]

Observed Pattern   # Observations   Patterns of Strengths Observed With High Frequency
1110               3
1101               8
1100               14
1010               8
1001               41
1000               226              Beneficial Products & Services
0111               6
0110               11
0101               42
0100               136              Pollution Prevention
0011               12
0010               165              Recycling
0001               118              Clean Energy

The complete sets of results of these steps in our analysis are reported in Tables 5 and 6.
For the strengths, as Table 5 shows, high frequency counts were found, but only for ‘one
strength at a time’ (i.e., not in the clusters one might expect), and two of the 16 possible
patterns (12.5 percent) were never observed. For concerns, as Table 6 shows, of the 32 possible
patterns that might have been observed, nine (28 percent) were never observed, percentages that
are consistent with reports by Bartholomew et al. (2008). We found only three high-frequency
patterns where two concerns were observed clustered together and one pattern which included
three. As Table 6 shows, the other high-frequency “patterns” involved only one concern.
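
As a consistency check, the pattern counts in Table 5 can be summed back to item totals and compared with the strengths reported in Table 3. The sketch below transcribes the Table 5 counts by hand; the variable names are ours. The all-zero pattern (no strength observed) does not appear among the rows listed in Table 5, so it is absent from the dictionary as well.

```python
from itertools import product

# Pattern counts transcribed from Table 5; digit order is BPS, PP, RC, CE.
table5 = {
    "1110": 3, "1101": 8, "1100": 14, "1010": 8, "1001": 41, "1000": 226,
    "0111": 6, "0110": 11, "0101": 42, "0100": 136,
    "0011": 12, "0010": 165, "0001": 118,
}
labels = ("BPS", "PP", "RC", "CE")

# Item totals implied by the patterns: sum every pattern in which the item is 1.
marginals = {
    label: sum(n for pattern, n in table5.items() if pattern[i] == "1")
    for i, label in enumerate(labels)
}

# Patterns that are possible but not listed in the table.
all_patterns = {"".join(bits) for bits in product("01", repeat=4)}
not_listed = sorted(all_patterns - set(table5))

# Share of listed cases (at least one strength) showing exactly one strength.
listed_total = sum(table5.values())
single_share = sum(n for p, n in table5.items() if p.count("1") == 1) / listed_total

print(marginals)
print(not_listed)
print(f"{single_share:.1%} of cases with any strength show exactly one")
```

The implied totals agree with the strength counts in Table 3, and the two patterns never observed are the ones other than 0000 in `not_listed`.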


Table 6. Repeated Observed Patterns: Concerns Only
Of the 32 possible patterns, only 23 were observed (2,539 case/years)

Observed Pattern   # Observations   Patterns of Concerns Observed with High Frequency
11111              23
11101              5
11011              27
11010              182              Hazardous Waste, Reg. Problems, Subs. Emissions
11001              11
11000              141              Hazardous Waste, Reg. Problems
10110              7
10011              3
10010              82               Hazardous Waste, Subs. Emissions
10001              22
10000              167              Hazardous Waste
01101              3
01011              16
01010              109              Reg. Problems, Subs. Emissions
01001              15
01000              262              Reg. Problems
00111              5
00101              9
00100              3
00011              5
00010              208              Subs. Emissions
00001              15
00000              1218             No Concerns Observed

Tables 5 and 6 clearly show that in the 1991–2008 dataset, which included hundreds of
corporations, repeated patterns of observations of multiple strengths and concerns are rare. This
finding should raise additional doubts about the merits of a quest for a ubiquitous latent CSR
factor.

Next, following the final step recommended by Bartholomew et al. (2008) for making a
case for a single CSR index built from the raw KLD data, we tested the
associations of the KLD variables, pair by pair. The results, shown in Table 7, were estimated on
the whole three-industry sample. In contrast, the results in Table 8 were estimated separately,
industry by industry.


Table 7. Associations of KLD (0/1) Concerns and Strengths

ALL (SIC 31, 32, 33), n = 2,539

                Hazard.   Reg.      Ozone     Subs.     Agric.    BPS       Pollut.   Recycle   Clean
                Waste     Prob.     Depl.     Emissions Chem.               Prevent.            Energy
Hazardous       *         .000      .000      .000      .000      .022      .000      .012      .000
Reg Prob                  *         .000      .000      .000      .000      .000      .031      .000
Ozone Depl                          *         .000      .000      .138      .915      .084      .323
Emissions                                     *         .000      .005      .000      .000      .000
Agric Chem                                              *         .027      .000      .000      .000
BPS                                                               *         .816      .003      .000
Pollution Prev                                                              *         .571      .000
Recycle                                                                               *         .923
Clean Energy                                                                                    *
Note: (p value of χ²)

In Table 7, the three-industry pool, it seemed that most KLD variables had significant
associations with others. Twenty-nine (29) of the 36 χ² tests in Table 7 were significant at
p ≤ 0.05, most of them with p = .000, indicating likely relationships. Only seven of the 36 χ²
tests point to no association, four of these involving ozone-depleting chemicals.
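
A pairwise test of this kind can be sketched as follows. This is a minimal illustration, assuming a Pearson χ² test without continuity correction on each 2×2 table of 0/1 indicators; the function name and the toy data are hypothetical, and the exact test options used for Table 7 are not stated in the text.

```python
from itertools import combinations
from math import erfc, sqrt


def chi2_2x2(x, y):
    """Pearson chi-square statistic and p-value for two equal-length 0/1 lists."""
    n = len(x)
    obs = [[0, 0], [0, 0]]          # obs[a][b] counts cases with x == a and y == b
    for a, b in zip(x, y):
        obs[a][b] += 1
    row = [obs[0][0] + obs[0][1], obs[1][0] + obs[1][1]]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    if 0 in row or 0 in col:
        return None                 # a variable that never varies cannot be tested
    stat = 0.0
    for a in range(2):
        for b in range(2):
            expected = row[a] * col[b] / n
            stat += (obs[a][b] - expected) ** 2 / expected
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 degree of freedom
    return stat, p_value


# Hypothetical toy data: three indicators observed on eight case/years.
data = {
    "hazardous_waste": [1, 1, 1, 1, 0, 0, 0, 0],
    "reg_problems":    [1, 1, 1, 0, 0, 0, 0, 1],
    "recycling":       [0, 1, 0, 1, 0, 1, 0, 1],
}
for left, right in combinations(data, 2):
    print(left, right, chi2_2x2(data[left], data[right]))

# Nine KLD environmental variables give 9 * 8 / 2 = 36 distinct pairs.
n_pairs = len(list(combinations(range(9), 2)))
print(n_pairs)
```

With nine variables the loop over `combinations` visits 36 pairs, which is the number of tests reported in Table 7.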

Importantly, when we reviewed the same pair-by-pair associations within each of
our three industries (SIC 31, 32, and 33), the results were quite different: there were fewer
associated pairs. The results of Table 8 suggest, first, that variable-to-variable associations varied
from industry to industry. Second, the patterns of p = .000 results industry by industry indicate
that there is no one KLD variable, either strength or concern, that is related to every other KLD
strength or concern within any industry. This is true even though, in Table 8, there were only 15
unassociated pairs out of the full set of thirty-six (36) in both Industry 31 and Industry 33, and
only 12 unassociated pairs in Industry 32. BPS was not associated with any other KLD
variable in Industry 31, although agricultural chemicals was associated with the other four
concerns in that industry. Recycling had limited associations with other KLD variables in
Industry 32, although it seems that hazardous waste and regulatory problems may be related to
every KLD variable except recycling in that industry. Clean energy seems related to
every other concern and strength except ozone depleting chemicals. Ozone depleting chemicals is
related only to agricultural chemicals in Industry 33.

The industry-by-industry investigation of Table 8 suggests, therefore, that the search for a
new unifying single index based on the KLD data, even if a separate index is calculated for
particular industries one by one, is likely to be fruitless. Further, note that in our three-industry
study, only small numbers of the KLD variables that appeared to be related in some way to
others were observed in combination among the higher-frequency patterns presented in Tables 5
and 6. This fact also suggests that any search for a verifiable unified (that is, single) KLD index is
likely to be fruitless. Recall that Table 5 reported no repeated patterns of strengths in
combination with other strengths. And, although Table 6 (exclusively focused on concerns)
showed as many as three concerns in one repeated pattern (182 observations of 2,539 possible),
the largest group was the case/years with no concerns observed (1,218 of 2,539), with another
637 of 2,539 showing one concern at a time. Beneficial products and services was “absent” in
Industry 31. In Industry 32, recycling was absent, while in Industry 33, ozone depleting chemicals
had the largest number of ‘no associations’.

Table 8. Industry-Specific Associations of the KLD Environmental Variables

Industry 31, n = 199

                Hazardous Reg.      Ozone     Subs.     Agric.    BPS       Pol.      Recycle   Clean
                Waste     Prob.     Depl.     Emissions Chem.               Prev.               Energy
Hazardous       *         .276      nr        .136      .000      .447      nr        .088      .088
Reg Prob                  *         nr        .000      .000      .106      nr        .173      .006
Ozone Depl.                         *         nr        nr        nr        nr        nr        nr
Emissions                                     *         .038      .219      nr        .053      .006
Agric Chem                                              *         .478      nr        .111      .111
BPS                                                               *         nr        .822      .324
Pol. Prev.                                                                  *         nr        nr
Recycle                                                                               *         .315
Clean Energy                                                                                    *
No Association: 15

Industry 32, n = 1,148

                Hazardous Reg.      Ozone     Subs.     Agric.    BPS       Pol.      Recycle   Clean
                Waste     Prob.     Depl.     Emissions Chem.               Prev.               Energy
Hazardous       *         .000      .000      .000      .000      .002      .000      .078      .000
Reg Prob                  *         .004      .000      .000      .000      .001      .614      .000
Ozone Depl.                         *         .000      .000      .338      .845      .186      .086
Emissions                                     *         .000      .081      .000      .120      .000
Agric Chem                                              *         .231      .000      .003      .001
BPS                                                               *         .186      .287      .009
Pol. Prev.                                                                  *         .183      .000
Recycle                                                                               *         .024
Clean Energy                                                                                    *
No Association: 12

Industry 33, n = 1,182

                Hazardous Reg.      Ozone     Subs.     Agric.    BPS       Pol.      Recycle   Clean
                Waste     Prob.     Depl.     Emissions Chem.               Prev.               Energy
Hazardous       *         .000      .314      .000      .051      .830      .000      .398      .000
Reg Prob                  *         .620      .000      .000      .493      .000      .000      .000
Ozone Depl.                         *         .256      .000      .167      .365      .315      .298
Emissions                                     *         .000      .044      .151      .000      .003
Agric Chem                                              *         .060      .594      .171      .000
BPS                                                               *         .333      .003      .000
Pol. Prev.                                                                  *         .001      .000
Recycle                                                                               *         .009
Clean Energy                                                                                    *
No Association: 15
Note: (p value of χ²)

To summarize, our assessments of the KLD data relevant to SIC 31, 32 and 33, as
reported in Table 8, indicate that a search for a universal latent factor applicable across the
multiple industries that are included in the KLD’s database is likely to be fruitless for the
following five reasons:
• Too many of the recorded KLD observations for each variable are 0s (not observed).
• Repeated patterns of paired and multiple variable observations are rare.
• With respect to strengths, the only “patterns” repeated in large numbers were not actually
patterns, but observations of individual strengths.
• The identified variable-to-variable associations or relationships differed by industry.
• Significant pair-wise associations industry by industry appear to have no relationship to
the pattern of the primary KLD observations.

Despite our skepticism that a unified CSR index can be found, we appreciate that Carroll
et al. (2016) saw the situation differently and set out to create an “improved” unified social
responsibility index. They named it the D-Social-KLD index. It was developed using item
response theory (IRT) to consolidate the information content of the entire KLD database: 2.7
million actual observations for 650 to 3,100 corporations and 80 indicators, after allowing for
about 6 million missing observations (see Carroll et al., 2016: 70). Their data set overlaps with
ours for 1991–2008, and they have an additional four years of data.

To establish the bona fides of the improvement they set out to achieve, Carroll et al.
(2016) compared their D-Social-KLD index to the original KLD index [TS-TC], following
precedents set by Sharfman (1996) and, later, by Hart and Sharfman (2012). Notwithstanding
these two precedents, this choice seems unfortunate because, as we have argued earlier, the
classic KLD index [TS-TC] as used is statistically flawed, being composed of the untested
sums of both strengths and concerns. To some extent, their choice of a direct “test” is
understandable. Examining the 2⁸⁰ possible patterns (80 binary indicators) they were working
with to establish patterns of decisions, their frequency, and variable-by-variable associations as
part of the justification would be formidable.

Given our skepticism of the prospects for a unified index derived from the KLD data, we
examined Carroll et al. (2016) carefully. Some of their remarks stand out. In particular, we noted
that Carroll et al. (2016) could not provide ex post explanations for the relative sizes of the
indices of several specific pairs of corporations. Attendant problems associated with an inability
to interpret their index also become clearer when Carroll et al. (2016) offer explanations for
‘observed’ changes in the D-Social-KLD index scores for particular corporations over time.
According to Carroll et al. (2016), Walmart’s D-Social-KLD index rose to a very high level in
2005 after Hurricane Katrina, and Apple’s D-Social-KLD score rose when Steve Jobs returned to
lead the company in 1997. Interestingly, Exxon/Mobil had a low score in Chen and Delmas
(2011), while its D-Social-KLD score ranked alongside such high scorers as Apple, IBM, and
GE, but slightly below Walmart (Carroll et al., 2016). Each explanation offered by Carroll et al.
(2016) seems quite reasonable at face value, being specific to each compared pair of corporations.
Nevertheless, it is difficult to see how these unique ex post explanations, which draw on sources
outside the KLD dataset, can be extended as grounds for new theory development or corporate
advisement.

Carroll et al. (2016) detected a “slight downturn” in the value of their D-Social-KLD
index for “many” firms during 2009–2012, exhibited in their Figure 2, which they attributed to
business cycle influences (p. 75). We note that our shorter time period includes two complete
business cycles and the start of the 2008–2009 recession. Because Carroll et al. (2016) did not
report their pre-estimation assessments of the KLD dataset or the factor loadings of each of the
eighty (80) KLD variables they employed, the full rationale for their effort is unclear,
considerable information is lost, and one might surmise that their difficulties may simply indicate
that their D-Social-KLD index, like the original KLD index, cannot be interpreted. Our results,
and Tables 5, 6, and 8 in particular, suggest that, at least to some extent, this failure might be
better explained by the authors’ failure to consider industry differences.

Thus, to conclude this phase of our own assessment of indexing the KLD
data, we decided to focus the next phase of our analysis on single industries and to analyze
strengths and concerns separately to see if we could identify useful strength and concern indices.
To this end, like Carroll et al. (2016), we used item response theory, not to create a set of
industry-specific indices, but to explore the relationships of the four strengths to an unobserved
latent strength (LS) within three industries. We followed this with an exploration of the
relationships of our five observed concerns with an unobserved latent concern (LC).

To further separate our approach from Carroll et al. (2016), we decided to also explore
industry behavior across time. Rather than assuming one index prevailing across time, we sought
to understand the two latent variables LS and LC in three consecutive time periods, 1991 to
1996, 1997 to 2002, and 2003 to 2008. Given the shifts in environmental laws, social attitudes
towards pollution, and technological innovations, this seemed quite reasonable.

As Table 9 shows, there were no significant results for Industry 31 when we examined
the loadings of the KLD variables on the latent strength and concern variables. This is probably
because the number of observations available for Industry 31 was too small, a fact attesting to
yet another challenge when using KLD for industry studies.


Table 9. Factor Loadings of KLD Strengths and Concerns on the unobserved latent variables
LS and LC*

Industry        1991-1996                    1997-2002                    2003-2008
                Loading  Constant Diff       Loading  Constant Diff       Loading  Constant Diff

32 LS
BPS             nsr      nsr                 nsr      nsr                 0.76     -2.36    3.11
Pol. Prev.      nsr      nsr                 -1.22    -1.99    -1.63      2.07     -3.34    1.61
Recycle         nsr      nsr                 nsr      nsr                 -0.85    -3.15    -3.71
Clean Energy    nsr      nsr                 nsr      nsr                 nsr      nsr

32 LC
Hazwaste        1.21     -0.02    0.02       nsr      nsr                 2.56     -1.89    0.74
Reg Prob        0.91     0.42     -0.46      0.85     -0.24    0.28       1.47     -0.72    0.49
Ozone Depl.     nsr      nsr      nsr        nsr      nsr                 1.05     -4.21    4.01
Emissions       0.84     -0.48    0.57       0.71     0.63     -0.89      1.95     -1.38    0.71
Agri Chem       1.41     -2.03    1.44       1.42     -2.88    2.03       0.91     -2.85    3.13

33 LS
BPS             0.9      -2.87    3.19       -1.03    -1.4     -1.36      0.66     -2.27    3.44
Pol. Prev.      0.83     -1.59    1.92       -1.21    -2.81    -2.32      1.42     -3.74    2.63
Recycle         nsr      nsr                 335      -292.9   0.87       1.55     -3.77    2.43
Clean Energy    nsr      nsr                 -5.18    -7.44    -1.44      2.02     -4.01    1.99

33 LC
Hazwaste        nsr      nsr                 0.8      -0.75    0.94       2.41     -3.49    1.45
Reg Prob        nsr      nsr                 0.98     -0.9     0.92       1.74     -2.78    1.60
Ozone Depl.     -4.49    -8.14    -1.81      nsr      nsr                 Omitted
Emissions       -1.43    1.71     1.20       1.1      -1.5     1.36       1.44     1.93     -1.34
Agri Chem       -31.7    -47.98   -1.51      -1.79    -3.8     -2.12      0.52     -5.53    10.63
nsr – non significant result
* Convergent Solutions in bold type. Other solutions failed to converge after 100 iterations so, in general, these
results must be viewed with caution. LS and LC scores in non-convergent solutions oscillated less than 6%.

However, we can report limited results for Industries 32 and 33, as Table 9 shows. Note
that only three of the twelve models estimated for Industries 32 and 33 (six each) converged on a
solution. The rest had not converged after 100 iterations, and so their results should be read with
caution.
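
The three columns reported for each period in Table 9 appear to be linked by the usual two-parameter logistic (2PL) item response parameterization, in which the probability that an indicator is observed is logistic(loading × θ + constant) and the difficulty equals −constant/loading. This reading is inferred from the reported numbers rather than stated in the text, and the function names below are hypothetical. A short check against a few rows of Table 9:

```python
from math import exp


def prob_observed(theta, loading, constant):
    """2PL response probability for latent score theta (slope-intercept form)."""
    return 1.0 / (1.0 + exp(-(loading * theta + constant)))


def difficulty(loading, constant):
    """Latent score at which the indicator is observed with probability 0.5."""
    return -constant / loading


# (label, loading, constant, reported Diff) transcribed from Table 9.
rows = [
    ("Industry 32, BPS, 2003-2008", 0.76, -2.36, 3.11),
    ("Industry 32, Reg Prob, 1991-1996", 0.91, 0.42, -0.46),
    ("Industry 33, Pol. Prev., 2003-2008", 1.42, -3.74, 2.63),
]
for label, a, c, reported in rows:
    b = difficulty(a, c)
    print(f"{label}: computed {b:.2f}, reported {reported:.2f}, "
          f"P(observed | theta = b) = {prob_observed(b, a, c):.2f}")
```

On this reading, a large positive Diff means the indicator is observed only at high levels of the latent variable, while a negative loading reverses the direction of the relationship.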

The three sets of fully convergent results in Table 9 show that the loadings of the
individual KLD strengths and concerns do vary both across time and by industry, sometimes
with different signs among the four strengths and, similarly, among the five concerns. Because
industries have different technologies and different environmental postures and, as noted above,
because of shifting laws, attitudes, and technological innovations, such differences should be no
surprise.

What do these results suggest for further research using the KLD data? First, the different
results across industry and time mean that there is no way to establish a credible unified CSR
multi-industry index that holds across time. Second, where we acknowledged the binary nature
of the KLD observations, the limited number of convergent results reported (three of nine where
an optimum solution was obtained) means that although industry specific studies using KLD are
possible, they are likely to prove difficult.

KLD Observations as Independent Variables


Given the results reported thus far, we ask, can we use the raw KLD data as independent
variables in carefully constructed regression studies and expect to learn more about the
relationships between corporate social responsibility, corporate social performance,
environmental performance, and corporate financial performance?

Here we will explore the simple relationship between the nine KLD environmental
variables of Table 3 and environmental performance (EP). Because it seems reasonable to
assume that EP in one year is usually correlated with EP in prior years, we will take care to
manage serial correlation in this study. We will also test for differences in these relationships
across industries and whether these relationships vary with EP as implied by Hart and Ahuja
(1996). Such differences are rarely attended to in CSR research although exceptions include
Hughes (2000); Clarkson, Li, and Richardson (2004); and Clarkson, Li, Richardson, and Vasvari
(2011) who, as most readers appreciate, did not use KLD data. We will use quantile regression to
estimate our model because it is especially suited to exploring relationships that vary with the
level of the dependent variable as implied by the literature cited earlier in this paragraph
(Koenker and Hallock, 2001).

Our proxy measure of EP is toxic releases. And, so, we specify a simple model, Equation
8, as follows:

Ln TR/Sales = f (five KLD concerns, and 4 KLD strengths) (8)

In other research (Hatten, Keeler, James, and Kim, 2018), we report estimation results
for environmental performance with extensive control variables. The control variables did not
have a large effect on the estimated coefficients for the KLD variables. Our purpose here is to
explore the relationship between Environmental Performance (toxic releases) and the
Environment components of CSR at different levels of pollution emissions. We chose to
estimate a simple model with only the KLD variables, to focus on the differences in KLD
variable coefficients. We believe the estimates were well designed to explore differences in the
EP – CSR relationship at different levels of pollution.

Toxic releases (TR) is an output-based measure consistent with King and Lenox (2002).
It is an inverse measure of EP: a smaller TR indicates a better EP. Note that we use “ln
TR/Sales” to control for scale and technology differences, which are substantial across this
study’s numerous cases. Chatterji et al. (2009) used TR to study 37 industries using dummy
variables, following accepted practice and assuming equal effects across industries. We take a
different path and test for industry differences.

To use the TR data, we aggregated toxic releases of each firm’s separate reporting
facilities to the firm level. U.S. corporations annually report their toxic releases to the EPA,
facility by facility, and these data are available to researchers in the EPA toxic release inventory
(EPA TRI) database. For this study, as before, our focus is on SIC industries 31, 32, and 33.
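
The aggregation step and the scaled dependent variable can be sketched as follows. This is a minimal illustration with hypothetical field names and made-up numbers; it is not the EPA TRI file layout and not the code used in this study.

```python
from collections import defaultdict
from math import log

# Hypothetical facility-level records: (firm, year, pounds released).
facility_releases = [
    ("FirmA", 2005, 1200.0),
    ("FirmA", 2005, 800.0),
    ("FirmA", 2006, 1500.0),
    ("FirmB", 2005, 50.0),
]
# Hypothetical firm-level sales, keyed by (firm, year).
sales = {("FirmA", 2005): 4000.0, ("FirmA", 2006): 5000.0, ("FirmB", 2005): 1000.0}

# Sum facility releases to the firm-year level.
firm_tr = defaultdict(float)
for firm, year, pounds in facility_releases:
    firm_tr[(firm, year)] += pounds

# Dependent variable: ln(TR / Sales). Firm-years with zero releases or
# missing sales are skipped because the log is undefined for them.
ln_tr_sales = {
    key: log(tr / sales[key])
    for key, tr in firm_tr.items()
    if tr > 0 and sales.get(key, 0) > 0
}
for key in sorted(ln_tr_sales):
    print(key, round(ln_tr_sales[key], 3))
```

Scaling by sales before taking logs keeps large and small firms on a comparable footing, which is the stated purpose of using ln TR/Sales.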

First, we checked for the presence of serial correlation in the complete sample and within
each of the three industries. We asked: Is TR at year t a function of prior TR? Clearly, the
residuals of an estimate of equation (8) were serially correlated, as indicated by the very
substantial increase in R² when the first lag of the dependent variable (DVL1) was added to
equation (8): for Industry 31, R² increased from 0.03 to 0.56; for Industry 32, from 0.10 to 0.89;
and for Industry 33, from 0.12 to 0.82. We labeled this model equation (9):

Ln TR/Sales = f (DVL1, five KLD concerns, 4 KLD strengths) (9)
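
Estimating a model of this form requires constructing the first lag of the dependent variable within each firm. A minimal sketch, assuming annual firm-level records with hypothetical field names and made-up values, and using a simple one-regressor R² rather than the full model of equation (9):

```python
def add_first_lag(rows):
    """Attach last year's value of 'y' for the same firm; drop rows without one."""
    by_key = {(r["firm"], r["year"]): r["y"] for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r["firm"], r["year"] - 1))
        if prev is not None:
            out.append({**r, "y_lag1": prev})
    return out


def r_squared(x, y):
    """R-squared of a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Made-up panel: y stands in for ln(TR/Sales).
panel = [
    {"firm": "A", "year": 2000, "y": 1.0}, {"firm": "A", "year": 2001, "y": 1.1},
    {"firm": "A", "year": 2002, "y": 1.2}, {"firm": "A", "year": 2003, "y": 1.3},
    {"firm": "B", "year": 2000, "y": 3.0}, {"firm": "B", "year": 2001, "y": 2.9},
    {"firm": "B", "year": 2002, "y": 3.1}, {"firm": "B", "year": 2003, "y": 3.0},
]
lagged = add_first_lag(panel)
r2 = r_squared([r["y_lag1"] for r in lagged], [r["y"] for r in lagged])
print(len(lagged), round(r2, 3))
```

The first year of each firm is lost when the lag is formed, so the estimation sample is smaller than the raw panel.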

Because we used panel data, after a Hausman test, we adopted a fixed-effect model with
robust standard errors. It was clear from the Wald tests that the relationship specified in equation
8 varied across Industries 31, 32 and 33 and so our results refer only to those specific industries.

Finally, we explored the heterogeneity of the relationship specified in Equation 9. In a
multi-industry study, Hart and Ahuja (1996) reported that the relationship between FP and EP
varied with EP. Clarkson, Li, and Richardson (2004) and Clarkson, Li, Richardson and Vasvari
(2011) reported similar results. Thus, we expected that the relationships between toxic releases
and the KLD variables would vary with the level of toxic releases. To this end, as noted, we
turned to a simultaneous quantile regression, specifying nine quantiles from quantile 10 (Q10) to
Q90 for the dependent variable.

Quantile regression allows us to determine the extent to which the TR-KLD relationship
varies with TR more exactly than by creating one or two arbitrary cohorts or by partitioning the
data. Sound practice allows separation on levels of an independent or explanatory variable, not
on a dependent variable. Partitioning a sample on an independent variable is an example of
exogenous sample selection, and OLS provides unbiased and consistent estimators
of the β coefficients in such cases (Wooldridge, 2013). Here, for example, if we were committed
to an OLS approach, we might separate large from small firms or growing from declining firms.
Hart and Ahuja (1996) employed this research strategy using FP as the dependent variable,
splitting their sample on values of an annual emission efficiency index into high- and low-level
polluters. Then they estimated separate functions for the high- and the low-level polluters. They
concluded that the determinants of financial performance differed between the two groups.

Quantile regression, in contrast to OLS, allows the researcher to avoid partitioning the data
on the dependent variable and the bias that this generates (Koenker and Hallock, 2001; Cameron
and Trivedi, 2009). Quantile regression considers the full dataset. Essentially, quantile regression
allows a real test of the hypothesis that the relationship of EP with several independent variables,
in our case the nine KLD environmental variables, is invariant with EP. Instead of dividing the
sample by percentiles of the dependent variable, the entire sample is used to estimate the
regression model for each quantile. For a chosen quantile, the estimator minimizes a weighted
sum of the absolute values of the residuals. Underestimates and overestimates of the dependent
variable are given different weights that depend on the quantile chosen (for the 0.2 quantile, for
example, 0.2 and 0.8, respectively), but every observation is included in each estimate.
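
The weighting scheme can be made concrete with the check (or pinball) loss that quantile regression minimizes (Koenker and Hallock, 2001). The sketch below is a simplified, intercept-only illustration on made-up data, not the estimation used in this study: for each τ, the constant that minimizes the loss over the whole sample is the corresponding sample quantile.

```python
def check_loss(residual, tau):
    """Check loss: weight tau on underestimates, 1 - tau on overestimates."""
    return residual * (tau - (1 if residual < 0 else 0))


def best_constant(y, tau):
    """Value among the observations that minimizes total check loss over all of y."""
    return min(y, key=lambda q: sum(check_loss(v - q, tau) for v in y))


y = list(range(1, 12))  # made-up data: 1, 2, ..., 11
fits = {tau: best_constant(y, tau) for tau in (0.2, 0.5, 0.8)}
print(fits)
```

With these eleven points the minimizers are 3, 6, and 9 for τ = 0.2, 0.5, and 0.8. Every observation contributes to each fit; only the weights change with τ.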

Considering this and our earlier finding that the error term in the TR relationship, equation
(8), was serially correlated, we estimated the industry quantile regressions with the lagged
dependent variable (DVL1) included in the model, as in equation (9). The results, shown in Table
10, are specific to the 0.2 and 0.8 quantiles of the dependent variable, ln (TR/Sales), or Q20 and
Q80, respectively. They indicate that, as expected, the estimated relationship varies with the level
of toxic releases, industry by industry.


Table 10. Quantile Regression Results for Q20 and Q80

(Only the KLD variables that were significant @ p ≤ 0.05 are in bold print) (500 repetitions)

DV is ln TR/Sales           Industry 31        Industry 32        Industry 33

Q20
R²                          .765               .791               .722
                            β        p         β        p         β        p
DV Lag 1                    .977     .000      1.010    .000      .992     .000
Hazardous Waste             .109     .596      .117     .002      .189     .011
Regulatory Problems         .315     .021      .059     .126      .091     .287
Ozone Depl. Chemicals       *collinear         -.065    .409      -.226    .859
Substantial Emissions       -.028    .889      .034     .388      .077     .346
Agricultural Chemicals      -.195    .386      .035     .562      .183     .419
Bene. Prod. & Services      -.214    .779      .051     .463      .076     .592
Pollution Prevention        *collinear         .010     .863      .023     .816
Recycling                   .385     .044      .139     .007      .057     .677
Clean Energy                .410     .196      -.084    .372      .004     .963
Constant                    -.589    .004      -.569    .000      -.694    .000

Q80
R²                          .792               .753               .740
                            β        p         β        p         β        p
DV Lag 1                    .924     .000      .958     .000      .921     .000
Hazardous Waste             .243     .248      -.039    .292      -.195    .599
Regulatory Problems         .003     .980      .010     .749      -.159    .627
Ozone Depl. Chemicals       *collinear         .045     .616      -.634    .573
Substantial Emissions       .043     .794      -.034    .267      .015     .082
Agricultural Chemicals      -.326    .295      -.007    .929      -.847    .541
Bene. Prod. & Services      .693     .344      .046     .508      -.088    .614
Pollution Prevention        *collinear         -.019    .632      -.291    .103
Recycling                   .063     .640      .071     .325      -.071    .110
Clean Energy                .329     .286      -.001    .988      -.205    .418
Constant                    .386     .000      .377     .000      .176     .000

Note that in Table 10, few KLD strengths and concerns are significant. In addition, it is
clear that for Industry 31, the strength, recycling, has a positive relationship with toxic releases
in Q20, not the expected negative relationship that common practice attributes to KLD-labeled
strengths. The concern, regulatory problems, has a positive sign in Q20, again in Industry 31.
That sign conforms to prior expectations and common use. In Q20, Industry 32 has two
significant coefficients, one for hazardous waste and the other for recycling; the latter variable
has a sign that does not conform to common use. For Industry 33, only hazardous waste
was significant at p ≤ 0.05, in Q20. In Q80, substantial emissions was marginally significant at
p = 0.082. The signs of these two variables conform to common use.


At this point, note the very large β coefficients estimated for the first lag of the dependent
variable, ln (TR/Sales), in both Q20 and Q80. They are ≈ 1.0 across the three industries, SIC 31,
32, and 33. These large estimated β coefficients suggest a high probability that the robust
standard errors estimated for the other explanatory variables in this study could be understated,
exaggerating the apparent significance of the few KLD variables identified. As is well known, in
these circumstances, the β estimates are unbiased (Wooldridge, 2013).

What do the results mean? Given the small number of significant results (reported in
Table 10), we see no compelling evidence of a strong relationship between EP and the nine KLD
environmental variables (concerns and strengths) in this study. Thus, although the results include
a few significant KLD variables, in light of the full set of results and the estimated β coefficients
of the first lags of the dependent variable that are near 1.0, it may well be that no single KLD
variable is significantly related to EP in this research context.

Our results, and especially the very large estimated β coefficients of the lagged dependent
variable, suggest that serial correlation is a major characteristic of the EP world, which is
deserving of further attention in research aiming to identify consistent estimators of any
explanatory variables.

Rather than overburdening the reader with large sets of statistical results (available on
request), Table 11 summarizes the significant results for the complete set of nine quantiles, one
set for each of the three industries studied, for each of the nine KLD variables. Table 11 enables
readers to assess the ability of KLD to explain toxic releases in our three industries, given the
severe autoregressive nature of EP. Table 11 also invites readers to reconsider the limited extent
to which the signs of the KLD strengths and concerns conform to common use in this study and
what these results imply for their own research.


Table 11. Summary of Regression Results for Nine Quantiles

ln TR/Sales = f(DVL1, five KLD concerns, 4 KLD strengths)   (9)
Significant KLD variables, shown as (β/p)

Quantile  Industry 31                         Industry 32                         Industry 33

10%                                           Hazardous Waste (.154/.042)         Hazardous Waste (.548/.000)
                                              Regulatory Problems (.174/.067)
                                              Recycling x (.297/.002)
20        Regulatory Problems (.315/.010)     Hazardous Waste (.117/.001)         Hazardous Waste (.189/.014)
          Recycle x (.385/.037)               Recycling x (.139/.007)
          Clean Energy x (.410/.046)
30        Regulatory Problems (.196/.040)     Hazardous Waste (.085/.003)         Substantial Emissions (.116/.052)
          Recycle x (.266/.073)               Regulatory Problems (.061/.070)
          Clean Energy x (.308/.012)          Beneficial Products x (.109/.037)
                                              Recycling x (.097/.067)
40                                            Beneficial Products x (.074/.048)   Substantial Emissions (.073/.098)
                                              Recycling x (.105/.045)
50                                            Recycling x (.102/.015)
60                                            Recycling x (.068/.092)
70
80                                                                                Pollution Prevention (-.152/.075)
90        Clean Energy x (.603/.079)                                              Pollution Prevention (-.288/.020)

Referring to Table 11, which begins with Industry 31 (Food, Textiles and Leather), we find a
total of 17 significant coefficients across nine quantiles for the complete set of KLD variables
at p ≤ 0.05 (with another nine at 0.05 < p ≤ 0.10). We note that all but two of the 15 strengths
that appear to be significant at p ≤ 0.10 have unexpected signs based on common use. Recycling
and clean energy, as well as beneficial products and services, in Industries 31 and 32 have
positive signs. Common use gives them negative signs. The two exceptions were found in
Industry 33 in Q80 and Q90, where pollution prevention has a negative sign. In Q80, pollution
prevention is significant at p = 0.075 whereas in Q90, it is significant at p = 0.02, a stronger
result.

The largest number of significant results was found in Industry 32, Paper, Petroleum,
Plastics, and Cement manufacturing. As Table 11 shows, hazardous waste is significant in Q10 –
Q30, among firms with better EP. Recycling was significant across a wide swathe of the EP
spectrum, Q10 – Q60. These results suggest that in Industry 32, at least, recycling is a corporate
commitment associated with more pollution, not less. Its positive sign in each quantile is
contrary to the implicit assumption of common practice that recycling is negatively related to
toxic releases. Industry 33, Primary Metals and Transport Equipment, is the only industry where
the two KLD variables, substantial emissions and pollution prevention, are significant. It is also
the only industry where every significant KLD variable had the expected sign. Comparisons of
the patterns of significance across our three industries make it clear that industry differences are
material.

The evidence afforded by a total of 17 significant results at p ≤ 0.05 out of the 243
estimated coefficients summarized in Table 11 is not strong enough for us to certify KLD as a
reliable source of corporation-specific environmental data. First, these limited results have to be
weighed in light of the implications of the consistent set of estimated β ≈ 1.0 for the lagged
dependent variables (DVL1). Those results should raise questions about the reliability of the
specific results that appear to be significant. Second, we had to weigh the implications of the fact
that in this study, we estimated a total of 243 KLD coefficients (three industries × nine quantiles
× nine KLD environmental variables). Of these, as Table 11 reports, a total of 17, or about 7.0
percent, were significant at p ≤ 0.05. Another nine were significant at 0.05 < p ≤ 0.10. But, of
the 26 strongly or marginally significant KLD coefficients, 13, or 50 percent (marked x in Table
11), had signs opposite to what is expected when guided by common practice. In light of the
complete set of results, the KLD environmental variables appear to have limited value as
explicators of environmental performance.
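
The counts in this paragraph can be checked by tallying the (β/p) entries listed in Table 11. The sketch below is illustrative: the p-values and the x markers are transcribed from Table 11 as a flat list, without relying on which industry column each entry belongs to, and the variable names are hypothetical.

```python
# (p-value, marked x for an unexpected sign) for every entry in Table 11.
entries = [
    (.042, False), (.000, False), (.067, False), (.002, True),    # 10%
    (.010, False), (.001, False), (.014, False),                  # 20%
    (.037, True), (.007, True), (.046, True),
    (.040, False), (.003, False), (.052, False), (.073, True),    # 30%
    (.070, False), (.012, True), (.037, True), (.067, True),
    (.048, True), (.098, False), (.045, True),                    # 40%
    (.015, True),                                                 # 50%
    (.092, True),                                                 # 60%
    (.075, False),                                                # 80%
    (.079, True), (.020, False),                                  # 90%
]

total_coefs = 3 * 9 * 9  # industries x quantiles x KLD variables
strong = sum(1 for p, _ in entries if p <= 0.05)
marginal = sum(1 for p, _ in entries if 0.05 < p <= 0.10)
unexpected = sum(1 for _, flagged in entries if flagged)

print(total_coefs, strong, marginal, unexpected)
print(f"significant at p <= 0.05: {100 * strong / total_coefs:.1f} percent")
print(f"unexpected sign among listed entries: {100 * unexpected / len(entries):.0f} percent")
```

Only the entries that Table 11 lists can be tallied this way; the remaining coefficients are available only in the full set of results.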

Finally, two other sets of results may be of interest to those who wish to extend their
research in new directions using KLD. First, we repeated the regressions reported in Table 11 for
eight separate 4-digit SIC industries — the results were consistent with those for industries SIC
31, 32 and 33, again including β estimates for DVL1 ≈ 1.0.

Because our analysis of the KLD dataset structure highlighted the number of unobserved
(0) records and the large proportion of the data that involved only one strength or concern, we
repeated the same regressions as reported in Table 11, but this time for the subset of our total
sample that had a mix of concerns and strengths (17% of the entire sample shown in Table 4).
These were the corporations where KLD had observed at least one strength and at least one
concern. The results were similar to those described in Tables 10 and 11. The only significant
KLD variable, regulatory problems, was in Industry 33 (Primary and Fabricated Metals,
Machinery, Computers and Electronics, Transportation Equipment, etc.) in Q60, at p ≤ 0.05.


We made several checks of the robustness of our estimates of the Environmental
Performance model. In addition to the estimate in Table 10 for the full time period, we made
separate estimates by time period to allow a direct comparison to the estimates by Carroll, Primo
and Richter (2016). These estimates follow the same factor analysis. We then estimated the
relationship using KLD variables as individual binary variables. We report separate estimates for
the three SIC industries, but we have also estimated the relationship with all three industries in
the sample. We have made separate estimates of the same relationship for eight 4-digit SIC
industries. The quantile regression estimates explore the same relationship for nine different
levels of pollution. Finally, we repeated the Table 11 regressions for a subset of the sample
characterized by a mix of both strengths and concerns.

Discussion and Implications


We have now completed our assessment of the methods commonly used in research
relying on KLD data and, in particular, on the continued use of the KLD index or the separate
use of its two summed components, TS and TC.

Perhaps KLD’s observations, or more precisely their nominal measurements, are simply
too skewed, too imprecise, and too limited with respect to the concerns and strengths of any one
corporation to meaningfully inform CSR research. Recall that only 17 percent of our sample
consisted of corporations with at least one strength and at least one concern. Moreover, given the
telling impact of industry differences and the limited depth of the KLD observations in particular
industries (such as our SIC 31), it seems unlikely that KLD data can help us develop robust,
interpretable and, so, useful CSR indices by strengths or concerns, even at the industry level.

Part of the problem may be what KLD defines as a strength (or a concern). In Table 1, we
saw strengths and concerns that were correlated with signs that were inconsistent with common
use. In the IRT study of strengths and concerns reported in Table 9, again we saw loadings with
signs that suggest some strengths (and some concerns) are related to one another in ways that are
at odds with KLD’s categorizations of strengths and concerns.

This is surely a subject that warrants further investigation. The evidence of this study
suggests that academic researchers may need to independently validate each “KLD strength” or
“KLD concern” for themselves. The KLD definitions do not stand up to close review, as
explained above. Further, as Table 9 shows, given the limited number of
convergent results in the 1991-2002 period and the complete set of convergent results for
strengths and concerns from 2003 to 2008, prudent researchers might be advised to ignore
data prior to, say, 2000, since it is only after that date that the skewed character of the KLD data
(which is dominated by concerns) is somewhat ameliorated by increases in the numbers of
corporations monitored and the attention paid to a wider array of social indicators.

Our assessments of KLD in use therefore suggest, first, that the methods used to explore
the relationship between CSR and corporate performance using KLD have quite often been flawed;
and, second, that KLD’s information content is limited, in part because the structure of the KLD
dataset is confining.


This last statement itself has implications. It suggests that reported findings based on
KLD data prior to 2000 need reassessment, especially where any of the three KLD indices have
been used and industry differences have not been tested. Prudence suggests that what is accepted
as articles of faith in the CSR community (Mattingly, 2015) needs to be reexamined.

With respect to the KLD data, perhaps it is time to use it with greater attention to its
limits and to choose research methods that fit the data. There may be, for example, some specific
industries where the impact of one or two specific KLD variables (S or C) on, say, TR, could be
informative. Researchers might total the strengths and concerns separately for a number of
corporations and distinguish groups of companies with high TS or high TC. Then, they could
take a critical step and specify their model without TS or TC as independent variables. If they
estimate that model separately for different groups of companies, they might help us determine
whether high-TS or high-TC companies have more effective CSP. In this vein, Chatterji and
Toffel (2010) used the KLD concerns as markers to separate poorly rated firms (those with only
concerns) from the rest, then partitioned their data and used dummy variables for the two
groups to explore how poorly rated firms responded to those low ratings in later years.
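
The grouping strategy suggested above could be sketched as follows. This is a minimal
illustration under stated assumptions, not a procedure taken from any of the cited studies: the
record fields, the cut-off of two items for a "high" total, and the single placeholder regressor
`x` and outcome `y` are all hypothetical. TS and TC are used only to form the groups and never
enter the model as independent variables.

```python
from statistics import mean

def totals(firm):
    """Sum a firm's binary strength and concern indicators (TS, TC)."""
    return sum(firm["strengths"]), sum(firm["concerns"])

def assign_group(firm, cutoff=2):
    """Label a firm by its totals; the cutoff is an illustrative assumption."""
    ts, tc = totals(firm)
    if ts >= cutoff and ts > tc:
        return "high_TS"
    if tc >= cutoff and tc > ts:
        return "high_TC"
    return "other"

def ols(xs, ys):
    """Closed-form OLS for y = a + b*x; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def estimate_by_group(firms, cutoff=2):
    """Fit the same model separately within each group of firms."""
    groups = {}
    for firm in firms:
        groups.setdefault(assign_group(firm, cutoff), []).append(firm)
    results = {}
    for name, members in groups.items():
        xs = [f["x"] for f in members]
        ys = [f["y"] for f in members]
        if len(set(xs)) > 1:  # need variation in x to identify a slope
            results[name] = ols(xs, ys)
    return results

# Hypothetical toy data, constructed so each group lies on an exact line.
toy_firms = [
    {"strengths": [1, 1, 0], "concerns": [0, 0, 0], "x": 1.0, "y": 3.0},
    {"strengths": [1, 1, 1], "concerns": [1, 0, 0], "x": 2.0, "y": 5.0},
    {"strengths": [1, 1, 0], "concerns": [0, 0, 0], "x": 3.0, "y": 7.0},
    {"strengths": [0, 0, 0], "concerns": [1, 1, 0], "x": 1.0, "y": 1.0},
    {"strengths": [1, 0, 0], "concerns": [1, 1, 1], "x": 2.0, "y": 0.0},
    {"strengths": [0, 0, 0], "concerns": [1, 1, 0], "x": 3.0, "y": -1.0},
]
results = estimate_by_group(toy_firms)
```

Comparing the fitted coefficients across groups is then a question for formal testing; the sketch
only shows where TS and TC enter the design.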

A second possibility would be to focus on greenwashing, the practice of promoting
apparently environmentally friendly programs to deflect attention from an organization's
environmental failures (Marquis and Toffel, 2012). In our dataset, however, the distributions of
strengths and concerns are skewed. The dataset is dominated by corporations having only one
KLD environmental strength or only one concern, making it impossible to find sufficient
numbers of pure TS or pure TC companies to meaningfully study greenwashing as set out here.
KLD is not easy to mine.

Rather than mining KLD, there are other possibilities. Several researchers have used
KLD data but avoided the assumed common effect and limited information problems referred to
in this paper. Becchetti, Ciciretti, Hasan, and Kobeissi (2012) used the cessation of the KLD
observation of particular companies as an event marker in a study of the impacts of this
withdrawal on stock prices, avoiding the problems faced when using KLD data as indices.
Groening and Kanuri (2013) examined the addition of a KLD strength or concern to corporate
profiles, treating each addition as a separate event. They then explored its effects on market
value. Of course, such events may have different effects on different firms, some positive and
others negative, as McGuire et al. (2012) report.
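
The event-based use of KLD described above depends on locating the year in which an indicator
is first switched on for a firm. A minimal sketch of that step follows, assuming a firm-year panel
of binary indicators. The data layout, the firm names, and the indicator names (`str_1`, `con_1`)
are hypothetical, and the cited studies may have identified events differently.

```python
def find_additions(panel):
    """Return (firm, year, indicator) tuples where an indicator moves 0 -> 1.

    panel maps firm -> year -> {indicator: 0 or 1}. Only consecutive years are
    compared, and only indicators recorded in both years, so gaps in coverage
    and newly introduced indicators are not mistaken for events.
    """
    events = []
    for firm, by_year in panel.items():
        years = sorted(by_year)
        for prev, curr in zip(years, years[1:]):
            if curr - prev != 1:
                continue  # skip gaps in the firm's coverage
            before, after = by_year[prev], by_year[curr]
            for item in sorted(after):
                if item in before and before[item] == 0 and after[item] == 1:
                    events.append((firm, curr, item))
    return events

# Hypothetical toy panel.
toy_panel = {
    "FirmA": {
        2003: {"str_1": 0, "con_1": 1},
        2004: {"str_1": 1, "con_1": 1},  # strength added in 2004
        2005: {"str_1": 1, "con_1": 0},  # concern removed, not an addition
    },
    "FirmB": {
        2003: {"str_1": 0, "con_1": 0},
        2005: {"str_1": 1, "con_1": 1},  # 2004 missing, so no event is inferred
    },
}
events = find_additions(toy_panel)
```

Each event returned this way can be treated separately, which avoids summing indicators into an
index.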

Researchers interested in corporate social performance or environmental performance
might turn to alternate data sources. Of course, providing data to the social investing industry is a
competitive industry itself. Chatterji, Durand, Levine, and Touboul (2014) compared the ratings
of six socially responsible investment advisers (KLD, Calvert, DJSI, Innovest, Asset4 (Swiss),
and FTSE4Good (UK))¹ and found little convergence among them because of differences in what
each organization rates, the companies rated, and their definitions of CSP.
However, we note that Waddock (2008) lists social investment advisors, social research firms,
NGOs, and several journals as potential sources. Herzig and Moon (2013) used assessments
provided by journals such as the Financial Times, Ethical Corporation, and Ethical
Performance. Brammer, Pavelin, and Porter (2006) used the EIRIS dataset (UK). But, again, we
note that, demonstrating the influence of unchallenged common practice, these researchers, like
those using KLD aggregates, summed and differenced strengths and concerns without testing
whether their research strategy was statistically appropriate.

¹ In 2016 KLD and Innovest were owned by MSCI, formerly Morgan Stanley Capital
International.

Again, as Entine (2003) noted, data directed to the socially responsible investment
industry is tainted by anachronistic, contradictory, and ideologically constructed notions of
corporate social responsibility and corporate social performance. Chatterji et al. (2014)
recommend using more than one source to substantiate any findings in the FP-CSP realm and
stress the need to validate these data sources for research purposes.

Conclusion
Clearly, insightful research with KLD will only be possible when questions, methods, and
data are appropriately matched. The match of questions, methods, and data is crucial if theory is
to advance. The evidence presented here suggests that the methods used to tap KLD have often
been flawed because they ignore the binary character of the observations, because they ignore
industry differences, and because they cannot deal effectively with the skewed structure of the
dataset, which carries more concerns than strengths.

Because the KLD variables do not appear to be clustered or associated in consistent ways
across or even within SIC-defined industries, there appears to be no defensible case for seeking
an all-encompassing CSR index to encapsulate KLD’s content. The challenge of interpreting
such indices, even indices of separate strengths or concerns, on an industry-by-industry or
company-by-company basis is formidable.


References
Albuquerque, Rui, Koskinen, Y. and Zhang, C. (2018). Corporate Social Responsibility and Firm
Risk: Theory and Empirical Evidence, European Corporate Governance Institute Finance
Working Paper No. 359/2013, 1 – 48.

Bae, Kee-Hong, El Ghoul, S., Guedhami, O., Kwok, C.C.Y., and Zheng, Y. (2018). Does
Corporate Social Responsibility reduce the Costs of High Leverage? Evidence from Capital
Structure and Product Market Interactions, Journal of Banking and Finance 100, 135 – 150.

Barnett, M. L., & Salomon, R. M. (2012). Does it pay to be really good? Addressing the shape of
the relationship between social and financial performance. Strategic Management Journal, 33,
1304–1320.

Bartholomew, D. J., Steele, F., Moustaki, I., & Galbraith, J. I. (2008). Analysis of multivariate
social science data. Boca Raton, FL: Chapman Hall/CRC Press.

Becchetti, L., Ciciretti, R., Hasan, I., & Kobeissi, N. (2012). Corporate social responsibility
and shareholder's value. Journal of Business Research, 65(11), 1628–1635.

Brammer, S. J., Pavelin, S., & Porter, L. A. (2006). Corporate social performance and
geographical diversification. Journal of Business Research, 59(9), 1025–1034.

Cameron, A. C., & Trivedi, P. K. (2009). Microeconometrics using Stata. College Station, TX:
Stata Press.

Carroll, R. J., Primo, D. M., & Richter, B. K. (2016). Using item response theory to improve
measurement in strategic management research: An application to corporate social
responsibility. Strategic Management Journal, 37(1), 65–85.

Chatterji, A. K., Levine, D. I., & Toffel, M. W. (2009). How well do social ratings actually
measure corporate social responsibility? Journal of Economics & Management Strategy, 18(1),
125–169.

Chatterji, A. K., & Toffel, M. W. (2010). How firms respond to being rated. Strategic
Management Journal, 31(9), 917–945.

Chatterji, A. K., Durand, R., Levine, D., & Touboul, S. (2014). Do ratings of firms converge?
Implications for strategy research. Retrieved from
http://faculty.haas.berkeley.edu/levine/papers/DoRatingsofFirmsConverge_April1st2014.pdf

Chen, C. M., & Delmas, M. (2011). Measuring corporate social performance: An efficiency
perspective. Production and Operations Management, 20(6), 789–804.

Clarkson, P., Li, Y., & Richardson, G. (2004). The market valuation of environmental capital
expenditures by pulp and paper companies. The Accounting Review, 79(2), 329–353.


Clarkson, P., Li, Y., Richardson, G., & Vasvari, F. (2011). Does it really pay to be green?
Determinants and consequences of proactive environmental strategies. Journal of Accounting
and Public Policy, 30, 122–144.

Entine, J. (2003). The myth of social investing. Organization & Environment, 16(3), 352–368.

Goss, Allen and Roberts, G. S. (2011). The Impact of Corporate Social Responsibility on the
Cost of Bank Loans, Journal of Banking and Finance 35, 1794 – 1810.

Groening, G., & Kanuri, V. K. (2013). Investor reaction to positive and negative corporate social
events. Journal of Business Research, 66(10), 1852–1860.

Hart, S. L., & Ahuja, G. (1996). Does it pay to be green? An empirical examination of the
relationship between pollution prevention and firm performance. Business Strategy and the
Environment, 5, 30–37.

Hart, T. A., & Sharfman, M. (2012). Assessing the concurrent validity of the revised Kinder,
Lydenberg, and Domini corporate social performance indicators. Business & Society, 51, 1–
24.

Hatten, Kenneth J., Keeler, J. P., James, W. L. and Kim, K. (2018). Why Is the Financial
Performance – Environmental Performance Relationship Difficult to Measure? Research
Journal of Business and Management, 212 – 221.

Herzig, C., & Moon, J. (2013). Discourses on corporate social irresponsibility in the financial
sector. Journal of Business Research, 66(10), 1870–1880.

Hughes, K. E. (2000). The value relevance of nonfinancial measures of air pollution in the electric
utility industry. The Accounting Review, 75(2), 209–228.

Jo, Hoje and Harjoto, M. A. (2012). The Causal Effect of Corporate Governance on Corporate
Social Responsibility, Journal of Business Ethics 106: 53 – 72.

King, A., & Lenox, M. (2002). Exploring the locus of profitable pollution reduction.
Management Science, 48(2), 289–299.

Koenker, R., & Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives,
15(4), 143–156.

Lattin, J., Carroll, J. D., & Green, P. E. (2003). Analyzing Multivariate Data. Pacific Grove, CA:
Thomson Brooks/Cole.

Marquis, C., & Toffel, M. W. (2012). When Do Firms Greenwash? Corporate Visibility, Civil
Society, Scrutiny, and Environmental Disclosure. Working Paper 11–115, Harvard Business
School, Boston, MA.


Mattingly, J. E. (2015). Corporate social performance: A review of empirical research
examining the corporation-society relationship using Kinder, Lydenberg, Domini social
ratings data. Business & Society, 54, 1–44.

Mattingly, J. E., & Berman, S. (2006). Measurement of corporate social action. Business &
Society, 45(1), 20–46.

McGuire, J., Dow, S., & Ibrahim, B. (2012). All in the family? Social performance and corporate
governance in the family firm. Journal of Business Research, 65(11), 1643–1650.

MSCI ESG KLD STATS: 1991 – 2014 Data Sets, Methodology MSCI ESG Research Inc., June
2015. Wiso.uni-hamburg.de/bibliothek/recherché/datenbanken/unternehmensdaten/msci-
methodology-2014.pdf

Rowley, T., & Berman, S. (2000). A brand new brand of corporate social performance. Business
& Society, 39, 397–418.

Sharfman, M. (1996). The construct validity of the Kinder, Lydenberg & Domini social
performance ratings data. Journal of Business Ethics, 15(3), 287–296.

Tabachnick, B., & Fidell, L. (2007). Using Multivariate Statistics (5th ed.). Boston, MA:
Pearson.

Waddock, S. A. (2008). Building a new institutional infrastructure for corporate responsibility.
Academy of Management Perspectives, 22, 87–108.

Walls, J. L., Berrone, P., & Phan, P. H. (2012). Corporate governance and environmental
performance: Is there really a link? Strategic Management Journal, 33(8), 885–913.

Wooldridge, J. M. (2013). Introductory econometrics: A modern approach. Mason, OH: Thomson
South-Western.

Zyglidopoulos, S. C., Georgiadis, A. P., Carroll, C. E., & Siegel, D. S. (2012). Does media
attention drive corporate social responsibility? Journal of Business Research, 65(11),
1622–1627.
