
Multiple Regression Analysis of Cost Behavior

Author(s): George J. Benston


Source: The Accounting Review, Vol. 41, No. 4 (Oct., 1966), pp. 657-672
Published by: American Accounting Association
Stable URL: http://www.jstor.org/stable/243582 .
Accessed: 17/06/2014 20:18


This content downloaded from 62.122.73.86 on Tue, 17 Jun 2014 20:18:12 PM


All use subject to JSTOR Terms and Conditions
Multiple Regression Analysis
of Cost Behavior

George J. Benston

ACCOUNTANTS probably have always been concerned with measuring and reporting the relationship between cost and output. The pre-eminence of financial accounting in this century resulted in directing much of our attention towards attaching costs to inventories. However, the recent emphasis on decision making is causing us to consider ways of measuring the variability of cost with output and other decision variables. In this paper, the application, use, and limitations of multiple regression analysis, a valuable tool for measuring costs, are discussed.¹

A valid objection to multiple regression analysis in the past has been that its computational difficulty often rendered it too costly. Today, with high speed computers and library programs, this objection is no longer valid: most regression problems ought to cost less than $30 to run. Unfortunately, this new ease and low cost of using regression analysis may prove to be its undoing. Analysts may be tempted to use the technique without adequately realizing its technical data requirements and limitations. The "GI-GO" adage, "garbage in, garbage out," always must be kept in mind. A major purpose of this paper is to state these requirements and limitations explicitly and to indicate how they may be handled.

The general problem of cost measurement is discussed in the first section of this paper. Multiple regression analysis is considered first in relation to other methods of cost analysis. Then its applicability to cost decision problems is delineated. Second, the method of multiple regression is discussed in nonmathematical terms so that its uses can be understood better. The third section represents the "heart" of the paper. Here the technical requirements of multiple regression are outlined, and the implications of these requirements for the recording of cost data in the firm's accounting records are outlined. The functional form of the regression equation is then considered. In the final section, we discuss some applications for multiple regression analysis.

1 The use of statistical analysis for auditing and control is outside the scope of this paper. Excellent discussions of these uses of statistics may be found in Richard N. Cyert and H. Justin Davidson, Statistical Sampling for Accounting Information (Prentice-Hall, 1962), and Herbert Arkin, Handbook of Sampling for Auditing and Accounting, Volume I: Methods (McGraw-Hill, 1963).

George J. Benston is Associate Professor of Accounting at the University of Rochester. This manuscript was awarded first place in the American Accounting Association's 1966 Manuscript Contest, open to members acquiring the doctorate in 1962 or later.

THE GENERAL PROBLEM

In his attempts to determine the factors that cause costs to be incurred and the magnitudes of their effects, the accountant is faced with a formidable task. Engineers, foremen, and others who are familiar with the production process being studied usually can provide a list of cost-causing factors, such as the number of different units produced, the lot sizes in which units were made, and so forth. Other factors that affect costs, such as the season of the year, may be important, though they are more subtle than production factors. The accountant must separate and measure the effects of many different causal factors whose importance may vary in different periods.

Commonly Used Methods of Cost Analysis

Perhaps the most pervasive method of analyzing cost variability is separation of costs into two or three categories: variable, fixed and sometimes semivariable. But this method does not provide a solution to the problem of measuring the costs caused by each of many factors operating simultaneously. In this "direct costing" type of procedure, output is considered to be the sole cause of costs. Another objection to this method is that there is no way to determine whether the accountant's subjective separation of costs into variable and fixed is reasonably accurate. Dividing output during a period into variable cost during that period yields a single number (unit variable cost) whose accuracy cannot be assessed. If the procedure is repeated for several periods, it is likely that different unit variable costs will be computed. But the accountant cannot determine whether the average of these numbers (or some other summary statistic) is a useful number. Another important short-coming of this method is the assumption of linearity between cost and output. While linearity may be found, it should not be assumed automatically.

A variant of the fixed-variable method is one in which cost and output data for many periods are plotted on a two-dimensional graph. A line is then fitted to the data, the slope being taken as variable cost per unit of output. When the least-squares method of fitting the line is used, the procedure is called simple linear regression. Until the recent advent of computers, simple regression was considered to be quite sophisticated.² While it was recognized that its use neglects the effects on cost of factors other than output, it was defended on the then reasonable grounds that multiple regression with more than two or three variables is too difficult computationally to be considered economically feasible.

2 National Association of Accountants, Separating and Using Costs as Fixed and Variable, June, 1960.

Multiple Regression

Multiple regression can allow the accountant to estimate the amount by which the various cost-causing factors affect costs. A very rough description is that it measures the cost of a change in one variable, say output, while holding the effects on cost of other variables, say the season of the year or the size of batches, constant. For example, consider the problem of analyzing the costs incurred by the shipping department of a department store. The manager of the department believes that his costs are primarily a function of the number of orders processed. However, heavier packages are more costly to handle than are lighter ones. He also considers the weather an important factor; rain or extreme cold slows down delivery time. We might want to eliminate the effect of the weather, since it is not controllable. But we would like to know how much each order costs to process and what the cost of heavier against lighter packages is. If we can make these estimates, we can (1) prepare a flexible budget for the shipping department that takes account of changes in operating conditions, (2) make better


pricing decisions, and (3) plan for capital budgeting more effectively. A properly specified multiple regression equation can provide the required estimates.

A criticism of multiple regression analysis is that it is complicated, and so would be difficult to "sell" to lower management and supervisory personnel. However, the method allows for a more complete specification of "reality" than do simple regression or the fixed-variable dichotomy. Studies have shown that supervisors tend to disregard data that they believe are "unrealistic," such as those based on the simplification that costs incurred are a function of units of output only.³ Therefore, multiple regression analysis should prove more acceptable to supervisors than procedures that require gross simplification of reality.

The regression technique also can allow the accountant to make probability statements concerning the reliability of the estimates made.⁴ For example, he may find that the marginal cost of processing a package of average weight is $.756, when the effects on cost of different weather conditions and other factors are accounted for. If the properties underlying regression analysis (discussed below) are met, the reliability of this cost estimate may be determined from the standard error of the coefficient (say $.032) from which the accountant may assess a probability of .95 that the marginal cost per package is between $.692 and $.820 (.756 ± .064).

Multiple regression analysis, then, is a very powerful tool; however, it is not applicable to all cost situations. To decide the situations for which it is best used, let us first consider the problem of cost estimation in general and then consider the sub-class of problems for which multiple regression analysis is useful.

Types of Cost Decision Problems

In general, cost is a function of many variables, including time. For example, the cost of output may be affected by such conditions as whether production is increasing or decreasing, the lot sizes are large or small, the plant is new or old, the White Sox are losing or winning, and so forth. Since there is some change in the environment of different time periods or in the circumstances affecting different decisions, it would seem that the accountant must make an individual cost analysis for every decision considered.

However, the maximization rule of economics also applies to information technology: the marginal cost of the information must not exceed the marginal revenue gained from it. The marginal revenue from cost information is the additional revenue that accrues or the losses that are avoided from not making mistakes, such as accepting contracts where the marginal costs exceed the marginal revenue from the work, or rejecting contracts where the reverse situation obtains. The marginal cost of information is the cost of gathering and presenting the information, plus the opportunity cost of delay, since measurement and presentation are not instantaneous.⁵ Since these costs can be expected to exceed the marginal revenue from information for many decisions, it usually is not economical to estimate different costs for each different decision. Thus, it is desirable to group decision problems into categories that can be served by the same basic cost information. Two such categories are proposed here: (1) recurring problems and (2) one-time problems.

3 H. A. Simon, H. Guetzkow, G. Kozmetsky, and G. Tyndall, Centralization versus Decentralization in Organizing the Controller's Department (New York: The Controllership Foundation, 1954).
4 This and the following statements are made in the context of a Bayesian analysis, in which the decision maker combines sample information with his prior judgment concerning unknown parameters. In the examples given, a jointly diffuse prior distribution is assumed for all parameters.
5 These two costs are related since delay can be reduced by expending more resources on the information system.
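
The shipping-department illustration and the interval quoted for the marginal cost per package can be made concrete with a short Python sketch. Everything in it other than the figures .756 and .032 is an assumption introduced for illustration: the synthetic weekly data, the variable names (orders, avg_weight, bad_weather), the "true" cost figures used to generate the data, and the helper functions fit_ols and interval are hypothetical and are not taken from the paper. The sketch fits a least-squares equation and forms a band of two standard errors around a coefficient.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus a constant term.

    Returns the coefficient estimates (constant first) and their
    standard errors.
    """
    X1 = np.column_stack([np.ones(len(y)), X])   # x0 = 1 for every observation
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares coefficients
    resid = y - X1 @ b                           # residuals u_t
    dof = X1.shape[0] - X1.shape[1]              # observations minus coefficients
    s2 = resid @ resid / dof                     # estimated disturbance variance
    cov = s2 * np.linalg.inv(X1.T @ X1)          # covariance matrix of the b's
    return b, np.sqrt(np.diag(cov))


def interval(b, se, k=2.0):
    """Band of k standard errors around an estimate (k = 2 is roughly .95)."""
    return b - k * se, b + k * se


# Hypothetical weekly records for a shipping department (assumed values).
rng = np.random.default_rng(0)
weeks = 104
orders = rng.integers(400, 1200, size=weeks).astype(float)   # orders processed
avg_weight = rng.uniform(2.0, 12.0, size=weeks)              # average package weight
bad_weather = rng.integers(0, 2, size=weeks).astype(float)   # 1 = rain or extreme cold
cost = (250.0 + 0.75 * orders + 18.0 * avg_weight
        + 60.0 * bad_weather + rng.normal(0.0, 25.0, size=weeks))

b, se = fit_ols(np.column_stack([orders, avg_weight, bad_weather]), cost)
low, high = interval(b[1], se[1])
print(f"estimated cost per order: {b[1]:.3f} (band {low:.3f} to {high:.3f})")

# The arithmetic quoted in the text: .756 plus or minus 2 x .032.
paper_low, paper_high = interval(0.756, 0.032)
print(f"text example: {paper_low:.3f} to {paper_high:.3f}")
```

The last two lines reproduce the arithmetic in the text: a coefficient of .756 with a standard error of .032 gives a band of about .692 to .820. The coefficient on orders is the estimated cost of one more order with package weight and weather held constant, which is the sense in which the text speaks of accounting for the other factors.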


Recurring decision problems are those for which the data required for analysis are used with some regularity. Examples are determining the prices that will be published in a catalogue, preparation of output schedules for expected production, the setting of budgets and production cost standards, and the formulation of forecasts. These decisions require cost data in the form of schedules of expected costs due to various levels of activity over an expected range.

One-time problems are those which occur infrequently, unpredictably, or are of such a magnitude as to require individual cost estimates. Examples of these problems are cost-profit-volume decisions, such as whether the firm should take a one-time special order, make, buy, or lease equipment, develop a new product, or close a plant. These decisions require that cost estimates be made which reflect conditions especially relevant to the problem at hand.

These categories present different requirements for cost estimation. Recurring problems require a schedule of expected costs and activity. Since these problems are repetitive, the marginal cost of gathering and presenting data each time usually is expected to be greater than the marginal revenue from the data. Thus, while the marginal cost of additional production, for example, will differ depending on such factors as whether overtime is required or excess capacity is available, in general it is more profitable to estimate the amount that the marginal cost of the additional production may be, on the average, rather than to take account of every special factor that may exist in individual circumstances.

In contrast, one-time problems are characterized by the economic desirability of making individual cost estimates. We do not rely on average marginal costs because the more accurate information is worth its cost. This situation may occur when the problem is unique, and average cost data are therefore not applicable. Or the decision may involve a substantial commitment of resources, making the marginal revenue from avoiding wrong decisions quite high.

MULTIPLE REGRESSION ANALYSIS

Regression analysis is particularly useful in estimating costs for recurring decisions.⁶ The procedure essentially consists of estimating mathematically the average relationship between costs (the "dependent" variable) and the factors that cause cost incurrences (the "independent" variables). The analysis provides the accountant with an estimate of the expected marginal cost of a unit change in output, for example, with the effects on total cost of other factors accounted for. These are the data he requires for costing recurring decisions.

The usefulness of multiple regression analysis for recurring decisions of costs can be appreciated best when the essential nature of the technique is understood. It is not necessary that the mathematical proofs of least squares or the methods of inverting matrices be learned since library computer programs do all the work.⁷ However, it is necessary that the assumptions underlying use of multiple regression be fully understood so that this valuable tool is not misused.

6 Indeed, its use requires the assumption that the past costs used for a regression analysis are a sample from a universe of possible costs generated by a continuing, stationary, normal process.
7 The mathematics of multiple regression is described in many statistics and econometrics texts.
8 A curvilinear or exponential relationship also can be expressed as a linear relationship. This technique is discussed below.

Multiple regression analysis presupposes a linear relationship between the contributive factors and costs.⁸ The functional relationship between these factors, \(x_1, x_2, \ldots, x_n\), and cost, \(C\), is assumed in multiple regression analysis to be of the following form:


\[ C_t = \beta_0 x_{0,t} + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_n x_{n,t} + \mu_t, \tag{1} \]

where

    \(\beta_0\) is a constant term (\(x_0 = 1\) for all observations and time periods),
    the \(\beta\)'s are fixed coefficients that express the marginal contribution of each \(x_i\) to \(C\), and
    \(\mu\) is the sum of unspecified factors, the disturbances, that are assumed to be randomly distributed with a zero mean and constant variance, and
    \(t = 1, 2, \ldots, m\) = time periods.

The \(\beta\) coefficients are estimated from a sample of \(C\)'s and \(x\)'s from time periods 1 through \(m\). For example, assume that the cost recorded in a week is a function of such specified factors as \(x_1\) = units of output, \(x_2\) = number of units in a batch, and \(x_3\) = the ratio of the number of "de luxe" units to total units produced. Then the right hand side of equation (2) is an estimate of the right hand side of equation (1), obtained from a sample of weekly observations, where the \(b\)'s are estimates of the \(\beta\)'s and \(u\) is the residual, the estimate of \(\mu\), the disturbance term:

\[ C_t = b_0 x_{0,t} + b_1 x_{1,t} + b_2 x_{2,t} + b_3 x_{3,t} + u_t. \tag{2} \]

If the values estimated for coefficients of the three independent variables, \(x_1\), \(x_2\), and \(x_3\), are \(b_0 = 100\), \(b_1 = 30\), \(b_2 = -20\), and \(b_3 = 500\), the expected cost (\(C\)) for any given week (\(t\)) is estimated by:

\[ C = 100 + 30x_1 - 20x_2 + 500x_3. \]

Given estimates of the \(\beta\)'s, one has, in effect, estimates of the marginal cost associated with each of the determining factors. In the example given above, the marginal cost of producing an additional unit of output, \(x_1\), is estimated to be $30, with the effects on costs of the size of batch (\(x_2\)) and the ratio of the number of de luxe to total units (\(x_3\)) accounted for. Or, \(b_2\), the marginal reduction in total cost of increasing the batches by 3 units, given fixed values of the number of units and the relative proportions of de luxe units produced, is estimated to be -$60 (-$20 times 3).

It is tempting to interpret the constant term, \(b_0\), as fixed cost. But this is not correct unless the linear relationship found in the range of observations obtains back to zero output.⁹ This can be seen best in the following two-dimensional graph of cost on output. The line was fitted with the equation \(C = b_0 + b_1 x_1\), where the dots are the observed values of cost and output. The slope of the line is the coefficient, \(b_1\), an estimate of the marginal change in total cost (\(C\)) with a unit change (\(\Delta\)) in output (\(x_1\)). The intercept on the \(C\) axis is \(b_0\), the constant term. It would be an estimate of fixed cost if the range of observations included the point where output were zero, and the relationship between total cost and output were linear. However, if more observations of cost and output (the x's) were available, it might be that the dashed curve would be fitted and \(b_0\) would be zero. Thus the value of the constant term, \(b_0\), is not the costs that would be expected if there were no output; it is only the value that is calculated as a result of the regression line computed from the available data.

9 Fixed cost is defined here as avoidable cost related to time periods and not to output variables.

The data for the calculations are taken from the accounting and production records of past time periods. The coefficients estimated from these data are averages of past experience. Therefore, the \(b\)'s calculated are best suited for recurring cost decisions. The fact that the \(b\)'s are averages of past data must be emphasized, because their use for decisions is based on the assumption that the future will be like an average of past experience.

The mathematical method usually used for estimating the \(\beta\)'s is the least-squares technique. It has the properties of provid-


ing best, linear, unbiased estimates of the \(\beta\)'s. These properties are desirable because they tend "to yield a series of estimates whose average would coincide with the true value being estimated and whose variance about that true value is smaller than that of any other unbiased estimators."¹⁰ While these properties are not always of paramount importance, they are very valuable for making estimates of the expected average costs required for recurring problems.

[Figure: two-dimensional graph of cost (C) on output (x1), showing the observed values, the fitted regression line, and a dashed curve.]

Another important advantage of the least-squares technique is that when it is combined with the assumptions about the disturbance term (\(\mu_t\)) that are discussed in Section III-7 below, the reliability of the relations between the explanatory variables and costs can be determined. Two types of reliability estimates may be computed. One, the standard error of estimate, shows how well the equation fits the data. The second, the standard error of the regression coefficients, assesses the probability that the \(\beta\)'s estimated are within a range of values. For example, if a linear cost function is used, the coefficient (\(b_1\)) of output (\(x_1\)) is the estimated marginal cost of output. With an estimate of the standard error of the coefficient, \(s_{b_1}\), we can say that the true marginal cost, \(\beta_1\), is within the range \(b_1 \pm s_{b_1}\), with a given probability.¹¹

10 J. Johnston, Statistical Cost Analysis (New York: McGraw-Hill, 1960), p. 31.
11 The interpretation of the confidence interval is admittedly Bayesian.
12 Proofs of the requirements described may be found in many econometrics textbooks, such as Arthur S. Goldberger, Econometric Theory (Wiley, 1964), and J. Johnston, Econometric Methods (McGraw-Hill, 1963).

REQUIREMENTS OF MULTIPLE REGRESSION AND COST RECORDING IMPLICATIONS

Although multiple regression is an excellent tool for estimating recurring costs, it does have several requirements that make its use hazardous without careful planning.¹² Most of the data requirements of multiple regression analysis depend on the way cost-accounting records are maintained. If the data are simply taken from the ordinary cost-accounting records of the


company, it is unlikely that the output of the regression model will be meaningful. Therefore, careful planning of the extent to which the initial accounting data are coded and recorded is necessary before regression analysis can be used successfully. This section of the paper is organized into four groupings that include several numbered subsections in which the principal technical requirements are described, after which the implications for the cost system are discussed. In the first group, (1) the length and (2) number of time periods, (3) the range of observations, and (4) the specification of cost-related factors are described, following which their implications for cost recording are outlined. In the second group, (5) errors of measurement and their cost recording implication are considered. The third group deals with (6) correlations among the explanatory variables and the important contribution that accounting analysis can make to this problem. Finally, (7) the requirements for the distribution of the nonspecified factors (disturbances) are given. The implications of these requirements for the functional form of the variables are taken up in Section V.

1. Length of Time Periods

(a) The time periods (1, 2, 3, ..., m) chosen should be long enough to allow the bookkeeping procedures to pair output produced in a period with the cost incurred because of that production. For example, if 500 units are produced in a day, but records of supplies used are kept on a weekly basis, an analysis of the cost of supplies used cannot be made with shorter than weekly periods. Lags in recording costs must be corrected or adjusted. Thus, production should not be recorded as occurring in one week while indirect labor is recorded a week later when the pay checks are written.

(b) The time periods chosen should be short enough to avoid variations in production within the period. Otherwise, the variations that occur during the period will be averaged out, possibly obscuring the true relationship between cost and output.

2. Number of Time Periods (Observations)

For a time series, each observation covers a time period in which data on costs and output and other explanatory variables are collected for analysis. As a minimum, there must be one more observation than there are independent variables to make regression analysis possible. (The excess number is called "degrees of freedom.") Of course, many more observations must be available before one could have any confidence that the relationship estimated from the sample reflects the "true" underlying relationship. The standard errors, from which one may determine the range within which the true coefficients lie (given some probability of error), are reduced by the square root of the number of observations.

3. Range of Observations

The observations on cost and output should cover as wide a range as possible. If there is very little variation from period to period in cost and output, the functional relationship between the two cannot be estimated effectively by regression analysis.

13 Complete specification is not mandatory if requirement 7 (below) is met. However, requirement 7 is not likely to be fulfilled if the specification is seriously incomplete.

4. Specification of Cost-Related Factors

All factors that affect cost should be specified and included in the analysis.¹³ This is a very important requirement that is often difficult to meet. For example, observations may have been taken over a period when input prices changed. The true relationship between cost and output may be obscured if high output coincided


with high input due to price-level effects. If the higher costs related to higher price levels are not accounted for (by inclusion of a price index as an independent variable) or adjusted for (by stating the dependent variable, cost, in constant dollars), the marginal cost of additional output estimated will be meaningful only if changes in input prices are proportional to changes in output and are expected to remain so.

Implications for Cost Recording of 1, 2, 3, and 4

In general, the time period requirements (1a, 1b and 2) call for the recording of production data for periods no longer than one month and preferably as short as one week in length. If longer periods are chosen, it is unlikely that there will be a sufficient number of observations available for analysis because, as a bare minimum, one more period than the number of explanatory variables is needed. Even if it is believed that only one explanatory variable (such as units of output) is needed to specify the cost function in any one period, requirement 4 (that all cost related factors be specified) demands consideration of differences among time periods. Thus, such events as changes in factor prices and production methods, whether production is increasing or decreasing, and the seasons of the year might have to be specified as explanatory variables.

The necessity of identifying all relevant explanatory variables, such as those just mentioned, can be met by having a journal kept in which the values or the behavior of these variables in specific time periods is noted. If such a record is not kept, it will be difficult (if not impossible) to recall unusual events and to identify them with the relevant time periods, especially when short time periods are used. For example, it is necessary to note whether production increased or decreased substantially in each period. Increases in production may be met by overtime. However, decreases may be accompanied by idle time or slower operations. Thus, we would expect the additional costs of increases to be greater than the cost savings from decreases.¹⁴

14 A dummy variable can be used to represent qualitative variables, such as P = 1 when production increased and P = 0 when production decreased. From the coefficient of P, we can estimate the cost effect of differences in the direction of output change and also reduce contamination of the coefficient estimated for output.

Other commonly found factors that affect costs are changes in technology, changes in capacity, periods of adjustment to new processes or types of output, and seasonal differences. The effect of these factors may be accounted for by including variables in the regression equation, by specific adjustment of the data, or by excluding data that are thought to be "contaminated."

The wide range of observations needed for effective analysis also argues against observation periods of longer than one month. With long periods, variations in production would more likely be averaged out than if shorter periods were used (which violates requirement 1b). In addition, if stability of conditions limits the number of explanatory variables other than output that otherwise would reduce the degrees of freedom, this same stability probably would not produce a sufficient range of output to make regression analysis worthwhile. Thus, weekly or monthly data usually are required for multiple regression.

5. Errors of Measurement

It is difficult to believe that data from a "real life" production situation will be reported without error. The nature of the errors is important since some kinds will affect the usefulness of regression analysis more than others will. Errors in the dependent variable, cost, are not fatal since


they affect the disturbance term, \(\mu\).¹⁵ The predictive value of the equation is lessened, but the estimate of marginal cost (\(\beta_1\)) is not affected.

But where there are errors in measuring output or the other independent variables (\(x\)'s), the disturbance term, \(\mu\), will be correlated with the independent variables.¹⁶ If this condition exists, the sample coefficient estimated by the least-squares procedure will be an underestimate of the true marginal cost. Thus, it is very important that the independent variables be measured accurately.

The possibility of measurement errors is intensified by the number of observations requirement. Short reporting periods increase the necessity for careful classification. For example, if a cost caused by production in week 1 is not recorded until week 2, the dependent variable (cost) of both observations will be measured incorrectly. This error is most serious when production fluctuates between observations. However, when production is increasing or decreasing steadily, the measurement error tends to be constant (either in absolute or proportional terms) and hence will affect only the constant term. The regression coefficients estimated, and hence the estimates of average marginal cost, will not be affected.¹⁷

Another important type of measurement error is the failure to charge the period in which production occurs with future costs caused by that production. For example, overtime pay for production workers may be paid for in the week following their work. This can be adjusted for easily. However, the foreman may not be paid for his overtime directly. Rather, many months after his work he might get a year-end bonus or a raise in pay. These costs cannot easily be associated with the production that caused them but will be charged in another period, thus making both periods' costs incorrect.¹⁸ This type of error is difficult to correct. Usually, all that one can do is eliminate the bonus payment from the data of the period in which it is paid and realize that the estimated coefficient of output will be biased downward. Average marginal costs, then, will be understated.

15 Let \(y\) stand for the measurement errors in \(C\):
\[ C + y = \beta_0 + \beta_1 x_1 + \mu \]
\[ C = \beta_0 + \beta_1 x_1 + \mu - y. \]
16 In this event, where \(v\) stands for the measurement error in \(x_1\):
\[ C = \beta_0 + \beta_1 (x_1 + v) + \mu. \]
The new disturbance term \(\beta_1 v + \mu\) is not independent of \(x_1\) because of the covariance between these variables.
17 If the error is proportionally constant (i.e., 10 per cent of production), transformation of the variables (such as to logarithms) is necessary.
18 Actually, the present value of the future payment should be included as a current period cost.

A somewhat similar situation follows from the high cost of the careful record keeping required to charge such input factors as production supplies to short time periods. In this event, these items of cost should be deducted from the other cost items and not included in the analysis. If these amounts are large enough, specific analysis may be required, or the decision not to account for them carefully may be re-evaluated.

This separation of specific cost items also is desirable where the accountant knows that their allocation to time periods bears no relation to production. For example, such costs as insurance or rent may be allocated to departments on a monthly basis. There is no point in including these costs in the dependent variable because it is known that they do not vary with the independent variables. At best, their inclusion will only increase the constant term. However, if by chance they are correlated with an independent variable, they will bias the estimates made (requirement 7a). This type of error may be built into the accounting system if fixed costs are allocated to time periods on the basis of production. For example, depreciation

This content downloaded from 62.122.73.86 on Tue, 17 Jun 2014 20:18:12 PM


All use subject to JSTOR Terms and Conditions
666 The Accounting Review, October 1966

may be charged on a per unit basis. The centers where a single output is likely to
variance of this cost, then, may be a be produced. This allows a set of multiple
function of the accounting method and regressions to be computed, one for each
not of the underlying economic relation- cost center. The procedure (which may be
ships."9 followed anyway for inventory costing)
also reduces the number of explanatory
6. Correlations among the explanatory (in- variables that need be specified in any one
dependent) variables regression.21 Care must be taken to assure
When the explanatory variables are that the allocation of costs to cost centers
highly correlated with one another, it is is not arbitrary or unrelated to output.
very difficult, and often impossible, to For example, allocation of electricity or
estimate the separate relationships of each rent on a square footage basis can serve no
to the dependent variable. This condition useful purpose. However, allocation of the
is called multicollinearity, and it is a salary of the foremen on a time basis is
severe problem for cost studies. When we necessary when they spend varying
compute marginal costs, we usually want amounts of time per period supervising
to estimate the marginal cost of each of the different cost centers.
different types of output produced in a A further complication arises if several
multiproduct firm. However, this is not al- different types of outputs are produced
ways possible. For example, consider a within the cost centers. For example, the
manufacturer who makes refrigerators, assembly department may work on differ-
freezers, washing machines, and other ent models of television sets at the same
major home appliances. If the demand for time. In most instances, it is neither fea-
all home appliances is highly correlated, sible nor desirable to allocate the cost
the number of refrigerators, freezers and center's costs to each type of output. Cost,
washing machines produced will move to- then, should be regressed on several output
gether, all being high in one week and low variables, one for the quantity of each
in another. In this situation it will be im- type of output. If these independent vari-
possible to disentangle the marginal cost of ables are multicollinear, the standard
producing refrigerators from the marginal errors of their regression coefficients will be
cost of producing freezers and washing so large relative to the coefficients as to
machines by means of multiple regres- make the estimates useless. In this event,
sion.20 an index of output may be constructed, in
Problems similar to that of our man- which the different types of output are
ufacturer can be alleviated by disaggrega- weighted by a factor (such as labor hours)
tion of total cost into several sub-groups that serves to describe their relationship to
that are independent of each other. Pre- cost. Cost then may be regressed on this
analysis and preliminary allocations of weighted index. The regression coefficient
cost and output data may accomplish this computed expresses the average relation-
disaggregation. This is one of the most im-
portant contributions the accountant can 19 Depreciation is assumed to be time, not user, de-
make to regression analysis. preciation.
20 However, the computed regression can provide
If the total costs of the entire plant are useful predictions of total costs if the past relationships
regressed on outputs of different types, it of production among the different outputs are main-
is likely that the computed coefficients will tained.
21 The author used this procedure with considerable
have very large standard errors and, hence, success in estimating the marginal costs of banking
will not be reliable. This situation may be operations. See "Economies of Scale and Marginal Costs
in Banking Operations," National Banking Review,
avoided by first allocating costs to cost 1965, pp. 507-549.

This content downloaded from 62.122.73.86 on Tue, 17 Jun 2014 20:18:12 PM


All use subject to JSTOR Terms and Conditions
Benston: Multiple Regression Analysis 667

ship between the "bundle" of outputs and (3) E = bo+ biM + b2S1+ b3S2+ b4S3
cost and cannot be decomposed to give the
where
relationship between one output element
and cost. However, since the outputs were E= electricity cost
collinear in the past, it is likely that they M= total machine hours in the plant
will be collinear in the future, so that S= seasonal dummy variables
knowledge about the cost of the "bundle" where
of outputs may be sufficient.
A valid objection to the allocation of S1= 1 for summer, 0 for other seasons
costs to cost centers is that one can never S2= 1 for spring, 0 for other seasons
be sure that the allocations are accurate. S3= 1 for winter, 0 for other seasons
Nevertheless, some allocations must be bo, bi, b2, b3, and b4 are the computed con-
made for multicollinearity to be overcome. stants and coefficients.
Therefore, the statistical method cannot If the regression is fully specified, with all
be free from the accountant's subjective factors that cause the use of electricity in-
judgment; in fact, it depends on it. cluded (such as the season of the year), the
A limitation of analysis of costs by cost regression coefficient of M, bi, is the esti-
centers also is that cost externalities mate of the average marginal cost of
among cost centers may be ignored. For electricity per machine hour. This cost can
example, the directly chargeable costs of be added to the other costs (such as mate-
the milling department may be a function rials and labor) to estimate the marginal
of the level of operations of other depart- cost of specific outputs.
ments. The existence and magnitude of For some activities, physical units, such
operations outside of a particular cost cen- as labor hours, can be used as the depen-
ter may be estimated by including an dent variable instead of costs. This proce-
appropriate independent variable in the dure is desirable where most of the ac-
cost center regression. An over-all index of tivity's costs are a function of such physi-
production, such as total direct labor cal units and where factor prices are ex-
hours on total sales is one such variable. pected to vary. Thus, in a shipping de-
Or, if a cost element is allocated between partment, it may be best to regress hours
two cost centers, the output of one cost worked on pounds shipped, percentage of
center may be included as an independent units shipped by truck, the average num-
variable in the other cost center's regres- ber of pounds per sale, and other explana-
sions. The existence and effect of these tory variables. Then, with the coefficients
possible inter-cost center elements may be estimated, the number of labor hours can
determined from the standard error of the be estimated for various situations. These
coefficient and sign of this variable. hours then can be costed at the current
Some types of costs that vary with ac- labor rate.
tivity cannot be associated with specific
cost centers because it is difficult to make 7. Distribution of the Non-Specified Factors
meaningful allocations or because of book- (Disturbances)
keeping problems (as discussed above). In (a) Serial correlation of the disturbances.
this event, individual regression analyses A very important requirement of least
of these costs probably will prove valuable.
22 Machine hours may not be recorded by cost center
For example, electricity may be difficult to although direct labor hours are. If machine hours (M)
allocate to cost centers although it varies are believed to be proportional to direct labor hours
with machine hours.22A regression can be (L), so that Mj=kjLi, where k is a constant multiplier
that may vary among cost centers, i, kiL, is a perfect
computed such as the following: substitute for Mi.

This content downloaded from 62.122.73.86 on Tue, 17 Jun 2014 20:18:12 PM


All use subject to JSTOR Terms and Conditions
668 The AccountingReview, October1966

squares that affects the coefficients and the is that the variance of the disturbance
estimates made about their reliability is term is constant; it should not be a func-
that the disturbances not be serially cor- tion of the level of the dependent or inde-
related. For a time series (in which the pendent variables.23 If the variance of the
observations are taken at successive pe- disturbance is nonconstant, the standard
riods of time), this means that the distur- errors of the coefficients estimated are not
bances that arose in a period t are inde- correct, and the reliability of the coeffi-
pendent from the disturbances that arose cients cannot be determined.
in previous periods, t-1, t-2, etc. The con- When the relationship estimated is be-
sequences of serial correlation of the dis- tween only one independent variable (out-
turbances are that (1) the standard errors put) and the dependent variable (cost),
of the regression coefficients (b's) will be the presence of non-constant variance of
seriously underestimated, (2) the sampling the disturbances can be detected by plot-
variances of the coefficients will be very ting the independent against the dependent
large, and (3) predictions of cost made variable. However, where more than one
from the regression equation will be more independent variable is required, such ob-
variable than is ordinarily expected from servations cannot be easily made. In this
least-squares estimators. Hence, the tests event, the accountant must attempt to
measuring the probability that the true estimate the nature of the variance from
marginal costs and total costs are within a other information and then transform the
range around the estimates computed from data to a form in which constant variance
the regression are not valid. is achieved. At the least, he should decide
(b) Independence from explanatory vari- whether the disturbances are likely to bear
ables. The disturbances which reflect the a proportional relationship to the other
factors affecting cost that cannot be spec- variables (as is commonly the situation
ified must be uncorrelated with the ex- with economic data). If they do, it may be
planatory (independent) variables. (xi, desirable to transform the variables to
x2, . . .I, n). If the unspecified factors are logarithms. The efficacy of the transforma-
correlated with the explanatory variables, tions may be tested by plotting the inde-
the coefficients will be biased and inconsis- pendent variables against the residuals
tent estimates of the true values. Such (the estimates of the disturbances).
correlation often is the result of bookkeep- (d) Normal distribution of the distur-
ing procedures. For example, repairs to bances. For the traditional statistical tests
equipment in a machine shop is a cost- of the regression coefficients and equations
causing activity that often is not specified to be strictly valid, the disturbances
because of quantification difficulties. How- should be normally distributed. Tests of
ever, these repairs may be made when normality can be made by plotting the re-
output is low because the machines can be siduals on normal probability paper, an
taken out of service at these times. Thus, option available in many library regression
repair costs will be negatively correlated programs. While requirement 7 does not
with output. If these costs are not sepa- have implications for the accounting sys-
rated from other costs, the estimated co- tem, it does determine the form in which
efficient of output will be biased down- the variables are specified. These consider-
ward, so that the true extent of variable- ations are discussed in the following sec-
ness of cost with output will be masked. tion.
(c) Variance of the disturbances. A basic 23 Constant variance is known as homoscedasticity.
assumption underlying use of least squares Non-constant variance is called heteroscedasticity.
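The multicollinearity problem described under requirement 6 can be illustrated with simulated data. The following is only a sketch under stated assumptions: the weekly quantities, the "true" marginal costs of 5 and 3, the noise levels, and the labor-hour weights of 2 and 1 are all hypothetical figures chosen for the example, and the helper `ols` is an illustrative name, not part of any library.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients (constant first) and their standard errors."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(0)
n = 104  # hypothetical: two years of weekly observations

# Hypothetical outputs that move together because demand is highly correlated.
demand = rng.normal(100.0, 20.0, n)
refrigerators = demand + rng.normal(0.0, 1.0, n)
freezers = 0.5 * demand + rng.normal(0.0, 1.0, n)

# Assumed cost structure: marginal costs of 5 and 3 per unit, plus a disturbance.
cost = 1000.0 + 5.0 * refrigerators + 3.0 * freezers + rng.normal(0.0, 50.0, n)

corr = np.corrcoef(refrigerators, freezers)[0, 1]

# Regression on the separate outputs: coefficients are poorly determined.
coef_sep, se_sep = ols(np.column_stack([refrigerators, freezers]), cost)

# Regression on a weighted index of output (weights are assumed labor hours per unit).
index = 2.0 * refrigerators + 1.0 * freezers
coef_idx, se_idx = ols(index, cost)

print("correlation between outputs:", round(corr, 3))
print("separate coefficients:", coef_sep[1:], "standard errors:", se_sep[1:])
print("index coefficient:", coef_idx[1], "standard error:", se_idx[1])
```

In a run of this kind the standard errors of the separate coefficients are large relative to the coefficients themselves, while the coefficient of the index is estimated tightly. As the text notes, the index coefficient describes the cost of the "bundle" only and cannot be decomposed into the marginal cost of each product.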

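Requirement 7(a) can also be checked numerically rather than by eye. The sketch below computes the Durbin-Watson statistic, $d = \sum (e_t - e_{t-1})^2 / \sum e_t^2$, on the residuals of a simulated cost regression, and then again after the data are restated as first differences. Values of $d$ near 2 suggest no first-order serial correlation; values well below 2 suggest positive serial correlation. The data, the autocorrelation of 0.8, and the function names `fit` and `durbin_watson` are assumptions made for the example.

```python
import numpy as np

def fit(x, y):
    """Least-squares fit of y on a constant and x; returns coefficients and residuals."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

rng = np.random.default_rng(1)
n = 200  # hypothetical number of weekly observations

output = 500.0 + rng.normal(0.0, 50.0, n)

# Disturbances that carry over from week to week (assumed autocorrelation of 0.8).
shocks = rng.normal(0.0, 30.0, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + shocks[t]

cost = 2000.0 + 8.0 * output + u  # assumed marginal cost of 8 per unit

# Regression in levels: residuals inherit the serial correlation.
coef_levels, resid_levels = fit(output, cost)
dw_levels = durbin_watson(resid_levels)

# Regression of the change in cost on the change in output (first differences).
coef_diff, resid_diff = fit(np.diff(output), np.diff(cost))
dw_diff = durbin_watson(resid_diff)

print("levels:      slope", coef_levels[1], "Durbin-Watson", dw_levels)
print("differences: slope", coef_diff[1], "Durbin-Watson", dw_diff)
```

The statistic is only a screening device here; judging significance requires the published Durbin-Watson bounds for the sample size and number of regressors, which this sketch does not include.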

FUNCTIONAL FORM OF THE REGRESSION EQUATION

Thus far we have been concerned with correct specification of the regression equation rather than with its functional form. However, the form of the variables must fit the underlying data well and be of such a nature that the residuals are distributed according to requirement 7 above. The form chosen first should follow the underlying relationship that is thought to exist. Consider, for example, an analysis of the costs ($C$) of a shipping department. Costs may be a function of pounds shipped ($P$), percentage of pounds shipped by truck ($T$), and the average number of pounds per sale ($A$). If the accountant believes that the change in cost due to a change of each explanatory variable is unaffected by the levels of the other explanatory variables, a linear form could be used, as follows:

(4) $C = a + bP + cT + dA.$

In this form, the estimated marginal cost of a unit change in pounds shipped ($P$) is $\partial C/\partial P$, or $b$.

However, if the marginal cost of each explanatory variable is thought to be a function of the levels of the other explanatory variables, the following form would be better:

(5) $C = aP^{b}T^{c}A^{d}.$

In this case, a linear form could be achieved by converting the variables to logarithms:

(6) $\log C = \log a + b \log P + c \log T + d \log A.$

Now, an approximation to the expected marginal cost of a unit change in pounds shipped ($P$) is $\partial C/\partial P = ba\bar{P}^{b-1}\bar{T}^{c}\bar{A}^{d}$, where the other explanatory variables are held constant at some average values (denoted by bars over the letters). Thus, the estimated marginal cost of $P$ is a function of the levels of the other variables.

The logarithmic form of the variables also allows for estimates of nonlinear relationships between cost and the explanatory variables. The form of the relationships may be approximated by graphing the dependent variable against the independent variable. (The most important independent variable should be chosen where there is more than one, although in this event the simple two-dimensional plotting can only be suggestive.) If the plot indicates that a non-linear rather than a linear form will fit the data best, the effect of using logarithms may be determined by plotting the data on semi-log and log-log ruled paper.

If the data seem curvilinear even in logarithms, or if an additive rather than a multiplicative form describes the underlying relationships best, polynomial forms of the variables may be used. Thus, for an additive relationship between cost ($C$) and quantity of output ($Q$), the form fitted may be $C = a + bQ + cQ^2 + dQ^3$. If a multiplicative relationship is assumed, the form may be $\log C = \log a + b \log Q + c(\log Q)^2$. Either form describes a large family of curves with two bends.

When choosing the form of the variables, attention must always be paid to the effect of the form on the residuals, the estimates of the disturbances. Unless the residuals have constant variance, are not serially correlated, and are approximately normally distributed (requirement 7), inferences about the reliability of the coefficients estimated cannot be made. Graphing is a valuable method for determining whether or not these requirements are met. (The graphs mentioned usually can be produced by the computers.) Three graphs are suggested. First, the residuals should be plotted in time sequence. They should appear to be randomly distributed, with no cycles or trends.²⁴ Second, the residuals can be plotted against the predicted value of the dependent variable. There should be about as many positive as negative residuals, scattered evenly about a zero line, with the variance of the residuals about the same at any value of the predicted dependent variable. Finally, the residuals should be plotted on normal probability paper to test for normality.

If the graphs show that the residuals do not meet the requirements of least squares, the data must be transformed. If serial correlation of the residuals is a problem, transformation of the variables may help. A commonly used method is to compute first differences, in which the observations from periods $t$, $t-1$, $t-2$, $t-3$, etc., are replaced with $t-(t-1)$, $(t-1)-(t-2)$, $(t-2)-(t-3)$, and so forth. With first difference data, one is regressing the change in cost on the change in output, etc., a procedure which in many instances may be descriptively superior to other methods of stating the data. However, the residuals from first difference data also must be subjected to serial correlation tests, since taking first differences often results in negative serial correlations.²⁵

Where non-constant variance of the residuals is a problem, the residuals may increase proportionally to the predicted dependent variable. In this event transformation of the dependent variable to logarithms will be effective in achieving constant variance. If the residuals increase more than proportionately, the square root of the dependent variable may be a better transformation.

AN ILLUSTRATION

Assume that a firm manufactures a widget and several other products, in which the services of several departments are used. Analysis of the costs of the assembly department will provide us with an illustration. In this department, widgets and another product, digits, are produced. The widgets are assembled in batches while the larger digits are assembled singly. Weekly observations on cost and output are taken and punched on cards. A graph is prepared, from which it appears that a linear relationship is present. Further, the cost of producing widgets is not believed to be a function of the production of digits or other explanatory variables. Therefore, the following regression is computed:

(7) $C = \underset{(40.8)}{110.3} + \underset{(.53)}{8.21}\,N - \underset{(1.69)}{7.83}\,B + \underset{(2.10)}{12.32}\,D + \underset{(100)}{235}\,S + \underset{(204)}{523}\,W - \underset{(154)}{136}\,A$

where
$C$ = expected cost
$N$ = number of widgets
$B$ = average number of widgets in a batch
$D$ = number of digits
$S$ = summer dummy variable, where $S$ = 1 for summer, 0 for other seasons
$W$ = winter dummy variable, where $W$ = 1 for winter, 0 for other seasons
$A$ = autumn dummy variable, where $A$ = 1 for autumn, 0 for other seasons
$R^2$ = .892 (the coefficient of multiple determination)
Standard error of estimate = 420.83, which is 5% of the dependent variable, cost.
Number of observations = 156.

²⁴ A more formal test for serial correlation is provided by the Durbin-Watson statistic, which is built into many library regression computer programs. (J. Durbin and G. S. Watson, "Testing for Serial Correlation in Least-Squares Regression," Parts I and II, Biometrika, 1950 and 1951.)
²⁵ If there are random measurement errors in the data, observations from period $t-1$ might be increased by a positive error. Then $t-(t-1)$ will be lower and $(t-1)-(t-2)$ will be higher than if the error were not present. Consequently, $t-(t-1)$ and $(t-1)-(t-2)$ will be negatively serially correlated.


The numbers in parentheses beneath the coefficients are the standard errors of the coefficients. These results may be used for such purposes as price and output decisions, analysis of efficiency, and capital budgeting.

For price and output decisions, we would want to estimate the average marginal cost expected if an additional widget is produced. From the regression we see that the estimated average marginal cost, $\partial C/\partial N$, is 8.21, with the other factors affecting costs accounted for. The standard error of the coefficient, .53, allows us to assess a probability of .67 that the "true" marginal cost is between 7.68 and 8.74 (8.21 ± .53) and .95 that it is between 7.15 and 9.27 (8.21 ± 1.06).²⁶

The regression also can be used for flexible budgeting and analysis of performance. For example, assume that the following production is reported for a given week:

$N$ = 532
$B$ = 20
$D$ = 321
$S$ = summer = 1

Then we expect that, if this week is like an average of the experience for past weeks, total costs would be:

110.3 + 8.21(532) - 7.83(20) + 12.32(321) + 235(1) = 8511.14.

The actual costs incurred can be compared to this expected amount. Of course, we do not expect the actual amount to equal the predicted amount, if only because we could not specify all of the cost-causing variables in the regression equation. However, we can calculate the probability that the actual cost is within some range around the expected cost. This range can be computed from the standard error of estimate and a rather complicated set of relationships that reflect uncertainty about the height and tilt of the regression plane. These calculations also reflect the difference between the production reported for a given week and the means of the production data from which the regression was computed. The greater the difference between given output and the mean output, the less confidence we have in the prediction of the regression equation. For this example, the adjusted standard error of estimate for the values of the independent variables given is 592.61. Thus, we assess a probability of .67 that the actual costs incurred will be between 7918.53 and 9103.75 (8511.14 ± 592.61) and a probability of .95 that they will be between 7325.92 and 9696.36 (8511.14 ± 2 × 592.61). With these figures, management can decide how unusual the actual production costs are in the light of past experience.

The regression results may be useful for capital budgeting, if the company is considering replacing the present widget assembly procedure with a new machine. While the cash flows expected from using the new machine must be estimated from engineering analyses, they are compared with the cash flows that would otherwise take place if the present machines were kept. These future expected flows may be estimated by "plugging" the expected output into the regression equation and calculating the expected costs. While these estimates may be statistically unreliable for data beyond the range of those used to calculate the regression, the estimates may still be the best that can be obtained.

CONCLUSION

The assertion has been made throughout this paper that regression analysis is not only a valuable tool but a method made available, inexpensive and easy to use by computers. The reader may be inclined to accept all but the last point, having read through the list of technical and bookkeeping problems. Actually it is the ease of computation that the library computer programs afford which makes it necessary to stress precautions and care: it is all too easy to "crank out" numbers that seem useful but actually render the whole program, if not deceptive, worthless.

But when one considers that costs often are caused by many different factors whose effects are not obvious, one recognizes the great possibilities of regression analysis, limited as it may be. Nevertheless, it is necessary to remember that it is a tool, not a cure-all. The method must not be used in cost situations where there is not an ongoing stationary relationship between cost and the variables upon which cost depends. Where the desired conditions prevail, multiple regression can provide valuable information for solving necessary decision problems, information that can put "life" into the economic models that accountants are now embracing.

²⁶ The statements about probability are based on a Bayesian approach, with normality and diffuse prior distributions assumed.
