1966 Benston
American Accounting Association is collaborating with JSTOR to digitize, preserve and extend access to The
Accounting Review.
George J. Benston
ACCOUNTANTS probably have always been concerned with measuring and reporting the relationship between cost and output. The pre-eminence of financial accounting in this century resulted in directing much of our attention towards attaching costs to inventories. However, the recent emphasis on decision making is causing us to consider ways of measuring the variability of cost with output and other decision variables. In this paper, the application, use, and limitations of multiple regression analysis, a valuable tool for measuring costs, are discussed.1

A valid objection to multiple regression analysis in the past has been that its computational difficulty often rendered it too costly. Today, with high speed computers and library programs, this objection is no longer valid: most regression problems ought to cost less than $30 to run. Unfortunately, this new ease and low cost of using regression analysis may prove to be its undoing. Analysts may be tempted to use the technique without adequately realizing its technical data requirements and limitations. The "GI-GO" adage, "garbage in, garbage out," always must be kept in mind. A major purpose of this paper is to state these requirements and limitations explicitly and to indicate how they may be handled.

The general problem of cost measurement is discussed in the first section of this paper. Multiple regression analysis is considered first in relation to other methods of cost analysis. Then its applicability to cost decision problems is delineated. Second, the method of multiple regression is discussed in nonmathematical terms so that its uses can be understood better. The third section represents the "heart" of the paper. Here the technical requirements of multiple regression are outlined, and the implications of these requirements for the recording of cost data in the firm's accounting records are outlined. The functional form of the regression equation is then considered. In the final section, we discuss some applications for multiple regression analysis.

1 The use of statistical analysis for auditing and control is outside the scope of this paper. Excellent discussions of these uses of statistics may be found in Richard N. Cyert and H. Justin Davidson, Statistical Sampling for Accounting Information (Prentice-Hall, 1962), and Herbert Arkin, Handbook of Sampling for Auditing and Accounting, Volume I: Methods (McGraw-Hill, 1963).

George J. Benston is Associate Professor of Accounting at the University of Rochester. This manuscript was awarded first place in the American Accounting Association's 1966 Manuscript Contest, open to members acquiring the doctorate in 1962 or later.
pricing decisions, and (3) plan for capital budgeting more effectively. A properly specified multiple regression equation can provide the required estimates.

A criticism of multiple regression analysis is that it is complicated, and so would be difficult to "sell" to lower management and supervisory personnel. However, the method allows for a more complete specification of "reality" than do simple regression or the fixed-variable dichotomy. Studies have shown that supervisors tend to disregard data that they believe are "unrealistic," such as those based on the simplification that costs incurred are a function of units of output only.3 Therefore, multiple regression analysis should prove more acceptable to supervisors than procedures that require gross simplification of reality.

The regression technique also can allow the accountant to make probability statements concerning the reliability of the estimates made.4 For example, he may find that the marginal cost of processing a package of average weight is $.756, when the effects on cost of different weather conditions and other factors are accounted for. If the properties underlying regression analysis (discussed below) are met, the reliability of this cost estimate may be determined from the standard error of the coefficient (say $.032) from which the accountant may assess a probability of .95 that the marginal cost per package is between $.692 and $.820 (.756 ± .064).

Multiple regression analysis, then, is a very powerful tool; however, it is not applicable to all cost situations. To decide the situations for which it is best used, let us first consider the problem of cost estimation in general and then consider the sub-class of problems for which multiple regression analysis is useful.

Types of Cost Decision Problems

In general, cost is a function of many variables, including time. For example, the cost of output may be affected by such conditions as whether production is increasing or decreasing, the lot sizes are large or small, the plant is new or old, the White Sox are losing or winning, and so forth. Since there is some change in the environment of different time periods or in the circumstances affecting different decisions, it would seem that the accountant must make an individual cost analysis for every decision considered.

However, the maximization rule of economics also applies to information technology: the marginal cost of the information must not exceed the marginal revenue gained from it. The marginal revenue from cost information is the additional revenue that accrues or the losses that are avoided from not making mistakes, such as accepting contracts where the marginal costs exceed the marginal revenue from the work, or rejecting contracts where the reverse situation obtains. The marginal cost of information is the cost of gathering and presenting the information, plus the opportunity cost of delay, since measurement and presentation are not instantaneous.5 Since these costs can be expected to exceed the marginal revenue from information for many decisions, it usually is not economical to estimate different costs for each different decision. Thus, it is desirable to group decision problems into categories that can be served by the same basic cost information. Two such categories are proposed here: (1) recurring problems and (2) one-time problems.

3 H. A. Simon, H. Guetzkow, G. Kozmetsky, and G. Tyndall, Centralization versus Decentralization in Organizing the Controller's Department (New York: The Controllership Foundation, 1954).

4 This and the following statements are made in the context of a Bayesian analysis, in which the decision maker combines sample information with his prior judgment concerning unknown parameters. In the examples given, a jointly diffuse prior distribution is assumed for all parameters.

5 These two costs are related since delay can be reduced by expending more resources on the information system.
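The interval quoted in the package-processing example is simple arithmetic on the coefficient and its standard error. A minimal sketch, assuming (as the paper's figures imply) a multiplier of two standard errors for a probability of about .95; the function name is illustrative only:

```python
def coefficient_interval(b, se, k=2.0):
    """Return (low, high) = b -/+ k standard errors.

    k=2.0 is an assumption chosen to match the paper's .756 +/- .064
    example; the normal-theory multiplier for .95 is closer to 1.96.
    """
    return b - k * se, b + k * se


low, high = coefficient_interval(0.756, 0.032)
print(f"${low:.3f} to ${high:.3f}")  # $0.692 to $0.820
```

The interval is only as trustworthy as the regression assumptions behind the standard error, which is the point of the requirements discussed later in the paper.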
Recurring decision problems are those for which the data required for analysis are used with some regularity. Examples are determining the prices that will be published in a catalogue, preparation of output schedules for expected production, the setting of budgets and production cost standards, and the formulation of forecasts. These decisions require cost data in the form of schedules of expected costs due to various levels of activity over an expected range.

One-time problems are those which occur infrequently, unpredictably, or are of such a magnitude as to require individual cost estimates. Examples of these problems are cost-profit-volume decisions, such as whether the firm should take a one-time special order, make, buy, or lease equipment, develop a new product, or close a plant. These decisions require that cost estimates be made which reflect conditions especially relevant to the problem at hand.

These categories present different requirements for cost estimation. Recurring problems require a schedule of expected costs and activity. Since these problems are repetitive, the marginal cost of gathering and presenting data each time usually is expected to be greater than the marginal revenue from the data. Thus, while the marginal cost of additional production, for example, will differ depending on such factors as whether overtime is required or excess capacity is available, in general it is more profitable to estimate the amount that the marginal cost of the additional production may be, on the average, rather than to take account of every special factor that may exist in individual circumstances.

In contrast, one-time problems are characterized by the economic desirability of making individual cost estimates. We do not rely on average marginal costs because the more accurate information is worth its cost. This situation may occur when the problem is unique, and average cost data are therefore not applicable. Or the decision may involve a substantial commitment of resources, making the marginal revenue from avoiding wrong decisions quite high.

MULTIPLE REGRESSION ANALYSIS

Regression analysis is particularly useful in estimating costs for recurring decisions.6 The procedure essentially consists of estimating mathematically the average relationship between costs (the "dependent" variable) and the factors that cause cost incurrences (the "independent" variables). The analysis provides the accountant with an estimate of the expected marginal cost of a unit change in output, for example, with the effects on total cost of other factors accounted for. These are the data he requires for costing recurring decisions.

The usefulness of multiple regression analysis for recurring decisions of costs can be appreciated best when the essential nature of the technique is understood. It is not necessary that the mathematical proofs of least squares or the methods of inverting matrices be learned since library computer programs do all the work.7 However, it is necessary that the assumptions underlying use of multiple regression be fully understood so that this valuable tool is not misused.

Multiple regression analysis presupposes a linear relationship between the contributive factors and costs.8 The functional relationship between these factors, $x_1, x_2, \ldots, x_n$, and cost, $C$, is assumed in multiple regression analysis to be of the following form:

6 Indeed, its use requires the assumption that the past costs used for a regression analysis are a sample from a universe of possible costs generated by a continuing, stationary, normal process.

7 The mathematics of multiple regression is described in many statistics and econometrics texts.

8 A curvilinear or exponential relationship also can be expressed as a linear relationship. This technique is discussed below.
[Figure: cost (C) plotted against output ($x_1$)]

ing best, linear, unbiased estimates of the $\beta$'s. These properties are desirable because they tend "to yield a series of estimates whose average would coincide with the true value being estimated and whose variance about that true value is smaller than that of any other unbiased estimators."10 While these properties are not always of paramount importance, they are very valuable for making estimates of the expected average costs required for recurring problems.

Another important advantage of the least-squares technique is that when it is combined with the assumptions about the disturbance term ($\mu_t$) that are discussed in Section III-7 below, the reliability of the relations between the explanatory variables and costs can be determined. Two types of reliability estimates may be computed. One, the standard error of estimate, shows how well the equation fits the data. The second, the standard error of the regression coefficients, assesses the probability that the $\beta$'s estimated are within a range of values. For example, if a linear cost function is used, the coefficient ($b_1$) of output ($x_1$) is the estimated marginal cost of output. With an estimate of the standard error of the coefficient, $s_{b_1}$, we can say that the true marginal cost, $\beta_1$, is within the range $b_1 \pm s_{b_1}$, with a given probability.11

REQUIREMENTS OF MULTIPLE REGRESSION AND COST RECORDING IMPLICATIONS

Although multiple regression is an excellent tool for estimating recurring costs, it does have several requirements that make its use hazardous without careful planning.12 Most of the data requirements of multiple regression analysis depend on the way cost-accounting records are maintained. If the data are simply taken from the ordinary cost-accounting records of the

10 J. Johnston, Statistical Cost Analysis (New York: McGraw-Hill, 1960), p. 31.

11 The interpretation of the confidence interval is admittedly Bayesian.

12 Proofs of the requirements described may be found in many econometrics textbooks, such as Arthur S. Goldberger, Econometric Theory (Wiley, 1964), and J. Johnston, Econometric Methods (McGraw-Hill, 1963).
company, it is unlikely that the output of the regression model will be meaningful. Therefore, careful planning of the extent to which the initial accounting data are coded and recorded is necessary before regression analysis can be used successfully. This section of the paper is organized into four groupings that include several numbered subsections in which the principal technical requirements are described, after which the implications for the cost system are discussed. In the first group, (1) the length and (2) number of time periods, (3) the range of observations, and (4) the specification of cost-related factors are described, following which their implications for cost recording are outlined. In the second group, (5) errors of measurement and their cost recording implication are considered. The third group deals with (6) correlations among the explanatory variables and the important contribution that accounting analysis can make to this problem. Finally, (7) the requirements for the distribution of the nonspecified factors (disturbances) are given. The implications of these requirements for the functional form of the variables are taken up in Section V.

1. Length of Time Periods

(a) The time periods (1, 2, 3, ..., m) chosen should be long enough to allow the bookkeeping procedures to pair output produced in a period with the cost incurred because of that production. For example, if 500 units are produced in a day, but records of supplies used are kept on a weekly basis, an analysis of the cost of supplies used cannot be made with shorter than weekly periods. Lags in recording costs must be corrected or adjusted. Thus, production should not be recorded as occurring in one week while indirect labor is recorded a week later when the pay checks are written.

(b) The time periods chosen should be short enough to avoid variations in production within the period. Otherwise, the variations that occur during the period will be averaged out, possibly obscuring the true relationship between cost and output.

2. Number of Time Periods (Observations)

For a time series, each observation covers a time period in which data on costs and output and other explanatory variables are collected for analysis. As a minimum, there must be one more observation than there are independent variables to make regression analysis possible. (The excess number is called "degrees of freedom.") Of course, many more observations must be available before one could have any confidence that the relationship estimated from the sample reflects the "true" underlying relationship. The standard errors, from which one may determine the range within which the true coefficients lie (given some probability of error), are reduced by the square root of the number of observations.

3. Range of Observations

The observations on cost and output should cover as wide a range as possible. If there is very little variation from period to period in cost and output, the functional relationship between the two cannot be estimated effectively by regression analysis.

4. Specification of Cost-Related Factors

All factors that affect cost should be specified and included in the analysis.13 This is a very important requirement that is often difficult to meet. For example, observations may have been taken over a period when input prices changed. The true relationship between cost and output may be obscured if high output coincided

13 Complete specification is not mandatory if requirement 7 (below) is met. However, requirement 7 is not likely to be fulfilled if the specification is seriously incomplete.
with high input due to price-level effects. If the higher costs related to higher price levels are not accounted for (by inclusion of a price index as an independent variable) or adjusted for (by stating the dependent variable, cost, in constant dollars), the marginal cost of additional output estimated will be meaningful only if changes in input prices are proportional to changes in output and are expected to remain so.

Implications for Cost Recording of 1, 2, 3, and 4

In general, the time period requirements (1a, 1b, and 2) call for the recording of production data for periods no longer than one month and preferably as short as one week in length. If longer periods are chosen, it is unlikely that there will be a sufficient number of observations available for analysis because, as a bare minimum, one more period than the number of explanatory variables is needed. Even if it is believed that only one explanatory variable (such as units of output) is needed to specify the cost function in any one period, requirement 4 (that all cost related factors be specified) demands consideration of differences among time periods. Thus, such events as changes in factor prices and production methods, whether production is increasing or decreasing, and the seasons of the year might have to be specified as explanatory variables.

The necessity of identifying all relevant explanatory variables, such as those just mentioned, can be met by having a journal kept in which the values or the behavior of these variables in specific time periods is noted. If such a record is not kept, it will be difficult (if not impossible) to recall unusual events and to identify them with the relevant time periods, especially when short time periods are used. For example, it is necessary to note whether production increased or decreased substantially in each period. Increases in production may be met by overtime. However, decreases may be accompanied by idle time or slower operations. Thus, we would expect the additional costs of increases to be greater than the cost savings from decreases.14

Other commonly found factors that affect costs are changes in technology, changes in capacity, periods of adjustment to new processes or types of output, and seasonal differences. The effect of these factors may be accounted for by including variables in the regression equation, by specific adjustment of the data, or by excluding data that are thought to be "contaminated."

The wide range of observations needed for effective analysis also argues against observation periods of longer than one month. With long periods, variations in production would more likely be averaged out than if shorter periods were used (which violates requirement 1b). In addition, if stability of conditions limits the number of explanatory variables other than output that otherwise would reduce the degrees of freedom, this same stability probably would not produce a sufficient range of output to make regression analysis worthwhile. Thus, weekly or monthly data usually are required for multiple regression.

5. Errors of Measurement

It is difficult to believe that data from a "real life" production situation will be reported without error. The nature of the errors is important since some kinds will affect the usefulness of regression analysis more than others will. Errors in the dependent variable, cost, are not fatal since

14 A dummy variable can be used to represent qualitative variables, such as P = 1 when production increased and P = 0 when production decreased. From the coefficient of P, we can estimate the cost effect of differences in the direction of output change and also reduce contamination of the coefficient estimated for output.
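The dummy variable P of footnote 14 can be built directly from the output series. The following is only a sketch: the weekly figures are hypothetical, the helper name is illustrative, and treating an unchanged period as P = 0 is an assumption (the footnote covers only increases and decreases).

```python
def direction_dummy(output):
    """P = 1 if output rose from the previous period, otherwise 0.

    The first period has no predecessor, so it is dropped.
    Treating "no change" as 0 is an assumption of this sketch.
    """
    return [1 if cur > prev else 0 for prev, cur in zip(output, output[1:])]


# Hypothetical weekly output (units) and total cost (dollars).
output = [100, 120, 110, 140, 130, 160]
cost = [740, 840, 760, 900, 820, 960]

P = direction_dummy(output)
# Design rows [constant, output, P] for periods 2..n, ready for a regression routine.
rows = [[1, x, p] for x, p in zip(output[1:], P)]
y = cost[1:]
print(P)  # [1, 0, 1, 0, 1]
```

Dropping the first period costs one observation, which matters when the series is short and degrees of freedom are scarce.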
they affect the disturbance term, $\mu$.15 The predictive value of the equation is lessened, but the estimate of marginal cost ($\beta_1$) is not affected.

But where there are errors in measuring output or the other independent variables (x's), the disturbance term, $\mu$, will be correlated with the independent variables.16 If this condition exists, the sample coefficient estimated by the least-squares procedure will be an underestimate of the true marginal cost. Thus, it is very important that the independent variables be measured accurately.

The possibility of measurement errors is intensified by the number of observations requirement. Short reporting periods increase the necessity for careful classification. For example, if a cost caused by production in week 1 is not recorded until week 2, the dependent variable (cost) of both observations will be measured incorrectly. This error is most serious when production fluctuates between observations. However, when production is increasing or decreasing steadily, the measurement error tends to be constant (either in absolute or proportional terms) and hence will affect only the constant term. The regression coefficients estimated, and hence the estimates of average marginal cost, will not be affected.17

Another important type of measurement error is the failure to charge the period in which production occurs with future costs caused by that production. For example, overtime pay for production workers may be paid for in the week following their work. This can be adjusted for easily. However, the foreman may not be paid for his overtime directly. Rather, many months after his work he might get a year-end bonus or a raise in pay. These costs cannot easily be associated with the production that caused them but will be charged in another period, thus making both periods' costs incorrect.18 This type of error is difficult to correct. Usually, all that one can do is eliminate the bonus payment from the data of the period in which it is paid and realize that the estimated coefficient of output will be biased downward. Average marginal costs, then, will be understated.

A somewhat similar situation follows from the high cost of the careful record keeping required to charge such input factors as production supplies to short time periods. In this event, these items of cost should be deducted from the other cost items and not included in the analysis. If these amounts are large enough, specific analysis may be required, or the decision not to account for them carefully may be re-evaluated.

This separation of specific cost items also is desirable where the accountant knows that their allocation to time periods bears no relation to production. For example, such costs as insurance or rent may be allocated to departments on a monthly basis. There is no point in including these costs in the dependent variable because it is known that they do not vary with the independent variables. At best, their inclusion will only increase the constant term. However, if by chance they are correlated with an independent variable, they will bias the estimates made (requirement 7a). This type of error may be built into the accounting system if fixed costs are allocated to time periods on the basis of production. For example, depreciation

15 Let $\psi$ stand for the measurement errors in $C$:
$C + \psi = \beta_0 + \beta_1 x_1 + \mu$
$C = \beta_0 + \beta_1 x_1 + \mu - \psi$.

16 In this event, where $\psi$ stands for the measurement error in $x_1$:
$C = \beta_0 + \beta_1 (x_1 + \psi) + \mu$
The new disturbance term $\beta_1 \psi + \mu$ is not independent of $x_1$ because of the covariance between these variables.

17 If the error is proportionally constant (i.e., 10 per cent of production), transformation of the variables (such as to logarithms) is necessary.

18 Actually, the present value of the future payment should be included as a current period cost.
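The downward bias that footnote 16 attributes to errors in measuring output can be seen in a small simulation. This is only a sketch under stated assumptions: the "true" marginal cost of 3.0, the ranges, and the error sizes are all invented for illustration, and the recording error is taken to be independent of true output.

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(0)   # fixed seed so the run is repeatable
true_mc = 3.0    # assumed "true" marginal cost per unit
output = [random.uniform(50, 150) for _ in range(5000)]
cost = [200 + true_mc * x + random.gauss(0, 10) for x in output]
# Recorded output = true output + independent recording error (sd 20, assumed).
recorded = [x + random.gauss(0, 20) for x in output]

b_true = slope(output, cost)        # regression on correctly measured output
b_recorded = slope(recorded, cost)  # regression on mis-measured output
print(round(b_true, 2), round(b_recorded, 2))
```

With these settings the second slope comes out well below the first, which is the understatement of marginal cost the text warns about; errors added to cost alone would leave the slope centered on its true value.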
may be charged on a per unit basis. The variance of this cost, then, may be a function of the accounting method and not of the underlying economic relationships.19

6. Correlations among the explanatory (independent) variables

When the explanatory variables are highly correlated with one another, it is very difficult, and often impossible, to estimate the separate relationships of each to the dependent variable. This condition is called multicollinearity, and it is a severe problem for cost studies. When we compute marginal costs, we usually want to estimate the marginal cost of each of the different types of output produced in a multiproduct firm. However, this is not always possible. For example, consider a manufacturer who makes refrigerators, freezers, washing machines, and other major home appliances. If the demand for all home appliances is highly correlated, the number of refrigerators, freezers and washing machines produced will move together, all being high in one week and low in another. In this situation it will be impossible to disentangle the marginal cost of producing refrigerators from the marginal cost of producing freezers and washing machines by means of multiple regression.20

Problems similar to that of our manufacturer can be alleviated by disaggregation of total cost into several sub-groups that are independent of each other. Preanalysis and preliminary allocations of cost and output data may accomplish this disaggregation. This is one of the most important contributions the accountant can make to regression analysis.

If the total costs of the entire plant are regressed on outputs of different types, it is likely that the computed coefficients will have very large standard errors and, hence, will not be reliable. This situation may be avoided by first allocating costs to cost centers where a single output is likely to be produced. This allows a set of multiple regressions to be computed, one for each cost center. The procedure (which may be followed anyway for inventory costing) also reduces the number of explanatory variables that need be specified in any one regression.21 Care must be taken to assure that the allocation of costs to cost centers is not arbitrary or unrelated to output. For example, allocation of electricity or rent on a square footage basis can serve no useful purpose. However, allocation of the salary of the foremen on a time basis is necessary when they spend varying amounts of time per period supervising different cost centers.

A further complication arises if several different types of outputs are produced within the cost centers. For example, the assembly department may work on different models of television sets at the same time. In most instances, it is neither feasible nor desirable to allocate the cost center's costs to each type of output. Cost, then, should be regressed on several output variables, one for the quantity of each type of output. If these independent variables are multicollinear, the standard errors of their regression coefficients will be so large relative to the coefficients as to make the estimates useless. In this event, an index of output may be constructed, in which the different types of output are weighted by a factor (such as labor hours) that serves to describe their relationship to cost. Cost then may be regressed on this weighted index. The regression coefficient computed expresses the average relation-

19 Depreciation is assumed to be time, not user, depreciation.

20 However, the computed regression can provide useful predictions of total costs if the past relationships of production among the different outputs are maintained.

21 The author used this procedure with considerable success in estimating the marginal costs of banking operations. See "Economies of Scale and Marginal Costs in Banking Operations," National Banking Review, 1965, pp. 507-549.
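The weighted output index suggested for multicollinear outputs can be sketched as follows. Everything numeric here is hypothetical: the weekly quantities, the labor-hour weights (3 and 2 hours per unit), and a cost series constructed to depend exactly on the index so that the fitted slope is easy to check. The helper names are illustrative only.

```python
from math import sqrt


def corr(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares (intercept, slope) of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


# Hypothetical weekly production of two outputs that move together.
refrigerators = [100, 120, 90, 150, 130, 110]
freezers = [52, 61, 44, 76, 64, 56]

# Assumed labor hours per unit, used as index weights.
index = [3.0 * r + 2.0 * f for r, f in zip(refrigerators, freezers)]
# Cost constructed for illustration: $1,000 fixed plus $5 per index hour.
cost = [1000 + 5.0 * h for h in index]

r = corr(refrigerators, freezers)   # close to 1: the outputs are collinear
a, b = fit_line(index, cost)        # b is the cost per unit of the "bundle" index
print(round(r, 3), round(a, 2), round(b, 2))
```

The slope describes the bundle as a whole; as the text goes on to say, it cannot be split into a separate marginal cost for each product.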
ship between the "bundle" of outputs and cost and cannot be decomposed to give the relationship between one output element and cost. However, since the outputs were collinear in the past, it is likely that they will be collinear in the future, so that knowledge about the cost of the "bundle" of outputs may be sufficient.

A valid objection to the allocation of costs to cost centers is that one can never be sure that the allocations are accurate. Nevertheless, some allocations must be made for multicollinearity to be overcome. Therefore, the statistical method cannot be free from the accountant's subjective judgment; in fact, it depends on it.

A limitation of analysis of costs by cost centers also is that cost externalities among cost centers may be ignored. For example, the directly chargeable costs of the milling department may be a function of the level of operations of other departments. The existence and magnitude of operations outside of a particular cost center may be estimated by including an appropriate independent variable in the cost center regression. An over-all index of production, such as total direct labor hours or total sales, is one such variable. Or, if a cost element is allocated between two cost centers, the output of one cost center may be included as an independent variable in the other cost center's regressions. The existence and effect of these possible inter-cost center elements may be determined from the standard error of the coefficient and sign of this variable.

Some types of costs that vary with activity cannot be associated with specific cost centers because it is difficult to make meaningful allocations or because of bookkeeping problems (as discussed above). In this event, individual regression analyses of these costs probably will prove valuable. For example, electricity may be difficult to allocate to cost centers although it varies with machine hours.22 A regression can be computed such as the following:

(3)  $E = b_0 + b_1 M + b_2 S_1 + b_3 S_2 + b_4 S_3$

where

E = electricity cost
M = total machine hours in the plant
S = seasonal dummy variables

where

$S_1$ = 1 for summer, 0 for other seasons
$S_2$ = 1 for spring, 0 for other seasons
$S_3$ = 1 for winter, 0 for other seasons
$b_0$, $b_1$, $b_2$, $b_3$, and $b_4$ are the computed constants and coefficients.

If the regression is fully specified, with all factors that cause the use of electricity included (such as the season of the year), the regression coefficient of M, $b_1$, is the estimate of the average marginal cost of electricity per machine hour. This cost can be added to the other costs (such as materials and labor) to estimate the marginal cost of specific outputs.

For some activities, physical units, such as labor hours, can be used as the dependent variable instead of costs. This procedure is desirable where most of the activity's costs are a function of such physical units and where factor prices are expected to vary. Thus, in a shipping department, it may be best to regress hours worked on pounds shipped, percentage of units shipped by truck, the average number of pounds per sale, and other explanatory variables. Then, with the coefficients estimated, the number of labor hours can be estimated for various situations. These hours then can be costed at the current labor rate.

7. Distribution of the Non-Specified Factors (Disturbances)

(a) Serial correlation of the disturbances. A very important requirement of least

22 Machine hours may not be recorded by cost center although direct labor hours are. If machine hours (M) are believed to be proportional to direct labor hours (L), so that $M_i = k_i L_i$, where k is a constant multiplier that may vary among cost centers, i, $k_i L_i$ is a perfect substitute for $M_i$.
squares that affects the coefficients and the estimates made about their reliability is that the disturbances not be serially correlated. For a time series (in which the observations are taken at successive periods of time), this means that the disturbances that arose in a period t are independent from the disturbances that arose in previous periods, t-1, t-2, etc. The consequences of serial correlation of the disturbances are that (1) the standard errors of the regression coefficients (b's) will be seriously underestimated, (2) the sampling variances of the coefficients will be very large, and (3) predictions of cost made from the regression equation will be more variable than is ordinarily expected from least-squares estimators. Hence, the tests measuring the probability that the true marginal costs and total costs are within a range around the estimates computed from the regression are not valid.

(b) Independence from explanatory variables. The disturbances which reflect the factors affecting cost that cannot be specified must be uncorrelated with the explanatory (independent) variables (x_1, x_2, ..., x_n). If the unspecified factors are correlated with the explanatory variables, the coefficients will be biased and inconsistent estimates of the true values. Such correlation often is the result of bookkeeping procedures. For example, repairs to equipment in a machine shop is a cost-causing activity that often is not specified because of quantification difficulties. However, these repairs may be made when output is low because the machines can be taken out of service at these times. Thus, repair costs will be negatively correlated with output. If these costs are not separated from other costs, the estimated coefficient of output will be biased downward, so that the true extent of variableness of cost with output will be masked.

(c) Variance of the disturbances. A basic assumption underlying use of least squares is that the variance of the disturbance term is constant; it should not be a function of the level of the dependent or independent variables.23 If the variance of the disturbance is nonconstant, the standard errors of the coefficients estimated are not correct, and the reliability of the coefficients cannot be determined.

23 Constant variance is known as homoscedasticity. Non-constant variance is called heteroscedasticity.

When the relationship estimated is between only one independent variable (output) and the dependent variable (cost), the presence of non-constant variance of the disturbances can be detected by plotting the independent against the dependent variable. However, where more than one independent variable is required, such observations cannot be easily made. In this event, the accountant must attempt to estimate the nature of the variance from other information and then transform the data to a form in which constant variance is achieved. At the least, he should decide whether the disturbances are likely to bear a proportional relationship to the other variables (as is commonly the situation with economic data). If they do, it may be desirable to transform the variables to logarithms. The efficacy of the transformations may be tested by plotting the independent variables against the residuals (the estimates of the disturbances).

(d) Normal distribution of the disturbances. For the traditional statistical tests of the regression coefficients and equations to be strictly valid, the disturbances should be normally distributed. Tests of normality can be made by plotting the residuals on normal probability paper, an option available in many library regression programs. While requirement 7 does not have implications for the accounting system, it does determine the form in which the variables are specified. These considerations are discussed in the following section.
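The checks described in requirement 7 can be tried numerically as well as graphically. What follows is only a minimal sketch, assuming a single explanatory variable and synthetic data. The Durbin-Watson statistic and the split-sample spread ratio are stand-ins chosen here for illustration and are not tests prescribed in this paper. All function names and the simulated cost figures are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = a + b*x for one explanatory variable; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(x, y):
    """Residuals (the estimates of the disturbances) from the fitted line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(e):
    """Durbin-Watson statistic for residuals listed in time order.

    Values near 2 suggest no first-order serial correlation; values toward 0
    suggest positive, and toward 4 negative, serial correlation.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


def spread_ratio(x, e):
    """Mean squared residual for the upper half of x divided by that for the lower half.

    A ratio far from 1 hints that the variance changes with the level of x.
    """
    ordered = [r for _, r in sorted(zip(x, e))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]

    def mean_sq(values):
        return sum(r * r for r in values) / len(values)

    return mean_sq(high) / mean_sq(low)


if __name__ == "__main__":
    random.seed(1966)
    output = [10 + t for t in range(200)]  # hypothetical weekly output
    # Assumption: disturbances are proportional to the level of cost.
    cost = [5.0 * q * math.exp(random.gauss(0, 0.1)) for q in output]

    raw = residuals(output, cost)
    logged = residuals([math.log(q) for q in output],
                       [math.log(c) for c in cost])

    print("Durbin-Watson (raw): ", round(durbin_watson(raw), 2))
    print("spread ratio (raw):  ", round(spread_ratio(output, raw), 2))
    print("spread ratio (logs): ", round(spread_ratio(output, logged), 2))
```

With proportional disturbances of this kind, the spread ratio for the raw data would be expected to lie well above 1 and the ratio for the logarithms close to 1, which is the effect the logarithmic transformation is meant to achieve.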
with no cycles or trends.24 Second, the residuals can be plotted against the predicted value of the dependent variable. There should be as many positive as negative residuals scattered evenly about a zero line, with the variance of the residuals about the same at any value of the predicted dependent variable. Finally, the residuals should be plotted on normal probability paper to test for normality.

24 A more formal test for serial correlation is provided

If the graphs show that the residuals do not meet the requirements of least squares, the data must be transformed. If serial correlation of the residuals is a problem, transformation of the variables may help. A commonly used method is to compute first differences, in which the observations from periods t, t-1, t-2, t-3, etc., are replaced with t-(t-1), (t-1)-(t-2), (t-2)-(t-3), and so forth. With first difference data, one is regressing the change in cost on the change in output, etc., a procedure which in many instances may be descriptively superior to other methods of stating the data. However, the residuals from first difference data also must be subjected to serial correlation tests, since taking first differences often results in negative serial correlations.25

Where non-constant variance of the residuals is a problem, the residuals may increase proportionally to the predicted dependent variables. In this event transformation of the dependent variable to logarithms will be effective in achieving constant variance. If the residuals increase more than proportionately, the square root of the dependent variable may be a better transformation.

and another product, digits, are produced. The widgets are assembled in batches while the larger digits are assembled singly. Weekly observations on cost and output are taken and punched on cards. A graph is prepared, from which it appears that a linear relationship is present. Further, the cost of producing widgets is not believed to be a function of the production of digits or other explanatory variables. Therefore, the following regression is computed:

\[
C = \underset{(40.8)}{110.3} + \underset{(.53)}{8.21}\,N - \underset{(1.69)}{7.83}\,B + \underset{(2.10)}{12.32}\,D + \underset{(100)}{235}\,S + \underset{(204)}{523}\,W - \underset{(154)}{136}\,A \tag{7}
\]

where
C = expected cost
N = number of widgets
B = average number of widgets in a batch
D = number of digits
S = summer dummy variable, where S = 1 for summer, 0 for other seasons
W = winter dummy variable, where W = 1 for winter, 0 for other seasons
A = autumn dummy variable, where A = 1 for autumn, 0 for other seasons

R² = .892 (the coefficient of multiple determination)
Standard error of estimate = 420.83, which is 5% of the dependent variable, cost.
Number of observations = 156.
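Equation (7) can be written as a small function, which makes it easy to evaluate for any week. This is a minimal sketch rather than part of the original analysis: the function and dictionary names are hypothetical, and treating spring as the base season is an inference from the three dummy variables defined for the equation. The sample week (532 widgets, batches of 20, 321 digits, summer) is used only for illustration.

```python
# Coefficients and standard errors as reported in equation (7).
COEF = {"const": 110.3, "N": 8.21, "B": -7.83, "D": 12.32,
        "S": 235.0, "W": 523.0, "A": -136.0}
STD_ERR = {"const": 40.8, "N": 0.53, "B": 1.69, "D": 2.10,
           "S": 100.0, "W": 204.0, "A": 154.0}

# Assumption: spring is the omitted (base) season, since dummies exist
# only for summer, winter and autumn.
SEASON_DUMMY = {"spring": None, "summer": "S", "winter": "W", "autumn": "A"}


def expected_cost(widgets, batch_size, digits, season):
    """Evaluate equation (7) for one week's production."""
    if season not in SEASON_DUMMY:
        raise ValueError("season must be spring, summer, autumn or winter")
    cost = (COEF["const"]
            + COEF["N"] * widgets
            + COEF["B"] * batch_size
            + COEF["D"] * digits)
    dummy = SEASON_DUMMY[season]
    if dummy is not None:
        cost += COEF[dummy]  # the dummy equals 1 only in its own season
    return cost


def coefficient_range(name, k=1):
    """Return the coefficient minus and plus k standard errors."""
    b, se = COEF[name], STD_ERR[name]
    return b - k * se, b + k * se


if __name__ == "__main__":
    print(round(expected_cost(532, 20, 321, "summer"), 2))
    print(tuple(round(v, 2) for v in coefficient_range("N", k=1)))
    print(tuple(round(v, 2) for v in coefficient_range("N", k=2)))
```

Keeping the coefficients and their standard errors in parallel tables means the same helper gives the one- and two-standard-error ranges for any coefficient, not only for the number of widgets.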
The numbers in parentheses beneath the coefficients are the standard errors of the coefficients. These results may be used for such purposes as price and output decisions, analysis of efficiency, and capital budgeting.

For price and output decisions, we would want to estimate the average marginal cost expected if an additional widget is produced. From the regression we see that the estimated average marginal cost, $\partial C/\partial N$, is 8.21, with the other factors affecting costs accounted for. The standard error of the coefficient, .53, allows us to assess a probability of .67 that the "true" marginal cost is between 7.68 and 8.74 (8.21 ± .53) and .95 that it is between 7.15 and 9.27 (8.21 ± 1.06).26

26 The statements about probability are based on a Bayesian approach, with normality and diffuse prior distributions assumed.

The regression also can be used for flexible budgeting and analysis of performance. For example, assume that the following production is reported for a given week:

N = 532
B = 20
D = 321
S = summer = 1

Then we expect that, if this week is like an average of the experience for past weeks, total costs would be:

110.3 + 8.21(532) - 7.83(20) + 12.32(321) + 235(1) = 8511.14.

The actual costs incurred can be compared to this expected amount. Of course, we do not expect the actual amount to equal the predicted amount, if only because we could not specify all of the cost-causing variables in the regression equation. However, we can calculate the probability that the actual cost is within some range around the expected cost. This range can be computed from the standard error of estimate and a rather complicated set of relationships that reflect uncertainty about the height and tilt of the regression plane. These calculations also reflect the difference between the production reported for a given week and the means of the production data from which the regression was computed. The greater the difference between given output and the mean output, the less confidence we have in the prediction of the regression equation. For this example, the adjusted standard error of estimate for the values of the independent variables given is 592.61. Thus, we assess a probability of .67 that the actual costs incurred will be between 7918.53 and 9103.75 (8511.14 ± 592.61) and probability .95 that they will be between 7325.92 and 9696.36 (8511.14 ± 2 × 592.61). With these figures, management can decide how unusual the actual production costs are in the light of past experience.

The regression results may be useful for capital budgeting, if the company is considering replacing the present widget assembly procedure with a new machine. While the cash flows expected from using the new machine must be estimated from engineering analyses, they are compared with the cash flows that would otherwise take place if the present machines were kept. These future expected flows may be estimated by "plugging" the expected output into the regression equation and calculating the expected costs. While these estimates may be statistically unreliable for data beyond the range of those used to calculate the regression, the estimates may still be the best that can be obtained.

CONCLUSION

The assertion has been made throughout this paper that regression analysis is not only a valuable tool but a method made available, inexpensive and easy to use by computers. The reader may be inclined to accept all but the last point, having read
through the list of technical and bookkeeping problems. Actually it is the ease of computation that the library computer programs afford which makes it necessary to stress precautions and care: it is all too easy to "crank out" numbers that seem useful but actually render the whole program, if not deceptive, worthless.

But when one considers that costs often are caused by many different factors whose effects are not obvious, one recognizes the great possibilities of regression analysis, limited as it may be. Nevertheless, it is necessary to remember that it is a tool, not a cure-all. The method must not be used in cost situations where there is not an ongoing stationary relationship between cost and the variables upon which cost depends. Where the desired conditions prevail, multiple regression can provide valuable information for solving necessary decision problems, information that can put "life" into the economic models that accountants are now embracing.