
The Statistician (1997)
46, No. 4, pp. 551-559
Modelling performance at international tennis and golf tournaments: is there a home advantage?
By ROGER L. HOLDER†
University of Birmingham, UK
and ALAN M. NEVILL
Liverpool John Moores University, UK
[Received January 1997. Revised August 1997]
SUMMARY
In the context of international sports events, home advantage suggests that competitors will perform above their
expected level when competing in events held in their own country. The world rankings and tournament performances
have been compiled for most competitors in the grand slam tennis tournaments and the major golf tournaments in
1993. Tournament performance was related to world ranking and home advantage by using both permutation and
normal regression models. Logistic and polytomous regression models were used to examine progression through the
tennis tournaments only. Although there was surprising consistency between the various approaches, there was little
evidence of home advantage.
Keywords: Logistic and polytomous regression; Log-linear modelling; Permutation regression methods; Tournament
results; World ranks
1. Introduction
The results of an international tennis tournament take the form of a particular type of ranking.
There will be a winner, a losing finalist, two losing semifinalists, four losing quarter-finalists, etc.
Consequently the outcome variable will be a ranking but with ties whose pattern is entirely
predictable before the competition. The results of international golf tournaments can also be
expressed as ranks with a winner, who takes the least number of shots (usually over four rounds)
in the tournament, a runner-up who scores the second lowest total, etc. However, unlike tennis
tournaments, the number of ties for competitors with shared totals lower down the `leader board'
will not have the same predictability.
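The predictable tie pattern of a knockout draw can be made concrete with a short sketch. It assumes a single-elimination draw whose size is a power of two (128 is used here because 127 matches per tournament implies 128 entrants); the function names are illustrative, and the use of mid-ranks for tied competitors is an assumption rather than a convention stated in the text.

```python
def tennis_result_pattern(draw_size=128):
    """Sizes of the tied groups: winner, losing finalist, semifinalists, ..."""
    if draw_size < 2 or draw_size & (draw_size - 1):
        raise ValueError("draw_size must be a power of two")
    counts = [1]   # the winner
    losers = 1     # one losing finalist
    while losers <= draw_size // 2:
        counts.append(losers)
        losers *= 2  # each earlier round eliminates twice as many players
    return counts


def tied_ranks(counts):
    """Mid-rank shared by each tied group (assumed tie convention)."""
    ranks, start = [], 1
    for c in counts:
        ranks.append(start + (c - 1) / 2)
        start += c
    return ranks


pattern = tennis_result_pattern(128)
print(pattern)              # group sizes from the winner down to round 1 losers
print(tied_ranks(pattern))  # one shared rank per group
```

The group sizes always sum to the draw size, which is why the pattern of ties is known before a ball is struck.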
Competitors in international tennis and golf tournaments will usually have a world ranking
which has been compiled to reflect their performances in previous tournaments. It is of interest
to examine the association between tournament outcome and world ranking to investigate how
far `form' influences performance. To some extent it has a direct bearing, in that seeding in all
tennis tournaments and selective invitations to certain golf tournaments both make use of
previous performance. However, if some association can be established between these two
variables, it is of further interest to examine whether the nature of this association differs for
players representing the host country (subsequently referred to as home players) compared with
foreign (visiting) players in a particular international tournament. In other words, do home
players achieve above (or below) their expected performance level based on their world rank,
i.e. is there evidence of a home advantage?
†Address for correspondence: School of Mathematics and Statistics, University of Birmingham, Edgbaston,
Birmingham, B15 2TT, UK.
E-mail: R.L.Holder@bham.ac.uk
© 1997 Royal Statistical Society
0039-0526/97/46551
HOLDER AND NEVILL
Home advantage, defined by Courneya and Carron (1992) as

`the consistent finding that home teams in sports competitions win over 50% of the games played
under a balanced home and away schedule',
has been well documented in the literature (e.g. Schwartz and Barsky (1977), Varca (1980),
Snyder and Purdy (1985), Pollard (1986), Courneya and Carron (1992) and Nevill et al. (1996)).
Several theories have been proposed to explain home advantage. These include
(a) crowd support influencing both players' behaviour and officials' decisions,
(b) fatigue to the travelling away team and
(c) familiarity with the home venue.
There have been few studies into the presence of home advantage in international grand
slam tennis or major golf tournaments, although we acknowledge that this type of international
home advantage is not precisely the same as that defined by Courneya and Carron (1992). The
first two factors described above, when used to explain home advantage, are unlikely to
influence the results of international golf and tennis tournaments. Most competitors travel
sufficiently early to overcome `jet lag' and fatigue and very few subjective decisions are made
by officials in tennis and golf tournaments, unlike in sports such as football, basketball etc.,
that are likely to influence the result. However, familiarity with the venue may be important
especially in a game such as golf. Here the lay-out of the course, undulations on the greens,
etc. might be crucial to the local golfer although these factors may not be quite so relevant to
the hardened top world-ranked players.
To identify home advantage in such tournaments, Nevill et al. (1997) proposed a regression
analysis of competitors' tournament results in relationship to their world ranks, equivalent to
an analysis of covariance, using the results from the international grand slam tennis and
`major' golf tournaments held during 1993. The form of home advantage assumed was an
increase above expected performance for all home players. This contrasts with the majority of
studies where home advantage is assessed on the basis of a balanced schedule of home and
away matches.
Nevill et al. (1997) acknowledged that assumptions necessary for the proposed regression
analyses will not always be met, i.e. the error term will not be exactly normally distributed
with constant error variance. Hence a purpose of the present study is to explore alternative
distribution-free methods that make less restrictive assumptions about the relationship between
tournament results and world ranks. Stefani and Clarke (1992) also used a regression
formulation to incorporate home advantage when predicting the winning margin in a variety of
different team sports. In both American college and Australian rules football, an assumption of
normal errors was found to give accurate predictions. Clarke and Norman (1995) extended this
work to investigate home ground advantage in English soccer.
2. Methods
To investigate the hypotheses that `home advantage' exists in international tennis and golf
competitions, the present study will re-examine the data collected by Nevill et al. (1997) from
the eight international tournaments held in 1993, i.e. the four grand slam tennis tournaments
(the Australian, French and US Open and Wimbledon), together with the four major golf
tournaments (the British and US Open, the US Masters and the US Professional Golfers'
Association (PGA)). In each case, tournament results and world rankings were established for
most competitors, together with the country that the competitors represented. This provided a
database of 127 × 4 = 508 matches from the four grand slam tennis tournaments. Correspondingly, the four major golf tournaments provided 55 (British Open) + 64 (US PGA) + 58
(US Masters) + 67 (US Open) = 244 competitors' tournament results together with their corresponding world rankings.
MODELLING PERFORMANCE AT TENNIS AND GOLF TOURNAMENTS
2.1. Log-log- and normal score regression

In Nevill et al. (1997) a log-log-relationship between tournament ranking and world
ranking was proposed and standard regression methodology using the following equation was
used to examine the variation in slope and intercept parameters between home and visiting
competitors:
$$\ln(\text{result rank}) = a + b\,\ln(\text{world rank}) + c\,\text{home} + d\,\text{home}\,\ln(\text{world rank}). \qquad (1)$$
The non-linear relationship between result and world rank implied by the log-log-regression
model might be expected as a consequence of an increase in the proportion of absent players with
numerically higher (inferior) world ranking. With result ranks fixed this extended tail in the world
rank distribution will tend to add curvature to the ideal relationship where the top world rank
player won the competition, the losing finalist is the second in world ranking, etc.
An alternative regression model might be derived from the assumption of `latent variables'
underlying both the tournament result and the world ranking scales. Replacing results and
rankings by their equivalent normal scores and adopting a regression model to relate these two
normal scores would examine the possibility of an underlying normal scale. (For comparison
only, a standard regression analysis was carried out on the untransformed result ranks using the
competitors' world ranks (untransformed) as the predictor variable.)
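The two transformations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the normal scores use a Blom-type approximation, which is one common choice and is not specified in the text; world ranks are first re-ranked within the tournament field; and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def fit_line(x, y):
    """Least-squares intercept, slope and R^2 for a regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy * sxy / (sxx * syy)


def midranks(values):
    """Ranks 1..n within the sample, ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks


def normal_scores(values):
    """Blom-type normal scores (an assumed choice of scoring formula)."""
    n = len(values)
    return [NormalDist().inv_cdf((r - 0.375) / (n + 0.25))
            for r in midranks(values)]


# Hypothetical toy field of eight players (not data from the paper).
world = [1, 2, 3, 5, 8, 13, 21, 34]
result = [2, 1, 3.5, 3.5, 6.5, 6.5, 6.5, 6.5]

loglog = fit_line([math.log(w) for w in world], [math.log(r) for r in result])
scores = fit_line(normal_scores(world), normal_scores(result))
print("log-log:       intercept %.2f slope %.2f R2 %.3f" % loglog)
print("normal scores: intercept %.2f slope %.2f R2 %.3f" % scores)
```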
2.2. Permutation regression
However, the feature of the above regression models which is less justifiable is the additive
normal error term. As mentioned previously, the result ranks will always have the same pattern
and the random element will stem from the uncertain way that the world ranks are associated
with the result ranks. Consequently, a more realistic probability model might be that, under the
null hypothesis of no association between tournament rank and world rank, all permutations of
world ranks across the result ranks are equally likely. Thus a permutation null distribution of
the log-log-regression test statistic could be formed by repeated regressions of log-result-ranks
on log-world-ranks using all permutations. A random sample of such permutations would give
an estimate of such a permutation distribution within feasible computational limits.
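A random-sample version of this permutation test might look like the sketch below. It assumes a single predictor (log-world-rank), uses the identity F = r²(n − 2)/(1 − r²) for the usual F-ratio, and counts permutations whose F-ratio exceeds the observed one; the function names, the default of 2000 permutations and the seed are illustrative choices.

```python
import math
import random


def f_ratio(x, y):
    """Regression mean square over residual mean square, one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:          # perfect fit: residual mean square is zero
        return math.inf
    return r2 * (n - 2) / (1.0 - r2)


def permutation_p_value(world_rank, result_rank, n_perm=2000, seed=0):
    """Proportion of random permutations of the world ranks whose
    log-log F-ratio exceeds the F-ratio of the observed pairing."""
    rng = random.Random(seed)
    x = [math.log(w) for w in world_rank]
    y = [math.log(r) for r in result_rank]
    observed = f_ratio(x, y)
    shuffled = x[:]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # world ranks reassigned at random
        if f_ratio(shuffled, y) > observed:
            exceed += 1
    return exceed / n_perm
```

The result ranks are left in place and only the world ranks are shuffled, which mirrors the null hypothesis that every assignment of world ranks to the fixed result pattern is equally likely.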
To test for an overall home advantage, the usual regression approach would be to establish
a (0, 1) variable to indicate home or away status of a competitor and to include this indicator
variable as a further variable in the regression, testing the significance of this variable in the
presence of the other independent variable (log-world-rank for the log-log-regression). An
equivalent permutation procedure would be to permute the values of this (0, 1) variable across
all competitors but to keep the world ranks associated with the actual result ranks achieved.
To test for an interaction of home advantage and world rank (the degree of home advantage
changing with world rank), standard regression methodology would be to create a further
variable from the product of the (0, 1) home or away status variable and, for the
log-log-regression, the log-world-rank variable. A permutation approach would therefore be to
permute the values of only this new variable across the competitors' results, keeping the
log-world-ranks and the home or away indicator variables correctly assigned to each competitor's
result.
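Both partial tests permute a single column while the rest of the design stays attached to each competitor's result, so one routine can serve for either. The sketch below is an illustration only: it assumes a partial F-ratio as the test statistic, solves the normal equations by Gauss-Jordan elimination (adequate for the three or four parameters involved here), and uses hypothetical function names.

```python
import random


def rss(columns, y):
    """Residual sum of squares of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2
               for i in range(n))


def partial_f(base_cols, extra_col, y):
    """F-ratio for adding extra_col to a model already holding base_cols."""
    rss0 = rss(base_cols, y)
    rss1 = rss(base_cols + [extra_col], y)
    df_resid = len(y) - (len(base_cols) + 2)  # intercept + base + extra
    return (rss0 - rss1) / (rss1 / df_resid)


def permute_term_p_value(base_cols, extra_col, y, n_perm=2000, seed=0):
    """Permute only extra_col; base_cols stay matched to each result."""
    rng = random.Random(seed)
    observed = partial_f(base_cols, extra_col, y)
    shuffled = list(extra_col)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if partial_f(base_cols, shuffled, y) > observed:
            exceed += 1
    return exceed / n_perm
```

With `ln_world`, `home` and `ln_result` as equal-length lists, the home-advantage test would be `permute_term_p_value([ln_world], home, ln_result)`, and the interaction test would pass `[ln_world, home]` as the base columns and the elementwise product of `home` and `ln_world` as the extra column.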
2.3. Logistic and polytomous regression for tennis tournaments
For tennis tournaments, the structure of the competition is to have a series of `rounds' with
winning players going forward from one round to the next. Consequently, an alternative way of
looking at the outcome of a tournament would be to model the progression of competitors
from one round to the next on the basis of their world ranking (log-transformed) and whether
they are home or away competitors. The following logistic regression model would allow the
probability of success in the ith round to be related to these independent variables by using a
similar structure to that adopted in previous regressions:
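$$P(\text{winning in } i\text{th round}) = \frac{\exp\{a_i + b_i\,\ln(\text{world rank}) + c_i\,\text{home} + d_i\,\text{home}\,\ln(\text{world rank})\}}{1 + \exp\{a_i + b_i\,\ln(\text{world rank}) + c_i\,\text{home} + d_i\,\text{home}\,\ln(\text{world rank})\}}. \qquad (2)$$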
With such a model it would be possible to examine at what stage of a competition there was any
home advantage or indeed any association of success with world ranking. The probability
modelled in equation (2) is a conditional probability: that of winning in the ith round given that
the player reaches the ith round.
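Because equation (2) is conditional on reaching the round, each round's regression uses only the players still in the draw. The sketch below shows how such round-by-round data sets might be assembled and how the fitted probability is evaluated. The tuple layout `(world_rank, home, rounds_won)` and the function names are hypothetical, and no fitting routine is included.

```python
import math


def round_dataset(players, round_no):
    """Rows (ln world rank, home, won) for the round_no logistic regression.

    players: iterable of (world_rank, home, rounds_won) tuples, where
    rounds_won counts the rounds a player won (hypothetical layout).
    Only players who reached round_no are included.
    """
    rows = []
    for world_rank, home, rounds_won in players:
        if rounds_won >= round_no - 1:           # reached this round
            won = 1 if rounds_won >= round_no else 0
            rows.append((math.log(world_rank), home, won))
    return rows


def p_win_round(world_rank, home, a, b, c, d):
    """Equation (2) evaluated with one round's parameters a, b, c, d."""
    x = math.log(world_rank)
    eta = a + b * x + c * home + d * home * x
    return 1.0 / (1.0 + math.exp(-eta))          # equals exp(eta)/(1+exp(eta))


# Hypothetical four-player illustration.
players = [(1, 0, 3), (10, 1, 1), (50, 0, 0), (20, 0, 2)]
for i in (1, 2, 3):
    print(i, round_dataset(players, i))
```

The shrinking row count from one round to the next is the reason later-round estimates rest on fewer players and a narrower spread of world ranks.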
Polytomous regression might be used to model the unconditional probability that the player
reaches the (i + 1)th round or further and a typical model formulation might be that given by
$$P(\text{progressing beyond } i\text{th round}) = \frac{\exp\{a_i + b\,\ln(\text{world rank}) + c\,\text{home} + d\,\text{home}\,\ln(\text{world rank})\}}{1 + \exp\{a_i + b\,\ln(\text{world rank}) + c\,\text{home} + d\,\text{home}\,\ln(\text{world rank})\}}. \qquad (3)$$
The major difference between models (2) and (3) is that equation (2) models the chance of
reaching the (i + 1)th round or further given that the player has reached the ith round whereas
equation (3) models the same success assessed as though at the start of the tournament. Also, in
contrast with model (2), the effect of world rank and home advantage is assumed not to vary from
round to round in model (3).
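The link between the two formulations can be illustrated numerically: the conditional probability of winning round i is the ratio of successive unconditional probabilities from model (3). The sketch below assumes intercepts that decrease with the round number, so that the unconditional probabilities are non-increasing; the parameter values and function names are hypothetical.

```python
import math


def p_beyond(world_rank, home, a_i, b, c, d):
    """Equation (3): unconditional probability of progressing beyond round i."""
    x = math.log(world_rank)
    eta = a_i + b * x + c * home + d * home * x
    return 1.0 / (1.0 + math.exp(-eta))


def exit_round_probs(world_rank, home, intercepts, b, c, d):
    """Probability of going out in each modelled round, followed by the
    probability of progressing beyond the last one."""
    probs, prev = [], 1.0
    for a_i in intercepts:              # assumed decreasing with the round
        p = p_beyond(world_rank, home, a_i, b, c, d)
        probs.append(prev - p)
        prev = p
    probs.append(prev)
    return probs


def conditional_from_cumulative(beyond):
    """P(win round i | reached round i) = P(beyond i) / P(beyond i - 1)."""
    out, prev = [], 1.0
    for p in beyond:
        out.append(p / prev)
        prev = p
    return out


# Hypothetical parameters: shared slope b, one intercept per round.
print(exit_round_probs(10, 0, [2.0, 0.0, -2.0], -0.5, 0.0, 0.0))
```

With a shared slope, the implied conditional probabilities vary between rounds only through the intercepts, which is the restriction that model (2) relaxes.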
3. Results
3.1. Log-log- and normal score regression models

Table 1 sets out the performance of the various regression models relating tournament result
to world ranking judged on a conventional basis (R²) for all eight tennis and golf international
tournaments. The possibilities examined were a regression of
(a) tournament result on world rank (untransformed),
(b) log-tournament-result on log-world-rank (loglog) and
(c) normal scores of tournament result on normal scores of world rank (normal scores).
Two features are evident from Table 1: the log-log-regression is generally the more successful and there is a stronger association between tournament result and world rank in the
tennis tournaments than in the golf tournaments. For log-log-regression, the intercept term will
estimate the log-result-rank for the top world ranking player and consequently there is some
suggestion that, particularly in the golf tournaments, the world ranking number 1 player may
be expected to finish as low as 13th place.
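As a worked illustration of this reading of the intercept, using only values reported in Table 1 and treating exponentiation of the fitted log-log line as a back-transformed fitted value rather than an exact expectation, the fitted result rank for a player of world rank $w$ is

$$\widehat{\text{result rank}} = e^{a}\,w^{b}, \qquad \text{so that at } w = 1: \quad \widehat{\text{result rank}} = e^{a}.$$

For the largest intercept in Table 1, $a = 2.61$, this gives $e^{2.61} \approx 13.6$, the source of the 13th place figure; for the smallest, $a = 1.41$, it gives $e^{1.41} \approx 4.1$.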
TABLE 1
Transformations in linear regressions of result rank on world rank

Tournament         Untransformed   Normal scores   log-log-values
                   R²              R²              R²      Intercept   Slope
Australian Open    0.185           0.187           0.355   1.41        0.64
French Open        0.095           0.165           0.264   1.84        0.53
US Open (tennis)   0.139           0.180           0.219   1.92        0.51
Wimbledon          0.024           0.148           0.387   1.61        0.57
British Open       0.027           0.117           0.275   1.67        0.42
US Masters         0.038           0.036           0.030   2.61        0.15
US Open (golf)     0.071           0.069           0.058   2.45        0.22
US PGA             0.000           0.028           0.103   2.29        0.25
3.2. Log-log-permutation regression models

Significance levels achieved by the log-log-regression determined with the usual F-ratio of
regression and residual mean square are given in the second and third columns of Table 2.
For the second column, the usual F-distribution has been used, i.e. the error terms in the
regression model have been assumed to be normally distributed. For the third column, 2000
permutations of world ranks were produced and the proportion giving an F-ratio greater than
that found with the original data is recorded. The degree of agreement between the two
columns is surprisingly good particularly bearing in mind the non-normal nature of the
dependent variable. Note that the differences between the two columns are of the same order
as the standard errors of these estimated proportions.
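The size of those standard errors follows from treating each permutation p-value as a binomial proportion based on 2000 independent random permutations. The sketch below is an illustrative check under that assumption; the function name is hypothetical and the two p-values are one F-distribution and permutation pair taken from Table 2.

```python
import math


def permutation_se(p_hat, n_perm=2000):
    """Binomial standard error of a p-value estimated from n_perm permutations."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_perm)


f_dist_p, perm_p = 0.0459, 0.0525   # a pair of entries from Table 2
se = permutation_se(perm_p)
print("standard error: %.4f" % se)
print("difference in standard errors: %.1f" % (abs(perm_p - f_dist_p) / se))
```

For p-values near 0.5 the standard error is about 0.011, so disagreements in the second or third decimal place are within sampling noise of the permutation estimate.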
Agreement between the null distributions of the F-statistic under normal and permutation
assumptions may be judged more completely from the Q-Q plot in Fig. 1, derived from 2000
permutations of the world ranks taken from the 1993 Australian Open tennis tournament. The
agreement over a wide range of the distribution is surprisingly good and thus it would appear
that, as far as the outcome of this F-ratio test is concerned, an assumption of normality of
errors would not have given misleading conclusions.
Using either conventional or permutation testing methods it would therefore appear that log-world-rank is of value in predicting tournament result in each of the tennis tournaments
considered but in only two of the golf tournaments.
TABLE 2
Permutation and normal errors p-values (linear regression: log-result-rank versus log-world-rank)

                   World rank                     Home or away                   (Home or away) × (world rank)
                   test statistic                 test statistic                 test statistic
Tournament         F-distribution   Permutation   F-distribution   Permutation   F-distribution   Permutation
Australian Open    <0.00001         <0.0005       0.6974           0.6945        0.3535           0.3605
French Open        <0.00001         <0.0005       0.2731           0.2810        0.0976           0.1000
US Open (tennis)   <0.00001         <0.0005       0.7664           0.7570        0.1836           0.1930
Wimbledon          <0.00001         <0.0005       0.00002          <0.0005       0.0936           0.0900
British Open       0.00002          0.0005        0.9381           0.9375        0.9897           0.9925
US Masters         0.2536           0.2475        0.9957           0.9950        0.7623           0.7530
US Open (golf)     0.0459           0.0525        0.1540           0.1465        0.0345           0.0375
US PGA             0.0072           0.0075        0.5110           0.5120        0.0610           0.0530

Fig. 1. Q-Q plot of the permutation and F-distribution for regression statistics: Australian Open tournament, 1993

Including a home or away term in the regression model gives the significance presented in
the fourth and fifth columns of Table 2, again showing a high level of agreement between
permutation and F-distribution results.
Only for the Wimbledon Open tournament does there appear to be a detectable home or
away advantage and disappointingly this is probably an anomaly of the data collection. World
rankings were available for all home players but only the top 100 world-ranked away players.
Fig. 2 shows the resultant pattern of known tournament and world rank results.
Finally the last two columns of Table 2 give permutation and conventional significance
levels for a home or away advantage that may depend on the level of world rank (i.e. the
(home or away) × (world rank) interaction).
For the US Open golf tournament, both methods suggest significance and the exact nature of the
difference is evident in Fig. 3 where it can be seen that world rank appears to associate with
tournament result for the home but not the away competitors.
3.3. Logistic regression
Fig. 2. Untransformed world and result ranks for home or away competitors: Wimbledon tournament, 1993
Fig. 3. log-world- and log-result-ranks for home or away competitors with separate regression lines: US
Open golf tournament, 1993

In Table 3, the slope parameters b_i that were significant (p < 0.05) are given for each of the
four international tennis tournaments. None of the home advantage parameters c_i reached
significance but, for the Wimbledon tournament, the estimate of parameter d_1 was significantly
different from 0. However, given the reservations expressed earlier about the availability of
home and away players' data from Wimbledon, no further interpretation is pursued here. The
trend in the slope parameters in later rounds, which is evident in the French, US Open and
Wimbledon tournaments, must to some extent reflect the elimination of weaker players in the
earlier rounds, thus reducing the range of world ranks that are present in later rounds. This
convergence onto the best world-ranked players in later rounds is evident in Fig. 4 which
shows the fitted logistic models for the first few rounds of the Wimbledon tournament.

TABLE 3
Logistic regression (tennis only): significant world rank parameter estimates
Tournament
Australian Open
French Open
US Open
Wimbledon
Round 1, b_1
Round 2, b_2
Round 3, b_3
Round 4, b_4
2.18
1.76
1.42
1.61
1.39
2.38
1.50
2.19
3.37
1.99
9.79
3.18
The outcome of a polytomous regression is mainly confirmatory of previous findings rather
than providing any further insight. The assumption of a common regression coefficient on
world ranks for all rounds is clearly contrary to the findings of the logistic regression.
Table 4 gives a summary of pertinent statistics from applying polytomous regression to the
four international tennis tournaments' results. The second column indicates the strength of association (χ²-statistic testing the hypothesis b = 0) between log-world-ranking and progression to
later rounds. The corresponding regression coefficients for log-world-rank are given in the third
column. Interestingly, their relative magnitudes agree closely with the regression coefficients of
the log-log-linear-regressions reported in Table 1 and reproduced again in the fourth column.
Finally, the last column gives an indication of the strength of a home advantage (χ²₁-statistic
testing the hypothesis c = 0). Again, the Wimbledon tournament is the only tournament to show
such an effect.

Fig. 4. Fitted logistic regressions for rounds 1, 2, 3 and 4: Wimbledon tournament, 1993

TABLE 4
Polytomous regression (tennis only): significance tests on world rank and home or away advantage

Tournament        log-world-ranking   Slope b,     Slope, linear   Home or away
                  (χ²₁)               polytomous   regression      advantage (χ²₁)
Australian Open   26.44               3.16         0.64            0.12
French Open       17.78               2.09         0.53            1.45
US Open           15.26               2.08         0.51            0.03
Wimbledon         26.85               2.72         0.57            15.81
4. Discussion and conclusions
The methods described above provide limited evidence of a home advantage in the
international tennis and golf tournaments studied in the present paper. Examining final
tournament results or, in the case of tennis, intermediate round performances reveals only two
tournaments in which there is apparent evidence. In one case, the Wimbledon tournament, this
may be an illusory effect due to differences in availability of world ranking information for
home and visiting players. In the other, the US Open golf tournament, the effect is not of a
clear home advantage but rather of a different association between world rank and tournament
result. For home competitors, a strong association was found between their results and world
rankings, whereas no significant association was observed for the away competitors. Before
placing too strong an interpretation on this result, the process of how both home and visiting
competitors are invited to enter the US Open tournament needs to be understood. Only the
world's top-ranked visiting competitors are invited to such tournaments, whereas some of the
home competitors will have much lower (numerically higher) world ranks (e.g. entry through
wild card invitation or qualifying tournaments). Consequently, the range of world ranks for
visiting competitors will be considerably less than that of home competitors, allowing less
scope for an association between results and world rankings for visiting competitors, and vice
versa.
As far as the linear regression modelling is concerned, a log-log-relationship appears the
most successful among relatively straightforward relationships. Inference in this context is
particularly well suited to a permutation approach but, surprisingly, null distributions on the
permutation approach are very similar to those expected following the more conventional
normal error structure assumptions. However, the explained variation (R²) in each of the
regression analyses was relatively low, indicating that world ranking is not a strong predictor
of tournament result, although tennis tournaments were a little more predictable than golf
tournaments.
The opportunity to model individual round performance is given with a logistic regression
approach. Interestingly, this reveals that world rank has a different effect in different rounds
which means that a polytomous regression model, with a constant world rank regression
parameter over all rounds, is not an appropriate model.
After analysing the data presented here but using log-log-regression, Nevill et al. (1997)
found
`. . . little evidence of home advantage in either the grand-slam tennis or the golf major tournaments
held in 1993. The only possible evidence of home advantage was found in the Wimbledon tennis and
the U.S. Open golf championships.'
These findings were limited in that they made certain distributional assumptions that are not
required by the permutation methods described above. However, there was surprisingly good
agreement in the conclusions reached by both studies (see Table 2).
References
Clarke, S. R. and Norman, J. M. (1995) Home ground advantage of individual clubs in English soccer. Statistician, 44,
509-521.
Courneya, K. S. and Carron, A. V. (1992) The home advantage in sport competitions: a literature review. J. Sport
Exercise Psychol., 14, 13-27.
Nevill, A. M., Holder, R. L., Bardsley, A., Calvert, H. and Jones, S. (1997) Identifying home advantage in international
tennis and golf tournaments. J. Sports Sci., 15, 437-443.
Nevill, A. M., Newell, S. M. and Gale, S. (1996) Factors associated with home advantage in English and Scottish
soccer. J. Sports Sci., 14, 181-186.
Pollard, R. (1986) Home advantage in soccer: a retrospective analysis. J. Sports Sci., 4, 237-248.
Schwartz, B. and Barsky, S. F. (1977) The home advantage. Socl Forces, 55, 641-661.
Snyder, E. E. and Purdy, D. A. (1985) The home advantage in collegiate basketball. Sociol. Sport J., 2, 352-356.
Stefani, R. and Clarke, S. R. (1992) Predictions and home advantage for Australian rules football. J. Appl. Statist., 19,
251-261.
Varca, P. (1980) An analysis of home and away game performance of male college basketball teams. J. Sport Psychol.,
2, 245-257.
