Project Grey Analysis Task


PROJECT GREY – STAGE I
TASK DESCRIPTION


April 23, 2019

1. INTRODUCTION

This task is part of a research project aimed at identification of patterns, trends and
anomalies in sports exchange markets in the UK, using the example of greyhound races.

This task is Stage I, a test stage, intended to establish your qualification and ability to
perform the required task in a timely manner. You will be required to review and analyze
data from only six of twenty UK racetracks and only from a limited period (Jan – Jul, 2018).

By way of introduction, greyhound races in the UK almost always include 6 runners,
unless there is a last-second non-runner. In effect, there are four different distance
categories that greyhounds race over. In the UK a letter is always shown before the
grade of the race to indicate the distance category being run. These letters are as follows.

A – Middle distance races with four bends

D – Sprint races with two bends

E – Marathon races with eight bends or more

S – Stayers races with six bends

2. TASK

Your goal is to ‘reverse engineer’ the data to determine which event occurred with a %
frequency indicated in the Task Table, as explained in more detail in paragraph 3 below.

Please do the following:

1) Download the Excel spreadsheet from here:

https://www.dropbox.com/sh/t2lh4bb6idhn9ki/AACuX-HEUC66kjEqdznfuQH2a?dl=0

File name: Greys_Jan-Jul_2018_v.7.xlsx

The file contains racing data from six racing venues in the UK from Jan-Jul, 2018, as follows:

1) Worksheet 1 (All) – consolidated data from all six venues
2) Each subsequent worksheet contains data from only one venue:
a. Belle Vue
b. Brighton
c. Crayford
d. Doncaster
e. Harlow
f. Henlow

Key to the data:

Field            Description
EVENT_ID         Betfair internal event ID
MENU_HINT        Contains the name of the race meeting
EVENT_NAME       The race name as it appears on the Betfair menu.
                 The first letter and number indicate the grade of the race;
                 the next number indicates the distance.
EVENT_DT         The start time of the race
SELECTION_ID     Betfair internal selection ID
SELECTION_NAME   The name of the runner, preceded by a number from 1 to 6
                 indicating the trap from which the runner starts
WIN_LOSE         1 if the runner won, 2 if it’s a dead heat, 0 if it lost (or was unplaced)
BSP              Betfair Starting Price


2) For the purposes of this task, only the following data fields should be
analyzed:

a. Race grade/distance (EVENT_NAME column)

b. Runner’s trap number (SELECTION_NAME column)
It is a number from 1 to 6 placed in front of the runner’s name in the
spreadsheet, e.g. in “3. Micky The Cheese” the trap number is 3.

c. Result (WIN_LOSE) – win (1), dead heat (2) or lose (0)
There is no need to study dead heat results separately, consider them a
win with two winners. All dead heats are highlighted in the spreadsheet.

d. Betfair Starting Price (BSP)
These are the decimal odds representing the implied probability of a runner winning
the race, as recorded by the Betfair Sports Exchange.
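As a rough illustration of how fields a, b and d might be extracted, the following Python sketch parses the two name columns and converts a BSP into an implied probability. The string formats are assumptions based only on the examples in this document ("A6 380m", "3. Micky The Cheese"); the function names are hypothetical, and real rows may need extra handling.

```python
import re

# Assumed formats, taken from the examples in this document:
#   EVENT_NAME      "A6 380m"             -> grade letter, grade number, distance in metres
#   SELECTION_NAME  "3. Micky The Cheese" -> trap number, runner name
EVENT_RE = re.compile(r"^\s*([A-Z]+)(\d+)\s+(\d+)\s*m", re.IGNORECASE)
SELECTION_RE = re.compile(r"^\s*([1-6])\.\s*(.+?)\s*$")


def parse_event_name(event_name):
    """Return (grade, distance_m), e.g. ("A6", 380), or None if unparseable."""
    m = EVENT_RE.match(event_name)
    if m is None:
        return None
    return m.group(1).upper() + m.group(2), int(m.group(3))


def parse_selection_name(selection_name):
    """Return (trap, runner_name), or None if the trap prefix is missing."""
    m = SELECTION_RE.match(selection_name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def implied_probability(bsp):
    """Decimal odds of 4.0 imply a win probability of 1 / 4.0 = 0.25."""
    return 1.0 / bsp
```

Rows for which either parser returns None are worth listing separately rather than silently dropping, since they may point to non-runners or irregular race names.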


3) Analyze the dataset for each of the six racing markets.

The goal is to determine which rare event happened in each of these markets with
the frequency indicated in the following table (the “Task Table”):



Please ignore Towcester data as it is incomplete.

The following may serve as additional guidance:

To clarify, the main elements to be analyzed include the position of a selection (greyhound)
in a certain trap, the favoritism of the selection (based on BSP, the Betfair Starting Price,
ranked from 1 (lowest BSP) to 6 (highest BSP)), and the race type/distance (e.g. B1 490m).
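The favoritism ranking described above could be derived along the following lines. This is a minimal sketch, assuming each row is a dict with one runner per row and that EVENT_ID identifies a single race; the `fav_rank` key is a hypothetical name, and breaking ties by input order is an arbitrary choice.

```python
from collections import defaultdict


def rank_by_bsp(rows):
    """Add 'fav_rank' (1 = lowest BSP) to each row, ranked within its race.

    rows: list of dicts with at least 'EVENT_ID' and 'BSP' keys.
    Ties are broken by input order (sorted() is stable).
    """
    races = defaultdict(list)
    for row in rows:
        races[row["EVENT_ID"]].append(row)
    for runners in races.values():
        ordered = sorted(runners, key=lambda r: r["BSP"])
        for rank, row in enumerate(ordered, start=1):
            row["fav_rank"] = rank
    return rows
```

In a race with a non-runner the ranks simply stop at 5, so the sixth-favorite category will be slightly under-populated.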

Please be mindful that the resulting frequency % may be for a positive as well as a negative
event, e.g. relating to the total number of wins (or losses) by a third favorite in trap 1, whose
BSP was above 4.5 and below 6.0, in a type A6 race over 380 m.
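To make the example concrete, a candidate event can be written as a predicate and counted over the rows. The sketch below assumes each row already carries derived fields named `trap`, `fav_rank`, `grade` and `distance_m` (hypothetical names), and it takes the percentage over the number of distinct races, which is an assumption about how the Task Table defines frequency.

```python
def event_frequency(rows, predicate, count_wins=True):
    """Return (hits, n_races, percent) for one candidate event.

    A hit is a runner that satisfies `predicate` and won (or lost, when
    count_wins is False). Dead heats (WIN_LOSE == 2) count as wins.
    The denominator is the number of distinct races (an assumption).
    """
    n_races = len({row["EVENT_ID"] for row in rows})
    hits = 0
    for row in rows:
        won = row["WIN_LOSE"] in (1, 2)
        if predicate(row) and won == count_wins:
            hits += 1
    percent = 100.0 * hits / n_races if n_races else 0.0
    return hits, n_races, percent


def example_event(row):
    # The illustrative event from the text: third favorite in trap 1,
    # BSP strictly between 4.5 and 6.0, grade A6, 380 m.
    return (
        row["trap"] == 1
        and row["fav_rank"] == 3
        and 4.5 < row["BSP"] < 6.0
        and row["grade"] == "A6"
        and row["distance_m"] == 380
    )
```

Calling `event_frequency(rows, example_event)` counts wins, and passing `count_wins=False` counts losses for the same event.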

Further, it is probably safe to assume that the likelihood of such an event happening is not
track dependent, i.e. the probability percentage of such an event happening is more or less
the same for each venue (below 3%).

Finally, please note that the numbers of races in the spreadsheet you are analyzing and in
the Task Table are not identical. Therefore, the resulting actual win figures are not
expected to be exactly the same as in the Task Table, but the percentages should be
almost equal.

For ease of reference, each worksheet title contains three figures: first, the number of
races in the worksheet; second, the number of races for that venue in the Task Table;
third, the number of resulting wins as per the Task Table.
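The three figures in a worksheet title can be turned into a target percentage and an expected count for that worksheet. The sketch below assumes the Task Table frequency is simply wins divided by races; the function name and the figures in the usage note are invented for illustration and are not taken from the spreadsheet.

```python
def target_from_title(sheet_races, table_races, table_wins):
    """Return (target_percent, expected_hits_in_sheet).

    Assumes the Task Table frequency is table_wins / table_races.
    """
    target_percent = 100.0 * table_wins / table_races
    expected_hits = sheet_races * table_wins / table_races
    return target_percent, expected_hits
```

For instance, with made-up figures of 1000 races in the worksheet, 1200 races in the Task Table and 30 wins, the target would be 2.5% and roughly 25 hits would be expected in the worksheet.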

Therefore, for ease of analysis, please start analyzing data from those venues where
the number of races is closest to the number of races in the Task Table, in the following
order:

1) Brighton
2) Doncaster
3) Henlow
4) Harlow
5) Crayford
6) Belle Vue

This should reduce the amount of work because if you conclude what the rare event is
after analyzing only one or two venues, it will be easier to process the data from the
remaining venues just to confirm such conclusion.
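One possible way to shortlist candidates for a venue is to tabulate every combination of trap, favoritism rank and outcome, then sort by distance from the target percentage. This is only a sketch under stated assumptions: rows are dicts with hypothetical derived keys `trap` and `fav_rank`, frequency is taken per distinct race, and the grid deliberately ignores grade, distance and BSP bands, which may well be part of the real event.

```python
from itertools import product


def closest_events(rows, target_percent, top=5):
    """Rank (trap, fav_rank, outcome) combinations by closeness to a target %.

    Returns tuples of (distance, trap, fav_rank, outcome, hits, percent).
    """
    n_races = len({row["EVENT_ID"] for row in rows})
    if n_races == 0:
        return []
    counts = {}
    for row in rows:
        # Dead heats (WIN_LOSE == 2) are treated as wins.
        outcome = "win" if row["WIN_LOSE"] in (1, 2) else "lose"
        key = (row["trap"], row["fav_rank"], outcome)
        counts[key] = counts.get(key, 0) + 1
    results = []
    for trap, rank, outcome in product(range(1, 7), range(1, 7), ("win", "lose")):
        hits = counts.get((trap, rank, outcome), 0)
        percent = 100.0 * hits / n_races
        results.append((abs(percent - target_percent), trap, rank, outcome, hits, percent))
    results.sort()
    return results[:top]
```

A combination that lands near the target in one venue is only a hypothesis; it should hold at a similar percentage in the remaining venues before being reported as the answer.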

4) Take note of any other patterns or anomalies that you may discover in the
process of searching for the solution to the question in paragraph 3 above. This is an
auxiliary task and there is no need to invest time in a separate analysis.

3. DELIVERABLES

For the avoidance of doubt, deliverables must include:

1) The answer to the question/goal in paragraph 3) of the Task section - Which rare
event happened in each of these six racing venues with the known (indicated) %
frequency?

- Reasoning/calculations/evidence for the answer must be explained and proven by
necessary statistical analysis

2) Revised Excel spreadsheet including any and all formulas and macros that you used to
achieve and support your conclusion.

In case of any doubt as to what exactly is required, or if you need any clarification in
the course of the work, please contact me without hesitation or delay.


4. LEGAL AGREEMENT

IMPORTANT:

Your acceptance and execution of this Task shall serve as your agreement and consent to
hold and maintain strict confidentiality with respect to any and all information received
by you for the purposes of this Task, including but not limited to the Task itself as well as
any conclusions of your execution thereof, for the sole and exclusive benefit of the
ordering party. You shall not use, disclose, make available, nor allow third parties to use
such confidential information or work product resulting from the execution of this Task.
These nondisclosure provisions shall survive the termination of the agreement whereby
you execute this task and your duty to hold confidential information shall remain in
effect for five years from the date hereof, or for such longer period as permitted by the
applicable law. This agreement shall be governed and construed in accordance with
English law.


Name: ________________________________________

Date: ________________________________________

Signature: ________________________________________

