Project Grey Analysis Task
GREY – STAGE I
TASK DESCRIPTION
April 23, 2019
1. INTRODUCTION
This task is part of a research project aimed at identification of patterns, trends and
anomalies in sports exchange markets in the UK, using the example of greyhound races.
This task is Stage I, a test stage, intended to establish your qualifications and your
ability to perform the required task on time. You will be required to review and analyze
data from only six of twenty UK racetracks and only from a limited period (Jan – Jul, 2018).
By way of introduction, greyhound races in the UK almost always include 6 runners,
unless there is a last-second non-runner. Effectively there are four different distances
that greyhounds race over. In the UK a letter abbreviation is always shown before the
grade of the race to indicate the distance being run. These letters are as follows.
A – Middle distance races with four bends
D – Sprint races with two bends
E – Marathon races with eight bends or more
S – Stayers races with six bends
2. TASK
Your goal is to ‘reverse engineer’ the data to determine which event occurred with a %
frequency indicated in the Task Table, as explained in more detail in paragraph 3 below.
Please do the following:
1) Download the Excel spreadsheet from here:
https://www.dropbox.com/sh/t2lh4bb6idhn9ki/AACuX-HEUC66kjEqdznfuQH2a?dl=0
File name: Greys_Jan-Jul_2018_v.7.xlsx
File contains racing data from six racing venues in the UK from Jan-Jul, 2018, as follows:
Worksheet 1
1) All – consolidated data from all six venues
2) Each subsequent worksheet contains data from only one venue:
a. Belle Vue
b. Brighton
c. Crayford
d. Doncaster
e. Harlow
f. Henlow
Key to the data:
Field            Description
EVENT_ID         Betfair internal event ID
MENU_HINT        Contains the name of the race meeting
EVENT_NAME       The race name as it appears on the Betfair menu. First letter and number indicate grade of the race. Next number indicates distance
EVENT_DT         The start time of the race
SELECTION_ID     Betfair internal selection ID
SELECTION_NAME   The name of the runner, preceded by a number from 1 to 6, indicating the trap number from which the runner starts running
WIN_LOSE         1 if the runner won, 2 if it’s a dead heat, 0 if it lost (or was unplaced)
BSP              Betfair Starting Price
2) For the purposes of this task, only the following data fields should be
analyzed:
a. Race grade/distance (EVENT_NAME column)
b. Runner’s trap number. (SELECTION_NAME column)
It is a number from 1 to 6 placed in front of the runner’s name in the
spreadsheet, e.g. 3. Micky The Cheese – here trap number is 3.
c. Result (WIN_LOSE) – win (1), dead heat (2) or lose (0)
There is no need to study dead heat results separately, consider them a
win with two winners. All dead heats are highlighted in the spreadsheet.
d. Betfair Starting Price (BSP)
These are the decimal odds representing the implied probability of a runner
winning the race, as recorded by the Betfair Sports Exchange.
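The fields listed above can be pulled apart with a few small helpers. The following is only a minimal Python sketch, not a prescribed method. The function names are hypothetical, and the EVENT_NAME layout (something like "A4 480m") is an assumption based on the key to the data; please check it against the actual spreadsheet. The implied probability is taken as 1 divided by the decimal odds.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Distance letters as listed in the Introduction.
DISTANCE_CODES = {
    "A": "middle distance, four bends",
    "D": "sprint, two bends",
    "E": "marathon, eight bends or more",
    "S": "stayers, six bends",
}


class Runner(NamedTuple):
    trap: int
    name: str


def parse_selection_name(selection_name: str) -> Runner:
    """Split e.g. '3. Micky The Cheese' into trap number 3 and the runner's name."""
    match = re.match(r"\s*([1-6])\.\s*(.+?)\s*$", selection_name)
    if match is None:
        raise ValueError(f"unexpected SELECTION_NAME: {selection_name!r}")
    return Runner(trap=int(match.group(1)), name=match.group(2))


def parse_grade(event_name: str) -> Optional[Tuple[str, int]]:
    """Return (distance letter, grade number) from the start of EVENT_NAME.

    ASSUMPTION: the name starts with one of A/D/E/S followed by digits,
    e.g. 'A4 480m'. Anything else returns None so it can be reviewed by hand.
    """
    match = re.match(r"\s*([ADES])(\d+)\b", event_name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def implied_probability(bsp: float) -> float:
    """Convert decimal odds to implied probability (1 / odds)."""
    if bsp <= 1.0:
        raise ValueError("decimal odds are expected to be greater than 1")
    return 1.0 / bsp
```

For example, parse_selection_name("3. Micky The Cheese") gives trap 3, and a BSP of 4.0 corresponds to an implied probability of 0.25 (25%).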
3) Analyze the dataset for each of the six racing markets.
The goal is to determine which rare event happened in each of these markets with
the frequency indicated in the following table (the “Task Table”):
Please ignore Towcester data as it is incomplete.
The following may serve as additional guidance:
The first figure is the number of races in the worksheet, the second is the number of
races for that venue in the Task Table, and the third is the number of resulting wins as
per the Task Table.
Therefore, for ease of analysis, please start with those venues where the number of
races in the worksheet is closest to the number of races in the Task Table, in the
following order:
1) Brighton
2) Doncaster
3) Henlow
4) Harlow
5) Crayford
6) Belle Vue
This should reduce the amount of work: if you conclude what the rare event is
after analyzing only one or two venues, it will be easier to process the data from the
remaining venues simply to confirm that conclusion.
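One way to test a candidate event against the Task Table is to count the races in which it occurred and divide by the number of races for that venue. The following is only a rough Python sketch under stated assumptions. The function name event_frequency, the sample rows and the example candidate ("the winner started at a BSP of 10 or more") are all made up for illustration and are not the answer to the task. Rows are assumed to be dictionaries keyed by the column names from the key to the data, and dead heats (WIN_LOSE = 2) are counted as wins, as instructed above.

```python
from collections import defaultdict


def event_frequency(rows, is_event):
    """Return (total races, races where the event occurred, percentage).

    rows     -- iterable of dicts, one per runner, each with an EVENT_ID key
    is_event -- function taking one runner row and returning True or False;
                a race counts as a hit if any of its runners satisfies it
    """
    races = defaultdict(list)
    for row in rows:
        races[row["EVENT_ID"]].append(row)

    total = len(races)
    hits = sum(1 for runners in races.values() if any(is_event(r) for r in runners))
    percentage = 100.0 * hits / total if total else 0.0
    return total, hits, percentage


# Made-up rows for illustration only; these are not taken from the spreadsheet.
sample_rows = [
    {"EVENT_ID": 1, "SELECTION_NAME": "1. Dog A", "WIN_LOSE": 1, "BSP": 12.0},
    {"EVENT_ID": 1, "SELECTION_NAME": "2. Dog B", "WIN_LOSE": 0, "BSP": 2.1},
    {"EVENT_ID": 2, "SELECTION_NAME": "1. Dog C", "WIN_LOSE": 0, "BSP": 15.0},
    {"EVENT_ID": 2, "SELECTION_NAME": "2. Dog D", "WIN_LOSE": 1, "BSP": 2.5},
]


def long_odds_winner(row):
    """Hypothetical candidate event: a winner (or dead-heater) at BSP >= 10."""
    return row["WIN_LOSE"] in (1, 2) and row["BSP"] >= 10.0


print(event_frequency(sample_rows, long_odds_winner))  # (2, 1, 50.0)
```

The same function can be reused for each venue's worksheet with a different candidate event, and the resulting race count, win count and percentage compared with the corresponding figures in the Task Table.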
4) Take note of any other patterns or anomalies that you may discover while
searching for the answer to the question in paragraph 3 above. This is an
auxiliary task and there is no need to invest time in a separate analysis.
3. DELIVERABLES
For the avoidance of doubt, deliverables must include:
1) The answer to the question/goal in paragraph 3) of the Task section: which rare
event happened in each of these six racing venues with the known (indicated) %
frequency?
- Reasoning/calculations/evidence for the answer must be explained and proven by
necessary statistical analysis
2) Revised Excel spreadsheet including any and all formulas and macros that you used to
achieve and support your conclusion.
If you have any doubts as to what exactly is required, or if you need any clarification
during the work, please contact me without hesitation or delay.
4. LEGAL AGREEMENT
IMPORTANT:
Your acceptance and execution of this Task shall serve as your agreement and consent to
hold and maintain strict confidentiality with respect to any and all information received
by you for the purposes of this Task, including but not limited to the Task itself as well as
any conclusions of your execution thereof, for the sole and exclusive benefit of the
ordering party. You shall not use, disclose, make available, nor allow third parties to use
such confidential information or work product resulting from the execution of this Task.
These nondisclosure provisions shall survive the termination of the agreement whereby
you execute this task and your duty to hold confidential information shall remain in
effect for five years from the date hereof, or for such longer period as permitted by the
applicable law. This agreement shall be governed and construed in accordance with
English law.
Name: ________________________________________
Date: ________________________________________
Signature: ________________________________________
* * *