Download as pdf or txt
Download as pdf or txt
You are on page 1of 1

challenge accepted

A galaxy of data challenges


By organizing Kaggle competitions, astrophysicist Thomas Kitching can focus on asking the right questions.

I
n science we always have more questions But is running competitions the best way
than answers. Finding the right tools to to generate new ideas and advance science?
tackle these questions is an essential part Organizing a competition is stressful,
of research. This challenge has recently and while the majority of competitors
taken on a new shape as many fields are collaborative some competitors get,
transition from data-scarce to data-rich well…​competitive — which is not always
paradigms and individual researchers realize a pleasant experience. More broadly, the
they do not possess the tools to analyse approach of crowdsourcing scientific
large heterogeneous datasets. One strategy advances to a competition could be seen
is to pursue interdisciplinary research Credit: ESA/Hubble and NASA as a step towards a more market-driven
and collaborations. Another is to seek the approach to science, particularly when
wisdom of the crowd. prizes are monetary in nature. In a survival
This is the approach that we took by and so solutions can be tested, improved and of the fittest, almost everyone ends up a
engaging with Kaggle, the data science combined, and evolve at a rapid rate, with no loser on the leader board. So perhaps it
platform. Kaggle runs competitions in specialist equipment required. would be better to collaborate rather than
which organizers upload a dataset and It can be difficult for organizers to offer compete, and Kaggle is now moving towards
pose a problem or question based on that good prizes. However, we identified a that model by encouraging people to upload
data. Organizers define a metric for scoring new mode of impact by partnering with a and explore datasets together.
submissions, and provide training data for company — Winton Capital Management A broader question is whether
competitors to play around with and train — who sponsored monetary prizes for the crowdsourced competitions are a step
their models. A prize is also offered, which competitions. In return we allowed them towards a mode of science in which
can be anything from cash to a job interview. to offer job interviews for data science scientists only ever ask the questions, but
From the organizer’s perspective, Kaggle positions to winners of the competitions. rely on others to find the answers. Taking
is a source for myriads of data scientists This is a win-win-win scenario: astronomy this further: we use human competitors now,
worldwide who can try to solve your data wins because we get good competitors but perhaps in the future AI competitors will
science problem. The key is to ask the right solving our problems, the sponsoring answer the questions we set. This reminds
question, and provide the right prize. From company wins because they can reduce me of Isaac Asimov’s Multivac stories,
the competitor’s perspective, you can get recruitment costs by getting access to people where scientists are those who ask the right
access to new datasets, interact with other who have proven their data science skills, questions to an AI, which provides
data scientists, and potentially get rich or and the competitors win because they get the answers.
land a new job. a chance to solve astronomy mysteries and But for the moment at least, by admitting
We have run three Kaggle competitions earn a position on the leader board while our limited knowledge and reaching out
since 2010 that have posed the problems (hopefully) having fun. for help, these competitions provide a data
of measuring galaxy shapes1, determining During our competitions, new ideas playground in which problems can be solved
the position of dark matter in noisy data2 were revealed for successfully solving and new friendships formed. ❐
and classifying galaxy types3, and there is the problems, leading to scientific papers
currently a competition to help classify describing the results. While the exact Thomas Kitching
supernovae observations4. Running public implementation of these ideas needed Mullard Space Science Laboratory, UCL,
competitions to spur scientific advances refinement before applying to real data, London, UK.
is not a new idea, and has a long heritage the kernel of these new directions for e-mail: t.kitching@ucl.ac.uk
dating back to at least the Longitude Prize analysis began on the Kaggle leader
— a challenge set in 1713 to accurately board. However, it was pointed out in one Published online: 11 February 2019
determine the longitudinal position of a competition — in a not entirely tongue https://doi.org/10.1038/s42256-018-0016-x
ship, with a maximum prize of up to £20,000 in cheek manner — that the best way to
(equivalent to over £2 million in 2018). win would be to model not the data, but References
1. Mapping dark matter. Kaggle https://www.kaggle.com/c/mdm
Fast-forward to the twenty-first century and the person who created the competition, (2012).
such competitions thrive by the Internet, in order to determine what assumptions 2. Observing dark worlds. Kaggle https://www.kaggle.com/c/
through which a nearly unlimited number were made. Indeed, from my perspective DarkWorlds (2013).
of potential competitors can be instantly as a competition setter this is an astute 3. Galaxy zoo. Kaggle https://www.kaggle.com/c/galaxy-zoo-the-
galaxy-challenge (2014).
reached. Furthermore, Kaggle competitions observation, and one that future challenges 4. PLAsTiCC astronomical classification. Kaggle https://www.kaggle.
are software rather than hardware based, should carefully consider. com/c/PLAsTiCC-2018 (2018).

120 Nature Machine Intelligence | VOL 1 | FEBRUARY 2019 | 120 | www.nature.com/natmachintell

You might also like