Algorithms
Introduction
This document is a technical description of the algorithms involved in
classifying planctons and evaluating users' performance.
First, we will give a technical description of the classification algorithm. The classifier is
the piece of code that runs every CLASSIFIER_RUN_MINUTES minutes and checks whether
any planctons can be considered to have reached a consensus within the community
and can hence be classified.
Then, we will present how the user reliability and scoring algorithms were
implemented.
Finally, we will give example scenarios as well as their outcomes in order to better
illustrate how the various algorithms work.
Behavior
Classification algorithm
User prediction
taxonomy_id : ID of the taxonomy the user classified the plancton in. Equals
null if the plancton was left unused at the end of a game.
Game result
Internally, when a user sends a finished game back to our server, we end up storing an
array of the above data structure, with one entry for each plancton involved in the game.
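The stored game result can be pictured with a minimal Python sketch. Only `taxonomy_id` comes from the description above; the `plancton_id` field, the class name and the helper function are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserPrediction:
    # Hypothetical field: identifies the plancton this entry refers to.
    plancton_id: int
    # ID of the taxonomy chosen by the user, or None (null) if the
    # plancton was left unused at the end of the game.
    taxonomy_id: Optional[int]


# A game result is an array with one entry per plancton in the game.
GameResult = List[UserPrediction]


def unused_planctons(result: GameResult) -> List[int]:
    """Return the IDs of the planctons the user left unused."""
    return [p.plancton_id for p in result if p.taxonomy_id is None]


game = [UserPrediction(1, 12), UserPrediction(2, None), UserPrediction(3, 7)]
print(unused_planctons(game))  # [2]
```

Representing an unused plancton as `None` keeps a single entry type for every plancton in the game, so the same array can feed both the categorisation and the non-categorisation checks.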
Categorisation criteria
Here are the criteria that have to be met in order to consider a plancton
classified:
Non-categorisation criteria
Here are the criteria that have to be met in order to consider a plancton not
categorisable:
1. The number of times the plancton has been left at the end of a finished game is >=
NB_PLANCTON_NOT_VALIDATION.
2. The sum of the reliabilities of the users who left the plancton is >=
MIN_NOT_VALIDATION_SCORE.
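The two criteria above can be sketched as a single check. This is a minimal sketch, assuming that both criteria must hold at the same time; the function name is hypothetical, and the threshold values in the example calls are placeholders rather than the project's actual constants.

```python
from typing import Sequence


def is_not_categorisable(
    leaver_reliabilities: Sequence[float],
    nb_plancton_not_validation: int,
    min_not_validation_score: float,
) -> bool:
    """Check the two non-categorisation criteria.

    leaver_reliabilities holds one reliability score per user who left
    the plancton unused at the end of a finished game.
    """
    # Criterion 1: the plancton was left unused often enough.
    enough_leavers = len(leaver_reliabilities) >= nb_plancton_not_validation
    # Criterion 2: the users who left it are reliable enough in total.
    enough_reliability = sum(leaver_reliabilities) >= min_not_validation_score
    # Assumption: both criteria are required.
    return enough_leavers and enough_reliability


# Placeholder thresholds, chosen only for this example.
print(is_not_categorisable([0.5, 0.5, 0.25], 3, 1.0))  # True
print(is_not_categorisable([0.5, 0.25], 3, 1.0))       # False
```

Weighting by reliability means that a large number of low-reliability users leaving a plancton is not enough on its own; the count and the reliability sum have to pass their thresholds together.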
User reliability scores are also part of the project and are updated at two
moments: when receiving a game's result, and when classifying planctons.
On game result
When parsing a game result, for each valid line of planctons made, the user will:
On plancton classification
When a plancton is successfully classified, the users who participated in its
classification will:
Scoring algorithm
This section describes the various components of the scoring algorithm. This algorithm
acts at two different moments: when receiving a game's result, and when classifying
planctons.
On game result
When a game is received, each line of planctons made by the user is double-checked
in order to prune out the invalid ones. For each valid line, the user will:
On plancton classification
Constants
At the time of writing, here are the values of the constants involved in
this project:
Examples
Successful categorisation
| row | plancton_id | classified_as | count | reliability_sum | reliability percent |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | Harosa | 4 | 0.8 | - |
Or in plain text:
Row #1: "4 users, whose reliability sums to 0.8, said that plancton #1 is a
Harosa."
Row #2: "60 users, whose reliability sums to 9.4, said that plancton #1 is an
Annelida."
Row #3: "31 users, whose reliability sums to 3.1, said that plancton #1 is a
Copilia."
Then:
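| row | plancton_id | classified_as | count | reliability_sum |
| --- | --- | --- | --- | --- |
| 1 | 2 | Harosa | 4 | 0.8 |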
Or in plain text:
Row #1: "4 users, whose reliability sums to 0.8, said that plancton #2 is a
Harosa."
Row #2: "103 users, whose reliability sums to 11.2, said that plancton #2 is Not
categorisable."
Then:
-> Plancton #2 will now be classified as Not categorisable (Row #2).
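Rows like the ones in the two examples above can be derived by grouping the stored predictions per plancton and per answer. The following is a minimal sketch, assuming each prediction is available as a (plancton_id, classified_as, user_reliability) tuple; that input shape and the function name are hypothetical, and the sample values are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (plancton_id, classified_as, user_reliability)
Vote = Tuple[int, str, float]


def aggregate(votes: Iterable[Vote]) -> Dict[Tuple[int, str], Tuple[int, float]]:
    """Map (plancton_id, classified_as) to (count, reliability_sum)."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    for plancton_id, classified_as, reliability in votes:
        key = (plancton_id, classified_as)
        counts[key] += 1
        sums[key] += reliability
    return {key: (counts[key], sums[key]) for key in counts}


votes = [
    (2, "Harosa", 0.25),
    (2, "Not categorisable", 0.5),
    (2, "Not categorisable", 0.25),
]
for (plancton_id, label), (count, total) in aggregate(votes).items():
    print(plancton_id, label, count, total)
```

Each resulting entry corresponds to one row of the example tables: the count column is the number of users who gave that answer, and reliability_sum is the sum of their reliability scores.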
Valid line
If the classifier then validates plancton #4 as being a Harosa, then the user would: