
ALGORITHMS.md

Introduction 
This document is a technical description of the different algorithms involved in
classifying planctons and evaluating users' performance.

First, we will give a technical description of the classification algorithm. The classifier is
the piece of code that runs every CLASSIFIER_RUN_MINUTES minutes and checks whether
some planctons have reached a consensus from the community and can therefore be
classified.

Then, we will present how the user reliability and scoring algorithms were
implemented.

Finally, we will give example scenarios as well as their outcomes in order to better
illustrate how the various algorithms work.

Behavior 

Classification algorithm 

User prediction 

A user's prediction is materialized by the following data structure:

user_id : ID of the user who made the prediction.

plancton_id : ID of the plancton involved in the prediction.

taxonomy_id : ID of the taxonomy the user classified the plancton in. Equals
null if the plancton was left unused at the end of a game.

user_reliability : The reliability score of the user at the moment of the
classification.

created_at : Date at which the prediction was made.
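The prediction record above can be sketched as a Python dataclass. The field names come from this document; the integer ID types, the use of None for null and the `left_unused` helper are our own assumptions, not the project's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class UserPrediction:
    """One user's prediction for one plancton (field names from the document)."""
    user_id: int                # ID of the user who made the prediction
    plancton_id: int            # ID of the plancton involved in the prediction
    taxonomy_id: Optional[int]  # None (null) when the plancton was left unused
    user_reliability: float     # user's reliability when the prediction was made
    created_at: datetime        # date at which the prediction was made

    @property
    def left_unused(self) -> bool:
        """Hypothetical helper: True when the user left this plancton at the end of a game."""
        return self.taxonomy_id is None


# A user with reliability 1.2 leaves plancton 7 unused.
p = UserPrediction(user_id=42, plancton_id=7, taxonomy_id=None,
                   user_reliability=1.2, created_at=datetime(2023, 10, 3))
print(p.left_unused)  # True
```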

Game result 

Internally, when a user sends a finished game back to our server, we end up storing an
array of the above data structure, one entry for each plancton involved in the game.
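A minimal sketch of how a finished game could be turned into that array. The helper `build_game_result` and its `placements` input are hypothetical, and we assume that every entry of one game stores the same reliability snapshot and timestamp.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_game_result(user_id: int, user_reliability: float,
                      placements: Dict[int, Optional[int]]) -> List[dict]:
    """Hypothetical helper: one prediction entry per plancton of a finished game.

    `placements` maps plancton_id -> taxonomy_id, with None for planctons
    the user left unused at the end of the game.
    """
    now = datetime.now(timezone.utc)
    return [
        {
            "user_id": user_id,
            "plancton_id": plancton_id,
            "taxonomy_id": taxonomy_id,            # None == left unused
            "user_reliability": user_reliability,  # assumed: same snapshot for the whole game
            "created_at": now,
        }
        for plancton_id, taxonomy_id in placements.items()
    ]


result = build_game_result(user_id=42, user_reliability=1.0,
                           placements={1: 10, 2: 10, 3: None})
print(len(result))  # 3
```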
Categorisation criteria

Here are the criteria that must be met in order to consider a plancton
classified:

1. The number of times the plancton has been user-predicted is >= NB_PLANCTON_VALIDATION.

2. The sum of the users' reliabilities is >= MIN_VALIDATION_SCORE.

3. The sum of the reliabilities of the users pointing to a specific taxonomy represents >=
VALIDATION_PERCENTAGE of all the reliability sums.
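The three criteria can be sketched as a single check. This is our interpretation, not the project's implementation: following the worked example in the Examples section, criteria 1 and 2 are checked per taxonomy, and criterion 3 is computed only over the taxonomies that passed them. We also assume that left-unused entries (taxonomy_id null) are not votes. `find_consensus` and its (taxonomy, user_reliability) input are hypothetical, with taxonomy names standing in for taxonomy IDs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

NB_PLANCTON_VALIDATION = 20
MIN_VALIDATION_SCORE = 3.0
VALIDATION_PERCENTAGE = 0.75


def find_consensus(predictions: Iterable[Tuple[Optional[str], float]]) -> Optional[str]:
    """Return the consensus taxonomy of one plancton, or None if there is none."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, float] = defaultdict(float)
    for taxonomy, reliability in predictions:
        if taxonomy is None:  # assumed: leaving a plancton unused is not a vote
            continue
        counts[taxonomy] += 1
        sums[taxonomy] += reliability

    # Criteria 1 and 2, applied per taxonomy.
    eligible = {t: s for t, s in sums.items()
                if counts[t] >= NB_PLANCTON_VALIDATION and s >= MIN_VALIDATION_SCORE}
    if not eligible:
        return None

    # Criterion 3: share of the reliability held by the best eligible taxonomy.
    total = sum(eligible.values())
    best = max(eligible, key=eligible.get)
    return best if eligible[best] / total >= VALIDATION_PERCENTAGE else None


votes = [("Annelida", 0.5)] * 25 + [("Copilia", 0.2)] * 20
print(find_consensus(votes))  # Annelida
```

Here Annelida holds 12.5 of the 16.5 eligible reliability (about 76%), just above the 75% threshold.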

Non-categorisation criteria

Here are the criteria that must be met in order to consider a plancton
non-categorisable:

1. The number of times the plancton has been left at the end of a finished game is >=
NB_PLANCTON_NOT_VALIDATION.

2. The sum of the reliabilities of the users who left the plancton is >=
MIN_NOT_VALIDATION_SCORE.
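The two non-categorisation criteria can be sketched the same way. `is_non_categorisable` is a hypothetical helper; it takes (taxonomy, user_reliability) pairs for one plancton, where None stands for "left at the end of a finished game".

```python
from typing import Iterable, Optional, Tuple

NB_PLANCTON_NOT_VALIDATION = 100
MIN_NOT_VALIDATION_SCORE = 10.0


def is_non_categorisable(predictions: Iterable[Tuple[Optional[str], float]]) -> bool:
    """Check both non-categorisation criteria for a single plancton."""
    left = [reliability for taxonomy, reliability in predictions if taxonomy is None]
    return (len(left) >= NB_PLANCTON_NOT_VALIDATION      # criterion 1
            and sum(left) >= MIN_NOT_VALIDATION_SCORE)   # criterion 2


print(is_non_categorisable([(None, 0.2)] * 100))  # True
```

Both criteria are needed: 150 users with a reliability of 0.05 each would pass criterion 1 but only sum to 7.5.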

User reliability algorithm 

User reliability scores are also maintained by the project and are updated at two
moments: when receiving a game's result, and when classifying planctons.

⚠️ Important: A user's reliability is always kept between 0.0 and 2.0.
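Every reliability update is a multiplication, so keeping the score in range amounts to clamping after each update. A minimal sketch, assuming the clamp is applied after every single multiplication; `update_reliability` is a hypothetical name.

```python
MIN_RELIABILITY = 0.0  # bounds stated in the document
MAX_RELIABILITY = 2.0


def update_reliability(reliability: float, multiplier: float) -> float:
    """Apply one multiplicative update and clamp the result to [0.0, 2.0]."""
    return min(MAX_RELIABILITY, max(MIN_RELIABILITY, reliability * multiplier))


print(update_reliability(1.95, 1.10))  # 2.0
```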

On game result 

When parsing a game result, for each valid line of planctons made by the user, the user will:

Have their reliability multiplied by MUL_RELIABILITY_GOOD_REF for each correctly
predicted referenced plancton.
Have their reliability multiplied by MUL_RELIABILITY_BAD_REF for each wrongly
predicted referenced plancton.
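The game-result update can be sketched as follows. The (predicted, reference) pair input and the helper name are assumptions, and we clamp after each multiplication rather than once at the end.

```python
from typing import Iterable, Tuple

MUL_RELIABILITY_GOOD_REF = 1.05
MUL_RELIABILITY_BAD_REF = 0.92


def reliability_after_game(reliability: float,
                           referenced: Iterable[Tuple[str, str]]) -> float:
    """Update a user's reliability from the referenced planctons of valid lines.

    `referenced` holds hypothetical (predicted_taxonomy, reference_taxonomy) pairs.
    """
    for predicted, expected in referenced:
        factor = MUL_RELIABILITY_GOOD_REF if predicted == expected else MUL_RELIABILITY_BAD_REF
        reliability = min(2.0, max(0.0, reliability * factor))  # keep within [0.0, 2.0]
    return reliability


# Three correct referenced planctons: 1.0 * 1.05 ** 3
print(round(reliability_after_game(1.0, [("Harosa", "Harosa")] * 3), 6))  # 1.157625
```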

On plancton classification 

When successfully classifying a plancton, the users who participated in the plancton's
classification will:

Have their reliability multiplied by MUL_RELIABILITY_GOOD_PRED if they agreed with
the consensus reached.
Have their reliability multiplied by MUL_RELIABILITY_BAD_PRED if they disagreed with
the consensus reached.
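A sketch of the classification-time update. The dictionaries keyed by user_id are hypothetical, and we assume that users who left the plancton unused did not "participate" and are therefore left untouched.

```python
from typing import Dict, Optional

MUL_RELIABILITY_GOOD_PRED = 1.10
MUL_RELIABILITY_BAD_PRED = 0.85


def reliabilities_after_consensus(reliabilities: Dict[int, float],
                                  votes: Dict[int, Optional[str]],
                                  consensus: str) -> Dict[int, float]:
    """Return updated reliabilities for the users who predicted the plancton.

    `reliabilities` maps user_id -> reliability; `votes` maps user_id -> chosen taxonomy.
    """
    updated = dict(reliabilities)
    for user_id, taxonomy in votes.items():
        if taxonomy is None:  # assumption: leaving the plancton unused is not participation
            continue
        factor = MUL_RELIABILITY_GOOD_PRED if taxonomy == consensus else MUL_RELIABILITY_BAD_PRED
        updated[user_id] = min(2.0, max(0.0, updated[user_id] * factor))
    return updated


out = reliabilities_after_consensus({1: 1.0, 2: 1.0, 3: 1.0},
                                    {1: "Annelida", 2: "Copilia", 3: None}, "Annelida")
```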

Scoring algorithm 

This section describes the various components of the scoring algorithm. This algorithm
acts at two different moments: when receiving a game's result, and when classifying
planctons.

On game result 

When a game is received, each line of planctons made by the user is double-checked
in order to prune out the invalid ones. For each valid line, the user will:

Gain POINTS_PER_VALID_LINE_PLANCTONS points for each plancton of the line.
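The line-points rule can be sketched as below. The document does not spell out the server-side validity check, so it is passed in as a function; the `same_taxonomy` rule used here is a toy rule for illustration only.

```python
from typing import Callable, Sequence

POINTS_PER_VALID_LINE_PLANCTONS = 3


def points_for_game(lines: Sequence[Sequence[str]],
                    is_valid_line: Callable[[Sequence[str]], bool]) -> int:
    """Award POINTS_PER_VALID_LINE_PLANCTONS per plancton of each valid line."""
    return sum(POINTS_PER_VALID_LINE_PLANCTONS * len(line)
               for line in lines if is_valid_line(line))


def same_taxonomy(line: Sequence[str]) -> bool:
    """Toy validity rule (hypothetical): every plancton of the line shares one taxonomy."""
    return len(set(line)) == 1


lines = [["Harosa"] * 4, ["Harosa", "Copilia"]]
print(points_for_game(lines, same_taxonomy))  # 12
```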

On plancton classification 

When successfully classifying a plancton, the classifier will also give
POINTS_PER_VALID_PRED_PLANCTONS points to the users who classified that plancton
according to the consensus.
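A sketch of the consensus bonus, assuming scores and votes are kept in dictionaries keyed by user_id (a hypothetical layout):

```python
from typing import Dict, Optional

POINTS_PER_VALID_PRED_PLANCTONS = 6


def award_consensus_points(scores: Dict[int, int],
                           votes: Dict[int, Optional[str]],
                           consensus: str) -> Dict[int, int]:
    """Add POINTS_PER_VALID_PRED_PLANCTONS to every user who agreed with the consensus."""
    updated = dict(scores)
    for user_id, taxonomy in votes.items():
        if taxonomy == consensus:
            updated[user_id] = updated.get(user_id, 0) + POINTS_PER_VALID_PRED_PLANCTONS
    return updated


print(award_consensus_points({1: 10, 2: 0},
                             {1: "Annelida", 2: "Copilia", 3: "Annelida"}, "Annelida"))
# {1: 16, 2: 0, 3: 6}
```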

Constants 
At the time of writing of this document, here are the values of the constants involved in
this project:

CLASSIFIER_RUN_MINUTES = 60 <- The classifier runs every hour.

MIN_VALIDATION_SCORE = 3.0 <- A plancton is only considered eligible for consensus if
the sum of the users' reliabilities reaches at least this value.

NB_PLANCTON_VALIDATION = 20 <- A plancton is only considered eligible for consensus if
at least this number of users classified it.

VALIDATION_PERCENTAGE = 0.75 <- A plancton's classification must represent at least this
share of the summed reliabilities.

NB_PLANCTON_NOT_VALIDATION = 100 <- A plancton can be considered non-categorisable if
at least this number of users said so.

MIN_NOT_VALIDATION_SCORE = 10.0 <- The sum of the users' reliabilities must reach at least
this value for non-categorisation.

MUL_RELIABILITY_GOOD_REF = 1.05 <- The user gains 5% of reliability upon correctly
identifying a referenced plancton.

MUL_RELIABILITY_BAD_REF = 0.92 <- The user loses 8% of reliability upon wrongly
identifying a referenced plancton.

MUL_RELIABILITY_GOOD_PRED = 1.10 <- The user gains 10% of reliability upon agreeing
with a reached consensus.

MUL_RELIABILITY_BAD_PRED = 0.85 <- The user loses 15% of reliability upon disagreeing
with a reached consensus.

POINTS_PER_VALID_LINE_PLANCTONS = 3 <- Number of points per plancton the user wins on
a valid game line.

POINTS_PER_VALID_PRED_PLANCTONS = 6 <- Number of points the user wins when they agreed
with a reached consensus.

⚠️ Important: These values are still being questioned and fine-tuned; this document is
not guaranteed to contain the latest version of these values. Written on 03/10/2023.
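Since these values are still being tuned, it can help to keep them in one place with a few sanity checks. The sketch below is our own suggestion, not part of the project: the checks encode properties implied by the rules above, e.g. with VALIDATION_PERCENTAGE above 0.5 at most one taxonomy can pass criterion 3, so the consensus is always unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmConstants:
    """Values as listed on 03/10/2023; they may have changed since."""
    CLASSIFIER_RUN_MINUTES: int = 60
    MIN_VALIDATION_SCORE: float = 3.0
    NB_PLANCTON_VALIDATION: int = 20
    VALIDATION_PERCENTAGE: float = 0.75
    NB_PLANCTON_NOT_VALIDATION: int = 100
    MIN_NOT_VALIDATION_SCORE: float = 10.0
    MUL_RELIABILITY_GOOD_REF: float = 1.05
    MUL_RELIABILITY_BAD_REF: float = 0.92
    MUL_RELIABILITY_GOOD_PRED: float = 1.10
    MUL_RELIABILITY_BAD_PRED: float = 0.85
    POINTS_PER_VALID_LINE_PLANCTONS: int = 3
    POINTS_PER_VALID_PRED_PLANCTONS: int = 6

    def __post_init__(self) -> None:
        # Above 50%, at most one taxonomy can satisfy criterion 3.
        if not 0.5 < self.VALIDATION_PERCENTAGE <= 1.0:
            raise ValueError("VALIDATION_PERCENTAGE must be in (0.5, 1.0]")
        # Rewards must raise reliability, penalties must lower it.
        if not (self.MUL_RELIABILITY_GOOD_REF > 1 > self.MUL_RELIABILITY_BAD_REF > 0
                and self.MUL_RELIABILITY_GOOD_PRED > 1 > self.MUL_RELIABILITY_BAD_PRED > 0):
            raise ValueError("reliability multipliers are on the wrong side of 1.0")


CONSTANTS = AlgorithmConstants()
```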

Examples 

Successful categorisation

Here is an example collection of user predictions:

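| row | plancton_id | classified_as | count | reliability_sum | reliability_percent |
|-----|-------------|---------------|-------|-----------------|---------------------|
| 1   | 1           | Harosa        | 4     | 0.8             | -                   |
| 2   | 1           | Annelida      | 60    | 9.4             | 0.752               |
| 3   | 1           | Copilia       | 31    | 3.1             | 0.248               |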

Or in plain text:

Row #1: "4 users, whose reliability sums to 0.8, said that plancton #1 is a
Harosa."
Row #2: "60 users, whose reliability sums to 9.4, said that plancton #1 is an
Annelida."
Row #3: "31 users, whose reliability sums to 3.1, said that plancton #1 is a
Copilia."

Then:

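1. Rows #2 and #3 validate criterion 1 of the "Categorisation criteria" (60 and 31
predictions, both >= 20). Row #1 does not (4 < 20), so it is left out of the
percentage computation.
2. Rows #2 and #3 validate criterion 2 of the "Categorisation criteria" (9.4 and 3.1,
both >= 3.0).
3. Row #2 validates criterion 3 of the "Categorisation criteria": 9.4 / (9.4 + 3.1) =
0.752 >= 0.75.

-> The classifier will validate plancton #1 to be an Annelida (Row #2).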
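The arithmetic of this example can be replayed directly from the table rows. Note that including Row #1 in the total would give 9.4 / 13.3 ≈ 0.707 < 0.75, so the 0.752 in the table implies that only the rows passing criteria 1 and 2 are counted; the code below follows that reading.

```python
NB_PLANCTON_VALIDATION = 20
MIN_VALIDATION_SCORE = 3.0
VALIDATION_PERCENTAGE = 0.75

# Rows of the example table: (classified_as, count, reliability_sum)
rows = [("Harosa", 4, 0.8), ("Annelida", 60, 9.4), ("Copilia", 31, 3.1)]

# Criteria 1 and 2, applied per row: Harosa is dropped (4 < 20).
eligible = [(name, s) for name, count, s in rows
            if count >= NB_PLANCTON_VALIDATION and s >= MIN_VALIDATION_SCORE]
total = sum(s for _, s in eligible)  # 9.4 + 3.1 = 12.5

# Criterion 3: share of the eligible reliability held by each row.
shares = {name: round(s / total, 3) for name, s in eligible}
print(shares)   # {'Annelida': 0.752, 'Copilia': 0.248}
winners = [name for name, s in eligible if s / total >= VALIDATION_PERCENTAGE]
print(winners)  # ['Annelida']
```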

Successful non-categorisation

Here is an example collection of non-categorisation:

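| row | plancton_id | classified_as | count | reliability_sum |
|-----|-------------|---------------|-------|-----------------|
| 1   | 2           | Harosa        | 4     | 0.8             |
| 2   | 2           | null          | 103   | 11.2            |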

Or in plain text:

Row #1: "4 users, whose reliability sums to 0.8, said that plancton #2 is a
Harosa."
Row #2: "103 users, whose reliability sums to 11.2, said that plancton #2 is Not
categorisable."

Then:

1. Row #2 validates criteria 1 and 2 of the "Non-categorisation criteria" (103 >= 100
and 11.2 >= 10.0).

-> Plancton #2 will now be classified as Not Categorisable (Row #2).
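The same check replayed on the example rows, with None standing for the null classification:

```python
NB_PLANCTON_NOT_VALIDATION = 100
MIN_NOT_VALIDATION_SCORE = 10.0

# Rows of the example table: (classified_as, count, reliability_sum); None == left unused.
rows = [("Harosa", 4, 0.8), (None, 103, 11.2)]

left_count, left_sum = next((c, s) for name, c, s in rows if name is None)
criterion_1 = left_count >= NB_PLANCTON_NOT_VALIDATION  # 103 >= 100
criterion_2 = left_sum >= MIN_NOT_VALIDATION_SCORE      # 11.2 >= 10.0
print(criterion_1 and criterion_2)  # True
```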

Valid line 

Example of a game line received from a user:

#1 Harosa (ref) - #2 Harosa (ref) - #3 Harosa (ref) - #4 Harosa (pred)

The user would:

Gain 4 * POINTS_PER_VALID_LINE_PLANCTONS points
Have their reliability score multiplied 3 times by MUL_RELIABILITY_GOOD_REF (once per
correctly predicted referenced plancton)

If the classifier then validates plancton #4 as being a Harosa, the user would:

Gain POINTS_PER_VALID_PRED_PLANCTONS points
Have their reliability multiplied by MUL_RELIABILITY_GOOD_PRED
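With the constants listed above and a hypothetical starting reliability of 1.0, this scenario works out as follows (we assume the clamp to 2.0 is applied after each update, which does not trigger here):

```python
POINTS_PER_VALID_LINE_PLANCTONS = 3
POINTS_PER_VALID_PRED_PLANCTONS = 6
MUL_RELIABILITY_GOOD_REF = 1.05
MUL_RELIABILITY_GOOD_PRED = 1.10

reliability = 1.0  # hypothetical starting reliability
points = 0

# Game result: 4 planctons on a valid line, 3 of them correctly predicted references.
points += 4 * POINTS_PER_VALID_LINE_PLANCTONS  # 12 points
for _ in range(3):
    reliability = min(2.0, reliability * MUL_RELIABILITY_GOOD_REF)

# Later, the classifier validates plancton #4 as a Harosa.
points += POINTS_PER_VALID_PRED_PLANCTONS      # +6 points
reliability = min(2.0, reliability * MUL_RELIABILITY_GOOD_PRED)

print(points, round(reliability, 4))  # 18 1.2734
```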
