Author: Mário Martins

(Universidade Federal do Amapá, UNIFAP, Brazil)

Collaboration: Amália Mendes
Centro de Linguística da Universidade de Lisboa (CLUL)

CODES can be accessed on CLUL's CQPWeb platform.


CODES is a developmental corpus of texts written by school-age
children and adolescents who are monolingual speakers of European
Portuguese. The texts were collected between September 2011
and January 2012 by Mário Martins for his PhD thesis, entitled
Complexidade textual e progresso escolar em dois registos: um
estudo de correlação baseado em um corpus quasi-longitudinal
(Textual complexity and school progress in two registers: a
correlational study based on a quasi-longitudinal corpus;
Universidade de Lisboa).
Designed as a quasi-longitudinal corpus, CODES comprises
244 texts in two registers: narrative (n = 122) and argumentative
(n = 122). The subjects (51% female, 49% male) are students
in the 5th (n = 26; mean age = 10.19), 7th (n = 46; mean age =
12.33) and 10th (n = 50; mean age = 15.16) grades from four
different public schools of the Portuguese basic schooling system.
Each student wrote one narrative and one argumentative text
in response to two different writing tasks, described below:
Narrative task: Narrate a remarkable event (real or imagined)
that you and your best friend experienced during last summer's
vacation. (Narra um facto marcante (real ou imaginado) que tu e teu(tua)
melhor amigo(a) viveram durante o último verão.)

Argumentative task: Do you think social networks (Facebook,
Twitter, Google+, Windows Live Space, etc.) are important
today? Write a text to be published on the school blog in which you
express your opinion on social networks. In this text, you should
say whether you are for or against the existence of social
networks. Do not forget to justify your opinion! (Achas que as redes
sociais (Facebook, Twitter, Google+, Windows Live Space, etc.) são
importantes hoje em dia? Escreve um texto para ser publicado no blogue da
tua escola em que exponhas a tua opinião sobre as redes sociais. Neste
texto deves dizer se és a favor ou contra a existência das redes sociais. Não
te esqueças de justificar a tua opinião!)

The students' parents were informed about the general objective of
the research (a correlational study between textual complexity
measures and school progression) and signed a consent form.

CODES was automatically part-of-speech tagged using the memory-based
tagger (MBT) (Daelemans et al., 1996), with a tag set of 80
different tags. It was automatically lemmatized using a
Portuguese version of the MBLEM lemmatizer (van den Bosch
and Daelemans, 1999). CODES contains approximately 48,000 words.
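As a quick consistency check on the reported figures, the sketch below derives the corpus totals from the per-grade counts: 26 + 46 + 50 students, each writing two texts, should give the 244 texts stated above. The pooled mean age and the average text length are derived values that the description itself does not report; they are only approximate, because the per-grade mean ages are rounded and the word count is itself approximate. The function and constant names are illustrative, not part of any CODES tooling.

```python
# Figures as reported in the CODES description.
GRADES = {
    # grade: (number of students, mean age in years)
    5: (26, 10.19),
    7: (46, 12.33),
    10: (50, 15.16),
}
TEXTS_PER_STUDENT = 2  # one narrative + one argumentative text
APPROX_WORDS = 48_000  # "approximately 48,000 words"


def corpus_summary(grades, texts_per_student, approx_words):
    """Derive totals and simple averages from the per-grade figures."""
    n_students = sum(n for n, _ in grades.values())
    n_texts = n_students * texts_per_student
    # Mean age over all students, weighting each grade by its size.
    pooled_mean_age = sum(n * age for n, age in grades.values()) / n_students
    # Rough average text length; only as precise as the word count.
    words_per_text = approx_words / n_texts
    return {
        "students": n_students,
        "texts": n_texts,
        "pooled_mean_age": round(pooled_mean_age, 2),
        "approx_words_per_text": round(words_per_text),
    }


if __name__ == "__main__":
    summary = corpus_summary(GRADES, TEXTS_PER_STUDENT, APPROX_WORDS)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Run as a script, this prints 122 students and 244 texts, matching the stated design, along with a pooled mean age of about 13.03 years and roughly 197 words per text.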

Cite the corpus as follows:

Martins, Mário. 2015. CODES. Centro de Linguística da

For more information about CODES, please contact Mário
Martins (

