How To Create An Online Corpus
Abstract:
The aim of this paper is to provide beginners with a step-by-step guideline for building a
corpus. Building a corpus used to be a very time-consuming part of linguistic research, but it
is now much easier to build one. A lot of retrievable electronic text can be found on the web.
In order to build a corpus, a number of factors need to be taken into consideration. This
paper describes the factors that are important to know when building a written or spoken
corpus.
Introduction:
Here we explain some steps by which a spoken or written corpus can be created.
A number of software tools are available for analyzing a corpus; some are freeware and
some require payment to use.
AntConc http://www.antlab.sci.waseda.ac.jp/software.html
Wordsmith http://lexically.net/
Monoconc http://www.monoconc.com/
CasualConc https://sites.google.com/site/casualconc/
Wmatrix http://ucrel.lancs.ac.uk/wmatrix/
SketchEngine http://www.sketchengine.co.uk/
To create a corpus in the interface, log in and go to the home page (if you are already logged in, you can
get to this page by clicking Home at the top right of the screen).
The five sections on this page describe:
2- Add a file:
If you select add a new file you have options to
Once you have created a corpus, many tools are available to you in the left-hand side panel. Select
the corpus by clicking on its name on the home page; under the Corpus heading in the left-hand side
menu you can:
Instead of saving it as a .doc or .docx file, we’re going to save it as a .txt file on the desktop.
Go to File > Save As > [article title], and in the drop-down menu labeled SAVE AS TYPE
choose the file type “Plain Text (.txt)”.
This will give you a warning: “Saving as a text file will cause all formatting, pictures, and objects
in your file to be lost.” You also get an option for Text Encoding. Select Other Encoding:
“Unicode (UTF-8)” and click OK.
Go to the desktop and check that the file has been saved there. (Depending on some
settings, it might save as ‘Cameron.txt’, or it might just save as ‘Cameron’.)
To be safe, make sure every file is saved with the .txt suffix! Each file you want to use in your
corpus must be a plain text file for AntConc to use it. You can open the file in Notepad to check
what it looks like.
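The two checks described above (a .txt suffix and UTF-8 encoding) can also be run over a whole folder of files. The following is a minimal sketch, not part of AntConc; the function name `check_corpus_folder` and the wording of the problem messages are assumptions made for illustration.

```python
from pathlib import Path


def check_corpus_folder(folder):
    """Return (file name, problem) pairs for files that are not corpus-ready.

    Hypothetical helper: a file counts as ready if it has a .txt suffix
    and its bytes decode cleanly as UTF-8.
    """
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() != ".txt":
            problems.append((path.name, "missing .txt suffix"))
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append((path.name, "not valid UTF-8"))
    return problems
```

Calling `check_corpus_folder` on the folder that holds the saved articles returns an empty list when every file is ready to load; any file that is listed can be re-saved from the word processor with the settings described above.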
Further reading on corpus construction:
Wynne, M. (ed.) (2005). Developing Linguistic Corpora: a Guide to Good Practice. Oxford: Oxbow
Books. http://www.ahds.ac.uk/creating/guides/linguistic-corpora/
Getting AntConc:
Please go to http://www.laurenceanthony.net/software/antconc/releases/AntConc324/ and
download the file Antconc.exe (for PCs). On Internet Explorer, you will be asked whether
you want to RUN or SAVE the file; select RUN, which takes you directly to the
security warning. On Firefox, select SAVE FILE, then RUN the downloaded file.
You want to run the software, so click RUN in the security warning dialog box.
Getting Started:
When AntConc launches, it will look like this.
On the left-hand side, there is a window to see all corpus files loaded.
Further help and resources are linked about a third of the way down the software page,
after Citing/Referencing AntConc. Here is a selection:
https://groups.google.com/forum/#!forum/antconc
AntConc3.2.0 Help
AntConc3.1.2 Help
Various video tutorials (https://www.youtube.com/user/AntlabJPN)
#corpusMOOC / https://www.futurelearn.com/courses/corpus-linguistics (running
again September 2014)
Search Operators
The * operator (zero or more characters) can help, for instance, to find both the singular and
the plural forms of nouns.
Example: search for qualit*, then sort the results. What tends to precede and
follow quality and qualities? (Note that quality* would not find qualities, because the plural
does not begin with the letters "quality".) For a full list of available wildcard operators and
what they mean, go to Global Settings > Wildcard Settings.
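To see what a * search does outside the tool, the idea can be sketched in a few lines of Python. This is an approximation written for illustration, not AntConc's own implementation: the helper names `wildcard_to_regex` and `concordance` are hypothetical, * is assumed to stand for zero or more word characters, and matching is assumed to ignore case. Note that a pattern such as qualit* covers both quality and qualities, whereas quality* only matches words that begin with "quality".

```python
import re


def wildcard_to_regex(pattern):
    """Turn a search term in which * means zero or more characters into a regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def concordance(text, pattern, width=3):
    """Return (left context, match, right context) tuples.

    `width` is the number of words kept on each side of the match.
    """
    regex = wildcard_to_regex(pattern)
    tokens = re.findall(r"\w+", text)  # crude word tokenizer, for illustration
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = "The quality of life matters. Good qualities are rare. Air quality improved."
hits = concordance(sample, "qualit*", width=2)
# Sort by the words that follow the match, similar to sorting a concordance.
hits_sorted = sorted(hits, key=lambda line: line[2].lower())
```

Sorting on the right-hand context, as in the last line, groups together the lines in which the same words follow the search term, which makes recurring patterns easier to spot.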
Spoken corpus:
Having said that a spoken corpus can be created as well, the question arises of how to make
one. There are five easy steps for making a spoken corpus, presented by Cambridge University students.
4. Record
Each recording should be between 10 minutes and 2 hours long. If you're worried that the
conversation might dry up, you could always think about some topics beforehand.
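The recommended length can be checked automatically once the recordings are on disk. The sketch below assumes the recordings are saved as uncompressed WAV files, which the text does not specify; the function names and the constants `MIN_SECONDS` and `MAX_SECONDS` are hypothetical and simply encode the 10 minute to 2 hour range given above.

```python
import wave

MIN_SECONDS = 10 * 60      # 10 minutes
MAX_SECONDS = 2 * 60 * 60  # 2 hours


def recording_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def length_ok(path):
    """True if the recording falls inside the recommended length range."""
    return MIN_SECONDS <= recording_seconds(path) <= MAX_SECONDS
```

Recordings in other formats (for example MP3) would need a different library to read their duration; the range check itself stays the same.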