BCB332 ESS333 - Getting GBIF Data and Preparing Data For DIVA
Tutorial Purpose: (1) Gain familiarity with GBIF, create a user account, and download species
occurrence data for the appropriate species; (2) modify the GBIF download to prepare it for import into DIVA;
and (3) import it into DIVA to produce a final shapefile.
By the end of this tutorial you should have in your ‘Species Data’ folder of the ‘Video Project’ directory
(established in Part 1):
In order to download data from GBIF, you must have a user account. User accounts are free and only
require a valid email address.
GBIF opening page once you have logged in with your new account
Click on the link that makes the most sense. In this case, the first result
appears to be the best match.
In the above image you can see 29 381 records (top right), but above the map it says
15 666 records are georeferenced, which means they have latitude and longitude
values and can be mapped in your GIS (we call this georeferenced data).
7) Georeferenced data is:
a. Biodiversity georeferenced data are data where each record of a species (usually a specimen in a
collection or an observation, though there are other record types as well) has a latitude and longitude which
places it on the surface of the earth as a point. The position may have been recorded by the collector
directly or calculated from the description provided by the collector. You preferably
want over five hundred georeferenced records. If you have fewer than 500 records, let
one of the tutors or myself check it and possibly allocate another species. There are
many common species with only a few records; should this be the case, we will need to
choose another species for your assignment.
b. If the number of georeferenced records is sufficient, click on “[Occurrences]”.
c. It is possible that you have a huge number of records. We aim to get about 3000 records. You
certainly should have more than 2000 records, but it could be difficult to work with much
more than 3000 records. There is a time bar that helps you sample your download. In the
image below I set it to download only records from 2000 onwards. More recent records
are likely to be more accurate, as they were probably collected using a GPS
receiver.
d. Use this slider and then click on the “Explore” tab to see how many records
will be included with this filter.
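Once a download is in hand, the two figures discussed in this step (how many records are georeferenced, and how many survive the year filter) can also be checked offline. The sketch below is only an illustration in Python and is not part of the GBIF site. It assumes each record is a dictionary keyed by the column names `decimallatitude`, `decimallongitude`, and `year`; the function names and sample records are hypothetical.

```python
def is_georeferenced(record):
    """True if the record has a latitude and a longitude that both parse as numbers."""
    try:
        float(record["decimallatitude"])
        float(record["decimallongitude"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def from_year_onwards(record, first_year=2000):
    """True if the record's year is present and is first_year or later."""
    try:
        return int(record["year"]) >= first_year
    except (KeyError, TypeError, ValueError):
        return False


# Made-up records for illustration only.
records = [
    {"decimallatitude": "-33.93", "decimallongitude": "18.42", "year": "2012"},
    {"decimallatitude": "", "decimallongitude": "", "year": "2015"},
    {"decimallatitude": "-34.10", "decimallongitude": "18.80", "year": "1987"},
]

georeferenced = [r for r in records if is_georeferenced(r)]
recent = [r for r in georeferenced if from_year_onwards(r)]
print(len(records), len(georeferenced), len(recent))
```

Comparing the last number against the 2000 to 3000 record target tells you whether the year filter is too strict or too loose.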
8) This will take you to the “Search occurrences” page. Select the large “Download” button at the
top right. Be sure to download as “Simple CSV” – this will output a tab delimited table file
compatible with Microsoft Excel.
a. You must be logged in to download GBIF data.
You will get this warning dialogue box – the details of which you need to pay close
attention to:
Once you accept you will get the following message:
10) Select the link in the email.
a. Your occurrence data will download as a zipped file containing the georeferenced
specimen or observational records GBIF has for your crop species. If you used the year
filter, it will contain only those records.
1) Create a project folder on your computer for your Project files only
a. This directory should be organized – consider creating the Project folder in the same
location as the Tutorial 2 data files for Climate Data and GIS Data.
b. The remainder of this tutorial will assume this file structure:
5) An ‘Extraction path and options’ window will open
6) Extract the file to the ‘Species Data’ folder in your ‘Video Project’ directory:
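The extraction can also be done without a graphical unzip tool. The following is a minimal sketch using Python's standard `zipfile` module; the function name `extract_download` and the zip file name `gbif_download.zip` are hypothetical, so substitute the name of the file GBIF actually sent you and the real location of your project folder.

```python
import zipfile
from pathlib import Path


def extract_download(zip_path, dest_dir):
    """Extract every file in the zip into dest_dir and return the archived file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical paths: replace with your own download name and project location.
    zip_file = Path("Video Project") / "gbif_download.zip"
    if zip_file.exists():
        print(extract_download(zip_file, Path("Video Project") / "Species Data"))
```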
Part Two: Importing the species occurrence data into Excel and creating a GIS-usable data set.
The GBIF data you have downloaded and unzipped is a single CSV file. This file is not ready to be
imported into DIVA GIS: the data format must first be changed, and insignificant and obviously
incorrect data should be removed. Follow the steps below to prepare the data.
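What counts as "obviously incorrect" is left to your judgment. As one possible reading for coordinate data, the sketch below flags positions that are missing, fall outside the valid latitude and longitude ranges, or sit at exactly 0, 0. These criteria and the function name are assumptions made for illustration, not rules stated in this tutorial.

```python
def has_plausible_coordinates(lat_text, lon_text):
    """True if both values parse as numbers within valid latitude/longitude ranges."""
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except (TypeError, ValueError):
        return False
    if lat == 0 and lon == 0:
        # Assumption: an exact 0, 0 position is treated as a placeholder, not a real locality.
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


# Made-up coordinate pairs: valid, empty, 0/0, and latitude out of range.
checks = [("-33.93", "18.42"), ("", ""), ("0", "0"), ("95.0", "18.42")]
print([has_plausible_coordinates(lat, lon) for lat, lon in checks])
```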
4. An ‘Import Text File’ window will appear
5. Navigate to the CSV file you have just unzipped and click ‘Import’
6. The “Text import wizard” will appear as a popup.
a. Choose the “Delimited” option on the first screen and click “Next”
b. Check “Tab” on the second screen for tab-delimited data and click “Next”
c. Don’t change anything on the third screen; click “Finish”
d. Accept the import data defaults by clicking “OK”.
e. The data should appear from cell A1 with the column headings in row 1
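For comparison, the wizard settings above (delimited, tab as the separator, headings in the first row) correspond to the following minimal Python sketch using the standard `csv` module. The function name and the sample text are hypothetical; a real GBIF download has many more columns.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return (header, rows) from a tab-delimited text stream."""
    # QUOTE_NONE keeps quotation marks inside fields as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)  # first line holds the column headings
    rows = [row for row in reader]
    return header, rows


# A tiny stand-in for the GBIF file.
sample = io.StringIO(
    "gbifid\tcountrycode\tdecimallatitude\tdecimallongitude\tyear\n"
    "1001\tZA\t-33.93\t18.42\t2012\n"
    "1002\tZA\t-34.10\t18.80\t2005\n"
)
header, rows = read_tab_delimited(sample)
print(header)
print(len(rows), "records")
```

To read the real file, pass `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` object.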
1) Save the Excel spreadsheet twice; both times the files should be saved in the default .xlsx or .xls
file formats:
a. First as the untouched original
b. Second as a working copy that you will edit.
The data table you have generated contains information related to occurrences of your particular species.
Each occurrence and all its associated information is contained in a single row in the table (one record).
The columns of each row correspond to particular information about that occurrence. For example, the
decimallatitude column indicates the latitude in decimal degrees of each occurrence.
Delete unnecessary columns
There are a lot of columns (fields) in the download and, hopefully, as many rows (records) as you
were promised by GBIF (depending on whether you used a search filter). Along with each location (latitude and
longitude) of the species record is a large amount of taxonomic and collections data. We will first have to
clean this up.
You will now tidy up the data set to make it more usable for the GIS modeling we are going to do in the
DIVA GIS program.
1) Ensure the following columns are kept. You should also look at other fields that could be
useful for selecting the best-quality data from the GBIF website.
a. gbifid
b. countrycode
c. decimallatitude
d. decimallongitude
e. year
2) All other columns can be deleted; these might include, but are not limited to, columns related to
depth, author, identified by, taxonomic information, data set key, and occurrence ID.
3) Click on a column header letter (A, B, C… etc.)
4) Right click and select ‘Delete’
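With dozens of columns, deleting them one at a time is slow. As an alternative sketch, the same result can be reached by keeping only the named columns. This assumes the table is held as a header list plus a list of rows, and that the headers are spelled exactly as listed in step 1; spelling or capitalization may differ in your download, in which case adjust `KEEP`. The function name is hypothetical.

```python
KEEP = ["gbifid", "countrycode", "decimallatitude", "decimallongitude", "year"]


def keep_columns(header, rows, keep=KEEP):
    """Return a new header and rows containing only the columns named in keep."""
    missing = [name for name in keep if name not in header]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    positions = [header.index(name) for name in keep]
    trimmed = [[row[i] for i in positions] for row in rows]
    return list(keep), trimmed


# Made-up example with two unwanted columns (datasetkey, depth).
header = ["gbifid", "datasetkey", "countrycode", "decimallatitude",
          "decimallongitude", "depth", "year"]
rows = [["1001", "abc", "ZA", "-33.93", "18.42", "", "2012"]]
new_header, new_rows = keep_columns(header, rows)
print(new_header)
print(new_rows)
```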
Renaming columns
All column headers (text in row 1) must be fewer than 10 characters long and cannot contain spaces.
Rename all column headers to informative headers that meet these requirements; for example,
decimallatitude could be changed to lat. These are the abbreviations I would use:
a. gbifid
b. country
c. lat
d. long
e. year
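The renaming rule is easy to check mechanically. The sketch below applies the abbreviations listed above and then reports any header that is 10 or more characters long or contains a space. The function names are hypothetical, and the code assumes the original headers are spelled as in the column list earlier in this tutorial.

```python
RENAMES = {
    "countrycode": "country",
    "decimallatitude": "lat",
    "decimallongitude": "long",
}


def rename_headers(header, renames=RENAMES):
    """Apply the renames, leaving unlisted headers (gbifid, year) unchanged."""
    return [renames.get(name, name) for name in header]


def invalid_headers(header):
    """Return headers that break the rule: fewer than 10 characters, no spaces."""
    return [name for name in header if len(name) >= 10 or " " in name]


old = ["gbifid", "countrycode", "decimallatitude", "decimallongitude", "year"]
new = rename_headers(old)
print(new)
print(invalid_headers(old))  # headers that needed renaming
print(invalid_headers(new))  # should be empty after renaming
```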
Limiting data to 3000 records
This step will only apply to groups that have more than 3000 occurrences of their species.
On our lab computers DIVA can handle only about 3000 records of occurrence data. If you have
many thousands of records, you may have to make a subset by choosing 3000 records (rows) to work with.
I have already suggested that
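The tutorial does not prescribe how the 3000 rows should be chosen. One option, shown here purely as an assumption, is a simple random sample, which avoids favoring whichever records happen to sit at the top of the spreadsheet. The function name, the fixed seed, and the placeholder rows are all hypothetical.

```python
import random


def subsample(rows, limit=3000, seed=1):
    """Return at most `limit` rows chosen at random, keeping their original order."""
    if len(rows) <= limit:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the same subset is drawn on every run
    chosen = sorted(rng.sample(range(len(rows)), limit))
    return [rows[i] for i in chosen]


rows = [[str(i)] for i in range(10000)]  # placeholder records
subset = subsample(rows)
print(len(subset))
```

If you filtered by year on the GBIF site and already have 3000 records or fewer, the function returns the rows unchanged.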
You should save this final version of your spreadsheet in two file formats.
This final version should include a limited number of columns, have at most 3000 records, and have
properly labeled column headers in the first row of the spreadsheet.
First, save your spreadsheet as an .xlsx or .xls worksheet. Traditional .xlsx and .xls worksheets are easy to
open and manipulate. We are saving in this format in the event that the data needs to be further modified.
Second, save your final spreadsheet as a tab-delimited text file (.txt). This file will ultimately be imported
into DIVA GIS.
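For reference, a tab-delimited text file is simply one record per line with a tab character between fields. The minimal sketch below writes such a file with Python's `csv` module; the function name and the sample data are hypothetical, and an in-memory buffer stands in for the real .txt file.

```python
import csv
import io


def write_tab_delimited(handle, header, rows):
    """Write the header and rows as tab-delimited text, one record per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


header = ["gbifid", "country", "lat", "long", "year"]
rows = [["1001", "ZA", "-33.93", "18.42", "2012"]]

# The buffer stands in for a real file such as open("species.txt", "w", newline=""),
# where "species.txt" is a hypothetical file name.
buffer = io.StringIO()
write_tab_delimited(buffer, header, rows)
print(buffer.getvalue())
```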
1) Launch DIVA GIS (Make sure you have closed Excel first)
2) Add the countries shapefile to the map – this shapefile (I will show you in class)
3) Adding our species points onto the map
a. Select Import Points to Shapefile – From Text File (.TXT) from the Data tab in
the top menu:
b. By importing these points, a shapefile has automatically been created. This shapefile has
been created in the same directory where the initial TXT file was held; the shapefile
takes the same name as the TXT file
i. In the future, you can add this shapefile directly onto the map rather than having
to keep importing from the TXT file