BCB332 ESS33 Getting GBIF Data, Preparing Data for DIVA, and Import into DIVA

Tutorial Purpose: (1) Gain familiarity with GBIF, create a user account, and download species
occurrence data for the appropriate species; (2) modify the GBIF download to prepare it for import
into DIVA; and (3) import into DIVA to produce a final shapefile.

By the end of this tutorial you should have in your ‘Species Data’ folder of the ‘Video Project’ directory
(established in Part 1):

1) A saved original copy of your GBIF .csv file
2) A saved original copy of your GBIF data in .xlsx or .xls file format
a. This spreadsheet should have all records and all columns of the original data
3) A saved modified copy of your GBIF data in .xlsx or .xls file format
a. This final version should include a limited number of columns, have at most 3000
records, and have properly labeled column headers in the first row of the spreadsheet.
4) A saved modified copy of your GBIF data in .txt tab delimited file format
a. This final version should include a limited number of columns, have at most 3000
records, and have properly labeled column headers in the first row of the spreadsheet.

Part 1: Downloading species data from GBIF

In order to download data from GBIF, you must have a user account. User accounts are free and only
require a valid email address.

Steps for creating a GBIF user account

1) Using your favorite browser go to http://www.gbif.org (GBIF - Global Biodiversity Information
Facility)
2) Select to ‘Create a user account’ at the top right.
3) Fill out the prompts with appropriate information. The email address listed must be valid.
4) A confirmation email will be sent to you. Open the email and follow the link to activate your
account.

Steps for downloading data

1) Once you have your GBIF account set up, launch http://gbif.org
2) Navigate to and hover over “Data” in the top right
3) From the drop down menu, click on “Species”. It may seem like you should use “occurrences”
but it is necessary to check that you have the right species, subspecies or variety before
investigating occurrences.
4) Enter the species you have chosen in the search block (e.g. Arachis hypogaea L., the peanut
or groundnut) and press “Search”
a. The site will return a list of corresponding species names. It is important to check that
you have chosen the correct species, subspecies or variety for your assignment, as in crop
plants a single species may have been bred into numerous very different cultivars, or a wild
type of a species may now have no significance as a food crop.

GBIF opening page once you have logged in with your new account

5) When you are sure you have the correct species/subspecies,
6) select that species (click on the blue species name in the list):
a. For example: Ipomoea batatas

Click on the link that seems to make the most sense. In this case, the first return
appears to make the most sense.

In the above image you can see 29 381 records (top right), but above the map it says
15 666 records are georeferenced, which means they have latitude and longitude
values and can be mapped in your GIS (we call this georeferenced data).
7) Georeferenced data is:
a. Biodiversity georeferenced data are data where each record of a species (usually a specimen
in a collection or an observation, though there are other record types as well) has a latitude
and longitude which places it on the surface of the earth as a point. The coordinates may have
been recorded by the collector directly or calculated from the description provided by the
collector. You preferably want over five hundred georeferenced records. If you have fewer than
500 records, let one of the tutors or me check it and possibly allocate another species. Many
common species have only a few records; should this be the case, we will need to choose
another species for your assignment.
b. If the number of georeferenced records is sufficient click on the “[Occurrences]”.
c. It is possible that you have a huge number of records. We aim to get about 3000 records. You
certainly should have more than 2000 records, but it could be difficult to work with much
more than 3000. There is a time slider, which helps you sample your download. I set it
in the image below to download only records from 2000 onwards. More recent records
are likely to be more accurate as they would likely have been collected using a GPS
receiver.

d. Adjust the slider, then click on the “Explore” tab to see how many records
will be included with this filter.

8) This will take you to the “Search occurrences” page. Select the large “Download” button at the
top right. Be sure to download as “Simple CSV” – this will output a tab delimited table file
compatible with Microsoft Excel.
a. You must be logged in to download GBIF data.

You will get this warning dialogue box – the details of which you need to pay close
attention to:

Once you accept you will get the following message:

9) You will receive an email with a download link.

10) Select the link in the email.
a. Your occurrence data will download as a zipped file containing the georeferenced
specimen or observational records GBIF has for your crop species. If you used the year
filter, it will reflect only those records.
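A similar year cut-off can also be checked or applied locally once the file has been read. The following is only a minimal sketch in Python, not part of the GBIF website or of DIVA. It assumes the column is named year, as in the column list later in this tutorial, and the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_year(rows, first_year=2000):
    """Keep records whose 'year' field is a whole number >= first_year."""
    kept = []
    for row in rows:
        year_text = (row.get("year") or "").strip()
        # Records with a missing or non-numeric year are dropped.
        if year_text.isdigit() and int(year_text) >= first_year:
            kept.append(row)
    return kept


# Made-up sample in a tab-delimited layout (not real GBIF records).
sample = "gbifid\tyear\n1\t1985\n2\t2004\n3\t\n4\t2000\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
recent = filter_by_year(rows)
print(len(recent), "of", len(rows), "records are from 2000 onwards")
```

Dropping records with no year at all is a choice made for this sketch; you may prefer to keep them and inspect them by hand.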

Unzipping your data

1) Create a project folder on your computer for your Project files only
a. This directory should be organized – consider creating the Project folder in the same
location as the Tutorial 2 data files for Climate Data and GIS Data.
b. The remainder of this tutorial will assume this file structure:

1. This organizational structure is not required; HOWEVER, it is critical to
stay organized in your data and files to complete this project.

2) Find your downloaded zip file acquired from GBIF
3) Right click on the zip file.
4) Select to “Extract files…”

5) An ‘Extraction path and options’ window will open
6) Extract the file to the ‘Species Data’ folder in your ‘Video Project’ directory:
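If you prefer a script to the right-click menu, the extraction steps above could be sketched in Python roughly as follows. This is an illustration only: the function name extract_gbif_zip and the file name occurrences.csv are hypothetical, and the demo builds a throwaway archive in a temporary folder so that it runs without a real GBIF download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_gbif_zip(zip_path, target_dir):
    """Extract every file in the archive into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Demo with a throwaway archive standing in for the GBIF download.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("occurrences.csv", "gbifid\tyear\n1\t2004\n")
    species_dir = Path(tmp) / "Video Project" / "Species Data"
    names = extract_gbif_zip(demo_zip, species_dir)
    extracted_ok = (species_dir / "occurrences.csv").exists()

print(names, extracted_ok)
```

For your own data you would pass the path of the zip file you downloaded and the path of your ‘Species Data’ folder.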

Part Two: Importing the species occurrence data into Excel and creating a GIS-usable data set.

The GBIF data you have downloaded and unzipped is a single CSV file. This file is not ready to be
imported into DIVA GIS – the data format must first be changed and insignificant and obviously
incorrect data should be removed. Follow the steps below to ready the data.
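As one illustration of what "obviously incorrect" can mean, the sketch below (Python, not part of the tutorial's Excel workflow) flags records whose coordinates are missing, non-numeric, or outside the valid ranges. The column names follow the ones used later in this tutorial. Treating the point 0, 0 as suspicious is an assumption of this sketch, since it often appears as a placeholder value.

```python
def has_valid_coordinates(row):
    """Return True if the row has a usable latitude and longitude."""
    try:
        lat = float(row["decimallatitude"])
        lon = float(row["decimallongitude"])
    except (KeyError, TypeError, ValueError):
        return False  # missing column, empty cell, or non-numeric text
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False  # outside the valid range for decimal degrees
    # Assumption: exactly 0, 0 is treated as a placeholder, not a real point.
    return not (lat == 0 and lon == 0)


# Made-up rows for illustration.
rows = [
    {"gbifid": "1", "decimallatitude": "-33.9", "decimallongitude": "18.4"},
    {"gbifid": "2", "decimallatitude": "", "decimallongitude": "18.4"},
    {"gbifid": "3", "decimallatitude": "95.0", "decimallongitude": "10.0"},
    {"gbifid": "4", "decimallatitude": "0", "decimallongitude": "0"},
]
clean = [row for row in rows if has_valid_coordinates(row)]
print([row["gbifid"] for row in clean])
```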

Steps for importing the occurrence dataset into excel


Your first step is to open this data file in Excel. This differs from how you normally work
with files: you have to open Excel, direct it to the text file (CSV or tab-delimited), and
actually import the data.
1. Open a new Excel document
2. Click on the Data tab in the top menu
3. Select the “From Text” button within the “Get External Data” submenu
4. An ‘Import Text File’ window will appear
5. Navigate to the CSV file you have just unzipped and click ‘Import’
6. The “Text Import Wizard” will appear as a popup.
a. Choose the “Delimited” option on the first screen and click “Next”
b. Check “Tab” on the second screen for tab delimited data and click “Next”
c. Don’t change anything on the third screen; click “Finish”
d. Accept the import data defaults by clicking “OK”.
e. The data should appear from cell A1 with the column headings in row 1
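The same import can be sketched outside Excel. The example below is a minimal Python illustration of reading tab-delimited text with the headings taken from the first row; it is not what Excel does internally. Turning quote handling off (csv.QUOTE_NONE) is an assumption made here so that stray quotation marks in a field are read literally, and the two sample records are made up.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return (column headings, list of records) from tab-delimited text."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames  # read from the first row
    return header, list(reader)


# Made-up sample standing in for the unzipped GBIF file.
sample = (
    "gbifid\tcountrycode\tdecimallatitude\tdecimallongitude\tyear\n"
    "101\tZA\t-33.9\t18.4\t2005\n"
    "102\tBR\t-15.8\t-47.9\t2011\n"
)
header, records = read_tab_delimited(io.StringIO(sample))
print(header)
print(len(records), "records")
```

For a real file you would pass an open file instead of the sample text, for example open(path, newline="", encoding="utf-8"), where path is wherever you extracted your CSV.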

Save a copy of the data


Save and make a copy of the Excel spreadsheet you have just generated. Having a backup of this table is
critical in case you make errors or later need a copy of the original to roll back to.

1) Save the excel spreadsheet twice; both times the files should be saved in the default .xlsx or .xls
file formats:
a. First as the original
b. Second as a workable spreadsheet.

Understanding the data

The data table you have generated contains information related to occurrences of your particular species.
Each occurrence and all its associated information is contained in a single row in the table (one record).

The columns of each row correspond to particular information about that occurrence. For example, the
decimallatitude column indicates the latitude in decimal degrees of each occurrence.
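To make the row and column idea concrete, here is a small Python sketch in which each record is a dictionary keyed by column name. It counts records per country code. The rows are invented for illustration and the column names are assumed to match the ones listed in this tutorial.

```python
from collections import Counter


def records_per_country(rows):
    """Count how many records (rows) carry each country code."""
    return Counter(row["countrycode"] for row in rows)


# Each dictionary is one record; each key is one column.
rows = [
    {"gbifid": "101", "countrycode": "ZA", "decimallatitude": "-33.9"},
    {"gbifid": "102", "countrycode": "ZA", "decimallatitude": "-29.1"},
    {"gbifid": "103", "countrycode": "BR", "decimallatitude": "-15.8"},
]
counts = records_per_country(rows)
print(counts.most_common())

# The latitude of the first record, converted to a number in decimal degrees.
first_latitude = float(rows[0]["decimallatitude"])
print(first_latitude)
```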

Delete unnecessary columns

There are a lot of columns (fields) in the download and, hopefully, as many rows (records) as you
were promised by GBIF (depending on whether you used a search filter). Along with each location (latitude and
longitude) of the species record is a large amount of taxonomic and collections data. We will first have to
clean this up.

You will now tidy up the data set to make it more usable for the GIS modeling we are going to do in the
DIVA GIS program.

1) Keep the following columns, and look at other fields that could help you select the
best-quality data from the GBIF website.
a. gbifid
b. countrycode
c. decimallatitude
d. decimallongitude
e. year
2) All other columns can be deleted; these might include, but are not limited to, columns related to
depth, author, identified by, taxonomic information, data set key, and occurrence id.
3) Click on a column header letter (A, B, C… etc.)

4) Right click and select ‘Delete’
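The column clean-up above can also be expressed as a short script. This Python sketch keeps only the five columns named in step 1 and discards everything else. The extra columns in the sample row are stand-ins chosen for illustration, not a statement of exactly which fields your download contains.

```python
KEEP = ["gbifid", "countrycode", "decimallatitude", "decimallongitude", "year"]


def keep_columns(rows, columns=KEEP):
    """Return copies of the rows containing only the wanted columns, in order."""
    # A column that is absent from a row is filled with an empty string.
    return [{name: row.get(name, "") for name in columns} for row in rows]


# Made-up record with two extra columns that should be dropped.
rows = [
    {
        "gbifid": "101",
        "datasetkey": "abc",
        "countrycode": "ZA",
        "decimallatitude": "-33.9",
        "decimallongitude": "18.4",
        "depth": "",
        "year": "2005",
    }
]
trimmed = keep_columns(rows)
print(list(trimmed[0].keys()))
```

If you decide to keep additional quality-related fields, add their names to the KEEP list.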


Renaming columns

All column headers (text in row 1) must be less than 10 characters long and cannot contain spaces.
Rename all column headers to informative headers that meet these requirements; for example,
decimallatitude could be changed to lat. These are the abbreviations I would use:

a. gbifid
b. country
c. lat
d. long
e. year
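The two header rules (fewer than 10 characters, no spaces) are easy to check mechanically. The sketch below, in Python, applies the renames suggested above and refuses any new header that breaks the rules. The function names are hypothetical and the sample record is made up.

```python
RENAMES = {
    "gbifid": "gbifid",
    "countrycode": "country",
    "decimallatitude": "lat",
    "decimallongitude": "long",
    "year": "year",
}


def check_header(name):
    """True if the header is 1 to 9 characters long and contains no spaces."""
    return 0 < len(name) < 10 and " " not in name


def rename_columns(rows, renames=RENAMES):
    """Return copies of the rows with the column names replaced."""
    for new_name in renames.values():
        if not check_header(new_name):
            raise ValueError(f"header not allowed: {new_name!r}")
    # Columns without an entry in renames keep their original name.
    return [{renames.get(key, key): value for key, value in row.items()} for row in rows]


rows = [{"gbifid": "101", "countrycode": "ZA", "decimallatitude": "-33.9",
         "decimallongitude": "18.4", "year": "2005"}]
renamed = rename_columns(rows)
print(list(renamed[0].keys()))
```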

Limiting data to 3000 records

This step will only apply to groups that have more than 3000 occurrences of their species.

On our lab computers DIVA can handle only about 3000 records of occurrence data. If you have
many thousands of records you may have to make a subset, choosing 3000 records (or rows) to work with.
I have already suggested that you use the year filter in Part 1 to reduce the number of records before downloading.
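If the data set is still too large, one possible way to pick a subset is a random sample. The tutorial does not prescribe a method, so random sampling is an assumption of this Python sketch, as is the fixed seed, which is there only so that the same subset is drawn each time the script is run.

```python
import random


def sample_records(rows, limit=3000, seed=42):
    """Return at most `limit` rows, chosen at random without replacement."""
    if len(rows) <= limit:
        return list(rows)  # small enough already; keep everything
    rng = random.Random(seed)  # fixed seed makes the subset repeatable
    return rng.sample(rows, limit)


# Made-up data set of 5000 records.
records = [{"gbifid": str(i)} for i in range(5000)]
subset = sample_records(records)
print(len(subset))  # 3000
```

A random sample keeps the overall geographic spread of the records roughly intact, which simply taking the first 3000 rows may not.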

Part Three: Saving your work

You should save this final version of your spreadsheet in two file formats.

This final version should include a limited number of columns, have at most 3000 records, and have
properly labeled column headers in the first row of the spreadsheet.

First, save your spreadsheet as an .xlsx or .xls worksheet. Traditional .xlsx and .xls worksheets are easy to
open and manipulate. We are saving in this format in the event that the data needs to be further modified.

Second, save your final spreadsheet as a tab delimited text file (.txt). This file will be ultimately imported
into DIVA GIS.
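For reference, a tab delimited text file is simply one record per line with a tab character between columns and the headers on the first line. The following Python sketch writes such text from a list of records. It is an illustration of the format, not a replacement for Excel's "Save As", and the record shown is made up.

```python
import csv
import io


def write_tab_delimited(rows, header, handle):
    """Write the header row, then one tab-separated line per record."""
    writer = csv.DictWriter(
        handle,
        fieldnames=header,
        delimiter="\t",
        extrasaction="ignore",  # silently skip columns not listed in header
    )
    writer.writeheader()
    writer.writerows(rows)


header = ["gbifid", "country", "lat", "long", "year"]
rows = [{"gbifid": "101", "country": "ZA", "lat": "-33.9", "long": "18.4", "year": "2005"}]

buffer = io.StringIO()  # stands in for a .txt file on disk
write_tab_delimited(rows, header, buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

To write an actual file, pass an open file instead of the buffer, for example open("species.txt", "w", newline="", encoding="utf-8"), where species.txt is a hypothetical file name.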

Part Four: Import into DIVA

1) Launch DIVA GIS (Make sure you have closed Excel first)
2) Add the countries shapefile to the map (I will show you this shapefile in class)
3) Adding our species points onto the map
a. Select to Import Points to Shapefile – From text file (.TXT) from within the Data tab in
the top menu:

b. By importing these points, a shapefile has automatically been created. This shapefile has
been created within the same directory in which the initial TXT file was held; the shapefile
takes the same name as the TXT file
i. In the future, you can add this shapefile directly onto the map rather than having
to keep importing from the TXT file
