Tanner Data Management Model Survey


Data Management Model Survey

The University of Sydney


School of Geosciences

Data Management Model Survey

by

Edwina Tanner
Andy Myers
Shaun Kim
Mark Richards

December 2008

This project was funded by the Australian Partnership for Sustainable Repositories

Table of Contents

1. Introduction and Justification
1.2. Australian Code for the Responsible Conduct of Research
1.3. Need for an asset management system at departmental level
1.4. Improving data management within the school
1.5. Policy Considerations
1.6. Model objectives

2. Existing University Data Management Systems
2.1. eScholarship Repository
2.2. The Sydney Digital Thesis Project
2.3. Integrated Research Management Application (IRMA)
2.4. Higher Education Research Data Collection (HERDC)
2.5. Research Information Management System (RIMS)
2.6. Australian National Data Service (ANDS)

3. Synopsis of School of Geosciences Survey Results
3.1. Senior Computer Systems Officer, School of Geosciences
3.2. Administration Officer, School of Geosciences
3.3. Postgraduate Student Representative, School of Geosciences
3.4. Academic Representative, Head of School and Post-Graduate Coordinator
3.5. Digital Project Analyst and Head of Information Communication Technology (ICT)
3.6. Digital Repository Coordinator Sydney eScholarship and liaison Librarian SciTech Library

4. Analysis of Research and Survey Results
4.1. Current Workflow: Researchers
4.2. Current Workflow: Students - PhD and Masters by Research
4.3. Current Workflow: Students - Honours
4.4. Outcomes of Analysis

5. Theses Workflow Proposals
5.1. Digital Theses Workflow
5.2. PhD and Masters (Research) Theses Workflow
5.3. Honours Theses Workflow
5.4. Recommendations from Theses Workflow Analysis

6. Unpublished Research Material Proposal

7. Raw Digital Research Data and Objects
7.1. Student Data Collections
7.2. Use of the Institutional Repository for Managing Data Collections
7.3. Research Data Collections
7.4. GIS and Specialised Research Data Collections
7.5. University of Columbia Data Catalogue Example
7.6. The Use of Discipline Specific Repositories
7.7. Submission of data with Journal Articles
7.8. End-to-End Data Management Plan

8. Physical Assets

9. Data Management Models
9.1. Digital Thesis Department Management Model
9.2. Digital Thesis Institutional Repository Management Model
9.3. Geosciences End-to-End Data Management Model
9.4. Geosciences Data Centre Management Model

9. Conclusion

APPENDICES
Appendix 1 - Dublin Core Metadata Fields used in the Sydney Digital Theses Program
Appendix 2 - Managing University of Sydney Faculty of Science undergraduate and postgraduate theses within the Library’s eScholarship repository
Appendix 3 - eScholarship Copyright Release Form
Appendix 4 - Example of Metadata Required to Describe a Dataset
Appendix 5 - Spatial Data Sets held in the Department of Geosciences
Appendix 6 - Sample FGDC Content Standard
Appendix 7 - University of Columbia - Data Catalogue
Appendix 8 - Summary of Recommendations

REFERENCES

EXECUTIVE SUMMARY

The value of institutional data is increased through its widespread and appropriate use; its value
is diminished through misuse, misinterpretation, or unnecessary restrictions to its access.1

An increasing number of digital objects are being produced on a daily basis by researchers at the
University of Sydney. These materials include journal articles, theses, images and datasets. The
University increasingly has the responsibility for providing services to store, curate, disseminate
and preserve outputs from these research activities.

Departments, faculties, research groups and individuals are responsible for the creation, reuse and
management of research outputs, including data, in conformance with both internal and external
policies. Ready access to data, whether as part of a collaboration, for wider reuse, long-term
preservation, or to meet security or ethical obligations, is increasingly important. The amount and
complexity of digital objects produced or used within a research context is growing beyond the point
at which a single project or unit can be expected to provide the supporting infrastructure.2

In the increasingly competitive field of research, the key to success is collaboration and the ability
to efficiently find and use quality data that is ready to be assimilated into a project. In striving for
a “handle once, use many times” approach to data management, standards and policy guidelines
need to be issued that are balanced against the technical requirements and time constraints of
academics and researchers.

With the loss of data curator positions and a lack of rigid workflow guidelines or policies within a
University department, the lines of responsibility for data management duties have become blurred,
resulting in loss of data and duplication of effort. In order to implement best data management
practices within a department, we examined the structure and workflows of existing systems so as to
exploit their full potential and streamline activities.

Using the School of Geosciences as an example, this survey describes a data management model
which identifies how the valuable documents and data created by a University department can be
managed effectively using existing systems through the implementation of rules and guidelines
along with workflow strategies.

1 University of Utah Institutional Data Management Policy
2 Scoping Digital Repositories Services for Research Data Management

1. Introduction and Justification

Research policies are required that address the ownership of research materials and data,
their storage, their retention beyond the end of the project, and appropriate access to them by
the research community. 3

Academic excellence depends on building blocks of refined knowledge, which in
turn require ready access to quality data. When materials and data are managed
effectively, they become an asset that can be collated into a database structure
and accessed seamlessly by many research processes. Mismanagement of materials
and data leads to duplication of effort and to loss of, or difficulty in
recovering, data for use in future research projects.

1.2. Australian Code for the Responsible Conduct of Research

The responsible conduct of research includes the proper management and retention of the
research data. Retaining the research data is important because it may be all that remains of
the research work at the end of the project. While it may not be practical to keep all the
primary material (such as ore, biological material, questionnaires or recordings), durable
records derived from them (such as assays, test results, transcripts, and laboratory and field
notes) must be retained and accessible. 4

Well-managed data offers clear advantages, such as increased productivity and
cost savings to research projects through ready access to quality data in a
usable form. Such considerations are of increasing importance in a time of
financial prudence.

1.3. Need for an asset management system at departmental level


This project has identified significant shortcomings with regard to data and asset
management in a typical university school or department, and has developed a
management model that suggests cost-effective, workable solutions. The data and
asset management model proposed comprises policy/guidelines and an adaptable
framework to promote data and asset management through a focused, cooperative
effort. The project has considered the existing facilities available, both in the
department and at an institutional level, so as to determine how these can be best
utilised in the management of departmental data and assets.

3 Australian Code for the Responsible Conduct of Research
4 Australian Code for the Responsible Conduct of Research


1.4. Improving data management within the school

Improving data management within the school relies on an understanding of existing
systems and the shortcomings of current arrangements.

A “survey” of the key data handlers within the department identified the range of
data assets, including the publications and datasets created and/or owned by the School
of Geosciences, and reviewed how they are managed. The survey also reviewed the
structure and workflows of existing systems within the University of Sydney, such as
RIMS, HERDC, and the Library’s eResearch and Digital Thesis initiatives, to determine
how the department can best take advantage of these and other evolving data auditing
systems. The data management strategy outlined in this document is based on the
results of this research.

One of the main results of applying this audit strategy has been to identify a mechanism
to streamline data asset management and delivery to an identified repository so that the
assets can be safeguarded as well as copyright protected. This pilot project was
conducted using the School of Geosciences, which provided the initial data to test the
model. The School was selected as a case study to support the work conducted by
an earlier ad hoc project to manage the School’s theses.

1.5. Policy Considerations

Apart from the technical issues discussed in this document, there are also a number of data
policy issues that a department will need to address in order to effectively deploy a data
management model. Data management initiatives need to be supported and followed
through with policies which govern how the data collected by a department is to be
managed. Data licensing agreements should be developed to cover issues of security,
confidentiality and ownership of intellectual property. There will be some situations where
it is not appropriate to make information available and such data will be kept confidential.

1.6. Model objectives


The outcome of this project is a simple, high-level data model which can serve as a
prototype to test the structures and workflows necessary for a department to efficiently
and effectively manage its research assets. Whilst the School of Geosciences has been
selected to assist in the creation and testing of the model, the suggestions made are
intentionally generic, so that the model can be applied within any department.

This case study is the first step in what may be adopted as an accepted strategy for data
management within a university department. This strategy will ensure that the data
assets of a department are readily available and poised to take advantage of emerging
technologies, and that the valuable data collected by researchers is backed up in an
appropriate archive for safeguarding and can be made readily available for re-use in
additional research projects.

The objectives of the model are dependent on the needs of researchers, students and
administrative staff within the department. Understanding and appreciating the needs of
these users, and the constraints on them, is pivotal not only for constructing the model
but also for ensuring its long-term use and viability.

The primary objective is to propose cost-effective, workable solutions to the capture of
research assets (publications, datasets and material), which are currently not documented
by the school. This strategy will help prevent the loss of valuable research outputs through
the mismanagement of information.

Once implemented, the management model will have a secondary benefit of promoting the
School of Geosciences within the university, and plausibly the wider higher education
sector. This will result from increased exposure, which in turn may promote collaboration
and resource sharing whilst reducing the potential for research duplication.

2. Existing University Data Management Systems


This section provides a short description of some of the key systems which are available to
the research community at the University of Sydney to assist with its information
management requirements. A description of the Australian National Data Service (ANDS)
has also been included here as it is recognised as a key player in the future development of
data management policy and services for the management of Australian research data.

2.1. eScholarship Repository

Sydney eScholarship is an initiative of the University of Sydney Library. Sydney
eScholarship is a set of innovative services for the University of Sydney that integrates the
management of digital content with new forms of access and scholarly publication.
eScholarship is the current institutional repository, which can be used for the management
of a range of University of Sydney academic products.

Sydney eScholarship is built upon the Library's recognised expertise in creating, archiving,
managing, publishing and providing access to digital content, working in partnership
with other University services. Storage of and access to data are managed by three systems:
the Sydney eScholarship Repository, the Sydney Electronic Text and Image Service (SETIS)
digital collections, and Sydney Digital Theses. This report focuses on the eScholarship
repository and the Sydney Digital Theses Program.


Further information is available at: http://ses.library.usyd.edu.au/ses/about.html

2.2. The Sydney Digital Thesis Project

The Sydney Digital Thesis Project is a component of the Australian Digital Thesis
Program (http://adt.caul.edu.au/). Its aim was to provide the necessary framework and
operational infrastructure to enable the creation of a national database of postgraduate
theses produced at Australian universities. There are currently 28 full members of the
program, which provides access to over 1500 postgraduate theses. The theses are provided
in Portable Document Format (PDF) from individual universities' collections, and are
searched through a central database consisting of Dublin Core metadata provided at the
time of lodging the thesis. Appendix 1 describes the Dublin Core metadata fields used in
this project. This project accepts PhD and Masters (Research) theses exclusively.

Further information is available at: http://www.library.usyd.edu.au/theses/index.html

2.3. Integrated Research Management Application (IRMA)

In March 2006 the Research Office purchased the Integrated Research Management
Application (IRMA) as an interim system to capture publication data for the Higher
Education Research Data Collection (HERDC) and later for the Research Quality
Framework (RQF).


The purchase of IRMA met some of the Research Office’s immediate operational needs,
but provided a tactical rather than a strategic solution to the challenges of research
management.

Further information can be found at: http://www.usyd.edu.au/ro/herdc/irma.shtml

2.4. Higher Education Research Data Collection (HERDC)

All research staff and affiliates of the University must ensure that they participate in the
HERDC. For the Department of Geosciences this information is collected on a monthly
basis to ensure publications are submitted to the Coordinator in a timely fashion.
Previously these were submitted on an annual basis or directly to the Research Office,
which left room for omissions, errors and inefficiencies.

These efforts directly affect the amount of funding made available to the University for
research activities, so there is a great deal of motivation for academic staff to submit this
information. In the future, funding allocation may be based on the Excellence in Research
for Australia (ERA) initiative instead of HERDC.

The Higher Education Research Data Collection comprises research income and research
publications data submitted by universities each year. Returns are due by the end of June
every year.

Further information is available at:
http://www.dest.gov.au/sectors/research_sector/online_forms_services/higher_education_research_data_collection.htm

2.5. Research Information Management System (RIMS)

RIMS is an enterprise system, comprising a single, central data repository for research
administration information that is accessible to staff across the University and affiliated
institutions via a user-friendly web application.

This will remove the need for the Research Office and faculties to maintain multiple,
stand-alone data repositories (hard copies, electronic documents, spreadsheets, FileMaker
databases etc.) containing duplicate and inconsistent data.

The term RIMS is now used to describe just the system component of the wider Research
Management Program. The Research Management Program (RMP) is a joint business
improvement initiative, run by ICT in partnership with the University’s Research
portfolio.

The overall objective of the program is to formulate and deliver efficient and effective
research management and administration capability for the University. To achieve real
and lasting improvements, however, the Research Management Program needs to deliver
more than a new system.

Apart from RIMS, the program comprises three other linked components – processes,
culture/behaviour and organisation design – all of which are essential to the delivery of
enhanced University-wide research management capability.

Further information is available at: http://www.usyd.edu.au/ro/rmp/rims.shtml

2.6. Australian National Data Service (ANDS)

The Australian National Data Service (ANDS) aims to:

• influence national policy in the area of data management in the Australian research
community
• inform best practice for the curation of data
• transform the disparate collections of research data around Australia into a
cohesive collection of research resources.

The long term (ten year) objectives for data management within the Australian National
Data Service (ANDS) are to:

• Transform collections of Australian research data into a cohesive network of
research repositories
• Assist Australian research data managers to become experts in creating, managing
and sharing research data under well formed and maintained data management
policies
• Increase the amount of research data that is routinely deposited into stable,
accessible and sustainable data management and preservation environments
• Provide opportunities for people to develop expertise in data management across
research communities and institutions
• Enable researchers to find and access any relevant data in the Australian 'data
commons'
• Enable Australian researchers to discover, exchange, reuse and combine data from
other researchers and other domains within their own research in new ways
• Facilitate the sharing of Australian data to support international and nationally
distributed multidisciplinary research teams

Further information is available at: http://ands.org.au/


3. Synopsis of School of Geosciences Survey Results

A number of key information managers within the School of Geosciences were surveyed.
Instead of conducting a structured survey of many staff, a few key individuals were
selected for a more in-depth approach. A summary of their responses is outlined below.

3.1. Senior Computer Systems Officer, School of Geosciences


Mr John Twyman
• It was agreed that staff and students have little appreciation of the importance of good
data and asset management practices. It was suggested that, as a priority, honours and
postgraduate students be targeted and educated with regard to the appropriate
archival of research assets.
• The importance of creating departmental policies, to support the archival of research
data was also stressed. Without these policies in place, and the support of the
undergraduate and postgraduate convenors, it was thought that any attempts to
encourage archival would be short-lived. The creation of check-lists for students when
they leave the department was recommended.
• Many students undertaking research will place items on the departmental scratch-disk,
which they neglect to delete upon graduation. Those that administer the scratch-disk
may ‘clear-out’ old information submitted by students and staff who are no longer at
the university.
• The department website is out-of-date, and needs to be revamped. In line with the
‘handle once, use many times’ approach to data management, it was suggested that, if
appropriate, the information collected be used to update the departmental website.
• Promoting good data management practices in the department will predispose staff and
students to data repositories. This can only be of benefit should the archival of research
data and assets become a mandatory requirement of accepting grants from funding
bodies.
• In terms of physical assets, the school houses a map collection (~10,000) which is
administered by a volunteer once a week. There is also a poorly maintained and largely
undocumented rock store.

3.2. Administration Officer, School of Geosciences


Ms Grace (Lei) Zhang

• Departmental administrative staff document the details of publications and traditional
research outputs (books, book chapters, journal articles and conference proceedings) in
IRMA (Integrated Research Management Application)
• The overwhelming majority of publications recorded are created by departmental staff;
however, students and visiting fellows who publish during their time at the university
must also submit details through the application
• All departmental publications recorded in IRMA should be provided to the
Administration Officer as evidence of the output (PDFs and hard copies accepted)
• IRMA collects information on publications that satisfy the criteria for HERDC
classifications A1, B1, C1 and E1. This allows the university to report externally to
DEST (Department of Education, Science and Training)
• At the end of the calendar year, IRMA supplies a twelve-month summary of
publications which is used to create the School of Geosciences Annual Report
• For the 2007 calendar year, the Department of Geosciences submitted details of 160
publications into IRMA, of which 126 qualified for HERDC (categories A1, B1, C1 and
E1). The breakdown of these publications is shown in Table 1 below.

Table 1 - Breakdown of Geosciences Publications submitted to IRMA (2007)

Publication type (HERDC category)    Submitted to IRMA*    Qualified for HERDC
Books (A1)                                           5                      2
Book Chapters (B1)                                  14                     14
Journal Articles (C1)                              107                     94
Conference Publications (E1)                        34                     16
Total publications                                 160                    126

* IRMA publications submitted, including those that did not qualify for HERDC
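
The figures in Table 1 can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the dictionary and variable names are assumptions made for this example, and the counts are simply transcribed from the table. It confirms that the category counts add up to the stated totals and works out the share of submitted publications that qualified for HERDC.

```python
# Counts transcribed from Table 1 (2007 calendar year).
# The names used here are illustrative; they are not part of IRMA or HERDC.
submitted = {
    "Books": 5,
    "Book Chapters": 14,
    "Journal Articles": 107,
    "Conference Publications": 34,
}
qualified = {
    "Books": 2,                    # HERDC category A1
    "Book Chapters": 14,           # HERDC category B1
    "Journal Articles": 94,        # HERDC category C1
    "Conference Publications": 16, # HERDC category E1
}

total_submitted = sum(submitted.values())
total_qualified = sum(qualified.values())

# The category counts should reproduce the totals quoted in the text.
assert total_submitted == 160
assert total_qualified == 126

# Share of submitted publications that qualified for HERDC, overall and by type.
overall_share = total_qualified / total_submitted
share_by_type = {kind: qualified[kind] / submitted[kind] for kind in submitted}

for kind, share in share_by_type.items():
    print(f"{kind}: {qualified[kind]} of {submitted[kind]} qualified ({share:.0%})")
print(f"Overall: {total_qualified} of {total_submitted} qualified ({overall_share:.0%})")
```

On these figures roughly four in five submitted publications qualified, with book chapters qualifying in full and books and conference publications accounting for most of the shortfall.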

3.3. Postgraduate Student Representative, School of Geosciences


Mr Marco Olmos

• Asserted that having access to a database or inventory of research data and assets that
had been collected previously, would be a tool of great benefit. This would save
valuable time and make efficient use of university resources. “Time is one of the most
precious things that a student has.”
• Any database or inventory would also benefit supervisors with any newly created
research assets feeding back into the system. The volume of data would grow over time
and prevent research duplication.
• Current data management practice involves backing up research data on a hard-drive
and also to DVD. Considering the work being undertaken is contributing to a larger,
long-term dataset, it is hoped that the information will continue to be managed and
used within the department.
• It was thought that documenting this long-term dataset may pose some interesting
questions. The dataset has been collected over a period of 15 years by a number of
researchers and students, and so identifying all the owners of intellectual property may
be difficult. Furthermore, because of advances in software and technologies, the
custodian of the dataset may not be up to speed with the best way to present the
information.
• It was suggested that good data management practice and data repositories should be
introduced during undergraduate and honours study. Such knowledge should be acquired
prior to commencing postgraduate study.

3.4. Academic Representative, Head of School and Post-Graduate Coordinator, School of Geosciences

Dr Dietmar Muller and Dr Derek Wyman

The University of Sydney School of Geosciences hosts approximately 50 postgraduate
research students at any one time. These are spread across the disciplines of Physical
Geography, Human Geography and Geology and Geophysics as shown in the tables
below.

Table 2 - Honours Enrolment into the School of Geosciences

                            2005    2006    2007
Human Geography                7      15      10
Physical Geography             5       6       8
Geology and Geophysics         8       9       9

(Note: Information provided by Head of School)

Table 3 - Postgraduate Enrolment into the School of Geosciences

                            2005    2006    2007
Human Geography               18      17      18
Physical Geography            11      10      11
Geology and Geophysics        23      14       5

(Note: Information provided by Head of School)


The School of Geosciences is currently host to over 1700 honours, MSc and PhD theses,
most of which are in hardcopy format. This represents approximately 3400 years of
valuable research (1700 theses at an average of ~2 years of research per thesis) sitting on
the shelves with limited access. A spreadsheet providing an inventory of entries from 1904
to the present is maintained by an administrative assistant. In many cases, particularly in
the case of honours theses, this represents the only copy of this information available
anywhere. Digital theses are available on an ad hoc basis from 2003, and most of these
theses have accompanying data files. These are currently stored on CD-ROMs.

The Geosciences postgraduate coordinator's advice to postgraduates was mailed out to
students at the end of the second semester as follows:

“I would like to make you aware of the Sydney Digital Theses Project and encourage you
to submit a final version of your MSc and PhD theses to this repository. I think that you
should put your these into pdf format anyway, since it is much easier for you to distribute
and preserve it. In the past, my students theses have been cited in journals and
government reports many time, only because the thesis was distributed digitally”

3.5. Digital Project Analyst and Head of Information Communication Technology (ICT), University of Sydney

Mr Rowan Brownlee and Dr Jim Richardson

• The ICT (Information, Communication, Technology) division has proposed a consolidated
human resource system which will manage fundamental information on all staff,
students and visiting fellows at the university.
• The need for a central, departmental disk space to store all digital theses generated in
the department was considered vitally important, and something that isn’t currently
promoted. This store of digital theses (Honours, Masters by Research and PhD) will
supplement the hard-copy collection, which is physically stored in the compactus. The
storage of digital theses and preparation of an inventory will need to be accompanied
by additional metadata information, such as an abstract.
• In January 2008 Edwina Tanner and Rowan Brownlee worked on a project to improve
the management of the School of Geosciences theses. The draft discussion document
produced by Rowan Brownlee that describes the systems used for managing
undergraduate and postgraduate theses within the School of Geosciences is presented
in Appendix 2.
• The above discussion document pre-dates current University activities regarding
development of data management services. At the time of writing, the Library's
repository service offered a safe haven for the described set of thesis data collections. A
year later the context has changed, with eResearch support and data management
firmly on the institutional agenda. Planned development of institutional data
management services will enable researchers to store, update, and share access to data
with colleagues and link to collaborative and analytic tools.
• Future developments in workflow will also enable 'curation by stealth' (automated
capture and management of metadata). Over time the Library will develop systematic
relationships with such services to enable archival transfer of selected collections.
Policy and service development regarding the scope of Library involvement in data
archiving will be informed by the Australian National Data Service (through their
gradual development of data management reference models). It will also be guided by
future versions of research assessment exercises which will at some stage include
consideration of data sets as research outputs (underpinned by review processes and
agreed archiving models).
• Exploratory work relating thesis data collections and the Library's repository prompted
development of options and consideration of service implications regarding
management of research data and metadata within the Library's repository. A
metadata management options paper is available online at
http://escholarship.library.usyd.edu.au/dpa/meta.shtml. This work is further
developed and placed within the context of Library roles and relationships regarding
data management and eResearch support in a paper in an upcoming April 2009 feature
issue of Cataloging and Classification Quarterly.

3.6. Digital Repository Coordinator, Sydney eScholarship, and Liaison Librarian, SciTech Library, University of Sydney
Mr Sten Christensen and Ms Tina Reedman

• The Digital Thesis Program at the University of Sydney is part of the Australasian
Digital Theses (ADT) Program (http://adt.caul.edu.au/). The digital thesis project is
gaining momentum; however, at this stage it is only capturing ~40% of thesis output
from the university.
• A number of departments, including economics and business, history, and
health sciences, currently use the eResearch repository for managing parts of
their honours theses collection.
• The technical drawback to the use of this repository for managing the entire
honours theses collection is that this is a public access facility which cannot easily
manage a closed collection.
• According to the geosciences postgraduate coordinator, “the fundamental
difference in making an honours thesis available to the general public versus a
Masters or PhD is that the honours thesis has not undergone the same rigorous
review process to ensure the standard of the final product”.
• Thus, as much as an institutional repository would be the ideal storage and access
facility for this research output, giving the student some protection against
plagiarism and copyright, it has the potential to expose “unpolished” products.
• The workaround being used by the departments is that only the best honours
theses are selected for publication in this repository. This leaves a large
proportion of the theses still to be considered and managed
through another workflow stream.
• The desired platform for theses submissions would result from the university’s
acceptance of mandated on-line submission for all theses. For this to occur, students
would require further support with respect to formatting and copyright issues, and
workflows would need to be established to accommodate this.

4. Analysis of Research and Survey Results

4.1. Current Workflow: Researchers

Figure 1. Schematic work-flow of the current situation for reporting of academic
research outputs. (E-scholarship is representative of an institutional digital repository)

This figure illustrates the typical, simplified workflow through to publication and
highlights the strong focus on journal publication and traditional research outputs (book
chapters, books, journals and conference publications). As this figure illustrates, at no point
does the researcher reveal the details of data or assets that they collect during the course of
their research to others within the department.

Although there is the option of archiving digital data with the university’s institutional
repository, this is rare, with researchers instead relying largely on their own back-up
procedures, or using departmental scratch disks. Researchers typically have the option of
submitting data to a repository of their choice; however, this is not overseen or regularly
undertaken. Digital repositories, such as E-scholarship, also have the limitation of not
being able to manage physical material.

All researchers create working papers, or drafts of articles, prior to publication. The
management of such ‘works in progress’ will vary considerably, as does the exposure of a
paper for peer comment prior to publication. These working papers may facilitate
discussion and debate on a topic, and provide impetus for future collaborations. Within
the school, there are currently no active channels for the circulation of working papers, or
grey material. The exchange of ideas and feedback appears to occur primarily through
verbal communication and electronic mail.

4.2. Current Workflow: Students - PhD and Masters by Research

Figure 2. Schematic work-flow of the current situation for reporting of PhD and
Masters (research) students’ research outputs. (E-scholarship is representative of an
institutional digital repository)

The primary focus for students undertaking postgraduate studies by research is the
creation of a thesis. The subsequent publication of journal articles as a result of the
research is celebrated, but is not a prerequisite for graduation. The process of undertaking
research and collecting / gathering data unearths the same pitfalls that befall researchers.

Although students have the option of archiving digital data with the university’s
institutional repository and / or external, discipline-specific repositories, this is rare;
students instead rely largely on their own back-up procedures, or use departmental scratch
disks. As indicated through interview with the senior computer systems officer, students
often neglect to manage information placed on department scratch disks, and this will
eventually be wiped as a result of inactivity and non-communication. Unless the student
decides to archive data in the E-scholarship repository or some other identified archive,
the raw research data collected in conjunction with the thesis will be removed from
circulation with the student upon completion of a thesis.

Current policy regarding the submission of PhD and Masters (Research) theses requires
that two hardcopies of the document are presented to the institution. One is to be archived
in the department compactus, while the other is submitted to the principal University
Archive (Fisher Library at the University of Sydney). This second copy resides in the rare
books collection and can be searched online by title or author. The contents, however, can
only be accessed physically, by visiting the archive and requesting the thesis. Very little, if
any, documentation is recorded at the department level when a thesis is submitted to the
local compactus.

As intellectual property rights sit with the student, there is no mandatory requirement
that the document be digitally archived with Sydney Digital Theses. This is a
recommendation that in many instances is overlooked by euphoric students relieved at
having survived the trials and tribulations of study. Despite the option to use the digital
theses facility being available to all postgraduate students, only a very small percentage
(~1% from the School of Geosciences) of students currently submit their thesis this way.
These figures indicate that the repository is massively underutilised, despite having the
potential to collate and freely market the major research output of students.

4.3. Current Workflow: Students – Honours

Figure 3. Schematic work-flow of the current situation for reporting of Honours
students’ research outputs. (E-scholarship is representative of an institutional digital
repository)

With a similar focus to PhD and Masters (Research) students, the primary aim of Honours
students is the creation of a thesis. Any publications that stem from the research are
applauded, but not necessary for graduation. As with researchers and postgraduate
students, a number of repositories are available for the submission of raw data, both
institutional based and external, discipline-specific archives. Following the endemic trend
in the higher education system, the active archival of raw data is negligible, with emphasis
remaining firmly on traditional research outputs (theses for students, publications for
researchers).

The submission policy for honours theses differs significantly from that for PhD and Masters
(Research) students. Only one hardcopy of the thesis is submitted, which is archived in the
department compactus. Very little documentation is recorded when such archival occurs
at a local level. A hardcopy is not submitted to the main University Archives, nor is an
electronic copy permitted to be submitted to the Sydney Digital Theses repository.
Archival through the Sydney Digital Theses repository is not supported due to the limited
vetting that the document receives compared to a postgraduate thesis. This reveals an
inadequate situation wherein the primary research output of a year’s endeavour is poorly
documented and filed away to gather dust. The thesis is effectively removed from
circulation upon a student’s graduation. As an interim, partial solution to this
shortcoming, the School of Geosciences has recently started collecting CD-ROMs of
theses. One is retained by the department, whilst the second is provided to the student’s
supervisor.

Within the department of Geosciences the management of Honours theses is somewhat
haphazard and theses have become widely distributed or lost over the years. As there is
no other facility to manage these theses, the department copy is often the only one in
existence, and the cost and effort of recovering lost honours theses is prohibitive.

A number of issues within the department need to be addressed including:

• Defined storage area - Theses are currently stored unsupervised in a compactus in a
shared area
• Access to hardcopy and digital honours theses
• Safeguarding honours theses

Ideally the management of these valuable documents would come under the realm of a
data curator. In some departments an administrative position is responsible for this task.

One mechanism which provides an immediate solution to manage theses from this point in
time to the future is through the formalisation of an honours thesis repository between the
institutional repository (e.g. eResearch project) and the department.

If this is done, this valuable information can be made available on-line for department and
wider use using a computer based management system.

4.4. Outcomes of Analysis

As identified in the previous sections, there are notable shortcomings in the management
of research assets within the School of Geosciences. These shortcomings are not limited to
a particular demographic group, but are associated with researchers (section 4.1), PhD and
Masters (Research) students (section 4.2) and Honours students (section 4.3). Some
deficiencies span all three groups (e.g. the management of raw research data), whereas
others are targeted based upon the primary research output (e.g. theses created by
students). Broadly speaking, the shortcomings can be classified into four target areas.

i) Digital Theses:
Under-utilisation of Sydney Digital Theses, and no digital theses
repository available for Honours students

ii) Unpublished Research Material:
No active channels for the documentation and circulation of working papers and
other unpublished, digital material

iii) Raw Digital Research Data:
Under-utilisation of the institutional repository and external, discipline-specific
repositories to archive raw data

iv) Physical Material:
Currently no local policy or procedures to document significant physical assets held
within the school

5. Theses Workflow Proposals

5.1. Digital Theses Workflow

Shortcoming: Under-utilisation of Sydney Digital Theses, and no digital theses repository
available for Honours students.

Recommendation: Promote the use of Sydney Digital Theses through various channels,
and identify and endorse a suitable digital repository to house Honours theses.

This proposal considers a two-pronged approach, which meets the digital archival needs
of research students, and those desiring future access to the documents. Whilst this
solution cannot guarantee the digital archival of theses, by providing the facilities, and
endorsing the use of these amenities, it is hoped that a greater percentage of students will
archive these research assets.

Within the department it is feasible that the initial management of student theses and data
could be standardised across all levels of research. Management practices would be
uniform up until the point of final submission. It is proposed that every
student enrolled in Honours, PhD or Masters (Research) programs within the school be
provided with a networked, password-protected, online folder in which they can manage
draft material relating to their research. Subfolders may be used to distinguish between
the body of the thesis and datasets (both raw and processed), as illustrated in figure 4. The
importance of this distinction between the body of the thesis and the datasets relates to the
management of these resources following final submission, as discussed later in more
detail.
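
The folder layout proposed above could be generated with a short script when a student enrols. The following is a minimal sketch only: the sub-folder names (thesis_body, datasets/raw, datasets/processed) and the function name are assumptions made for illustration, and password protection would need to be applied separately through the department's own network permissions.

```python
from pathlib import Path

# Hypothetical sub-folder names. The report only distinguishes the thesis
# body from the datasets (raw and processed); these names are assumptions.
SUBFOLDERS = ("thesis_body", "datasets/raw", "datasets/processed")


def create_student_folder(root: Path, student_id: str) -> Path:
    """Create the per-student folder tree and return its top-level path."""
    student_dir = root / student_id
    for sub in SUBFOLDERS:
        # exist_ok=True makes the call safe to repeat for the same student.
        (student_dir / sub).mkdir(parents=True, exist_ok=True)
    return student_dir
```

Running the function once per new enrolment would give every student the same structure, which is what later makes batch handling of theses and metadata practical.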

Figure 4. Creation of student folders and sub-folders on networked department space,
distributing metadata templates for theses.

As shown in the above figure, within the thesis body sub-folder resides a distinct metadata
template form. The metadata form in the thesis body folder outlines metadata fields that
are consistent with those that are required for a submission to the Sydney Digital Theses
repository as outlined in Appendix 1.

One benefit of creating these folders is that it provides the students with a designated
storage space on the local network. They can use this space to back-up material on a
regular basis, supplementing other customs such as saving to USB or portable hard-drives.
The key benefit of this system, however, is that, if managed and implemented correctly, it
allows the department to retain a digital copy of every student’s thesis, the associated
datasets and all metadata describing these research assets. The success of this proposed
system relies on the implementation strategy and the incorporation of appropriate
departmental procedures.

Generating the folders is straightforward, and they should be created when a
research student commences study in the department. This procedure will facilitate the
automation of managing the information in the future as described in Appendix 2 in the
section “Batch submission of existing metadata and theses”.

Preparation of metadata relating to the body of the thesis should occur after the thesis has
been completed, but prior to submission. The nature of the metadata fields documenting
the body of the thesis makes it impractical for the author to complete the form prior to this
time.

When a student hands in a hard-copy of their thesis to the department (as is required of
all PhD, Masters (Research) and Honours students), it is suggested that whoever takes
delivery should spend a short time with the student verifying networked material. They
should ensure that a final version of the thesis is available, along with the completed
metadata, as well as key datasets similarly documented. All draft and superseded material
should then be deleted from the scratch disk.
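
The hand-in check described above could be supported by a small script that lists what is missing from a student's networked folder. This is a sketch under stated assumptions: the folder names (thesis_body, datasets), the expectation that the final thesis is a PDF, and the metadata file names are all hypothetical and would need to match whatever conventions the department adopts.

```python
from pathlib import Path


def handin_problems(student_dir: Path) -> list[str]:
    """Return a list of problems found in a student's networked folder."""
    problems = []
    body = student_dir / "thesis_body"   # assumed folder name
    data = student_dir / "datasets"      # assumed folder name

    # A final version of the thesis should be present (assumed to be a PDF).
    if not (body.is_dir() and any(body.glob("*.pdf"))):
        problems.append("no final thesis PDF in thesis_body")

    # The completed thesis metadata form should sit beside it.
    if not (body / "metadata.txt").is_file():
        problems.append("thesis metadata form missing")

    # Key datasets should be similarly documented.
    if data.is_dir():
        files = [p for p in data.rglob("*") if p.is_file()]
        has_metadata = any(p.name.startswith("metadata") for p in files)
        if files and not has_metadata:
            problems.append("datasets present but no dataset metadata form")

    return problems
```

An empty list would indicate that the folder is ready to be retained by the department; any entries give the staff member a checklist to work through with the student.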

Discussions with the School’s Senior Computer Systems Officer have indicated that
presently many students place items on the department scratch-disk, which they neglect to
delete upon graduation. Those that administer the system intermittently ‘clear-out’ old
information, uninformed of the importance of certain material. This is not a criticism of the
IT personnel, but of those who place information in the system and neglect to manage it
effectively over a longer period of time. The proposed checking of networked space not
only ensures that the department retains digital copies of research assets, but also
functions to clear out cumbersome material of no prospective value.

Management of material after this point depends on the level of study that has been
undertaken. Management of PhD and Masters (Research) theses which have undergone
rigorous peer review differs significantly to the management of Honours theses.

5.2. PhD and Masters (Research) Theses Workflow

As discussed in section 2, the university already has an established repository for PhD and
Masters (Research) Theses, the Sydney Digital Theses Project. As students typically own
the intellectual property rights to their thesis, submission to this repository is not a
mandatory requirement. It is however strongly recommended to students graduating
from the university with the necessary level of qualification. It is very important that all
postgraduate students be made explicitly aware of the repository and the benefits of
archiving their thesis.

At a local level within the department, the repository should be promoted through various
channels and should be actively endorsed by supervisors and postgraduate convenors at
appropriate times during a student’s tenure. Administrative staff who take delivery of a
hardcopy of the thesis upon completion should also recommend that the
document be digitally archived in Sydney Digital Theses. This endorsement to archive
may be verbal, or detailed in the form of a handout. Considering that the necessary
metadata fields would have already been completed to document the digital thesis held by
the department, it is hoped that the minor extra effort required to submit to Sydney Digital
Theses will encourage students to submit their thesis online. One simple procedure that
would be recommended is to place a link to the Sydney Digital Thesis project on the
Geosciences web page to further promote the archival of digital theses.

Repeated exposure to the digital theses link will hopefully remind students that they
should be digitally archiving material upon graduation. Considering that Sydney Digital
Theses can only accept PhD and Masters (Research) theses, the link should be provided
under the Postgraduate banner through the Geosciences homepage, as shown in figure 5.

Figure 5. Sample of information to provide regarding the benefit of archival and link to
Sydney Digital Thesis Project.

Information provided above the link will introduce PhD and Masters (Research) students
to the benefits of digitally archiving a thesis. It also explains that archival is with Sydney
Digital Theses, a component of the Australasian Digital Theses Program. This repository
provides an open collection, which is freely available online.

For a future researcher scouring for information, the thesis will be discoverable through
the Sydney Digital Theses repository. This situation, however, relies upon a student’s
motivation to actively submit their thesis to the repository, an effort which many will
overlook in the euphoria of graduating. Considering that the department already has a
final digital version of the thesis, as well as the necessary metadata to document the asset
in the repository, it may be possible to formulate a back-up procedure. This would require
the student to sign any necessary copyright release forms at the time of hardcopy
submission.

Should the student neglect to submit to the Sydney Digital Theses repository within a
suggested time-frame of ~1yr, then submission may be made on their behalf using the
material stored within the department and the signed copyright release forms. This
submission process may be automated, with Sydney Digital Theses coming into the
system and siphoning off the thesis and prepared metadata. Such an action would require
further consultation; however, it would be wasteful for research assets not to be
discoverable considering that digital versions of the asset and metadata had already been
prepared and authorised for release.
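
The back-up procedure described above amounts to a simple rule that could be checked against the department's thesis inventory. The sketch below is illustrative only: the record fields, the function name and the use of exactly 365 days for the "~1yr" period are assumptions, and it deliberately stops at identifying candidates rather than performing any submission.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed interpretation of the "~1yr" time-frame suggested in the text.
GRACE_PERIOD = timedelta(days=365)


@dataclass
class ThesisRecord:
    """Hypothetical inventory entry for one PhD or Masters (Research) thesis."""
    student: str
    hardcopy_submitted: date
    release_signed: bool      # copyright release forms signed at hand-in
    in_digital_theses: bool   # already archived by the student


def due_for_departmental_submission(records, today):
    """Return theses the department could submit on the student's behalf."""
    return [
        r for r in records
        if not r.in_digital_theses          # student has not yet submitted
        and r.release_signed                # permission has been granted
        and today - r.hardcopy_submitted >= GRACE_PERIOD
    ]
```

Theses without a signed release are never returned, which reflects the condition that submission on a student's behalf depends on appropriate permissions having been granted.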

5.3. Honours Theses Workflow

For reasons outlined previously, honours theses are not supported within the Sydney
Digital Theses repository, and until recently the School of Geosciences was not actively
encouraging the retention of a digital copy of honours theses. The current solution has
been to request a CD-ROM from the student, an interim resolution that fails to utilise the
array of digital facilities available to the department.

As an obvious alternative, the merits of using the institutional repository (eScholarship) to
house honours theses were examined, and the repository was initially found to be
suitable. Already well established as the institutional repository of the University of
Sydney, students can submit to the archive with confidence, knowing that the material is
managed professionally and securely. Submissions can also be made online, having
documented the resource sufficiently through standardised metadata records. The
problem remains, however, that because this is an open repository, any material submitted will be
discoverable to all users of the archive. Because honours theses are scrutinised
substantially less than PhD and Masters (Research) projects, and are generally not
externally reviewed, there is a general concern regarding the quality of the material
produced. According to the Geosciences postgraduate convenor, “the fundamental
difference in making an honours thesis available to the general public versus a Masters or
PhD is that the honours thesis has not undergone the same rigorous review process to
ensure the standard of the final product.” Thus, as much as the institutional repository
would be an ideal storage and access facility for this research output, giving the student
some protection against plagiarism and copyright, it has the potential to expose
“unpolished” products.

After discussions with the Senior Project Analyst with eScholarship, it was confirmed that
only honours projects of a high quality should be allowed to be archived within this
repository. It was further suggested that limiting archival of honours theses to those
students who achieve a distinction, or high distinction, may even provide a mild incentive
for students to push themselves to strive for excellence. Indeed, archival of material
increases exposure of the students’ research and its quality, which may lead to
collaborations and opportunities to further their early career.

The eScholarship submission process requires that the department deposits material on
behalf of a student. Considering that the department has already taken steps to retain a
digital version of the thesis and associated metadata on the local network, this is not a
major endeavour. When a student hands in a hard-copy of their thesis which is of a
sufficient calibre (D or HD), they should be presented with the necessary copyright release
forms. Having completed these documents, the department is free to submit the thesis
online to the eScholarship repository.

Unfortunately, a large percentage of honours theses created do not reach the threshold for
submission to the institutional repository. An alternative workflow therefore needs to be
considered that advertises that research has been undertaken, whilst restricting access to
the resource. One solution would be to provide visible metadata relating to the thesis,
whilst managing the resource, and access to it, through the department. Metadata should
be provided through eScholarship, however details of the abstract need to be restricted.
This is so that potential users are not inadvertently misled into thinking that the synopsis
came from an authoritative source. It may be more appropriate to provide a disclaimer in
the abstract section, specifying that the thesis was generated by an honours student and
was not subject to extensive or external screening and review. Unlike honours theses of the
distinction / high distinction calibre, theses of a lower quality should not be made
available to download through the eScholarship website.
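
The two honours pathways described in this section can be summarised as a decision based on grade. The following sketch is a hypothetical illustration: the dictionary keys, the function name and the disclaimer wording are assumptions, and only the D / HD threshold and the metadata-only treatment of other theses come from the discussion above.

```python
# Hypothetical disclaimer wording, paraphrasing the suggestion in the text.
DISCLAIMER = (
    "This thesis was produced by an honours student and was not "
    "subject to extensive or external review."
)


def honours_deposit_plan(grade: str) -> dict:
    """Decide how an honours thesis is exposed, based on its grade."""
    # Only distinction (D) and high distinction (HD) theses are deposited in full.
    high_quality = grade.strip().upper() in {"D", "HD"}
    return {
        # Metadata is made visible in the repository for every thesis.
        "metadata_in_repository": True,
        # The full text can be downloaded only for D / HD theses.
        "full_text_download": high_quality,
        # Other theses carry a disclaimer in place of the abstract.
        "abstract_note": None if high_quality else DISCLAIMER,
        # Lower-graded theses stay under departmental management.
        "managed_by": "eScholarship" if high_quality else "department",
    }
```

For example, a thesis graded "HD" would be deposited with its full text, whereas a thesis with any other grade would appear as a metadata record with the disclaimer, and access requests would be handled by the department.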

Obviously this is not an ideal solution, as the department needs to manage the digital
thesis and all related enquiries for an extended period of time. A departmental staff
member would therefore need to take responsibility for the ongoing management of the
resource, and the creation of metadata in eScholarship. It would be of great benefit if the
institution were to provide a repository, or adapt existing systems, to allow for the
deposition of all honours theses, not only those of the top grade brackets.

In anticipation of such changes, and to avoid retrospective actions, all honours students
(regardless of grade) should be presented with the eScholarship copyright release forms
when handing in a hard-copy. This will allow departmental staff members to provide
metadata details through the institutional repository, with the potential to upload the
thesis at a later date if current policy and systems evolve to accept them. In the
meantime however, this proposal does serve to capture all digital theses generated
through the school, and the creation of metadata in searchable repositories.

Figure 6. Simplified workflow from thesis submission to availability of metadata /
resource through the institutional repository.

5.4. Recommendations from Theses Workflow Analysis

Recommendation: That the department advise or require all their students to submit a
final copy of their PhD and/or Masters theses using the digital thesis facility. This can be
supported by the post-graduate coordinator and promoted through the department
handbook.

Recommendation: That a representative from the digital thesis project provide additional
seminars to the post-graduate students at strategic times during the semester so that the
students are more aware of this facility and the copyright and intellectual property issues
associated with using this repository.

Recommendation: That information about the Sydney Digital Thesis
program be outlined in all Postgraduate handbooks throughout the university.

Recommendation: That someone within the department be appointed to maintain an inventory
and ensure the effective management of all the departmental theses, and in particular
honours theses.

Recommendation: That someone within the department be identified to ensure that all
theses and in particular Honours theses are safeguarded and made accessible through a
computer based management system.

Recommendation: Amend the School of Geosciences website so that Sydney Digital Theses is
promoted under the postgraduate banner.

Recommendation: Students to receive information leaflet about the Sydney Digital Thesis
Program on enrolment in postgraduate studies. This leaflet should cover copyright issues
and recommend that they digitally archive their thesis and datasets on completion of their
study.

Recommendation: The department insist on the submission of a digital copy of all theses,
including where appropriate, one copy to the Digital Thesis project and one copy to the
department.

Recommendation: The department creates a master CD or DVD annually of all theses
submitted by students for that year and includes a copy of the updated metadata
(spreadsheet) on the media.

Recommendation: The department maintains an on-line backup of the information in a
shared area on the network, as CD and DVD media do not last forever.

Recommendation: All students should sign a Copyright release form and the department
should manage/archive the signed release forms in association with the thesis and data
sets.

Recommendation: If students fail to submit to Sydney Digital Theses after a period of ~1yr
then the Department should digitally archive material to the repository on behalf of the
student (if appropriate permissions have been granted).

Recommendation: A section in the student handbook is included on appropriate data
management practices and expected behaviour.

6. Unpublished Research Material Proposal

Shortcoming: No active channels for the documentation and circulation of working papers
and other unpublished, digital material.

Recommendation: Identify and promote a suitable repository through which unpublished
research material can be documented and circulated, encouraging discussion and debate.

The response to this shortcoming is clear-cut. All staff and students at the University of
Sydney have the opportunity to document and archive high-quality digital material with
the institutional repository, eScholarship. Whilst some departments maintain and nurture
very active collections, others neglect to recognise this facility as a means to circulate and
archive unpublished, high-quality digital material. This discrepancy in usage may be due
to numerous factors, which may relate to discipline, department size and copyright and
confidentiality considerations among others. Whilst some of these factors are beyond the
control of the department, it is possible to increase the exposure of researchers to the
facilities that are available.

The school of Geosciences homepage has a tab titled ‘research’ which, when clicked, takes
users to a web page that describes the research profile of the department. It is suggested
that under this ‘research’ banner some information is provided to researchers detailing the
benefits of archiving and circulating unpublished digital material, such as working papers,
to encourage feedback and stimulate debate. Following this introduction to the digital
archival of grey material, a link should be provided to the Geosciences collection in
eScholarship. A link should also be provided to the ‘search’ homepage of the repository,
so that other researchers in the department can very easily and quickly search through
submissions in the geosciences collection. An example of how these web pages might
appear is shown below in figure 7.

Figure 7. Information provided regarding the benefit of archival, and link to
Geosciences collection in eScholarship

7. Raw Digital Research Data and Objects

Shortcoming: Under-utilisation and lack of awareness of the institutional repository and external,
discipline-specific repositories to archive raw data.

Recommendation: Develop a policy in which all data collected using public funds are
required to be submitted to an appropriate repository.

The storage, curation and preservation of research data presents special challenges.
Further education, ideally coupled with easily accessible backup/storage facilities, is
required for long term storage of large datasets.

It is much more efficient and cost effective to manage data collected in association with
research at the time of collection and submission of final work rather than after the fact. In
the Geosciences department all data which is collected in association with research is
managed on an individual basis such that it is rarely in a form available for re-use in
further projects.

Much of the data collected over the past few decades by the Department of Geosciences is
either lost or cannot be read with current computers or software. At the same time, data
curators have disappeared from the department, and funding bodies do not provide
grants for the management and long-term storage of these data in repositories. The
responsibility for managing the data thus falls upon the researchers who collected
them, who must ensure that the data are stored in a readily understandable form so as
to be available for new analyses. This comes at a time when researchers are under
increasingly heavy workloads.

7.1. Student Data Collections

A student's number one priority in conducting research is “to gain efficient access to the
research and data that has previously been completed in my area of study” (research
student, Department of Geosciences).

A further quote from one geosciences student: “I believe that having a database with all the
available data for students such as myself would be an ideal tool as it would save valuable
time and make efficient use of the resources the university has”.

Whenever students collect or generate a dataset during the course of their research,
they should back it up in their password-protected network space. At this time they
should also fill in the provided metadata form, whilst details of the research are fresh and
easy to recall. If multiple datasets are generated that are suitably distinct, then multiple
metadata records may be required. This can be achieved by copying the original metadata
template provided in the folder. If multiple datasets and metadata forms are created, it is
important for the student to manage the files so that the metadata can be easily associated
with the correct dataset. The read_me file provided in the folder outlines that this can be
achieved through the creation of further sub-folders.

Figure 8. Creation of student folders and sub-folders on networked department space,
distributing metadata templates for data sets.

The metadata form in the dataset folder requests basic information that would be required
to ‘adequately’ document research data within a data repository. An example of the
metadata required to describe a dataset is provided in Appendix 4. The intention of these
metadata forms is not to replace the creation of detailed metadata when it comes to final
archival, but to encourage students to think about metadata, and to capture vital research
information at the time of data collection.
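
The folder arrangement described in this section could be supported by a small helper script. The following is a minimal sketch only, assuming that each student folder holds a blank metadata form under the hypothetical name metadata_template.txt (the report does not name the file) and that each dataset gets its own sub-folder; the function name and folder names are likewise illustrative.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical file name for the blank metadata form; the report does not name it.
TEMPLATE_NAME = "metadata_template.txt"


def add_dataset_folder(student_dir: Path, dataset_name: str) -> Path:
    """Create a sub-folder for one dataset and place a fresh copy of the
    metadata form inside it, so the form and the data stay together."""
    template = student_dir / TEMPLATE_NAME
    if not template.is_file():
        raise FileNotFoundError(f"No metadata template found at {template}")
    dataset_dir = student_dir / dataset_name
    # exist_ok=False: refuse to reuse a folder, so an existing form is never overwritten.
    dataset_dir.mkdir(exist_ok=False)
    form = dataset_dir / f"{dataset_name}_metadata.txt"
    shutil.copyfile(template, form)
    return form


# Usage example in a throwaway directory standing in for the network space.
with tempfile.TemporaryDirectory() as tmp:
    student_dir = Path(tmp) / "student_name"
    student_dir.mkdir()
    (student_dir / TEMPLATE_NAME).write_text("Title:\nAbstract:\n")
    form = add_dataset_folder(student_dir, "sediment_cores_2008")
    form_location = form.relative_to(student_dir).as_posix()
    form_text = form.read_text()
```

Keeping one form per sub-folder means the association between metadata and dataset is carried by the directory structure itself rather than by file naming alone.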

7.2. Use of the Institutional Repository for Managing Data Collections

The eResearch repository provides an institutional repository for data generated by
researchers at the University of Sydney. This repository can provide a facility for students
and researchers to attach their data as separate files which can reside on the system
alongside a document or thesis. Data files can also be associated with a thesis either as
separate files or as appendices. This, however, does not ensure that the data will be
properly documented, and it may not preserve the original (raw) data.

Researchers may want to scour the data for clues that were missed or not looked for in a
previous analysis. Even if the raw data survive, they are useless without the background
information (metadata) that gives them meaning.


The initial question any researcher needs to ask themselves at the outset of a data
gathering exercise is – “what is the repository into which the final data will be submitted?”
If this is done at the outset of a project then the appropriate standards can be adhered to so
that the data can be submitted at the end of the project in an appropriate format with
accompanying metadata.

The institutional repository represents a good option at the present time; however, in
future the University will need to consider purpose-built, institutionally backed data
management services based on requirements analysis.

Recommendation: The University of Sydney should consider purpose-built
institutionally-backed data management services based on requirements analysis.

7.3. Research Data Collections

With the large computing capacity now available on individuals' desktops, through the
eResearch and ICT framework, and through various national and international initiatives,
there are means available for a department to manage and safeguard this information. This
involves setting a number of policies or rules to follow so that data and information will
flow logically and efficiently within the department.

The data management models presented in section 9 of this document focus on managing
data collected now and in the future rather than on how best to integrate the “mine”
of data which has been collected in the past. Once these management systems are in
place, however, they will provide an adequate basis for a later project to “rescue” such
data, which falls outside the scope of this project.

It is critical that the Geosciences department facilitates data-intensive research with the
provision of analytical infrastructure. The issue of curating a completed project's data
becomes much simpler if well documented datasets are established and utilised for
analysis during the project.

7.4. GIS and Specialised Research Data Collections

There are various disparate databases and data sets which reside within the School of
Geosciences. John Twyman (senior computing officer, School of Geosciences) has been
contacted by staff in ICT who are working on a project concerned with documenting the
major spatial datasets in use around the University. In doing so, ICT hopes to identify
datasets which are in relatively common use across the University (e.g. Census data) and
thus explore opportunities for reducing the costs involved in obtaining such data. Down
the line, ICT is also very interested in determining what role it can play in the hosting
and management of spatial data. John Twyman is conducting an inventory of spatial data
sets held by the School of Geosciences; the information to date is presented in Appendix 5.

Ideally the spatial data sets will be integrated and managed in a Geodatabase on a server
platform; however, for this to occur, spatial data collected by the department would need to
adhere to a standard. Data collected should meet recognised standards where they exist.
Minimum data collection standards ensure that data are stored digitally and with
maximum transferability. By using a geospatial database, information can be integrated
and re-used readily for multiple purposes, as in the Australian Beach Safety and Management
Program database developed by Andy Short of the Department of Geosciences, shown
below in Figure 9.

Figure 9. The Australian Beach Safety and Management Program database developed by Andy
Short of the Department of Geosciences.

Recommendations on the standards required for spatial data are outside the scope of
this project; however, the United States has developed a structure which it has been
applying with a degree of success. The Federal Geographic Data Committee (FGDC) in the
United States provides an example of a model that works across institutions, as it has
selected a standard and developed policies and a framework in which to operate. The Content
Standard for Digital Geospatial Metadata described by FGDC is shown in Appendix 6.

More about FGDC can be found at: http://www.fgdc.gov/


7.5. Columbia University Data Catalogue Example

Currently there is no overall framework for managing or gaining access to the GIS data
and information within the School of Geosciences. An inventory can be used as a first step
in setting up access to the data in a catalogue system like the example from the Columbia
University Department of Earth and Environmental Sciences. Having access to datasets in
a catalogue at the point-and-click level can be a very useful research tool for a department.
Appendix 7 shows the Columbia University Department of Earth and Environmental Sciences
Data Catalog, with datasets listed by category.

More information is available at the Columbia University site:
http://eesc.columbia.edu/courses/ees/data/index.html.

7.6. The Use of Discipline Specific Repositories

Large and complex datasets should be archived with discipline-specific repositories or
data centres. At the outset of a project, a researcher could identify an appropriate
repository or project which would take the type of data being collected, and assess the
data structure and requirements for submitting data to that facility.

There are various international and national archives which are available to preserve
publicly funded research data generated by scientists. A small selection of these is
shown below in Table 4.

Table 4. Example of National and International Data Centre Repositories

National Data Centre Repositories                         International Data Centre Repositories

Sydney eScholarship Repository [5]                        PETLAB [6]
Australian Government Antarctic Division [7]              US National Snow and Ice Data Centre [8]
Geoscience Australia [9]                                  US National Geophysical Data Centre [10]
Australian Oceanographic Data Centre Joint Facility [11]  US National Oceanographic Data Centre [12]

[5] http://ses.library.usyd.edu.au/
[6] http://pet.gns.cri.nz/
[7] http://www.aad.gov.au/
[8] http://www.smso.net/National_Snow_and_Ice_Data_Center
[9] http://www.ga.gov.au/
[10] http://www.ngdc.noaa.gov/
[11] http://www.aodc.gov.au/
[12] http://www.nodc.noaa.gov/


7.7. Submission of Data with Journal Articles

Some journals encourage submission of datasets to their online repositories, so if the data
are managed from the outset this could provide a mechanism for dealing with data sets
associated with journal articles. This avenue could be further investigated by the
department. An example of such a repository is GenBank.

Further information is available at: http://www.ncbi.nlm.nih.gov/Genbank/index.html

7.8. End-to-End Data Management Plan

Researchers and students will collect ever larger data sets as technology develops, so it is
very important to provide training in data management, covering ethical as well as
technological issues.

To assist in the culture change required to promote archival of research data and assets,
it is important to target students and early career researchers. To prepare students
for the large amounts of data they will be exposed to in the near future, it is
recommended that departments develop data management courses and manuals such
as those developed at the Australian National University.

Further information is available at: http://ilp.anu.edu.au/dm/

A data management component should also be developed in association with field trips
on which data is collected. The field work component of existing courses could emphasise
standard and correct ways to describe samples temporally and spatially at the time of data
collection so as to ensure the integrity and interoperability of the data from the outset.
Data Collection forms could be designed along with procedural information which
standardises the recording of key data elements as shown below in Figure 10.


Figure 10. Data collection in the Field outlining standardisation of key data elements.
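
A standardised field form of this kind lends itself to simple automated checks at the time of entry. The sketch below is illustrative only: the field names, the choice of ISO 8601 timestamps and the use of decimal degrees are assumptions made for the example, not elements taken from the form in Figure 10.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FieldSample:
    """One row of a hypothetical field data collection form."""
    sample_id: str
    collected_at: str   # ISO 8601 timestamp, e.g. "2008-11-03T09:30:00+11:00"
    latitude: float     # decimal degrees, south negative
    longitude: float    # decimal degrees, west negative
    collector: str


def validate(sample: FieldSample) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not sample.sample_id.strip():
        problems.append("sample_id is empty")
    try:
        datetime.fromisoformat(sample.collected_at)
    except ValueError:
        problems.append("collected_at is not an ISO 8601 timestamp")
    if not -90.0 <= sample.latitude <= 90.0:
        problems.append("latitude outside -90..90")
    if not -180.0 <= sample.longitude <= 180.0:
        problems.append("longitude outside -180..180")
    return problems


# Usage example: one well-formed record and one with two recording errors.
good = FieldSample("S1", "2008-11-03T09:30:00+11:00", -33.89, 151.19, "student")
bad = FieldSample("S2", "03/11/2008", -33.89, 251.19, "student")
good_problems = validate(good)
bad_problems = validate(bad)
```

Checking the temporal and spatial description when the sample is recorded is far cheaper than trying to reconstruct it once the field trip is over.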

It has been recognised that many research datasets are either collected or generated
through the school laboratory. This is a key location where university technicians can
promote the use of the local disk space, as well as the creation of associated metadata. It
may even be possible for the lab to disseminate research data directly into the students’
dataset sub-folders, as shown below in Figure 11.

Figure 11. Data from the laboratory is placed into the student's or researcher's dataset
folder for further analysis to streamline data handling.


Recommendation: Standardised data management practices should be adopted so that the
process of data management and archival can become as automated as possible.

Recommendation: Data management courses and field work components should be developed
that flow through from first year to the final year to introduce the concepts of the standard
approach to data management adopted by the department.

8. Physical Assets

Shortcoming: There are currently no local policies or procedures to document significant
physical assets held within the school.

• As discussed by John Twyman in Section 3 of this document, the school houses a map
collection (~10,000 maps), which is administered by a volunteer once a week.
• There is also a poorly maintained and largely undocumented rock store.
• In the short term an inventory of this physical material should be developed so that the
school can keep track of its assets.
• The development of a management plan for physical assets is beyond the scope of this
project and will need to be addressed in the future.

Recommendation: A high-level inventory should be developed and maintained to keep track of
physical assets.


9. Data Management Models


9.1. Digital Thesis Department Management Model

9.2. Digital Thesis Institutional Repository Management Model


9.3. Geosciences End-to-End Data Management Model

Data is collected and entered into a spreadsheet in a standard fashion.

Data is processed in the laboratory using standard procedures.

Metadata is recorded in a form using a standard set of metadata elements.

Data is archived into a database and made available for re-use in further research.
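
The four stages of this model can be traced in a few lines of code. The following is a rough sketch under stated assumptions rather than a description of any system used by the school: the column names, the two-table layout and the use of an in-memory SQLite database are all hypothetical choices made for illustration, and the "processing" step is a stand-in that only converts text to numbers.

```python
import csv
import io
import sqlite3

# Stage 1: data entered into a spreadsheet in a standard fashion (exported as CSV).
RAW_CSV = """sample_id,depth_m,temperature_c
S1,5,21.4
S2,10,19.8
"""


def load_rows(text):
    """Read the spreadsheet export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def process(rows):
    """Stage 2: stand-in for a laboratory procedure; converts text to numbers."""
    return [(r["sample_id"], float(r["depth_m"]), float(r["temperature_c"])) for r in rows]


def archive(conn, metadata, records):
    """Stage 4: store the metadata record and the measurements together."""
    conn.execute("CREATE TABLE IF NOT EXISTS dataset "
                 "(id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS measurement "
                 "(dataset_id INTEGER, sample_id TEXT, depth_m REAL, temperature_c REAL)")
    cur = conn.execute("INSERT INTO dataset (title, abstract) VALUES (?, ?)",
                       (metadata["title"], metadata["abstract"]))
    dataset_id = cur.lastrowid
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)",
                     [(dataset_id, *rec) for rec in records])
    conn.commit()
    return dataset_id


# Stage 3: metadata recorded using a standard set of elements (two shown here).
metadata = {"title": "Example temperature profile",
            "abstract": "Illustrative data only"}

conn = sqlite3.connect(":memory:")
dataset_id = archive(conn, metadata, process(load_rows(RAW_CSV)))

# Re-use: a later project retrieves the measurements through the dataset record.
measurement_count = conn.execute(
    "SELECT COUNT(*) FROM measurement WHERE dataset_id = ?", (dataset_id,)).fetchone()[0]
stored_title = conn.execute(
    "SELECT title FROM dataset WHERE id = ?", (dataset_id,)).fetchone()[0]
```

Storing the metadata record in the same database as the measurements keeps the two from drifting apart, which is the main point of the end-to-end approach.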


9.4. Geosciences Data Centre Management Model

In order to contribute data effectively to a data centre, academic data collections need to be
well managed from the outset and put into standard formats with the relevant level of
descriptive metadata using the approach outlined in section 9.3.

An example of how data could be submitted from the University of Sydney to a data
centre or national project is shown in the diagram below. In this model, data will be made
freely available through a discovery portal to the general community.

Discipline Specific Data Centre Repositories

Australian Antarctic Division (AAD)
Bureau of Meteorology (BOM)
Geoscience Australia (GA)
Royal Australian Navy (RAN)
Commonwealth Scientific and Industrial Research Organisation (CSIRO)
Australian Institute of Marine Science (AIMS)


10. Conclusion

In the short term, better management of information at the individual researcher and
department level is essential given the large, distributed and complex nature of the
university environment. Evolving the infrastructure needed to support research in the
digital era will become easier if standards and methods are agreed upon. However, it is
essential not to prescribe restrictive or inflexible data management practices.

In the short term it is advisable to develop workflow practices which make the best use of
existing standards and technologies, while keeping in mind how to incorporate the latest
technologies. By tackling information management on various levels (theses, unpublished
documents, digital data and physical material), we can move forward by degrees without
impacting severely on staff workloads. By using the available systems and methodologies
outlined in this report, data collected by the School of Geosciences will be available to take
advantage of new and emerging technologies.

Final-state research data (including publications, datasets, multimedia, etc.) should be
archived with the institutional repository. ICT is developing strategies to help with
the evolving infrastructure required for the management of large spatial data sets.
Very large or complex data sets should also be archived with discipline-specific
repositories.

As one researcher said in the Data Tables from the Data Management Practices Survey:
“In a large, distributed and complex beast like [this university], the diversity of practice
across the wide range of research disciplines means that evolving the infrastructure
needed to support research in the digital era is not going to be easy. It is, however,
essential. We have to be prepared to make mistakes, to try things out and experiment. We
have to be very conscious of the broader framework in which we are working and
constantly try to reveal the deeper principles of practice. Consequently, we have to be
careful not to limit our research record management practices to what current technology
offers - as this will have changed during the life of the project. On the other hand we do
have to use the latest technologies to the best of our abilities to bring increased
productivity and services to researchers. We have just entered the 'Wright Brothers' phase
of the Digital era”.13

While all of the above recommendations are a matter of instilling a change in routine
practices, one of the most important steps forward in the process is the cultural change
that relates to the attitude that researchers have to collaboration and data sharing. The key
to the success in implementing these changes is collaboration.

13 Data Tables from the Data Management Practices Survey


APPENDICES
Appendix 1 - Dublin Core Metadata Fields used in the Sydney Digital Theses Program

Full metadata record

DC Field                 Value                                                   Language

dc.contributor.author    Whittaker, Joanne                                       -
dc.date.accessioned      2008-12-11T01:12:08Z                                    -
dc.date.available        2008-12-11T01:12:08Z                                    -
dc.date.issued           2008                                                    -
dc.identifier.uri        http://hdl.handle.net/2123/3971                         -
dc.description           Doctor of Philosophy(PhD)                               en
dc.description.abstract  Mid-ocean ridges are a fundamental but insufficiently   en
                         understood component of the global plate tectonic
                         system. Mid-ocean ridges control the landscape of the
                         Earth's ocean basins through seafloor spreading and
                         influence the evolution of overriding plate margins
                         during midocean ridge subduction. The majority of new
                         crust created at the surface of the Earth is formed at …..
dc.publisher             University of Sydney.                                   en
dc.publisher             School of Geosciences                                   en
dc.rights                The author retains copyright of this thesis.            -
dc.rights.uri            http://www.library.usyd.edu.au/copyright.html           -
dc.subject               Mid-ocean ridges                                        en
dc.subject               Submarine geology.                                      en
dc.subject               Sea-floor spreading.                                    en
dc.title                 Tectonic consequences of mid-ocean ridge evolution      en
                         and subduction
dc.type                  PhD Doctorate                                           en
dc.creator.email         theses@library.usyd.edu.au                              en
dc.date.valid            2008                                                    en
dc.thesis.advisor        Dietmar Muller,                                         en

Appears in Collections: Sydney Digital Theses


Appendix 2 - Managing University of Sydney Faculty of Science
undergraduate and postgraduate theses within the Library’s eScholarship
repository.

Managing University of Sydney Faculty of Science undergraduate and postgraduate theses within the
Library’s eScholarship repository. School of Geosciences. Draft for discussion.
Rowan Brownlee, University of Sydney Library Digital Project Analyst, 25 January, 2008

Introduction
This document describes systems used for managing undergraduate and postgraduate theses within the
School of Geosciences and notes issues of relevance to the proposal to develop a Library/Faculty of Science
service for managing storage, access, rights and permissions and digital preservation of undergraduate and
postgraduate theses. The document also includes Edwina Tanner’s1 thoughts on the proposed service.

Background
The School of Geosciences uses a spreadsheet to record information about undergraduate and
postgraduate theses. The spreadsheet contains 1600 records covering 1904 to the present and is managed
by an administrative assistant. The most complete information is available for more recent records. Most
theses are not in digital format. 300 thesis abstracts have been digitally scanned. The number of wholly
digital theses is proportionally small and most were created during the past 3-5 years. Hardcopy current
and historical postgraduate theses are held in the University of Sydney Rare Books Library, and
undergraduate theses are held in Madsen and Edgeworth-David. Many theses have accompanying data
files and databases, though not all data files are available. If available, digital copies of theses and datasets
are stored on School fileservers while databases2 reside on School database management servers.

Metadata elements contained in the School of Geosciences spreadsheet.

1. Author

2. Supervisor

3. Abstract

4. Title

5. Year

6. Number of pages

7. Degree program name

8. Subject keywords

9. ASRC codes

10. Sponsors

11. Physical location of hardcopy original

12. Digital (indicating the extent of digital holdings. E.g. ‘A’ = abstract)


Issues for consideration within Geosciences and other Faculty units


The following sections describe issues concerning relationships between departmental administrative
systems and the repository, sustainability issues for submission form configuration, requirements for batch
submission of accompanying data files and the role of the repository in providing a navigable interface to
data collections.
Thesis submission forms
Edwina favours a submission form containing a minimal set of commonly required fields. This type of form
would be suitable for describing undergraduate theses across the University. It would include at least the
first 11 fields listed on page 1 (and be very similar to the form used for submission of postgraduate theses3).

Edwina does not favour provision of discipline-specific submission forms containing navigable taxonomies
of controlled terms. Within a Faculty unit, the task of uploading theses would most likely be assigned to
administrative assistants rather than subject specialists. Edwina recommends application of ASRC codes.
Although administrative assistants would need to be trained in using ASRC, this would provide a single
classification mechanism for the University using a vocabulary adopted by the Research Office and DEST.

• Might record creation be a collaborative process between the Faculty and Library, with the Library
supplementing administrative information with classification codes/subject headings?

Batch submission of existing metadata and theses


Previous tests have indicated that existing metadata records, theses and their associated data files held by
Geosciences can be batch-transferred to the Library’s repository4.

• Other Faculty units may hold existing metadata in local records management systems. Where will
responsibility lie for assessing data transfer requirements and implementing and maintaining processes for
batch-transfer?

• Will there be an ongoing need to manage periodic batch-transfer of records from Faculty administrative
systems?

Managing submission of data collections


A single thesis may contain hundreds of accompanying data files5. If management of these files reflects a
repository role, creation of batch submission services will be required.

• Will the repository accept submissions of data files accompanying theses?

Presentation and navigation of research data collections within the Library’s repository
Theses may be accompanied by collections of data files which are organized in a directory structure defined
by the thesis creator. The following example is taken from a set of 640 data files and illustrates the
importance of the contextual information provided by the directory structure.
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_006_9.85.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_007_5.35.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_005_12.65.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_qpmp.tar
The default repository interface is very limited in its capacity to provide a meaningful view of a data
collection’s structure. There is no facility to navigate or ‘drill-down’ a complex directory structure. The
interface simply provides a fully expanded display of directories, subdirectories, sub-sub directories … … …
and data files6.


Options for providing a meaningful view of data collections accompanying theses


More information is needed regarding the feasibility of incorporating a flexible presentation layer, either by
XSLT within the repository (using MANAKIN) or via API to external presentation tools. If it were feasible to
customise presentation, extensive use could be made of dataset collection records. Batch import of dataset
collections could include a simple process to generate an XML collection record based on directory
structure. This could be rendered via XSLT to provide a fully navigable directory structure. This would
enable the user to drill-down from a top level directory.
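
The step of generating an XML collection record from a directory structure can be sketched briefly. This is an illustration only: the element names collection, directory and file are hypothetical and do not come from the repository software or from MANAKIN, and the XSLT rendering stage is not shown. The paths are the four examples quoted earlier in this appendix.

```python
import xml.etree.ElementTree as ET


def build_collection_record(paths):
    """Turn a flat list of 'a/b/c/file' paths into a nested XML tree so that a
    presentation layer can offer drill-down from the top-level directory."""
    root = ET.Element("collection")
    for path in sorted(paths):
        node = root
        *directories, filename = path.split("/")
        for name in directories:
            # Reuse the directory element if an earlier path already created it.
            child = next((d for d in node.findall("directory")
                          if d.get("name") == name), None)
            if child is None:
                child = ET.SubElement(node, "directory", name=name)
            node = child
        ET.SubElement(node, "file", name=filename)
    return root


# Usage example with the sample paths from the thesis data collection.
base = "Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/"
paths = [
    base + "dynatopo_006_9.85.ps",
    base + "dynatopo_007_5.35.ps",
    base + "dynatopo_005_12.65.ps",
    base + "dynatopo_qpmp.tar",
]
record = build_collection_record(paths)
record_xml = ET.tostring(record, encoding="unicode")
```

Because each directory appears once in the record however many files it contains, a stylesheet could collapse or expand any level instead of listing all 640 files in a single flat display.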

• To what degree is it the role of the repository service to provide user interface tools? How would such a
service be resourced and sustained?
______________________________________________________________________________________

1 Edwina Tanner, School of Geosciences. http://www.geosci.usyd.edu.au/people/st_tanner.shtml


2 The issue of database preservation will most likely be addressed as a separate activity. Repository records should
contain information about the location of an associated database.
3 University of Sydney Library postgraduate theses collection. http://www.library.usyd.edu.au/theses/index.html
4 A collection of theses and data files is available on the Library’s development server at
http://sirius.library.usyd.edu.au:8080/handle/123456789/205
This sample record contains entries for 640 accompanying data files.
http://sirius.library.usyd.edu.au:8080/handle/123456789/576
5 This example contains 640 accompanying data files. http://sirius.library.usyd.edu.au:8080/handle/123456789/576
6 The example illustrates the difficulties in making sense of a collection of data files.
http://sirius.library.usyd.edu.au:8080/handle/123456789/576


Appendix 3 – eScholarship Copyright Release Form


Copyright Release - eScholarship Repository

Before Work can be included on the eScholarship Repository, Contributors
need to agree to the terms of this Release
____________________________________________________________________________________________
By this License, the Contributor, for the benefit of the University, grants the University
following rights.

1. Definitions

Contributor means the author.contributor identified in the Sydney eScholarship Repository
Metadata.

eScholarship Repository Metadata means the metadata encoded in the uploaded Works by the
Contributor when accessing the Sydney eScholarship Repository.

University means The University of Sydney acting through Sydney eScholarship Repository, a
body corporate under the University of Sydney Act 1989, ABN 15 211 513 464, of University of
Sydney Library F03 University of Sydney, NSW 2006 Sydney NSW 2006.

Work means the works listed [as “Titles” in the Sydney eScholarship Repository Metadata/in
the Schedule to this License].

2. Licence
The Contributor grants the University the non-exclusive perpetual license to reproduce and
communicate the Work to the public via the Sydney eScholarship Repository and, without
changing the content, to translate the Work to any medium or format for the purposes of
preservation, research and study provided such use is not for a commercial purpose, The
Contributor also agrees that the University may keep more than one copy of the Work for
the purposes of security, backup and preservation.

3. Attribution
The eScholarship Repository will clearly identify the Contributor as the author of the Work.

4. Acknowledgements
The Contributor acknowledges that:

(a) they will not receive any payment from the University for the grant of rights under this
License;
(b) the Work is subject to the approval of the University and may not be accepted to the
eScholarship Repository;
(c) the University may remove the Work from the eScholarship Repository at any time at its
absolute discretion; and
(d) they have no termination rights under this License.

5. Standard of work
In order for Work to be accepted to and remain on the eScholarship Repository, the
Contributor acknowledges that:
(a) the Work is academic and postgraduate (unless Work is an Honours Thesis or is otherwise
approved by the University in writing); and
(b) text material submitted is final draft or published version, and non-text material
submitted is in its final form.

6. Warranties
The Contributor warrants that:

(a) the Work is their original work;
(b) they have obtained consents in writing from all previous publishers of the Work to enter
into this License;
(c) they have obtained consents in writing from third parties which have any materials
reproduced in the Work to publish the Work;
(d) they can grant the rights under this License and the University’s exercise of those rights
will not infringe the copyright or other intellectual property rights of third parties;
(e) to the best of their knowledge, the Work is accurate as at the date in which the final
version of the Work is submitted to the University and as far as reasonably possible they
have sought to verify all statements in the Work which purport to be true and accurate;
(f) to the best of their knowledge, the Work does not contain any scandalous, defamatory, or
obscene material or any material which is actionable for interference with privacy,
infringement of copyright, breach of confidence, passing off or contravention of any other
private right; and
(g) they have not engaged in any practices in preparing the Work that would amount to
plagiarism or any other form of academic dishonesty or research misconduct under
University policies and rules or which would (or would be likely to) bring the Contributor
or the University into disrepute, and that they have complied with the University’s
policies, procedures and rules.
(h) where the work is a thesis, it is a direct equivalent of the final officially approved
version that was submitted, and no emendation of content has occurred other than minor
variations in formatting, that are the result of the conversion to digital format.

7. Breach of warranty
The Contributor agrees to:

(a) notify the University as soon as they become aware of any circumstances relating to the
breach or potential breach of a warranty in clause 6;
(b) allow the eScholarship Repository Coordinator to take any action to manage the
University’s exposure to such liability;
(c) provide the University with all reasonable assistance in relation to the conduct or defence
of any legal proceedings which may be commenced by or against the University in relation
to the breach of a warranty in clause 6; and
(d) indemnify the University against any actions, costs or expenses arising out of the breach
of a warranty in clause 6.

8. Jurisdiction
The Contributor agrees that this License is governed by the law of New South Wales, submits
to the non-exclusive jurisdiction of the courts in New South Wales and waives any right they
have to object to an action being brought in those courts (including by claiming that the
action has been brought in an inconvenient forum or that those courts do not have
jurisdiction).

EXECUTED as an Agreement on the terms above by the CONTRIBUTOR

……………………………………………………..
Signature

………………………………………………………
Printed Name A

Appendix 4 – Example of Metadata Required to Describe a Dataset

Information about the dataset custodian


Contact
Name
Department / Location
Phone Number
Email Address
Dataset Details             Description
Title                       Descriptive title describing dataset held
Abstract                    Brief description of the dataset and purpose of collection
Type of Data Collected      Temperature, Salinity, Sea Floor, Biological, Habitat
Data Format                 Digital or non-digital, or the media type, e.g. CD, tape, paper, hard disk
Temporal Coverage           Beginning and end date of data collection
Geographic Coverage         Bounding coordinates of data set or place name
Access constraints to data  Restrictions on use of data by research community or general public
References                  Papers or reports published referring to the dataset
Online Links                Any online link to the data

Dataset Details
Title
Abstract
Type of Data
Data Format
Temporal Coverage
Geographic Coverage
Access constraints to data
Papers Published using the data
Online Links

Note: This form outlines the minimum set of metadata elements required to describe
a dataset.
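
As an illustration only, the form above could be held as a simple record and checked for blank elements before a dataset is accepted. The sketch below is a minimal one: the class, field and function names are hypothetical, and treating the references and online links as optional is an assumption rather than something the form states.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetMetadata:
    """Metadata elements from the Appendix 4 form (names are illustrative)."""
    # Information about the dataset custodian
    contact_name: str = ""
    department: str = ""
    phone: str = ""
    email: str = ""
    # Dataset details
    title: str = ""
    abstract: str = ""
    data_type: str = ""
    data_format: str = ""
    temporal_coverage: str = ""
    geographic_coverage: str = ""
    access_constraints: str = ""
    references: List[str] = field(default_factory=list)
    online_links: List[str] = field(default_factory=list)


# Assumption: a dataset may have no publications or online links yet.
OPTIONAL_FIELDS = {"references", "online_links"}


def missing_elements(record: DatasetMetadata) -> List[str]:
    """Return the names of required elements that are still blank."""
    missing = []
    for f in fields(record):
        if f.name in OPTIONAL_FIELDS:
            continue
        if not str(getattr(record, f.name)).strip():
            missing.append(f.name)
    return missing


# Hypothetical, partly completed record
record = DatasetMetadata(
    contact_name="A. Researcher",
    department="School of Geosciences",
    title="Example sea floor survey",
    abstract="Hypothetical multibeam survey collected for habitat mapping",
)
print(missing_elements(record))
```

A check of this kind could be run when a dataset is lodged, so that incomplete descriptions are returned to the custodian rather than archived.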


Appendix 5 – Spatial Data Sets held in the Department of Geosciences


(Information provided by John Twyman)


Appendix 6 – Sample FGDC Content Standard


(http://www.fgdc.gov/metadata/csdgm/)

The following is the recommended bibliographic citation for this publication: Federal
Geographic Data Committee. FGDC-STD-001-1998. Content standard for digital geospatial
metadata (revised June 1998). Federal Geographic Data Committee. Washington, D.C.
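
As a rough illustration of how the elements in Appendix 4 might be expressed against this standard, the sketch below builds part of the identification section of a CSDGM record as XML. It is a partial sketch under stated assumptions: the function name and example values are hypothetical, several elements the standard treats as mandatory are omitted, and the element short names used (idinfo, citeinfo, descript, timeperd, rngdates, spdom, bounding, accconst) should be checked against FGDC-STD-001-1998 before use.

```python
import xml.etree.ElementTree as ET


def build_csdgm_identification(title, abstract, begin_date, end_date,
                               bbox, access_constraints):
    """Build a partial CSDGM record (identification section only).

    bbox is (west, east, north, south) in decimal degrees.
    Dates are strings in YYYYMMDD form.
    """
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")

    # Citation: only the title is filled in here
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "title").text = title

    # Description
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract

    # Temporal coverage as a date range
    timeperd = ET.SubElement(idinfo, "timeperd")
    timeinfo = ET.SubElement(timeperd, "timeinfo")
    rngdates = ET.SubElement(timeinfo, "rngdates")
    ET.SubElement(rngdates, "begdate").text = begin_date
    ET.SubElement(rngdates, "enddate").text = end_date

    # Geographic coverage as bounding coordinates
    spdom = ET.SubElement(idinfo, "spdom")
    bounding = ET.SubElement(spdom, "bounding")
    for tag, value in zip(("westbc", "eastbc", "northbc", "southbc"), bbox):
        ET.SubElement(bounding, tag).text = str(value)

    # Access constraints
    ET.SubElement(idinfo, "accconst").text = access_constraints
    return metadata


# Hypothetical example values
record = build_csdgm_identification(
    title="Example sea floor survey",
    abstract="Hypothetical multibeam survey collected for habitat mapping",
    begin_date="20080101",
    end_date="20081231",
    bbox=(151.0, 151.4, -33.7, -34.1),
    access_constraints="Research community only",
)
print(ET.tostring(record, encoding="unicode"))
```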


Appendix 7 – Columbia University – Data Catalogue


Earth's Environmental Systems
[http://eesc.columbia.edu/courses/ees/data/index.html]
Categories

• Atmospheric Data
• Oceanographic Data
• River Data
• Solar Radiation Data
• Paleoclimate Data
• Geographical Data
• Geological Data
• Miscellaneous Data

Atmospheric Data

• Atmospheric Measurements
o OBERHUBER latent heat flux (January)
o COADS Monthly Average Air Temperature (December 1980)
o COADS Monthly Average Specific Humidity (December 1980)
o COADS Meridional Wind Velocity (January 1981)
o COADS Zonal Wind Velocity (January 1981)
o COADS Wind Vectors (January 1981)
o COADS Wind Speed (January 1981)
o HANSEN Global Surface Temperature (for the last century)
o Keeling Mauna Loa CO2 (Carbon Dioxide)
o LEGWIL Precipitation (January)
o OORT Humidity (January)
o OORT Temperature (January)
o OORT Meridional Wind Speed (January)
o OORT Zonal Wind Speed (January)
• Oklahoma Weather Station Radiosonde Data
o Precipitation Timeseries
o Temperature Timeseries
o Pressure Timeseries
o Mixing Ratio (Relative Humidity) Timeseries
o Wind Direction Timeseries
o Wind Speed Timeseries
o Meridional (North) Component of Wind Velocity Timeseries
o Zonal (East) Component of Wind Velocity Timeseries
• Weather Information Centers
o National Hurricane Center Coastal Watches and Warnings
o Space Sciences and Engineering Center (U. Wisconsin)
o WXP Weather Processor (Purdue)

Oceanographic Data

• Annual
o LEVITUS Oxygen (0 meters depth)
o LEVITUS Phosphate (0 meters depth)
o LEVITUS Salinity (0 meters depth)
o LEVITUS Temperature (0 meters depth)
• Monthly
o LEVITUS Salinity (0 meters depth)
o LEVITUS Temperature (0 meters depth)
o COADS Average Sea Surface Temperature (December 1980)
o COADS Atmospheric Pressure at Sea Level (December 1980)
o GOSTA Sea Surface Temperature Anomaly (March 1979)
o IGOSS Sea Surface Temperature Anomaly (January 1982)
o IGOSS Sea Surface Height (February 1995)
• Seasonal
o LEVITUS Oxygen (0 meters depth)
o LEVITUS Salinity (0 meters depth)
o LEVITUS Temperature (0 meters depth)
• El Nino
o El Nino NINO3 Index Function
• Web Sites of Interest
o El Nino Theme Page (NOAA)

River Data

• Australian River Discharge
• River Chemistry Table (tab-separated values)
• Web Sites of Interest
o Surface Water Data Retrieval Service (USGS)

Solar Radiation Data

• Absorbed Solar Radiation (January and July)
• Emitted Thermal Radiation (January and July)
• Net Radiation (January and July)

Paleoclimate Data

• VOSTOK Ice Core
o CH4 (Methane)
o CO2 (Carbon Dioxide)
o Deuterium
o Dust
o Gas Age
o Ice Age

Geographical Data

• Tiger Map Browser (US Census)

Geological Data

• ETOPO5 World Topography and Bathymetry
• Earthquakes of the World
• Earthquakes of the World, depth
• Lithosphere Earthquake [ lon , lat , depth ] 1
• Lithosphere Earthquake [ lon , lat , depth ] 2
• Lithosphere Earthquake [ lon , lat , depth ] 3
• Indian Ocean Seismic Profiles
• Ocean Sediment Thickness
• Volcanos of the World
• Web Sites of Interest
o Mid-Ocean Ridge Multibeam Synthesis Project (LDEO)
o Current Earthquake Information (USGS/NEIC)
o Historic Earthquake Data (USGS/NEIC)
o Volcano World

Miscellaneous Data

• Radionuclides Chart


Appendix 8 - Summary of Recommendations

Recommendation: Promote the use of Sydney Digital Theses through various channels,
and identify and endorse a suitable digital repository to house Honours theses.

Recommendation: That the department advise or require all its students to submit a
final copy of their PhD and/or Masters theses using the digital thesis facility. This can be
supported by the postgraduate coordinator and promoted through the department
handbook.

Recommendation: That a representative from the digital thesis project provide additional
seminars to the postgraduate students at strategic times during the semester so that the
students are more aware of this facility and the copyright and intellectual property issues
associated with using this repository.

Recommendation: That information about the Sydney Digital Thesis program be outlined in
all postgraduate handbooks throughout the university.

Recommendation: That someone within the department be appointed to maintain an inventory
and ensure the effective management of all departmental theses, in particular Honours
theses.

Recommendation: That someone within the department be identified to ensure that all
theses, in particular Honours theses, are safeguarded and made accessible through a
computer-based management system.

Recommendation: Amend the School of Geosciences website so that Sydney Digital Theses is
promoted under the postgraduate banner.

Recommendation: Students to receive an information leaflet about the Sydney Digital Thesis
Program on enrolment in postgraduate studies. This leaflet should cover copyright issues
and recommend that they digitally archive their thesis and datasets on completion of their
study.

Recommendation: The department insist on the submission of a digital copy of all theses,
including, where appropriate, one copy to the Digital Thesis project and one copy to the
department.

Recommendation: The department creates a master CD or DVD annually of all theses
submitted by students for that year and includes a copy of the updated metadata
(spreadsheet) on the media.

Recommendation: The department maintains an on-line backup of the information in a
shared area on the network, because CD and DVD media do not last forever.


Recommendation: All students should sign a Copyright release form and the department
should manage/archive the signed release forms in association with the thesis and data
sets.

Recommendation: If students fail to submit to Sydney Digital Theses after a period of
approximately one year, then the department should digitally archive material to the
repository on behalf of the student (if appropriate permissions have been granted).

Recommendation: A section on appropriate data management practices and expected
behaviour is included in the student handbook.

Recommendation: Identify and promote a suitable repository through which unpublished
research material can be documented and circulated, encouraging discussion and debate.

Recommendation: Develop a policy in which all data collected using public funds are
required to be submitted to an appropriate repository.

Recommendation: The University of Sydney should consider purpose-built,
institutionally-backed data management services based on requirements analysis.

Recommendation: Standardised data management practices should be adopted so that the
process of data management and archival can become as automated as possible.

Recommendation: Data management courses and field work components be developed
that flow through from first year to the final year to introduce the concepts of the standard
approach to data management adopted by the department.

Recommendation: A high-level inventory is developed and maintained to keep track of
physical assets.


REFERENCES

Australian Government (2007) Revision of the Joint NHMRC/AVCC Statement and
Guidelines on Research Practice, Australian Code for the Responsible Conduct of Research
jointly issued by the National Health and Medical Research Council, the Australian
Research Council and Universities Australia. http://www.nhmrc.gov.au/index.htm

Australian National University (2008) ANU Data Management Manual: Managing Digital
Research Data at the Australian National University, Information Literacy Program, The
Australian National University, Document Version 1.03, August 15, 2008.

Brownlee, Rowan (2008) Unpublished Communication – Managing University of Sydney
Faculty of Science undergraduate and postgraduate theses within the Library’s
eScholarship repository. School of Geosciences. (Draft for discussion)

Columbia University Department of Earth and Environmental Sciences.
http://eesc.columbia.edu/courses/ees/data/index.html

Federal Geographic Data Committee - http://www.fgdc.gov/

Fitzgerald, A., Pappalardo, K. and Austin, A. (2007) Understanding the legal implications
of data sharing, access and reuse in the Australian research landscape. Chapter 7 In:
Building the Infrastructure for Data Access and Reuse in Collaborative research: An
analysis of the Legal Context, Oak Law Project. http://www.oaklaw.qut.edu.au/reports

Henty, Margaret (2007) Data Tables from the Data Management Practices Survey,
http://hdl.handle.net/1885/47108

Tanner, Edwina (2007) Data Management and Researcher Support Presentation, eResearch
Forum 1 November 2007, The University of Sydney.

University of Oxford (2007) Scoping Digital Repositories Services for Research Data
Management - A Project of the Office of the Director of IT

University of Utah - University Institutional Data Management Policy.
http://www.regulations.utah.edu/it/4-001.html

Wikipedia (2007) Institutional Repositories.
http://en.wikipedia.org/wiki/Institutional_repository
