Tanner Data Management Model Survey
by
Edwina Tanner
Andy Myers
Shaun Kim
Mark Richards
December 2008
This project was funded by the Australian Partnership for Sustainable Repositories
Data Management Model Survey
Table of Contents
APPENDICES
Appendix 1 - Dublin Core Metadata Fields used in the Sydney Digital Theses Program
Appendix 2 - Managing University of Sydney Faculty of Science undergraduate and postgraduate theses within the Library’s eScholarship repository
Appendix 3 - eScholarship Copyright Release Form
EXECUTIVE SUMMARY
The value of institutional data is increased through its widespread and appropriate use; its value
is diminished through misuse, misinterpretation, or unnecessary restrictions to its access.1
An increasing number of digital objects is being produced every day by researchers at the
University of Sydney. These materials include journal articles, theses, images and datasets. The
University increasingly has the responsibility for providing services to store, curate, disseminate
and preserve the outputs of these research activities.
Departments, faculties, research groups and individuals are responsible for the creation, reuse and
management of research outputs, including data, in conformance with both internal and external
policies. Ready access to data, whether as part of a collaboration, for wider reuse, long-term
preservation, or to meet security or ethical obligations, is increasingly important. The amount and
complexity of digital objects produced or used within a research context is growing beyond the point
at which a single project or unit can be expected to provide the supporting infrastructure.2
In the increasingly competitive field of research, the key to success is collaboration and the ability
to efficiently find and use quality data that is ready to be assimilated into a project. In striving for
a “handle once, use many times” approach to data management, standards and policy guidelines
need to be issued that balance technical requirements against the time constraints of academics
and researchers.
With the loss of data curator positions, and a lack of formal workflow guidelines or policies within
University departments, the lines of responsibility for data management duties have become
blurred, resulting in loss of data and duplication of effort. To implement best-practice data
management within a department, we examined the structure and workflows of existing systems
so as to exploit their full potential and streamline activities.
Using the School of Geosciences as an example, this survey describes a data management model
that identifies how the valuable documents and data created by a University department can be
managed effectively with existing systems, through the implementation of rules and guidelines
along with workflow strategies.
1 University of Utah Institutional Data Management Policy
2 Scoping Digital Repositories Services for Research Data Management
Research policies are required that address the ownership of research materials and data,
their storage, their retention beyond the end of the project, and appropriate access to them by
the research community.3
The responsible conduct of research includes the proper management and retention of the
research data. Retaining the research data is important because it may be all that remains of
the research work at the end of the project. While it may not be practical to keep all the
primary material (such as ore, biological material, questionnaires or recordings), durable
records derived from them (such as assays, test results, transcripts, and laboratory and field
notes) must be retained and accessible.4
3 Australian Code for the Responsible Conduct of Research
4 Australian Code for the Responsible Conduct of Research
By conducting a “survey” of the key data handlers within the department, we identified the
range of data assets, including the publications and datasets created and/or owned by the
School of Geosciences, and reviewed how they are managed. The survey also reviewed the
structure and workflows of existing systems within the University of Sydney, such as RIMS,
HERDC, and the Library’s eResearch and Digital Thesis initiatives, to determine how the
department can best take advantage of these and other evolving data auditing systems. The
data management strategy outlined in this document has been developed from the results
of this research.
One of the main results of applying this audit strategy has been to identify a mechanism
to streamline data asset management and delivery to an identified repository, so that
assets can be safeguarded as well as copyright protected. This pilot project was
conducted with the School of Geosciences, which provided the initial data to test the
model. The School was selected as a case study to build on the work of an earlier
ad-hoc project to manage the School’s theses.
Apart from the technical issues discussed in this document, there are also a number of data
policy issues that a department will need to address in order to deploy a data management
model effectively. Data management initiatives need to be supported and followed through
with policies that govern how the data collected by a department is to be managed. Data
licensing agreements should be developed to cover issues of security, confidentiality and
ownership of intellectual property. In some situations it will not be appropriate to make
information available, and such data will be kept confidential.
This case study is the first step in what may be adopted as an accepted strategy for data
management within a university department. The strategy aims to ensure that the data
assets of a department are readily available and poised to take advantage of emerging
technologies, and that the valuable data collected by researchers is backed up in an
appropriate archive for safekeeping and can be made readily available for re-use in
additional research projects.
The objectives of the model depend on the needs of researchers, students and
administrative staff within the department. Understanding and appreciating the needs of,
and constraints on, these users is pivotal not only for constructing the model, but also for
ensuring its long-term use and viability.
Once implemented, the management model will have the secondary benefit of promoting the
School of Geosciences within the university, and plausibly across the wider higher education
sector. This will result from increased exposure, which in turn may promote collaboration
and resource sharing while reducing the potential for research duplication.
Sydney eScholarship is built upon the Library's recognised expertise in creating, archiving,
managing, publishing and providing access to digital content, working in partnership
with other University services. Storage of and access to data are managed by three systems:
the Sydney eScholarship Repository, the Sydney Electronic Text and Image Service (SETIS)
digital collections, and Sydney Digital Theses. This report focuses on the eScholarship
repository and the Sydney Digital Theses Program.
The Sydney Digital Thesis Project is a component of the Australian Digital Thesis
Program (http://adt.caul.edu.au/). Its aim was to provide the framework and operational
infrastructure needed to create a national database of postgraduate theses produced at
Australian universities. There are currently 28 full members of the program, which
provides access to over 1500 postgraduate theses. The theses are provided in Portable
Document Format (PDF) from individual universities' collections, and are searched through
a central database of Dublin Core metadata supplied when the thesis is lodged. Appendix 1
describes the Dublin Core metadata fields used in this project. The project accepts only
PhD and Masters (Research) theses.
In March 2006 the Research Office purchased the Integrated Research Management
Application (IRMA) as an interim system to capture publication data for the Higher
Education Research Data Collection (HERDC) and later for the Research Quality
Framework (RQF).
The purchase of IRMA met some of the Research Office’s immediate operational needs,
but provided a tactical rather than a strategic solution to the challenges of research
management.
All research staff and affiliates of the University must ensure that they participate in the
HERDC. In the School of Geosciences this information is collected monthly to ensure that
publications are submitted to the Coordinator in a timely fashion. Previously, publications
were submitted annually or directly to the Research Office, which left room for omissions,
errors and inefficiencies.
These efforts directly affect the amount of funding made available to the University for
research activities, so there is a great deal of motivation for academic staff to submit this
information. In future, funding allocation may be based on the Excellence in Research for
Australia (ERA) initiative instead of HERDC.
The Higher Education Research Data Collection comprises research income and research
publications data submitted by universities each year. Returns are due by the end of June
every year.
RIMS is an enterprise system, comprising a single, central data repository for research
administration information that is accessible to staff across the University and affiliated
institutions via a user-friendly web application.
This will remove the need for the Research Office and faculties to maintain multiple,
stand-alone data repositories (hard copies, electronic documents, spreadsheets, FileMaker
databases etc) containing duplicate and inconsistent data.
The term RIMS is now used to describe just the system component of the wider Research
Management Program. Research Management Program (RMP) is a joint business
improvement initiative, run by ICT in partnership with the University’s Research
portfolio.
The overall objective of the program is to formulate and deliver efficient and effective
research management and administration capability for the University. However, to achieve
real and lasting improvements, the Research Management Program needs to deliver more
than a new system.
Apart from RIMS, the program comprises three other linked components – processes,
culture/behaviour and organisation design – all of which are essential to the delivery of
enhanced University-wide research management capability.
The long-term (ten-year) objectives for data management within the Australian National
Data Service (ANDS) are to:
• influence national policy in the area of data management in the Australian research
community
• inform best practice for the curation of data
• transform the disparate collections of research data around Australia into a
cohesive collection of research resources.
A number of key information managers within the School of Geosciences were surveyed.
Instead of conducting a structured survey of many staff, a few key individuals were
selected for a more in-depth approach. A summary of their responses is outlined below.
• Asserted that access to a database or inventory of previously collected research data and
assets would be a tool of great benefit. This would save valuable time and make efficient
use of university resources. “Time is one of the most precious things that a student has.”
• Any database or inventory would also benefit supervisors, with newly created research
assets feeding back into the system. The volume of data would grow over time and help
prevent research duplication.
• Current data management practice involves backing up research data on a hard-drive
and also to DVD. Considering the work being undertaken is contributing to a larger,
long-term dataset, it is hoped that the information will continue to be managed and
used within the department.
• It was thought that documenting this long-term dataset may raise some interesting
questions. The dataset has been collected over a period of 15 years by a number of
researchers and students, so identifying all the owners of intellectual property may
be difficult. Furthermore, because of advances in software and technologies, the
custodian of the dataset may not be up to speed with the best way to present the
information.
• It was suggested that good data management practice and data repositories should be
introduced during undergraduate and honours studies, so that such knowledge is acquired
before commencing postgraduate study.
Human Geography 7 15 10
Physical Geography 5 6 8
Human Geography 18 17 18
Physical Geography 11 10 11
The School of Geosciences currently holds over 1700 honours, MSc and PhD theses, most
of which are in hardcopy format. This represents approximately 3400 years of valuable
research (an average of ~2 years of research per thesis) sitting on the shelves with limited
access. A spreadsheet inventory of entries from 1904 to the present is maintained by an
administrative assistant. In many cases, particularly for honours theses, this is the only
copy of the information available anywhere. Digital theses have been available on an
ad-hoc basis since 2003, and most of these have accompanying data files, which are
currently stored on CD-ROMs.
The Geosciences postgraduate coordinator's advice to postgraduates, mailed out to
students at the end of the second semester, was as follows:
“I would like to make you aware of the Sydney Digital Theses Project and encourage you
to submit a final version of your MSc and PhD theses to this repository. I think that you
should put your thesis into pdf format anyway, since it is much easier for you to distribute
and preserve it. In the past, my students' theses have been cited in journals and
government reports many times, only because the thesis was distributed digitally.”
management services will enable researchers to store, update, and share access to data
with colleagues and link to collaborative and analytic tools.
• Future developments in workflow will also enable 'curation by stealth' (automated
capture and management of metadata). Over time the Library will develop systematic
relationships with such services to enable archival transfer of selected collections.
Policy and service development regarding the scope of Library involvement in data
archiving will be informed by the Australian National Data Service (through their
gradual development of data management reference models). It will also be guided by
future versions of research assessment exercises which will at some stage include
consideration of data sets as research outputs (underpinned by review processes and
agreed archiving models).
• Exploratory work relating thesis data collections to the Library's repository prompted
the development of options and consideration of service implications for managing
research data and metadata within the Library's repository. A metadata management
options paper is available online at
http://escholarship.library.usyd.edu.au/dpa/meta.shtml. This work is developed further,
and placed within the context of Library roles and relationships regarding data
management and eResearch support, in a paper in the upcoming April 2009 feature
issue of Cataloging and Classification Quarterly.
• The Digital Thesis Program at the University of Sydney is part of the Australian
Digital Thesis (ADT) Program (http://adt.caul.edu.au/). The digital thesis project is
gaining momentum; however, at this stage it captures only ~40% of theses output
from the university.
• There are a number of departments including economics and business, history and
health sciences who currently use the eResearch repository for managing parts of
their honours theses collection.
• The technical drawback to using this repository for managing the entire honours
theses collection is that it is a public access facility which cannot easily manage a
closed collection.
• According to the Geosciences postgraduate coordinator, “the fundamental
difference in making an honours thesis available to the general public versus a
Masters or PhD thesis is that the honours thesis has not undergone the same rigorous
review process to ensure the standard of the final product”.
• Thus, although an institutional repository would be the ideal storage and access
facility for this research output, giving the student some protection against
plagiarism and copyright infringement, it has the potential to expose “unpolished” products.
• The workaround used by these departments is to select only the best honours theses
for publication in the repository. This leaves the remaining large percentage of theses
to be considered and managed through another workflow stream.
• The desired platform for theses submission would follow from the university's
adoption of mandatory online submission for all theses. For this to occur, students
would require further support with formatting and copyright issues, and workflows
would need to be established to accommodate this.
This figure illustrates the typical, simplified workflow through to publication and
highlights the strong focus on journal publication and traditional research outputs (book
chapters, books, journals and conference publications). As the figure illustrates, at no point
does the researcher reveal the details of the data or assets collected during the course of
their research to others within the department.
Although there is the option of archiving digital data with the university’s institutional
repository, this is rare; researchers instead rely largely on their own back-up procedures
or on departmental scratch disks. Researchers typically have the option of submitting data
to a repository of their choice; however, this is neither overseen nor regularly undertaken.
Digital repositories such as E-scholarship also have the limitation of not being able to
manage hard material.
All researchers create working papers, or drafts of articles, prior to publication. The
management of such ‘works in progress’ varies considerably, as does the exposure of a
paper to peer comment prior to publication. These working papers may facilitate
discussion and debate on a topic, and provide impetus for future collaborations. Within
the School, there are currently no active channels for the circulation of working papers or
grey material. The exchange of ideas and feedback appears to occur primarily through
verbal communication and electronic mail.
Figure 2. Schematic work-flow of the current situation for reporting of PhD and
Masters (Research) students' research outputs. (E-scholarship is representative of an
institutional digital repository)
The primary focus for students undertaking postgraduate studies by research is the
creation of a thesis. The subsequent publication of journal articles from the research is
celebrated, but is not a prerequisite for graduation. In undertaking research and
collecting data, students encounter the same pitfalls that befall researchers. Although
students have the option of archiving digital data with the university’s institutional
repository and/or external, discipline-specific repositories, this is rare; they rely instead
largely on their own back-up procedures or on departmental scratch disks. As indicated
in an interview with the Senior Computer Systems Officer, students often neglect to
manage information placed on department scratch disks, and this information is
eventually wiped as a result of inactivity and non-communication. Unless the student
decides to archive data in the E-scholarship repository or some other identified archive,
the raw research data collected for the thesis leaves circulation with the student when
the thesis is completed.
Current policy for the submission of PhD and Masters (Research) theses requires that
two hardcopies of the document be presented to the institution. One is archived in the
department compactus, while the other is submitted to the principal University Archive
(Fisher Library at the University of Sydney). This second copy resides in the rare books
collection and can be searched online by title or author. The contents, however, can only
be accessed physically, by visiting the archive and requesting the thesis. Very little, if
any, documentation is recorded at the department level when a thesis is submitted to the
local compactus.
As intellectual property rights sit with the student, there is no mandatory requirement
that the document be digitally archived with Sydney Digital Theses. Digital archiving is a
recommendation that in many instances is overlooked by euphoric students relieved at
having survived the trials and tribulations of study. Although the digital theses facility is
available to all postgraduate students, only a very small percentage (~1% from the School
of Geosciences) currently submit their thesis this way. These figures indicate that the
repository is massively underutilised, despite its potential to collate and freely market the
major research output of students.
With a similar focus to PhD and Masters (Research) students, the primary aim of Honours
students is the creation of a thesis. Any publications that stem from the research are
applauded, but are not necessary for graduation. As with researchers and postgraduate
students, a number of repositories are available for the submission of raw data, both
institutional and external, discipline-specific archives. Following the endemic trend
in the higher education system, the active archival of raw data is negligible, with emphasis
remaining firmly on traditional research outputs (theses for students, publications for
researchers).
The submission policy for honours theses differs significantly from that for PhD and Masters
(Research) students. Only one hardcopy of the thesis is submitted, which is archived in the
department compactus. Very little documentation is recorded when such archival occurs
at a local level. A hardcopy is not submitted to the main University Archives, nor is an
electronic copy permitted to be submitted to the Sydney Digital Theses repository.
Archival through the Sydney Digital Theses repository is not supported because of the
limited vetting the document receives compared with a postgraduate thesis. This reveals an
inadequate situation wherein the primary research output of a year's endeavour is poorly
documented and filed away to gather dust. The thesis is effectively removed from
circulation upon the student's graduation. As an interim, partial solution to this
shortcoming, the School of Geosciences has recently started collecting CD-ROMs of
theses. One is retained by the department, whilst the second is provided to the student's
supervisor.
Ideally, the management of these valuable documents would fall within the remit of a
data curator. In some departments an administrative position is responsible for this task.
One mechanism that provides an immediate solution for managing theses from now on is
to formalise an honours thesis repository arrangement between the institutional
repository (e.g. the eResearch project) and the department.
If this is done, this valuable information can be made available online for departmental
and wider use through a computer-based management system.
As identified in the previous sections, there are notable shortcomings in the management
of research assets within the School of Geosciences. These shortcomings are not limited to
a particular demographic group, but are associated with researchers (section 3.1), PhD and
Masters (Research) students (section 3.2) and Honours students (section 3.3). Some
deficiencies span all three groups (e.g. the management of raw research data), whereas
others relate to a particular primary research output (e.g. theses created by
students). Broadly speaking, the shortcomings can be classified into four target areas.
i) Digital Theses:
Under-utilisation of Sydney Digital Theses, and no digital theses
repository available for Honours students
Recommendation: Promote the use of Sydney Digital Theses through various channels,
and identify and endorse a suitable digital repository to house Honours theses.
This proposal considers a two-pronged approach, which meets the digital archival needs
of research students, and those desiring future access to the documents. Whilst this
solution cannot guarantee the digital archival of theses, by providing the facilities, and
endorsing the use of these amenities, it is hoped that a greater percentage of students will
archive these research assets.
Within the department, it is feasible that the initial management of student theses and data
could be standardised across all levels of research, with uniform practices up until the
point of final submission. It is proposed that every student enrolled in Honours, PhD or
Masters (Research) programs within the School be provided with a networked,
password-protected online folder in which to manage draft material relating to their
research. Subfolders may be used to distinguish between the body of the thesis and
datasets (both raw and processed), as illustrated in figure 4. This distinction between the
body of the thesis and the datasets matters because the two are managed differently
following final submission, as discussed in more detail later.
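The per-student folder arrangement described above could be provisioned with a short script. The following is a minimal Python sketch, assuming a mounted network share and hypothetical subfolder names (`thesis_body`, `datasets/raw`, `datasets/processed`); the actual layout is the one in figure 4, and password protection is assumed to be handled by the share's own access controls rather than by this code.

```python
from pathlib import Path

# Hypothetical subfolder names; figure 4 defines the layout actually proposed.
SUBFOLDERS = (
    "thesis_body",
    "datasets/raw",
    "datasets/processed",
)


def create_student_folder(share_root: Path, student_id: str) -> Path:
    """Create a student's working folder with thesis and dataset subfolders.

    Access control (password protection) is assumed to be configured on the
    network share itself and is not modelled here.
    """
    student_root = share_root / student_id
    for sub in SUBFOLDERS:
        # parents=True creates intermediate folders; exist_ok=True makes the
        # call safe to repeat without touching existing material.
        (student_root / sub).mkdir(parents=True, exist_ok=True)
    return student_root
```

For example, `create_student_folder(Path("/srv/geosciences/students"), "s1234")` (both the path and the identifier are hypothetical) would be run once at enrolment; because `exist_ok=True` is used, running it again leaves existing files in place.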
As shown in the above figure, a distinct metadata template form resides within the
thesis body sub-folder. The metadata form in the thesis body folder outlines metadata fields
that are consistent with those required for a submission to the Sydney Digital Theses
repository, as outlined in Appendix 1.
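The thesis-body metadata template can be checked for completeness before hand-in. The Python sketch below validates a record against the 15 element names of the Dublin Core Metadata Element Set (version 1.1); the required subset (`REQUIRED_THESIS_FIELDS`) and the function name are hypothetical, and the authoritative field list is the one given in Appendix 1.

```python
from __future__ import annotations

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical required subset for the thesis-body template; Appendix 1
# lists the fields actually used by Sydney Digital Theses.
REQUIRED_THESIS_FIELDS = ("title", "creator", "subject", "description", "date", "rights")


def validate_thesis_metadata(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Flag keys that are not Dublin Core element names (e.g. typos).
    for key in record:
        if key not in DC_ELEMENTS:
            problems.append(f"unknown element: {key}")
    # Flag required elements that are absent or blank.
    for key in REQUIRED_THESIS_FIELDS:
        if not record.get(key, "").strip():
            problems.append(f"missing required element: {key}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the staff member and student fix every gap in a single sitting.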
One benefit of creating these folders is that they provide students with a designated
storage space on the local network. Students can use this space to back up material
regularly, supplementing other practices such as saving to USB or portable hard drives.
The key benefit of this system however, is that if managed and implemented correctly, it
allows the department to retain a digital copy of every student’s thesis, the associated
datasets and all metadata describing these research assets. The success of this proposed
system relies on the implementation strategy and the incorporation of appropriate
departmental procedures.
Preparation of metadata relating to the body of the thesis should occur after the thesis has
been completed, but prior to submission. The nature of the metadata fields documenting
the body of the thesis makes it impractical for the author to complete the form before this
time.
When a student hands in a hardcopy of their thesis to the department (as is required of
all PhD, Masters (Research) and Honours students), it is suggested that whoever takes
delivery should spend a short time with the student verifying the networked material. They
should ensure that a final version of the thesis is available, along with the completed
metadata, as well as key datasets similarly documented. All draft and superseded material
should then be deleted from the scratch disk.
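The hand-over check described above could be supported by a script that reports what is present in the student's folder before anything is deleted. This is a minimal Python sketch under assumed conventions that the survey does not define: a `thesis_body` and a `datasets` subfolder, a final thesis named `*_final.pdf`, a `metadata.txt` file alongside the thesis body and inside each dataset folder, and drafts marked by `draft` or `superseded` in the file name. It only lists candidate drafts; deletion is left to the staff member and student after review.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical naming conventions; the survey does not prescribe file names.
FINAL_THESIS_GLOB = "*_final.pdf"
METADATA_NAME = "metadata.txt"
DRAFT_MARKERS = ("draft", "superseded")


def handover_report(student_root: Path) -> dict[str, object]:
    """Summarise a student's networked folder at thesis hand-in."""
    body = student_root / "thesis_body"
    datasets = student_root / "datasets"
    # Each immediate subfolder of datasets/ is treated as one dataset.
    dataset_dirs = (
        sorted(p for p in datasets.iterdir() if p.is_dir()) if datasets.is_dir() else []
    )
    # Candidate drafts are only listed here, never deleted automatically.
    drafts = sorted(
        p for p in student_root.rglob("*")
        if p.is_file() and any(m in p.name.lower() for m in DRAFT_MARKERS)
    )
    return {
        "final_thesis_present": body.is_dir() and any(body.glob(FINAL_THESIS_GLOB)),
        "thesis_metadata_present": (body / METADATA_NAME).is_file(),
        "undocumented_datasets": [
            d.name for d in dataset_dirs if not (d / METADATA_NAME).is_file()
        ],
        "drafts_to_delete": drafts,
    }
```

Keeping deletion manual reflects the concern raised about scratch-disk clear-outs: automated removal by someone unaware of the material's importance is exactly what the hand-over meeting is meant to avoid.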
Discussions with the School’s Senior Computer Systems Officer have indicated that
many students currently place items on the department scratch disk, which they neglect to
delete upon graduation. Those who administer the system intermittently ‘clear out’ old
information, unaware of the importance of certain material. This is not a criticism of the
IT personnel, but of those who place information in the system and neglect to manage it
effectively over a longer period of time. The proposed checking of networked space not
only ensures that the department retains digital copies of research assets, but also
clears out cumbersome material of no prospective value.
Management of material after this point depends on the level of study that has been
undertaken. Management of PhD and Masters (Research) theses which have undergone
rigorous peer review differs significantly to the management of Honours theses.
As discussed in section 2, the university already has an established repository for PhD and
Masters (Research) Theses, the Sydney Digital Theses Project. As students typically own
the intellectual property rights to their thesis, submission to this repository is not a
mandatory requirement. It is however strongly recommended to students graduating
from the university with the necessary level of qualification. It is very important that all
postgraduate students be made explicitly aware of the repository and the benefits of
archiving their thesis.
At a local level within the department, the repository should be promoted through various
channels and actively endorsed by supervisors and postgraduate convenors at
appropriate times during a student’s tenure. Administrative staff who take delivery of a
hardcopy of the thesis upon completion should also recommend that the document be
digitally archived in Sydney Digital Theses. This endorsement may be verbal, or detailed
in a handout. Since the necessary metadata fields would already have been completed to
document the digital thesis held by the department, it is hoped that the minor extra effort
required to submit to Sydney Digital Theses will encourage students to submit their thesis
online. One simple recommended step is to place a link to the Sydney Digital Thesis
project on the Geosciences web page to further promote the archival of digital theses.
Repeated exposure to the digital theses link will hopefully remind students that they
should be digitally archiving material upon graduation. Considering that Sydney Digital
Theses can only accept PhD and Masters (Research) theses, the link should be provided
under the Postgraduate banner through the Geosciences homepage, as shown in figure 5.
Figure 5. Sample of information to provide regarding the benefit of archival and link to
Sydney Digital Thesis Project.
Information provided above the link will introduce PhD and Masters (Research) students
to the benefits of digitally archiving a thesis. It also explains that archival is with Sydney
Digital Theses, a component of the Australasian Digital Theses Program. This repository
provides an open collection, which is freely available online.
For a future researcher searching for information, the thesis will be discoverable through
the Sydney Digital Theses repository. This, however, relies upon a student's motivation to
actively submit their thesis to the repository, an effort which many will overlook in the
euphoria of graduating. Since the department already has a final digital version of the
thesis, as well as the metadata needed to document the asset in the repository, it may be
possible to formulate a back-up procedure. This would require the student to sign any
necessary copyright release forms at the time of hardcopy submission.
Should the student neglect to submit to the Sydney Digital Theses repository within a
suggested time frame of about one year, the department may submit the thesis on their behalf using the
material stored within the department and the signed copyright release forms. This
submission process could be automated, with Sydney Digital Theses harvesting the thesis
and prepared metadata directly from the departmental system. Such an arrangement would require
further consultation; however, it would be wasteful for research assets not to be
discoverable when digital versions of the assets and their metadata have already been
prepared and authorised for release.
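The back-up rule described above can be expressed as a simple check that a departmental administrator (or a scheduled script) might run over the thesis register. This is a minimal sketch under stated assumptions: the `ThesisRecord` fields and the `needs_department_deposit` function are hypothetical, and the suggested ~1 year grace period is modelled as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: the report suggests "~1yr"; 365 days is an illustrative choice.
GRACE_PERIOD = timedelta(days=365)


@dataclass
class ThesisRecord:
    """Hypothetical register entry for one PhD or Masters (Research) thesis."""
    author: str
    hardcopy_submitted: date                 # date the hardcopy was handed in
    release_form_signed: bool                # copyright release form on file
    digital_copy_held: bool                  # final digital version on department server
    deposited_in_sdt: Optional[date] = None  # date the student deposited, if ever


def needs_department_deposit(rec: ThesisRecord, today: date) -> bool:
    """True if the department should deposit the thesis on the student's behalf."""
    if rec.deposited_in_sdt is not None:
        return False  # the student has already deposited it
    if today - rec.hardcopy_submitted < GRACE_PERIOD:
        return False  # still within the student's grace period
    # Act only when permission and the digital material are both in place.
    return rec.release_form_signed and rec.digital_copy_held
```

Keeping the permission check last means an overdue thesis without a signed release form is simply left alone rather than deposited.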
For reasons outlined previously, honours theses are not supported within the Sydney
Digital Theses repository, and until recently the School of Geosciences was not actively
encouraging the retention of a digital copy of honours theses. The current solution has
been to request a CD-ROM from the student, an interim measure that fails to utilise the
array of digital facilities available to the department.
After discussions with the Senior Project Analyst at eScholarship, it was confirmed that
only honours projects of high quality should be archived within this
repository. It was further suggested that limiting archival of honours theses to those
students who achieve a Distinction or High Distinction may even provide a mild incentive
for students to strive for excellence. Indeed, archival of material
increases exposure of the students’ research and its quality, which may lead to
collaborations and opportunities to advance their early careers.
The eScholarship submission process requires that the department deposit material on
behalf of a student. Considering that the department has already taken steps to retain a
digital version of the thesis and associated metadata on the local network, this is not a
major undertaking. When a student hands in a hard copy of a thesis of
sufficient calibre (Distinction or High Distinction), they should be presented with the necessary copyright release
forms. Once these are completed, the department is free to submit the thesis
online to the eScholarship repository.
Unfortunately, a large percentage of honours theses created do not reach the threshold for
submission to the institutional repository. An alternative workflow therefore needs to be
considered that advertises that research has been undertaken, whilst restricting access to
the resource. One solution would be to provide visible metadata relating to the thesis,
whilst managing the resource, and access to it, through the department. Metadata should
be provided through eScholarship; however, the abstract needs to be restricted so
that potential users are not inadvertently misled into thinking that the synopsis
came from an authoritative source. It may be more appropriate to provide a disclaimer in
the abstract field, specifying that the thesis was written by an honours student and
was not subject to extensive or external screening and review. Unlike honours theses of
Distinction or High Distinction calibre, theses of a lower grade should not be made
available for download through the eScholarship website.
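The grade-based workflow for honours theses described above amounts to a two-way decision. The sketch below is illustrative only: the `Deposit` categories, the `route_honours_thesis` function and the exact disclaimer wording are assumptions drawn from the text, not an eScholarship requirement.

```python
from enum import Enum
from typing import Tuple


class Deposit(Enum):
    FULL_TEXT = "full text and metadata in eScholarship"
    METADATA_ONLY = "metadata only; file held by the department"


# Grades meeting the threshold discussed above (Distinction, High Distinction).
ELIGIBLE_GRADES = {"D", "HD"}

# Hypothetical wording for the disclaimer that replaces the abstract.
DISCLAIMER = ("This thesis was produced by an honours student and was not subject "
              "to extensive or external screening and review.")


def route_honours_thesis(grade: str, abstract: str) -> Tuple[Deposit, str]:
    """Return the deposit type and the abstract text to publish."""
    if grade.strip().upper() in ELIGIBLE_GRADES:
        return Deposit.FULL_TEXT, abstract
    # Below the threshold: publish discoverable metadata, replace the abstract
    # with a disclaimer, and keep the file (and enquiries) with the department.
    return Deposit.METADATA_ONLY, DISCLAIMER
```

Because every thesis still produces a metadata record, the School’s research remains discoverable whichever branch is taken.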
Obviously this is not an ideal solution, as the department needs to manage the digital
thesis and all related enquiries for an extended period of time. A departmental staff
member would therefore need to take responsibility for the ongoing management of the
resource, and the creation of metadata in eScholarship. It would be of great benefit if the
institution were to provide a repository, or adapt existing systems, to allow for the
deposition of all honours theses, not only those of the top grade brackets.
In anticipation of such changes, and to avoid retrospective actions, all honours students
(regardless of grade) should be presented with the eScholarship copyright release forms
when handing in a hard copy. This will allow departmental staff members to provide
metadata details through the institutional repository, with the potential to upload the
thesis at a later date if current policy and systems evolve to accept it. In the
meantime, however, this proposal does serve to capture all digital theses generated
through the School, and to create metadata for them in searchable repositories.
Recommendation: That the department advise or require all their students to submit a
final copy of their PhD and/or Masters theses using the digital thesis facility. This can be
supported by the post-graduate coordinator and promoted through the department
handbook.
Recommendation: That a representative from the digital thesis project provide additional
seminars to the post-graduate students at strategic times during the semester so that the
students are more aware of this facility and the copyright and intellectual property issues
associated with using this repository.
Recommendation: That someone within the department be identified to ensure that all
theses, and in particular Honours theses, are safeguarded and made accessible through a
computer-based management system.
Recommendation: That students receive an information leaflet about the Sydney Digital Theses
Program on enrolment in postgraduate studies. This leaflet should cover copyright issues
and recommend that they digitally archive their thesis and datasets on completion of their
study.
Recommendation: The department insist on the submission of a digital copy of all theses,
including where appropriate, one copy to the Digital Thesis project and one copy to the
department.
Recommendation: All students should sign a Copyright release form and the department
should manage/archive the signed release forms in association with the thesis and data
sets.
Recommendation: If students fail to submit to Sydney Digital Theses after a period of about one year,
the department should digitally archive the material in the repository on behalf of the
student (if appropriate permissions have been granted).
Shortcoming: No active channels for the documentation and circulation of working papers
and other unpublished, digital material.
The remedy for this shortcoming is straightforward. All staff and students at the University of
Sydney have the opportunity to document and archive high-quality digital material with
the institutional repository, eScholarship. Whilst some departments maintain and nurture
very active collections, others neglect to recognise this facility as a means to circulate and
archive unpublished, high-quality digital material. This discrepancy in usage may be due
to numerous factors, which may relate to discipline, department size and copyright and
confidentiality considerations among others. Whilst some of these factors are beyond the
control of the department, it is possible to increase the exposure of researchers to the
facilities that are available.
The School of Geosciences homepage has a tab titled ‘research’ which, when clicked, takes
users to a web page that describes the research profile of the department. It is suggested
that under this ‘research’ banner some information is provided to researchers detailing the
benefits of archiving and circulating unpublished digital material, such as working papers,
to encourage feedback and stimulate debate. Following this introduction to the digital
archival of grey literature, a link should be provided to the Geosciences collection in
eScholarship. A link should also be provided to the ‘search’ homepage of the repository,
so that other researchers in the department can quickly and easily search through
submissions in the Geosciences collection. An example of how these web pages might
appear is shown below in Figure 7.
Recommendation: Develop a policy in which all data collected using public funds are
required to be submitted to an appropriate repository.
The storage, curation and preservation of research data present special challenges.
Further education, ideally coupled with easily accessible backup and storage facilities, is
required for the long-term storage of large datasets.
It is much more efficient and cost effective to manage data collected in association with
research at the time of collection and submission of final work rather than after the fact. In
the Geosciences department all data which is collected in association with research is
managed on an individual basis such that it is rarely in a form available for re-use in
further projects.
Much of the data collected over the past few decades by the department of Geosciences is
either lost or cannot be read with current computers or software. At the same time, data
curators have disappeared from the department, and funding bodies do not provide
grants for the management and long-term storage of this data in repositories. The
responsibility for managing the data thus falls upon the researchers who collected
it, who must ensure that it is stored in an understandable, usable form so that it remains
available for new analyses. This comes at a time when researchers are under
increasingly heavy workloads.
A student’s number one priority in conducting research is “to gain efficient access to the
research and data that has previously been completed in my area of study” (research
student, department of Geosciences).
A further quote from one geosciences student: “I believe that having a database with all the
available data for students such as myself would be an ideal tool as it would save valuable
time and make efficient use of the resources the university has”.
Whenever the student collects or generates a dataset during the course of their research,
they should back it up in their password-protected network space. At this time they
should also fill in the provided metadata form, whilst details of the research are fresh and
easy to recall. If multiple datasets are generated that are suitably distinct, then multiple
metadata records may be required. This can be achieved by copying the original metadata
template provided in the folder. If multiple datasets and metadata forms are created, it is
important for the student to manage the files so that the metadata can be easily associated
with the correct dataset. The read_me file provided in the folder outlines that this can be
achieved through the creation of further sub-folders.
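The per-dataset sub-folder convention can be automated so that each new dataset receives its own copy of the blank metadata form. A minimal sketch, assuming the template is called `metadata_template.txt` and each copy is saved as `metadata.txt`; both file names and the `add_dataset` function are hypothetical, since the report does not specify them.

```python
import shutil
from pathlib import Path

# Hypothetical file names; the report mentions a metadata template and a
# read_me file in the dataset folder but does not name them exactly.
TEMPLATE_NAME = "metadata_template.txt"
FORM_NAME = "metadata.txt"


def add_dataset(dataset_root: Path, dataset_name: str) -> Path:
    """Create a sub-folder for a new dataset and give it its own metadata form.

    Keeping each dataset and its form together in one sub-folder means the
    form cannot become separated from the data it describes.
    """
    template = dataset_root / TEMPLATE_NAME
    if not template.is_file():
        raise FileNotFoundError(f"metadata template not found: {template}")
    target = dataset_root / dataset_name
    target.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing dataset folder
    shutil.copyfile(template, target / FORM_NAME)  # fresh copy of the blank form
    return target
```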
The metadata form in the dataset folder requests basic information that would be required
to ‘adequately’ document research data within a data repository. An example of the
metadata required to describe a dataset is provided in Appendix 3. The intention of these
metadata forms is not to replace the creation of detailed metadata at final
archival, but to encourage students to think about metadata, and to capture vital research
information at the time of data collection.
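A machine-readable version of the minimum metadata form makes it easy to flag incomplete records before final archival. The elements below follow the dataset metadata form reproduced in the appendices (Type of Data, Data Format, Temporal Coverage, Geographic Coverage, Access constraints to data, Papers Published using the data, Online Links); the Python field names, and the choice of which elements are treated as mandatory, are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Elements from the dataset metadata form; field names are illustrative."""
    type_of_data: str = ""
    data_format: str = ""
    temporal_coverage: str = ""
    geographic_coverage: str = ""
    access_constraints: str = ""
    papers_published: List[str] = field(default_factory=list)
    online_links: List[str] = field(default_factory=list)


# Assumption: papers and links may legitimately be empty at collection time,
# so only the descriptive elements are treated as mandatory here.
REQUIRED = ("type_of_data", "data_format", "temporal_coverage",
            "geographic_coverage", "access_constraints")


def missing_elements(record: DatasetMetadata) -> List[str]:
    """Names of required elements that are still blank, in form order."""
    return [name for name in REQUIRED if not getattr(record, name).strip()]
```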
Researchers may want to scour the data for clues that were missed, or not looked for, in a
previous analysis. Even if the raw data survive, they are useless without the background
information (metadata) that gives them meaning.
The initial question any researcher needs to ask themselves at the outset of a data
gathering exercise is – “what is the repository into which the final data will be submitted?”
If this is done at the outset of a project then the appropriate standards can be adhered to so
that the data can be submitted at the end of the project in an appropriate format with
accompanying metadata.
The institutional repository represents a good option at the present time; however, future
planning needs to consider purpose-built, institutionally backed data management
services based on a requirements analysis.
With the large computing capacity now available on individuals’ desktops, and through the
eResearch and ICT framework and various national and international initiatives, there are
means available for a department to manage and safeguard this information. This involves
setting a number of policies or rules so that data and information
flow logically and efficiently within the department.
The data management models presented in section 9 of this document focus on managing
data collected now and in the future, rather than considering how best to integrate the “mine”
of data which has been collected in the past. Once these management systems are in
place, however, they will provide a sound basis for a future project to “rescue” that legacy data,
which falls outside the scope of this project.
It is critical that the geosciences department facilitates data-intensive research through the
provision of analytical infrastructure. The issue of curating completed projects’ data
becomes much simpler if well-documented datasets are established and used for
analysis during the project.
There are various disparate databases and data sets which reside within the School of
Geosciences. John Twyman (senior computing officer, School of Geosciences) has been
contacted by staff in ICT who are working on a project concerned with documenting the
major spatial datasets in use around the University. In doing so, ICT hope to identify
datasets which are in relatively common use across the University (e.g. Census data) and
thus explore opportunities for reducing the costs involved in obtaining such data. Down
the line, ICT are also very interested in determining what role they could play in the hosting
and management of spatial data. John Twyman is conducting an inventory of spatial datasets
held by the School of Geosciences; the information to date is presented in Appendix 4.
Ideally, the spatial datasets will be integrated and managed in a geodatabase on a server
platform; however, for this to occur, spatial data collected by the department would need to
adhere to a standard. Data collected should meet recognised standards where they exist.
Minimum data collection standards ensure that data are stored digitally and with
maximum transferability. By using a geospatial database, information can be integrated
and re-used readily for multiple purposes, as in the Australian Beach and Management
Program database developed by Andy Short of the department of Geosciences, shown
below in Figure 9.
Figure 9. The Australian Beach and Management Program database developed by Andy
Short of the department of Geosciences.
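As an illustration of storing spatial data with maximum transferability, point observations can be written to GeoJSON (RFC 7946), an open text format that most GIS packages can import. The report does not prescribe a format; GeoJSON, the `to_geojson` function and the sample attributes used with it are assumptions chosen purely for illustration.

```python
import json
from typing import Any, Dict, Iterable


def to_geojson(samples: Iterable[Dict[str, Any]]) -> str:
    """Serialise point samples as a GeoJSON FeatureCollection (RFC 7946).

    Each sample needs 'lon' and 'lat' in decimal degrees (WGS 84);
    every other key is carried across as a feature property.
    """
    features = []
    for s in samples:
        props = {k: v for k, v in s.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            "properties": props,
        })
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)
```

A plain-text, openly specified format like this can be loaded into a server-side geodatabase later without depending on the software that originally produced the data.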
Recommendations on the standards required for spatial data are outside the scope of
this project; however, the United States has developed a structure which it has been
applying with a degree of success. The Federal Geographic Data Committee (FGDC) in the
United States provides an example of a model that works across institutions, as it has
selected a standard and developed policies and a framework in which to operate. The Content
Standard for Digital Geospatial Metadata described by the FGDC is shown in Appendix 5.
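To show the hierarchical shape of an FGDC metadata record, the sketch below builds a partial XML skeleton with Python’s standard library. It covers only part of the Identification and Metadata Reference sections and is not a complete or valid CSDGM record; the short tag names reflect our reading of the standard’s XML encoding and should be checked against FGDC-STD-001-1998 before use. The `fgdc_skeleton` function is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def fgdc_skeleton(title: str, originator: str, pubdate: str, abstract: str,
                  purpose: str, bbox: Tuple[float, float, float, float]) -> str:
    """Partial CSDGM-style record; bbox is (west, east, north, south) in degrees."""
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")  # Identification Information
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "origin").text = originator
    ET.SubElement(citeinfo, "pubdate").text = pubdate
    ET.SubElement(citeinfo, "title").text = title
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = abstract
    ET.SubElement(descript, "purpose").text = purpose
    bounding = ET.SubElement(ET.SubElement(idinfo, "spdom"), "bounding")
    for tag, value in zip(("westbc", "eastbc", "northbc", "southbc"), bbox):
        ET.SubElement(bounding, tag).text = str(value)
    metainfo = ET.SubElement(root, "metainfo")  # Metadata Reference Information
    ET.SubElement(metainfo, "metstdn").text = "FGDC Content Standard for Digital Geospatial Metadata"
    ET.SubElement(metainfo, "metstdv").text = "FGDC-STD-001-1998"
    return ET.tostring(root, encoding="unicode")
```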
Currently there is no overall framework for managing or gaining access to the GIS data
and information within the School of Geosciences. An inventory can be used as a first step
in setting up access to the data in a catalogue system like the example from the Columbia
University Department of Earth and Environmental Sciences. Having access to datasets in
a catalogue at the point-and-click level can be a very useful research tool for a department.
Appendix Columbia University Department of Earth and Environmental Sciences Data
Catalog: Datasets by Category.
For large and complex datasets, data should be archived with discipline-specific
repositories or data centres. At the outset of a project, a researcher could identify an appropriate repository or
project which would take the type of data that they are collecting, and assess the data
structure and requirements for submitting data to that facility.
There are various international and national archives available to preserve
publicly funded research data generated by scientists; a small selection of these is
shown below in Table 4.
5 http://ses.library.usyd.edu.au/
6 http://pet.gns.cri.nz/
7 http://www.aad.gov.au/
8 http://www.smso.net/National_Snow_and_Ice_Data_Center
9 http://www.ga.gov.au/
10 http://www.ngdc.noaa.gov/
11 http://www.aodc.gov.au/
12 http://www.nodc.noaa.gov/
Some journals encourage submission of datasets to online repositories, so if the data
are managed from the outset this could provide a mechanism for dealing with datasets
associated with journal articles. This avenue could be further investigated by the
department. An example of such a repository is GenBank.
Researchers and students in the future are going to collect larger datasets owing to the
nature of emerging technologies, so it is very important to provide training in
data management issues, including ethical as well as technological issues.
To assist in the culture change required to promote archival of research data and assets,
it is important to target students and early career researchers. To prepare students
for the large amounts of data they will be exposed to in the near future, it is
recommended that departments develop data management courses and manuals such
as those developed at the Australian National University.
A data management component should also be developed in association with field trips
on which data is collected. The field work component of existing courses could emphasise
standard and correct ways to describe samples temporally and spatially at the time of data
collection so as to ensure the integrity and interoperability of the data from the outset.
Data collection forms could be designed, along with procedural information, to
standardise the recording of key data elements, as shown below in Figure 10.
Figure 10. Data collection in the field, outlining the standardisation of key data elements.
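A standardised field form can be backed by a simple validation step when field sheets are transcribed. A minimal sketch, assuming the key elements are a sample identifier, an ISO 8601 collection time with an explicit time-zone offset, and latitude/longitude in decimal degrees; the `GEO-2008-0042`-style identifier pattern and the `validate_field_record` function are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, List

# Hypothetical sample-ID convention, e.g. "GEO-2008-0042".
SAMPLE_ID = re.compile(r"^[A-Z]{3}-\d{4}-\d{4}$")


def validate_field_record(rec: Dict) -> List[str]:
    """Return a list of problems with one field-sheet entry (empty if none)."""
    problems = []
    if not SAMPLE_ID.match(str(rec.get("sample_id", ""))):
        problems.append("sample_id does not follow the agreed pattern")
    try:
        # Temporal: ISO 8601 date-time with an explicit offset such as +10:00.
        ts = datetime.fromisoformat(str(rec.get("collected_at", "")))
        if ts.tzinfo is None:
            problems.append("collected_at has no time-zone offset")
    except ValueError:
        problems.append("collected_at is not an ISO 8601 date-time")
    # Spatial: decimal degrees within valid ranges.
    lat, lon = rec.get("lat"), rec.get("lon")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("lat must be decimal degrees in [-90, 90]")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("lon must be decimal degrees in [-180, 180]")
    return problems
```

Checking each entry as it is transcribed catches ambiguous dates (such as day/month order) and swapped coordinates while the collector can still correct them.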
It has been recognised that many research datasets are either collected or generated
through the School laboratory. This is a key location where university technicians can
promote the use of the local disk space, as well as the creation of associated metadata. It
may even be possible for the laboratory to deliver research data directly into the students’
dataset sub-folders, as shown below in Figure 11.
Figure 11. Data from the laboratory is placed into the student’s or researcher’s dataset folder
for further analysis to streamline data handling.
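The laboratory hand-off described above could be scripted so that instrument output lands in the student’s dataset sub-folder together with a fixity record. This is a sketch under assumptions: the `deliver_lab_file` function and the `checksums.sha256` file are hypothetical, and SHA-256 checksums are our suggestion rather than an existing School procedure.

```python
import hashlib
import shutil
from pathlib import Path


def deliver_lab_file(src: Path, student_dataset_dir: Path) -> str:
    """Copy one instrument output file into a student's dataset sub-folder.

    Returns the SHA-256 of the delivered copy, which is also appended to a
    checksums file so that later corruption or edits can be detected.
    """
    student_dataset_dir.mkdir(parents=True, exist_ok=True)
    dest = student_dataset_dir / src.name
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite {dest}")
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    with open(student_dataset_dir / "checksums.sha256", "a", encoding="utf-8") as log:
        log.write(f"{digest}  {dest.name}\n")  # same layout as sha256sum output
    return digest
```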
8. Physical Assets
Metadata is recorded in a form using a standard set of metadata elements.
In order to contribute data effectively to a data centre, academic data collections need to be
well managed from the outset and put into standard formats with the relevant level of
descriptive metadata, using the approach outlined in section 9.3.
An example of how data could be submitted from the University of Sydney to a data
centre or national project is shown in the diagram below. In this model, data will be made
freely available to the general community through a discovery portal.
9. Conclusion
In the short term, better management of information at the level of the individual researcher and
department is essential, given the large, distributed and complex nature of the university
environment. Evolving the infrastructure needed to support research in the digital era will
become easier if standards and methods are agreed upon. However, it is essential not to
prescribe data management practices that are restrictive, prohibitive or inflexible.
For now, it is advisable to develop workflow practices which make the best use of
existing standards and technologies, while keeping in mind how to incorporate the latest
technologies. By tackling information management on various levels (theses, unpublished
documents, digital data and physical material), we can move forward by degrees without
impacting severely on staff workloads. By using the available systems and methodologies
outlined in this report, data collected by the School of Geosciences will be able to take
advantage of new and emerging technologies.
Final-state research data (including publications, datasets, multimedia, etc.) should be
archived with the institutional repository. ICT are developing strategies to help with
the evolving infrastructure required for the management of large spatial datasets.
Very large or complex data should also be archived with discipline-specific
repositories.
As one researcher put it in the Data Tables from the Data Management Practices Survey:
“In a large, distributed and complex beast like [this university], the diversity of practice
across the wide range of research disciplines means that evolving the infrastructure
needed to support research in the digital era is not going to be easy. It is, however,
essential. We have to be prepared to make mistakes, to try things out and experiment. We
have to be very conscious of the broader framework in which we are working and
constantly try to reveal the deeper principles of practice. Consequently, we have to be
careful not to limit our research record management practices to what current technology
offers - as this will have changed during the life of the project. On the other hand we do
have to use the latest technologies to the best of our abilities to bring increased
productivity and services to researchers. We have just entered the 'Wright Brothers' phase
of the Digital era”.13
While all of the above recommendations are a matter of instilling a change in routine
practices, one of the most important steps forward in the process is the cultural change
that relates to the attitude that researchers have to collaboration and data sharing. The key
to the success in implementing these changes is collaboration.
13
Data Tables from the Data Management Practices Survey
APPENDICES
Appendix 1 - Dublin Core Metadata Fields used in the Sydney Digital Theses Program
Full metadata record
Managing University of Sydney Faculty of Science undergraduate and postgraduate theses within the
Library’s eScholarship repository. School of Geosciences. Draft for discussion.
Rowan Brownlee, University of Sydney Library Digital Project Analyst, 25 January, 2008
Introduction
This document describes systems used for managing undergraduate and postgraduate theses within the
School of Geosciences and notes issues of relevance to the proposal to develop a Library/Faculty of Science
service for managing storage, access, rights and permissions and digital preservation of undergraduate and
postgraduate theses. The document also includes Edwina Tanner’s1 thoughts on the proposed service.
Background
The School of Geosciences uses a spreadsheet to record information about undergraduate and
postgraduate theses. The spreadsheet contains 1600 records covering 1904 to the present and is managed
by an administrative assistant. The most complete information is available for more recent records. Most
theses are not in digital format. 300 thesis abstracts have been digitally scanned. The number of wholly
digital theses is proportionally small and most were created during the past 3-5 years. Hardcopy current
and historical postgraduate theses are held in the University of Sydney Rare Books Library, and
undergraduate theses are held in Madsen and Edgeworth-David. Many theses have accompanying data
files and databases, though not all data files are available. If available, digital copies of theses and datasets
are stored on School fileservers while databases2 reside on School database management servers.
1. Author
2. Supervisor
3. Abstract
4. Title
5. Year
6. Number of pages
8. Subject keywords
9. ASRC codes
10. Sponsors
12. Digital (indicating the extent of digital holdings. E.g. ‘A’ = abstract)
Edwina does not favour provision of discipline-specific submission forms containing navigable taxonomies
of controlled terms. Within a Faculty unit, the task of uploading theses would most likely be assigned to
administrative assistants rather than subject specialists. Edwina recommends application of ASRC codes.
Although administrative assistants would need to be trained in using ASRC, this would provide a single
classification mechanism for the University using a vocabulary adopted by the Research Office and DEST.
Might record creation be a collaborative process between the Faculty and Library, with the Library
supplementing administrative information with classification codes/subject headings?
Other Faculty units may hold existing metadata in local records management systems. Where will
responsibility lie for assessing data transfer requirements and implementing and maintaining processes for
batch-transfer?
Will there be an ongoing need to manage periodic batch-transfer of records from Faculty administrative
systems?
Presentation and navigation of research data collections within the Library’s repository
Theses may be accompanied by collections of data files which are organized in a directory structure defined
by the thesis creator. The following example is taken from a set of 640 data files and illustrates the
importance of the contextual information provided by the directory structure.
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_006_9.85.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_007_5.35.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_005_12.65.ps
Digital_Thesis/AnalyticalFlow_SteinbergerModel/Images/DynamicTopography/dynatopo_qpmp.tar
The default repository interface is very limited in its capacity to provide a meaningful view of a data
collection’s structure. There is no facility to navigate or ‘drill-down’ a complex directory structure. The
interface simply provides a fully expanded display of directories, subdirectories, sub-sub directories … … …
and data files6.
To what degree is it the role of the repository service to provide user interface tools? How would such a
service be resourced and sustained?
______________________________________________________________________________________
Type of Data
Data Format
Temporal Coverage
Geographic Coverage
Access constraints to data
Papers Published using the data
Online Links
Note: This form outlines the minimum set of metadata elements required to describe
a dataset.
The following is the recommended bibliographic citation for this publication: Federal
Geographic Data Committee. FGDC-STD-001-1998. Content standard for digital geospatial
metadata (revised June 1998). Federal Geographic Data Committee. Washington, D.C.
• Atmospheric Data
• Oceanographic Data
• River Data
• Solar Radiation Data
• Paleoclimate Data
• Geographical Data
• Geological Data
• Miscellaneous Data
Atmospheric Data
• Atmospheric Measurements
o OBERHUBER latent heat flux (January)
o COADS Monthly Average Air Temperature (December 1980)
o COADS Monthly Average Specific Humidity (December 1980)
o COADS Meridional Wind Velocity (January 1981)
o COADS Zonal Wind Velocity (January 1981)
o COADS Wind Vectors (January 1981)
o COADS Wind Speed (January 1981)
o HANSEN Global Surface Temperature (for the last century)
o Keeling Mauna Loa CO2 (Carbon Dioxide)
o LEGWIL Precipitation (January)
o OORT Humidity (January)
o OORT Temperature (January)
o OORT Meridional Wind Speed (January)
o OORT Zonal Wind Speed (January)
• Oklahoma Weather Station Radiosonde Data
o Precipitation Timeseries
o Temperature Timeseries
o Pressure Timeseries
o Mixing Ratio (Relative Humidity) Timeseries
o Wind Direction Timeseries
o Wind Speed Timeseries
o Meridional (North) Component of Wind Velocity Timeseries
o Zonal (East) Component of Wind Velocity Timeseries
• Weather Information Centers
o National Hurricane Center Coastal Watches and Warnings
o Space Sciences and Engineering Center (U. Wisconsin)
o WXP Weather Processor (Purdue)
Oceanographic data
• Annual
o LEVITUS Oxygen (0 meters depth)
o LEVITUS Phosphate (0 meters depth)
o LEVITUS Salinity (0 meters depth)
o LEVITUS Temperature (0 meters depth)
• Monthly
o LEVITUS Salinity (0 meters depth)
o LEVITUS Temperature (0 meters depth)
o COADS Average Sea Surface Temperature (December 1980)
o COADS Atmospheric Pressure at Sea Level (December 1980)
River Data
Paleoclimate Data
Geographical Data
Geological Data
Miscellaneous Data
• Radionuclides Chart
Recommendation: Promote the use of Sydney Digital Theses through various channels,
and identify and endorse a suitable digital repository to house Honours theses.
Recommendation: That the department advise or require all their students to submit a
final copy of their PhD and/or Masters theses using the digital thesis facility. This can be
supported by the post-graduate coordinator and promoted through the department
handbook.
Recommendation: That a representative from the digital thesis project provide additional
seminars to the post-graduate students at strategic times during the semester so that the
students are more aware of this facility and the copyright and intellectual property issues
associated with using this repository.
Recommendation: That someone within the department be identified to ensure that all
theses, and in particular Honours theses, are safeguarded and made accessible through a
computer-based management system.
Recommendation: That students receive an information leaflet about the Sydney Digital Theses
Program on enrolment in postgraduate studies. This leaflet should cover copyright issues
and recommend that they digitally archive their thesis and datasets on completion of their
study.
Recommendation: The department insist on the submission of a digital copy of all theses,
including where appropriate, one copy to the Digital Thesis project and one copy to the
department.
Recommendation: All students should sign a Copyright release form and the department
should manage/archive the signed release forms in association with the thesis and data
sets.
Recommendation: If students fail to submit to Sydney Digital Theses after a period of about one year,
the department should digitally archive the material in the repository on behalf of the
student (if appropriate permissions have been granted).
Recommendation: Develop a policy in which all data collected using public funds are
required to be submitted to an appropriate repository.
REFERENCES
Australian National University (2008) ANU Data Management Manual: Managing Digital
Research Data at the Australian National University, Information Literacy Program, The
Australian National University, Document Version 1.03, August 15, 2008
Fitzgerald, A., Pappalardo, K. and Austin, A. (2007) Understanding the legal implications
of data sharing, access and reuse in the Australian research landscape. Chapter 7 In:
Building the Infrastructure for Data Access and Reuse in Collaborative research: An
analysis of the Legal Context, Oak Law Project. http://www.oaklaw.qut.edu.au/reports
Henty, Margaret (2007) Data Tables from the Data Management Practices Survey,
http://hdl.handle.net/1885/47108
Tanner, Edwina (2007) Data Management and Researcher Support Presentation, eResearch
Forum 1 November 2007, The University of Sydney.
University of Oxford (2007) Scoping Digital Repositories Services for Research Data
Management - A Project of the Office of the Director of IT