This report, authored by Rob Johnson, Tom Parsons, myself, and John Kaye, covers the preparation of a revised version of the data asset framework (DAF) survey to inform the development of Jisc’s Research Data Shared Service. We also prepared a DAF toolkit to help HEIs run the survey should they wish to do so. The Executive Summary of the main report is reproduced below.
Context
Jisc is developing a pilot research data shared service (RDSS), intended to offer easy deposit of data along with features such as discovery, safe storage, long-term archiving, and preservation. To shape the service, Jisc needs information on current perceptions of Research Data Management (RDM) in Higher Education Institutions (HEIs), which are expected to be its primary users.
To gather this information, Research Consulting developed a refined version of the existing data asset framework (DAF) tool and ran a survey in six pilot institutions. The refinement was based on an analysis of the existing DAF toolkit and of RDM surveys previously used at other institutions. The survey was further improved using feedback from the pilot institutions involved in the RDSS project. It covers four topics: active data, data preservation, data sharing, and institutional support services.
The survey results were analysed by Research Consulting and are presented in this report. The anonymised survey results are available online. In this report, the results are analysed in aggregate, by institution, by role, and by REF panel. The analysis by role shows differences between junior and senior members of research staff, while the analysis by REF panel compares the practices of researchers in different fields.
Active data
Documents, reports, and spreadsheets are the most commonly used types of digital data. A majority of respondents (58%) also hold non-digital data, most commonly notebooks/lab books, paper records/portfolios, and samples.
Use of data management plans remains generally low. The most common reasons for having a data management plan are that it constitutes good research practice (72%) and that it is required by the project funder (53%). Conversely, reasons for not having one include that it is not required or appropriate for the field of research (47%) or by the funder (45%), or that researchers lack the skills or knowledge to create one (32%).
Survey respondents expect their data to be accessed mostly by themselves and by other researchers (in the same or other institutions). Researchers apply security measures to protect their data in 59% of cases, and 41% hold some form of personal or sensitive data. Respondents from REF panels A, C, and D mostly hold personal data about identifiable individuals, while those from panel B mostly hold commercially sensitive information. Where security measures are applied, the most common are password protection of files (59%), physical security (47%), and access logging (45%).
The largest group of survey respondents (40%) hold less than 50GB of data; however, a minority hold up to 10TB, and one respondent holds as much as 2PB. The largest share of these data (39%) was collected in the past 1-3 years, and only 10% of respondents still store data gathered more than 10 years ago. Respondents holding at least 501GB of data expect their storage requirements to increase, either slightly or substantially, over the next 5 years.
Postgraduate students and professors store data in slightly different ways. The former prefer external flash drives and the hard drives of privately owned computers, while the latter mostly use the hard drives of university-owned computers and university network storage. When cloud storage is used, personal accounts are much more common than institutional ones.
Data preservation
92% of survey respondents back up at least some of their data, usually doing so themselves on a daily or weekly basis. The most common backup solutions are external drives or memory sticks, university-managed backup storage, and cloud drives.
17% of respondents reported having lost data during their career. The most common causes were hardware failure, human error, and theft. The most common impact was wasted research effort, as work had to be repeated.
Most respondents from REF panels A and B move data with long-term value to a different location for preservation and storage, while this is less common in panels C and D. The volume of data that respondents consider worth keeping for long-term preservation is not materially different from the total volume they hold. Respondents would generally expect to store their data in an institutional repository, though some indicated they would simply use a personal external hard drive for this purpose.
The results suggest that few researchers are aware of digital preservation methods beyond bit-level preservation, and that training and awareness-raising on preservation are needed.
Survey respondents generally do not track their archived data and very few follow guidelines for the preparation of metadata.
Data sharing
Research data are usually shared through cloud storage services (67%), by emailing data files (60%), or on portable storage devices (32%). Survey respondents are generally happy to share their data, mostly because research is a public good, the data have potential for re-use, and their findings can be independently verified. For public sharing of data (i.e., data publication), respondents mentioned ArrayExpress, the British Atmospheric Data Centre (BADC), the EMBL Nucleotide Sequence Database (ENA), GenBank, the Gene Expression Omnibus (GEO), GitHub, the NCBI Sequence Read Archive (SRA), and Zenodo. Data deposited in these services are usually documented with metadata.
Reasons given for not sharing data include confidentiality, permission issues, and the wish to retain the data for further work of their own. Only a minority of respondents have re-used someone else’s data.
Institutional support services
Survey respondents showed low levels of awareness of the institutional support services available to them on data management and sharing. Even among those who are aware of these services, most are not currently using them and 10% do not expect to use them in the future.
In terms of training, the largest gaps relate to long-term storage, sharing (including publication), and security of data.