Date: 28 September 2020, 11:00 to 12:30
Location: Online

 

SSHOC is developing a data repository service for SSH institutions. The new service is built on the Dataverse software and will be adapted to the needs of European research infrastructures.

Dataverse is a community-driven open source software platform that supports integrations with other data services such as DataCite or rOpenSci. Its modular design uses APIs to support distributed file storage and the addition of further microservices.

We invite interested researchers and institutes from across the SSH community to attend the webinar and share their ideas and requirements.

  • Would your research institute like to use the SSHOC Dataverse, once it's available?
  • Would you as an individual researcher like to use Dataverse?
  • What are your requirements?
  • How much localisation do you need?
  • Would you like to use a central service in the cloud or an installation in your institutional environment?

 

The webinar starts with a presentation of the current functionality, followed by a presentation of new features to be developed. After these presentations we will collect input from the audience. The discussion will focus on essential requirements for such a service, preferences, organisation, and necessary training.

The webinar will be chaired by Marion Wittenberg, service manager of DataverseNL at Data Archiving and Networked Services (DANS), together with her colleague Laura Huis in ‘t Veld, functional manager, and Péter Király, researcher and software developer at Georg-August-Universität Göttingen.

The event is intended for researchers, research institutes, and university and faculty staff across SSH domains, and is not limited to the DARIAH community.


The outcomes of the webinar discussion are presented below in Q&A format.

Data management

Q: For how long can the data be deposited in Dataverse?

A: This depends on the policy of the institute or repository responsible for the Dataverse service. For repositories using DOIs, DataCite requires that access to data is provided for at least 10 years.

Functionality of Dataverse

Q: Dataverse seems to offer more description and search capabilities, but could you please highlight its key differences from Zenodo?

A: You may want to consult this comparison of different repositories: https://fairsharing.org/collection/GeneralRepositoryComparison

and this blog post:

https://dataverse.org/blog/comparative-review-various-data-repositories

One useful Dataverse feature is the Private URL, which lets you share a dataset with, for example, a journal before publishing it. Dataverse also offers other ways to share data with others without making it publicly available, which is useful when a dataset is not yet finalised.
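For readers who want to script this, a Private URL can also be requested through the Dataverse native API. The sketch below is a minimal illustration, assuming the `POST /api/datasets/{id}/privateUrl` endpoint and the `X-Dataverse-key` token header described in the Dataverse API guide; the server address, dataset ID and token are hypothetical placeholders. It only builds the request and does not send it.

```python
from urllib.request import Request


def private_url_request(server: str, dataset_id: int, api_token: str) -> Request:
    """Build (without sending) a request asking a Dataverse installation
    to create a Private URL for an unpublished dataset."""
    # Native API endpoint (assumed from the Dataverse API guide):
    # POST /api/datasets/{id}/privateUrl
    url = f"{server.rstrip('/')}/api/datasets/{dataset_id}/privateUrl"
    # The API token of a user allowed to manage the dataset goes in this header.
    return Request(url, method="POST", headers={"X-Dataverse-key": api_token})


# Hypothetical server, dataset ID and token, for illustration only.
req = private_url_request("https://dataverse.example.org/", 42, "xxxxxxxx-token")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the JSON response should describe the newly created Private URL, which can then be shared with reviewers.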

 

Q: As an addition to the previous question: could you point out whether and how controlled vocabularies (e.g., in a German-speaking context, GND) are implemented?

A: It is possible to link to controlled vocabularies. The choice of CVs depends on the policy of the institute responsible for the Dataverse service. In the SSHOC project, for example, we are currently connecting to the CVs needed by the CESSDA community.

 

Q: Can Dataverse allocate DOIs?

A: Yes. DOIs, Handles, or other PIDs are possible. You can connect Dataverse with DataCite to mint DOIs.
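Once a DOI has been minted, the dataset's metadata can be retrieved by that persistent identifier. The sketch below assumes the native API pattern `GET /api/datasets/:persistentId/?persistentId=doi:...` from the Dataverse API guide; the server address and the DOI are hypothetical. It builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def dataset_by_doi_request(server: str, doi: str) -> Request:
    """Build (without sending) a native API request that retrieves a
    published dataset's metadata by its DOI."""
    # Endpoint pattern (assumed): GET /api/datasets/:persistentId/?persistentId=doi:...
    # urlencode percent-encodes the ':' and '/' characters of the identifier.
    query = urlencode({"persistentId": f"doi:{doi}"})
    url = f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"
    return Request(url, method="GET")


# Hypothetical server and DOI, for illustration only.
req = dataset_by_doi_request("https://dataverse.example.org", "10.5072/FK2/ABC123")
print(req.full_url)
```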

Business model

Q: Is there any information available on the business models for maintaining an institutional Dataverse repository (e.g. what this means in terms of cost, labour, ongoing support, and whether funding sources are available)?

A: We will work on a business model during the SSHOC project. It depends on whether the dataverse is maintained centrally or at an institution.

 

Q: Could you share more on the timeline for the SSHOC project?

A: We are currently setting up the translation service. By the end of 2020 we would like to have a test instance running. By the end of next year a staging service should be running, but this depends on who will host the instance and where. We will also present an archive-in-a-box solution for smaller institutions in each community.

 

Q: "Whether the dataverse is maintained centrally or at an institution" is indeed a crucial question, with countless implications. Will you evaluate this aspect in detail (as far as it can be) during the project?

A: This will be part of the policy document. We will need to discuss this with the institutions involved. We can make recommendations, but as a project we are not the ones who decide on this.

Privacy sensitive data / Security / Restricted Access

Q: Dataverse seems to assume that all data can be made public. Sometimes, interviews, surveys or records cannot be made publicly available due to ethical or legal issues. Does Dataverse support different levels of access – perhaps with user authentication?

A: Yes, there are different levels of access: open or restricted. Once a dataset has been published, its metadata is always public. It is possible to add a ‘Request Access’ button to restricted files.

See also: http://guides.dataverse.org/en/latest/user/dataset-management.html#restricted-files
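File-level restriction can also be set programmatically. The following is a minimal sketch, assuming the native API endpoint `PUT /api/files/{id}/restrict` with a body of `true` or `false` and the `X-Dataverse-key` token header; the server address, file ID and token are hypothetical. As described above, restricting files does not hide the dataset's metadata.

```python
from urllib.request import Request


def restrict_file_request(server: str, file_id: int, api_token: str,
                          restrict: bool = True) -> Request:
    """Build (without sending) a request that restricts, or with
    restrict=False unrestricts, a single file in a dataset."""
    # Endpoint (assumed from the Dataverse API guide): PUT /api/files/{id}/restrict
    url = f"{server.rstrip('/')}/api/files/{file_id}/restrict"
    # The request body is the literal text "true" or "false".
    body = b"true" if restrict else b"false"
    return Request(url, data=body, method="PUT",
                   headers={"X-Dataverse-key": api_token})


# Hypothetical server, file ID and token, for illustration only.
req = restrict_file_request("https://dataverse.example.org", 7, "xxxxxxxx-token")
print(req.get_method(), req.full_url, req.data)
```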

Metadata

Q: Another question on adding keywords: are there ways to make adding keywords easier than manually selecting a CV and inserting a link for every new keyword?

A: Yes, we are working on linking to controlled vocabularies. This work is part of the SSHOC project.

Data Quality

Q: About data quality: how do you measure data quality? What are your metrics?

A1: At the moment, the software does not include tools for measuring data quality.

A2: For most instances, data quality checks are done manually by data curators.

Software

Q: Can we already access the code (Git repository?) of the "SSHOC version" of Dataverse and/or associated tools?

A1: The code is in several Git repositories [address will follow].

A2: The aim is that everything we develop during the SSHOC project will be evaluated by Harvard IQSS and, if possible, included in the Dataverse main branch. We are collaborating with the IQSS team. As it is open source software, others are also welcome to contribute to our developments.

 
