Slava Tykhonov from Data Archiving and Networked Services (DANS-KNAW) and Philipp Conzett (UiT/DataverseNO) presented the vision for Dataverse development within the SSHOC project at the Dataverse Community Meeting 2019 at Harvard University, Cambridge, MA, USA, 19-22 June 2019. The event was hosted by Harvard’s Institute for Quantitative Social Science. It brought together researchers, librarians, publishers, developers and others interested in sharing data or building repositories, from all over the world.
The SSHOC project started in January 2019. Within the project, the partners CESSDA, DARIAH, CLARIN and E-RIHS work together to create a reliable, production-ready, open-source data infrastructure based on the Dataverse software, which institutes can install and adapt to their own needs and requirements.
By the end of the project, a data repository service running on the European Open Science Cloud (EOSC) will be delivered, together with a report on the principles of governance and sustainability of the service.
The presentation also outlined the working approach of two development teams. The first team is responsible for core development and will extend the Dataverse repository with new features. The second team will develop applications that can be integrated with Dataverse. Both teams aim to deliver production-ready services that will be deployed on cloud infrastructure through continuous integration pipelines. The teams will follow the CESSDA Maturity Model so that the service can be accepted as a mature service in the EOSC. In a mature networked infrastructure, all connected services should be able to test each other automatically during build and deployment, and reports should be produced when a service is out of order or outdated.
SSHOC Dataverse has an ambitious roadmap: the plan is to build services such as flexible federated authentication and a NESSTAR DDI import tool. There is also significant demand for data preview services, such as DDI Explorer and previewers for spreadsheets/CSV, PDF, text files, HTML, images, video, audio, JSON, GeoJSON/Shapefiles/maps and XML. Some of these are already implemented at demonstration level and will be integrated into the Dataverse infrastructure.
The first integration with the CESSDA CV (controlled vocabulary) Service was presented. It links Dataverse metadata fields to the CESSDA metadata controlled vocabularies. The development of services supporting external controlled vocabularies was well received by the audience.
Another aim of the project is to provide curated, multilingual support for the Dataverse web interface, the Solr components and the internal controlled vocabularies. The translation process should be as easy as possible for translators and should keep track of all translated versions of every property. The project will use the Weblate tool as a shared multilingual service where users can collaborate on the various translations.
The planned services will benefit both researchers within the partner organizations and users of existing Dataverse installations. Notable features include multilingual support, domain-specific metadata compliance through controlled vocabularies, and support for multiple authentication protocols within the same Dataverse installation.
Slides from Slava Tykhonov (DANS/CESSDA), Philipp Conzett (UiT/CLARIN) and Marion Wittenberg (DANS/CESSDA)