This white paper provides the necessary basis for understanding the requirements and specifications for remote access to sensitive data (data with potentially harmful effects in the event of their disclosure) in the social sciences and the humanities (SSH). It is the result of the work carried out in SSHOC Task 5.4, Remote Access to Sensitive Data. It is intended to provide guidance and recommendations to the EOSC stakeholders for future infrastructure investment for remote access to sensitive data in the SSH. To ensure that this guidance can, in fact, be implemented, the recommendations are based on the knowledge of numerous data professionals who have direct experience planning, implementing, managing, and sustaining diverse forms of remote access and secure facilities. Throughout, our goal has been to maintain the vision of expanding such infrastructure, while remaining grounded in the practicalities of operating such facilities in a sustainable manner.
In this domain, it is now recognized that the ideal of “open data” needs to be balanced with privacy and other factors that can require moderating access to sensitive data, as reflected in the European Commission’s (2016) stance of “as open as possible, as closed as necessary.” Developments in the past five years have advanced data access, primarily through “safe enclaves”, i.e., physical rooms that provide security for data access (see Glossary). This represents a major improvement for data accessibility, but international, comparative, efficient research requires augmenting the research infrastructure by enabling remote access to data from a researcher’s desktop. Remote access solutions have operated for several years (e.g., UK Data Archive Secure Lab, ICPSR Virtual Data Enclave), but most of these still face limitations in the scope of data available, in geographic reach, and in other respects. More recently, new infrastructures have been under development, some spanning several countries. These efforts are commendable and represent major improvements. However, limited resources, complex legal variations (national implementations of GDPR), and other factors have prevented implementation of a broader solution.
As countries across Europe look at the emerging multi-national infrastructures, it is crucial to address the need for a European answer, at scale, with sustainable funding. The recommendations offered here are guided by our observation that most successful infrastructures embody two features: 1) they are human as well as technical, and 2) they are neither purely centralised nor purely decentralised, but well-crafted hybrids.