DMP-2: Data access
Data will be accessible via online services, including, at a minimum, direct download but preferably user-customizable services for access, visualization and analysis.
Related terms: Authentication/Authorization, Online, SSO (Single Sign-On)
Explanation of the principle
The storage and distribution of data have evolved dramatically in recent decades. These developments include vast increases in the availability of data online and in the speed of transfer, as well as the ability to run queries over numerous datasets using Application Programming Interfaces (APIs). Users now expect data to be available on demand via online services, that is, at a web address. Currently, this means a URL that responds to HTTP, HTTPS, or FTP. To meet a wide variety of use cases, particularly analysis at scale, users expect data to be usable by a human via a user interface (providing at least download, but also tools for visualisation and analysis) and to be ‘machine-usable’ via an API.
There are several types of online services. A few of these are:
- Direct access service, allowing the user to download data to their computer;
- Direct Web service, allowing a machine to download a large number of files;
- Browse services, which allow users to inspect representations of candidate files before ordering;
- Visualisation services, allowing a user to view images of the data and possibly to overlay them on other data. For geospatial data this would typically be via a Web Map Service (e.g. OGC WMS / WMTS);
- In place processing of the data:
· Since the volume of data is increasing dramatically, it is desirable to perform processing and analysis of the data in place, i.e. without first downloading the source data;
· The OGC Web Processing Service (WPS) provides a standardized way to execute processing remotely;
· To ease the transfer of processors to the data, techniques such as virtualization or containerization (e.g. Docker) can be used.
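As an illustration of the visualisation service listed above, the following minimal sketch builds an OGC WMS 1.3.0 GetMap request URL. The endpoint `https://example.org/wms`, the layer name and the function name `build_wms_getmap_url` are hypothetical and are not taken from the guidelines; a real service advertises its layers and supported coordinate reference systems in its GetCapabilities response.

```python
from urllib.parse import urlencode


def build_wms_getmap_url(base_url, layer, bbox, width, height,
                         crs="CRS:84", image_format="image/png"):
    """Return a WMS 1.3.0 GetMap URL for a single layer.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of `crs`;
    for CRS:84 that is (min_lon, min_lat, max_lon, max_lat).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_wms_getmap_url(
    "https://example.org/wms",
    "sea_surface_temperature",
    (-10, 35, 5, 45),
    800,
    600,
)
print(url)
```

The resulting URL can be opened in a browser by a human or fetched by a program, which is the dual use (user interface and machine access) that the principle asks for.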
Guidance on Implementation, with Examples
1. Simple architecture: The data access architecture should be simple to implement.
2. Use of standards: The data access system should rely on standards. Examples of standards are:
- OGC standards;
- CEOS OpenSearch.
3. Archived data repackaging/reformatting: Data should be provided in the standard formats that are needed by the designated communities and in exchange formats to facilitate interchange between archives.
To ease the work of the user, the URL for accessing the data should be present within the metadata provided by the catalogue service. The use of a standardized interface (such as OPeNDAP, OpenSearch, or OGC services) is preferred. This allows existing tools to be used and helps resources to be more widely used.
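A minimal sketch of how a client might read the data access URL from catalogue metadata is given below. It assumes that the catalogue returns an Atom feed, as OpenSearch-based catalogues commonly do, and that the download location is carried in a link with `rel="enclosure"`. The sample response, the granule titles and the function name `download_urls` are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical catalogue response: the first entry carries a data access URL,
# the second only a browse page.
SAMPLE_RESPONSE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Hypothetical granule A</title>
    <link rel="enclosure" type="application/x-netcdf"
          href="https://example.org/data/granule_a.nc"/>
    <link rel="alternate" type="text/html"
          href="https://example.org/browse/granule_a"/>
  </entry>
  <entry>
    <title>Hypothetical granule B</title>
    <link rel="alternate" type="text/html"
          href="https://example.org/browse/granule_b"/>
  </entry>
</feed>
"""


def download_urls(atom_xml):
    """Map each entry title to the list of its rel="enclosure" link URLs."""
    root = ET.fromstring(atom_xml)
    result = {}
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        result[title] = [
            link.get("href")
            for link in entry.findall("atom:link", ATOM_NS)
            if link.get("rel") == "enclosure"
        ]
    return result


for title, urls in download_urls(SAMPLE_RESPONSE).items():
    print(title, urls if urls else "no data access URL in metadata")
```

An entry with an empty list, like the second one here, is a case where the user would have to look elsewhere for the data, which is the situation this guidance aims to avoid.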
Many data repositories require knowledge of the identity of those requesting data. For this reason it is desirable to enable automatic user authentication and authorization. SSO is recommended to open data more widely and ease use. As several SSO protocols exist, a common protocol or a federation of interoperable protocols is recommended.
Metrics to measure level of adherence to the principle
Online data accessibility using a standard browser or web service indicates adherence.
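One possible way to apply this metric to a catalogue is sketched below. It is an assumption of this example, not part of the guidelines, that adherence is summarised as the fraction of records whose access URL uses one of the online protocols named earlier (HTTP, HTTPS, or FTP). The function names and sample URLs are hypothetical, and the check is static: it does not confirm that a URL actually responds.

```python
from urllib.parse import urlparse

# Protocols named in the explanation of the principle.
ONLINE_SCHEMES = {"http", "https", "ftp"}


def is_online_access_url(url):
    """True if the URL uses an online protocol and names a host."""
    parts = urlparse(url)
    return parts.scheme in ONLINE_SCHEMES and bool(parts.netloc)


def adherence_ratio(access_urls):
    """Fraction of access URLs that point to an online service."""
    if not access_urls:
        return 0.0
    online = sum(1 for url in access_urls if is_online_access_url(url))
    return online / len(access_urls)


# Hypothetical access URLs taken from catalogue records.
records = [
    "https://example.org/data/granule_a.nc",
    "ftp://example.org/pub/granule_b.nc",
    "file:///archive/tape42/granule_c.nc",  # offline storage, not online
    "",                                     # no access URL recorded
]
print(adherence_ratio(records))
```

A fuller check would also request each URL and confirm that the service responds, at the cost of network traffic and of handling authentication.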
Resource Implications of Implementation
Simple data accessibility can be accomplished with minimal cost using freely available resources. Costs increase when providing and maintaining access to additional tools, services, and related information.
Text extracted from the Data Management Principles Implementation Guidelines