DMP-6: Data quality control



The concept


Data will be quality-controlled and the results of quality control shall be indicated in metadata; data made available in advance of quality control will be flagged in metadata as unchecked.


Related terms: Data Quality Indicator, Quality Control.


Category: Usability


Explanation of the principle


Quality control of data is necessary to enable use of the data, especially by people who were not involved in creating it. A data quality review should verify the consistency, accuracy, and precision of values; fitness for use; completeness and correctness of documentation; and validity and comprehensiveness of metadata (Peer et al., 2014), as well as other relevant aspects of the data. Ideally, the review should be conducted before dissemination, so that prospective user communities can judge whether the data suit their needs by consulting its results. Those results should be recorded in the data quality indicators of the metadata that describe the data, where prospective users can easily assess them against their own purposes. Conversely, the absence of values for data quality indicators in the metadata indicates that a data quality review has not been conducted.
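The principle implies three states a prospective user can read from metadata: data explicitly flagged as unchecked, data with no indicator values (no review conducted), and data with recorded results. The sketch below classifies a metadata record accordingly. The key names `qc_status` and `data_quality_indicators` are hypothetical; the guidelines do not prescribe a metadata schema.

```python
from typing import Any, Mapping

# Hypothetical metadata keys: the guidelines do not prescribe a schema.
QC_FLAG_KEY = "qc_status"
INDICATORS_KEY = "data_quality_indicators"


def review_status(metadata: Mapping[str, Any]) -> str:
    """Classify a metadata record by its quality-control state.

    "unchecked"    -> explicitly flagged as released before quality control
    "not reviewed" -> no data quality indicator carries a value
    "reviewed"     -> at least one indicator carries a value
    """
    if metadata.get(QC_FLAG_KEY) == "unchecked":
        return "unchecked"
    indicators = metadata.get(INDICATORS_KEY) or {}
    # An indicator whose value is None or an empty string is treated as absent.
    has_values = any(v not in (None, "") for v in indicators.values())
    return "reviewed" if has_values else "not reviewed"


print(review_status({"qc_status": "unchecked"}))                              # unchecked
print(review_status({"data_quality_indicators": {"completeness": None}}))     # not reviewed
print(review_status({"data_quality_indicators": {"completeness": "meets"}}))  # reviewed
```

The explicit "unchecked" flag is checked first, so data released in advance of quality control is reported as such even if placeholder indicators are present.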


Guidance on Implementation, with Examples


One or more tiers of data quality review should be completed, either independently or in succession. Depending on community practice, a review can be conducted as an internal review, an open review, a blind review, or a double-blind review. An internal review may be officiated by the data producer and carried out manually or automatically. External open reviews give the research community an opportunity to review and comment on data quality. Blind or double-blind reviews may also be conducted externally by members of the research community. Ideally, an external party, such as a data center, archive, repository, or publisher, will officiate an external review to ensure that it is conducted independently of the data producer. The officiator facilitates the review by providing access to the data, any tools and services the data depend on, related information, and documentation. The officiator also specifies the review criteria, recruits reviewers, ensures the integrity of the process, receives commentary, and reports the results.


Officiators should enable reviewers to determine the extent to which the data meet each criterion. Besides providing context by describing the profile, purpose, scope, collection period, phenomenon studied, and lineage or provenance of the data, documentation should describe the collection methods, processes, each variable measured, instrumentation, the meaning of each variable value, any input data, previous versions, reasons for missing values, uncertainties, and post-collection processing. Documentation should also describe the sources of support for data collection and any considerations for interpretation, as well as any restrictions on collection, storage, transmission, access, or use, including any approvals or licenses obtained for such conditions or restrictions. Names and affiliations of data producers and contributors should be documented for the review process, except in double-blind reviews.
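An officiator preparing a review package might check documentation coverage and withhold producer identities only for double-blind reviews. The sketch below assumes a hypothetical `ReviewPackage` holding documentation as a dictionary, and a hypothetical `REQUIRED_DOC_ELEMENTS` list drawn from a subset of the elements named above; an actual checklist would be set by the officiator and community.

```python
from dataclasses import dataclass, field

# Hypothetical subset of the documentation elements listed in the guidelines.
REQUIRED_DOC_ELEMENTS = (
    "purpose", "scope", "collection_period", "provenance",
    "collection_methods", "variables", "instrumentation",
    "missing_value_reasons", "uncertainties", "post_collection_processing",
    "usage_restrictions",
)


@dataclass
class ReviewPackage:
    documentation: dict[str, str]
    producers: list[str] = field(default_factory=list)  # names and affiliations


def missing_elements(package: ReviewPackage) -> list[str]:
    """Return required documentation elements that are absent or empty."""
    return [k for k in REQUIRED_DOC_ELEMENTS if not package.documentation.get(k)]


def package_for_reviewers(package: ReviewPackage, review_mode: str) -> ReviewPackage:
    """Copy the package, withholding producer identities only for double-blind review."""
    producers = [] if review_mode == "double-blind" else list(package.producers)
    return ReviewPackage(documentation=dict(package.documentation), producers=producers)


pkg = ReviewPackage({"purpose": "Monitor river discharge"}, ["A. Producer (Example Institute)"])
print(missing_elements(pkg)[:2])                              # ['scope', 'collection_period']
print(package_for_reviewers(pkg, "double-blind").producers)   # []
```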


The data quality review should evaluate the data against relevant criteria that apply to the range of uses anticipated by the potential user community. Data quality indicators should distinguish between the dataset level and the individual file level. The officiator of the data quality review should define each criterion used in the review in consultation with the community. Archives, data centers, and publishers may consult their respective community representatives to define the criteria for data quality reviews of data acquired for their collections.
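One way to keep dataset-level and file-level indicators distinct is to store them in separate maps, each keyed by criterion name, and to accept only values that the criterion's definition allows. The `Criterion` and `QualityIndicators` types below are hypothetical illustrations, not a structure defined by the guidelines.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    name: str
    definition: str               # defined by the officiator with the community
    allowed_values: tuple         # each value's meaning is documented with the definition


@dataclass
class QualityIndicators:
    # Indicator values for the dataset as a whole.
    dataset_level: dict[str, str] = field(default_factory=dict)
    # Separate indicator values per file, keyed by file name.
    file_level: dict[str, dict[str, str]] = field(default_factory=dict)

    def record(self, criterion: Criterion, value: str, file_name: Optional[str] = None) -> None:
        """Record a value at dataset level, or at file level when file_name is given."""
        if value not in criterion.allowed_values:
            raise ValueError(f"{value!r} is not a defined value for {criterion.name}")
        target = self.dataset_level if file_name is None else self.file_level.setdefault(file_name, {})
        target[criterion.name] = value


completeness = Criterion("completeness", "All expected records are present", ("meets", "does not meet"))
qi = QualityIndicators()
qi.record(completeness, "meets")                           # dataset level
qi.record(completeness, "does not meet", "stations.csv")   # one file falls short
```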


Metrics to measure level of adherence to the principle


The officiator of the data quality review provides the means to document each reviewer's decisions, the justification for them, and the reviewer's area of expertise; this documentation completes the data quality report and determines the score for each data quality indicator. To communicate the outcome clearly, the results should record each reviewer's decisions, the criteria used for the review, a definition of each criterion and the meaning of each possible value, and, within the data quality indicators, the extent to which the data met each criterion. The officiator should resolve discrepancies between individual reviewers' decisions on a particular criterion so that the review reaches a decisive determination about the quality of the data. For example, the officiator may request clarification from individual reviewers or request a review by an additional reviewer to break a tie vote on a particular criterion.
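The tie-breaking step can be sketched as a plurality vote per criterion that refuses to decide when the leading decisions are tied. This is an assumed aggregation rule chosen for illustration; the guidelines leave the resolution method to the officiator. `resolve_criterion` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def resolve_criterion(decisions: dict[str, str]) -> Optional[str]:
    """Combine reviewers' decisions on one criterion by plurality vote.

    `decisions` maps a reviewer id to that reviewer's decision.
    Returns the leading decision, or None when there are no decisions or the
    top decisions are tied, meaning the officiator must seek clarification
    or an additional reviewer before the criterion can be scored.
    """
    if not decisions:
        return None
    ranked = Counter(decisions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no decisive determination yet
    return ranked[0][0]


votes = {"reviewer-1": "meets", "reviewer-2": "does not meet"}
print(resolve_criterion(votes))      # None (tie)
votes["reviewer-3"] = "meets"        # additional reviewer recruited
print(resolve_criterion(votes))      # meets
```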


The value of each data quality indicator should be included in the metadata that describe the data, along with the definition of the indicator or a reference to that definition. If a data quality review was not conducted before the metadata were created, the metadata should state that the data quality review was not completed. If a particular criterion was not included in the data quality review, the indicator for that criterion should likewise state that the review was not completed.
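The rules above can be sketched as a function that builds the indicator section of a metadata record: every defined criterion gets an entry carrying its definition (or a reference to it), and any criterion without a result, including every criterion when no review was held, is marked as not completed. The function name, output layout, wording of the "not completed" value, and the example.org URL are all hypothetical.

```python
from typing import Optional

# Hypothetical wording for an indicator whose criterion was not reviewed.
NOT_COMPLETED = "data quality review not completed"


def indicators_for_metadata(
    criteria_definitions: dict[str, str],   # criterion -> definition text or reference
    results: Optional[dict[str, str]],      # criterion -> resolved value; None if no review
) -> dict[str, dict[str, str]]:
    """Build the data quality indicator section of a metadata record."""
    section = {}
    for name, definition in criteria_definitions.items():
        if results is None or name not in results:
            value = NOT_COMPLETED   # no review, or criterion not covered by it
        else:
            value = results[name]
        section[name] = {"value": value, "definition": definition}
    return section


definitions = {
    "completeness": "https://example.org/criteria#completeness",
    "accuracy": "Values agree with reference measurements within stated uncertainty",
}
print(indicators_for_metadata(definitions, {"completeness": "meets"})["accuracy"]["value"])
# data quality review not completed
```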


Resource Implications of Implementation


Except for automated reviews, at least two reviewers should be recruited to conduct independent data quality reviews. Each reviewer should have expertise relevant to a use of the data, and the type of use each reviewer represents should be recorded. Candidate reviewers must report any potential conflicts of interest to the officiator before accepting a review assignment, and must recuse themselves from the review when conflicts exist. Conflict-of-interest determinations should be completed before the review begins.
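A recruitment check following these rules might drop candidates who declared a conflict and refuse to proceed with a non-automated review that has fewer than two conflict-free reviewers. The `Candidate` type and `assign_reviewers` function are hypothetical; the minimum of two comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    use_type: str      # type of use of the data the reviewer's expertise represents
    conflict: bool     # declared to the officiator before accepting the assignment


def assign_reviewers(candidates: list[Candidate], automated: bool = False,
                     minimum: int = 2) -> list[Candidate]:
    """Return conflict-free candidates; require `minimum` of them unless automated."""
    eligible = [c for c in candidates if not c.conflict]   # conflicted candidates recuse
    if not automated and len(eligible) < minimum:
        raise ValueError(f"need at least {minimum} conflict-free reviewers, found {len(eligible)}")
    return eligible


pool = [
    Candidate("R1", "hydrological modelling", False),
    Candidate("R2", "climate statistics", True),
    Candidate("R3", "water-resource planning", False),
]
print([c.name for c in assign_reviewers(pool)])   # ['R1', 'R3']
```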


Each reviewer should be given access to the review criteria, the data, documentation, metadata, and any tools or services needed to access or use the data (Callahan, 2015). Associated products, tools, and services should be accessible to the reviewers and described well enough to allow inspection and use. Each reviewer should also be given the capabilities needed to render and inspect these resources, along with instructions that enable unimpeded use of the data and related resources.



Text extracted from the Data Management Principles Implementation Guidelines