
Research Data: Data Documentation

 

Data documentation allows others to understand your data. It is key to making your data reusable and your research reproducible. Documentation in the form of structured metadata supports the findability and interoperability of your data.

Data Documentation

Documentation comprises all the information required to understand your research data sets. It should be maintained continuously throughout the research project.

Good documentation is necessary to:

  • Keep track of all data processing steps.
    Documentation of the steps helps you to find errors and to verify findings.
  • Make your results reproducible.
    Documentation helps others to verify and reproduce your results.
  • Help you write your publications.  
    Documentation is a key resource for preparing your publication(s).
  • Enable re-use of your data.
    Documentation is crucial to determine if a data set can be used for a different research question. The findability of your data also depends largely on the quality of the documentation.

Documentation is needed on (at least) two levels. At the project level, general questions need to be answered:

  • What is the aim of the project and the purpose of the data?
  • Which methodologies were used in data generation?
  • How, when and by whom was the data collected?

At the data level, individual objects or files in your dataset might need further explanation, for example the meaning of column names in an Excel sheet or some context for an individual interview in an oral history project.
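One way to document column names at the data level is a small data dictionary stored next to the data file. The following is a minimal sketch, not a prescribed format: the column names (`resp_id`, `age`, `interview_date`), their descriptions, and the helper functions are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical columns of an illustrative spreadsheet; replace with your own.
DATA_DICTIONARY = [
    {"column": "resp_id", "description": "Anonymised respondent identifier",
     "unit": "", "allowed_values": "positive integer"},
    {"column": "age", "description": "Age of respondent at time of interview",
     "unit": "years", "allowed_values": "18-99"},
    {"column": "interview_date", "description": "Date the interview took place",
     "unit": "", "allowed_values": "ISO 8601 date (YYYY-MM-DD)"},
]

FIELDS = ["column", "description", "unit", "allowed_values"]


def data_dictionary_to_csv(entries):
    """Render the data dictionary as CSV text that can be saved beside the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_columns, entries):
    """Return the columns present in the data but missing from the dictionary."""
    documented = {entry["column"] for entry in entries}
    return [name for name in data_columns if name not in documented]


if __name__ == "__main__":
    print(data_dictionary_to_csv(DATA_DICTIONARY))
    # "notes" is not described above, so it is reported as undocumented.
    print(undocumented_columns(["resp_id", "age", "notes"], DATA_DICTIONARY))
```

Running a check like `undocumented_columns` whenever the data changes is one way to keep the documentation in step with the data throughout the project.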

When publishing your data set(s), documentation should be included in one or more separate text files accompanying your data.

The key to producing good documentation is to switch your perspective. Put yourself in the shoes of a potential re-user of your data: what information would you need to understand the data and to be able to re-use it? Start documentation at the beginning of your research process and continuously work on it throughout the project.

Metadata

Documentation that is structured in a standardised way is called metadata. Metadata standards provide defined elements into which information can be mapped to become machine readable (and actionable).  
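To illustrate what mapping information into defined elements can look like, the sketch below builds a small machine-readable record in JSON. The element names are borrowed from Dublin Core purely as an example; the choice of which elements to treat as required, the `build_metadata` helper, and all values are assumptions made for this illustration, not requirements of any particular standard or repository.

```python
import json

# Assumption for this sketch: these four elements must always be filled in.
REQUIRED_ELEMENTS = ("title", "creator", "description", "date")


def build_metadata(**elements):
    """Map free-form information into named elements and serialise it as JSON."""
    missing = [name for name in REQUIRED_ELEMENTS if not elements.get(name)]
    if missing:
        raise ValueError("missing metadata elements: " + ", ".join(missing))
    return json.dumps(elements, indent=2, sort_keys=True)


# Invented example values.
record = build_metadata(
    title="Example interview collection",
    creator="Doe, Jane",
    description="Transcripts of ten interviews collected for an example project.",
    date="2024-01-15",
    subject=["oral history", "example data"],
)

if __name__ == "__main__":
    print(record)
```

Because every piece of information sits under a fixed element name, software can read, validate, and index the record without interpreting free text.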

When you publish your data, some descriptive metadata (like title and description) is generated to enable your data set to be found.

Structuring information on your research data according to discipline-specific metadata standards greatly enhances the interoperability of your data with data generated by others. The following resources list existing metadata standards:

README files

Including a readme file alongside data uploaded to a repository is recommended. This plain text file should contain all documentation needed to understand your data. Cornell University provides a good guide to writing a readme file. A template and an example provided by Carnegie Mellon University might be helpful as well.
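As a rough starting point, a readme skeleton can be generated from the project-level questions and a list of files. This is a minimal sketch under stated assumptions: the section wording, the `build_readme` helper, and the example values are hypothetical and do not reproduce the Cornell or Carnegie Mellon templates.

```python
def build_readme(title, aim, methods, collection, files):
    """Return plain-text readme content covering project- and file-level notes."""
    lines = [
        title,
        "=" * len(title),
        "",
        "Aim of the project and purpose of the data:",
        f"  {aim}",
        "",
        "Methodology used to generate the data:",
        f"  {methods}",
        "",
        "How, when and by whom the data was collected:",
        f"  {collection}",
        "",
        "Files in this data set:",
    ]
    # One line per file, so each object in the data set gets its own explanation.
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Invented example values.
readme_text = build_readme(
    title="Example survey data",
    aim="Illustrate how a readme file can be structured.",
    methods="Online questionnaire.",
    collection="Collected in January 2024 by the project team.",
    files={
        "responses.csv": "One row per respondent",
        "codebook.csv": "Meaning of each column in responses.csv",
    },
)

if __name__ == "__main__":
    print(readme_text)
```

The resulting text can be saved as a plain text file next to the data, for example with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`, and then extended by hand.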
