Skip to Main Content

Research Data: Tidy Data in Spreadsheets

This page is used in conjunction with training sessions on "Tidy Data in Spreadsheets"

Tidy Data Exercises

Exercise 1: Applying Tidy Data Principles

  • Open SAFI_messy.xlsx
  • Notice the two tabs. Data was recorded by two different researchers in Mozambique and in Tanzania. You’re task is to enable data analysis.
  • Identify what is wrong with the data in this spreadsheet from a tidy data perspective. Think of the steps you would need to take to clean up the two tabs, and to put them all together in one new spreadsheet tab.
  • Let’s talk about it after ca. 5 min.

Exercise 2: Creating a Tidy Data Set

  • Start a new tab (or file) with the tidy, combined data.
  • Let’s compare progress in about 20 min (no need to be ready by then).

Do not forget : create a new file (or tab) for the cleaned data, and never modify the original (raw) data.

Exercise 3: Extracting Dates

  • Open SAFI_dates.xlsx
  • Use the data from the DD_MM_YEAR tab
  • Let’s extract day, month and year from the dates to new columns (named Day, Month and Year). For this we can use the built in Excel functions
    =DAY()
    =MONTH()
    =YEAR()
  • Apply each of these formulas to its entire column. Make sure the new column is formatted as a number and not as a date.
  • Add another data point in the interview_date column by typing 11/17. What happens in the Day, Months and Year columns?

Do not forget : create a new file (or tab) for your processed data, and never modify the original (raw) data.

Exercise 4: Quality Control Options

Restrict entry options in a copy of the SAFI_clean dataset (or your own cleaned up data). To set the no_membrs column to only allow whole numbers:
•    Select the no_membrs column
•    On the Data tab select Data Validation
•    In the Allow box select Whole number
•    Set the minimum and maximum values  to 1 and (maybe) 24.
•    Try to enter a number outside the range. What happens?

Sorting:
•   Make sure to select the whole spreadsheet before sorting by one column (otherwise your data will get wrecked).
•   Select Data then Sort.
•   Sort by no_membrs in the order Smallest to Largest

Conditional Formatting:
•    Select a column.
•    Go to Styles in the Home tab then Conditional Formatting.
•    Under New Rule, use 2-Color Scale with Lowest to Highest for the orange colours.

Exercise 5: Exporting Data

  • Finish the cleaning and aggregation of your dataset (if you’re not done yet)
  • In Excel from the top menu select File and Save as
  • In the Format field, from the list, select Comma Separated Values (*.csv).
  • Optional: document your metadata (including your column names) in a .txt file. Look at the documentation for the SAFI data as an example: https://datacarpentry.org/socialsci-workshop/data/

 

These exercises are based on Data Carpentry lessons.