Skip to main content
It looks like you're using Internet Explorer 11 or older. This website works best with modern browsers such as the latest versions of Chrome, Firefox, Safari, and Edge. If you continue with this browser, you may see unexpected results.
Tidy Data Exercises
Exercise 1: Applying Tidy Data Principles
- Open SAFI_messy.xlsx
- Notice the two tabs. Data was recorded by two different researchers in Mozambique and in Tanzania. You’re task is to enable data analysis.
- Identify what is wrong with the data in this spreadsheet from a tidy data perspective. Think of the steps you would need to take to clean up the two tabs, and to put them all together in one new spreadsheet tab.
- Let’s talk about it after ca. 5 min.
Exercise 2: Creating a Tidy Data Set
- Start a new tab (or file) with the tidy, combined data.
- Let’s compare progress in about 20 min (no need to be ready by then).
Do not forget : create a new file (or tab) for the cleaned data, and never modify the original (raw) data.
Exercise 3: Extracting Dates
- Open SAFI_dates.xlsx
- Use the data from the DD_MM_YEAR tab
- Let’s extract day, month and year from the dates to new columns (named Day, Month and Year). For this we can use the built in Excel functions
- Apply each of these formulas to its entire column. Make sure the new column is formatted as a number and not as a date.
- Add another data point in the interview_date column by typing 11/17. What happens in the Day, Months and Year columns?
Do not forget : create a new file (or tab) for your processed data, and never modify the original (raw) data.
Exercise 4: Quality Control Options
Restrict entry options in a copy of the SAFI_clean dataset (or your own cleaned up data). To set the no_membrs column to only allow whole numbers:
• Select the no_membrs column
• On the Data tab select Data Validation
• In the Allow box select Whole number
• Set the minimum and maximum values to 1 and (maybe) 24.
• Try to enter a number outside the range. What happens?
• Make sure to select the whole spreadsheet before sorting by one column (otherwise your data will get wrecked).
• Select Data then Sort.
• Sort by no_membrs in the order Smallest to Largest
• Select a column.
• Go to Styles in the Home tab then Conditional Formatting.
• Under New Rule, use 2-Color Scale with Lowest to Highest for the orange colours.
Exercise 5: Exporting Data
- Finish the cleaning and aggregation of your dataset (if you’re not done yet)
- In Excel from the top menu select File and Save as
- In the Format field, from the list, select Comma Separated Values (*.csv).
- Optional: document your metadata (including your column names) in a .txt file. Look at the documentation for the SAFI data as an example: https://datacarpentry.org/socialsci-workshop/data/
These exercises are based on Data Carpentry lessons.