Exercise 1: Applying Tidy Data Principles
- Open SAFI_messy.xlsx
- Notice the two tabs. Data was recorded by two different researchers in Mozambique and in Tanzania. You’re task is to enable data analysis.
- Identify what is wrong with the data in this spreadsheet from a tidy data perspective. Think of the steps you would need to take to clean up the two tabs, and to put them all together in one new spreadsheet tab.
- Let’s talk about it after ca. 5 min.
Exercise 2: Creating a Tidy Data Set
- Start a new tab (or file) with the tidy, combined data.
- Let’s compare progress in about 20 min (no need to be ready by then).
Do not forget : create a new file (or tab) for the cleaned data, and never modify the original (raw) data.
Exercise 3: Extracting Dates
- Open SAFI_dates.xlsx
- Use the data from the DD_MM_YEAR tab
- Let’s extract day, month and year from the dates to new columns (named Day, Month and Year). For this we can use the built in Excel functions
- Apply each of these formulas to its entire column. Make sure the new column is formatted as a number and not as a date.
- Add another data point in the interview_date column by typing 11/17. What happens in the Day, Months and Year columns?
Do not forget : create a new file (or tab) for your processed data, and never modify the original (raw) data.
Exercise 4: Quality Control Options
Restrict entry options in a copy of the SAFI_clean dataset (or your own cleaned up data). To set the no_membrs column to only allow whole numbers:
• Select the no_membrs column
• On the Data tab select Data Validation
• In the Allow box select Whole number
• Set the minimum and maximum values to 1 and (maybe) 24.
• Try to enter a number outside the range. What happens?
• Make sure to select the whole spreadsheet before sorting by one column (otherwise your data will get wrecked).
• Select Data then Sort.
• Sort by no_membrs in the order Smallest to Largest
• Select a column.
• Go to Styles in the Home tab then Conditional Formatting.
• Under New Rule, use 2-Color Scale with Lowest to Highest for the orange colours.
Exercise 5: Exporting Data
- Finish the cleaning and aggregation of your dataset (if you’re not done yet)
- In Excel from the top menu select File and Save as
- In the Format field, from the list, select Comma Separated Values (*.csv).
- Optional: document your metadata (including your column names) in a .txt file. Look at the documentation for the SAFI data as an example: https://datacarpentry.org/socialsci-workshop/data/
These exercises are based on Data Carpentry lessons.