Primary Health Network data set
The following exercises (exercises 5-8) are based on the PHN data set (PHN.xlsx). This data set contains the percentage of children who are immunised against Polio and Hepatitis B in Australia by age group and location in October 2017 - September 2017. The PHN data set is taken from https://beta.health.gov.au/resources/publications/2017-phn-childhood-immunisation-coverage-data and has 93 observations with the following variables:
PHN Number [Character]: ID of the Primary Health Network Area
PHN Name [Character]: Name of the Primary Health Network Area
Age Group [Character]: Factor with levels 12-<15 Months, 24-<27 Months, 60-<63 Months
%Polio [Numeric]: Percentage of children immunised against Polio
%HEP [Numeric]: Percentage of children immunised against Hepatitis B
First six observations of the PHN data set:
PHN Number | PHN Name | Age Group | %Polio | %HEP |
---|---|---|---|---|
PHN101 | Central and Eastern Sydney | 12-<15 Months | 94.20931 | 94.09876 |
PHN102 | Northern Sydney | 12-<15 Months | 94.75332 | 94.58254 |
PHN103 | Western Sydney | 12-<15 Months | 94.07182 | 93.99051 |
PHN104 | Nepean Blue Mountains | 12-<15 Months | 94.94118 | 94.94118 |
PHN105 | South Western Sydney | 12-<15 Months | 93.85654 | 93.92611 |
PHN106 | South Eastern NSW | 12-<15 Months | 95.44135 | 95.54297 |
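The Age Group variable is described above as a factor with three levels. As a minimal sketch of what that looks like in base R, the block below builds an ordered factor from a few hand-typed values. The vector age_group is a hypothetical stand-in for the real column and is not read from PHN.xlsx; the commented-out line assumes the readxl package is installed and is only one of several ways to read an Excel file.

```r
# Hypothetical stand-in for the Age Group column (values typed by hand,
# using the three level names listed in the variable description).
# One possible way to read the real file, assuming readxl is installed:
# phn <- readxl::read_excel("PHN.xlsx")
age_group <- c("24-<27 Months", "12-<15 Months", "60-<63 Months", "12-<15 Months")

# ordered = TRUE makes the factor ordinal; levels sets the order explicitly
age_factor <- factor(
  age_group,
  levels  = c("12-<15 Months", "24-<27 Months", "60-<63 Months"),
  ordered = TRUE
)

class(age_factor)   # inspect the class of the new object
levels(age_factor)  # inspect the level order
```

Because the factor is ordered, comparisons such as age_factor[2] < age_factor[1] are meaningful, which is not the case for an unordered factor.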
Read in the PHN.xlsx data set using a suitable function. Check the class of each variable and convert the Age Group column into a factor using the ordered argument, then check its levels.
Assume that we want to calculate, as a new column, the mean percentage across all age groups of children immunised against Polio and Hepatitis B. Before calculating the mean, select the tidy functions that we need in order to reshape the data set into the following form:
PHN Number | PHN Name | Vaccination Type | 12-<15 Months | 24-<27 Months | 60-<63 Months |
---|---|---|---|---|---|
PHN101 | Central and Eastern Sydney | %HEP | 94.09876 | 95.37775 | 0.00000 |
PHN101 | Central and Eastern Sydney | %Polio | 94.20931 | 95.54502 | 92.50768 |
PHN102 | Northern Sydney | %HEP | 94.58254 | 95.11925 | 0.00000 |
PHN102 | Northern Sydney | %Polio | 94.75332 | 95.56295 | 92.16040 |
PHN103 | Western Sydney | %HEP | 93.99051 | 95.89433 | 0.00000 |
PHN103 | Western Sydney | %Polio | 94.07182 | 95.99414 | 94.36424 |
a) unite(), gather()
b) split(), unite()
c) gather(), spread()
d) separate(), split()
Tidy the PHN data set into the form shown in exercise 6. Once it is tidy, use the rowMeans function from base R to calculate the mean percentage of children immunised and save this average as a vector. (Hint: Remember how to select columns as a sequence, e.g., df[,3:5].) Repeat this calculation by combining your tidy code and the mean calculation in one line, again saving the result as a vector.
Bonus exercise: Use the round function from base R to round the values in the PHN data set to 2 decimal places.
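The reshaping, averaging and rounding steps described in these exercises can be sketched on a small example. This is one possible approach rather than the intended solution. The data frame phn_toy is a hypothetical subset typed in from the two tables above (two networks only), and the names tidy_toy, mean_pct and mean_pct_rounded are illustrative. It assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical subset of PHN: two networks, three age groups,
# with values copied from the tables shown earlier in this section
phn_toy <- data.frame(
  `PHN Number` = rep(c("PHN101", "PHN102"), times = 3),
  `PHN Name`   = rep(c("Central and Eastern Sydney", "Northern Sydney"), times = 3),
  `Age Group`  = rep(c("12-<15 Months", "24-<27 Months", "60-<63 Months"), each = 2),
  `%Polio`     = c(94.20931, 94.75332, 95.54502, 95.56295, 92.50768, 92.16040),
  `%HEP`       = c(94.09876, 94.58254, 95.37775, 95.11925, 0, 0),
  check.names  = FALSE  # keep the spaces and % signs in the column names
)

# Stack the two vaccine columns into key/value pairs (wide to long)
long_toy <- gather(phn_toy, key = "Vaccination Type", value = "Percentage",
                   `%Polio`, `%HEP`)

# Give each age group its own column (long to wide)
tidy_toy <- spread(long_toy, key = `Age Group`, value = Percentage)

# In this toy table the age-group percentages sit in columns 4 to 6
mean_pct <- rowMeans(tidy_toy[, 4:6])

# Round the averages to 2 decimal places
mean_pct_rounded <- round(mean_pct, 2)
mean_pct_rounded
```

Note that the column positions passed to rowMeans depend on how many identifier columns are kept, so they should be checked against names(tidy_toy) before reusing this pattern on the full data set.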