class: center, middle, inverse, title-slide

# Week 12

---
class: inverse, center, middle

# Revision 2

---

# Question 1

Consider the following string. Which command would you use to replace the `x` with a blank (whitespace)?

```r
string <- c("169 millimeters x 117 millimeters x 9.1 millimeters")
```

- `A. chartr(string, x)`
- `B. chartr(string, "x", "~")`
- `C. chartr(string, old = "x", new=" ")`
- `D. chartr(string, "x", " - ")`

--

<br>
<br>
<br>

CORRECT ANSWER: C

---

# Question 2

What is the result of the following R code?

```r
df1 <- c("VIC", "NSW", "TAS", "WA", "SA")
df2 <- c("WA", "SA", "NSW", "TAS", "VIC")
identical(df1, df2)
```

- `A. TRUE`
- `B. FALSE`
- `C. "WA", "SA", "NSW"`
- `D. "TAS", "VIC"`

--

<br>
<br>
<br>

CORRECT ANSWER: B

---

# Question 3

Which one of the following is NOT one of the print functions?

- `A. cat()`
- `B. print()`
- `C. noquote()`
- `D. quote()`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 4

Which one of the following removes all punctuation in the vector `x`?

```r
x <- c("hello!", "good-day.", "hi 5:)")
```

- `A. str_subset(x, "[:alnum:]")`
- `B. str_extract(x, "[:alnum:]")`
- `C. str_remove(x, "[:punct:]")`
- `D. str_replace_all(x, "[:punct:]", "")`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 5

According to the following code, what will be the result of `y`?

```r
x <- "Now, I am HAPPY"
y <- length(x)
y
```

- `A. 4`
- `B. 1`
- `C. 2`
- `D. 5`

--

<br>
<br>
<br>

CORRECT ANSWER: B

<!-- --- -->
<!-- # Question 6 -->
<!-- Consider the following data frame. `date_col` variable is in a -->
<!-- factor format. What command would you use to convert it to a -->
<!-- date format? -->
<!-- - `A. ymd()` -->
<!-- - `B. dmy()` -->
<!-- - `C. is.date()` -->
<!-- - `D. mdy()` -->
<!-- -- -->
<!-- <br> -->
<!-- <br> -->
<!-- <br> -->
<!-- CORRECT ANSWER: B -->
<!-- --- -->
<!-- # Question 7 -->
<!-- The header of the Flights data frame is given in the following output. Which one of the following can be used to extract the day of the month information of the `Flights$departure_time`? -->
<!-- - `A. mday(Flights$departure_time)` -->
<!-- - `B. month(Flights$departure_time)` -->
<!-- - `C. year(Flights$departure_time)` -->
<!-- - `D. hour(Flights$departure_time)` -->
<!-- -- -->
<!-- <br> -->
<!-- <br> -->
<!-- <br> -->
<!-- CORRECT ANSWER: A -->

---

# Question 6

Which one of the following functions from the `lubridate` package will convert `z` into a date format?

```r
z <- c("08.06.2018", "29062018", "23/03/2018", "30-01-2018")
```

- `A. ymd(z)`
- `B. dmy(z)`
- `C. ydm(z)`
- `D. hms(z)`

--

<br>
<br>
<br>

CORRECT ANSWER: B

---

# Question 7

In which one of the following are values divided by their standard deviation (or root mean square)?

- `A. Box-Cox transformation`
- `B. logarithmic transformation`
- `C. z-score standardisation`
- `D. square root transformation`

--

<br>
<br>
<br>

CORRECT ANSWER: C

---

# Question 8

According to the following code, what will be the result of `y`?

```r
minmaxnormalise <- function(x) {(x - min(x)) / (max(x) - min(x))}
x <- c(5, 4, NA, 2, 5)
y <- minmaxnormalise(x)
y
```

- `A. 1.00 1.00 NA 1.00 1.00`
- `B. 1.00 0.67 NA 0.00 1.00`
- `C. NA NA NA NA NA`
- `D. 0.00 0.00 NA 1.00 1.00`

--

<br>
<br>
<br>

CORRECT ANSWER: C

---

# Question 9

Which one of the following packages has a function to detect multivariate outliers?

- `A. library(dplyr)`
- `B. library(MVN)`
- `C. library(tidyr)`
- `D. library(validate)`

--

<br>
<br>
<br>

CORRECT ANSWER: B

---

# Question 10

Which of the following can be used to deal with outliers?

- `A. Capping`
- `B. Transforming`
- `C. Imputing`
- `D. All of them`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 11

Which one of the following is the reason for the error produced by the code below?

```r
df <- data.frame(col1 = c(2, 0 / 0, NA, 1 / 0, -Inf, Inf),
                 col2 = c(NA, Inf / 0, 2 / 0, NaN, -Inf, 4))

is.infinite(df)
```

- `A. The is.infinite() function accepts only vector input.`
- `B. There is no infinite value in the data frame.`
- `C. The data frame has missing values.`
- `D. There is a division by zero problem in the data frame.`

--

<br>
<br>
<br>

CORRECT ANSWER: A

---

# Question 12

Consider the following data frame. What command would you use to find the total number of missing values in each column?

```r
df <- data.frame(col1 = c(1:3, NA),
                 col2 = c("this", NaN, "is", "text"),
                 col3 = c(TRUE, FALSE, TRUE, TRUE),
                 col4 = c(2.5, 4.2, 3.2, NA))
```

- `A. sum(is.na(df))`
- `B. is.na(df)`
- `C. is.nan(df)`
- `D. colSums(is.na(df))`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 13

According to the following code, what will be the result of `y`?

```r
x <- c(1:3, NA, 5, NA)
y <- which(is.na(x))
y
```

- `A. 4 6`
- `B. TRUE`
- `C. FALSE FALSE FALSE TRUE FALSE TRUE`
- `D. NA`

--

<br>
<br>
<br>

CORRECT ANSWER: A

---

# Dataset scenario for Questions 14 & 15

A relational database contains two data sets, `sales` and `employees`. The `sales` data set describes each sale: a sale id, the salesperson id, the customer id, the product id, the quantity of the item and the payment type. Here is the `sales` data set:

```r
sales
```

```
## # A tibble: 4 x 6
##   sales_id sales_person_id customer_id product_id quantity payment_type
##      <dbl> <chr>                 <dbl>      <dbl>    <dbl> <chr>
## 1      201 A1                        1        102        2 Debit
## 2      202 B3                        2        101        3 Credit
## 3      203 A1                        3        101        1 Cash
## 4      204 A2                        1        103        5 Debit
```

---

# Dataset scenario for Questions 14 & 15 Cont.

The `employees` data set allows you to look up the first name and surname of a salesperson using the salesperson id. Here is the `employees` data set:

```r
employees
```

```
## # A tibble: 6 x 3
##   sales_person_id first_name last_name
##   <chr>           <chr>      <chr>
## 1 A1              John       Doe
## 2 A2              Jane       Smith
## 3 A3              Micheal    Brown
## 4 B1              Jim        Johnson
## 5 B2              Karen      Wilson
## 6 B3              Kate       Taylor
```

* `employees` connects to `sales` via the `sales_person_id` variable.

???
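To try the joins in these notes interactively, the two tables can be rebuilt by hand. This is a minimal sketch, assuming the `dplyr` and `tibble` packages are installed; the values are copied from the printed tibbles on the slides, and the object names `made_sale` and `no_sale` are illustrative only.

```r
# Sketch only: rebuilds the two tables printed on the slides so the joins can be run.
library(dplyr)   # assumed installed; provides semi_join() and anti_join()
library(tibble)  # assumed installed; provides tribble()

sales <- tribble(
  ~sales_id, ~sales_person_id, ~customer_id, ~product_id, ~quantity, ~payment_type,
  201, "A1", 1, 102, 2, "Debit",
  202, "B3", 2, 101, 3, "Credit",
  203, "A1", 3, 101, 1, "Cash",
  204, "A2", 1, 103, 5, "Debit"
)

employees <- tribble(
  ~sales_person_id, ~first_name, ~last_name,
  "A1", "John",    "Doe",
  "A2", "Jane",    "Smith",
  "A3", "Micheal", "Brown",
  "B1", "Jim",     "Johnson",
  "B2", "Karen",   "Wilson",
  "B3", "Kate",    "Taylor"
)

# Filtering joins keep only columns from the first table.
# semi_join(): employees with at least one matching row in sales.
made_sale <- semi_join(employees, sales, by = "sales_person_id")

# anti_join(): employees with no matching row in sales.
no_sale <- anti_join(employees, sales, by = "sales_person_id")
```

Naming the key with `by = "sales_person_id"` avoids the "Joining, by" message that appears when the key is left for `dplyr` to infer.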
```r
sales
```

```
## # A tibble: 4 x 6
##   sales_id sales_person_id customer_id product_id quantity payment_type
##      <dbl> <chr>                 <dbl>      <dbl>    <dbl> <chr>
## 1      201 A1                        1        102        2 Debit
## 2      202 B3                        2        101        3 Credit
## 3      203 A1                        3        101        1 Cash
## 4      204 A2                        1        103        5 Debit
```

```r
employees
```

```
## # A tibble: 6 x 3
##   sales_person_id first_name last_name
##   <chr>           <chr>      <chr>
## 1 A1              John       Doe
## 2 A2              Jane       Smith
## 3 A3              Micheal    Brown
## 4 B1              Jim        Johnson
## 5 B2              Karen      Wilson
## 6 B3              Kate       Taylor
```

```r
# Q14: How would you find the names of sales people who made a sale while dropping all the information in the `sales` data set?
employees %>% semi_join(sales)
```

```
## Joining, by = "sales_person_id"
```

```
## # A tibble: 3 x 3
##   sales_person_id first_name last_name
##   <chr>           <chr>      <chr>
## 1 A1              John       Doe
## 2 A2              Jane       Smith
## 3 B3              Kate       Taylor
```

```r
# Q15: How would you find the names of sales people who didn't make a sale?
employees %>% anti_join(sales)
```

```
## Joining, by = "sales_person_id"
```

```
## # A tibble: 3 x 3
##   sales_person_id first_name last_name
##   <chr>           <chr>      <chr>
## 1 A3              Micheal    Brown
## 2 B1              Jim        Johnson
## 3 B2              Karen      Wilson
```

---

# Question 14

According to the given information, how would you find the names of sales people (employees) who made a sale while dropping all the information in the `sales` data set?

- `A. anti_join(employees, sales)`
- `B. semi_join(employees, sales)`
- `C. union(employees, sales)`
- `D. bind_cols(employees, sales)`

--

<br>
<br>
<br>

CORRECT ANSWER: B

---

# Question 15

According to the given information, how would you find the names of sales people who didn't make a sale?

- `A. anti_join(employees, sales)`
- `B. semi_join(employees, sales)`
- `C. union(employees, sales)`
- `D. bind_cols(employees, sales)`

--

<br>
<br>
<br>

CORRECT ANSWER: A

---

# For Questions 16 and 17

<center><img src="../images/q19.png" width="400px" /></center>

---

# For Questions 16 and 17

- Picture 1:

<center><img src="../images/q19_1.png" width="400px" /></center>

- Picture 2:

<center><img src="../images/q19_2.png" width="400px" /></center>

---

# For Questions 16 and 17

- Picture 3:

<center><img src="../images/q19_3.png" width="300px" /></center>

- Picture 4:

<center><img src="../images/q19_4.png" width="300px" /></center>

---

# Question 16

Consider the `id_lookup` and `ratings` data sets. What would be the result of:

```r
ratings %>% left_join(id_lookup)
#OR
left_join(ratings, id_lookup)
```

- `A. Picture 1`
- `B. Picture 2`
- `C. Picture 3`
- `D. Picture 4`

--

<br>
<br>
<br>

CORRECT ANSWER: A

---

# Question 17

Consider the `id_lookup` and `ratings` data sets. What would be the result of:

```r
id_lookup %>% anti_join(ratings)
#OR
anti_join(id_lookup, ratings)
```

- `A. Picture 1`
- `B. Picture 2`
- `C. Picture 3`
- `D. Picture 4`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 18

Which one of the following will sort this data frame in ascending order by `col2`, `col3` and `col1`, respectively?

```r
df <- data.frame(col1 = c(4, 3, 1),
                 col2 = c(81, 12, 4),
                 col3 = c(54, 22, 66))
```

- `A. df %>% select(col1, col2, col3)`
- `B. df %>% filter(col1, col2, col3)`
- `C. df %>% arrange(col1, col2, col3)`
- `D. df %>% arrange(col2, col3, col1)`

--

<br>
<br>
<br>

CORRECT ANSWER: D

---

# Question 19

According to the following code, what will be the class of `df`?

```r
df <- data.frame(col1 = 1:3,
                 col2 = c("this", "is", "text"),
                 col3 = c(TRUE, FALSE, TRUE),
                 col4 = c(25.5, 44.2, 54.9))
df <- as.matrix(df)
class(df)
```

- `A. list`
- `B. vector`
- `C. matrix`
- `D. data.frame`

--

<br>
<br>
<br>

CORRECT ANSWER: C

---

# Question 20

According to the following code, what will be the ordering of the levels for `y`?

```r
y <- factor(c("low", "moderate", "low", "severe", "low", "high", "moderate", "severe"),
            levels = c("low", "moderate", "high", "severe"),
            ordered = TRUE)
y
```

- `A. moderate < high < severe < low`
- `B. low < severe < high < moderate`
- `C. low < moderate < high < severe`
- `D. severe < high < moderate < low`

--

<br>
<br>
<br>

CORRECT ANSWER: C

<!-- --- -->
<!-- # Question 20 -->
<!-- What is the class of y? -->
<!-- ```{r, eval = FALSE} -->
<!-- y <- c(1, 2, 3, TRUE, FALSE) -->
<!-- class(y) -->
<!-- ``` -->
<!-- - `A. numeric` -->
<!-- - `B. character` -->
<!-- - `C. factor` -->
<!-- - `D. logical` -->
<!-- -- -->
<!-- <br> -->
<!-- <br> -->
<!-- <br> -->
<!-- CORRECT ANSWER: A -->

---

<br>
<br>
<br>

[Return to Course Website](../index.html)