Working in small groups or pairs, complete the following exercises.
First, introduce yourself briefly to the people at your table (or in your group). Decide on a name for your table/group. Now imagine that you and your group mates are working together in a company as data analysts and you have received the following data set:
id | age | marital | education | job | balance | day | month | duration |
---|---|---|---|---|---|---|---|---|
1 | 44 | married | secondary | blue-collar | 16178 | 21 | nov | 297 |
2 | 88 | married | secondary | admin. | 330 | 2 | dec | 357 |
3 | 36 | divorced | secondary | blue-collar | 853 | 20 | jun | 15 |
4 | 25<= | single | secondary | technician | 616 | 28 | jul | 117 |
5 | 33 | single | secondary | services | 310 | 12 | m | 54 |
6 | 37 | married | tertiary | management | 0 | 16 | jul | -268 |
7 | 42 | married | tertiary | management | 1205 | 15 | mar | 129 |
8 | 43 | married | secondary | blue-collar | 130 | 5 | may | 156 |
9 | 58 | married | primary | u | 99999 | 26 | aug | 168 |
10 | 41 | married | secondary | admin. | 3634 | 14 | may | 216 |
11 | 0 | married | primary | management | 92 | 2 | feb | 447 |
12 | 34 | single | secondary | services | 528D | 2 | sep | 121 |
13 | 28 | single | secondary | admin. | 350 | 19 | may | 5 |
14 | 58 | widowed | tertiary | management | 136 | 8 | jul | 199 |
15 | 34 | married | unknown | blue-collar | 41 | 6 | may | 34 |
The data set is a random sample of the Bank Marketing data hosted at the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Bank+Marketing), manipulated for the purpose of this task. It contains the following variables:
id
: Customer ID number
age
: Numerical variable
marital
: Categorical variable with three levels (married, single, divorced, where widowed is counted as divorced)
education
: Categorical variable with three levels (primary, secondary, tertiary)
job
: Categorical variable containing the type of job
balance
: Numerical variable, balance in the bank account
day
: Numerical variable, day of the month of the last contact
month
: Categorical variable, month of the last contact
duration
: Numerical variable, duration of the contact
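The level definitions above can be encoded directly in R. The sketch below is only an illustration: it uses a handful of `marital` values copied from the table, and the names `marital_raw`, `marital_clean` and `marital` are made up for this example. It folds `widowed` into `divorced`, as the definition describes, and then builds a factor restricted to the three documented levels.

```r
# A few marital values taken from the table above (illustrative subset).
marital_raw <- c("married", "divorced", "single", "widowed")

# The definition counts widowed customers as divorced, so recode first.
marital_clean <- ifelse(marital_raw == "widowed", "divorced", marital_raw)

# Build a factor restricted to the three documented levels.
# Any value outside these levels would become NA.
marital <- factor(marital_clean, levels = c("married", "single", "divorced"))

# Count the customers in each level, showing NA only if it occurs.
table(marital, useNA = "ifany")
```

Declaring the levels explicitly is a design choice: unexpected values surface as `NA` instead of silently becoming extra categories.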
Identify possible problems/errors in this data set. As a group, decide on the three problems/errors that would be most problematic for your data analysis.
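If you would like to support your discussion with R, one possible starting point is sketched below. It assumes the table has been typed in as pipe-separated text; the object names (`raw`, `bank`, `not_numeric`, and so on) are hypothetical, and the checks shown are not exhaustive. Which findings count as the most serious problems is still for your group to decide.

```r
# The table from this exercise, typed in as pipe-separated text.
raw <- "id|age|marital|education|job|balance|day|month|duration
1|44|married|secondary|blue-collar|16178|21|nov|297
2|88|married|secondary|admin.|330|2|dec|357
3|36|divorced|secondary|blue-collar|853|20|jun|15
4|25<=|single|secondary|technician|616|28|jul|117
5|33|single|secondary|services|310|12|m|54
6|37|married|tertiary|management|0|16|jul|-268
7|42|married|tertiary|management|1205|15|mar|129
8|43|married|secondary|blue-collar|130|5|may|156
9|58|married|primary|u|99999|26|aug|168
10|41|married|secondary|admin.|3634|14|may|216
11|0|married|primary|management|92|2|feb|447
12|34|single|secondary|services|528D|2|sep|121
13|28|single|secondary|admin.|350|19|may|5
14|58|widowed|tertiary|management|136|8|jul|199
15|34|married|unknown|blue-collar|41|6|may|34"

# Read every column as text so that malformed entries are kept, not dropped.
bank <- read.table(text = raw, sep = "|", header = TRUE,
                   colClasses = "character", strip.white = TRUE)

# Entries in the numerical variables that cannot be parsed as numbers.
numeric_cols <- c("age", "balance", "day", "duration")
not_numeric <- lapply(bank[numeric_cols], function(x) {
  x[is.na(suppressWarnings(as.numeric(x)))]
})

# Values that fall outside the levels listed in the variable definitions.
extra_education <- setdiff(bank$education, c("primary", "secondary", "tertiary"))
extra_month <- setdiff(bank$month, tolower(month.abb))

# IDs of customers whose contact duration is negative.
negative_duration <- bank$id[as.numeric(bank$duration) < 0]

# Inspect the structure and the flagged entries.
str(bank)
print(not_numeric)
```

Reading everything as character first is deliberate: converting straight to numbers would turn malformed entries into `NA` and hide what was originally typed.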
Post your group’s opinion on the discussion board. Don’t forget to read and comment on other groups’ responses.
swirl
The swirl package (you can read more on the swirl project here) teaches you R programming interactively, at your own pace, and right in the R console! You will use swirl in class to complete the “R Programming - Basic Building Blocks” course.
In order to run swirl, you must have R 3.1.0 or later installed on your computer. In addition to R, it’s highly recommended that you install RStudio, which will make your experience with R much more enjoyable. If you need to install RStudio, you can do so here by selecting the appropriate installer for your operating system.
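If you are unsure which version of R you have, a quick check along the following lines can be run in the R console. This is a minimal sketch; the variable name `r_ok` is made up for this example.

```r
# TRUE if the running R version meets the stated minimum of 3.1.0.
r_ok <- getRversion() >= "3.1.0"

# Report the result together with the installed version number.
if (r_ok) {
  message("R ", as.character(getRversion()), " is recent enough for swirl.")
} else {
  message("R ", as.character(getRversion()), " is too old; please update R first.")
}
```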
Open RStudio (I assume you have already installed R and RStudio).
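Install and start swirl by typing the following into the console:
install.packages("swirl")          # download and install swirl from CRAN
library(swirl)                     # load swirl into the current session
install_course("R Programming E")  # install the R Programming course
swirl()                            # start an interactive swirl session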
When swirl opens a session, you will enter your name and make a course selection. Select the “1: Basic Building Blocks” topic by typing 1 in the console.
Follow the instructions and complete the Basic Building Blocks course. Don’t rush; you will have plenty of time to complete it.
If you have finished the above tasks, work through the weekly list of tasks posted on the Canvas announcement page.