Changes in Office Hours
R programming
The R language is available as a free download from the R Project website at:
RStudio
RStudio offers a graphical interface to assist in creating R code:
RStudio Environment
# Answer "no" to:
# Do you want to install from sources the packages which need compilation?
update.packages(ask = FALSE, checkBuilt = TRUE)
pkgs <- c("tidyverse", "nycflights13", "gapminder", "skimr")
install.packages(pkgs, dependencies = c("Depends", "Imports", "LinkingTo"))
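Reinstalling everything can be slow, so one option is to install only the packages that are missing. The sketch below uses requireNamespace(), which returns FALSE instead of raising an error when a package is not installed. The helper name missing_packages() is hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: returns the names in pkgs that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE; quietly = TRUE suppresses messages
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("tidyverse", "nycflights13", "gapminder", "skimr")
to_install <- missing_packages(pkgs)
print(to_install)
```

If to_install is non-empty, it can be passed to install.packages() with the same dependencies argument as above.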
Materials for the book, Practical Data Science with R
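Finding the path name of the file
Windows path names use backslashes (\), which R reads as escape characters inside strings. Either replace each backslash (\) with a double-backslash (\\) in the path name, or add r at the beginning of the path name to write it as a raw string, r"(path)" (R 4.0.0 or later).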
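A quick way to see that both approaches give the same string is to compare them. The path below is hypothetical and used only for illustration; the raw-string form needs R 4.0.0 or later. In a regular string, a single backslash such as \U would be read as an escape sequence, which is why it must be doubled.

```r
# Hypothetical Windows path, for illustration only
escaped_path <- "C:\\Users\\me\\PDSwR2-main\\UCICar\\car.data.csv"
# Raw string: backslashes inside r"( ... )" are kept literally (R >= 4.0.0)
raw_path <- r"(C:\Users\me\PDSwR2-main\UCICar\car.data.csv)"

print(identical(escaped_path, raw_path))  # both hold single backslashes
cat(raw_path, "\n")
```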
Code and comment style
The two main principles for coding and managing data are:
The # mark is R's comment character: # indicates that the rest of the line is to be ignored.
Consider using block commenting to separate code sections. In RStudio, a comment line ending in #### defines a code section.
Break down long lines and long algebraic expressions.
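A short sketch of these style points: section comments ending in ####, and a long expression split after an operator so R knows the statement continues on the next line. The variable names and numbers are hypothetical.

```r
#### Setup ####
# Hypothetical prices, quantities, and tax rate for illustration
prices <- c(19.99, 5.49, 3.75)
quantities <- c(2, 10, 4)
tax_rate <- 0.08

#### Compute ####
# The line ends with *, so R keeps reading the expression on the next line
total <- sum(prices * quantities) *
  (1 + tax_rate)
print(total)
```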
RStudio Options Setting
1
1/2
'Joe'
"Joe"
"Joe" == 'Joe'
c()
is.null(c())
is.null(5)
Vectors are built with the c() notation.
c(1)
c(1, 2)
c("Apple", 'Orange')
length(c(1, 2))
vec <- c(1, 2)
vec
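The expressions above can be stored and checked instead of only printed. This sketch records each result in a variable so the behavior is explicit: / is floating-point division, single and double quotes build the same string, c() with no arguments is NULL, and a single number is a length-one vector.

```r
half <- 1 / 2                   # / always does floating-point division
same <- "Joe" == 'Joe'          # single and double quotes make the same string
empty_is_null <- is.null(c())   # c() with no arguments evaluates to NULL
five_is_null <- is.null(5)      # 5 is a length-one numeric vector, not NULL
vec <- c(1, 2)
print(c(half = half, n = length(vec), len_of_5 = length(5)))
```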
Code and comment style
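R has several assignment operators (<-, =, ->). Use <- for assignment.
x <- 2
x < - 3  # compares x with -3; this is not an assignment
print(x)
x <- 5
x = 5
5 -> x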
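The space inside "< -" matters: R parses x < - 3 as x < (-3), a comparison that returns TRUE or FALSE and leaves x unchanged. This sketch captures that result to make the difference visible.

```r
x <- 2
res <- x < - 3   # parsed as x < (-3): a comparison, not an assignment
print(res)       # FALSE, because 2 is not less than -3
print(x)         # x is still 2
x <- 5           # preferred assignment style
x = 5            # also assigns
5 -> x           # right-arrow assignment
print(x)
```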
Shortcuts
Mac: Option + - (minus) inserts <-.
Windows: Alt + - (minus) inserts <-.
R data types
x <- TRUE
y <- 1
z <- 'Data Analytics'
productCategory <- c('fruit', 'vegetable', 'dry goods', 'fruit', 'vegetable', 'dry goods')
productCategoryFactor <- factor(productCategory)
The class() function returns the data type of an object. What are the data types of x, y, z, productCategory, and productCategoryFactor?
R data types
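One way to answer the question for all five objects at once is to apply class() to each of them with sapply(). The sketch also shows what factor() stores: sorted unique levels plus an integer code for each element.

```r
x <- TRUE
y <- 1
z <- 'Data Analytics'
productCategory <- c('fruit', 'vegetable', 'dry goods',
                     'fruit', 'vegetable', 'dry goods')
productCategoryFactor <- factor(productCategory)

# sapply() calls class() on each element of the list
types <- sapply(list(x = x, y = y, z = z,
                     productCategory = productCategory,
                     productCategoryFactor = productCategoryFactor),
                class)
print(types)
print(levels(productCategoryFactor))      # the sorted unique values
print(as.integer(productCategoryFactor))  # each element's level number
```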
a <- c(1, 2)
b <- a
print(b)
# Alters a
a[[1]] <- 5
print(a)
# b still holds the original values: 1 2
print(b)
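R assignment behaves like copying: changing a after b <- a does not change b. This sketch checks that with identical() and shows that lists follow the same rule.

```r
a <- c(1, 2)
b <- a            # b and a now hold the same values
a[[1]] <- 5       # a gets its own modified copy; b is not touched
print(identical(b, c(1, 2)))

l1 <- list(n = 1) # lists behave the same way
l2 <- l1
l1$n <- 99
print(l2$n)       # still 1
```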
Lists
List elements can be accessed with the $ operator and the [[]] operator.
x <- list('a' = 6, b = 'fred')
names(x)
x$a
x$b
x[['a']]
x[c('a', 'a', 'b', 'b')]
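Two more list properties are worth seeing directly: assigning to a new name appends an element, and elements can have different types. The last line repeats the slide's selection with repeated names, which returns a four-element list.

```r
x <- list('a' = 6, b = 'fred')
x$c <- c(1, 2, 3)                     # assigning to a new name appends an element
print(names(x))
element_types <- sapply(x, class)     # elements may have different types
print(element_types)
repeated <- x[c('a', 'a', 'b', 'b')]  # names may repeat in a selection
print(names(repeated))
```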
R data types
example_vector <- c(10, 20, 30)
example_list <- list(a = 10, b = 20, c = 30)
example_vector[1]
example_list[1]
example_vector[[2]]
example_list[[2]]
example_vector[c(FALSE, TRUE, TRUE)]
example_list[c(FALSE, TRUE, TRUE)]
example_list$b
example_list[["b"]]
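The key difference above is that [ ] keeps the container while [[ ]] extracts a single element. This sketch checks the classes, and builds the logical index from a comparison instead of typing it by hand.

```r
example_vector <- c(10, 20, 30)
example_list <- list(a = 10, b = 20, c = 30)

one_bracket <- example_list[1]     # [ ] returns a list of length 1
two_brackets <- example_list[[2]]  # [[ ]] returns the element itself: 20
print(class(one_bracket))
print(class(two_brackets))

keep <- example_vector > 15        # FALSE TRUE TRUE
print(example_vector[keep])
print(example_list[keep])
```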
Errors
Errors are just R's way of saying it safely refused to complete an ill-formed operation.
Fear of errors should not limit experiments.
x <- 1:5
print(x)
x <- meanMISSPELLED(x)  # Error: could not find function "meanMISSPELLED"
print(x)  # x is unchanged: 1 2 3 4 5
x <- mean(x)
print(x)
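To confirm that a failed call really leaves x untouched, the error can be caught with tryCatch() and its message inspected. This is a sketch for experimenting in a script, where an uncaught error would otherwise stop execution.

```r
x <- 1:5
msg <- tryCatch({
  x <- meanMISSPELLED(x)   # errors before the assignment can happen
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
unchanged <- identical(x, 1:5)
print(unchanged)
x <- mean(x)
print(x)
```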
Data Frames
d <- data.frame(x = c(1, 2), y = c('a', 'b'))
d[['x']]
d$x
d[[1]]
d
d[1, ]
d[, 1]
d[1, 1]
d[1, 'x']
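Several of the indexing forms above return the same column. This sketch stores each result and compares them; it also shows that one row stays a data.frame, and that drop = FALSE keeps a single selected column as a one-column data.frame instead of a plain vector.

```r
d <- data.frame(x = c(1, 2), y = c('a', 'b'))
col_by_name <- d[['x']]      # the column as a plain vector
col_by_dollar <- d$x
col_by_pos <- d[[1]]
col_by_index <- d[, 1]       # one selected column is simplified to a vector
row1 <- d[1, ]               # one row stays a data.frame
cell <- d[1, 'x']
one_col_df <- d[, 1, drop = FALSE]
print(class(row1))
print(class(one_col_df))
```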
Data Frames
d <- data.frame(col1 = c(1, 2, 3), col2 = c(-1, 0, 1))
print(d)
d$col3 <- d$col1 + d$col2
print(d)
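Column arithmetic is element-wise, so col3 holds the row-by-row sums. The sketch below also adds a column with [[ ]] and removes one by assigning NULL; col4 is a hypothetical extra column for illustration.

```r
d <- data.frame(col1 = c(1, 2, 3), col2 = c(-1, 0, 1))
d$col3 <- d$col1 + d$col2    # element-wise: 0 2 4
d[['col4']] <- d$col1 * 10   # [[ ]] with a new name also adds a column
d$col2 <- NULL               # assigning NULL removes a column
print(d)
```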
NULL and NA values
NULL is just an alias for c(), the empty vector.
NA indicates missing or unavailable data.
c(c(), 1, NULL)
c("a", NA, "c")
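The two examples behave differently: NULL contributes nothing when combined, while NA keeps its position. The sketch also shows that NA propagates through calculations unless it is dropped explicitly with na.rm = TRUE.

```r
with_null <- c(c(), 1, NULL)    # NULL adds nothing: the result is just 1
with_na <- c("a", NA, "c")      # NA keeps its slot: length 3
print(length(with_null))
print(is.na(with_na))           # FALSE TRUE FALSE
avg_with_na <- mean(c(1, NA, 3))                # NA propagates
avg_dropped <- mean(c(1, NA, 3), na.rm = TRUE)  # drop NA first: 2
print(c(avg_with_na, avg_dropped))
```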
d <- data.frame(x = 1, y = 2)
d2 <- d
d$x <- 5
print(d)
# d2 still holds the original value x = 1
print(d2)
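The same copy behavior applies when a data frame is passed to a function: the function changes its own copy, so the result must be assigned back to keep the change. The helper bump_x() is hypothetical, written only for this sketch.

```r
# Hypothetical helper, for illustration only
bump_x <- function(df) {
  df$x <- df$x + 1   # changes the function's local copy of df
  df
}

d <- data.frame(x = 1, y = 2)
d2 <- d
bumped <- bump_x(d)  # returns a modified copy
print(d$x)           # the caller's d is untouched: 1
d <- bump_x(d)       # reassign to keep the change
print(d$x)           # 2
print(d2$x)          # d2 still holds the original value: 1
```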
Step 1. Find the path name for the file car.data.csv in the sub-folder 'UCICar' of the folder 'PDSwR2-main'.
Step 2. In the code below, replace 'PATH_NAME_FOR_THE_FILE_car.data.csv' with the path name for the file car.data.csv.
Step 3. Run the following R code:
uciCar <- read.table(
  'PATH_NAME_FOR_THE_FILE_car.data.csv',
  sep = ',',
  header = TRUE,
  stringsAsFactors = TRUE
)
View(uciCar)
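To see what stringsAsFactors = TRUE does without the downloaded file, this sketch writes a tiny CSV to a temporary file and reads it back both ways. The column names and values are hypothetical sample data, not the contents of car.data.csv.

```r
# Hypothetical sample rows, written to a temporary file for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("buying,safety",
             "vhigh,low",
             "low,high",
             "vhigh,med"), csv_path)

as_factors <- read.table(csv_path, sep = ",", header = TRUE,
                         stringsAsFactors = TRUE)
as_strings <- read.table(csv_path, sep = ",", header = TRUE,
                         stringsAsFactors = FALSE)
str(as_factors)   # text columns become factors
str(as_strings)   # text columns stay character
```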
Examining data frame
class() tells you what kind of R object you have.
dim() shows how many rows and columns a data.frame has.
head() shows the top few rows of the data.
help() provides the documentation for a class, e.g. help(class(uciCar)).
str() gives the structure of an object.
Examining data frame
summary() provides a summary of almost any R object.
skimr::skim() provides a more detailed summary.
print() prints all the data.
View() displays the data in a simple spreadsheet-like grid viewer.
dplyr::glimpse() displays brief information about the data.
Examining data frame
print(uciCar)
class(uciCar)
dim(uciCar)
head(uciCar)
help(class(uciCar))
str(uciCar)
summary(uciCar)
library(skimr)
skim(uciCar)
library(tidyverse)
glimpse(uciCar)
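Because uciCar depends on the downloaded file, this sketch swaps in the built-in iris data set so the base R inspection functions can be tried immediately. Only base R functions are used here.

```r
df <- iris          # built-in data set, standing in for uciCar
print(class(df))
print(dim(df))      # number of rows, then number of columns
print(head(df, n = 3))
str(df)
print(summary(df))
```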
Reading from a URL
tvshows <- read.table(
  'https://bcdanl.github.io/data/tvshows.csv',
  sep = ',',
  header = TRUE,
  stringsAsFactors = TRUE
)
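read.csv() is a wrapper around read.table() whose defaults include header = TRUE and sep = ",", so for simple CSV input both calls should give the same data frame. This sketch compares them on hypothetical inline lines passed through the text argument, so no network access is needed.

```r
# Hypothetical inline CSV lines, standing in for the downloaded file
csv_lines <- c("Show,Genre,GRP,PE",
               "ShowA,Drama,100.5,70.1",
               "ShowB,Comedy,80.2,65.3")
t1 <- read.table(text = csv_lines, sep = ",", header = TRUE,
                 stringsAsFactors = TRUE)
t2 <- read.csv(text = csv_lines, stringsAsFactors = TRUE)
print(identical(t1, t2))
str(t1)
```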
ggplot
Scatter plots with ggplot():
library(ggplot2)
ggplot(tvshows) + geom_point(aes(x = GRP, y = PE, color = Genre))
ggplot(tvshows) + geom_point(aes(x = GRP, y = PE)) + facet_wrap(~Genre)
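The same two plot patterns can be tried offline with the built-in mtcars data set, which here stands in for tvshows: wt and mpg replace GRP and PE, and the cylinder count replaces Genre. Color splits the groups within one panel; facet_wrap() draws one panel per group.

```r
library(ggplot2)

# Color: all groups in one panel, one color per cylinder count
p_color <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = factor(cyl)))

# Facets: one panel per cylinder count
p_facet <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_wrap(~cyl)

print(p_color)
print(p_facet)
```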