R Programming
Topics To Be Covered
1. About R
2. Variables in R
3. Data types
4. Data Import
5. Logical Statements
6. Loops
7. Functions
8. Data plotting and Visualization
9. Introduction to Basic Statistical Functions and Packages
10. Case Study 1
11. Case Study 2
Introduction to R
➢What is R?
➢Why R?
➢R Environment
➢Comparison with Other Languages, Specifically Python
➢R installation
➢R Studio Installation
What is R
➢'R' is a programming language for data analysis and statistics.
➢It is free, and very widely used by professional statisticians.
➢R provides a wide variety of statistical (linear and nonlinear modelling, classical
statistical tests, time-series analysis, classification, clustering, …) and graphical
techniques, and is highly extensible.
Why R
➢R is well suited to exploratory analysis.
➢It can be used for most kinds of analysis work, as it has many tools and is highly extensible.
➢Additionally, it can be used as part of big data solutions.
Following are some of the highlights which show why R is important for data science:
➢Data analysis software: R is data analysis software used by data scientists for statistical
analysis, predictive modeling and visualization.
➢Statistical analysis environment: R provides a complete environment for statistical analysis, and
statistical methods are easy to implement in it. Much new research in statistical analysis and
modeling is done in R, so new techniques often become available in R first.
➢Open source: R is open source, which makes it easier to integrate with other applications.
➢Community support: R is supported by a rapidly growing community of leading statisticians and
data scientists from around the world.
R Environment
R is an integrated suite of software facilities for data manipulation, calculation and
graphical display. It includes
•an effective data handling and storage facility,
•a suite of operators for calculations on arrays, in particular matrices,
•a large, coherent, integrated collection of intermediate tools for data analysis,
•graphical facilities for data analysis and display either on-screen or on hardcopy,
and
•a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
Comparison With Python
➢R has a huge community and a very large number of statistical packages and
libraries available.
➢R is a function-based language. If you come from a purely statistical
background and do not plan to take on major software engineering tasks
when productionizing your models, R is an easier option than Python.
➢R's graphics capabilities are generally considered better than Python's.
About CRAN
➢Comprehensive R Archive Network
➢CRAN is a network of ftp and web servers around the world that store identical,
up-to-date, versions of code and documentation for R.
➢It is recommended to use the CRAN mirror nearest to you to minimize network load.
➢https://cran.r-project.org/mirrors.html
➢For India -
https://mirror.niser.ac.in/cran/
R Installation
R installation is very simple. Download the installer from CRAN:
https://cran.r-project.org/
R Studio Installation
https://rstudio.com/products/rstudio/download/
Variables in R
Valid Variables    Invalid Variables
var_name2.         var_name%
.var_name          2var_name
var.name           .2var_name
                   _var_name
Rules To Declare Variables in R
1. A variable name can contain letters, numbers, dot (.) and underscore (_).
2. Other special characters, such as '%', are not allowed. Only dot (.) and underscore (_) are allowed.
3. A name can start with a letter or a dot (.).
4. If a name starts with a dot (.), the dot must not be followed by a number.
5. A name cannot start with an underscore (_) or a number.
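The rules above can be checked mechanically. The following is a minimal sketch, assuming base R's make.names(), which returns a name unchanged only when it is already syntactically valid. The helper name is_valid_name is hypothetical, and this check also rejects reserved words such as if, which the rules above do not cover.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

# The seven names from the valid/invalid table.
candidates <- c("var_name2.", ".var_name", "var.name",
                "var_name%", "2var_name", ".2var_name", "_var_name")

validity <- is_valid_name(candidates)
names(validity) <- candidates
print(validity)  # first three TRUE, last four FALSE
```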
Data types
1. Vectors - one-dimensional
2. Matrix - two-dimensional
3. Array - multi-dimensional
4. Lists - combine any type of data objects together
5. Data Frames - data in rows and columns, like a table or database
Vectors
➢ The most basic and important data type in R
Types of Vectors
1. Logical Vector      Vect1 = c(TRUE, FALSE)
2. Numeric Vector      Vect2 = c(123, 1, 1.5)
3. Integer Vector      Vect3 = c(2L, 0L)
4. Complex Vector      Vect4 = c(55 + 2i)
5. Character Vector    Vect5 = c("R", "Programming", "TRUE", '3.17')
6. Raw Vector          Vect6 = charToRaw("Hello")   => [1] 48 65 6c 6c 6f
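The type of each vector can be confirmed with class(). This is a small sketch using the six vectors listed above; the variable names vectors, types and mixed are illustrative. It also shows that quoting a logical value produces a character vector, and that mixing types in c() coerces everything to the most general type.

```r
vect1 <- c(TRUE, FALSE)
vect2 <- c(123, 1, 1.5)
vect3 <- c(2L, 0L)
vect4 <- c(55 + 2i)
vect5 <- c("R", "Programming", "TRUE", "3.17")
vect6 <- charToRaw("Hello")

# Collect the class of every vector in one pass.
vectors <- list(vect1, vect2, vect3, vect4, vect5, vect6)
types <- sapply(vectors, class)
print(types)
# "logical" "numeric" "integer" "complex" "character" "raw"

# A quoted "TRUE" is text, not a logical value.
class(c("TRUE", "FALSE"))  # "character"

# Mixing types coerces all elements to a common type.
mixed <- c(1, "a", TRUE)
class(mixed)               # "character"
```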
Matrix
➢A matrix is a two-dimensional rectangular data set.
➢It can be created by passing a vector to the matrix function.
mat = matrix(data_vector, nrow = , ncol = )
Eg. 1 -
mat1 <- matrix(1:20, nrow=5, ncol=4)
Eg. 2 -
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow=2, ncol=2,
byrow=TRUE, dimnames=list(rnames, cnames))
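By default matrix() fills column by column, while byrow=TRUE fills row by row. The sketch below reuses the two examples above to show where individual values end up; the indexing expressions are illustrative additions.

```r
# Eg. 1: default column-wise fill, so column 3 holds 11..15.
mat1 <- matrix(1:20, nrow = 5, ncol = 4)
dim(mat1)     # 5 4
mat1[2, 3]    # 12

# Eg. 2: row-wise fill with row and column names.
cells  <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow = 2, ncol = 2,
                   byrow = TRUE, dimnames = list(rnames, cnames))
print(mymatrix)
mymatrix["R1", "C2"]  # 26
mymatrix["R2", "C1"]  # 24
```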
Array
➢Can have any number of dimensions
➢The "dim" attribute is required
Eg.
arr = array(c("g", "b", "r"), dim=c(1, 2, 3))
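In the example above the array has 1 x 2 x 3 = 6 cells but only three values are supplied, so R recycles the values to fill it. A minimal sketch of how the cells are filled (first index varies fastest):

```r
arr <- array(c("g", "b", "r"), dim = c(1, 2, 3))

dim(arr)     # 1 2 3
length(arr)  # 6 cells, filled as g b r g b r

arr[1, 2, 1]  # "b"  (2nd cell)
arr[1, 1, 2]  # "r"  (3rd cell)
arr[1, 2, 2]  # "g"  (4th cell, first recycled value)
```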
List
➢An ordered collection of objects (components).
➢A list allows you to gather a variety of (possibly unrelated) objects under one name.
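# A list with four named components of different types
w <- list(name="Fred", mynumbers=5, mymatrix=c(2, 3, 5), age=5.3)
# Combine two existing lists (list1 and list2) into one list
v <- c(list1, list2)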
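List components can be accessed by name with $ or [[ ]], while single brackets return a sub-list. The sketch below reuses the list w from above; list1 and list2 are hypothetical lists defined here only so that the combine step runs.

```r
w <- list(name = "Fred", mynumbers = 5, mymatrix = c(2, 3, 5), age = 5.3)

w$name              # "Fred"
w[["mymatrix"]][2]  # 3
class(w["age"])     # "list": single brackets keep the list wrapper

# Hypothetical lists, used only to illustrate c() on lists.
list1 <- list(a = 1, b = "two")
list2 <- list(c = TRUE)
v <- c(list1, list2)
length(v)  # 3
names(v)   # "a" "b" "c"
```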
Factors
➢Factors are R objects that are created from a vector.
➢A factor stores the vector along with the distinct values of its elements, which are
called levels.
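A short sketch of a factor built from a character vector; the regions data is made up for illustration. By default the levels are the sorted distinct values, and each element is stored internally as an integer code pointing to its level.

```r
# Illustrative data, not from the slides.
regions <- c("East", "West", "East", "North", "West", "East")
region_factor <- factor(regions)

levels(region_factor)      # "East" "North" "West"
as.integer(region_factor)  # 1 3 1 2 3 1
table(region_factor)       # East 3, North 1, West 2
```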
Data Frames
➢ A data frame is a table, or a two-dimensional array-like structure, in which each
column contains the values of one variable and each row contains one set of values
from each column.
emp.data = data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23",
                         "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
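Once created, a data frame can be inspected and filtered by column. The sketch below rebuilds emp.data from the example above so that it runs on its own; the thresholds and the names high_paid and recent are illustrative.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23",
                         "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

str(emp.data)  # column names and types
dim(emp.data)  # 5 4

# Names of employees earning more than 700.
high_paid <- emp.data$emp_name[emp.data$salary > 700]
print(high_paid)  # "Ryan" "Gary"

# Rows with a start date on or after 1 June 2014, two columns only.
recent <- emp.data[emp.data$start_date >= as.Date("2014-06-01"),
                   c("emp_name", "start_date")]
print(recent)
```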
Basic & Useful Operations
➢length(object)             # number of elements or components
➢str(object)                # structure of an object
➢class(object)              # class or type of an object
➢names(object)              # names of an object, e.g. the columns of a data frame
➢c(object, object, ...)     # combine objects into a vector
➢cbind(object, object, ...) # combine objects as columns
➢rbind(object, object, ...) # combine objects as rows
➢object                     # prints the object
➢ls()                       # list current objects
➢rm(object)                 # delete an object
➢newobject <- edit(object)  # edit a copy and save it as newobject
➢fix(object)                # edit in place
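A brief sketch of several of these operations on two small vectors; the vectors a and b are made up for illustration. edit() and fix() are left out because they open an interactive editor.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

length(a)  # 3
class(a)   # "numeric"

combined <- c(a, b)    # one vector of length 6
by_col   <- cbind(a, b)  # 3 rows, 2 columns named "a" and "b"
by_row   <- rbind(a, b)  # 2 rows, 3 columns

dim(by_col)  # 3 2
dim(by_row)  # 2 3

rm(b)            # delete b from the current environment
"b" %in% ls()    # FALSE
```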
Operators in R
➢Arithmetic operators
➢Relational operators
➢Logical operators
➢Assignment operators
Arithmetic Operators in R
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder from division)
%/%        Integer division (quotient after division)
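The modulus and integer-division operators are the least familiar of these, so here is a short sketch with illustrative numbers. In R the result of %% takes the sign of the divisor, and %/% rounds toward negative infinity, so the two always satisfy x == (x %/% y) * y + x %% y.

```r
power     <- 2 ^ 3       # 8
remainder <- 7 %% 3      # 1
quotient  <- 7 %/% 3     # 2

# With a negative left operand:
neg_remainder <- (-7) %% 3    # 2
neg_quotient  <- (-7) %/% 3   # -3

# The two operators are consistent with each other.
x <- 17
y <- 5
x == (x %/% y) * y + x %% y   # TRUE
```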
R Relational Operators
Operator Description
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
== Equal to
!= Not equal to
Small Operations with Vectors
➢d <- c(3,5,3,6,8,5,4,6,7)
➢d[3] => 3
➢d[2:5] => 5 3 6 8
➢d < 7 => TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
➢sample(1:50, 6) => six random values from 1 to 50, different on every run
➢x <- c(2,1,8,3)
➢y <- c(9,4)
➢x + y => 11 5 17 7 (the shorter vector y is recycled)
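Two ideas in the examples above are worth spelling out: a logical vector can be used as an index, and a shorter vector is recycled to match a longer one. The sketch below reuses d, x and y; the names below_seven, recycled, manual and draw are illustrative, and set.seed() is used only to make the random draw repeatable.

```r
d <- c(3, 5, 3, 6, 8, 5, 4, 6, 7)

# Keep only the elements where the comparison is TRUE.
below_seven <- d[d < 7]   # 3 5 3 6 5 4 6
sum(d < 7)                # 7 elements are below 7

x <- c(2, 1, 8, 3)
y <- c(9, 4)

# y is recycled to 9 4 9 4 before the addition.
recycled <- x + y               # 11 5 17 7
manual   <- x + c(9, 4, 9, 4)
identical(recycled, manual)     # TRUE

# Fixing the seed makes the sample reproducible within one R version.
set.seed(1)
draw <- sample(1:50, 6)
length(draw)  # 6
```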
R Logical Operators
Operator   Description
!          Logical NOT
&          Element-wise logical AND (for vectors)
&&         Logical AND on single values (short-circuit)
|          Element-wise logical OR (for vectors)
||         Logical OR on single values (short-circuit)
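The difference between the single and double forms is easiest to see side by side. A minimal sketch with made-up vectors p and q: & and | work element by element, while && and || compare single values and stop evaluating as soon as the result is known.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

both   <- p & q   # TRUE FALSE FALSE
either <- p | q   # TRUE TRUE TRUE
negated <- !p     # FALSE TRUE FALSE

TRUE && FALSE     # FALSE
FALSE || TRUE     # TRUE

# Short-circuit: the right-hand side is never evaluated here,
# so stop() does not raise an error.
safe <- FALSE && stop("never evaluated")
print(safe)       # FALSE
```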
R Assignment Operators
Operator      Description
<-, <<-, =    Leftwards assignment
->, ->>       Rightwards assignment
Var_name1 = 123
"R Programming" -> .varName2
Var.name3 <- 1.5
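The double-arrow forms differ from the single-arrow forms in where they assign. The sketch below, with illustrative names, shows that <- inside a function creates a local variable, whereas <<- modifies a variable that already exists in an enclosing environment.

```r
total <- 10          # leftwards assignment
20 -> other_total    # rightwards assignment: value on the left
third = 30           # = also assigns at the top level

counter <- 0

# <<- updates the counter defined outside the function.
increment <- function() {
  counter <<- counter + 1
  invisible(counter)
}

# <- creates a new local counter and leaves the outer one alone.
increment_local <- function() {
  counter <- counter + 100
  invisible(counter)
}

increment()
increment()
increment_local()
print(counter)  # 2
```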
R if...else Statement
➢Simple If Statement
if (test_expression) {
  statement
}
➢Simple If...Else Statement
if (test_expression) {
  statement1
} else {
  statement2
}
R if...else Statement
➢If else Ladder
if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else {
  statement4
}
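A concrete instance of the ladder above, as a minimal sketch: the function grade_for and its thresholds are hypothetical. The conditions are tested from top to bottom and only the first matching branch runs.

```r
# Hypothetical grading function used to illustrate the ladder.
grade_for <- function(score) {
  if (score >= 90) {
    "A"
  } else if (score >= 75) {
    "B"
  } else if (score >= 60) {
    "C"
  } else {
    "F"
  }
}

grade_for(95)  # "A"
grade_for(82)  # "B"
grade_for(60)  # "C"
grade_for(12)  # "F"
```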
R ifelse() function (New)
Syntax - ifelse(test_expression, x, y)
> a = c(5,7,2,9)
> ifelse(a %% 2 == 0,"even","odd")
[1] "odd" "odd" "even" "odd"
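Unlike if, which expects a single TRUE or FALSE, ifelse() tests every element of a vector and returns a vector of the same length. A short sketch reusing the vector a from above; the nested call and its size labels are illustrative.

```r
a <- c(5, 7, 2, 9)

# One result per element.
parity <- ifelse(a %% 2 == 0, "even", "odd")
print(parity)  # "odd" "odd" "even" "odd"

# ifelse() calls can be nested for more than two outcomes.
size <- ifelse(a < 5, "small", ifelse(a < 8, "medium", "large"))
print(size)    # "medium" "medium" "small" "large"
```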
R For Loop
Basic Syntax
for (value in sequence) {
  statement
}
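A for loop runs its body once for each element of a vector or sequence. The following is a minimal sketch with illustrative values, not the example from the original slide.

```r
# Loop over the elements of a character vector.
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(fruit)
}

# Accumulate a running total over 1..5.
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)  # 15

# Fill a pre-allocated vector by index.
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}
print(squares)  # 1 4 9 16 25
```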
R While Loop
Basic Syntax
while (test_expression) {
  statement
}
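A while loop checks its condition before every pass and stops as soon as the condition is FALSE. A minimal sketch with an illustrative counter, not the example from the original slide:

```r
i <- 1
printed <- 0
while (i <= 5) {
  print(i)               # prints 1, 2, 3, 4, 5
  printed <- printed + 1
  i <- i + 1             # without this update the loop would never end
}

print(i)        # 6: the first value that fails the condition
print(printed)  # 5
```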
R Break Statement
Basic Syntax
if (test_expression) {
  break
}
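break exits the innermost loop immediately, skipping any remaining iterations. A minimal sketch with made-up values that stops at the first odd number:

```r
values <- c(4, 8, 15, 16, 23, 42)
first_odd <- NA
checked <- 0

for (v in values) {
  checked <- checked + 1
  if (v %% 2 == 1) {
    first_odd <- v
    break  # leave the loop; 16, 23 and 42 are never visited
  }
}

print(first_odd)  # 15
print(checked)    # 3
```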
R Next Statement
Basic Syntax
if (test_expression) {
  next
}
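next skips the rest of the current iteration and moves on to the following one; unlike break, the loop itself keeps running. A minimal sketch with illustrative values that collects only the even numbers:

```r
evens <- c()

for (i in 1:10) {
  if (i %% 2 != 0) {
    next  # odd number: skip the append below
  }
  evens <- c(evens, i)
}

print(evens)  # 2 4 6 8 10
```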
R Repeat Loop
A repeat loop runs a block of code repeatedly.
It has no built-in condition check, so the loop must be exited
explicitly with a break statement.
Basic Syntax
repeat {
  statement
  if (test_expression) {
    break
  }
}
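Because repeat has no condition of its own, the exit test lives inside the body together with break. A minimal sketch with an illustrative counter:

```r
count <- 0

repeat {
  count <- count + 1
  print(count)        # prints 1, 2, 3
  if (count >= 3) {
    break             # the only way out of a repeat loop
  }
}

print(count)  # 3
```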