### Welcome to the Simplilearn Community

# Data Science with R | Sabyasachi | March 13 - April 11 | 2021

#### Hitesh H S

##### Well-Known Member
Staff member
Simplilearn Support
Hi Learners,

This thread is for you to discuss queries and concepts related to the Data Science with R course only.

Happy learning!

Regards,
Team Simplilearn

#### Abdullahi Ahmed Omar Bare

##### Member
Great class today! Can't wait to learn and do the projects.

#### Avneesh Arun

##### Member
Sir, it would be a great help if you could let us know which topics you are planning to cover in the next class.

#### Avneesh Arun

##### Member
Hi Sir,

Please find attached my answer to the Day 2 class assignment. Could you please check the solution?

Regards
Avneesh

#### Attachments

• Day 2 assignment 1.zip

#### karthigeyan0312

Alumni
Customer
Hello Sir,
In the self-learning practice for data frames, I get the following error while trying to add a row to the data frame.

Error:
Warning message in `[<-.factor`(`*tmp*`, ri, value = "DrStrangelove"):
“invalid factor level, NA generated”

The data frame was declared as follows:
movie3 <- data.frame(name= c("Toy Story", "Akira", "The Breakfast Club", "The Artist", "Modern Times", "Fight Club", "City of God", "The Untouchables"), year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987))

I am able to add a column. However, while trying to add the row as described in the course, I am facing this error.
The code I tried for adding the row is as follows:

movie3 <- rbind(movie3, c(name="DrStrangelove", year=1964, length=94))

#### Vijay Dattatray Pasalkar

##### Active Member
Hi sir,
I need help understanding the operators below.

What is the difference between element-wise logical AND (&) and logical AND (&&)? The same question applies to OR (| and ||).

#### NAGARAJA RAO DIWAKAR

##### Active Member
I have an assignment on odd and even numbers in a vector, but I don't know how to code it.

#### NAGARAJA RAO DIWAKAR

##### Active Member
In the Day 2 assignment, for the vector v1 <- c("Ravi","Shashi","Jaya","Reshma","Arun","Pulkit","Ravi"), finding how many times Ravi repeats is easy because we can count by eye, so the code given in the answer sheet is probably right. However, if v1 were huge, I think that code would not help, unless I am mistaken.
In such cases the code might be as below:
## Assignment : Print the number of times the name Ravi appears in the given vector .
as.data.frame(table(v1))
Please correct me if I am wrong. I am very new to coding; this is my first time.
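Nagaraja's table() idea does hold up for long vectors. A minimal sketch using the v1 from the Day 2 assignment: table() tallies every distinct name in one pass, so the count for one name can be read off by name instead of counted by eye. The variable names name_counts, ravi_count and freq_df are only illustrative.

```r
v1 <- c("Ravi", "Shashi", "Jaya", "Reshma", "Arun", "Pulkit", "Ravi")

# table() tallies every distinct value, however long v1 is
name_counts <- table(v1)
print(name_counts)

# [[ ]] extracts a single count by name as a plain integer
ravi_count <- name_counts[["Ravi"]]
print(ravi_count)  # [1] 2

# as.data.frame() turns the tally into a two-column data frame (v1, Freq)
freq_df <- as.data.frame(name_counts)
print(freq_df[freq_df$v1 == "Ravi", ])
```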

#### J TEJASWINI

##### Member
for (names in v2){
x = 0
if(names == 'ravi')
x = x + 1
print(paste("the name ravi appears:", x))
}

I am getting this output:
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 1"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 1"

Sir, please correct me where I am wrong, or give me a hint.

#### Sabyasachi Tripathy

##### Customer
Customer
> Sir, it would be a great help if you could let us know which topics you are planning to cover in the next class.

You can refer to the LMS for the topics to be covered next.

#### Sabyasachi Tripathy

##### Customer
Customer
> karthigeyan0312 said: In the self-learning practice for data frames, I get an "invalid factor level, NA generated" warning while trying to add a row to the data frame.

Try converting the second part (the new row) into a data frame and then do an rbind.
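A minimal sketch of the instructor's suggestion, under two assumptions: the course material ran on an R version before 4.0.0, where data.frame() turned text columns into factors by default, and only the name and year columns shown in the post are used (the length values are not given, so that column is left out). Writing the new row with c() coerces every value into one character vector, and a factor column only accepts its existing levels, hence the NA. Building the row as a one-row data frame and storing titles as character avoids both problems:

```r
# stringsAsFactors = TRUE reproduces the pre-4.0.0 default that makes `name` a factor
movie3 <- data.frame(
  name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist",
           "Modern Times", "Fight Club", "City of God", "The Untouchables"),
  year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987),
  stringsAsFactors = TRUE
)

# A one-row data frame keeps each column's own type
# (c() would have turned the year into text as well)
new_row <- data.frame(name = "DrStrangelove", year = 1964,
                      stringsAsFactors = FALSE)

# Store titles as plain text so any new title is accepted
movie3$name <- as.character(movie3$name)
movie3 <- rbind(movie3, new_row)
print(tail(movie3, 2))
```

Since R 4.0.0 the default is stringsAsFactors = FALSE, so on a current installation the as.character() step is usually unnecessary.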

#### Sabyasachi Tripathy

##### Customer
Customer
> Vijay Dattatray Pasalkar said: I need to understand the difference between element-wise logical AND and logical AND, and similarly for OR.

We will cover this after data structures
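Until the class covers this, a small illustration of the difference, using made-up logical vectors x and y: the single-character forms (&, |) compare vectors element by element, while the double forms (&&, ||) expect one TRUE/FALSE on each side, evaluate left to right, and skip the right-hand side once the answer is known. That makes && and || the natural choice inside if(). In R 4.3.0 and later, giving && or || a vector longer than one is an error.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# & and | work element by element and return a vector as long as the inputs
and_result <- x & y   # TRUE FALSE FALSE
or_result  <- x | y   # TRUE TRUE TRUE

# && and || take single values and short-circuit:
# the right-hand side is never evaluated when the left already decides the result
short_and <- FALSE && stop("not evaluated")  # FALSE
short_or  <- TRUE  || stop("not evaluated")  # TRUE

# Typical use: one combined condition inside if()
n <- 5
if (n > 0 && log(n) > 1) print("n is positive and log(n) exceeds 1")
```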

#### Sabyasachi Tripathy

##### Customer
Customer
> Odd and Even number assignment in vector. Don't know how to code.

Please refer to the assignments posted by others. We have solutions there.

#### Sabyasachi Tripathy

##### Customer
Customer
> J TEJASWINI said: Sir, please correct me where I am wrong, or give me a hint.

Take the print statement outside the if loop. It should work.

#### Gokul Kanna R

##### New Member
Hello sir,
While doing the hospital cost analysis project, I couldn't split the dataset because it throws an error. I referred to blogs but couldn't fix it.
The error is:
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 500 but subscript `r` has size 6.

#### Reshma Kolambkar

##### Member
> J TEJASWINI said: Sir, please correct me where I am wrong, or give me a hint.

Hello Tejaswini,
Try placing the print statement outside the {} braces of the for loop:

v2 <-c("Ravi", "Tejaswini", "Arun","Ravi", "Kavita")
cnt <- 0

for(val in v2){
if(val == "Ravi"){
cnt <- cnt +1
}

}
print(paste("The value is:", cnt))

#### Reshma Kolambkar

##### Member
> Odd and Even number assignment in vector. Don't know how to code.

Check whether the element modulo 2 is zero, i.e. val%%2==0:
vec <- c(1:10)
for(val in vec){

if(val%%2==0){
print(paste(val, "is an even number"))
}else{
print(paste(val, "is an odd number"))
}
}

#### J TEJASWINI

##### Member
for (names in v2){
x = 0
if(names == 'ravi')
x = x + 1
}
print(paste("the name ravi appears:",x))
Output:
"the name ravi appears: 1"
This is the output I am getting, sir. It should be 2, right?

#### J TEJASWINI

##### Member
## Assignment 2
## Add 5 to each element if it is odd, else add 10

v4 <- c(1,2,3,4,5)
for(num in v4) {
if((num%%2)== 0){
print(num+10)
}else{
print(num+5)
}
}

output

[1] 6
[1] 12
[1] 8
[1] 14
[1] 10

#### nsen59341

##### Member
Hello Sir,

I have run the code:
v1 <- c(2,5,7,8)
arr1 <- array(1:24, dim=c(2,3,4))
lst1 <- list(v1,arr1)
summary(lst1)

The output is:
summary(lst1)
Length Class Mode
[1,] 4 -none- numeric
[2,] 24 -none- numeric

How did the classes of the vector and the array come out as -none- in the summary? They should be vector and array respectively.
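A short check that may explain the "-none-": for each list element, summary() reports the element's explicit class attribute (what oldClass() returns), and plain vectors and arrays do not carry one. class() instead reports the implicit class R derives from the object's type and dimensions, which is where the expected names come from. Note that R calls the first one "numeric" rather than "vector":

```r
v1 <- c(2, 5, 7, 8)
arr1 <- array(1:24, dim = c(2, 3, 4))
lst1 <- list(v1, arr1)

# No explicit class attribute on either element, so summary() shows "-none-"
print(oldClass(v1))    # NULL
print(oldClass(arr1))  # NULL

# class() falls back to the implicit class
element_classes <- sapply(lst1, class)
print(element_classes)  # "numeric" "array"
```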

#### nsen59341

##### Member
Hello Sir,

### Assignment : Provide user defined names to the row names and col names
## use dimnames while creating the matrix

vect <- c(2,4,7,6)
row.names <- c('row1','row2')
col.names <- c('col1','col2')
m1 <- matrix(vect,ncol=2,nrow=2, dimnames=list(row.names,col.names))
print(m1)

#### harsha007

##### Member
Extracting odd and even numbers from a vector:

vector=c(22,44,56,88,89)
> vector
[1] 22 44 56 88 89
> for(num in vector){
+ if(num%%2==0){
+ print('num is even')
+ }else{
+ print('num is odd')
+ }
+ }
[1] "num is even"
[1] "num is even"
[1] "num is even"
[1] "num is even"
[1] "num is odd"

I managed to find which numbers are odd and which are even, but I could not work out how to add 5 to the even numbers and subtract 4 from the odd ones.
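One way to do the add/subtract step, sketched with the numbers from the post (stored as nums here to avoid masking base R's vector() function): ifelse() evaluates the condition for every element at once and picks the matching result, and the loop version does the same one element at a time.

```r
nums <- c(22, 44, 56, 88, 89)

# Vectorised: even numbers get +5, odd numbers get -4
result <- ifelse(nums %% 2 == 0, nums + 5, nums - 4)
print(result)  # [1] 27 49 61 93 85

# Loop version, printing each adjusted value
for (num in nums) {
  if (num %% 2 == 0) {
    print(num + 5)
  } else {
    print(num - 4)
  }
}
```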

#### harsha007

##### Member
f=c('sai','harsha','sai','sri','harsha','venkat','sai')
> f
[1] "sai" "harsha" "sai" "sri" "harsha" "venkat"
[7] "sai"
> for(names in f){
+ x=0
+ if(names=='sai'){
+ x=x+1
+ print(x)
+ }
+
+ }
[1] 1
[1] 1
[1] 1

I am not getting a count like 3.

#### Shubham Audichya

##### Member
### Assignment : Provide user defined names to the rownames and col names
## use dimnames while creating the matrix .

rownames<-c("alpha1","alpha2","alpha3","alpha4")
columnnames<-c("beta1","beta2","beta3","beta4","beta5")
m5<-matrix((4:23),nrow=4,ncol=5,dimnames=list(rownames,columnnames))
m5

#### Shubham Audichya

##### Member
## Assignment : Print the number of times the name Ravi appears in the given vector .

v1 <- c("Ravi","shashi","Jaya","reshma","Arun","Pulkit","Ravi")
P<- length(which(v1 == "Ravi"))
P

This works, but could anybody let me know how to do it with "count"?
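As far as I know, base R has no count() for vectors, but matches can be counted by summing a logical vector, because TRUE counts as 1 and FALSE as 0. A minimal sketch with the vector from the post; the second count is case-insensitive, since the vectors in this thread mix "Ravi" and "ravi". If dplyr is installed, its count() function does the same job for a data-frame column, e.g. dplyr::count(data.frame(name = v1), name).

```r
v1 <- c("Ravi", "shashi", "Jaya", "reshma", "Arun", "Pulkit", "Ravi")

# v1 == "Ravi" is TRUE/FALSE per element; sum() adds up the TRUEs
ravi_count <- sum(v1 == "Ravi")
print(ravi_count)  # [1] 2

# Case-insensitive variant: lower-case everything before comparing
ravi_any_case <- sum(tolower(v1) == "ravi")
print(ravi_any_case)  # [1] 2
```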

#### nsen59341

##### Member
> harsha007 said: I am not getting a count like 3.

Put x = 0 before the start of the for loop, and print(x) after the loop ends.
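The fix described above, applied to harsha007's vector: the counter is set to zero once before the loop, increased only when the name matches, and printed once after the loop. In the original, x=0 inside the loop resets the count on every pass, so it can never go above 1.

```r
f <- c("sai", "harsha", "sai", "sri", "harsha", "venkat", "sai")

x <- 0                 # initialise the counter once, before the loop
for (name in f) {
  if (name == "sai") {
    x <- x + 1         # increment only on a match
  }
}
print(x)               # print once, after the loop: [1] 3
```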

#### Shubham Audichya

##### Member
### Assignment : inner merge of two dfs with different name for common column
# by.x = <colname> by.y = <colname>

first_name<-c("first1","first2","first3")
second_name<-c("second1","second2","second3")
roll_no<-c("1","2","3")

class1<-data.frame(roll_no,first_name,second_name)
class1
first_name<-c("first4","first5","first6")
second_name<-c("second4","second5","second6")
roll_no<-c("4","5","6")

class2<-data.frame(roll_no,first_name,second_name, stringsAsFactors = FALSE)
class2
class_total<-rbind(class1,class2)
class_total

first_name<-c("first4","first5","first6")
second_name<-c("second4","second5","second6")
rol_no<-c("4","5","6")
class7<-data.frame(first_name,second_name,rol_no,stringsAsFactors = FALSE)
class7

class8<-merge(class_total, class7, by.x="roll_no", by.y="rol_no")
class8

Output:
roll_no first_name.x second_name.x first_name.y second_name.y
1 4 first4 second4 first4 second4
2 5 first5 second5 first5 second5
3 6 first6 second6 first6 second6

Please confirm whether this is correct.
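The merge call above does produce an inner join (merge() keeps only matching keys by default). A sketch that rebuilds equivalent data frames with paste0() (same values as in the post) and compares two variants: merging on the roll number alone keeps both copies of the name columns with .x/.y suffixes, while listing the shared name columns as extra keys returns a single copy. The object names by_roll and by_all are illustrative.

```r
class_total <- data.frame(
  roll_no = as.character(1:6),
  first_name = paste0("first", 1:6),
  second_name = paste0("second", 1:6),
  stringsAsFactors = FALSE
)
class7 <- data.frame(
  first_name = paste0("first", 4:6),
  second_name = paste0("second", 4:6),
  rol_no = as.character(4:6),
  stringsAsFactors = FALSE
)

# Key columns have different names, so by.x / by.y map them to each other;
# the name columns exist in both tables and get .x / .y suffixes
by_roll <- merge(class_total, class7, by.x = "roll_no", by.y = "rol_no")

# Using every shared column as a key avoids the duplicated name columns
by_all <- merge(class_total, class7,
                by.x = c("roll_no", "first_name", "second_name"),
                by.y = c("rol_no", "first_name", "second_name"))
print(by_all)
```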

#### J TEJASWINI

##### Member
> Sabyasachi Tripathy said: Take the print statement outside the if loop. It should work.

> harsha007 said: I managed to find which numbers are odd and which are even, but I could not work out how to add 5 to the even numbers and subtract 4 from the odd ones.

#Extract the even numbers out of a vector

my_vec <- c(5,10,15,20,25,30,35,40,45,50,55,60)
for(val in my_vec) {
if((val%%2) == 0) {
print(val)
}
}
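The same extraction can also be written without a loop by indexing the vector with a logical condition; a short sketch using my_vec from the post (even_vals and odd_vals are illustrative names):

```r
my_vec <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60)

# my_vec %% 2 == 0 is a TRUE/FALSE vector; indexing keeps the TRUE positions
even_vals <- my_vec[my_vec %% 2 == 0]
odd_vals  <- my_vec[my_vec %% 2 != 0]
print(even_vals)  # [1] 10 20 30 40 50 60
print(odd_vals)   # [1]  5 15 25 35 45 55
```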

#### J TEJASWINI

##### Member
# Provide user-defined names to the rows and columns using dimnames

rownames <- c('telugu','english','maths','science','social','hindhi')

columnames <-c('unit 1','unit 2', 'quaterly', 'halfyearly', 'annual')

p<- matrix(c(61:90),nrow = 6, ncol =5, dimnames = list(rownames, columnames))
print(p)

output:
unit 1 unit 2 quaterly halfyearly annual
telugu 61 67 73 79 85
english 62 68 74 80 86
maths 63 69 75 81 87
science 64 70 76 82 88
social 65 71 77 83 89
hindhi 66 72 78 84 90

#### Reshma Kolambkar

##### Member
Working on---
#2 - Extract the time out of it and create a separate column with values " afternoon , morning , evening , night " based on the hour .
I wrote a function getTimePeriod to derive whether it is "afternoon", "morning", "evening" or "night" based on the hour value passed.
It returns values as expected:
getTimePeriod(8) # morning
getTimePeriod(13) # Afternoon
getTimePeriod(18) # evening
getTimePeriod(23) # night

However, when I call this function on the whole column, it returns "evening" for all rows. I tried a couple of methods; two of them are listed below.
my_df\$timePeriod <- getTimePeriod(my_df\$hHour) # not working: returns "evening" for all rows
my_df %>% mutate(timePeriod = getTimePeriod(my_df\$hHour)) # not working: returns "evening" for all rows

Please note: my_df\$hHour holds integer values from 0 to 23.

Hi

#### Reshma Kolambkar

##### Member
Hello Subya,
I managed to solve the problem; however, I believe there should be a built-in function to do this.
I created an empty character vector, iterated through it in a loop, assigned the appropriate time period (afternoon, morning, evening or night) based on the hour, and then assigned the vector to the data frame.

setTimeVector <- function(){
timePeriod <- vector(mode = "character", length = nrow(my_df))
i <- 1
while (i <= nrow(my_df)){
# classify one hour at a time and store the result in position i
timePeriod[i] <- getTimePeriod(my_df\$hHour[i])
i <- i + 1
}
return(timePeriod)
}

my_df\$timePeriod <- setTimeVector()

Output:
timePeriod Abbr FirstName conditionInHrs
1 Morning AA Panalopa 21 hours left
2 Morning AA Morgan 21 hours left
3 Morning AA Roiby 21 hours left
4 evening AA Tayyeb 21 hours left
5 afternoon AA Roiby 21 hours left
6 Morning AA Roiby 21 hours left

#### Sabyasachi Tripathy

##### Customer
Customer
> Reshma Kolambkar said: I managed to solve the problem; however, I believe there should be a built-in function to do this.

Try using the cut() function.
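A sketch of the cut() suggestion, with one assumption: the post does not give the exact hour boundaries, so the ones below (night 0-4, morning 5-11, afternoon 12-16, evening 17-20, night 21-23) are hypothetical and chosen only to match the four examples above. The likely reason the earlier attempts returned one value for every row is that if() looks at a single condition: before R 4.2.0 it used only the first element of the hour column and issued a warning, and from 4.2.0 onward it raises an error. cut() classifies the whole column at once. get_time_period() and the small my_df below are illustrative stand-ins, not the project's objects.

```r
# Hypothetical boundaries: night 0-4, morning 5-11, afternoon 12-16,
# evening 17-20, night 21-23
get_time_period <- function(hour) {
  period_names <- c("night", "morning", "afternoon", "evening", "night")
  # right = FALSE gives intervals [0,5), [5,12), [12,17), [17,21), [21,24);
  # labels = FALSE makes cut() return the interval number for each element
  idx <- cut(hour, breaks = c(0, 5, 12, 17, 21, 24),
             right = FALSE, labels = FALSE)
  period_names[idx]
}

my_df <- data.frame(hHour = c(8, 13, 18, 23, 2))
my_df$timePeriod <- get_time_period(my_df$hHour)
print(my_df)
```

With dplyr, the same function can be used inside mutate(), e.g. my_df %>% mutate(timePeriod = get_time_period(hHour)).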

#### nsen59341

##### Member
## Assignment :
### plot multiple lines for multiple continuous data in a single plot.

p <- ggplot(airquality) +
geom_density(mapping = aes(x = Ozone, y = Solar.R), stat = "identity", position = "dodge", color="red", fill="yellow") +
geom_density(mapping = aes(x = Ozone, y = Temp), stat = "identity", position = "dodge", color="blue", fill="#87CEEB") +
theme_minimal()
p

I used the geom_label() function, but I am not able to put labels for the two different variables (Temp, Solar.R).
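One common way to label the two series, sketched under the assumption that ggplot2 is installed: map colour to a constant string inside aes() for each layer, and ggplot2 builds a legend with one entry per string. geom_line() is used here instead of geom_density(stat = "identity") because the assignment asks for lines; the colours and the legend title "Variable" are arbitrary choices.

```r
library(ggplot2)

# A constant string in aes(colour = ...) becomes a legend entry for that layer;
# na.rm = TRUE drops the rows where Ozone or the y variable is missing
p <- ggplot(airquality, aes(x = Ozone)) +
  geom_line(aes(y = Solar.R, colour = "Solar.R"), na.rm = TRUE) +
  geom_line(aes(y = Temp, colour = "Temp"), na.rm = TRUE) +
  scale_colour_manual(name = "Variable",
                      values = c(Solar.R = "red", Temp = "blue")) +
  labs(y = "Value") +
  theme_minimal()
print(p)
```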

#### Attachments

• Helath Care Question 2 Error.png

#### nsen59341

##### Member
Hi all,

I am doing Project 7, and the predicted LOS values I get are decimals. Can anyone help me with that? They should be integer values, right?
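Decimals are expected from a linear regression: predict() on an lm model returns the estimated mean for each row, which is a real number. If whole days are needed for reporting, the predictions can be rounded. A sketch using R's built-in mtcars data as a stand-in (its carb column is a whole-number count, like LOS); the model formula is only illustrative:

```r
# mtcars ships with R; `carb` is a whole-number count standing in for LOS
fit <- lm(carb ~ wt + hp, data = mtcars)

# predict() on an lm returns fitted means: real numbers, not counts
pred <- predict(fit, newdata = mtcars[1:5, ])
print(pred)

# round() gives the nearest whole number (still stored as a double)
pred_rounded <- round(pred)
print(pred_rounded)
```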

#### Vijay Dattatray Pasalkar

##### Active Member
Hi all,
I am doing the healthcare project. I have code for the first statement; can you help me extract the AGE group with the maximum expenditure?
I know AGE group '0' has the maximum expenditure, but what line of code extracts the group and the expenditure together?

age_group <- aggregate(hospital_data\$TOTCHG, by=list(hospital_data\$AGE), FUN=sum)
> age_group
Group.1 x
1 0 678118
2 1 37744
3 2 7298
4 3 30550
5 4 15992
6 5 18507
7 6 17928
8 7 10087
9 8 4741
10 9 21147
11 10 24469
12 11 14250
13 12 54912
14 13 31135
15 14 64643
16 15 111747
17 16 69149
18 17 174777

I also used sapply to extract the maximum expenditure, but I don't know how to get a combined output of the AGE group with the maximum expenditure, i.e. the output should show both the AGE group and TOTCHG. Please tell me if I am using the 'sapply' function wrongly. This is what I tried:
> sapply(list(age_group),max)
[1] 678118

#### NAGARAJA RAO DIWAKAR

##### Active Member
Hi all,
I am doing healthcare project. I have code for first statement can you help how to extract output for AGE group with max expenditure.
As I know AGE group '0' has max. expenditure, then what will be the line of code for extracting Group & expenditure combined.

age_group <- aggregate(hospital_data\$TOTCHG, by=list(hospital_data\$AGE), FUN=sum)
> age_group
Group.1 x
1 0 678118
2 1 37744
3 2 7298
4 3 30550
5 4 15992
6 5 18507
7 6 17928
8 7 10087
9 8 4741
10 9 21147
11 10 24469
12 11 14250
13 12 54912
14 13 31135
15 14 64643
16 15 111747
17 16 69149
18 17 174777

Also I used to sapply to extract max expenditure but can't know how to extract combined output with AGE with Max expenditure. means output must show the AGE group and TOTCHG. please suggest if I am wrong with using 'sapply' function.
i.e.
> sapply(list(age_group),max)
[1] 678118
age_group[which.max(age_group\$x , )]
With this you can fetch the row of the age group that has the maximum expense.

#### Vijay Dattatray Pasalkar

##### Active Member
> NAGARAJA RAO DIWAKAR said: age_group[which.max(age_group\$x , )]

Hi Nagaraja,
Thank you for the feedback. I tried this code, but it shows the error below.

> age_group[which.max(age_group\$x , )]
Error in which.max(age_group\$x, ) : unused argument (alist())

#### Vijay Dattatray Pasalkar

##### Active Member
Hi Nagaraja,
I tweaked my code a little and got the answer. Thanks.

Regards,
Vijay
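For anyone hitting the same error: in the suggestion above the comma sits inside which.max(), whereas the row index needs it outside, as in data[row, ]. Also, sapply(list(age_group), max) returns only the largest number in the whole data frame, which is why the group was missing. A sketch using the totals posted earlier in the thread (top is an illustrative name):

```r
# Totals per AGE group, copied from the aggregate() output posted above
age_group <- data.frame(
  Group.1 = 0:17,
  x = c(678118, 37744, 7298, 30550, 15992, 18507, 17928, 10087, 4741,
        21147, 24469, 14250, 54912, 31135, 64643, 111747, 69149, 174777)
)

# which.max() returns the row position of the largest total;
# the comma outside it keeps every column of that row
top <- age_group[which.max(age_group$x), ]
print(top)
```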

#### Avneesh Arun

##### Member
Hi Sir,

Thank you for your guidance; I have completed the certification.

Regards
Avneesh

#### Vijay Dattatray Pasalkar

##### Active Member
Hi Sabya Sir,
I created the code for the healthcare project, but I am stuck at creating the train and test datasets.

Project 7 - Healthcare project

hospital_data
dim(hospital_data)

I used the following code to create the train and test sets:

## Problem Statement-5:
# Since the length of stay is the crucial factor for inpatients, the agency
# wants to find if the length of stay can be predicted from age, gender, and race.

# Step-5: Prepare linear regression model to predict length of stay based on age, gender and race.

# Dropping variables that are not needed for prediction.

hospital_data = subset(hospital_data, select = -c(TOTCHG,APRDRG) )

names(hospital_data)

# Seed initializes the randomness

set.seed(100)

split = sample.split(hospital_data,SplitRatio = 0.7)

trainingSet = subset(hospital_data, split == T)

testSet = subset(hospital_data, split == F)

str(trainingSet)

dim(trainingSet)

Output (showing some errors):

> hospital_data = subset(hospital_data, select = -c(TOTCHG,APRDRG) )
Error in eval(substitute(select), nl, parent.frame()) :
>
> names(hospital_data)
[1] "AGE" "FEMALE" "LOS" "RACE"
>
>
> # Seed initializes the randomness
>
> set.seed(100)
>
> split = sample.split(hospital_data,SplitRatio = 0.7)
>
> trainingSet = subset(hospital_data, split == T)
Warning message:
Length of logical index must be 1 or 500, not 4
>
> testSet = subset(hospital_data, split == F)
Warning message:
Length of logical index must be 1 or 500, not 4
>
>
> str(trainingSet)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 250 obs. of 4 variables:
\$ AGE : num 17 17 17 16 17 15 15 15 14 14 ...
\$ FEMALE: num 0 1 0 1 1 0 1 1 1 1 ...
\$ LOS : num 2 1 0 2 2 2 4 4 4 3 ...
\$ RACE : num 1 1 1 1 1 1 1 1 1 1 ...
>
> dim(trainingSet)
[1] 250 4
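Both warnings ("Length of logical index must be 1 or 500, not 4") and Gokul's error earlier in the thread point to the same cause: sample.split() was given the whole data frame, so it returned one flag per column (4) instead of one per row (500). With caTools, passing a single column, e.g. sample.split(hospital_data$LOS, SplitRatio = 0.7), gives one flag per row. The subset() error is probably because TOTCHG and APRDRG had already been dropped by an earlier run, since names() shows only the four remaining columns. The sketch below does the split in base R so it runs without caTools; note that this is plain random sampling, whereas sample.split() also balances the label. The data frame is a random stand-in with the project's column names, not the real hospital data.

```r
set.seed(100)

# Hypothetical stand-in: 500 random rows with the project's column names
n_rows <- 500
hospital_data <- data.frame(
  AGE    = sample(0:17, n_rows, replace = TRUE),
  FEMALE = sample(0:1,  n_rows, replace = TRUE),
  LOS    = sample(0:10, n_rows, replace = TRUE),
  RACE   = sample(1:6,  n_rows, replace = TRUE)
)

# One TRUE/FALSE flag per ROW: 70% of the row positions go to training
train_rows <- sample(nrow(hospital_data), size = round(0.7 * nrow(hospital_data)))
in_train <- seq_len(nrow(hospital_data)) %in% train_rows

trainingSet <- hospital_data[in_train, ]
testSet     <- hospital_data[!in_train, ]
print(dim(trainingSet))  # [1] 350   4
print(dim(testSet))      # [1] 150   4
```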

#### Prahas Onteddu

##### Member
Hi everyone! I am doing the healthcare project. Could anyone who has finished share their write-up, please?

##### New Member
Hi Everyone,

For the healthcare project, point 5: there is very little correlation among the columns, hence prediction is not possible.

Does anyone have a different opinion?

@sabyasachi (3882)

#### Reshma Kolambkar

##### Member
> For the healthcare project, point 5: there is very little correlation among the columns, hence prediction is not possible.

For question 5, I also tried multiple ways to predict LOS; however, the correlation was always low. So I submitted the project with the conclusion "Length of stay (LOS) cannot be predicted based on age, gender, and race."

The project was accepted, so I assume that is the expected answer for Q5.

#### Vijay Dattatray Pasalkar

##### Active Member
Hi Team,
I got certified in R programming. Thanks to Sabya sir and the Simplilearn support team.

Regards,
Vijay