Data Science with R | Sabyasanchi | March 13 - April 11 | 2021

Hi Sir,

Please find attached my answer to the Day 2 class assignment. Please check the solution.

Regards
Avneesh
 

Attachments

  • Day 2 assignment 1.zip
    650 bytes · Views: 24

karthigeyan0312

Administrator
Alumni
Customer
Hello Sir,
In the self-learning practice for data frames, I get the following warning when I try to add a row to the data frame.


Error:
Warning message in `[<-.factor`(`*tmp*`, ri, value = "DrStrangelove"):
“invalid factor level, NA generated”

The data frame was declared as follows :
movie3 <- data.frame(name= c("Toy Story", "Akira", "The Breakfast Club", "The Artist", "Modern Times", "Fight Club", "City of God", "The Untouchables"), year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987))

I am able to add a column. However, when I try to add the row as described in the course, I get this warning.
The code I used to add the row is as follows:

movie3 <- rbind(movie3, c(name="DrStrangelove", year=1964, length=94))

Please guide me.
 

Attachments

  • Add_Row_Error.jpg
    51.2 KB · Views: 9
In the Day 2 assignment, the vector is v1 <- c("Ravi","Shashi","Jaya","Reshma","Arun","Pulkit","Ravi"). Finding how many times Ravi repeats is easy here because we can count by eye, so the code given in the answer sheet is probably right. However, if the vector v1 is huge, I think that code does not help, unless I am mistaken.
The code in such cases might be as below:
## Assignment : Print the number of times the name Ravi appears in the given vector .
as.data.frame(table(v1))
Please correct me if I am wrong. I am very new to coding; this is my first time.
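
A minimal sketch of counting without inspecting the vector by eye, using only base R. table() tallies every name at once, while sum(v1 == "Ravi") counts a single name; both work the same way on a long vector. The names counts and n_ravi are illustrative and not from the course material.

```r
v1 <- c("Ravi", "Shashi", "Jaya", "Reshma", "Arun", "Pulkit", "Ravi")

# Tally every distinct name in one pass
counts <- table(v1)
print(as.data.frame(counts))

# Count a single name: the comparison yields TRUE/FALSE and sum() counts the TRUEs
n_ravi <- sum(v1 == "Ravi")
print(n_ravi)
```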
 

J TEJASWINI

Member
v2<-c('radha','aruna','pankaj','ravi','fareeda','leena','ravi')
for (names in v2){
x = 0
if(names == 'ravi')
x = x + 1
print(paste("the name ravi appears:", x))
}

i am getting this output
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 1"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 0"
[1] "the name ravi appears: 1"


# Sir, please correct me where I am wrong and give me a hint.
 

Sabyasachi Tripathy

Customer
Customer
For the rbind warning: try converting the new row to a data frame first and then do the rbind.
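
A minimal sketch of that suggestion. In R versions before 4.0, data.frame() turned strings into factors by default, so a title that was not already a factor level became NA; on top of that, c() coerces the mixed values in the new row to character. Building the new row as its own one-row data frame avoids both problems. The length values below are placeholders for illustration, not values from the course.

```r
# Illustrative data; the length values are placeholders
movie3 <- data.frame(
  name = c("Toy Story", "Akira"),
  year = c(1995, 1998),
  length = c(81, 125),
  stringsAsFactors = FALSE  # keep name as character so new titles are accepted
)

# Build the new row as a one-row data frame with matching column names
new_row <- data.frame(name = "DrStrangelove", year = 1964, length = 94,
                      stringsAsFactors = FALSE)

movie3 <- rbind(movie3, new_row)
print(movie3)
```

If the data frame already exists with a factor column, converting it first with movie3$name <- as.character(movie3$name) should have the same effect.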
 

Sabyasachi Tripathy

Customer
Customer
For the Ravi counter: take the print statement outside the for loop. It should work.
 

Gokul Kanna R

New Member
Hello Sir,
While doing the hospital cost analysis project, I could not split the dataset because it fails with an error. I referred to blogs but could not fix it.
The error is:
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 500 but subscript `r` has size 6.
 
Hello Tejaswini,
Try placing the print statement outside the {} braces of the loop:

v2 <- c("Ravi", "Tejaswini", "Arun", "Ravi", "Kavita")
cnt <- 0

for (val in v2) {
  if (val == "Ravi") {
    cnt <- cnt + 1
  }
}
print(paste("The value is:", cnt))
 

J TEJASWINI

Member
v2<-c('radha','aruna','pankaj','ravi','fareeda','leena','ravi')
for (names in v2){
x = 0
if(names == 'ravi')
x = x + 1
}
print(paste("the name ravi appears:",x))
output:
"the name ravi appears: 1"
This is the output I am getting, Sir. It should be 2, right?
 

J TEJASWINI

Member
## Assignment 2
## Add 5 to an element if it is odd, else add 10


v4 <- c(1,2,3,4,5)
for (num in v4) {
  if ((num %% 2) == 0) {
    print(num + 10)
  } else {
    print(num + 5)
  }
}


output

[1] 6
[1] 12
[1] 8
[1] 14
[1] 10
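
The same rule can also be written without a loop. This is a sketch using base R's vectorised ifelse(), which tests every element at once; the name result is illustrative.

```r
v4 <- c(1, 2, 3, 4, 5)

# ifelse(test, value_if_true, value_if_false) works element by element
result <- ifelse(v4 %% 2 == 0, v4 + 10, v4 + 5)
print(result)
```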
 

nsen59341

Member
Hello Sir,

I have run the code:
v1 <- c(2,5,7,8)
arr1 <- array(1:24, dim=c(2,3,4))
lst1 <- list(v1,arr1)
summary(lst1)

The output is:
summary(lst1)
     Length Class  Mode
[1,]  4     -none- numeric
[2,] 24     -none- numeric

Why do the classes of the vector and the array show as -none- in the summary? I expected vector and array respectively.
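
A possible explanation, offered as a sketch rather than a definitive answer: for a list, summary() appears to report only an explicit class attribute on each element, and a plain vector or array carries none, hence -none-. Calling class() on each element shows the implicit class instead. The name classes is illustrative.

```r
v1 <- c(2, 5, 7, 8)
arr1 <- array(1:24, dim = c(2, 3, 4))
lst1 <- list(v1, arr1)

# class() reports the implicit class of each element
classes <- lapply(lst1, class)
print(classes)

# Neither object carries an explicit class attribute, so oldClass() is NULL
print(oldClass(v1))
print(oldClass(arr1))

# str() gives a compact view of type and dimensions
str(lst1)
```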
 

nsen59341

Member
Hello Sir,

### Assignment : Provide user defined names to the row names and col names
## use dimnames while creating the matrix

vect <- c(2,4,7,6)
row.names <- c('row1','row2')
col.names <- c('col1','col2')
m1 <- matrix(vect,ncol=2,nrow=2, dimnames=list(row.names,col.names))
print(m1)


Output:
1616257542283.png
 

harsha007

Member
extracting odd and even from vector

vector=c(22,44,56,88,89)
> vector
[1] 22 44 56 88 89
> for(num in vector){
+ if(num%%2==0){
+ print('num is even')
+ }else{
+ print('num is odd')
+ }
+ }
[1] "num is even"
[1] "num is even"
[1] "num is even"
[1] "num is even"
[1] "num is odd"

I tried this and can tell which numbers are odd and which are even, but how do I add 5 to the even numbers and subtract 4 from the odd ones? I tried and could not get it.
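
One way to do this, sketched with a loop that stores the changed value instead of printing a fixed message. The names nums and result are illustrative; the vector is the one from the post.

```r
nums <- c(22, 44, 56, 88, 89)
result <- numeric(length(nums))  # pre-allocate the output vector

for (i in seq_along(nums)) {
  num <- nums[i]
  if (num %% 2 == 0) {
    result[i] <- num + 5   # even: add 5
  } else {
    result[i] <- num - 4   # odd: subtract 4
  }
}
print(result)
```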
 

harsha007

Member
f=c('sai','harsha','sai','sri','harsha','venkat','sai')
> f
[1] "sai" "harsha" "sai" "sri" "harsha" "venkat"
[7] "sai"
> for(names in f){
+ x=0
+ if(names=='sai'){
+ x=x+1
+ print(x)
+ }
+
+ }
[1] 1
[1] 1
[1] 1

I am not getting the count 3.
 
### Assignment : Provide user defined names to the rownames and col names
## use dimnames while creating the matrix .

rownames<-c("alpha1","alpha2","alpha3","alpha4")
columnnames<-c("beta1","beta2","beta3","beta4","beta5")
m5<-matrix((4:23),nrow=4,dimnames=list(rownames,columnnames))  # 20 values fill the 4 x 5 matrix; 4:20 has only 17 and gets recycled with a warning
m5
 
## Assignment : Print the number of times the name Ravi appears in the given vector .

v1 <- c("Ravi","shashi","Jaya","reshma","Arun","Pulkit","Ravi")
P<- length(which(v1 == "Ravi"))
P

This solves the issue, but could anybody let me know how to do it with "count"?
 

nsen59341

Member
For the 'sai' counter: put x=0 before starting the for loop and print(x) after ending the loop.
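
A minimal sketch of that fix. When x = 0 sits inside the loop, the counter is reset on every pass, so it can never go above 1; setting it once before the loop lets it accumulate.

```r
f <- c("sai", "harsha", "sai", "sri", "harsha", "venkat", "sai")

x <- 0                 # initialise once, before the loop
for (name in f) {
  if (name == "sai") {
    x <- x + 1         # accumulate across iterations
  }
}
print(x)               # report once, after the loop
```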
 
### Assignment : inner merge of two dfs with different name for common column
# by.x = <colname> by.y = <colname>

first_name<-c("first1","first2","first3")
second_name<-c("second1","second2","second3")
roll_no<-c("1","2","3")

class1<-data.frame(roll_no,first_name,second_name)
class1
first_name<-c("first4","first5","first6")
second_name<-c("second4","second5","second6")
roll_no<-c("4","5","6")

class2<-data.frame(roll_no,first_name,second_name, stringsAsFactors = FALSE)
class2
class_total<-rbind(class1,class2)
class_total

first_name<-c("first4","first5","first6")
second_name<-c("second4","second5","second6")
rol_no<-c("4","5","6")
class7<-data.frame(first_name,second_name,rol_no,stringsAsFactors = FALSE)
class7

class8<-merge(class_total, by.x="roll_no",class7, by.y="rol_no")
class8

Output -
  roll_no first_name.x second_name.x first_name.y second_name.y
1       4       first4       second4       first4       second4
2       5       first5       second5       first5       second5
3       6       first6       second6       first6       second6

Please confirm if it is correct ?
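
The output above looks consistent with an inner merge: only roll numbers present in both data frames survive. The .x and .y endings appear because both inputs share the non-key column names. As a sketch, merge()'s suffixes argument can make them more readable; the trimmed-down data frames and the suffix strings below are illustrative.

```r
class_total <- data.frame(
  roll_no = c("1", "2", "3", "4", "5", "6"),
  first_name = paste0("first", 1:6),
  stringsAsFactors = FALSE
)
class7 <- data.frame(
  rol_no = c("4", "5", "6"),
  first_name = paste0("first", 4:6),
  stringsAsFactors = FALSE
)

# Inner merge on differently named key columns, with custom suffixes
class8 <- merge(class_total, class7,
                by.x = "roll_no", by.y = "rol_no",
                suffixes = c("_total", "_class7"))
print(class8)
```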
 

J TEJASWINI

Member
# Extract the even numbers out of a vector

my_vec <- c(5,10,15,20,25,30,35,40,45,50,55,60)
for (val in my_vec) {
  if ((val %% 2) == 0) {
    print(val)
  }
}
 

J TEJASWINI

Member
#provide user defined names to the rows and colums using dimnames()

rownames <- c('telugu','english','maths','science','social','hindhi')

columnames <-c('unit 1','unit 2', 'quaterly', 'halfyearly', 'annual')

p<- matrix(c(61:90),nrow = 6, ncol =5, dimnames = list(rownames, columnames))
print(p)


output:
        unit 1 unit 2 quaterly halfyearly annual
telugu      61     67       73         79     85
english     62     68       74         80     86
maths       63     69       75         81     87
science     64     70       76         82     88
social      65     71       77         83     89
hindhi      66     72       78         84     90
 
Working on:
#2 - Extract the time out of it and create a separate column with values " afternoon , morning , evening , night " based on the hour .
I wrote a function getTimePeriod to derive whether it is "afternoon", "morning", "evening", or "night" based on the hour value passed.
It returns the expected value for single hours:
getTimePeriod(8) # morning
getTimePeriod(13) # Afternoon
getTimePeriod(18) # evening
getTimePeriod(23) # night

However, when I call this function on the whole column, it returns "evening" for all rows. I tried a couple of methods; two are listed below.
my_df$timePeriod <- getTimePeriod(my_df$hHour) # Not working, returning "evening" for all rows
my_df %>% mutate(timePeriod = getTimePeriod(my_df$hHour)) # Not working, returning "evening" for all rows

Please note: my_df$hHour holds integer values from 0 to 23.
 
Hello Sabya,
I managed to solve the problem; however, I believe there should be a built-in function to do this.
I created an empty character vector, iterated through the rows in a loop, assigned the appropriate time period (afternoon, morning, evening, or night) based on the hour, and then assigned the vector to the data frame.

setTimeVector <- function() {
  timePeriod <- vector(mode = "character", length(my_df$customerName))
  i <- 1
  while (i <= length(my_df$hHour)) {
    # look up the period for row i only
    timePeriod[i] <- getTimePeriod(my_df$hHour[i])
    #print( timePeriod )
    i <- i + 1
  }
  return(timePeriod)
}

my_df$timePeriod <- setTimeVector()

o/p:
timePeriod Abbr FirstName conditionInHrs
1 Morning AA Panalopa 21 hours left
2 Morning AA Morgan 21 hours left
3 Morning AA Roiby 21 hours left
4 evening AA Tayyeb 21 hours left
5 afternoon AA Roiby 21 hours left
6 Morning AA Roiby 21 hours left
 

Sabyasachi Tripathy

Customer
Customer
Try using the cut() function.
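
A minimal sketch of the cut() approach. getTimePeriod is not shown in the thread, so the hour boundaries here are assumptions chosen to agree with the four sample calls (morning 5-11, afternoon 12-16, evening 17-20, night 21-4). If getTimePeriod relies on if/else, that would also explain the original problem, since if() evaluates a single value rather than a whole column. cut() is vectorised, so it labels every row in one call.

```r
# Assumed boundaries: morning 5-11, afternoon 12-16, evening 17-20, night 21-4
hHour <- c(8, 13, 18, 23, 0, 4, 5)

# Shift the clock so that 05:00 becomes 0; night then forms one contiguous block
shifted <- (hHour - 5) %% 24

timePeriod <- as.character(cut(
  shifted,
  breaks = c(-1, 6, 11, 15, 23),
  labels = c("morning", "afternoon", "evening", "night")
))
print(data.frame(hHour, timePeriod))
```

With a data frame, the same call can be assigned directly, for example my_df$timePeriod <- as.character(cut((my_df$hHour - 5) %% 24, ...)).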
 

nsen59341

Member
## Assignment :
### plot multiple lines for multiple continuous data in a single plot.


p <- ggplot(airquality) +
geom_density(mapping = aes(x = Ozone, y = Solar.R), stat = "identity", position = "dodge", color="red", fill="yellow") +
geom_density(mapping = aes(x = Ozone, y = Temp), stat = "identity", position = "dodge", color="blue", fill="#87CEEB") +
theme_minimal()
p


Output:
1616949050947.png


I used the geom_label() function, but I am not able to add labels for the two different variables (Temp, Solar.R).
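
One possible approach, sketched under assumptions: stack the two variables into a long data frame and map the variable name to colour inside aes(), so ggplot2 draws one line per variable and builds a legend that labels them. This swaps geom_density for geom_line, which is a substitution and not the code from the post. The names long, value, and variable are illustrative, and the plot is drawn only if ggplot2 is installed.

```r
# Stack the two measurements into long format: one row per Ozone/variable pair
long <- data.frame(
  Ozone = rep(airquality$Ozone, times = 2),
  value = c(airquality$Solar.R, airquality$Temp),
  variable = rep(c("Solar.R", "Temp"), each = nrow(airquality))
)

# Draw the plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Ozone, y = value, colour = variable)) +
    geom_line(na.rm = TRUE) +   # airquality contains missing values
    labs(y = "Value", colour = "Variable") +
    theme_minimal()
  print(p)
}
```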
 

Hi all, my code is not working for the Healthcare project.

Can anyone suggest the correct code?

Question 2 - In order of severity of the diagnosis and treatments, and to find out the expensive treatments, the agency wants to find the diagnosis-related group that has maximum hospitalization and expenditure.

Solution:

library(dplyr)
hosp <- read.csv('HospitalCosts.csv', header = T)
View(hosp)
hosp$APRDRG <- as.factor(hosp$APRDRG)
summary(hosp$APRDRG)
which.max(summary(hosp$APRDRG))
tapply(hosp$TOTCHG,hosp,sum)
diag_cost <- aggregate(TOTCHG ~ APRDRG, FUN = sum, data = hosp)
diag_cost[which.max(diag_cost$TOTCHG),]
diag_cost

I am attaching a screenshot of the error.
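
The screenshot is not readable here, so this is only a guess: the line that stands out is tapply(hosp$TOTCHG, hosp, sum), which passes the whole data frame as the grouping index where a single grouping column such as hosp$APRDRG is expected. The sketch below uses made-up toy values in place of HospitalCosts.csv; the names visits, cost_by_group, most_visits, and most_costly are illustrative.

```r
# Hypothetical toy data standing in for HospitalCosts.csv
hosp <- data.frame(
  APRDRG = factor(c(640, 640, 753, 753, 753, 911)),
  TOTCHG = c(1000, 1500, 3000, 2500, 500, 9000)
)

# Number of hospitalisations per diagnosis group
visits <- table(hosp$APRDRG)
most_visits <- names(which.max(visits))

# Total charges per group: the second argument is the grouping column
cost_by_group <- tapply(hosp$TOTCHG, hosp$APRDRG, sum)
most_costly <- names(which.max(cost_by_group))

print(most_visits)
print(most_costly)
```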

 

Attachments

  • Helath Care Question 2 Error.png
    21.8 KB · Views: 7

nsen59341

Member
Hi all,

I am doing Project 7. I am getting the predicted LOS output as decimals. Can anyone help me with that? It should be integer values, right?
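
A short sketch of why this happens, assuming the model is a linear regression built with lm(): such a model predicts a continuous value, so decimals are expected even when the observed LOS values are whole days. If whole days are needed for reporting, the predictions can be rounded afterwards. The toy data below is made up for illustration.

```r
# Hypothetical toy data; the real project uses the hospital costs file
toy <- data.frame(AGE = c(0, 1, 5, 10, 15, 17),
                  LOS = c(3, 2, 2, 1, 4, 2))

fit <- lm(LOS ~ AGE, data = toy)

# Predictions from a linear model are continuous
pred <- predict(fit, newdata = data.frame(AGE = c(2, 12)))
print(pred)

# Round only when whole days are required for reporting
pred_days <- round(pred)
print(pred_days)
```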
 
Hi all,
I am doing the healthcare project. I have the code for the first problem statement; can you help me extract the output for the AGE group with the maximum expenditure?
I know AGE group '0' has the maximum expenditure, so what would be the line of code for extracting the group and its expenditure together?

age_group <- aggregate(hospital_data$TOTCHG, by=list(hospital_data$AGE), FUN=sum)
> age_group
Group.1 x
1 0 678118
2 1 37744
3 2 7298
4 3 30550
5 4 15992
6 5 18507
7 6 17928
8 7 10087
9 8 4741
10 9 21147
11 10 24469
12 11 14250
13 12 54912
14 13 31135
15 14 64643
16 15 111747
17 16 69149
18 17 174777


I also used sapply to extract the maximum expenditure, but I do not know how to get a combined output of AGE with the maximum expenditure, meaning the output must show both the AGE group and TOTCHG. Please tell me if I am wrong to use the 'sapply' function.
i.e.
> sapply(list(age_group),max)
[1] 678118
 
Please use the code below:
age_group[which.max(age_group$x), ]
This fetches the row for the age group with the maximum expense.
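
A minimal sketch of that row lookup, using a few of the rows posted above. which.max() returns the position of the largest value, and the trailing comma inside the square brackets selects that whole row, so both the group and its total are shown. The name top_row is illustrative.

```r
# A subset of the aggregated values from the post
age_group <- data.frame(
  Group.1 = c(0, 1, 2, 15, 17),
  x = c(678118, 37744, 7298, 111747, 174777)
)

# Position of the maximum, then the full row at that position
top_row <- age_group[which.max(age_group$x), ]
print(top_row)
```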
 
Hi Sabya Sir,
I wrote the code for the healthcare project, but I am stuck at creating the train and test data sets.

Project 7 - Healthcare project

hospital_data <- read_excel('1555054100_hospitalcosts.xlsx')
hospital_data
dim(hospital_data)

I used the following code to create the train and test sets.

## Problem Statement-5:
# Since the length of stay is the crucial factor for inpatients, the agency
# wants to find if the length of stay can be predicted from age, gender, and race.

# Step-5: Prepare linear regression model to predict length of stay based on age, gender and race.

# Dropping variables that are not needed for prediction.

hospital_data = subset(hospital_data, select = -c(TOTCHG,APRDRG) )

names(hospital_data)

# Seed initializes the randomness

set.seed(100)

split = sample.split(hospital_data,SplitRatio = 0.7)

trainingSet = subset(hospital_data, split == T)

testSet = subset(hospital_data, split == F)


str(trainingSet)

dim(trainingSet)

OUTPUT: showing some error

> hospital_data = subset(hospital_data, select = -c(TOTCHG,APRDRG) )
Error in eval(substitute(select), nl, parent.frame()) :
object 'TOTCHG' not found
>
> names(hospital_data)
[1] "AGE" "FEMALE" "LOS" "RACE"
>
>
> # Seed initializes the randomness
>
> set.seed(100)
>
> split = sample.split(hospital_data,SplitRatio = 0.7)
>
> trainingSet = subset(hospital_data, split == T)
Warning message:
Length of logical index must be 1 or 500, not 4
>
> testSet = subset(hospital_data, split == F)
Warning message:
Length of logical index must be 1 or 500, not 4
>
>
> str(trainingSet)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 250 obs. of 4 variables:
$ AGE : num 17 17 17 16 17 15 15 15 14 14 ...
$ FEMALE: num 0 1 0 1 1 0 1 1 1 1 ...
$ LOS : num 2 1 0 2 2 2 4 4 4 3 ...
$ RACE : num 1 1 1 1 1 1 1 1 1 1 ...
>
> dim(trainingSet)
[1] 250 4
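
Two observations, offered as a likely reading of the log and not a confirmed diagnosis. First, names(hospital_data) already shows only four columns, so TOTCHG and APRDRG were probably dropped in an earlier run, which would explain the "object 'TOTCHG' not found" error. Second, sample.split() was given the whole data frame, so the logical vector it returned seems to have one entry per column (4) and not one per row (500); passing a single column such as hospital_data$LOS should give a vector of the right length. The sketch below avoids caTools altogether and splits by row index in base R; the random data frame is a hypothetical stand-in with the same shape as the one in the post.

```r
set.seed(100)

# Hypothetical stand-in with the same shape as the post's data (500 rows, 4 columns)
hospital_data <- data.frame(
  AGE = sample(0:17, 500, replace = TRUE),
  FEMALE = sample(0:1, 500, replace = TRUE),
  LOS = sample(0:10, 500, replace = TRUE),
  RACE = sample(1:6, 500, replace = TRUE)
)

# Draw 70% of the row indices at random
train_idx <- sample(nrow(hospital_data), size = round(0.7 * nrow(hospital_data)))

trainingSet <- hospital_data[train_idx, ]   # rows selected for training
testSet <- hospital_data[-train_idx, ]      # all remaining rows

print(dim(trainingSet))
print(dim(testSet))
```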
 
Hi Everyone,

For the healthcare project, point 5: there is very little correlation among the columns, so prediction does not seem possible.

Does anyone have a different opinion?

Thank you for your reply

@sabyasachi (3882)
Hello Pradeepkumar,

For question 5, I also tried multiple ways to predict LOS; however, the coefficient was always low. So I submitted the project with the conclusion "Length of stay (LOS) cannot be predicted based on age, gender, and race."

The project was accepted, so I assume that is the expected answer for Q5.
 