data-science-with-r-sabyasanchi-march-13-april-11-2021.65148

Vijay Dattatray Pasalkar

Member
Hi Sir,
I would like to understand the following:

What is the difference between element-wise logical AND and logical AND?
And what is the difference between element-wise logical OR and logical OR?

Regards,
Vijay
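
For readers with the same question, here is a minimal sketch of the distinction in base R. The vectors `a` and `b` are made up for illustration. `&` and `|` compare vectors position by position and return a vector; `&&` and `||` work on single values, stop evaluating as soon as the result is known, and are what `if ()` conditions normally use.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# Element-wise: one result per position
and_elementwise <- a & b   # TRUE FALSE FALSE
or_elementwise  <- a | b   # TRUE TRUE FALSE

# Scalar (short-circuit): a single TRUE/FALSE from single values
and_scalar <- a[1] && b[2]  # FALSE
or_scalar  <- a[1] || b[2]  # TRUE

# Short-circuit: the left side is FALSE, so the right side is never
# evaluated and the stop() call does not run
safe <- FALSE && stop("never reached")

print(and_elementwise)
print(or_elementwise)
print(c(and_scalar, or_scalar, safe))
```

Inside a loop over single values either form gives the same answer; when filtering a whole vector or data frame column, the element-wise form is the one that is needed.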

Vijay Dattatray Pasalkar

Member
Hi Sir,
I wrote the code below.

Question: Why is every value (11, 12, 13, 14, 15) printed five times in the output?

v <- c(11,12,13,14,15)

for (val in v){

print(paste("The Value is:" , v))
}

Output:
> v <- c(11,12,13,14,15)
>
> for (val in v){
+
+ print(paste("The Value is:" , v))
+ }
[1] "The Value is: 11" "The Value is: 12" "The Value is: 13"
[4] "The Value is: 14" "The Value is: 15"
[1] "The Value is: 11" "The Value is: 12" "The Value is: 13"
[4] "The Value is: 14" "The Value is: 15"
[1] "The Value is: 11" "The Value is: 12" "The Value is: 13"
[4] "The Value is: 14" "The Value is: 15"
[1] "The Value is: 11" "The Value is: 12" "The Value is: 13"
[4] "The Value is: 14" "The Value is: 15"
[1] "The Value is: 11" "The Value is: 12" "The Value is: 13"
[4] "The Value is: 14" "The Value is: 15"

DHRUV GAGNEJA_1

New Member

print(paste("The Value is:" , val))  # use the loop variable val; print() has no "unique" argument

sujit_35

Member
### Assignment1 : Print the number of times the name Ravi appears in the given vector
v1 = c("Ravi","sashi","Jaya","reshma","Arun","Pulkit","Ravi")
count_name = 0
for (names in v1){
if (names == "Ravi"){
count_name = count_name + 1
print(count_name)
}
}

I am sure there is a much simpler way to count the number of times a word appears in a vector. Please advise.
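
One simpler route, as a minimal sketch: comparing the whole vector to "Ravi" gives a TRUE/FALSE value per element, and `sum()` counts the TRUEs. The `table()` variant is an alternative that counts every name at once.

```r
v1 <- c("Ravi", "sashi", "Jaya", "reshma", "Arun", "Pulkit", "Ravi")

# v1 == "Ravi" gives TRUE/FALSE per element; sum() counts the TRUEs
count_name <- sum(v1 == "Ravi")
print(count_name)

# table() counts every distinct name in one call
name_counts <- table(v1)
print(name_counts[["Ravi"]])
```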

#### Assignment 2: Add 5 to the element if it is odd, else add 10

v3 =c(1,2,3,4,5,6)

for (num in v3)
{
if (num%%2 != 0)   # odd number
{
print(num +5)
}else
{
print(num+10)
}
}

## Assignment 3 - Extract even numbers out of this vector my_vec
my_vec = c(0,5,10,15,20,25,30,35,40,45,50,55,60)
for (val in my_vec){
if (val%%2 ==0 & val!=0){
print(val)
}
}

sujit_35

Member
## exercise: add 5 to 4th element and 6 to 5th element

vec_1 =c(34,23,56,89,67)

# loop over positions so each element can be tested by its index
for (i in seq_along(vec_1)) {
if (i == 4){
print(vec_1[i]+5)
}
else if (i == 5){
print(vec_1[i]+6)
}

else {
print(vec_1[i])
}
}

sujit_35

Member
# Assignment 4 : Provide user defined names to the rownames and col names
## use dimnames while creating the matrix

column.names=c("Col1","Col2","Col3")
row.names=c("Row1","Row2","Row3")
m2 = matrix(1:9, nrow =3,ncol=3, dimnames = list(row.names,column.names))  # 9 values fill a 3 x 3 matrix
m2

sujit_35

Member
Hi Sir,
I have two data frames. The first has f_name, s_name and Age. The second has f_name, s_name, Age and an additional field, Sex. When I run the code below, I get "numbers of columns of arguments do not match". Does this mean that both data frames need to have the same fields for rbind to work?

## Exercise: the two data frames have different columns

f_name = c('sam','ram','Tim')
s_name = c('john',"singh",'jacob')
Age = c(10,20,23)

ds1=data.frame(f_name,s_name,Age)
ds1

f_name = c('John','Santhy','Toy')
s_name = c('sims',"kumar",'Brown')
Age = c(20,30,40)
Sex = c('Male','Female','Male')

ds2 = data.frame(f_name,s_name,Age,Sex)
ds2

ds3 =rbind(ds1,ds2)
ds3
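
On the question above: yes, `rbind()` on data frames needs the same set of column names in both inputs. A minimal sketch of one workaround in base R, which adds the missing `Sex` column to `ds1` as `NA` before binding (filling with `NA` is an assumption about what is wanted here):

```r
ds1 <- data.frame(f_name = c("sam", "ram", "Tim"),
                  s_name = c("john", "singh", "jacob"),
                  Age    = c(10, 20, 23))
ds2 <- data.frame(f_name = c("John", "Santhy", "Toy"),
                  s_name = c("sims", "kumar", "Brown"),
                  Age    = c(20, 30, 40),
                  Sex    = c("Male", "Female", "Male"))

# Give ds1 the column it lacks, filled with NA, so the columns match
ds1$Sex <- NA

ds3 <- rbind(ds1, ds2)
print(ds3)
```

If dplyr is available, `bind_rows(ds1, ds2)` fills columns that are missing from one input with `NA` automatically.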

sujit_35

Member
### Assignments: inner merge of two dfs with different names for common columns
#hint: by.x = <colname> by.y = <colname>

first_name = c('first1','first2','first3')
second_name = c('second1','second2','second3')
roll_no = c('1','2','3')

df1 = data.frame(roll_no,first_name,second_name,stringsAsFactors = FALSE)
df1

maths = c(45,78,68)
science = c(56,73,56)
history = c(89,78,95)
english = c(60,78,90)
rollno = c('3','1','2')

df2 = data.frame(rollno,maths,science,history,english,stringsAsFactors = FALSE)
df2

my_df = merge(df1,df2, by.x="roll_no",by.y="rollno")
my_df

Richard Osei Bofah

Member
Hi Sir,

Please find my solutions to assignments 1 and 2.

For each assignment, I also wrote a SECOND OPTION (without using 'count') and, surprisingly, got the same results.

I am unsure whether the SECOND OPTION will always give the same results as the first.

Thank you
Richard

> ##Assignments:
>
> #############Assignment 1: Print the number of times Ravi appears in the given vector:
>
> ##OPTION ONE
> v1<-c("Ravi", "Shashi", "Jaya", "Reshma", "Arun","Pulkit","Ravi")
>
> for (names in v1) {
+ count<-0
+ if (names=="Ravi") {
+ print(paste("The value is",names)) ##only print if the condition is met
+
+ count=count+1
+ }
+
+ } #Answer: Ravi appears two times
[1] "The value is Ravi"
[1] "The value is Ravi"
>
> ##OPTION TWO
>
> ##Without introducing the incremental conditions of "count", I still get same answer. Any reason?
> #Thus:
>
> for (names in v1) {
+ if (names=="Ravi") {
+ print(paste("The value is",names)) ##only print if the condition is met
+
+ }
+
+ } ##Same answer as before which had the "count": Any Reason sir or colleagues
[1] "The value is Ravi"
[1] "The value is Ravi"
>
> #########Assignment 2: Add 5 to the elements if the element is odd. Else add 10 to the element
>
> ##OPTION ONE
>
> v2<-c(1,2,3,4,5,6)
>
> for (num in v2) {
+ count<-0
+ if (num%%2!=0) { ##Thus, odd numbers are not divisible by 2
+ print(num+5)
+ count=count+1
+ }else {
+ print(num+10) ##for non-odd numbers
+ }
+ }
[1] 6
[1] 12
[1] 8
[1] 14
[1] 10
[1] 16
>
> ##OPTION TWO
>
> ##Similar to assignment 1, Without introducing the incremental conditions of "count", I still get same answer. Any reason?
> #Thus:
>
> for (num in v2) {
+ if (num%%2!=0) {
+ print(num+5)
+ }else {
+ print(num+10)
+ }
+ } ##same answer: Any Reason sir or colleagues
[1] 6
[1] 12
[1] 8
[1] 14
[1] 10
[1] 16
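
On the question in the transcript above: both options print the same lines because the output comes only from `print()`, which is identical in both. `count` is reset to 0 at the start of every iteration and is never printed, so it has no visible effect. A minimal sketch of a counter that does report the total, assuming the goal is a single count at the end:

```r
v1 <- c("Ravi", "Shashi", "Jaya", "Reshma", "Arun", "Pulkit", "Ravi")

count <- 0                 # initialise once, before the loop
for (name in v1) {
  if (name == "Ravi") {
    count <- count + 1     # accumulate across iterations
  }
}
print(paste("Ravi appears", count, "times"))
```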

sujit_35

Member
You have:
print(paste("The Value is:" , v))
This prints the whole vector (11,12,13,14,15) on each of the 5 iterations, because you passed v.

The correct way is:
print(paste("The Value is:" , val))
This prints one element of the vector per iteration.

Put val in place of v.

Richard Osei Bofah

Member
Dear Sir, please find my submission to assignment 3.

#########Assignment 3: Extract the even numbers out of this vector (my_vect)

##MY R SCRIPT (CODES)

my_vec<-c(0,5,10,15,20,25,30,35,40,45,50,55,60)

for (val in my_vec) {

if (val%%2==0 &&val!=0) ##exclude zero (0) also
print(paste("This is an even number of:", val))
}

##OUTPUT FROM THE CONSOLE

> #########Assignment 3: Extract the even numbers out of this vector (my_vect)
>
> my_vec<-c(0,5,10,15,20,25,30,35,40,45,50,55,60)
>
> for (val in my_vec) {
+
+ if (val%%2==0 &&val!=0) ##exclude zero (0) also
+ print(paste("This is an even number of:", val))
+ }
[1] "This is an even number of: 10"
[1] "This is an even number of: 20"
[1] "This is an even number of: 30"
[1] "This is an even number of: 40"
[1] "This is an even number of: 50"
[1] "This is an even number of: 60"

Akshay Pandurang Paunikar

New Member
Hello Sir,
Please check the below code :

# Assignment - inner merge of 2 data frames with diff name for common column #

name <- c("Alex","Bob","Charlie")

age <- c(24,22,21)

roll_no <- c("1","2","3")

Student_data <- data.frame(roll_no,name,age,stringsAsFactors = FALSE)

Student_data

name <- c("Denise","Ellie","Franklin")

age <- c(20,21,22)

rollno <- c("4","5","6")

Student_data_1 <- data.frame(rollno,name,age,stringsAsFactors = FALSE)

Student_data_1

Total_Student <- merge(Student_data,Student_data_1,by.x = c("roll_no","name","age"),by.y = c("rollno","name","age"),all = TRUE)

Total_Student

Output :

roll_no name age
1 1 Alex 24
2 2 Bob 22
3 3 Charlie 21
4 4 Denise 20
5 5 Ellie 21
6 6 Franklin 22
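
A note on the code above: with `all = TRUE` and every column used as a key, `merge()` performs a full outer join, which here simply stacks the two tables (much like `rbind()`). An inner merge keeps only rows whose key appears in both data frames. A minimal sketch, using a hypothetical `marks` data frame that shares some roll numbers with the student table:

```r
students <- data.frame(roll_no = c("1", "2", "3"),
                       name    = c("Alex", "Bob", "Charlie"),
                       stringsAsFactors = FALSE)
# Hypothetical second table; its key column has a different name
marks <- data.frame(rollno = c("2", "3", "4"),
                    maths  = c(78, 68, 90),
                    stringsAsFactors = FALSE)

# Inner merge (default): only roll numbers present in both tables
inner <- merge(students, marks, by.x = "roll_no", by.y = "rollno")

# Full outer merge: every roll number from either table, NA where missing
outer <- merge(students, marks, by.x = "roll_no", by.y = "rollno", all = TRUE)

print(inner)
print(outer)
```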

Richard Osei Bofah

Member
Dear Sir,

Please find my solution to assignment 4 below:

#############Assignment 4: Provide user defined names to row names and column names: Using dimnames

###MY R SCRIPT

mtx<-matrix(1:15, nrow=3)
mtx
dim(mtx) ##we have 3 by 5 matrix

dimnames(mtx)<-list(c("rowname1","rowname2","rowname3"),c("colname1","colname2","colname3","colname4","colname5"))
mtx

###OUTPUT FROM THE CONSOLE

> #############Assignment 4: Provide user defined names to row names and column names: Using dimnames
>
> mtx<-matrix(1:15, nrow=3)
> mtx
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
> dim(mtx) ##we have 3 by 5 matrix
[1] 3 5
>
> dimnames(mtx)<-list(c("rowname1","rowname2","rowname3"),c("colname1","colname2","colname3","colname4","colname5"))
> mtx
colname1 colname2 colname3 colname4 colname5
rowname1 1 4 7 10 13
rowname2 2 5 8 11 14
rowname3 3 6 9 12 15

Richard Osei Bofah

Member
I need help:
I keep getting NA whenever I try to extract some elements from the list below.

list1<-list("1","2","3")
list2<-list("sun","Mon","Tue")
merged_list<-c(list1,list2)
merged_list
class(merged_list)

merged_list[[1]][2] ##WHY NA?

merged_list[[2]][3] ##WHY NA?

sujit_35

Member
Your "merged_list" is the combination of list1 and list2, so it now has 6 elements: 1, 2, 3, sun, Mon and Tue. So if we check,
merged_list[[1]] will give 1, similarly merged_list[[2]] will give 2, and so on. There are only 6 elements. But if we write merged_list[[1]][2], we are asking for the 2nd element within the first element, "1". That first element holds only one value, which is why it gives NA. That is my understanding.

list1<-list("1","2","3")
list2<-list("sun","Mon","Tue")
merged_list<-list(list1,list2) ## instead of "c(list1,list2)", we use list(list1,list2)
merged_list

Then the output of:
merged_list[[1]][2] will give the answer "2"

since merged_list now has two elements, list1 and list2, and the 2nd element within list1 is "2".
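
The difference between the two constructions can be checked directly. A minimal sketch; note that single brackets on a list return a (shorter) list, so `[[ ]]` is used for the final step to get the bare value:

```r
list1 <- list("1", "2", "3")
list2 <- list("sun", "Mon", "Tue")

flat   <- c(list1, list2)     # one flat list of 6 elements
nested <- list(list1, list2)  # a list of 2 elements, each itself a list of 3

print(length(flat))    # 6
print(length(nested))  # 2

# flat[[1]] is the length-1 character vector "1"; asking for its
# 2nd element goes past the end, which yields NA
print(flat[[1]][2])

# nested[[1]] is list1, so its 2nd element exists
print(nested[[1]][[2]])   # "2"
print(nested[[2]][[3]])   # "Tue"
```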

sujit_35

Member
Hi All, I have these lines of code but they do not seem to work. What am I doing wrong? Any help would be appreciated.

hyundai1 = subset(US_Car_df, brand == 'hyundai' , model == 'mpv')
View(hyundai1)

hyundai1 = subset(US_Car_df, color == 'white' && brand == 'hyundai')
View(hyundai1)

honda = subset(US_Car_df, brand == 'honda' && color == 'black')
View(honda)

Support Simplilearn(4685)

Moderator
Staff member
Alumni
Hi Sujit,

You can try the code below when you are filtering on more than one column.

hyundai = subset(df, model == 'mpv' & brand == 'hyundai')
hyundai

I hope this helps you.

Happy Learning !!!
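
Some background on why the original calls fail, as a minimal sketch. `US_Car_df` is not available here, so a small made-up data frame `cars_df` stands in for it. In `subset()`, the second argument is the row condition and the third is `select` (which columns to keep), so a comma cannot join two conditions. `&&` expects single TRUE/FALSE values, whereas a row filter needs the element-wise `&`.

```r
# Made-up stand-in for US_Car_df
cars_df <- data.frame(
  brand = c("hyundai", "hyundai", "honda", "honda"),
  model = c("mpv", "sedan", "mpv", "sedan"),
  color = c("white", "black", "black", "white"),
  stringsAsFactors = FALSE
)

# Element-wise & combines the two conditions row by row
hyundai_mpv <- subset(cars_df, brand == "hyundai" & model == "mpv")
honda_black <- subset(cars_df, brand == "honda" & color == "black")

print(hyundai_mpv)
print(honda_black)
```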

munnasharma0601

New Member
Can somebody share the practice files from the live class of Data Science with R Programming dated 3/21/2021, from trainer Sabyasachi?

Divya Harish

New Member
ASSIGNMENT-
Innermerge 2 dfs with diff names for common column

CASE1
RSCRIPT:
rollno=c("1","2","3","4")
fname=c("alpha","beta","gama","theta")
sname=c("ina","mina","dika","rola")
first_class=data.frame(rollno, fname, sname,stringsAsFactors = FALSE)
first_class

rollno=c("5","6","7")
firstn=c("alpha1","beta1","gama1")
sname=c("ina1","mina1","dika1")
second_class=data.frame(rollno,firstn,sname,stringsAsFactors = FALSE)
second_class

totalclass=merge(first_class,second_class,by.x=c("rollno","fname","sname"),by.y=c("rollno","firstn","sname"),all=TRUE)
totalclass

OUTPUT:

> rollno=c("1","2","3","4")
> fname=c("alpha","beta","gama","theta")
> sname=c("ina","mina","dika","rola")
> first_class=data.frame(rollno, fname, sname,stringsAsFactors = FALSE)
> first_class
rollno fname sname
1 1 alpha ina
2 2 beta mina
3 3 gama dika
4 4 theta rola
>
>
> rollno=c("5","6","7")
> firstn=c("alpha1","beta1","gama1")
> sname=c("ina1","mina1","dika1")
> second_class=data.frame(rollno,firstn,sname,stringsAsFactors = FALSE)
> second_class
rollno firstn sname
1 5 alpha1 ina1
2 6 beta1 mina1
3 7 gama1 dika1
>
> totalclass=merge(first_class,second_class,by.x=c("rollno","fname","sname"),by.y=c("rollno","firstn","sname"),all=TRUE)
>
> totalclass
rollno fname sname
1 1 alpha ina
2 2 beta mina
3 3 gama dika
4 4 theta rola
5 5 alpha1 ina1
6 6 beta1 mina1
7 7 gama1 dika1

CASE2
RSCRIPT-

rollno=c("1","2","3","4")
fname=c("alpha","beta","gama","theta")
sname=c("ina","mina","dika","rola")
maths=c(12,13,14,15)
first_class=data.frame(rollno, fname, sname,maths,stringsAsFactors = FALSE)
first_class

rollno=c("1","2","3","4")
fname=c("alpha","beta","gama","theta")
surname=c("ina","mina","dika","rola")
french=c(22,23,45,67)
second_class=data.frame(rollno,fname,surname,french,stringsAsFactors = FALSE)
second_class

classtotal=merge(first_class,second_class,by.x=c("rollno","fname","sname"),by.y=c("rollno","fname","surname"))
classtotal

OUTCOME

> rollno=c("1","2","3","4")
> fname=c("alpha","beta","gama","theta")
> sname=c("ina","mina","dika","rola")
> maths=c(12,13,14,15)
> first_class=data.frame(rollno, fname, sname,maths,stringsAsFactors = FALSE)
> first_class
rollno fname sname maths
1 1 alpha ina 12
2 2 beta mina 13
3 3 gama dika 14
4 4 theta rola 15
>
> rollno=c("1","2","3","4")
> fname=c("alpha","beta","gama","theta")
> surname=c("ina","mina","dika","rola")
> french=c(22,23,45,67)
> second_class=data.frame(rollno,fname,surname,french,stringsAsFactors = FALSE)
> second_class
rollno fname surname french
1 1 alpha ina 22
2 2 beta mina 23
3 3 gama dika 45
4 4 theta rola 67
>
> classtotal=merge(first_class,second_class,by.x=c("rollno","fname","sname"),by.y=c("rollno","fname","surname"))
> classtotal
rollno fname sname maths french
1 1 alpha ina 12 22
2 2 beta mina 13 23
3 3 gama dika 14 45
4 4 theta rola 15 67

#####HOPE THIS IS THE RIGHT WAY ####

Richard Osei Bofah

Member
Dear sir and colleagues,

Find my codes to Assignment 5

####Assignment5 : inner merge of two data frames (dfs) with different name for common column
#Hint: by.x = <colname> by.y = <colname>

###MY CODE

first_name <- c('first1','first2','first3')
second_name <- c('second1','second2','second3')
ID_student <- c('1','2','3')

class1 <- data.frame(ID_student,first_name,second_name)

class1

class(class1)

str(class1)

summary(class1)

class1$ID_student

levels(class1$ID_student)

class1 <- data.frame(ID_student,first_name,second_name,stringsAsFactors = FALSE)

str(class1)

#### r bind

first_name <- c('first4','first5','first6')
second_name <- c('second4','second5','second6')
ID_student <- c('4','5','6')
class2 <- data.frame(ID_student,first_name,second_name,stringsAsFactors = FALSE)

class2

class_total <- rbind(class1,class2)

class_total

###### cbind

maths <- c(45,78,90,98,76,67)
science <- c(59,95,40,50,62,79)

class_total <-cbind(class_total,maths,science)

class_total

#### inner merge

history <- c(45,78,49,28,90,30)

english <- c(90,84,89,78,72,49)

Identifier <- c('4','5','1','2','6','3')

class3 <- data.frame(Identifier,history,english,stringsAsFactors = FALSE)

class3

class_total_1 <- merge(class_total,class3, by.x = 'ID_student', by.y="Identifier")
class_total_1

##OR
class_total_1 <- merge(class3, class_total, by.x = 'Identifier', by.y="ID_student")
class_total_1

Richard Osei Bofah

Member
Sir, I need some clarification on my submission for assignment 5 above.

In my own RStudio, when I run the code below:

levels(class1$ID_student)

it gives me NULL, whereas the Simplilearn lab practice environment returns [1] "1" "2" "3".

What is the reason for this difference? I am using R 4.0.4.

Richard Osei Bofah

Member
Thank you Sujit. It worked now.

Varun Sharma_25

Member
Hi, I don't know which R version I am using on my local machine, but I have faced the same issue. R on the local machine does not convert the column to a factor by default. So you have to use as.factor() to convert your data into factors, and then you will be able to use levels(). Happy Learning!!
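
For background: starting with R 4.0.0, `data.frame()` defaults to `stringsAsFactors = FALSE`, so character columns stay character and `levels()` returns NULL; older versions converted them to factors by default, which is presumably what the lab environment runs. A minimal sketch of both behaviours, with the argument set explicitly so the result does not depend on the R version:

```r
ID_student <- c("1", "2", "3")

# Character column: no levels
as_char <- data.frame(ID_student, stringsAsFactors = FALSE)
print(levels(as_char$ID_student))    # NULL

# Factor column: levels are available
as_fact <- data.frame(ID_student, stringsAsFactors = TRUE)
print(levels(as_fact$ID_student))    # "1" "2" "3"

# Converting an existing character column afterwards
as_char$ID_student <- as.factor(as_char$ID_student)
print(levels(as_char$ID_student))    # "1" "2" "3"
```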

Varun Sharma_25

Member
Hello Sir,

Assignment: Do not perform the paste(brand, " ", model) operation in case of chevrolet:

df1 <- filter(US_Car_df, brand != 'chevrolet')
print(df1)

df2 <- filter(US_Car_df, brand == "chevrolet")
print(df2)

df1 <- mutate(df1, brand_model = paste(brand, " ", model))

df_bind <- bind_rows(df1,df2)#bind_rows by using "DPLYR" package.

df_bind <- arrange(df_bind,X)
View(df_bind)

sujit_35

Member
I am new to R. I applied what we have learned and googled a few things to get the solution to the assignment below. I am sure there is a much simpler way to reach the answer. However, this is something I would like to share with you all. Attached is my data file in xlsx as well.

#create a dummy column Date with data in the format '02/15/2020 20:31:15'. and create another column name with values like "Mr John James" and " John James"

#Added two columns - dummy date as per the format and dummy names

#1 - Extract the month out of the date in the format "Jan" "Feb" etc and create a separate column

#2 - Extract the time out of it and create a separate column with values " afternoon , morning , evening , night " based on the hour .

#3 - Create another data frame with the names of states and abbreviation of the state and join with the original data frame .

#4 - Extract the first name out of the newly created name column

#5 - change the value of condition to number of hours left for all the records .

#6 - create bins of mileage and separate them into different groups based on the values .

#There should be six additional columns created for each requirement.

# Check/change directory

setwd("C:/Users/SujitSonar/Desktop/SimpliLearn")
getwd()

names(US_cars_df)[1] = "Id"

#install.packages("lubridate")
library("lubridate")

#install.packages("tidyverse")
library("tidyverse")

#1 - Extract the month out of the date in the format "Jan" "Feb" etc and create a separate column

US_cars_df=mutate(US_cars_df,
Months = month(US_cars_df$date,label = TRUE)
)

#2 - Extract the time out of it and create a separate column with values " afternoon , morning ,
#evening , night " based on the hour .

US_cars_df = mutate(US_cars_df,as.numeric(hour(dmy_hms(US_cars_df$date))))

names(US_cars_df)[17]="Hours"

hrs = US_cars_df$Hours

#cut(hrs,breaks=c(-Inf,12,19,00,11,Inf),labels =c("morning","afternoon","evening","night","early_morning") )

US_cars_df$time_cat = cut(hrs,breaks=c(-Inf,4,11,16,19,00,Inf),labels =c("mid night","early morning","morning","afternoon","evening","night") )

US_cars_df = US_cars_df[-c(17)]

#3 - Create another data frame with the names of states and abbreviation of the state and join with the original data frame .

library(stringr)
library(dplyr)

State_df = (unique(US_cars_df$state))

state_names = str_to_title(State_df)

state_names

i=1
state_abb = c()

for (names in state_names){
if(names == "Ontario"){
state_abb = c(state_abb,"OT")
}else{
state_abb = c(state_abb,state.abb[which(state.name==names)]) # compare against the current name, not the whole vector
}

i = i+1
}

state_abb

length(state_abb)

state_data = cbind(State_df,state_abb)

state_data = data.frame(state_data)
names(state_data)[1]='state'

view(state_data)

class(state_data)

US_cars_df = merge(US_cars_df,state_data, by.x ="state", all.x=TRUE )

#4 - Extract the first name out of the newly created name column

n1 =unique(US_cars_df$name)
n1
n2 = str_replace(n1,"Mrs","")
n2
n3=str_replace(n2,"Mr","")
n3

n4= str_replace(n3,"Ms","")
n4

n5 = str_trim(n4)
n5

n6 =str_split_fixed(n5," ",3)
n6

n7=cbind(n1,n6)
n7

n8= data.frame(n7)
names(n8)[1]='name'
names(n8)[2]='first_name'
names(n8)[3]='last_name'

US_cars_df = merge(US_cars_df,n8, by.x ="name", all.x=TRUE )

US_cars_df = US_cars_df[-21]

View(US_cars_df)

length(US_cars_df$name)

#5 - change the value of condition to number of hours left for all the records .

class(US_cars_df$condition)

c0= unique(US_cars_df$condition)

length(c0)
c0

c1 = str_replace(c0,"Listing Expired","0 hours")
c1
View((c1))

class(c1)

c2 = str_split_fixed(c1," ",3)

c3= cbind(c0,c1,c2)
view(c3)

c4= data.frame(c3)
names(c4)[1:5] =c("condition","condition_format","val","days","text")
names(c4)
View(c4)
c4=c4[-5]

View(c4)

unique(c4$days)

days = c("days","hours",'minutes')
h_val = c(24,1,1/60)
h_1= cbind(days,h_val)
h_1

h_1=data.frame(h_1)
h_1

c5= merge(c4,h_1, by.x ="days", all.x=TRUE )
View(c5)

class(c5$h_val)

c6 =mutate(c5, newcol1=as.numeric(c5$val))
c7=mutate(c6,newcol2=as.numeric(c6$h_val))

c8=mutate(c7,no_of_hrs_left = c7$newcol1*c7$newcol2)

c9=c8[-(4:7)]
c9=c9[-1]
View(c9)
c9=c9[-2]

length(c9$condition)

class(c9)

c10= merge(c9,US_cars_df, by.x ="condition", all.x=TRUE )
length(c10$condition)

View(c10)

c10 = mutate(c10,hrs_left= as.integer(c10$no_of_hrs_left))

names(c10)
View(c10)
US_cars_df = c10[-2]
View(US_cars_df)

#6 - create bins of mileage and separate them into different groups based on the values .

length(unique(US_cars_df$mileage))
class(US_cars_df$mileage)

summary(US_cars_df$mileage)

US_cars_df$mileage_cat = cut(US_cars_df$mileage,breaks=c(-Inf,25000,50000,75000,Inf),labels =c("low","average","mid","high") )

View(US_cars_df)

names(US_cars_df)

US_cars_df=US_cars_df[,c("Id","price","name","brand","model","year","title_status","mileage","color","vin","lot","state","country",
"date","condition","Months","time_cat","state_abb","first_name","last_name","hrs_left","mileage_cat")]

View(US_cars_df)

sujit_35

Member
Hi Sir/Team,
I need help with the code below. I am not sure where I am going wrong; I get an error. Is it because my weight is numeric and my months is character?

weight = c(2.5,2.8,3.2,4.8,5.1,5.9)
months = c('Jan','feb','mar','apr','may','jun')
plot(months,
weight,
type = 'b',
main='baby weight by months')

o/p:
Error in plot.window(...) : need finite 'xlim' values
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
> class(weight)
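
On the question above: yes. The default `plot()` method needs numeric x values (or a factor); it tries to convert the month names to numbers, gets NAs, and cannot compute axis limits. A minimal sketch of one workaround: plot against the positions 1 to 6 and label the axis with the month names.

```r
weight <- c(2.5, 2.8, 3.2, 4.8, 5.1, 5.9)
months <- c("Jan", "feb", "mar", "apr", "may", "jun")

x_pos <- seq_along(months)   # 1, 2, ..., 6

plot(x_pos, weight,
     type = "b",
     xaxt = "n",             # suppress the default numeric x axis
     xlab = "months",
     main = "baby weight by months")
axis(1, at = x_pos, labels = months)   # draw month names at each position
```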

sujit_35

Member
Assignment1 on ggplot:

##Assignment : Draw vline for mean of Male and Female in different colors .

library(ggplot2)

df <- data.frame(
gender=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)

grouped <- aggregate(df$weight , by = list(df$gender) , FUN = mean)
names(grouped)[1:2]=c("gender","grp.mean")
grouped

ggplot(df,aes(x=weight)) +
geom_density() +
geom_vline(aes(xintercept =grp.mean,color=gender),data=grouped,linetype='dashed',size=1)

## Created another data frame, "grouped", from the original df to hold the mean weight for male and female, then used the columns of this new data frame in the xintercept of geom_vline to plot the mean lines for male and female.

sujit_35

Member
## Assignment : on ggplot
### plot multiple lines for multiple continuous data in a single plot.
## hint : ggplot () + geom_density() + geom_density ()

df <- data.frame(
gender=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)

grouped <- aggregate(df$weight , by = list(df$gender) , FUN = mean)

names(grouped)[1:2]=c("gender","grp.mean")
grouped

ggplot(df,aes(x=weight)) +
geom_density(aes(color=gender)) +
geom_vline(aes(xintercept =grp.mean,color=gender),data=grouped,linetype='dashed',size=1)

## As per the hint, we are supposed to achieve this using two geom_density layers, but I was unable to do so. If anybody has done it, please share. Instead, I used a single geom_density with color = gender.
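
One way to follow the hint literally, as a minimal sketch assuming ggplot2 is installed: give each `geom_density()` layer its own subset of the data. The colour choices are arbitrary.

```r
library(ggplot2)

set.seed(1)  # only to make the simulated data reproducible
df <- data.frame(
  gender = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5),
                   rnorm(200, mean = 65, sd = 5)))
)

# One density layer per group, each with its own data and colour
p <- ggplot() +
  geom_density(data = subset(df, gender == "F"), aes(x = weight), colour = "red") +
  geom_density(data = subset(df, gender == "M"), aes(x = weight), colour = "blue")

print(p)
```

Because the colours are fixed outside `aes()`, this version draws no legend; the single-layer form with `aes(color = gender)` gets one automatically.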

nsen59341

Member
##Assignment : Draw vline for mean of Male and Female in different colors.

df <- data.frame(
gender=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5),
rnorm(200, mean=65, sd=5)))
)

df_fmal <- filter(df,gender=="F")
df_mal <- filter(df,gender=="M")
mean_w8_fmal <- mean(df_fmal$weight)
mean_w8_mal <- mean(df_mal$weight)

p <- ggplot(df, aes(x=weight)) + geom_density()

p + geom_vline(aes(xintercept=mean_w8_fmal), color="blue",
linetype="dashed", size=1) + geom_vline(aes(xintercept=mean_w8_mal), color="red",
linetype="dashed", size=1)

sujit_35

Member
Hi Sir, I need your help with a combo chart (two axes). Using the data below, I want to plot:

1) age and cnt as a stacked bar chart by male and female, and
2) age and sum_exp as lines (one line each for male and female)

age = c(0,1,2,3,4,5)
gen =c('M','F','M','M','F','F')
cnt = c(3,4,8,9,10,5)
sum_exp = c(1000,3890,4678,897,890,987)

df= data.frame(age,gen,cnt,sum_exp)
df

Arun V_4

Member
### Assignment 1 : Print the number of times the name Ravi appears in the given vector

n=c('arun','ravi','preeti','ravi','karthi')
count=0
for (name in n) {
if (name=='ravi'){
count=count+1
}
}
# print once, after the loop, so the total is shown whatever the count is
print(paste('The name Ravi appearing times:',count))

Arun V_4

Member
##Assignment 2: Add 5 to the element if it is odd, else add 10 to the element

v1=c(1,2,3,4,5)
v1
for (val in v1) {
if(val%%2==0){
print(val+10)
}
else
{
print(val+5)
}
}

NAGARAJA RAO DIWAKAR

Member
Sir,
I am getting the error below in the console when I try to create the train and test sets for regression analysis. I am working on the 7th project, "Hospital expenditure".
Could you please let me know where I am going wrong?

set.seed(111) # Seed initializes the randomness
> sample = sample.split(hosp_analysis, SplitRatio = 0.7) # Returns a vector with T for 70% of data
> trainingSet = subset(hosp_analysis, sample == T)
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 499 but subscript `r` has size 3.
Run `rlang::last_error()` to see where the error occurred.
> testSet = subset(hosp_analysis, sample == F)
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 499 but subscript `r` has size 3.
Run `rlang::last_error()` to see where the error occurred.

nsen59341

Member
Hello Nagaraja,
First check whether your dataset is of class data.frame (with class(hosp_analysis)). If it is not, convert it into a data frame and then try again.
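
Another thing worth checking, offered as a hedged suggestion: `sample.split()` from caTools expects a vector with one entry per row (typically the outcome column), not the whole data frame. Passing the data frame can produce a logical vector whose length does not match the number of rows, which is what the error message complains about. Below is a minimal sketch of a 70/30 split in base R on made-up data, so that it runs without caTools. The data frame `hosp` and its values are stand-ins for `hosp_analysis`, and the caTools form is shown only as a comment.

```r
set.seed(111)

# Made-up stand-in for hosp_analysis
hosp <- data.frame(LOS = rpois(100, lambda = 3),
                   TOTCHG = round(runif(100, 1000, 5000)))

# With caTools the split would be built from a column, e.g.:
#   split <- caTools::sample.split(hosp$LOS, SplitRatio = 0.7)
# A simple base R alternative: draw 70% of the row indices at random
train_idx <- sample(nrow(hosp), size = floor(0.7 * nrow(hosp)))

trainingSet <- hosp[train_idx, ]
testSet     <- hosp[-train_idx, ]

print(dim(trainingSet))
print(dim(testSet))
```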

Sunil Kumar_62

Active Member
Hello Sir @Sabyasanchi, are you planning to teach Data Science with Python? If yes, please let me know when.

NAGARAJA RAO DIWAKAR

Member
Hello Sir,
I am getting the singularity message below for some coefficients in my model. I removed the NA in RACE (row 277). After creating dummy variables for RACE and AGE, my data has 499 rows and 26 columns. How do I fix this problem?

> # Create model
> model = lm(formula=LOS ~ ., data=trainingSet)
> model

Call:
lm(formula = LOS ~ ., data = trainingSet)

Coefficients:
(Intercept) GENDER TOTCHG APRDRG AGE1 AGE2
-3.176167 0.386647 0.001000 0.006167 -0.617258 -2.449325
AGE3 AGE4 AGE5 AGE6 AGE7 AGE8
-10.508603 NA -4.098314 -3.942341 -0.707063 NA
AGE9 AGE10 AGE11 AGE12 AGE13 AGE14
-6.699560 -2.685859 -0.852399 -2.133080 -1.185292 -0.726483
AGE15 AGE16 AGE17 RACE2 RACE3 RACE4
-1.006001 -1.657240 -2.128020 -3.777826 NA -0.372907
RACE5 RACE6
1.947843 NA

> summary(model)

Call:
lm(formula = LOS ~ ., data = trainingSet)

Residuals:
Min 1Q Median 3Q Max
-13.8799 -0.4313 -0.0781 0.3619 13.2214

Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.176e+00 5.459e-01 -5.818 1.43e-08 ***
GENDER 3.866e-01 2.176e-01 1.777 0.076547 .
TOTCHG 1.000e-03 3.842e-05 26.029 < 2e-16 ***
APRDRG 6.167e-03 7.905e-04 7.802 8.39e-14 ***
AGE1 -6.173e-01 7.812e-01 -0.790 0.430010
AGE2 -2.449e+00 1.926e+00 -1.272 0.204438
AGE3 -1.051e+01 1.942e+00 -5.410 1.23e-07 ***
AGE4 NA NA NA NA
AGE5 -4.098e+00 1.403e+00 -2.920 0.003742 **
AGE6 -3.942e+00 1.392e+00 -2.833 0.004906 **
AGE7 -7.071e-01 1.391e+00 -0.508 0.611488
AGE8 NA NA NA NA
AGE9 -6.700e+00 1.383e+00 -4.843 1.98e-06 ***
AGE10 -2.686e+00 9.656e-01 -2.781 0.005728 **
AGE11 -8.524e-01 7.244e-01 -1.177 0.240176
AGE12 -2.133e+00 6.097e-01 -3.498 0.000533 ***
AGE13 -1.185e+00 5.072e-01 -2.337 0.020039 *
AGE14 -7.265e-01 4.864e-01 -1.494 0.136229
AGE15 -1.006e+00 4.795e-01 -2.098 0.036657 *
AGE16 -1.657e+00 4.684e-01 -3.538 0.000462 ***
AGE17 -2.128e+00 4.267e-01 -4.987 1.00e-06 ***
RACE2 -3.778e+00 1.462e+00 -2.584 0.010189 *
RACE3 NA NA NA NA
RACE4 -3.729e-01 1.348e+00 -0.277 0.782182
RACE5 1.948e+00 1.379e+00 1.413 0.158735
RACE6 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.873 on 325 degrees of freedom
Multiple R-squared: 0.6853, Adjusted R-squared: 0.665
F-statistic: 33.7 on 21 and 325 DF, p-value: < 2.2e-16
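
Some background on the NA coefficients, as a hedged sketch: `lm()` reports NA "because of singularities" when a predictor is an exact linear combination of others. With hand-made dummy variables this typically happens when a dummy is created for every level of a category (the dummies then sum to the intercept column), or when a level has no rows in the training set so its dummy is all zeros. The data below is simulated purely for illustration and is not the hospital data.

```r
set.seed(1)
n <- 60
race <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
y <- rnorm(n)

# A dummy for EVERY level: d_A + d_B + d_C equals the intercept column
d_A <- as.numeric(race == "A")
d_B <- as.numeric(race == "B")
d_C <- as.numeric(race == "C")
trap <- lm(y ~ d_A + d_B + d_C)
print(coef(trap))          # the last dummy comes out NA

# Passing the factor itself lets lm() drop one reference level
ok <- lm(y ~ race)
print(coef(ok))            # no NA coefficients
```

`alias(model)` lists which terms are linearly dependent on others, which can help locate the offending dummy columns.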

NAGARAJA RAO DIWAKAR

Member
Hi Sabya Sir,
If I treat age group as a continuous variable, the model's R^2 value decreases, but I no longer get NA coefficients. The RACE variables have the highest p-values. Should I remove RACE to improve the model for length of stay?
Please suggest how to proceed further on this project.