Data Science With R Programming | Madhuparna | Sep 25 - Oct 24

bit.madhuparna786

Moderator
Alumni
Trainer

x <- factor(c("male", "female", "female", "male"))
x

typeof(x)

#output

typeof(x)
[1] "integer"
Why does typeof(x) show "integer" instead of "factor"?
x
[1] male female female male
Levels: female male

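This output itself shows that you are dealing with a factor: "Levels" are printed only for factors (categorical variables). typeof() reports the internal storage type, and a factor is stored as integer codes, which is why it returns "integer"; class(x) returns "factor".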

Happy Coding :)

---Madhu
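
A short sketch of the distinction above, using the same vector. typeof() reports how R stores the object internally, while class() reports what kind of object it is; the comments describe what each call is expected to return.

```r
x <- factor(c("male", "female", "female", "male"))

typeof(x)      # internal storage type: "integer"
class(x)       # object class: "factor"
is.factor(x)   # TRUE
levels(x)      # "female" "male" (sorted alphabetically by default)
as.integer(x)  # 2 1 1 2, the integer codes that index into levels(x)
```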
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello Madhu,
Assignment 1 solution
#Homework
# Q1 Make a simple comparison between two numbers x=5 and y=9 and print the larger number
#using the "IF" statement.
# Method 1
x<-5
y<-9
if(x>y)
{
print("The value of x is greater")
} else
{
print("The value of y is greater")
}
Output:
[1] "The value of y is greater"

or


x<-5
y<-9
if(x>y)
{
print(paste("The value", x, "is greater"))
} else
{
print(paste("The value", y,"is greater"))
}

Output:
[1] "The value 9 is greater"


# Assume two variables have the same value: x = y = 8.
# Use "if", "else if" and "else" statements to check the three conditions:
#   x and y are equal
#   x > y
#   x < y
# Specify the result that should be printed in each case.
# Finally, let R print the right result.

x<-y<-8
x
y
if(x>y){
print("x is bigger than y")
}else if (x==y){
print("x and y, both are equal")
}else{
print("y is bigger than x")
}

Output:
[1] "x and y, both are equal"
Way to go, Lakshmi!

Keep Coding :)

---------Madhu
 

Vibha Shishodiya

Active Member
Hello Madhu,
I hope you and everyone else are doing well!
My doubt: when we use the same function, why is the output format different?
Please help.
df<-data.frame(name,age,sex, stringsAsFactors=FALSE)
> df
name age sex
1 Joe 27 M
2 John 26 M
3 Nancy 26 F
> df<-data.frame(name<-c("Joe","John","Nancy"),age<-c(27,25,24),sex<-c('M','M','F'),stringsAsFactors = FALSE)
> df
name....c..Joe....John....Nancy.. age....c.27..25..24. sex....c..M....M....F..
1 Joe 27 M
2 John 25 M
3 Nancy 24 F

Regards
 

Why do we get the header line name....c..Joe....John....Nancy.. age....c.27..25..24. sex....c..M....M....F..? We just want our column names as headers.
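
One likely explanation, sketched below: inside data.frame(), `name <- c(...)` is an assignment expression passed as an unnamed argument, so R builds the column name from the text of the expression. Writing `name = c(...)` names the argument instead. The values are taken from the post; `df2` is a hypothetical name used only for this illustration.

```r
# Using '=' names each argument, so the columns are called name, age, sex.
df <- data.frame(
  name = c("Joe", "John", "Nancy"),
  age  = c(27, 25, 24),
  sex  = c("M", "M", "F"),
  stringsAsFactors = FALSE
)
names(df)  # "name" "age" "sex"

# Using '<-' passes unnamed expressions; R derives long names from the code text.
df2 <- data.frame(name <- c("Joe", "John", "Nancy"), age <- c(27, 25, 24))
names(df2)

# If a data frame already has such names, they can be replaced afterwards.
names(df2) <- c("name", "age")
```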
 
Hello @bit.madhuparna786
I hope you're doing well.

I've been trying to practice since last time but I'm totally stuck.

I cannot set the working directory in the LMS R lab (see the attached pictures). After I click on the three dots, I don't get the same options as you, and even if I enter the path manually it just doesn't work.

Also, I successfully installed the xlsx package in my local R installation, but I cannot run library(xlsx). Here is the error message I get:
Error: package or namespace load failed for ‘xlsx’:
.onLoad failed in loadNamespace() for 'rJava', details:
call: fun(libname, pkgname)
error: JAVA_HOME cannot be determined from the Registry


Can you please help me to figure it out?
 

Attachments: 2.png, 3.png

Vibha Shishodiya

Active Member
Hello Madhu

Getting this error message

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

> detach("package:dplyr", unload = TRUE)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union


Regards
 

Lakshmi Narashimman

Active Member
Hello Madhu,

I am unable to execute the write.xlsx() function in RStudio for the code that you wrote. The console says that the function is not found, even though the xlsx package is installed in that RStudio. Could you please tell me what I should do to execute it?
 

Lakshmi Narashimman

Active Member
Hello Madhu,

I have a doubt. I tried applying both range and square root to a list using the sapply function, and it shows an error. Can't we apply multiple operations to the list and generate the output using sapply? Below is the code that gave the error.

Code:

no1<-list(a=c(5,5),b=c(7,7),d=c(9,9))
no1
sapply(no1,range,sqrt)


When I applied sqrt or range separately, I got the result. But when I applied both operations together, I got the error. Is there a way to apply multiple operations to the entire list using the sapply function?
 
Hi Vaishnavi,

I am facing multiple issues with the LMS.

1. This is the second time that, when I try to attend the Live Class, it shows no class available.
I had to leave the first batch by Arindam and then join Madhu's batch after checking with your support. The third time, the same issue occurred: it said there were no classes for me, and when the class eventually showed up it said Meeting Locked.

2. All of a sudden the multicasting option has stopped working on your LMS, so I cannot use your system on my Smart TV, tablet or cell phone. I have raised the issue again with your Tech Support; so far no one has replied.

3. I cannot hear anything on the recordings and cannot find a logical reason why. I tried everything at my end and called your support; no response yet.

The only answer I get is "we have escalated and someone will get back to you".

Looking forward to your guidance and assistance on this.

If someone could call me it would be great.

Regards
 
For people having problems running xlsx functions and files exporting issues

I finally solved my issue regarding xlsx by following the steps here:

For me, it was apparently because my OS and RStudio were 64-bit, but the Java version I was using was 32-bit by default. So I manually installed the 64-bit version of Java, and now my xlsx functions run correctly. I can also finally export png and other files.

Hope it will help some other people.
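
A small diagnostic sketch for the 32-bit/64-bit mismatch described above. It only reports what the current R session sees and does not change anything; the variable names are hypothetical.

```r
# Architecture of the running R session, e.g. "x86_64" for 64-bit builds.
r_arch <- R.version$arch

# Pointer size in bytes: 8 on a 64-bit R, 4 on a 32-bit R.
pointer_bytes <- .Machine$sizeof.pointer

# JAVA_HOME as seen by R; an empty string means the variable is not set.
java_home <- Sys.getenv("JAVA_HOME")

cat("R architecture:", r_arch, "\n")
cat("Pointer size  :", pointer_bytes, "bytes\n")
cat("JAVA_HOME     :", if (nzchar(java_home)) java_home else "(not set)", "\n")
```

If R is 64-bit, the Java installation that rJava finds generally needs to be 64-bit as well, which matches the fix reported in the post.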
 

Vibha Shishodiya

Active Member
Hello Madhu
I have just checked for the feedback window, but it has not popped up and it does not appear in the live classes either.
Kindly send the link, if there is one.
Regards
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello Madhu,

I have a doubt. I tried executing both range and square root for a list using the sapply function. It shows error. Can't we execute multiple operations to the list and generate the output using the sapply function? Below is the code for which I got error.

Code:

no1<-list(a=c(5,5),b=c(7,7),d=c(9,9))
no1
sapply(no1,range,sqrt)


When I applied sqrt or range separately, I got the result. But when I applied for both these operations together, I got the error. Is there a way where we can execute multiple operations for the entire list using sapply function?
Don't execute both operations in one call.
Execute them one at a time:
sapply(no1,sqrt)
sapply(no1,range)

-----Happy Coding
Madhu
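
A possible way to combine the two steps, offered as a sketch rather than the trainer's method. sapply(X, FUN, ...) forwards any extra arguments to FUN, so in sapply(no1, range, sqrt) the function sqrt is handed to range() as if it were data, which is why the call fails. Wrapping both steps in one anonymous function avoids this; `sqrt_of_range` and `both` are hypothetical names.

```r
no1 <- list(a = c(5, 5), b = c(7, 7), d = c(9, 9))

# One function per call: wrap both steps in an anonymous function.
# Here sqrt() is applied to the result of range() for each list element.
sqrt_of_range <- sapply(no1, function(v) sqrt(range(v)))
sqrt_of_range  # 2 x 3 matrix: one column per list element

# Alternatively, keep the two results side by side in a named list.
both <- lapply(no1, function(v) list(range = range(v), sqrt = sqrt(v)))
both$d$range  # 9 9
both$d$sqrt   # 3 3
```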
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello Madhu,

I am unable to execute the write.xlsx() function in the r-studio for the code that you have written. The console shows that the function is not found. But the package of xlsx is installed in that r-studio. Could you please tell me what should do to execute it?
Lakshmi,

It should run.
Make sure you have loaded the package with the library() function, i.e. library(xlsx).
We will look into it in the upcoming class.

----Madhu
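
A quick check, sketched under the assumption that the package in question is xlsx, of whether a package is merely installed or actually attached. A "function not found" message usually means the second condition is FALSE.

```r
pkg <- "xlsx"

# Is the package installed in any library R knows about?
is_installed <- pkg %in% rownames(installed.packages())

# Is it attached to the search path (i.e. has library(xlsx) succeeded)?
is_attached <- paste0("package:", pkg) %in% search()

cat("installed:", is_installed, " attached:", is_attached, "\n")

# A function from an installed package can also be called without attaching it,
# using the package::function form, for example xlsx::write.xlsx(...).
```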
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello Madhu

Getting this error message

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

> detach("package:dplyr", unload = TRUE)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union


Regards
Vibha,

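This is not an error :)

These are informational messages: dplyr defines functions with the same names as some functions in stats and base, so the dplyr versions take precedence while dplyr is attached.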

------Happy Coding!
Madhu
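
A base-R sketch of what "masked" means in practice: the original functions are still available through their package prefix. The example does not need dplyr to be installed.

```r
# stats::filter() applies a linear filter to a numeric series
# (here a 3-point moving average); the ends are NA.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg

# base::union() works on plain vectors regardless of what dplyr masks.
base::union(c(1, 2), c(2, 3))  # 1 2 3

# To load dplyr without printing the masking messages:
# suppressPackageStartupMessages(library(dplyr))
```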
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello @bit.madhuparna786
I hope you're doing well.

I've been trying to practice since last time but I'm totally stuck.

I cannot set the wd in LMS R lab (see the attached pics). After I click on the three dots, I don't get the same options as you and even if enter the path manually it just doesn't work.

Also, I could successfully installed xlsx packaged using my R local program but I cannot run library(xlsx). Here is the error message I get:
Error : the loading of the package or the names space has failed for ‘xlsx’ :
.onLoad has failed in loadNamespace() for 'rJava', details :
call : fun(libname, pkgname)
error : JAVA_HOME cannot be determined from the Registry


Can you please help me to figure it out?
Hi Serena,

You can set the working directory using the setwd() function.

Step 1. Copy the path from the address bar of the folder where you want to save your output files.
Step 2. In the editor window, type setwd() and paste the copied path, in quotes, inside the brackets.

setwd("C:\Users\RIYANRIYA\Desktop\DS_R_FULL\data_R_SL")   # Example: the path as pasted

Step 3. Now change the backslashes to forward slashes.

setwd("C:/Users/RIYANRIYA/Desktop/DS_R_FULL/data_R_SL")   # Example: see how the path changes

Step 4. Now execute the line.

Step 5. Check that the working directory is set using the getwd() function.

getwd()
[1] "C:/Users/RIYANRIYA/Desktop/DS_R_FULL/data_R_SL"

Please follow these steps and let me know.

----Happy Coding :)
Madhu
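
The slash conversion in the steps above can also be done in code. This is a sketch: `win_path` and `r_path` are hypothetical names, and the working-directory part uses the session's temporary folder so that it runs on any machine.

```r
# A Windows path as copied from the address bar (backslashes must be doubled
# inside an R string literal, otherwise "\U" is read as an escape sequence).
win_path <- "C:\\Users\\RIYANRIYA\\Desktop\\DS_R_FULL\\data_R_SL"

# Convert backslashes to forward slashes programmatically.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path  # "C:/Users/RIYANRIYA/Desktop/DS_R_FULL/data_R_SL"

# Demonstration with a folder that exists on every machine: the session temp dir.
old_wd <- getwd()
setwd(tempdir())
getwd()          # now points at the temporary directory
setwd(old_wd)    # restore the previous working directory
```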
 

Vaishnavi Chauhan

Active Member
Simplilearn Support
Customer
Hii everyone !! :)

Hope you all are doing great !!

You can create a WhatsApp group, but please share only one link for the group and all will join the same.

Also, if in case you have any queries or issues with the LMS please reply to this comment and I will help you with the same.

Have a good day all!!

Happy Learning !! :D
 

Vaishnavi Chauhan

Active Member
Simplilearn Support
Customer
Hi Vaishnavi,

I am facing multiple issues with the LMS.

1. This is the second time it has happened that when I try to attend the Live Class it shows no class available.
I had to leave the first batch by Arindam and then Join Madhu's batch after checking with your support. On Third the same issue occurred when it said no classes for me and when it eventually showed up it said Meeting Locked.

2. All of a sudden the Multicasting options has stopped on your LMS due to which I cannot use your system on my Smart T.V , Tablet or Cell Phone. I have again raised the issue again with you Tech Support so far no one has replied.

3. I cannot hear anything on the recordings and cannot find a logical reason on why its doing so. I tried everything at my end and called you support no response yet.

The only answer I get is we have escalated and someone will get back to you.

Looking forward for your guidance and assistance on the same.

If someone could call me it would be great.

Regards
Hii Sabarish,

Greetings !!

I will call you within some time and we shall resolve these queries.
 

Lakshmi Narashimman

Active Member
Hello Madhu,

Can you elaborate on the table() function?

What is the difference between the table() function and the data.frame() function? Please explain.
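
A minimal sketch of the difference, using made-up values: data.frame() stores the observations themselves, while table() counts how often each value occurs.

```r
sex <- c("male", "female", "female", "male")
age <- c(27, 25, 24, 30)  # hypothetical ages for illustration

# data.frame(): stores the raw observations, one row per person.
people <- data.frame(sex = sex, age = age)
nrow(people)  # 4

# table(): counts how many times each value occurs.
counts <- table(people$sex)
counts

# A table can be converted to a data frame of counts (columns Var1 and Freq).
counts_df <- as.data.frame(counts)
names(counts_df)  # "Var1" "Freq"
```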
 

Lakshmi Narashimman

Active Member
Hello @bit.madhuparna786

I executed these two pieces of code for barplots.

What code should I execute so that the months from Jan to September are displayed on the barplot?

I tried using the data.frame() function to first create a table consisting of the months and the average marks of the class, but the console returned an error.

How do I make sure that all the months from Jan to Sep are shown on the barplot?

Please guide me.

The code that returned an error

Class_Eighth_Average_Marks<-c(56,76,67,83,65,91,45,59,61)
Class_Eighth_Average_Marks
Exam_months<-c("Jan","Feb","March","April","May","June","July","August","September")
Exam_months
Class_eight_marks_monthwise<-data.frame(Exam_months, Class_Eighth_Average_Marks)
Class_eight_marks_monthwise
barplot(Class_eight_marks_monthwise,horiz=TRUE, col="red")


The correct code

Class_Eighth_Average_Marks<-c(56,76,67,83,65,91,45,59,61)
Class_Eighth_Average_Marks
barplot(Class_Eighth_Average_Marks,horiz=TRUE, xlab="Average Marks of Class Eight", ylab="Months", main="Average Marks scored by the Class Eighth Monthwise",col="red")

1634147680141.png


This correct code draws the marks as a barplot, but it does not display the months against the average marks.

How do I make sure that the months from Jan to Sep are displayed in the barplot against the marks?

Please clarify the doubt.
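
One way to get the month labels, as a sketch: barplot() expects a numeric vector or matrix as its first argument, which is likely why passing the whole data frame fails, and its names.arg argument supplies the label for each bar. `bar_midpoints` is a hypothetical name for the value barplot() returns.

```r
Class_Eighth_Average_Marks <- c(56, 76, 67, 83, 65, 91, 45, 59, 61)
Exam_months <- c("Jan", "Feb", "March", "April", "May", "June",
                 "July", "August", "September")

# Pass the numeric vector as the bar heights and the months as bar labels.
# las = 1 keeps the month labels horizontal next to each bar.
bar_midpoints <- barplot(
  Class_Eighth_Average_Marks,
  names.arg = Exam_months,
  horiz = TRUE,
  las = 1,
  col = "red",
  xlab = "Average Marks of Class Eight",
  main = "Average Marks Scored by Class Eight, Month by Month"
)

# With a data frame, extract the columns instead of passing the frame itself:
# barplot(df$Class_Eighth_Average_Marks, names.arg = df$Exam_months, horiz = TRUE)
```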
 

Lakshmi Narashimman

Active Member
Hello Madhu,

I have a doubt. What is the difference between the density() function and the plot() function in a kernel density plot? Why do we use these two functions separately? Please elaborate. Below is the code you explained in the previous class.


#kernel density data

#The main difference between a kernel density plot (KDP) and a histogram is that the histogram has frequency/counts on the Y axis,
#whereas the KDP has density on the Y axis and the area under the curve equals 1

library(dplyr)
density_data<-density(mtcars$mpg)
plot(density_data)

# Filling the density plot with color
density_data <- density(mtcars$mpg)
plot(density_data, main="Kernel Density of Miles Per Gallon")
# polygon() is the important function here: it fills in the area under the curve.
polygon(density_data, col="skyblue", border="black")
 

Lakshmi Narashimman

Active Member
Please explain kernel density plots. Why are they used, and for what real-world purposes? How does a kernel density plot differ from a histogram? When do we use each of them?
 

bit.madhuparna786

Moderator
Alumni
Trainer
Hello Madhu,

I have a doubt. What is the difference between the density () function and plot () function in the kernel density. Why do we use both these functions separately? Please elaborate. Below is the code you explained in the previous class.


#kernel density data

#The main difference between a kernel density plot (KDP) and a histogram is that the histogram has frequency/counts on the Y axis,
#whereas the KDP has density on the Y axis and the area under the curve equals 1

library(dplyr)
density_data<-density(mtcars$mpg)
plot(density_data)

# Filling the density plot with color
density_data <- density(mtcars$mpg)
plot(density_data, main="Kernel Density of Miles Per Gallon")
# polygon() is the important function here: it fills in the area under the curve.
polygon(density_data, col="skyblue", border="black")
Hi Lakshmi,

The density() function computes the kernel density estimate (KDE) and returns it as an object; it does not draw anything. The plot() function is a graphics function that draws that object and has several options to control the appearance of the KDE display.

Happy Learning :)

---Madhu
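
A sketch of the split between computing and drawing, using the same mtcars data. The area calculation is an added illustration (trapezoidal rule) and the variable names `dx`, `mid_heights` and `area` are hypothetical.

```r
# density() only computes the estimate; nothing is drawn yet.
density_data <- density(mtcars$mpg)
class(density_data)        # "density"
length(density_data$x)     # 512 grid points by default

# The area under the estimated curve is approximately 1 (trapezoidal rule).
dx <- diff(density_data$x)
mid_heights <- (head(density_data$y, -1) + tail(density_data$y, -1)) / 2
area <- sum(dx * mid_heights)
area

# plot() draws the object; hist(..., freq = FALSE) gives a comparable scale.
hist(mtcars$mpg, freq = FALSE, main = "Histogram with density overlay", xlab = "mpg")
lines(density_data, col = "blue")
```

Because the histogram is drawn with freq = FALSE, both displays use density on the y-axis, which makes the two directly comparable.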
 

Lakshmi Narashimman

Active Member
Please explain the difference between a confidence interval and a confidence level with an easy example. It seems a bit confusing. I found the explanation below online. Is it misleading, or is it correct?

1634382552903.png
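
A small illustration of the two terms, as a sketch on the built-in mtcars data rather than anything from the class: the confidence level is the coverage you choose (for example 95%), and the confidence interval is the range computed from the sample at that level.

```r
# Confidence level: the chosen coverage, e.g. 0.95.
# Confidence interval: the numeric range computed from the sample at that level.
result_95 <- t.test(mtcars$mpg, conf.level = 0.95)
ci_95 <- result_95$conf.int
ci_95
attr(ci_95, "conf.level")  # 0.95

# Raising the level to 99% widens the interval for the same data.
ci_99 <- t.test(mtcars$mpg, conf.level = 0.99)$conf.int
diff(ci_99) > diff(ci_95)  # TRUE
```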
 

Vibha Shishodiya

Active Member
Hello Madhu
I have following doubts:


Doubt 1
How do I change the order of the columns of a dataset?
E.g. in the iris dataset, what should I do to make the Species column the first column instead of the last?

Doubt 2
The arrange() function in dplyr:
what does across() mean here?
arrange(across(starts_with("Petal.Width"), desc))

Doubt 3
append(1:5, 0:1 By "aa", after = 3)
##1 2 3 0 1 4 5 ##0,1 appended but aa not.

Doubt 4
#Step 2: Defining Age for each generation

# Generation/YearFromTo
# Z Born 1996 and later 0-22
# Y Born 1977 to 1995 22-41
# X Born 1965 to 1976 41-53
# Baby Boomers Born 1946 to 1964 53-Above
Is the generation assignment just an arbitrary choice? In practice, do we have to derive it mathematically, or do we simply pick the ranges?

Please also share the barplot code, as I did not note it down.

Regards
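
Some base-R sketches for Doubts 1, 3 and 4 above. They are illustrations under stated assumptions, not the code used in class: the ages are made up, and treating each upper bound of an age band as inclusive is an assumption, since the ranges listed in the post overlap at 22, 41 and 53.

```r
# Doubt 1: move Species to the front using base R column indexing.
iris_reordered <- iris[, c("Species", setdiff(names(iris), "Species"))]
names(iris_reordered)[1]  # "Species"

# Doubt 3: append() inserts the values given in its second argument.
append(1:5, 0:1, after = 3)           # 1 2 3 0 1 4 5
# To insert "aa" as well, it has to be part of that second argument;
# mixing numbers and text turns the whole result into character strings.
append(1:5, c(0:1, "aa"), after = 3)

# Doubt 4: cut() assigns each age to a band using fixed break points.
age <- c(20, 30, 45, 60)  # hypothetical ages
generation <- cut(age,
                  breaks = c(0, 22, 41, 53, Inf),
                  labels = c("Z", "Y", "X", "Baby Boomers"),
                  include.lowest = TRUE)
as.character(generation)  # "Z" "Y" "X" "Baby Boomers"
```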
 

Vibha Shishodiya

Active Member
Hello Madhu
While writing to Excel for Demo 2 of the bank data, I got the format below in Excel, which is not the same as yours. Is this correct?
To get the same view as your Excel table, do we have to edit the table in Excel itself, or have I made a mistake?
Please guide.

write.xlsx( Final_Table, "C:/Users/dell/OneDrive/Desktop/R Programming/R madhuparna/Day 3-4/R Export Write/Demo Bank Result.xlsx",sheetName = "Sheet1", col.names = TRUE, row.names = TRUE, append = FALSE)




    Var1          Var2     Freq
1   Baby Boomers  failure    79
2   X             failure   128
3   Y             failure   282
4   Z             failure     1
5   Baby Boomers  other      25
6   X             other      58
7   Y             other     112
8   Z             other       2
9   Baby Boomers  success    36
10  X             success    27
11  Y             success    65
12  Z             success     1
13  Baby Boomers  unknown   610
14  X             unknown  1126
15  Y             unknown  1959
16  Z             unknown    10
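
A possible reason for the layout above, offered as a sketch: converting a two-way table with as.data.frame() gives this long Var1/Var2/Freq form. The code below rebuilds the counts from the post and reshapes them into a wide cross-table before export; `long`, `wide` and `wide_df` are hypothetical names, and whether the trainer's sheet was produced this way is an assumption.

```r
# Long-format counts as shown in the post.
generations <- c("Baby Boomers", "X", "Y", "Z")
long <- data.frame(
  Var1 = rep(generations, times = 4),
  Var2 = rep(c("failure", "other", "success", "unknown"), each = 4),
  Freq = c(79, 128, 282, 1,  25, 58, 112, 2,  36, 27, 65, 1,  610, 1126, 1959, 10)
)

# xtabs() rebuilds the two-way table: generations in rows, outcomes in columns.
wide <- xtabs(Freq ~ Var1 + Var2, data = long)
wide

# as.data.frame.matrix() keeps that wide layout, which is the shape to export.
wide_df <- as.data.frame.matrix(wide)
wide_df["Y", "unknown"]  # 1959
```

Passing `wide_df` instead of the long table to the export call should then write one row per generation and one column per outcome.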
 

Lakshmi Narashimman

Active Member
Hello @bit.madhuparna786 ,

I worked through simple linear regression modelling on my own, with a new data set from Kaggle, and got the code to run successfully following the code you shared.

The data consisted of 119,040 observations. It was a task that asked the data analysts to find out if the daily minimum temperature impacted the daily maximum temperature. Also, it asked to predict the daily maximum temperature given the daily minimum temperature. Overall, it asked whether the daily maximum temperature and daily minimum temperature had a significant relationship.

I wanted to share the workings of my code that I practiced from the Kaggle Dataset. I am unable to share the table as there are 119,040 observations found in this data set.

Please provide me with some valuable suggestions based on how I have done this particular coding and provided the results.

Here is the link for the data set


Weather Conditions in World War Two: Is there a relationship between the daily minimum and maximum temperature? Can you predict the maximum temperature given the minimum temperature? (https://www.kaggle.com/smid80/weatherww2/version/1)

I have worked out the coding here for this particular problem. Also, I have shared the output at the end of the coding too.

Code: R-program


# Is there a significant relationship between the daily minimum and maximum temperature?
# Can you predict the maximum temperature given the minimum temperature?
#Daily maximum temperature is modelled as a function of daily minimum temperature (MaxTemp ~ MinTemp)
#Null hypothesis: There is no significant relationship between the daily Maximum temperature and daily minimum temperature
#Alternative hypothesis: There is a significant relationship between the Daily maximum temperature and daily minimum temperature

#Importing the dataset

library(gdata)
library(readxl)
weather_data <- read_excel("Summary of Weather.xlsx")
View(weather_data)
print(weather_data)

#Viewing the columns or names of the data set
names(weather_data)

#Indexing to keep only the min and max temperature columns of the data set
weather_data<-weather_data[,5:6]
weather_data
names(weather_data)
weather_data$MaxTemp

#83328 Training set (70%)
#35712 Test Set (30%)
#Splitting or dividing the data set into training set and test set

library(caTools)
set.seed(123)
split<-sample.split(weather_data$MaxTemp, SplitRatio = 0.7)
training_set=subset(weather_data, split==TRUE)
test_set=subset(weather_data, split==FALSE)

#Fitting simple linear model to the training set

Classifier= lm(formula = MaxTemp~MinTemp, data=training_set)
Classifier
summary(Classifier)

#Predicting the test set results

y_est<-predict(Classifier, newdata=test_set)
y_est

#Data Visualization for training data set

library(ggplot2)
ggplot()+
geom_point(aes(x=training_set$MinTemp, y=training_set$MaxTemp),colour='red')+
geom_line(aes(x=training_set$MinTemp,y=predict(Classifier, newdata = training_set)), colour='blue')+
ggtitle('MaxTemp vs MinTemp (Training set)')+
xlab('MinTemp')+
ylab('MaxTemp')


#Data Visualization for the test data set

ggplot()+
geom_point(aes(x=test_set$MinTemp,y=test_set$MaxTemp), colour='blue')+
geom_line(aes(test_set$MinTemp, y=predict(Classifier, newdata = test_set)),colour='red')+
ggtitle('MaxTemp vs MinTemp (Test set)')+
xlab('MinTemp')+
ylab('MaxTemp')


#Output and Results that were found in the console.

#ADJUSTED R-SQUARED IS 0.7722.
#This means about 77 percent of the variance in daily maximum temperature
#is explained by the daily minimum temperature in this linear model.
#The p-value is less than 2.2e-16, that is, less than 0.00000000000000022.
# When p is low, the null must go.
#As the p-value is less than 0.05, the null hypothesis is rejected in favour of the alternative hypothesis.
# This shows that there is a significant relationship between the daily maximum temperature and the daily minimum temperature.


Data Visualization for the predicted training data set



1634848858381.png


Data Visualization for the predicted test data set

1634848920295.png
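
One suggestion, as a sketch: besides R-squared on the training set, the predictions can be scored on the test set, for example with the root mean squared error. The code below uses simulated data, because the Kaggle file is not reproduced here; the coefficients, sample size and the names `weather`, `model` and `rmse` are all hypothetical, and base R sampling stands in for caTools.

```r
set.seed(123)

# Simulated stand-in for the weather data (hypothetical values, not the Kaggle file).
n <- 1000
MinTemp <- runif(n, min = -5, max = 30)
MaxTemp <- 10 + 0.9 * MinTemp + rnorm(n, sd = 4)
weather <- data.frame(MinTemp = MinTemp, MaxTemp = MaxTemp)

# 70/30 split using base R row sampling.
train_rows <- sample(seq_len(n), size = round(0.7 * n))
training_set <- weather[train_rows, ]
test_set <- weather[-train_rows, ]

# Fit on the training set only.
model <- lm(MaxTemp ~ MinTemp, data = training_set)
summary(model)$adj.r.squared  # share of variance in MaxTemp explained by MinTemp

# Evaluate on the held-out test set: root mean squared error, in degrees.
predicted <- predict(model, newdata = test_set)
rmse <- sqrt(mean((test_set$MaxTemp - predicted)^2))
rmse
```

The RMSE is in the same units as MaxTemp, so it reads as a typical prediction error in degrees.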
 

Lakshmi Narashimman

Active Member
Hello @bit.madhuparna786

When I was working through some z-test problems, I came across functions such as z.test(), prop.test() and binom.test(). They relate to the z-test that we worked out in class using the pnorm() function. Could you please explain how these three functions differ from pnorm() in R?


Also, there were some problems where I did not know how to write the code. Please help me with them. Should we work with the pnorm() function or with functions such as z.test(), prop.test() and binom.test()? Here are the questions.

Problem 1:

#A school principal claims that the school's students are more intelligent than average.
#On calculating the IQ scores of 50 students, the average turns out to be 11.
#The population mean IQ is 100 and the standard deviation is 15.
#State whether the principal's claim is right or not at a 5% significance level.

Problem 2:

#Let’s assume that 30 out of 70 people recommend Street Food to their friends.
#To test this claim, a random sample of 150 people obtained.
#Of these 150 people, 80 indicate that they recommend Street Food to their friend.
#Is this claim accurate? Use alpha = 0.05.

Problem 3:

#Suppose that 10 volunteers have done an intelligence test; here are the results obtained.
#The mean obtained at the same test, from the entire population is 75.
#You want to check if there is a statistically significant difference (at a 5% significance level, i.e. a 95% confidence level)
#between the means of the sample and the population, assuming that the sample variance is known and equal to 18. (one sample z test)

a <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
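
A sketch of how Problems 2 and 3 could be worked, under stated assumptions. pnorm() only converts a z value into a probability, so the z statistic has to be computed by hand; prop.test() and binom.test() are in base R's stats package and do the whole test in one call, while z.test() is not part of base R and comes from add-on packages such as BSDA. Treating the variance of 18 as known, and testing two-sided, are assumptions taken from the wording of the problems.

```r
# Problem 3: one-sample z-test computed by hand with pnorm().
a <- c(65, 78, 88, 55, 48, 95, 66, 57, 79, 81)
mu0 <- 75          # population mean under the null hypothesis
sigma2 <- 18       # variance treated as known (assumption from the problem text)

z <- (mean(a) - mu0) / sqrt(sigma2 / length(a))
p_two_sided <- 2 * pnorm(-abs(z))
c(z = z, p = p_two_sided)

# Problem 2: one-sample proportion test, H0: p = 30/70.
p0 <- 30 / 70
p_hat <- 80 / 150
z_prop <- (p_hat - p0) / sqrt(p0 * (1 - p0) / 150)
p_prop <- 2 * pnorm(-abs(z_prop))

# prop.test() without continuity correction reports z_prop^2 as X-squared.
pt <- prop.test(x = 80, n = 150, p = p0, correct = FALSE)
c(z_squared = z_prop^2, X_squared = unname(pt$statistic))

# binom.test() is the exact (binomial) alternative for the same hypothesis.
binom.test(x = 80, n = 150, p = p0)$p.value
```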
 

Vibha Shishodiya

Active Member
Hello Madhu
Which data did you use for this, and how do we calculate fstat?
##########ANOVA EXAMPLE######

pf(fstat, df1, df2, lower.tail = FALSE)


# fstat - the value of the f-statistic
# df1 - degrees of freedom 1
# df2 - degrees of freedom 2
# lower.tail - whether or not to return the probability associated
# with the lower tail of the F distribution. This is TRUE by default.

pf(2.23,3,16,lower.tail = FALSE)

#[1] 0.1241814 ------out
Regards
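
The data behind the values 2.23, 3 and 16 is not stated in the thread (degrees of freedom of 3 and 16 would correspond to 4 groups and 20 observations). As a sketch of where fstat comes from, the example below runs a one-way ANOVA on the built-in PlantGrowth data, which is a substitute chosen for illustration. The F statistic is the between-group mean square divided by the within-group mean square, and pf() turns it into the p-value.

```r
# One-way ANOVA on the built-in PlantGrowth data (3 groups, 30 observations).
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]
anova_table

# Extract the F statistic and its degrees of freedom from the table.
fstat <- anova_table[["F value"]][1]
df1 <- anova_table[["Df"]][1]   # between groups: number of groups - 1
df2 <- anova_table[["Df"]][2]   # within groups: observations - number of groups

# pf() reproduces the p-value reported in the Pr(>F) column.
p_value <- pf(fstat, df1, df2, lower.tail = FALSE)
p_value
```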
 