
Data Science Certification Training - R Programming | Jan 30, 31; Feb 6, 7, 13, 14, 20, 21, 27, 28 | Pratul Goyal

The class trainer, pratul.goyal111, shared files through Google Drive with all attendees of the Data Science Certification - R Programming batch scheduled from 30th Jan to 28th Feb 2021, but I have still not received any shared files. I think it is a case of a mismatched email ID; the correct email ID is binodranchi@gmail.com, so please rectify this. Also, on community.simplilearn my name is displayed as "Binod Kumar_3" when it should be just "Binod Kumar"; a correction is needed here as well.
Hoping the problems will be rectified as soon as possible.
With regards,
 
I am still not able to download the files shared by Trainer Pratul Goyal; this needs attention. Without the reference files, how can I practice? Please look into the matter.
Hopefully looking forward to your help.
Thanks and regards
 
Please raise a ticket
 

Aquib Shaikh

New Member
Hello sir,
For the project Comcast Telecom Consumer Complaints, how do we do the following:
- Provide a table with the frequency of complaint types (i.e., create a contingency table)?
Thanks
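A rough sketch of a one-way frequency table and a contingency table in R; the data frame name comcast and its column names are assumptions, since the dataset itself is not shown here:

# Sketch: 'comcast' and its column names are placeholders for the project data.
freq_table <- table(comcast$Customer.Complaint)   # one-way frequency of complaint types
freq_df <- as.data.frame(freq_table)              # columns Var1 / Freq
head(freq_df[order(-freq_df$Freq), ])             # most frequent complaint types first

# A two-way contingency table, e.g. complaint type vs. ticket status:
table(comcast$Customer.Complaint, comcast$Status)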
 
Sir, can you please explain the syntax of this ggplot call?
ggplot(data = vc, mapping = aes(x = reorder(Var1, Freq),
                                y = Freq,
                                col = Var1)) +
  geom_bar(stat = 'identity', fill = 'white') +
  xlab('Type') +
  ylab('Frequency') +
  ggtitle('Types of car') +
  theme(legend.title = element_text())

Specifically, why do we use 'identity' for stat here, and what does the last line with theme() do?
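For context, a minimal sketch contrasting the two stat options, using the built-in mtcars data because vc itself is not shown. stat = 'count' makes geom_bar() tally rows, while stat = 'identity' plots the y values as supplied, which is what a pre-computed frequency table needs; the theme() line only styles the legend title, where element_text() with no arguments keeps the default and element_blank() would remove it.

library(ggplot2)

vc <- as.data.frame(table(mtcars$cyl))   # columns Var1 (cyl) and Freq (count)

# Freq already holds the counts, so pass the values through unchanged:
ggplot(vc, aes(x = Var1, y = Freq)) +
  geom_bar(stat = 'identity')

# With raw (uncounted) rows, geom_bar() counts them itself:
ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(stat = 'count')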
 

Ani Nayak

Guest
For the Internet project, there is a Time column in the dataset (numeric, with range 0 - 50000).
On doing feature scaling with the scale function on the Time column, the range becomes about -5 to 115.
The expected range is -3.29 to 3.29 (as per the class on feature scaling). Is this correct, or does the Time variable need to be handled differently?
 

Attachment: Scale - Timeinpage.png
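For reference, scale() standardizes a column to mean 0 and standard deviation 1; the roughly -3.29 to 3.29 range quoted in class assumes an approximately normal distribution, so a heavily skewed column such as Time can legitimately produce much larger z-scores. A minimal sketch, where the data frame name internet and the column name Timeinpage are assumptions:

# Sketch: 'internet' and 'Timeinpage' are placeholder names.
time_scaled <- scale(internet$Timeinpage)   # (x - mean(x)) / sd(x)
mean(time_scaled)                           # ~0
sd(time_scaled)                             # ~1
range(time_scaled)                          # can exceed +/- 3.29 if the column is skewed

# Equivalent manual standardization:
z <- (internet$Timeinpage - mean(internet$Timeinpage, na.rm = TRUE)) /
  sd(internet$Timeinpage, na.rm = TRUE)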

Ani Nayak

Guest
Also, I need guidance on outlier treatment for the Time column.
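A minimal sketch of IQR-based outlier capping for such a column; the names internet and Timeinpage are again assumptions, and the 1.5 * IQR fences are just one common choice:

# Sketch: cap (winsorize) values outside the IQR fences instead of dropping rows.
q     <- quantile(internet$Timeinpage, c(0.25, 0.75), na.rm = TRUE)
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

internet$Timeinpage <- pmin(pmax(internet$Timeinpage, lower), upper)
boxplot(internet$Timeinpage)   # check the distribution after capping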
 

Ani Nayak

Guest
For Task 3, on feature selection using backward elimination, the p-value is low for more than one variable. Can it be concluded that the exit outcome is affected by multiple variables?
 

Attachment: Feature Seletion - more than 1 variable.png
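If several predictors remain after backward elimination with small p-values, that does suggest the exit outcome is associated with more than one variable jointly. A rough sketch of the procedure, with churn and Exited as placeholder names:

# Sketch: 'churn' and 'Exited' are placeholders for the project data and target.
full_model <- glm(Exited ~ ., data = churn, family = binomial)
summary(full_model)          # inspect each predictor's p-value

# Manually: drop the predictor with the largest p-value above 0.05, refit, repeat.
# step() automates a similar backward elimination using AIC instead of p-values:
reduced_model <- step(full_model, direction = "backward")
summary(reduced_model)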

Edmond_2

New Member
Hello sir, I'm facing a slight challenge with Project 1. I keep getting errors when I try to predict on the test set. Below is my code:

data <- read.csv(file.choose(),stringsAsFactors = TRUE)

data <- (read.csv('C:/Users/oseia/OneDrive/Desktop/1555052405_datasets/1555052405_datasets.csv',
stringsAsFactors = TRUE))
setwd('C:/Users/oseia/OneDrive/Desktop')
getwd()
View(data)
str(data)
summary(data$Style)
sapply(data,class)
data = data.frame(data)
View(data)
str(data)
data$Style = as.factor(data$Style)   # as.factor() works on a single column, not the whole data frame
head(data)
ncol(data)
nrow(data)
length(data)
row.names(data)
colnames(data)

library(dplyr)
select(data, Style)   # dplyr::select() takes the data frame and then the column name
data2 = data[,2]
data3 <- read.csv(file.choose(),stringsAsFactors = TRUE)
View(data3)
data4 = merge(data,data3)
View(data4)
str(data4)
summary(data4)
sapply(data4,class)
data4$X.11 = NULL
data3[,25:36] = NULL
View(data3)
ncol(data4)
nrow(data4)
dim(data4)



# DATA PREPROCESSING

# Take Care of the Missing Values in data3
# impute missing values in the date columns with each column's mean
date_cols = c("X26.9.2013", "X12.10.2013", "X10.10.2013", "X30.9.2013",
              "X2.10.2013", "X3.10.2013", "X8.10.2010")
for (col in date_cols) {
  data3[[col]] = ifelse(is.na(data3[[col]]),
                        mean(data3[[col]], na.rm = TRUE),
                        data3[[col]])
}

View(data4)

levels(data$Style)
levels(data$Price)
levels(data$Size)
levels(data$Season)
levels(data$NeckLine)
levels(data$SleeveLength)
levels(data$waiseline)
levels(data$Material)
levels(data$FabricType)
levels(data$Decoration)
levels(data$Pattern.Type)

View(data)

data[c(86,143,146,147,150,153,155,177,179,180,182,183,
192,196,211,214,219,221,224,227,230,233,235,246,
248,250,251,308,310,317,328,332,338,354,356,357,358,359,
381,388,408,430,438),'Price'] <- "Low"

data[c(259,260,262,266,270,279,284,285,295,302,
350,399,447,473,477),'Price'] <- "High"

data[c(263,264),"Price"] <- "Low"
data[c(296,373),'Size'] <- "S"
data[c(187,271,64),'Season'] <- "Spring"
data[c(17,251),'Season'] <- "Summer"

data[c(440,444,446,450,452,453,464,479),'Season'] <- "Automn"
data[c(148,149,195,196,199,240,241,242,243,244,245,
266,331,346,347,349,365,366,381,395,396,399,400,
429,439,454,466,469,474,477,478,479,484,
487,488,491,495,499),'Season'] <- "Winter"
data[c(262,264,385),'NeckLine'] <- "Sweetheart"

library(plyr)
levels(data$Style)[levels(data$Style)=="sexy"] = "Sexy"

levels(data$Price)[levels(data$Price)=="high"] = "High"
levels(data$Price)[levels(data$Price)==""] = "Low"
levels(data$Price)[levels(data$Price)=="low"] = "Low"
levels(data$Style)[levels(data$Style)=="sexy"] = "Sexy"


levels(data$Season)[levels(data$Season)=="spring"] = "Spring"
levels(data$Season)[levels(data$Season)==""] = "Spring"
levels(data$Season)[levels(data$Season)=="summer"] = "Summer"

levels(data$Season)[levels(data$Season)=="Autumn"] = "Automn"
levels(data$Season)[levels(data$Season)=="winter"] = "Winter"

levels(data$NeckLine)[levels(data$NeckLine)=="sweetheart"] = "Sweetheart"

levels(data$NeckLine)[levels(data$NeckLine)=="NULL"] = "Sweetheart"
levels(data$NeckLine)[levels(data$NeckLine)==""] = "Sweetheart"

levels(data$Size)[levels(data$Size)=="s"] = "S"
levels(data$Size)[levels(data$Size)=="small"] = "S"

data4 = merge(data,data3)

# Encoding the categorical data
data4$Style = factor(data4$Style,
levels = c('bohemian','Brief','Casual','cute','fashion','Flare','Novelty','OL','party','Sexy','vintage','work'),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12))

data4$Price = factor(data4$Price,
levels = c("Average","High","Low","Medium","very-high"),
labels = c(1,2,3,4,5))

data4$Size = factor(data4$Size,
levels = c("free","L","M","S","XL"),
labels = c(1,2,3,4,5))


data4$Season = factor(data4$Season,
levels = c("Automn","Spring","Summer","Winter"),
labels = c(1,2,3,4))


data4$NeckLine = factor(data4$NeckLine,
levels = c("backless","boat-neck","bowneck","halter","mandarin-collor",
"o-neck","open",
"peterpan-collor","ruffled","Scoop",
"slash-neck","sqare-collor",
"Sweetheart","turndowncollor","v-neck"),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))


data4$SleeveLength = factor(data4$SleeveLength,
levels = c("butterfly","cap-sleeves","capsleeves",
"full","half","halfsleeve",
"NULL","Petal","short",
"sleeevless","sleeveless","sleevless",
"sleveless","threequarter","threequater",
"thressqatar","turndowncollor","urndowncollor"),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))


data4$waiseline = factor(data4$waiseline,
levels = c("","dropped","empire","natural","null","princess"),
labels = c(1,2,3,4,5,6))


data4$Material = factor(data4$Material,
levels = c("","acrylic","cashmere",
"chiffonfabric","cotton","knitting",
"lace","linen","lycra",
"microfiber","milksilk","mix",
"modal","model","null",
"nylon","other","polyster",
"rayon","shiffon","silk",
"sill","spandex","viscos","wool"),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25))

data4$FabricType = factor(data4$FabricType,
levels = c( "","batik","broadcloth","chiffon",
"Corduroy","dobby","flannael","flannel",
"jersey","knitted","knitting","lace",
"null","organza","other","poplin",
"satin","sattin","shiffon","terry",
"tulle","wollen","woolen","worsted" ),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24))

data4$Decoration = factor(data4$Decoration,
levels = c( "","applique","beading","bow",
"button","cascading","crystal","draped",
"embroidary","feathers","flowers","hollowout",
"lace","none","null","pearls",
"plain","pleat","pockets","rivet",
"ruched","ruffles","sashes","sequined",
"tassel","Tiered"),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26))

data4$Pattern.Type = factor(data4$Pattern.Type,
levels = c("","animal","character","dot","floral",
"geometric","leapord","leopard","none","null",
"patchwork","plaid","print","solid","splice",
"striped"),
labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
View(data4)

data4$Recommendation = factor(data4$Recommendation,levels = c(1,0))
# the "ï.." prefix in the column name comes from a UTF-8 byte-order mark;
# reading the CSV with fileEncoding = "UTF-8-BOM" avoids it
data4$ï..Dress_ID = as.factor(data4$ï..Dress_ID)

data5 = data4[,2:14]
View(data5)

# Splitting the data into the training and test set
library(caTools)
set.seed(12)   # set the seed before sampling so the split is reproducible
split = sample.split(data5$Recommendation, SplitRatio = 0.8)
training_set = subset(data5,split == T)
test_set = subset(data5,split == F)
View(test_set)
View(training_set)
length(test_set)

#Apply PCA
library(caret)
library(e1071)
pca = preProcess(x = training_set[-13], method = 'pca', pcaComp = 2)
training_set1 = predict(pca, training_set)   # transform the existing sets; training_set1/test_set1 do not exist yet
test_set1 = predict(pca, test_set)
View(training_set1)
training_set2 = training_set1[c(2,3,1)]
test_set2 = test_set1[c(2,3,1)]
View(training_set2)
View(test_set2)
# LOGISTIC REGRESSION
data5 = data4[,2:14]
View(data5)
View(data4)

#No need for Feature Scaling because there is only one numeric column/Variable in the data
library(tidyverse)
library(caret)
# Fit the Model to the Training Set
classifier = glm(formula = Recommendation ~ .,
                 family = binomial,
                 data = training_set1)
summary(classifier)$coef

# Predicting the test set results (note: the second call below overwrites the first)
prob_pred = classifier %>%
  predict(test_set1, type = 'response')

prob_pred = predict(classifier, type = 'response', newdata = test_set1[-1])   # [-1] drops the first column of test_set1; check that this is the intended (target) column
 

Ani Nayak

Guest
Please check the graphs for linear regression with and without outliers.
Is this correct?
 

Attachments: Graph before outlier removal.png, Graph after outlier removal.png

Ani Nayak

Guest
Does the NA value in the summary mean that the variable can be eliminated?
 

Attachment: NA Value in Summary.png
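An NA coefficient in an lm() summary usually means the variable is perfectly collinear with other predictors, so R dropped it from the fit, which is not the same as the variable being unimportant. A toy sketch (not the project data) reproducing the behavior:

# Sketch: x2 is an exact linear function of x1, so its coefficient cannot be estimated.
set.seed(1)
df <- data.frame(x1 = rnorm(100))
df$x2 <- 2 * df$x1
df$y  <- 3 * df$x1 + rnorm(100)

m <- lm(y ~ x1 + x2, data = df)
summary(m)    # x2 shows an NA coefficient ("not defined because of singularities")
alias(m)      # lists the aliased (linearly dependent) terms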
Hello Pratul,
Thanks for the sessions. I am a little confused about moving from PCA to modeling. I used PC1 and PC2 in my models, and I am not sure how to translate that into the importance of my predictors (GRE and GPA) for predicting Admit in Project 5.
Please have a look; your comments will be very much appreciated.
Thanks,
anita
 

Attachment: source_code_Capstone_project_AG.pdf
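The rotation (loadings) matrix of the PCA is what ties PC1 and PC2 back to the original predictors. A minimal sketch, assuming a data frame admission with lower-case columns gre and gpa:

# Sketch: 'admission', 'gre' and 'gpa' are placeholder names.
pca <- prcomp(admission[, c("gre", "gpa")], center = TRUE, scale. = TRUE)
pca$rotation    # loadings: the weight of gre and gpa in PC1 and PC2
summary(pca)    # proportion of variance explained by each component

# A large absolute loading means that original variable drives the component,
# so a model coefficient on PC1 or PC2 can be traced back through these weights.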
I don't know if I am doing this right. I also had issues creating modelm1 <- lm(TOTCHG ~ AGE + FEMALE).
 

Attachment: RTEST1.pdf
Do I need to type the following out into the body of the R script

# The p-value comes out very high (about 68%), so we fail to reject the null hypothesis;
# this means there is no significant relation between the race of the patient and the hospital cost.

to make

modelm1 <- lm(TOTCHG ~ AGE + FEMALE)
summary(modelm1)
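Lines starting with # are comments and are ignored by R, so they do not have to be typed for the model to run. A minimal sketch of the models above with an explicit data argument, where the data frame name hosp is an assumption:

# Sketch: 'hosp' is a placeholder for the hospital costs data frame.
hosp$RACE   <- as.factor(hosp$RACE)
hosp$FEMALE <- as.factor(hosp$FEMALE)

# Race vs. cost: a high p-value means we fail to reject the null hypothesis,
# i.e. no evidence that race influences hospital cost.
summary(aov(TOTCHG ~ RACE, data = hosp))

# Age and gender vs. cost:
modelm1 <- lm(TOTCHG ~ AGE + FEMALE, data = hosp)
summary(modelm1)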
 

Ani Nayak

Guest
As discussed during the class, could you please share the sample code for normalization (scaling)?
Also, could you please share some reference code for iterating over columns for outlier treatment and boxplots?
 

pratul.goyal111

Well-Known Member
(quoting anita's question above about moving from PCA to modeling)
Try to create a contingency table.
 

pratul.goyal111

Well-Known Member
(quoting the request above for normalization sample code and an outlier-treatment/boxplot iterator)

Refer to this official documentation of R
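A rough sketch of min-max normalization and of a loop that draws a boxplot and caps outliers for each numeric column; the data frame name df is a placeholder:

# Sketch: 'df' is a placeholder data frame with some numeric columns.
normalize <- function(x) (x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

num_cols <- sapply(df, is.numeric)
df[num_cols] <- lapply(df[num_cols], normalize)   # min-max scaling to [0, 1]

# Iterate over the numeric columns, drawing a boxplot and capping outliers
# at the 1.5 * IQR fences:
for (col in names(df)[num_cols]) {
  boxplot(df[[col]], main = col)
  q <- quantile(df[[col]], c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  df[[col]] <- pmin(pmax(df[[col]], q[1] - fence), q[2] + fence)
}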
 
# College admission Project
#lgr model apply

dataset_lgr = read.csv(file.choose())
View(dataset_lgr)
dataset_lgr$admit=factor(dataset_lgr$admit,levels = c(0,1))

library(caTools)
set.seed(123)
split=sample.split(dataset_lgr$admit,SplitRatio=0.7)
training_set=subset(dataset_lgr,split==T)
test_set=subset(dataset_lgr,split==F)

library(dplyr)
# scale the gre and gpa columns (columns 2 and 3); the lambda form replaces the deprecated funs()
training_set = training_set %>%
  mutate_at(c(2, 3), ~ c(scale(.)))

test_set = test_set %>%
  mutate_at(c(2, 3), ~ c(scale(.)))



classifier = glm(formula = admit ~ .,
                 family = binomial,
                 data = training_set)
prob_prediction=predict(classifier,type = 'response',newdata = test_set)
y_pred=ifelse(prob_prediction>0.5,1,0)

cm=table(test_set$admit,y_pred>0.5)
Accuracy_score=(71+8)/120*100
 
I tried a logistic regression model on the College Admission project; please have a look. Questions: why is the total number of observations in y_pred 400 when it should be 120, and why are we using y_pred > 0.5 in the confusion matrix table?
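A minimal sketch that builds the confusion matrix from the 0/1 predictions and derives the accuracy from the table instead of hard-coding the counts, reusing the objects from the block above. If y_pred ends up with 400 values, check that newdata points at the 120-row test_set rather than the full dataset_lgr.

cm <- table(Actual = test_set$admit, Predicted = y_pred)   # y_pred is already 0/1, no extra > 0.5 needed
cm
accuracy <- sum(diag(cm)) / sum(cm) * 100                  # accuracy computed from the table
accuracy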
 
# Decision Tree on college admission project.
data=read.csv(file.choose())
View(data)
attach(data)

library(dplyr)
data=mutate(data,admit=factor(admit))

library(caTools)
set.seed(123)
sample=sample.split(admit,SplitRatio=0.7)
training_set=subset(data,sample==T)
test_set=subset(data,sample==F)


training_set = training_set %>%
  mutate_at(c(2, 3), ~ c(scale(.)))

test_set = test_set %>%
  mutate_at(c(2, 3), ~ c(scale(.)))

library(rpart)
decission_tree=rpart(admit~.,data = training_set)

tree_s_pred = predict(decission_tree,test_set,type = 'class')

library(caret)
confusionMatrix(tree_s_pred,test_set$admit)
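A short sketch for visualizing the fitted tree and computing test-set accuracy, assuming the rpart.plot package is installed and reusing the objects from the block above:

library(rpart.plot)            # assumption: rpart.plot is installed
rpart.plot(decission_tree)     # visualize the fitted tree

# Accuracy on the test set from the class predictions above:
mean(tree_s_pred == test_set$admit)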
 

Ani Nayak

Guest

Refer to this official documentation of R
I am getting the message "No documentation for ‘normalize’ in specified packages and libraries" for the normalize function in the practice lab. Please advise.
 

Attachment: No normalize function.png
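normalize() is not part of base R, which is why the help lookup fails; it ships with add-on packages or has to be defined in the lab script. A minimal sketch of base-R alternatives:

exists("normalize")                                       # FALSE in a clean R session
minmax <- function(x) (x - min(x)) / (max(x) - min(x))    # hand-rolled min-max scaling
minmax(c(10, 20, 30, 40))                                 # 0.0000 0.3333 0.6667 1.0000
scale(c(10, 20, 30, 40))                                  # base-R standardization (z-scores)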
Hey Learners,
Please post your queries here.
Hello Pratul,
I have received a message saying I have failed my Machine Learning project with a 0% solution to the problem. As you are aware, it is the Data Science with R course that you taught, and I have not submitted a Machine Learning project; I only submitted R Project 5 on Sunday, 28th February 2021. On the assessment site it says my project is being reviewed. Please rectify this problem so that I can obtain my certificate sooner rather than later.
thanks
anita
 
Hello Pratul,

I completed the project and submitted it on Monday, but it is still under review and I don't know whether I passed or not. Please review it and let me know if I need to submit another one or whether this one is fine for obtaining the certificate.
 

Ani Nayak

Guest
I have also not received the feedback for my project yet. Has anyone from our batch received feedback?
 