DS with R | Nimisha | Jan 4th

_90528

Member
Can someone help me with how to submit the project?
The upload form accepts only a few formats. How do I convert an R file into a PDF, or can I copy and paste the code?
 
@Nimisha Pandey Ma'am, kindly let us know if we have a session tomorrow morning; the LMS is not showing any sessions scheduled for tomorrow.
Please have a look at the code below for Project 4 (Insurance data). Kindly let me know if I am on the right track for analyzing the data.
Your mentoring is highly appreciated.
Many thanks.

########## Project - 4 ######### INSURANCE ##########

# Load the Insurance data to memory
ins_data <- read.csv(file.choose())

# View the data, summarise and identify the structure of data
View(ins_data)
summary(ins_data)
str(ins_data)

# convert Kilometres, Zone and Make variables to Factor

ins_data$Kilometres <- as.factor(ins_data$Kilometres)
ins_data$Zone <- as.factor(ins_data$Zone)
ins_data$Make <- as.factor(ins_data$Make)

str(ins_data)

#### Data cleaning #####
#### All values less than 0 are converted to NA and then dropped ####
library(dplyr)
library(tidyr)

ins_data <- ins_data %>%
  mutate(Insured = replace(Insured, Insured < 0, NA),
         Claims = replace(Claims, Claims < 0, NA),
         Payment = replace(Payment, Payment < 0, NA))

ins_data <- ins_data %>%
  drop_na()

str(ins_data)
summary(ins_data)

#### Data Visualization before Modeling ####
library(RColorBrewer)

# Visualizing the percentage share of Insurance claims by Make of cars

vc <- table(ins_data$Make)
perc <- round(table(ins_data$Make)/sum(table(ins_data$Make)) * 100, 2)
pie(vc, radius = 1,
    labels = paste(names(vc), '- ', perc, '%', sep = ''),
    main = 'Pie Chart',
    col = brewer.pal(6, 'Set2'),
    border = 'white')

##### Building a Linear Regression Model

res <- lm(Payment ~ ., data=ins_data)
summary(res)

##### View the confidence intervals of the model coefficients
confint(res)
##### Check multicollinearity with variance inflation factors (VIF)
library(car)  # provides vif() and outlierTest()
vif(res)
##### Having a look at the outliers
outlierTest(res)

##### Plotting highly correlated variables
cor(ins_data$Insured, ins_data$Payment)
cor(ins_data$Claims, ins_data$Payment)

plot(ins_data$Insured, ins_data$Payment)
plot(ins_data$Claims, ins_data$Payment)


#### Aggregating data using group by for Zone, Kilo, Bonus and Make

ins_data %>%
  group_by(Zone) %>%
  summarise(Payment = mean(Payment), Insured = mean(Insured), Claims = mean(Claims)) %>%
  data.frame()

# Zone 4 has the Highest Payment

ins_data %>%
  group_by(Kilometres) %>%
  summarise(Payment = mean(Payment), Insured = mean(Insured), Claims = mean(Claims)) %>%
  data.frame()

# Kilometres 2 has the Highest Payment

ins_data %>%
  group_by(Bonus) %>%
  summarise(Payment = mean(Payment), Insured = mean(Insured), Claims = mean(Claims)) %>%
  data.frame()

# Bonus 7 has the Highest Payment

ins_data %>%
  group_by(Make) %>%
  summarise(Payment = mean(Payment), Insured = mean(Insured), Claims = mean(Claims)) %>%
  data.frame()

# Make 9 has the Highest Payment


##### Model for predicting the Claims ######
 

Angalakuduru Harsha Vardhan

Active Member
Alumni
Can someone help me with how to submit the project?
The upload form accepts only a few formats. How do I convert an R file into a PDF, or can I copy and paste the code?
Some instructions are available in Word format under the Self Learning tab in the LMS; they may explain how to upload and which file formats are accepted.
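On the R-to-PDF part of the question: one possible route, assuming the rmarkdown package, pandoc, and a LaTeX installation are available on your machine, is to render the script itself into a report. The file name below is a made-up placeholder, and this is only a sketch, not the LMS's official procedure.

```r
# Hypothetical script standing in for your own project file.
script <- file.path(tempdir(), "project4_insurance.R")
writeLines(c(
  "#' Project 4: Insurance",
  "x <- c(1, 2, 3)",
  "summary(x)"
), script)

# rmarkdown::render() accepts a plain .R script and turns it into a report.
# PDF output needs pandoc and LaTeX; if those are missing, "html_document"
# or "word_document" may be easier to produce.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  # try() keeps the session alive if LaTeX turns out not to be installed.
  out <- try(
    rmarkdown::render(script, output_format = "pdf_document", quiet = TRUE),
    silent = TRUE
  )
}
```

Replace `script` with the path to your own .R file. The report lands next to the script unless you pass `output_dir`.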
 

Priya_170

Member
Ma'am, for outlier detection I am using boxplots:
boxplot(df$admit)
boxplot(df$gre) ## Outlier
boxplot(df$gpa) ## Outlier
Please help me with outlier treatment.
Project 5: College Admission
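One common way to treat the points a boxplot flags, offered here as a sketch and not as the course's required method, is the 1.5 x IQR rule: either drop the flagged rows or cap them at the fences. The scores below are made up for illustration and are not the College Admission data.

```r
# Made-up GRE scores with one extreme low value (not the project data).
gre <- c(520, 540, 560, 580, 600, 620, 640, 660, 680, 220)

# Fences at 1.5 * IQR beyond the quartiles, similar to the rule
# boxplot() uses to draw its whiskers.
q <- quantile(gre, c(0.25, 0.75))
fence <- 1.5 * IQR(gre)
lower <- q[[1]] - fence
upper <- q[[2]] + fence

is_outlier <- gre < lower | gre > upper

# Option A: drop the flagged values.
gre_dropped <- gre[!is_outlier]

# Option B: cap the flagged values at the fences instead of losing rows.
gre_capped <- pmin(pmax(gre, lower), upper)
```

With a data frame you would apply the same logical vector to the rows, for example `df[!is_outlier, ]`. Which option is appropriate depends on whether the extreme values are data errors or genuine observations.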
 
Hello Co-learners,
I would like to share with you a package in R called esquisse.
install.packages("esquisse")
library(esquisse)
esquisser(dataset)

This add-in lets us build different plots from our dataset just by dragging and dropping variables into the relevant fields. Once you are satisfied with the plot, you can copy the generated code and put it alongside your own code.
For more details, please visit the link below.
https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html
I am sure you will all like this package; it makes visualization very comfortable and a breeze.
Have a great time with R!
 
I tried the esquisse add-in; it makes creating charts and visualizations in R much easier. Thanks for the suggestion :).
 
Hi ma'am, I am not able to get the invoice dates; the column shows NA. Can you please help me with this?
##loading Data
df_data <- read.csv("C:/Users/34nav/Downloads/Ecommerce.csv", stringsAsFactors=TRUE)
View(df_data)
###cleaning data#####
df_data<-df_data %>% mutate(Quantity = replace(Quantity,Quantity<=0,NA),UnitPrice = replace(UnitPrice,UnitPrice<=0,NA))
df_data
df_data<-df_data %>% mutate(InvoiceNo = as.factor(InvoiceNo),InvoiceDate = as.Date(InvoiceDate,"%d%M%Y"),StockCode = as.factor(StockCode),CustomerID = as.factor(CustomerID),Country = as.factor(Country))
df_data
 

Nimisha Pandey

Well-Known Member
Alumni
Trainer
I am stuck at Project 2. If any of you have completed Project 2, please help.

Question: Provide state wise status of complaints in a stacked bar chart.
In this case there are a total of 43 states. How can this be presented in a stacked chart?
Showing all 43 states in a single bar chart may not be suitable. Use only 5 or 6 in the bar plot; for example, you can select the top 5 and bottom 5.
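The "top 5 and bottom 5" suggestion can be sketched in base R as below. The data frame, its column names (`State`, `Status`) and all counts are invented for illustration; the real project file may be laid out differently.

```r
# Made-up complaint data (not the project file): 12 states, 78 complaints.
n_per_state <- c(Georgia = 12, Florida = 11, California = 10, Illinois = 9,
                 Tennessee = 8, Texas = 7, Ohio = 6, Utah = 5,
                 Maine = 4, Iowa = 3, Nevada = 2, Kansas = 1)
complaints <- data.frame(
  State  = rep(names(n_per_state), times = n_per_state),
  Status = rep_len(c("Closed", "Closed", "Open"), sum(n_per_state)),
  stringsAsFactors = FALSE
)

# Rank states by number of complaints, most frequent first.
ranked <- names(sort(table(complaints$State), decreasing = TRUE))

# Keep the five busiest and the five quietest states.
keep <- c(head(ranked, 5), tail(ranked, 5))
subset_df <- complaints[complaints$State %in% keep, ]
subset_df$State <- factor(subset_df$State, levels = keep)

# Rows = status, columns = state; barplot() stacks the rows by default.
counts <- table(subset_df$Status, subset_df$State)
barplot(counts, legend.text = TRUE, las = 2,
        main = "Complaint status: top 5 and bottom 5 states")
```

Setting the factor levels to `keep` is what fixes the bar order; without it the states would be plotted alphabetically.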
 
Hi Nimisha, I did not attend the 22nd Jan 2021 class, and the video recording does not have audio.
I request you to please provide an alternative so that I can get an idea of the topics covered.
 
Showing all 43 states in a single bar chart may not be suitable. Use only 5 or 6 in the bar plot; for example, you can select the top 5 and bottom 5.

I have tried putting State on the Y axis (a horizontal bar chart). It looks OK, but it is still congested.

How do we sort the data frame in ascending order of the number of complaints per State? I could do it in a table, but we need to pass a data frame to ggplot. Please guide.

Also, there are 2 entries for District of Columbia; we see 2 of them because of the upper-case "O" in one of them. How do I change it in the data?

Checking for duplicates
#District of Columbia 1 0 1
#District Of Columbia 14 2 16
# HOW TO HANDLE THIS?

Complaint.type has repetitive values in different character lengths and cases. How can I use a function to check whether the value in Complaint.type contains "Internet" and return "Internet Issues" in a new column, so that I can classify them as
"Internet Issues"
"Billing Issues"
"service Issues"
"Others"
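For the duplicate state spelling and the sorting question, a minimal base R sketch follows. The `telecom` data frame and its `State` column are assumptions based on the names used in this thread, and the rows are made up.

```r
# Made-up rows (not the project data) with the two spellings from the post.
telecom <- data.frame(
  State = c("District of Columbia", "District Of Columbia",
            "District Of Columbia", "Georgia", "Georgia", "Georgia",
            "Georgia", "Texas"),
  stringsAsFactors = FALSE
)

# If State was read in as a factor, work on the character values instead.
telecom$State <- as.character(telecom$State)

# Map the odd spelling onto the one you want to keep.
telecom$State[telecom$State == "District of Columbia"] <- "District Of Columbia"

# Count complaints per state as a data frame (columns: State, Freq).
freq <- as.data.frame(table(State = telecom$State), stringsAsFactors = FALSE)

# Sort ascending by frequency, then lock that order into the factor levels
# so a plotting function keeps it instead of sorting alphabetically.
freq <- freq[order(freq$Freq), ]
freq$State <- factor(freq$State, levels = freq$State)
```

The resulting `freq` is a data frame, so it can be passed to ggplot directly, for example with `State` on one axis and `Freq` on the other.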
 

Angalakuduru Harsha Vardhan

Active Member
Alumni
Can you please upload a file listing the packages that need to be installed in R, based on the agenda taught in the very first class? It would be a good support. Thanks.
 

Thank you Vivian! Nice of you!
 

SWATI SHAYNA

Active Member
Alumni
I have tried putting State on the Y axis (a horizontal bar chart). It looks OK, but it is still congested.

How do we sort the data frame in ascending order of the number of complaints per State? I could do it in a table, but we need to pass a data frame to ggplot. Please guide.

Also, there are 2 entries for District of Columbia; we see 2 of them because of the upper-case "O" in one of them. How do I change it in the data?

Checking for duplicates
#District of Columbia 1 0 1
#District Of Columbia 14 2 16
# HOW TO HANDLE THIS?

Complaint.type has repetitive values in different character lengths and cases. How can I use a function to check whether the value in Complaint.type contains "Internet" and return "Internet Issues" in a new column, so that I can classify them as
"Internet Issues"
"Billing Issues"
"service Issues"
"Others"
If I have understood your query correctly, the following line of code might help.

grep() matches a pattern within strings, and setting ignore.case = TRUE lets it match both lower-case and upper-case spellings.
Ex: Service_Issue <- grep("service", telecom$Customer.Complaint, ignore.case = TRUE)
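Building on the grep() idea, the four categories from the question can be filled into a new column with grepl(), which returns TRUE/FALSE per row. This is a sketch on made-up complaint text; the column name `Complaint.Category` and the keyword patterns are my own assumptions.

```r
# Made-up complaints (not the project data).
telecom <- data.frame(
  Customer.Complaint = c("Comcast Internet speed", "BILLING error",
                         "internet outage", "Poor customer service",
                         "Billing dispute", "Data cap"),
  stringsAsFactors = FALSE
)

# Start with the fallback label, then overwrite where a keyword matches.
# Later assignments win, so the priority here is
# Internet > Billing > Service > Others.
category <- rep("Others", nrow(telecom))
txt <- telecom$Customer.Complaint
category[grepl("service",  txt, ignore.case = TRUE)] <- "Service Issues"
category[grepl("bill",     txt, ignore.case = TRUE)] <- "Billing Issues"
category[grepl("internet", txt, ignore.case = TRUE)] <- "Internet Issues"

telecom$Complaint.Category <- category
```

A complaint that mentions both "internet" and "service" ends up under "Internet Issues" with this ordering; reorder the three assignments if you want a different priority.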
 
# Convert data type of InvoiceDate from Factor to Date

> class(ecommerce$InvoiceDate)
[1] "factor"

> ecommerce$InvoiceDate <- as.Date(ecommerce$InvoiceDate, "%d/%m/%Y")
> class(ecommerce$InvoiceDate)
[1] "Date"

> ecommerce$InvoiceDate
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[ ... further rows of console output, all NA ... ]
[ reached 'max' / getOption("max.print") -- omitted 405829 entries ]

@Nimisha ma'am, kindly help: why am I getting NA on conversion?
 
Hi Nimisha, could you please help us with the 22-Jan-2021 audio recording issue?
Also, I uploaded my project on 24th Jan and have not received any feedback on it. Could you please give an update on the same?
 
Hi ma'am, I am not able to get the invoice dates; the column shows NA. Can you please help me with this?
##loading Data
df_data <- read.csv("C:/Users/34nav/Downloads/Ecommerce.csv", stringsAsFactors=TRUE)
View(df_data)
###cleaning data#####
df_data<-df_data %>% mutate(Quantity = replace(Quantity,Quantity<=0,NA),UnitPrice = replace(UnitPrice,UnitPrice<=0,NA))
df_data
df_data<-df_data %>% mutate(InvoiceNo = as.factor(InvoiceNo),InvoiceDate = as.Date(InvoiceDate,"%d%M%Y"),StockCode = as.factor(StockCode),CustomerID = as.factor(CustomerID),Country = as.factor(Country))
df_data


===============

library(lubridate)
Ecommerce$InvoiceDate <- dmy(Ecommerce$InvoiceDate)

I think it will help you
class(Ecommerce$InvoiceDate)
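Some background on why the conversions in this thread return NA: as.Date() gives NA whenever the format string does not match the raw text exactly, including the separators. Also note that `%M` means minutes, while `%m` means month. The date strings below are made up; check a few raw values in your own file first (for example with `head(as.character(Ecommerce$InvoiceDate))`) before choosing a format.

```r
# Made-up raw values as they might appear in a CSV column (an assumption,
# not the actual contents of Ecommerce.csv).
raw_dates <- factor(c("29-11-2016", "01-12-2016"))

# Separators in the format ("/") do not match the data ("-"): every value is NA.
wrong_sep <- as.Date(raw_dates, format = "%d/%m/%Y")

# %M is minutes, not month, and the separators are missing: NA again.
wrong_code <- as.Date(as.character(raw_dates), format = "%d%M%Y")

# Format mirrors the text exactly: day-month-four-digit year.
right <- as.Date(as.character(raw_dates), format = "%d-%m-%Y")

class(right)
sum(is.na(right))  # count how many values failed to parse
```

lubridate's dmy(), suggested above, guesses the separators for day-month-year text, which is why it is more forgiving than a hand-written format string.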
 