Data Science with R | Pulkit Taneja | Apr 5

Prajesh Sortee

Good evening, sir.
(Project 2 related)
I have one doubt: after plotting the day-wise and month-wise counts of complaints,
how do we sort the different complaint types? What function should we use?
Please help.
Hope this is helpful: you can use the grepl() function to detect patterns such as "internet" and "speeds", which mark internet and network issues respectively. Complaints that match neither pattern belong to an "other" category.
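
To make the grepl() suggestion concrete, here is a minimal sketch. The sample complaint titles, the labels "Internet", "Network" and "Other", and the shortened pattern "speed" are assumptions for illustration; the project data will need its own patterns.

```r
# Hypothetical sample of complaint titles; the real values come from the project CSV.
complaints <- c("Comcast Internet is slow", "Slow speeds in the evening",
                "Billing dispute", "internet outage again")

# grepl() returns TRUE/FALSE per element; ignore.case = TRUE makes the match case-insensitive.
is_internet <- grepl("internet", complaints, ignore.case = TRUE)
is_network  <- grepl("speed", complaints, ignore.case = TRUE)

# Assign one label per complaint; anything unmatched falls into "Other".
complaint_type <- ifelse(is_internet, "Internet",
                  ifelse(is_network, "Network", "Other"))

# Frequency of each type, most common first.
type_counts <- sort(table(complaint_type), decreasing = TRUE)
print(type_counts)
```

Because the nested ifelse() checks "internet" first, a complaint that matches both patterns is counted as Internet; reorder the checks if a different priority makes more sense.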
 

Prajesh Sortee

Good evening, sir.
(Project 2 related)
I have one doubt: after plotting the day-wise and month-wise counts of complaints,
how do we sort the different complaint types? What function should we use?
Please help.
Your question is answered above by Raymond George_1; I think his reply applies here.
 
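#Normalize the gre col:
# normalize() is assumed to be a vectorised helper defined earlier (for example min-max scaling).
# Apply it to the whole column at once so the result stays a numeric vector that hist() accepts;
# lapply() over the column returns one list element per value, which hist() cannot plot.
gre <- normalize(datac$gre)
hist(gre, col="Red", main="Graduate record exam scores_after_Norm")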


Can you please help me with this code?
 
Can anyone please help me with project #5 (college_admission)?

Descriptive:
Categorize the grade point average into High, Medium, and Low (with admission probability percentages) and plot it on a point chart.
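
One way to approach this is with cut() and tapply(), sketched below. The column names gpa and admit, the sample values, and the break points 3.0 and 3.5 are all assumptions for illustration; the project does not prescribe them.

```r
# Hypothetical data; the column names gpa and admit are assumptions here.
admissions <- data.frame(
  gpa   = c(2.5, 2.9, 3.1, 3.4, 3.6, 3.9),
  admit = c(0,   0,   1,   0,   1,   1)
)

# cut() bins a numeric vector; the break points 3.0 and 3.5 are illustrative only.
# right = FALSE makes the intervals [-Inf, 3.0), [3.0, 3.5), [3.5, Inf).
admissions$gpa_level <- cut(admissions$gpa,
                            breaks = c(-Inf, 3.0, 3.5, Inf),
                            labels = c("Low", "Medium", "High"),
                            right  = FALSE)

# The mean of a 0/1 column is the admitted share; multiply by 100 for a percentage.
admit_pct <- tapply(admissions$admit, admissions$gpa_level, mean) * 100
print(admit_pct)

# Point chart of admission percentage by GPA level.
plot(seq_along(admit_pct), admit_pct, xaxt = "n", pch = 19,
     xlab = "GPA level", ylab = "Admitted (%)")
axis(1, at = seq_along(admit_pct), labels = names(admit_pct))
```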
 
Doubts in projects:-

Project2:- Comcast
---Provide a table with the frequency of complaint types. For this question, I used:
freq_comp_types <- table(dataset$Customer.Complaint)
Is this OK, or do I need to do some other processing of the complaint values using GrpString?

Project5:- College Admission
-----Use variable reduction techniques to identify significant variables.
How do I perform variable reduction? I googled it and found something based on covariance. Should I go with that or with some other technique?
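
One commonly used option, offered here only as a sketch and not as the method the project expects, is backward stepwise selection on a logistic regression with step(). The simulated data and the variable names gre, gpa and noise are assumptions for illustration.

```r
# Simulated data: admit depends on gre and gpa, while noise is unrelated by construction.
set.seed(42)
n     <- 200
gre   <- rnorm(n, mean = 580, sd = 110)
gpa   <- rnorm(n, mean = 3.4, sd = 0.4)
noise <- rnorm(n)
p     <- plogis(-10 + 0.005 * gre + 2 * gpa)
admit <- rbinom(n, size = 1, prob = p)

# Fit the full logistic model, then let step() drop terms that do not improve AIC.
full    <- glm(admit ~ gre + gpa + noise, family = binomial)
reduced <- step(full, direction = "backward", trace = 0)

# Inspect which variables were kept and how significant they are.
print(formula(reduced))
print(summary(reduced)$coefficients)
```

Which terms survive depends on the data, so the kept variables should be read from the printed formula and not assumed in advance.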

Project7:- Healthcare cost analysis
----In the questions like below:-

1. To record the patient statistics, the agency wants to find the age category of people who frequently visit the hospital and has the maximum expenditure.
2. In order of severity of the diagnosis and treatments and to find out the expensive treatments, the agency wants to find the diagnosis-related group that has maximum hospitalization and expenditure.

"and" is used between the two conditions. Does the question ask us to combine both conditions in a single expression, or to find the results individually for each variable?
----In project 5, there is a question that asks us to identify and treat outliers. Do we need to treat outliers in the datasets for every project? The dataset in Project7 has a lot of outliers, and they are removed completely only after running the process below 3 times.

ggplot(dataset)+
geom_boxplot(aes(x=TOTCHG))

out_vals <- boxplot.stats(dataset$TOTCHG)$out

out_indx <- which(dataset$TOTCHG %in% out_vals)

dataset[out_indx,]$TOTCHG <- NA

dataset$TOTCHG <- impute(dataset$TOTCHG, mean)

ggplot(dataset)+
geom_boxplot(aes(x=TOTCHG))

PFA the boxplot of the same. View attachment 15417
Thanks.
Hi Sachin, did you get this question answered? Which project did you submit? I am also looking for help on the project.
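
On the outlier question quoted above: replacing flagged values with the mean changes the quartiles, so a fresh boxplot can flag new points, which is likely why several passes were needed. An alternative, sketched here under assumptions and not as the project's required treatment, is to cap values at the fences computed once from the original data. The simulated totchg vector is hypothetical and stands in for TOTCHG.

```r
# Hypothetical right-skewed charges standing in for TOTCHG.
set.seed(1)
totchg <- c(rexp(200, rate = 1 / 2000), 40000, 55000)

# Tukey fences (1.5 * IQR beyond the quartiles), computed once from the original data.
q     <- quantile(totchg, c(0.25, 0.75))
fence <- unname(c(q[1] - 1.5 * IQR(totchg), q[2] + 1.5 * IQR(totchg)))

# Cap values at the fences in a single pass instead of replacing them with the mean.
capped <- pmin(pmax(totchg, fence[1]), fence[2])

# Number of capped values still outside the original fences.
n_outside <- sum(capped < fence[1] | capped > fence[2])
print(n_outside)
```

Capping keeps every row and leaves the quartiles unchanged. Whether to cap, impute, or leave the values alone depends on what the analysis is meant to show, since very high charges may be genuine.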
 
Hi Prajesh,

You can try the below code.

View attachment 15370

I hope this helps you.

Happy Learning !!!
This code is giving the below error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Aesthetics must be valid data columns. Problematic aesthetic(s): y = count.
Did you mistype the name of a data column or forget to add after_stat()?
Run `rlang::last_error()` to see where the error occurred.
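
The message suggests that count was resolved to a function and not to a column, meaning the data frame passed to ggplot has no column with that name. A minimal sketch of one fix is to build the count column explicitly before plotting. The sample dates and the column name Count are assumptions, and the plotting step runs only if ggplot2 is installed.

```r
# Hypothetical complaint dates; the real column comes from the project data.
complaints <- data.frame(
  Date = as.Date(c("2015-04-01", "2015-04-01", "2015-04-02",
                   "2015-04-03", "2015-04-03", "2015-04-03"))
)

# Build an explicit Count column so that aes(y = Count) refers to real data.
daily_count <- aggregate(list(Count = rep(1L, nrow(complaints))),
                         by = list(Date = complaints$Date), FUN = sum)
print(daily_count)

# Plot only if ggplot2 is available; the counting step above does not need it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(daily_count, ggplot2::aes(x = Date, y = Count)) +
    ggplot2::geom_line() +
    ggplot2::geom_point()
  print(p)
}
```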
 
For the Comcast complaint project, I am trying the code below, but I do not get any graph after running the ggplot function. It shows a blank plot with no line or points.
Please help:
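# Import Data into R environment:
# Packages assumed by the calls below: stringi (stri_replace_all), lubridate (dmy, month),
# dplyr (summarise, group_by, arrange, n) and ggplot2.
library(stringi)
library(lubridate)
library(dplyr)
library(ggplot2)

comcast_data <- read.csv(file = "Comcast_Telecom_Complaints_Dataset.csv",header = TRUE)
View(comcast_data)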

#Manipulating column names

names(comcast_data) <- stri_replace_all(regex = "\\.",replacement = "",
str =names(comcast_data))
head(comcast_data)
na_vector <- is.na(comcast_data)
length(na_vector[na_vector==T])

comcast_data$Date = dmy(comcast_data$Date)
comcast_data$Date

monthly_count <- summarise(group_by(comcast_data,Month = as.integer(month(Date))),
Count = n())
daily_count<- summarise(group_by(comcast_data,Date),Count =n())
monthly_count<-arrange(monthly_count,Month)

ggplot(data = monthly_count, aes(Month,Count,label = Count))+
geom_line()+
geom_point(size = 0.8)+
geom_text()+
scale_x_continuous(breaks = monthly_count$Month)+
labs(title = "Monthly Ticket Count",x= "Months",y ="No. of Tickets")+
theme(plot.title = element_text(hjust = 0.5));
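
One thing worth checking, offered as an assumption since the post does not show the data, is whether the date conversion produced NA values, because rows with a missing Month cannot be drawn. The sketch below uses base R and hypothetical date strings to count parsing failures and build the month-wise table.

```r
# Hypothetical date strings in day-month-year form, with one entry in a different format.
raw_dates <- c("22-04-15", "04-08-15", "18-04-15", "2015/07/05", "26-05-15")

# as.Date() returns NA for strings that do not match the given format.
parsed   <- as.Date(raw_dates, format = "%d-%m-%y")
n_failed <- sum(is.na(parsed))
print(n_failed)

# Month-wise counts over the rows that parsed successfully.
month_num     <- as.integer(format(parsed[!is.na(parsed)], "%m"))
monthly_count <- as.data.frame(table(Month = month_num), responseName = "Count")
print(monthly_count)
```

If n_failed is large on the real data, the format string, or the order assumed by dmy(), probably does not match how the dates are stored in the file.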
 
Does anyone have the link to Pulkit's session conducted on Monday, 26th April? I believe it was the last session. The Google Drive has recording links for 22nd and 23rd April only.
 