Data Science Certification Training - R Programming|Pratul|Apr 3-May 2

Suman Basu

Active Member
Alumni
Customer
Hi Everyone,

Please use this community thread for your Data Science with R discussion.

Regards,
Simplilearn
 
I am getting the error "non-conformable arrays" when I run mat%%as.matrix(vec). My code is:
mat = matrix(1:6,nrow = 2, ncol = 3)
vec = c(10,20,30)
mat%%as.matrix(vec)
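
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in R, `%%` is the element-wise modulus operator, and it refuses two arrays whose dimensions differ (here 2x3 and 3x1). If the goal was matrix multiplication, the operator is `%*%`, which only needs the number of columns of `mat` to match the length of `vec`. The name `product` below is made up for the illustration.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)
vec <- c(10, 20, 30)

# %*% is matrix multiplication: (2 x 3) times (3 x 1) gives a 2 x 1 result.
# A plain vector is treated as a column here, so as.matrix() is optional.
product <- mat %*% vec
print(dim(product))
print(product)
```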
 
If the operator (character) entered by the user is not one of the cases listed, you can supply a default statement like this:

switch(operator,
"+" = print(paste("Addition of two numbers is: ", number1 + number2)),
"-" = print(paste("Subtraction of two numbers is: ", number1 - number2)),
"*" = print(paste("Multiplication of two numbers is: ", number1 * number2)),
"^" = print(paste("Exponent of two numbers is: ", number1 ^ number2)),
"/" = print(paste("Division of two numbers is: ", number1 / number2)),
"%/%" = print(paste("Integer Division of two numbers is: ", number1 %/% number2)),
"%%" = print(paste("Modulus (remainder) of two numbers is: ", number1 %% number2)),
print("default") # Default Statement
)
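
To see the fallback in isolation, here is a minimal sketch. The function name `describe_operator` and the returned labels are made up for the illustration. With a character argument, `switch()` returns the value of the last unnamed argument when nothing matches.

```r
# Hypothetical helper: maps an operator string to a label
describe_operator <- function(operator) {
  switch(operator,
    "+"  = "addition",
    "-"  = "subtraction",
    "%%" = "modulus",
    "unknown operator"  # unnamed last argument acts as the default
  )
}

print(describe_operator("%%"))
print(describe_operator("?"))  # no match, so the default is returned
```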
 
In the code below:

#Taking care of the missing Values
dataset$Age = ifelse(is.na(dataset$Age),
ave(dataset$Age,FUN = function(x) mean(x,na.rm = TRUE)),
dataset$Age)

Can you please tell me, in the part FUN = function(x), how the entire Age column goes into the variable x by default? We have not assigned any value to x.
 
Hello Pratul,

Who can help me? I do not have the labs available. When I click Launch Lab I get a blank R page. I had been following your exercises there, but today my saved exercises are showing an error, and the top bar says I have completed 0/7 projects.

Today I got lost because I keep getting an R error and couldn't do anything.

Let me know if you need screenshots. I need to do my labs and I can't, and those labs will be disabled as soon as the course finishes.
 

pratul.goyal111

Well-Known Member
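The Age column is the first argument to ave(). Since no grouping variable is given, ave() passes that whole vector to FUN, so x inside function(x) is the entire Age column. ifelse() then keeps the existing ages and fills only the NA entries with the mean of the non-missing Age values.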
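
A small sketch of that behaviour, using a made-up `age` vector in place of dataset$Age (the values and the names `filled_by_ave` and `age_imputed` are assumptions for illustration):

```r
# Hypothetical stand-in for dataset$Age
age <- c(25, NA, 35, 40, NA)

# With no grouping variable, ave() calls FUN once on the whole vector,
# so x inside the function is all of age
filled_by_ave <- ave(age, FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed ages; use the overall mean only where age is missing
age_imputed <- ifelse(is.na(age), filled_by_ave, age)
print(age_imputed)
```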
 

pratul.goyal111

Well-Known Member
There was an issue with the lab, which I believe has been fixed. Please reach out to me if it still does not work.
 

Amal M_2

New Member
# 1 Sample T Test
# H0:mu>=30000

# If p-value > alpha (1 - confidence level), we fail to reject H0
cars_data=read.csv(file.choose())
View(cars_data)
sedan_data=cars_data[cars_data$Type=='Sedan','MSRP']
t.test(sedan_data,mu=30000,alternative = 'less')
# One Sample t-test
#
# data: sedan_data
# t = -0.23512, df = 261, p-value = 0.4071
# alternative hypothesis: true mean is less than 30000
# 95 percent confidence interval:
# -Inf 31362.96
# sample estimates:
# mean of x
# 29773.62


# T Test
# H0:mu<=30000

t.test(sedan_data,mu=30000,alternative = 'greater')
# One Sample t-test
#
# data: sedan_data
# t = -0.23512, df = 261, p-value = 0.5929
# alternative hypothesis: true mean is greater than 30000
# 95 percent confidence interval:
# 28184.28 Inf
# sample estimates:
# mean of x
# 29773.62

In both cases we are accepting H0 because the p-value > alpha. How is that logically possible?
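
One way to look at it, as a sketch on simulated data rather than the actual cars file: the two one-sided p-values come from the same t statistic and always add up to 1 (0.4071 + 0.5929 in the output above). When the sample mean sits close to 30000, neither is small, so neither test rejects. Failing to reject H0 is not proof that H0 is true; it only means the data cannot distinguish the mean from 30000 in either direction. The vector `prices` below is hypothetical.

```r
set.seed(42)
# Hypothetical stand-in for sedan_data: 262 simulated prices
prices <- rnorm(262, mean = 29800, sd = 15000)

p_less    <- t.test(prices, mu = 30000, alternative = "less")$p.value
p_greater <- t.test(prices, mu = 30000, alternative = "greater")$p.value

# Both tests share one t statistic, so the two tail probabilities sum to 1
print(p_less + p_greater)
```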
 
In regards to the Project, are we expected to clean the data in Excel before loading the data into R Studio OR should we include data cleaning in our code OR it doesn't matter which data-cleaning method we choose?
 
Hi Pratul,

I am going through the 1st project and I am not able to understand the 2 datasets in it. I see several price columns. I am assuming that price is the dependent variable and the rest of the attributes are independent variables. Is my assumption correct?

Can you explain these datasets? Do I need to merge them into a single dataset for analysis?

Thanks & Regards,
Shravan Kumar Rama
 
I ran into that problem. The files are *.xlsx, which read.csv cannot read directly. You can open them in Excel and export them as *.csv on your machine, or use an online converter. Then read.csv should work well.
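
A rough sketch of both routes, with made-up file and column names (the real project files will differ). The first part writes a tiny CSV to a temporary file just so the example runs on its own. The second part assumes the readxl package is installed and only runs if the hypothetical file exists.

```r
# Route 1: read an exported CSV with base R
csv_path <- tempfile(fileext = ".csv")  # stands in for your exported file
write.csv(data.frame(Dress_ID = c(1, 2), Price = c("Low", "High")),
          csv_path, row.names = FALSE)
dresses <- read.csv(csv_path, stringsAsFactors = FALSE)
print(dresses)

# Route 2: read the .xlsx directly, assuming readxl is available
xlsx_path <- "dresses.xlsx"  # hypothetical file name
if (requireNamespace("readxl", quietly = TRUE) && file.exists(xlsx_path)) {
  dresses_xlsx <- readxl::read_excel(xlsx_path)
}
```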
 
Thank you Katherine, have you started this project? I am trying to understand the requirement from 1st project.
 
Hi! I looked at it, but chose not to do it. It needs a lot of data cleaning and it will take some regression to suggest dresses. It seemed too complicated for me right now.

I chose to work on Project 2 (comcast complaints) and am about halfway through it. It is mostly data visualization and sorting (as far as I can tell), which feels more manageable to me. Most of my time is spent on troubleshooting syntax right now.
 
Just to make sure I'm understanding the instructions...In Project 2, One of the directives says to...
"Provide state wise status of complaints in a stacked bar chart. Use the categorized variable from Q3."

The term "Q3" is ambiguous to me. Does it mean "Quarter 3," as in the months of the year in the third quarter, July, August and September?

OR Does it mean "Question 3," which I find more confusing because the directives are not numbered and the bulleted list is formatted strangely.

PS- Is anyone reading these??? Answers would be nice...
 
Hi Pratul, I am working on the Comcast Telecom Consumer Complaints project. I have done a few parts but am stuck on others, so I am listing my questions and sharing my code.

Q1)
  • Which complaint types are maximum, i.e., around internet, network issues, or across any other domains.
  • Create a new categorical variable with value as Open and Closed. Open & Pending is to be categorized as Open and Closed & Solved is to be categorized as Closed.
  • Provide state wise status of complaints in a stacked bar chart. Use the categorized variable from Q3.

What does Q3 mean here?

  • Which state has the highest percentage of unresolved complaints? (How do I get the percentage?)
I have filtered out the pending complaints, but I can't work out the percentage. See code:
comcast %>% filter(Status=='Pending') %>% select(State,Status) %>% count(State,Status)

  • Provide the percentage of complaints resolved till date, which were received through the Internet and customer care calls.

Now I am sharing my code:
# Which state has the maximum complaints? (Is this code right for this question?)

statewise_complaint <- summarise(group_by(comcast,state=tolower(State)),Count=n())
View(statewise_complaint)

statewise <- arrange(statewise_complaint,desc(Count))
View(statewise)

ggplot(statewise,mapping = aes(x=state,y=Count,fill=state))+geom_bar(stat = 'identity')+
scale_x_discrete(breaks=statewise$state)+geom_label(aes(label=Count))
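
For the Open/Closed variable and the percentage of unresolved complaints, here is one possible approach in base R. It is only a sketch: the small `comcast` data frame below is made up, and the new column name `ComplaintStatus` is an assumption, not something the project prescribes.

```r
# Hypothetical stand-in for the comcast data frame; real columns may differ
comcast <- data.frame(
  State  = c("Georgia", "Georgia", "Georgia", "Florida", "Florida", "Texas"),
  Status = c("Open", "Closed", "Pending", "Solved", "Closed", "Pending"),
  stringsAsFactors = FALSE
)

# Open and Pending count as "Open"; Closed and Solved count as "Closed"
comcast$ComplaintStatus <- ifelse(comcast$Status %in% c("Open", "Pending"),
                                  "Open", "Closed")

# Rows are states, columns are the two status groups
status_by_state <- table(comcast$State, comcast$ComplaintStatus)

# Share of each state's complaints that are still open, in percent
pct_open <- 100 * prop.table(status_by_state, margin = 1)[, "Open"]
worst_state <- names(which.max(pct_open))
print(round(pct_open, 1))
print(worst_state)
```

The same `ComplaintStatus` column could then be used as the fill aesthetic in ggplot with geom_bar(), which stacks bars by default.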
 
This is all I got back from Support (see attached). I am still confused. What does "Q3" mean???
 

Attachment: response.png
I am working on Project 2 (Comcast Complaints).
I am currently working on the part that asks which state has the highest percentage of unresolved complaints.
To find this, I need to divide the number of Open complaints by the Total Number of Complaints per State (*100%).

I have a data frame that is Complaints_By_State that includes 42 obs. of 2 variables (State and its corresponding Total Number of Complaints).

I also have a data frame called Open_By_State, and this is the one I am having trouble with. Right now, Open_By_State collects all the "Open" complaint statuses but doesn't separate them by State. I want to get 42 obs. of 2 variables (State and its corresponding number of Open complaints); however, my code returns 517 obs. of 11 variables (not what I want).

Here is my code for Open_By_State:
Open_By_State <- complaints[which(complaints$Complaint_Status == "Open"),]

How can I get it to return just State and Open status?
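
One way this could be done, sketched on a made-up `complaints` data frame (the rows are invented, and the names `totals`, `open_rows` and `merged` are assumptions for illustration). The row filter alone keeps every column and every open row, which is why it returns 517 obs. of 11 variables. Counting the filtered rows per State gives the two-column shape.

```r
# Hypothetical stand-in for the complaints data frame
complaints <- data.frame(
  State = c("Georgia", "Georgia", "Florida", "Florida", "Florida", "Texas"),
  Complaint_Status = c("Open", "Closed", "Open", "Open", "Closed", "Closed"),
  stringsAsFactors = FALSE
)

# Total complaints per state: one row per state, two columns
totals <- as.data.frame(table(State = complaints$State), responseName = "Total")

# Keep only open rows, then count them per state
open_rows <- complaints[complaints$Complaint_Status == "Open", ]
Open_By_State <- as.data.frame(table(State = open_rows$State), responseName = "Open")

# States with no open complaints are missing from Open_By_State, so keep all
# rows of totals and treat the missing counts as zero
merged <- merge(totals, Open_By_State, by = "State", all.x = TRUE)
merged$Open[is.na(merged$Open)] <- 0
merged$Pct_Open <- 100 * merged$Open / merged$Total
print(merged[which.max(merged$Pct_Open), ])
```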
 

Attachments: Open_By_State df.png, Complaints_By_State df.png