Discussion in 'Big Data and Analytics' started by _22998, Mar 18, 2018.
This thread is for the Data Science with R batch that started on 10 March.
This is the link for Anderson, Sweeney and Williams, Statistics for Business and Economics (PDF).
I have an issue in RStudio Desktop: when I use Import Dataset, the .csv file doesn't show up. What should I do?
Hi! What type of projects do we need to submit for the Data Science with R course?
hp <- read.csv(file.choose(), header = TRUE)
Use this line of code to import the dataset. If you don't see the import dialog, the window may be hidden behind other windows; press Alt+Tab to check the open windows. I hope this works.
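If the file chooser keeps getting lost, reading the CSV by its path skips the dialog entirely. A minimal sketch, assuming the file has a header row; `read_dataset()` and the example path are hypothetical names used only for illustration:

```r
# Read a CSV by explicit path instead of the interactive file.choose() dialog.
# read_dataset() is a hypothetical helper, not part of any package.
read_dataset <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  df <- read.csv(path, header = TRUE)  # first row holds the column names
  message("Loaded ", nrow(df), " rows and ", ncol(df), " columns")
  df
}

# Usage (hypothetical path; adjust to where your file lives):
# hp <- read_dataset("C:/Users/me/Downloads/insurance.csv")
```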
Simple projects that apply the algorithms discussed in the live classes, with a slightly tricky approach, will be sufficient for the project submission.
In the "Projects for R" attachment on Simplilearn (Data science with R _ Downloads), you will see 4 zip folders (Insurance, Retail, Internet and Healthcare). You will have to submit one of those projects.
Can you tell me where our trainer Nimisha attached the classroom training R code?
Let me know the updates.
@Nimisha Pandey Hi Nimisha, I just wanted to clarify when the Data Science with R course (started Mar 10th) actually ends, because I'm a little worried about the project submission deadline. The WebEx recordings say Mar 10th through Apr 8th, but the scheduled classes only run until Apr 1st.
Thanks, Jitendra, for the solution.
Jeny, the course will end on 1st April, as shown in the LMS, but it is up to the trainer to extend the batch's classes. If Nimisha ma'am wants to take a few more sessions with the batch, we will get links by email to attend the classes and download the recordings.
@Nimisha Pandey So the dates mentioned in WebEx are irrelevant?
The project files are .zip files, and I'm not able to open the CSV file inside them in the RStudio lab. Has anyone tried opening them? Please share the steps.
You have to right-click the zip file and extract the data from it. Then you will find the CSV file inside it, which is the file you need to use in RStudio.
I am sure this will help you.
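Where right-clicking isn't available (for example in a browser-based RStudio lab), the same extraction can be done from R itself. A minimal sketch using base R's `unzip()`; the helper `read_csv_from_zip()` and the file name `Retail.zip` are hypothetical, and it simply reads the first CSV it finds in the archive:

```r
# Extract a project .zip and read the first CSV file found inside it.
# read_csv_from_zip() is a hypothetical helper; utils::unzip() is base R.
read_csv_from_zip <- function(zip_path, exdir = tempfile("project_")) {
  files <- unzip(zip_path, exdir = exdir)  # returns the extracted file paths
  csv_files <- files[grepl("\\.csv$", files, ignore.case = TRUE)]
  if (length(csv_files) == 0) {
    stop("No CSV file found in ", zip_path)
  }
  read.csv(csv_files[1], header = TRUE)
}

# Usage (hypothetical file name):
# retail <- read_csv_from_zip("Retail.zip")
```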
The dates mentioned in the LMS are accurate. We have extended the sessions by 2 days because all the learners will be submitting their projects in the last session. Please follow the instructions in the community thread mentioned below, which explains which projects you can work on; you have to submit your project on the last day.
We have 3 sessions to go before that, so start looking at the projects and choose one domain you would like to work on. That way you can ask your general queries about it before the last session and successfully submit your project in the last session.
I hope this helps. All the very best!
Please find attached the code for the z-test, t-test (Basic Stats Questions), ANOVA and chi-square test, along with the Titanic case study.
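For anyone who can't open the attachments, here is a rough sketch of how these tests can be run in R on datasets that ship with it (`sleep`, `PlantGrowth`, `mtcars`). These are not the trainer's attached files. Base R has no built-in z-test, so `z_test()` below is a hypothetical helper that assumes the population standard deviation is known:

```r
# Sketches of the tests named above, using datasets bundled with R.
# z_test() is a hypothetical helper: base R has no built-in z-test.
z_test <- function(x, mu0, sigma) {
  # One-sample z-test with known population standard deviation sigma
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  p <- 2 * pnorm(-abs(z))  # two-sided p-value
  list(z = z, p.value = p)
}

# Two-sample t-test (Welch by default): does 'extra' sleep differ by drug group?
tt <- t.test(extra ~ group, data = sleep)

# One-way ANOVA: does plant weight differ across the three treatment groups?
fit_aov <- aov(weight ~ group, data = PlantGrowth)
anova_p <- summary(fit_aov)[[1]][["Pr(>F)"]][1]

# Chi-square test of independence: transmission type vs engine shape
ct <- chisq.test(table(mtcars$am, mtcars$vs))

print(tt$p.value)
print(anova_p)
print(ct$p.value)
```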
Thanks and Best
Link to a description of linear regression, with an explanation of all the outputs of the lm() function
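As a quick companion to that description, the main lm() outputs can be pulled out of `summary()` programmatically. A sketch on the built-in `mtcars` data; the formula `mpg ~ wt + hp` is only an illustrative choice:

```r
# Fit a linear regression on the built-in mtcars data and extract the
# summary() outputs most often discussed: coefficients, R-squared, F-statistic.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

coef_table <- s$coefficients    # Estimate, Std. Error, t value, Pr(>|t|) per term
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared
f_stat <- s$fstatistic["value"] # the vector also holds numdf and dendf
aic    <- AIC(fit)
bic    <- BIC(fit)

print(coef_table)
cat("R2:", round(r2, 3), " Adj R2:", round(adj_r2, 3),
    " F:", round(f_stat, 2), " AIC:", round(aic, 1), " BIC:", round(bic, 1), "\n")
```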
The most common metrics to look at while selecting the model are:
R-squared: higher is better (rule of thumb: > 0.70)
Adjusted R-squared: higher is better
F-statistic: higher is better
Std. Error: closer to zero is better
t-statistic: its absolute value should be greater than about 1.96 for the p-value to be less than 0.05 (two-sided test, large samples)
AIC: lower is better
BIC: lower is better
Mallows' Cp: should be close to the number of parameters in the model (predictors plus intercept)
MAPE (mean absolute percentage error): lower is better
MSE (mean squared error): lower is better
Min-max accuracy => mean(pmin(actual, predicted) / pmax(actual, predicted)): higher is better
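The error metrics in the list above can be computed for any set of predictions in a few lines. A sketch, where `regression_metrics()` is a hypothetical helper name; element-wise `pmin()`/`pmax()` are used because `min()`/`max()` would collapse each vector to a single number:

```r
# MAPE, MSE and min-max accuracy for vectors of actual values and predictions.
# regression_metrics() is a hypothetical helper, not a package function.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  c(
    MAPE = mean(abs((actual - predicted) / actual)),  # assumes no zeros in actual
    MSE  = mean((actual - predicted)^2),
    MinMaxAccuracy = mean(pmin(actual, predicted) / pmax(actual, predicted))
  )
}

regression_metrics(actual = c(100, 200), predicted = c(110, 180))
# MAPE 0.1, MSE 250, MinMaxAccuracy about 0.9045
```

In practice, `predicted` would come from something like `predict(model, newdata = test)`, so that the metrics reflect held-out data.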
Nimisha, as mentioned in the class on April 1st, the flight data logistic regression demo project throws an error when I try to run confusion.matrix on the test set. The error is something like "number of observations do not match the number of predictions".
Please find attached the code for linear regression, logistic regression and decision trees.
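One common cause of that kind of mismatch is calling `predict()` without `newdata`, which returns predictions for the training rows rather than the test rows. The thread doesn't say which package's `confusion.matrix` was used, so this sketch builds the matrix with base R's `table()` instead, and uses `mtcars` (`vs ~ mpg`) as a stand-in because the flight data isn't shown here:

```r
# Sketch: confusion matrix on the TEST set with base R's table().
# mtcars (vs ~ mpg) stands in for the flight data, which isn't available here.
set.seed(42)
train_idx <- sample(seq_len(nrow(mtcars)), size = 22)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

model <- glm(vs ~ mpg, data = train, family = binomial)

# predict(model) with no newdata gives 22 training predictions, not 10 test ones.
prob_test <- predict(model, newdata = test, type = "response")
pred_test <- ifelse(prob_test > 0.5, 1, 0)

# Fixed factor levels keep a 2x2 table even if one class is never predicted
cm <- table(Actual    = factor(test$vs,  levels = c(0, 1)),
            Predicted = factor(pred_test, levels = c(0, 1)))
print(cm)
accuracy <- sum(diag(cm)) / sum(cm)
```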
Thanks and Best
Good morning Nimisha, I am in the R training program (Jul 16 - Aug 1). These are my doubts:
1- Usage of abs()
2- Usage of print() along with paste() in a for loop
3- Prime number program: the limit of numbers in the for loop
4- Why is break needed in a while loop when it doesn't auto-increment?
Titanic train dataset
1- Why did you change the Survived column to factor type?
2- Why is the Ticket column in factor format?
3- What is tapply used for?
4- Why is the data type of 'Name' factor?
5- In sum(is.na(x)), what is x?
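1. abs(): used to calculate the absolute value of any number.
2. print() is used for explicit printing, and paste() combines multiple values and text into one string, so print(paste(...)) prints them together. Inside a for loop, R does not auto-print values, which is why print() is needed there.
3. For the prime number program, the loop limit follows from the definition of a prime: it is divisible only by 1 and itself. Since we are only looking for whole-number factors, we need not go beyond num/2 (in fact, checking up to sqrt(num) is enough, because any factor larger than the square root pairs with one smaller than it).
4. break is not "needed" in a while loop. Whether you need to break out of a loop depends on the logical flow of the program. A while loop does not increment anything by itself: it stops only when its condition becomes FALSE, so you must update the loop variable inside the body (or use break) to end it.
5. Summary stats are descriptive stats: mean, median, min, max and the two quartiles (1st and 3rd), which is what summary() reports.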
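The answers above can be tried out in one short script. This is an illustrative sketch, not the class code; `is_prime()` is a hypothetical helper that follows the num/2 limit described in answer 3:

```r
# abs(): absolute value
abs(-7.5)   # returns 7.5

# print() + paste() inside a for loop: paste() builds one string, print() shows it
for (i in 1:3) {
  print(paste("Square of", i, "is", i^2))
}

# Prime check: test whole-number divisors from 2 up to num/2
is_prime <- function(num) {
  if (num < 2) return(FALSE)
  if (num < 4) return(TRUE)            # 2 and 3 are prime
  for (d in 2:floor(num / 2)) {
    if (num %% d == 0) return(FALSE)   # found a divisor, so not prime
  }
  TRUE
}

# while loop: the counter is NOT incremented automatically; without the
# n <- n + 1 line the condition never becomes FALSE and the loop never ends.
n <- 1
while (n <= 100) {
  if (is_prime(n) && n > 50) {
    break                              # optional early exit: first prime above 50
  }
  n <- n + 1
}

# summary(): min, 1st quartile, median, mean, 3rd quartile, max
summary(c(2, 4, 4, 5, 7, 9))
```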
1. As discussed in class, all categorical variables need to be converted to the factor data type.
2. By default, character columns were read as factors (read.csv() used stringsAsFactors = TRUE before R 4.0.0; from R 4.0.0 onward the default is FALSE). Also, Ticket is a nominal variable, hence it needs to be in factor format.
3. tapply(X, INDEX, FUN) applies a function to a vector (for example, a dataframe column) split into groups by a factor. You can go through the slides for a detailed explanation.
4. For the same default reason, Name was read as a factor; factor is R's data type for categorical data.
5. sum(is.na(x)): here x is the variable whose number of NA values you are trying to calculate.
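A small sketch of the factor conversion, tapply() and sum(is.na(x)) answers. The data frame below is a hypothetical miniature with a few Titanic-style columns; its values are invented for illustration and are not taken from the real dataset:

```r
# Hypothetical miniature data frame with Titanic-style columns
# (values invented for illustration only).
titanic <- data.frame(
  Survived = c(0, 1, 1, 0, 1),
  Sex      = c("male", "female", "female", "male", "male"),
  Age      = c(22, 38, NA, 35, NA),
  stringsAsFactors = FALSE   # explicit, since the default differs across R versions
)

# 1. Categorical columns stored as numbers/text -> factor
titanic$Survived <- factor(titanic$Survived, levels = c(0, 1), labels = c("No", "Yes"))
titanic$Sex      <- factor(titanic$Sex)

# 3. tapply(X, INDEX, FUN): apply FUN to X split by the factor INDEX
survival_rate_by_sex <- tapply(titanic$Survived == "Yes", titanic$Sex, mean)

# 5. sum(is.na(x)): is.na() gives TRUE/FALSE per value; sum() counts the TRUEs
missing_age <- sum(is.na(titanic$Age))

print(survival_rate_by_sex)
print(missing_age)
```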