Data Science with R | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Apr 5, 2019.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    242
    Likes Received:
    41
    #1
  2. Simant Setu

    Simant Setu New Member

    Joined:
    Mar 9, 2019
    Messages:
    1
    Likes Received:
    0
    Please share the Google Drive link for today's session (7th April) of "Data Science with R" (batch started 6th April).
     
    #2
  3. Jagadeesh R(2309)

    Jagadeesh R(2309) New Member
    Alumni

    Joined:
    Jun 9, 2014
    Messages:
    1
    Likes Received:
    0
    It's available in the LMS; please check now.
     
    #3
  4. _19142

    _19142 Active Member

    Joined:
    Dec 28, 2017
    Messages:
    23
    Likes Received:
    1
    #4
  5. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    177
    Likes Received:
    14
    #5
  6. Norah AlOtaibi

    Norah AlOtaibi Customer
    Customer

    Joined:
    Jun 15, 2019
    Messages:
    2
    Likes Received:
    0
    Hello there,
    I have a query about testing a data file (.csv) for normality.
    To check visually whether the data is normally distributed, do I have to check every column in my dataset? (I have 7 columns.)
    Also, is there another way to check an entire dataset with mixed data types for normality?

    Regards.
    Norah A.
     
    #6
  7. devkhivsare

    devkhivsare Member
    Alumni

    Joined:
    May 5, 2015
    Messages:
    3
    Likes Received:
    1
    Hi Nishant,
    Can you share with me the community link created for DS with R by Samridhi for the batch which started on 13-Jul?
     
    #7
  8. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi Norah,

    Please use a for loop to create a visualization for each column.

    Regards,
    Samridhi
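
A minimal sketch of the loop suggested above, assuming the data is already loaded into a data frame. The built-in `iris` data stands in for the .csv file from the question, and the names `df`, `num_cols` and `shapiro_p` are illustrative only. Non-numeric columns are skipped, since a normality check only makes sense for numeric data.

```r
# Stand-in data: replace with your own data frame, e.g. df <- read.csv("your_file.csv")
df <- iris

# Keep only the numeric columns; factors and text cannot be tested for normality
num_cols <- names(df)[sapply(df, is.numeric)]

# Draw a histogram and a normal Q-Q plot for each numeric column
for (col in num_cols) {
  hist(df[[col]], main = paste("Histogram of", col), xlab = col)
  qqnorm(df[[col]], main = paste("Q-Q plot of", col))
  qqline(df[[col]])
}

# Shapiro-Wilk p-value for every numeric column in one pass
shapiro_p <- sapply(df[num_cols], function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 4))
```

The `sapply()` call at the end is one way to check all columns without reading each plot; a small p-value suggests a departure from normality for that column.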
     
    #8
  9. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the code executed on July 28, 2019 (Sunday)?
    I can't find it in your Google Drive.
    Thank you.
     
    #9
  10. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    #10
  11. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi,

    How can I subtract one dataset from its parent dataset?

    Thanks,
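
One possible reading of "subtract" is: keep the rows of the parent that do not appear in the subset. A hedged sketch in base R, assuming both data frames share a unique key column (the column `id` and the data frames `parent` and `child` are made up for illustration):

```r
# Hypothetical parent data and a subset taken from it
parent <- data.frame(id = 1:6, score = c(10, 20, 30, 40, 50, 60))
child  <- parent[parent$id %in% c(2, 4), ]

# Keep the parent rows whose key does not occur in the child
remainder <- parent[!(parent$id %in% child$id), ]
print(remainder)
```

If dplyr is loaded, `anti_join(parent, child, by = "id")` does the same by key, and `setdiff(parent, child)` compares whole rows.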
     
    #11
  12. _49336

    _49336 Member
    Alumni

    Joined:
    Nov 23, 2018
    Messages:
    2
    Likes Received:
    0
    Please let me know where I can find the assignment files for the class dated 03/8/2019.
     
    #12
  13. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    I want to replace the NA values in the Age column of the titanic dataset with the median age of the corresponding title. For example, every record with the title Master and a missing Age should get the median age of all Master records. Can you help me with this? This is what I have so far:

    > mrms = filter(titanic_train, titanic_train$Title %in% c("Mr", "Mrs", "Ms", "Miss", "Master"))
    > title_median_age = aggregate(Age ~ Title, mrms, median, na.rm = TRUE)
    > title_median_age
       Title  Age
    1 Master  3.5
    2   Miss 21.0
    3     Mr 30.0
    4    Mrs 35.0
    5     Ms 28.0
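
A sketch of one way to do the group-wise fill in base R, using `ave()` to compute the median within each title and aligning it with the original rows. The toy data frame `titanic_toy` below is invented for illustration; only the column names `Title` and `Age` come from the post.

```r
# Toy stand-in for titanic_train (values are made up)
titanic_toy <- data.frame(
  Title = c("Master", "Master", "Master", "Mr", "Mr", "Mr", "Miss", "Miss"),
  Age   = c(2, 5, NA, 30, NA, 40, 21, NA),
  stringsAsFactors = FALSE
)

# Median age within each Title, returned as a vector aligned with the rows
group_median <- ave(titanic_toy$Age, titanic_toy$Title,
                    FUN = function(x) median(x, na.rm = TRUE))

# Overwrite only the missing ages with the median of their own title
missing <- is.na(titanic_toy$Age)
titanic_toy$Age[missing] <- group_median[missing]
print(titanic_toy)
```

Because `ave()` returns one value per row, no merge with the aggregated table is needed.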
     
    #13
  14. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    177
    Likes Received:
    14
    Please check the drive and LMS for the same.
     
    #14
  15. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks Samridhi!
    Could you now help me understand why the below code is not running?
    pie3D(x=Petal_Width,labels=df$Species,explode=.1,labelcex=.9,col=rainbow(3))

    The error message is: could not find function "pie3D"
     
    #15
  16. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    You need to load the package with library(plotrix). If the package is not yet installed in your RStudio, install it first with install.packages("plotrix").
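
A short sketch of the fix, assuming the goal is one slice per species of the built-in `iris` data. The aggregation step is an assumption about the intent, since the original call passes one value per row rather than one per slice; `petal_totals` is an illustrative name.

```r
# install.packages("plotrix")   # run once if the package is missing

# One value per slice: total petal width for each species
petal_totals <- tapply(iris$Petal.Width, iris$Species, sum)

if (requireNamespace("plotrix", quietly = TRUE)) {
  library(plotrix)   # makes pie3D() available
  pie3D(x = as.numeric(petal_totals), labels = names(petal_totals),
        explode = 0.1, labelcex = 0.9, col = rainbow(3))
} else {
  message("plotrix is not installed; run install.packages('plotrix') first")
}
```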
     
    #16
  17. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks again!
    Samridhi, could you please tell me how to download the .R file (the file containing the code) and the exported graphs under "Home" in RLABS?
     
    #17
  18. Mary Ghosh

    Mary Ghosh New Member

    Joined:
    Jun 9, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the Project Discussion file in Google Drive so that I can access it?
     
    #18
  19. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,
    The output of Q6 is as below:
    > model2 = aov(TOTCHG ~ ., data = Patient_dataset)
    > summary(model2)
                 Df    Sum Sq   Mean Sq  F value   Pr(>F)
    AGE           1 1.308e+08 1.308e+08  212.680  < 2e-16 ***
    FEMALE        1 6.610e+07 6.610e+07  107.467  < 2e-16 ***
    LOS           1 3.087e+09 3.087e+09 5018.365  < 2e-16 ***
    RACE          5 1.325e+07 2.649e+06    4.307 0.000781 ***
    APRDRG       62 3.984e+09 6.426e+07  104.461  < 2e-16 ***
    Residuals   429 2.639e+08 6.151e+05
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Does this mean AGE, FEMALE, LOS and APRDRG have an equal impact on hospital cost, since they share the same (and the smallest) p-value?
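
One way to look at this question: a p-value measures the strength of evidence that an effect exists, not its size, so equal p-values do not imply equal impact. A rough sketch that turns the Sum Sq column posted above into each term's share of the total sum of squares (eta squared). The vector names `ss` and `eta_sq` are illustrative, and because `aov()` reports sequential sums of squares, the shares can change with the order of the terms.

```r
# Sum Sq values copied from the ANOVA table in the post
ss <- c(AGE = 1.308e+08, FEMALE = 6.610e+07, LOS = 3.087e+09,
        RACE = 1.325e+07, APRDRG = 3.984e+09, Residuals = 2.639e+08)

# Share of the total sum of squares attributed to each term
eta_sq <- ss[names(ss) != "Residuals"] / sum(ss)
print(round(sort(eta_sq, decreasing = TRUE), 3))
```

With these figures, APRDRG and LOS each account for a large share of the variation in TOTCHG, while AGE accounts for under 2 percent, even though all three show the same p-value.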
     
    #19
  20. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    Can you please share the DS report template, i.e. how we should prepare the report?

    Thanks,
    Naveen
     
    #20
  21. Paras Mansukhbhai Viradia

    Joined:
    Jun 29, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Mam,
    Could you please share the hint file for the project on Google Drive?

    Thanks
    Paras Viradia
     
    #21
  22. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi Naveen,

    There is no template. Create a Word document that contains the code, analysis and visualizations, and upload it in all 3 tabs of the LMS.

    Regards,
    Samridhi
     
    #22
  23. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi

    Please load the package plotrix. Please refer to the code shared.

    Regards,
    Samridhi
     
    #23
  24. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    Thanks Mam

    I have one doubt: if we are converting a continuous variable like age to a categorical factor, is it compulsory to use equal intervals, or can we use non-uniform breaks and otherwise follow the usual process?

    Please advise.
     
    #24
  25. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,

    It is preferable to choose the breaks by percentile (percentiles reflect relative frequency), so that each interval holds roughly the same number of records. Ideally, a factor column has levels of roughly equal frequency, so that every level is sufficiently represented.

    Regards,
    Samridhi
     
    #25
  26. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,
    Please refer to the files "Class8. Missing values imputation" and "Class8. DataPreprocessing". It has been solved there.

    Regards,
    Samridhi
     
    #26
  27. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    Thanks Mam,
    But when I use equal intervals, the inference I get is very broad. For example, say the 0-10 age group is significant for some business requirement, but when we check the data, the 0-3 age group alone holds 90% of the 0-10 records, and there are only a few records in the 4-10 age group.

    So inferring that the business needs to focus on the 0-10 age group might not be accurate. Saying 0-3 would be more accurate, so that the business can focus on the correct age group.

    Is my understanding correct? Can we use unequal intervals in this case?
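
The three binning options discussed in this exchange can be compared side by side with `cut()`. This is a sketch on a made-up `age` vector; the break points and labels are illustrative, not values from the course material.

```r
# Made-up ages, skewed towards young values
age <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 5, 8, 10, 14, 18, 19, 20)

# 1. Equal-width intervals
equal_width <- cut(age, breaks = seq(0, 20, by = 10), include.lowest = TRUE)

# 2. Percentile-based breaks (quartiles), so bins hold similar counts
q_breaks    <- unique(quantile(age, probs = seq(0, 1, by = 0.25)))
by_quantile <- cut(age, breaks = q_breaks, include.lowest = TRUE)

# 3. Custom, unequal breaks chosen from domain knowledge
custom <- cut(age, breaks = c(0, 3, 10, 20), labels = c("0-3", "4-10", "11-20"))

print(table(equal_width))
print(table(by_quantile))
print(table(custom))
```

`cut()` accepts any increasing vector of breaks, so unequal intervals work the same way as equal ones; the tables make it easy to see how many records land in each level.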
     
    #27
  28. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    Can you help me with the syntax for leave-one-out cross-validation (LOOCV)?
    I looked in the Google Drive but could not find an example, and searching on the internet is confusing.
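
A hand-written LOOCV loop in base R, as a sketch of what the procedure does: each row is held out once, the model is fitted on the remaining rows, and the held-out row is predicted. The built-in `mtcars` data and the logistic model `vs ~ mpg` are stand-ins chosen for illustration. With caret, the equivalent setting is `trainControl(method = "LOOCV")`.

```r
data <- mtcars
n <- nrow(data)
correct <- logical(n)

for (i in seq_len(n)) {
  # Fit on every row except row i
  fit <- glm(vs ~ mpg, data = data[-i, ], family = binomial)

  # Predict the single held-out row
  prob <- predict(fit, newdata = data[i, , drop = FALSE], type = "response")

  # Record whether the predicted class matches the actual class
  correct[i] <- (prob > 0.5) == (data$vs[i] == 1)
}

# LOOCV accuracy is the share of held-out rows predicted correctly
loocv_accuracy <- mean(correct)
print(loocv_accuracy)
```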
     
    #28
  29. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Mam,
    I ran the code below but am not able to infer anything from the output. Can you please advise?

    > # K-fold
    > # set seed
    > set.seed(100)
    > # define train control for k-fold cross-validation
    > train_control = trainControl(method = "cv", number = 10)
    > # Fit Naive Bayes model
    > k_model = train(admit ~ gre + gpa + rank, data = college,
    +                 trControl = train_control, method = "nb")
    >
    > print(k_model)
    Naive Bayes

    400 samples
      3 predictor
      2 classes: '0', '1'

    No pre-processing
    Resampling: Cross-Validated (10 fold)
    Summary of sample sizes: 360, 361, 359, 360, 361, 360, ...
    Resampling results across tuning parameters:

      usekernel  Accuracy   Kappa
      FALSE      0.6704941  0.22171451
      TRUE       0.6525578  0.06314192

    Tuning parameter 'fL' was held constant at a value of 0
    Tuning parameter 'adjust' was held constant at a value of 1
    Accuracy was used to select the optimal model using the largest value.
    The final values used for the model were fL = 0, usekernel = FALSE and adjust = 1.
    >
     
    #29
  30. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Mam,

    Are we doing cross-validation to test the model we created, or to find out which of the existing algorithms (xgboost, random forest, SVM) will predict better for our dataset?
     
    #30
  31. Menaka Ilango

    Menaka Ilango New Member

    Joined:
    Aug 4, 2019
    Messages:
    1
    Likes Received:
    0
    Hello Samridhi,

    Assignment: Aug 17th batch

    i = NULL
    j = NULL
    for (i in 4:1) {
      for (j in 4:1) {
        if ((i - j) < 0)
          cat(" ")
        else
          cat(1)
      }
      cat("\n")
    }
     
    #31
  32. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
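    Cross-validation is used to check the variance error, i.e. how much a model's performance changes across different splits of the data. Try out all the candidate models and choose as the final model the one with the highest average accuracy.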

    Regards,
    Samridhi
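
A sketch of the "try out all and pick the highest average accuracy" advice, written as a manual k-fold loop in base R. Two logistic regression formulas on the built-in `mtcars` data stand in for the candidate models; the helper `cv_accuracy()` and all other names are hypothetical. The same loop applies to any model that can fit and predict.

```r
set.seed(100)
data <- mtcars
k <- 5

# Assign every row to one of k folds at random
folds <- sample(rep(seq_len(k), length.out = nrow(data)))

# Hypothetical helper: per-fold accuracy for one model formula
cv_accuracy <- function(formula) {
  acc <- numeric(k)
  for (f in seq_len(k)) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    fit   <- glm(formula, data = train, family = binomial)
    pred  <- predict(fit, newdata = test, type = "response") > 0.5
    acc[f] <- mean(pred == (test$vs == 1))
  }
  acc
}

# Evaluate each candidate on the same folds
results <- list(
  mpg_only   = cv_accuracy(vs ~ mpg),
  mpg_and_wt = cv_accuracy(vs ~ mpg + wt)
)

# Mean accuracy ranks the candidates; the spread shows the fold-to-fold variance
summary_table <- sapply(results, function(a) c(mean = mean(a), sd = sd(a)))
print(round(summary_table, 3))

best <- names(which.max(summary_table["mean", ]))
print(best)
```

Using the same fold assignment for every candidate keeps the comparison fair, since each model is scored on identical held-out rows.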
     
    #32
  33. Sbukamhayise

    Sbukamhayise New Member

    Joined:
    Aug 28, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi

    Please see my solution to the assignment problem.

    # Write the tables of 2 and 3
    # 2 x 1 = 2
    # 2 x 2 = 4
    ?cat

    for (table in 2:3) {
      cat("\n", "Table of", table, "\n")
      for (i in 1:10) {
        print(paste(table, "x", i, "=", table * i))
      }
    }

    Regards
    Sibusiso Goodwill Hlatshwayo
     
    #33
  34. Saikat Dan

    Saikat Dan New Member

    Joined:
    Aug 14, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,

    In the titanic_train dataset, the general format of the names column is "Braund, Mr. Owen Harris". However, there are some names like "Van Impe, Mrs. Jean Baptiste".
    Is there a function that extracts the characters between the first comma and the next space?

    Regards,
    Dan
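
One way to do this in base R is a regular expression with `sub()`. The sketch below uses the two names quoted in the question; the variable names are illustrative.

```r
name <- c("Braund, Mr. Owen Harris", "Van Impe, Mrs. Jean Baptiste")

# Skip everything up to the first comma, skip the spaces after it,
# then capture the run of non-space characters that follows
title <- sub("^[^,]*,\\s*(\\S+).*$", "\\1", name)
print(title)

# Optionally drop the trailing period
title_clean <- sub("\\.$", "", title)
print(title_clean)
```

Because the pattern anchors on the first comma rather than the first space, surnames that contain a space, such as "Van Impe", are handled the same way as single-word surnames.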
     
    #34
  35. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    Ma'am, I have a query, please help. I have to split a column with the header "a" into 2 columns, "x" and "y", using " in " as the delimiter. I managed the split using str_split, but am unable to assign the two resulting columns to the headers "x" and "y". Please advise how to proceed. Please find the required files uploaded for reference.
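
A possible approach in base R, assuming every value of "a" contains the delimiter exactly once. The sample values are invented, since the attached files are not part of the thread.

```r
# Invented sample data with the column header "a"
df <- data.frame(a = c("Rome in Italy", "Lyon in France", "Pune in India"),
                 stringsAsFactors = FALSE)

# Split each value on the literal delimiter and stack the pieces as rows
parts <- do.call(rbind, strsplit(df$a, " in ", fixed = TRUE))

# Assign the two pieces to new, named columns
df$x <- parts[, 1]
df$y <- parts[, 2]
print(df)
```

With stringr, `str_split_fixed(df$a, " in ", n = 2)` returns the same kind of two-column character matrix, which can be assigned to `x` and `y` in the same way.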
     

    #35
  36. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,

    We'll discuss this in the class.

    Regards,
    Samridhi
     
    #36
  37. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Let's discuss in the class!

    Regards,
    Samridhi
     
    #37
  38. Dilip pawar

    Dilip pawar New Member

    Joined:
    Oct 7, 2019
    Messages:
    1
    Likes Received:
    0
    How can I add a column to the Titanic dataset?
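
A few common ways to add a column to a data frame, sketched on a small stand-in. `SibSp`, `Parch` and `Age` are columns of the usual Titanic training data, while the values and the new column names (`FamilySize`, `IsChild`, `Source`) are made up for illustration.

```r
# Small stand-in for the Titanic data frame
titanic_toy <- data.frame(SibSp = c(1, 0, 3), Parch = c(0, 0, 2), Age = c(22, 38, 4))

# 1. Derived column computed from existing columns
titanic_toy$FamilySize <- titanic_toy$SibSp + titanic_toy$Parch + 1

# 2. Logical column from a condition
titanic_toy$IsChild <- titanic_toy$Age < 18

# 3. Constant column appended with cbind()
titanic_toy <- cbind(titanic_toy, Source = "train")

print(titanic_toy)
```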
     
    #38
