Data Science with R | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Apr 5, 2019.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    218
    Likes Received:
    30
    #1
    _51394 likes this.
  2. Simant Setu

    Simant Setu New Member

    Joined:
    Mar 9, 2019
    Messages:
    1
    Likes Received:
    0
    Please share the Google Drive link for today's session (7th April) of "Data Science with R", the batch that started on 6th April.
     
    #2
  3. Jagadeesh R(2309)

    Jagadeesh R(2309) New Member
    Alumni

    Joined:
    Jun 9, 2014
    Messages:
    1
    Likes Received:
    0
    It's available in the LMS; please check now.
     
    #3
  4. _19142

    _19142 Active Member

    Joined:
    Dec 28, 2017
    Messages:
    23
    Likes Received:
    1
    #4
  5. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    160
    Likes Received:
    12
    #5
  6. Norah AlOtaibi

    Norah AlOtaibi Customer
    Customer

    Joined:
    Jun 15, 2019
    Messages:
    2
    Likes Received:
    0
    Hello there,
    I have a query about testing a data file (.csv) for normality.
    To check visually whether the data is normally distributed, do I have to check every column in my dataset? (I have 7 columns.)
    Also, is there another way to check an entire dataset with mixed data types for normality?

    Regards.
    Norah A.
     
    #6
  7. devkhivsare

    devkhivsare Member
    Alumni

    Joined:
    May 5, 2015
    Messages:
    3
    Likes Received:
    1
    Hi Nishant,
    Can you share with me the community link created for DS with R by Samridhi for the batch that started on 13 Jul?
     
    #7
  8. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    Hi Norah,

    Please use a for loop to create a visualization for each column.
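
A minimal sketch of such a loop, assuming the built-in `iris` data frame as a stand-in for the 7-column CSV (the real file and its column names are not shown in the thread). Only numeric columns are plotted, since a histogram or Q-Q plot is not meaningful for text or factor columns. The Shapiro-Wilk test at the end is an addition for illustration, not something the course material is known to use.

```r
# Stand-in data; replace with read.csv("your_file.csv") for the real dataset.
df <- iris

# Keep only the numeric columns.
numeric_cols <- names(df)[sapply(df, is.numeric)]

for (col in numeric_cols) {
  op <- par(mfrow = c(1, 2))                 # histogram and Q-Q plot side by side
  hist(df[[col]], main = paste("Histogram of", col), xlab = col)
  qqnorm(df[[col]], main = paste("Q-Q plot of", col))
  qqline(df[[col]])                          # reference line for a normal distribution
  par(op)                                    # restore the previous plot layout
}

# Optional numeric check: one Shapiro-Wilk p-value per numeric column.
p_values <- sapply(df[numeric_cols], function(x) shapiro.test(x)$p.value)
print(p_values)
```

A small p-value suggests that the column departs from normality; the plots remain useful for seeing how it departs.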

    Regards,
    Samridhi
     
    #8
  9. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the code executed on July 28, 2019 (Sunday)?
    I can't find it in your Google Drive.
    Thank you.
     
    #9
  10. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    #10
  11. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0
    Hi,

    How can I subtract one dataset from its parent dataset?
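
One possible reading of this question is "keep the rows of the parent that do not appear in the subset". A minimal sketch under that assumption, using made-up data frames `parent` and `child` and a hypothetical key column `id`:

```r
# Made-up example data; `id` is a hypothetical unique key.
parent <- data.frame(id = 1:6, score = c(10, 20, 30, 40, 50, 60))
child  <- parent[parent$id %in% c(2, 5), ]

# Rows of the parent whose key does not occur in the child.
remainder <- parent[!parent$id %in% child$id, ]
print(remainder)
```

If dplyr is already loaded, `anti_join(parent, child, by = "id")` expresses the same idea.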

    Thanks,
     
    #11
  12. _49336

    _49336 Member
    Alumni

    Joined:
    Nov 23, 2018
    Messages:
    2
    Likes Received:
    0
    Please let me know where I can find the assignment files for the class dated 03/8/2019.
     
    #12
  13. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0
    Hi Samridhi Ma'am,

    I want to replace the NA values in the Age column of the Titanic dataset with the median age for the matching title. For example, every record with the title Master and a missing age should get the median age of the Master group. Can you help me with this?

    > library(dplyr)  # filter() below is assumed to be dplyr::filter
    > mrms = filter(titanic_train, Title %in% c("Mr", "Mrs", "Ms", "Miss", "Master"))
    > title_median_age = aggregate(Age ~ Title, mrms, median, na.rm = TRUE)
    > title_median_age
       Title  Age
    1 Master  3.5
    2   Miss 21.0
    3     Mr 30.0
    4    Mrs 35.0
    5     Ms 28.0
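
A minimal sketch of one way to do this in base R, assuming a data frame with `Title` and `Age` columns as in the post. The data frame `toy` is made up for illustration; the class files may solve it differently.

```r
# Made-up stand-in for the Titanic training data.
toy <- data.frame(
  Title = c("Master", "Master", "Master", "Mr", "Mr", "Mr", "Miss", "Miss"),
  Age   = c(2, 5, NA, 30, NA, 40, 21, NA)
)

# ave() applies the function within each Title group and returns a vector
# in the original row order; NAs are replaced by that group's median.
toy$Age <- ave(toy$Age, toy$Title, FUN = function(x) {
  replace(x, is.na(x), median(x, na.rm = TRUE))
})
print(toy)
```

Because the median is computed inside each group, no separate lookup table such as `title_median_age` is needed for the replacement step.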
     
    #13
  14. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    160
    Likes Received:
    12
    Please check the drive and LMS for the same.
     
    #14
  15. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks, Samridhi!
    Could you now help me understand why the code below is not running?
    pie3D(x=Petal_Width,labels=df$Species,explode=.1,labelcex=.9,col=rainbow(3))

    The error message is: could not find function "pie3D"
     
    #15
  16. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0

    You need to load the package with library(plotrix). If it is not yet installed in your RStudio, install it first with install.packages("plotrix").
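
A minimal sketch of the full sequence, assuming the built-in `iris` data. Since `pie3D` draws one slice per element of `x`, the sketch sums petal width per species first so that there are three slices for the three colours; that aggregation is an assumption about what the chart was meant to show.

```r
# Total petal width per species: a named numeric vector with three entries.
totals <- sapply(split(iris$Petal.Width, iris$Species), sum)

# Draw the chart only if plotrix is installed; otherwise print a hint.
if (requireNamespace("plotrix", quietly = TRUE)) {
  library(plotrix)
  pie3D(totals, labels = names(totals), explode = 0.1,
        labelcex = 0.9, col = rainbow(3))
} else {
  message('plotrix is not installed; run install.packages("plotrix") first')
}
```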
     
    #16
  17. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks again!
    Samridhi, could you please tell me how to download the .R file (the file containing the code) and the exported graphs under "Home" in RLABS?
     
    #17
  18. Mary Ghosh

    Mary Ghosh New Member

    Joined:
    Jun 9, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the Project Discussion file to Google Drive so that I can access it?
     
    #18
  19. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,
    The output of Q6 is as below:
    > model2 = aov(TOTCHG~.,data = Patient_dataset)
    > summary(model2)
                 Df    Sum Sq   Mean Sq  F value   Pr(>F)
    AGE           1 1.308e+08 1.308e+08  212.680  < 2e-16 ***
    FEMALE        1 6.610e+07 6.610e+07  107.467  < 2e-16 ***
    LOS           1 3.087e+09 3.087e+09 5018.365  < 2e-16 ***
    RACE          5 1.325e+07 2.649e+06    4.307 0.000781 ***
    APRDRG       62 3.984e+09 6.426e+07  104.461  < 2e-16 ***
    Residuals   429 2.639e+08 6.151e+05
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    #Does this mean that AGE, FEMALE, LOS and APRDRG have an equal impact on hospital cost, since they share the same (and the lowest) p-value?
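
A hedged note on the question above: `< 2e-16` is only the smallest value R prints, so the four p-values are not necessarily equal, and a p-value measures the strength of evidence rather than the size of an effect. The F values already differ widely (107 to 5018). One rough way to compare contributions is each term's share of the total sum of squares, sketched here with the Sum Sq column copied from the output above. Note that `aov` uses sequential sums of squares, so these shares depend on the order of the terms in the formula.

```r
# Sum Sq values copied from the summary(model2) output in the post.
sum_sq <- c(AGE = 1.308e+08, FEMALE = 6.610e+07, LOS = 3.087e+09,
            RACE = 1.325e+07, APRDRG = 3.984e+09, Residuals = 2.639e+08)

# Share of the total variation in TOTCHG attributed to each term.
share <- sum_sq / sum(sum_sq)
print(round(sort(share, decreasing = TRUE), 3))
```

On these numbers APRDRG and LOS account for most of the variation, while AGE and FEMALE account for far less, even though all four show the same printed p-value.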
     
    #19
  20. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0
    Hi Samridhi Ma'am,

    Can you please share the DS report template, showing how we have to prepare the report?

    Thanks,
    Naveen
     
    #20
  21. Paras Mansukhbhai Viradia

    Joined:
    Jun 29, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Ma'am,
    Could you please share the hint file for the project in Google Drive?

    Thanks
    Paras Viradia
     
    #21
  22. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    Hi Naveen,

    There is no template. Create a Word document that contains the code, analysis and visualizations, and upload it in all 3 tabs of the LMS.

    Regards,
    Samridhi
     
    #22
  23. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    Hi

    Please load the plotrix package. Please refer to the code shared.

    Regards,
    Samridhi
     
    #23
  24. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0

    Thanks, Ma'am.

    I have one doubt: if we are converting a continuous variable like age to a categorical factor, is it compulsory to have equal intervals, or can we use non-uniform breaks and follow the usual process?

    Please advise.
     
    #24
  25. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    Hi,

    It is preferable to choose the intervals by percentile (percentiles reflect relative frequency), so that each interval holds a similar share of the records. Ideally, the levels of a factor column have roughly equal frequency, so that each level is sufficiently represented.
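
A minimal sketch of percentile-based binning, assuming a made-up `age` vector and quartiles as the cut points (the number of bins is an arbitrary choice for illustration):

```r
# Made-up ages for illustration.
age <- c(1, 2, 2, 3, 3, 3, 4, 7, 9, 15, 22, 25, 31, 38, 44, 52, 60, 67, 71, 80)

# Quartile cut points: minimum, 25th, 50th and 75th percentiles, maximum.
breaks <- quantile(age, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value inside the first bin.
age_group <- cut(age, breaks = breaks, include.lowest = TRUE)
print(table(age_group))
```

With tied values the bins are only approximately equal in size, and heavily repeated values can produce duplicate cut points, which `cut()` rejects.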

    Regards,
    Samridhi
     
    #25
  26. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    154
    Likes Received:
    19
    Hi,
    Please refer to the files "Class8. Missing values imputation" and "Class8. DataPreprocessing". It has been solved there.

    Regards,
    Samridhi
     
    #26
  27. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    6
    Likes Received:
    0

    Thanks, Ma'am,
    but when I use equal intervals, the inference I get is very broad. For example, say the 0-10 age group is significant for some business requirement, but when we check the data, the 0-3 age group alone holds 90% of all 0-10 records, and there are only a few records in the 4-10 age group.

    So inferring that the business needs to focus on the 0-10 age group might not be accurate. The 0-3 group would be more accurate, so that the business can focus on the correct age group.

    Is my understanding correct? Can we use unequal intervals in this case?
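
For reference, `cut()` accepts any increasing vector of break points, so unequal intervals are technically straightforward. A minimal sketch with made-up ages and hypothetical break points chosen to separate the 0-3 group from the 4-10 group:

```r
# Made-up ages; most of the under-10 records fall in 0-3.
age <- c(0, 1, 1, 2, 2, 3, 3, 3, 2, 5, 8, 12, 25, 40, 63)

# Hypothetical, domain-driven break points with unequal widths.
age_group <- cut(age,
                 breaks = c(0, 3, 10, 18, 60, Inf),
                 labels = c("0-3", "4-10", "11-18", "19-60", "60+"),
                 include.lowest = TRUE)   # keeps age 0 in the first bin
print(table(age_group))
```

Whether unequal bins are appropriate is a modelling decision; the sketch only shows that the mechanics are the same as for equal bins.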
     
    #27
