Data Science with R | Samridhi

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Apr 5, 2019.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    251
    Likes Received:
    55
    #1
    keerti tsb and _51394 like this.
  2. Simant Setu

    Simant Setu New Member

    Joined:
    Mar 9, 2019
    Messages:
    1
    Likes Received:
    0
    Please share the Google Drive link for today's session (7th April) of "Data Science with R" (batch started 6th April).
     
    #2
  3. Jagadeesh R(2309)

    Jagadeesh R(2309) New Member
    Alumni

    Joined:
    Jun 9, 2014
    Messages:
    1
    Likes Received:
    0
    It's available in the LMS; please check now.
     
    #3
  4. _19142

    _19142 Active Member

    Joined:
    Dec 28, 2017
    Messages:
    23
    Likes Received:
    1
    #4
  5. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    193
    Likes Received:
    19
    #5
  6. Norah AlOtaibi

    Norah AlOtaibi Customer
    Customer

    Joined:
    Jun 15, 2019
    Messages:
    2
    Likes Received:
    0
    Hello there,
    I have a query about testing a data file (.csv) for normality.
    To check visually whether the data is normally distributed, do I have to check every column in my dataset? (I have 7 columns.)
    Also, is there another way to check an entire dataset with different data types for normality?

    Regards.
    Norah A.
     
    #6
  7. devkhivsare

    devkhivsare Member
    Alumni

    Joined:
    May 5, 2015
    Messages:
    3
    Likes Received:
    1
    Hi Nishant,
    Can you share the community link created for DS with R by Samridhi for the batch that started on 13-Jul?
     
    #7
  8. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi Norah,

    Please use a for loop to create a visualization for each column.
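
    A minimal sketch of such a loop, assuming a toy data frame in place of the 7-column CSV (the column names and the `shapiro_p` vector are made up for illustration). It only loops over numeric columns, since a normality check is not meaningful for text or factor columns, and it adds a Shapiro-Wilk test next to each plot as one possible numeric cross-check:

```r
# Toy data standing in for the real file; read.csv("your_file.csv") would replace this.
set.seed(1)
df <- data.frame(
  a   = rnorm(50),
  b   = runif(50),
  c   = rexp(50),
  grp = sample(letters[1:3], 50, replace = TRUE)
)

# Keep only numeric columns: normality is not defined for character/factor data.
numeric_cols <- names(df)[sapply(df, is.numeric)]

shapiro_p <- numeric(0)
for (col in numeric_cols) {
  hist(df[[col]], main = paste("Histogram of", col), xlab = col)
  qqnorm(df[[col]], main = paste("Q-Q plot of", col))
  qqline(df[[col]])
  # Shapiro-Wilk p-value: small values suggest the column is not normal.
  shapiro_p[col] <- shapiro.test(df[[col]])$p.value
}
shapiro_p
```

    Each column still gets its own check; the loop just saves typing the same plotting calls seven times.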

    Regards,
    Samridhi
     
    #8
    Varun Chaturvedi likes this.
  9. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the code that was run on July 28, 2019 (Sunday)?
    I can't find it in your Google Drive.
    Thank you.
     
    #9
  10. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    #10
  11. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi,

    How can I subtract one dataset from its parent dataset?
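
    One common reading of "subtract" is "keep the parent rows that do not appear in the subset". A minimal base R sketch, assuming both data frames share a key column (the names `parent`, `child` and `id` are hypothetical):

```r
parent <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))
child  <- parent[parent$id %in% c(2, 4), ]   # a subset taken from the parent

# Keep the parent rows whose key does not occur in the child.
remaining <- parent[!(parent$id %in% child$id), ]
remaining
```

    If dplyr is loaded, `anti_join(parent, child, by = "id")` expresses the same idea.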

    Thanks,
     
    #11
  12. _49336

    _49336 Member
    Alumni

    Joined:
    Nov 23, 2018
    Messages:
    8
    Likes Received:
    0
    Please let me know where I can find the assignment files for the class dated 03/8/2019.
     
    #12
  13. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    I want to replace the NA values in the Age column of the Titanic dataset with the median age for each passenger's title. For example, every record with the title Master and a missing age should get the median age of the Master group. Can you help me with this?

    > mrms = filter(titanic_train, titanic_train$Title %in% c("Mr", "Mrs", "Ms", "Miss", "Master"))
    > title_median_age = aggregate(Age ~ Title, mrms, median, na.rm = TRUE)
    > title_median_age
       Title  Age
    1 Master  3.5
    2   Miss 21.0
    3     Mr 30.0
    4    Mrs 35.0
    5     Ms 28.0
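
    A hedged sketch of one way to do the group-wise fill with base R's `ave()`, shown on a small made-up data frame (`titanic_toy` and its values are illustrative, not the real Titanic data):

```r
titanic_toy <- data.frame(
  Title = c("Master", "Master", "Master", "Mr", "Mr", "Mr", "Miss", "Miss"),
  Age   = c(2, 5, NA, 30, NA, 40, 21, NA)
)

# ave() applies FUN within each Title group and returns a vector of the
# original length, so NA ages are replaced by their own group's median.
titanic_toy$Age <- ave(
  titanic_toy$Age, titanic_toy$Title,
  FUN = function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x)
)
titanic_toy
```

    The same call should work on the real data by swapping in `titanic_train`, assuming it has `Title` and `Age` columns as in the post.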
     
    #13
  14. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    193
    Likes Received:
    19
    Please check the drive and LMS for the same.
     
    #14
  15. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks Samridhi!
    Could you now help me understand why the below code is not running?
    pie3D(x=Petal_Width,labels=df$Species,explode=.1,labelcex=.9,col=rainbow(3))

    The error message is: could not find function "pie3D"
     
    #15
  16. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    You need to load the package with library(plotrix). If it is not yet installed in your RStudio, install it first with install.packages("plotrix").
     
    #16
  17. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks again!
    Samridhi, could you please tell me how to download the .R file (the file containing the code) and the exported graphs under "Home" in RLABS?
     
    #17
  18. Mary Ghosh

    Mary Ghosh New Member

    Joined:
    Jun 9, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi,

    Could you please post the Project Discussion file to Google Drive so that I can access it?
     
    #18
  19. MOUMITA CHOWDHURY

    Joined:
    Jul 6, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Samridhi,
    The output of Q6 is as below:
    > model2 = aov(TOTCHG~.,data = Patient_dataset)
    > summary(model2)
                 Df    Sum Sq   Mean Sq  F value   Pr(>F)
    AGE           1 1.308e+08 1.308e+08  212.680  < 2e-16 ***
    FEMALE        1 6.610e+07 6.610e+07  107.467  < 2e-16 ***
    LOS           1 3.087e+09 3.087e+09 5018.365  < 2e-16 ***
    RACE          5 1.325e+07 2.649e+06    4.307 0.000781 ***
    APRDRG       62 3.984e+09 6.426e+07  104.461  < 2e-16 ***
    Residuals   429 2.639e+08 6.151e+05
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    # Does this mean AGE, FEMALE, LOS and APRDRG have an equal impact on hospital cost, since they share the same (smallest) p-value?
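
    A p-value printed as "< 2e-16" only says each effect is very unlikely to be zero; it does not measure how large the effect is. One hedged way to compare sizes is each term's share of the total sum of squares (often called eta squared), sketched here from the Sum Sq column above. Note that aov() reports sequential sums of squares, so these shares depend on the order of the terms in the model.

```r
# Sum Sq values copied from the ANOVA table in this post.
sum_sq <- c(
  AGE = 1.308e8, FEMALE = 6.610e7, LOS = 3.087e9,
  RACE = 1.325e7, APRDRG = 3.984e9, Residuals = 2.639e8
)

# Share of the total variation attributed to each term.
share <- sum_sq / sum(sum_sq)
round(sort(share, decreasing = TRUE), 3)
```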
     
    #19
  20. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    Can you please share the DS report template, or explain how we should prepare the report?

    Thanks,
    Naveen
     
    #20
  21. Paras Mansukhbhai Viradia

    Joined:
    Jun 29, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Mam
    Could you please share the hint file for the project on Google Drive?

    Thanks
    Paras Viradia
     
    #21
  22. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi Naveen,

    There is no template. Create a Word document containing your code, analysis and visualizations, and upload it in all three tabs of the LMS.

    Regards,
    Samridhi
     
    #22
  23. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi

    Please load the plotrix package. Please refer to the code shared.

    Regards,
    Samridhi
     
    #23
  24. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    Thanks Mam

    I have one doubt: when converting a continuous variable such as age into a categorical factor, is it compulsory to use equal intervals, or can we use non-uniform breaks and otherwise follow the usual process?

    Please advise.
     
    #24
  25. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi,

    It is preferable to choose the intervals by percentile, so that each interval holds an equal share of the records (a percentile indicates relative frequency). Ideally, the levels of a factor column should have roughly equal frequencies so that each level is sufficiently represented.
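
    A minimal sketch of percentile-based binning with `quantile()` and `cut()`, assuming simulated ages (the `age` vector is made up for illustration):

```r
set.seed(42)
age <- round(runif(200, min = 0, max = 80))   # simulated ages, not real data

# Quartile cut points give four bins with roughly equal counts.
breaks <- quantile(age, probs = seq(0, 1, by = 0.25))
age_group <- cut(age, breaks = breaks, include.lowest = TRUE)

table(age_group)
```

    `include.lowest = TRUE` keeps the minimum age inside the first bin instead of turning it into NA.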

    Regards,
    Samridhi
     
    #25
  26. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi,
    Please refer to the files "Class8. Missing values imputation" and "Class8. DataPreprocessing". It has been solved there.

    Regards,
    Samridhi
     
    #26
  27. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0

    Thanks Mam,
    But when I use equal intervals, the inference I get is very broad. For example, say the 0-10 age group is significant for some business requirement, but when we check the data, ages 0-3 account for 90% of the 0-10 records and there are only a few records for ages 4-10.

    So inferring that the business needs to focus on the 0-10 age group might not be accurate; 0-3 would be more accurate, so that the business can focus on the correct age group.

    Is my understanding correct? Can we use unequal intervals in this case?
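
    For reference, `cut()` itself accepts any increasing vector of break points, so unequal intervals are technically straightforward. A small sketch mirroring the 0-3 versus 4-10 example (the ages, break points and labels are made up for illustration):

```r
age <- c(0, 1, 2, 3, 3, 2, 1, 0, 2, 7)   # 9 of 10 records fall in 0-3

age_group <- cut(
  age,
  breaks = c(0, 3, 10, 18, 60, Inf),      # deliberately unequal widths
  labels = c("0-3", "4-10", "11-18", "19-60", "60+"),
  include.lowest = TRUE                   # so that age 0 lands in the first bin
)

table(age_group)
```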
     
    #27
  28. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Samridhi Mam,

    Can you help me with the syntax for leave-one-out cross-validation (LOOCV)?
    I looked in the Google Drive but could not find an example, and searching the internet is confusing.
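
    A hedged base R sketch of LOOCV written as an explicit loop, using the built-in mtcars data and a linear model purely for illustration (the model formula is an assumption, not the course's example). With caret, the equivalent setting is `trainControl(method = "LOOCV")` passed to `train()`.

```r
data(mtcars)
n <- nrow(mtcars)
pred <- numeric(n)

for (i in seq_len(n)) {
  # Fit on every row except row i, then predict the held-out row.
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  pred[i] <- predict(fit_i, newdata = mtcars[i, , drop = FALSE])
}

# Root mean squared error over the n held-out predictions.
loocv_rmse <- sqrt(mean((mtcars$mpg - pred)^2))
loocv_rmse
```

    Each observation is predicted by a model that never saw it, which is what makes the resulting error an out-of-sample estimate.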
     
    #28
  29. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Mam.
    I ran the code below but am not able to infer anything from the output. Can you please advise?

    > # K-fold cross-validation
    > # set seed
    > set.seed(100)
    > # define train control for k-fold cross-validation
    > train_control = trainControl(method = "cv", number = 10)
    > # fit Naive Bayes model
    > k_model = train(admit ~ gre + gpa + rank, data = college,
    +                 trControl = train_control, method = "nb")
    >
    > print(k_model)
    Naive Bayes

    400 samples
      3 predictor
      2 classes: '0', '1'

    No pre-processing
    Resampling: Cross-Validated (10 fold)
    Summary of sample sizes: 360, 361, 359, 360, 361, 360, ...
    Resampling results across tuning parameters:

      usekernel  Accuracy   Kappa
      FALSE      0.6704941  0.22171451
      TRUE       0.6525578  0.06314192

    Tuning parameter 'fL' was held constant at a value of 0
    Tuning parameter 'adjust' was held constant at a value of 1
    Accuracy was used to select the optimal model using the largest value.
    The final values used for the model were fL = 0, usekernel = FALSE and adjust = 1.
     
    #29
  30. Naveen Kumar_45

    Joined:
    Jul 6, 2019
    Messages:
    11
    Likes Received:
    0
    Hi Mam,

    Are we doing cross-validation to test the model we created ourselves, or to find out which existing model (xgboost, random forest, SVM) will predict better on our dataset?
     
    #30
  31. Menaka Ilango

    Menaka Ilango New Member

    Joined:
    Aug 4, 2019
    Messages:
    1
    Likes Received:
    0
    hello Samridhi,

    Assignment: Aug 17th batch

    # Print a right-aligned inverted pyramid of 1s
    for (i in 4:1) {
      for (j in 4:1) {
        if ((i - j) < 0) {
          cat(" ")   # pad with a space where j exceeds i
        } else {
          cat(1)
        }
      }
      cat("\n")
    }
     
    #31
  32. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
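    Cross-validation checks the variance error, that is, how much a model's performance changes from one sample of the data to another. Try out all the candidate models and choose as the final model the one with the highest average accuracy across the folds.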
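
    A minimal sketch of that selection step in base R, assuming simulated data and two logistic regression candidates (the data, the formulas and the helper `cv_accuracy` are all hypothetical):

```r
set.seed(100)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 * x1 + 1.5 * x2))   # simulated binary outcome
dat <- data.frame(y, x1, x2)

folds <- sample(rep(1:10, length.out = n))        # random 10-fold assignment

# Average held-out accuracy of a logistic regression over the 10 folds.
cv_accuracy <- function(formula) {
  acc <- numeric(10)
  for (k in 1:10) {
    fit  <- glm(formula, data = dat[folds != k, ], family = binomial)
    prob <- predict(fit, newdata = dat[folds == k, ], type = "response")
    acc[k] <- mean((prob > 0.5) == (dat$y[folds == k] == 1))
  }
  mean(acc)
}

results <- c(model_a = cv_accuracy(y ~ x1), model_b = cv_accuracy(y ~ x1 + x2))
best <- names(which.max(results))
results
best
```

    The same comparison applies to any set of candidates, as long as every model is scored on the same folds.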

    Regards,
    Samridhi
     
    #32
  33. Sbukamhayise

    Sbukamhayise New Member

    Joined:
    Aug 28, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi

    Please see my solution for the assignment problem:

    # Write the tables of 2 and 3
    # 2 x 1 = 2
    # 2 x 2 = 4
    ?cat

    for (table in 2:3) {
      cat("\n", "Table of", table, "\n")
      for (i in 1:10) {
        print(paste(table, "x", i, "=", table * i))
      }
    }

    Regards
    Sibusiso Goodwill Hlatshwayo
     
    #33
  34. Saikat Dan

    Saikat Dan New Member

    Joined:
    Aug 14, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,

    In the titanic_train dataset, the general format of the names column is "Braund, Mr. Owen Harris". However, there are some names like "Van Impe, Mrs. Jean Baptiste". Is there a function that can extract the characters between the first comma and the next space?
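
    One hedged option is a single `sub()` call with a capture group: skip everything up to the first comma, skip the following whitespace, and keep the run of non-space characters that comes next. The vector `full_names` below just holds the two examples from this post:

```r
full_names <- c("Braund, Mr. Owen Harris", "Van Impe, Mrs. Jean Baptiste")

# ^[^,]*,            everything up to and including the first comma
# [[:space:]]*       the space(s) after the comma
# ([^[:space:]]+)    captured: characters up to the next space
title <- sub("^[^,]*,[[:space:]]*([^[:space:]]+).*$", "\\1", full_names)
title

# Optionally drop the trailing period.
title_clean <- sub("\\.$", "", title)
title_clean
```

    Because the pattern anchors on the first comma, a surname containing a space (such as "Van Impe") does not affect the result.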

    Regards,
    Dan
     
    #34
  35. Jude Rodriguez

    Jude Rodriguez Active Member

    Joined:
    Jul 18, 2019
    Messages:
    17
    Likes Received:
    1
    Ma'am, I have a query; please help. I have to split a column with header "a" into two columns, "x" and "y", using " in " as the delimiter. I managed the split with str_split, but I am unable to assign the two resulting columns to the headers "x" and "y". Please advise how to proceed. The required files are uploaded for reference.
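
    A minimal base R sketch, assuming every value of "a" contains the delimiter exactly once (the data frame `df` and its values are made up, since the attached files are not shown here):

```r
df <- data.frame(a = c("apple in basket", "cat in hat"), stringsAsFactors = FALSE)

# strsplit() returns a list of two-element vectors; rbind them into a matrix
# so that each piece becomes its own column.
parts <- do.call(rbind, strsplit(df$a, " in ", fixed = TRUE))

df$x <- parts[, 1]
df$y <- parts[, 2]
df
```

    With stringr, `str_split_fixed(df$a, " in ", 2)` returns the same kind of two-column matrix directly, which can then be assigned to `x` and `y` in the same way.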
     

    Attached Files:

    #35
  36. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Hi,

    We'll discuss this in the class.

    Regards,
    Samridhi
     
    #36
  37. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Alumni Trainer

    Joined:
    Aug 16, 2017
    Messages:
    196
    Likes Received:
    21
    Let's discuss in the class!

    Regards,
    Samridhi
     
    #37
  38. Dilip pawar

    Dilip pawar New Member

    Joined:
    Oct 7, 2019
    Messages:
    1
    Likes Received:
    0
    How do I add a column to the Titanic dataset?
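
    A short sketch of the usual approach, assigning to a new name with `$`, shown on a made-up stand-in for the Titanic data (`titanic_toy` and the new column names are hypothetical):

```r
titanic_toy <- data.frame(Age = c(22, 38, 4), Fare = c(7.25, 71.28, 16.70))

# Assigning to a name that does not exist yet creates the column.
titanic_toy$IsChild <- titanic_toy$Age < 18

# A new column can also be computed from several existing ones.
titanic_toy$FarePerYear <- titanic_toy$Fare / pmax(titanic_toy$Age, 1)

titanic_toy
```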
     
    #38
  39. Uzma Fathima

    Uzma Fathima New Member

    Joined:
    Sep 4, 2019
    Messages:
    1
    Likes Received:
    0
    Assignment answer: multiplication tables with R
    for (i in 1:10) {
      for (j in 1:10) {
        print(paste(i, "x", j, "=", i * j))
      }
    }
     
    #39
  40. Aaditya Jittha

    Joined:
    Dec 27, 2019
    Messages:
    5
    Likes Received:
    0
    Hi Samridhi,

    I have a query on the below code:

    Code:
    p = NULL
    for (i in 1:10) {
      if (i == 1) {
        p = i
      } else {
        p = paste(p, i, sep = ",")
      }
    }
    p

    Can I use the cat function here instead of paste?
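
    For context, a small sketch of the difference: `paste()` returns a string that can be stored and extended, while `cat()` only prints and returns NULL, so it cannot be used to build up `p` inside the loop. Both can produce the same comma-separated text in one call:

```r
# paste() returns the string, so it can be assigned.
p <- paste(1:10, collapse = ",")
p

# cat() writes to the console and returns NULL.
r <- cat(1:10, sep = ",")
cat("\n")
is.null(r)
```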
     
    #40
  41. Aaditya Jittha

    Joined:
    Dec 27, 2019
    Messages:
    5
    Likes Received:
    0
    Hi Samridhi,


    For the assignment I have created an answer like this:

    for (j in 2:10) {
      for (i in 1:10) {
        cat(j, "x", i, "=", j * i, "\n")
      }
    }
     
    #41
  42. Deepak Mishra_4

    Deepak Mishra_4 New Member

    Joined:
    May 31, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi,

    I don't understand why read.fread is used to read a CSV file. My dataset here is a CSV file.

    Please find below the links to the question, dataset and solution code I am referring to:

    #Google Drive Folder: Data Documents
    #URL: https://drive.google.com/open?id=1zX0cZakC3XFyPEZ-8MccjTlkFpiyYoVN

    #File Name: Economist_Assignment_Data.csv
    #URL: https://drive.google.com/open?id=1s8c-ruBBa8ihKJ8j1lgnPsR4mNt0hgQq

    #File Name: Assignment for ggplot2.html
    #URL: https://drive.google.com/open?id=1YJDXUgeYFdiHsbJa-m7RHI-qwRGHOAgn

    #File Name: Assignment for ggplot2 -Solution.html
    #URL: https://drive.google.com/open?id=1WpVBzXWP97sK8YbZDuv4hPYLIxr8FgB2

    Your response is awaited.

    Thanks.
     
    #42
  43. Manpreet Singh Budhail

    Joined:
    Mar 14, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Mam,

    This is regarding the regular expression assignment for the course that commenced on March 14, 2020.
    My answer for the regular expression is:

    grep(pattern = "[a-z]+\\@[a-z]+\\.[a-z]", x = email, value = TRUE)
     
    #43
  44. Nitesh Gundra

    Nitesh Gundra New Member

    Joined:
    Feb 22, 2020
    Messages:
    1
    Likes Received:
    0
    Hi Samridhi,


    Assignment 2 : find email IDs in the vector email
    Code : grep(pattern = "[A-Za-z][a-z]+\\@[a-z]+\\.[a-z]",x = email,value = T)->y
    y

    Output :
    [1] "samridhidutta@gmail.com" "ronitkumar@yahoo.com" "sonu.sharma@yahoo.in"

    3 email IDs were found in the vector "email"

    Assignment 1 : Make a pyramid
    Code : cat("1","1","1","1","1","1","\n","1","1","1","1","1"," ","\n","1","1","1","1"," ","\n","1","1","1"," ","\n","1","1"," ","\n","1"," ",sep = " ","\n")
    Output :
    1 1 1 1 1 1
     1 1 1 1 1
     1 1 1 1
     1 1 1
     1 1
     1
    Regards,
    Nitesh
     
    #44
  45. Visakh V

    Visakh V New Member

    Joined:
    Feb 29, 2020
    Messages:
    1
    Likes Received:
    0
    Hi Mam,

    Batch March_14. I want to know about Assignment_Basics. There are 9 questions. Should we write code for each question separately, or all in the same script? For the first question, should we assign the values to different variables or only print them?

    Regards
    Visakh V
     
    #45
  46. Bobbili Hemanth Kumar

    Joined:
    Jan 19, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Mam, regarding the regular expression assignment, the code below gives character(0) as output:
    email = c("samridhidutta@gmail.com", "ronitkumar@yahoo.com", "samridhi.com@dutta", "sonu.sharma@yahoo.in", "@.netsamridhi")
    grep(pattern = "[a-z]+@[a-z]+//.[a-z]", x = email , value = T)->loc
    email[loc]

    Can you tell me where the mistake is? Also, if I run this code in my lab as a new program, it reports that x is not defined.
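
    A hedged sketch of a corrected version. Two things stand out in the code above: the dot is escaped with "//" instead of "\\", so the pattern looks for two literal slashes and matches nothing; and with `value = TRUE`, `grep()` already returns the matching strings, so indexing `email` with that result is not needed.

```r
email <- c("samridhidutta@gmail.com", "ronitkumar@yahoo.com", "samridhi.com@dutta",
           "sonu.sharma@yahoo.in", "@.netsamridhi")

# "\\." is an escaped literal dot; value = TRUE returns the matching strings.
loc <- grep(pattern = "[a-z]+@[a-z]+\\.[a-z]", x = email, value = TRUE)
loc

# Without value = TRUE, grep() returns positions, which can index the vector.
idx <- grep(pattern = "[a-z]+@[a-z]+\\.[a-z]", x = email)
email[idx]
```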
     
    #46
  47. Prathyusha Patibandla

    Joined:
    Feb 28, 2020
    Messages:
    1
    Likes Received:
    0
    assignment - march 13th - Data science with R - Samridhi..
    Result pattern:
    1111
    111
    11
    1
    for (i in 1:4) {
      for (j in 4:i) {
        cat(1)
      }
      cat("\n")
      for (k in 1:j) {
        cat(" ")
      }
    }
     
    #47
  48. Manpreet Singh Budhail

    Joined:
    Mar 14, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Mam,

    This is regarding the assignment on for loops and if statements for the Data Science with R course that commenced on March 14, 2020.

    Assignment 1: Week 1
    Print an inverted pyramid of 1s in R.

    Solution: The code is as follows:

    for (i in 4:0) {
      for (j in 0:i) cat("1")
      cat("\n")
    }

    Output:

    11111
    1111
    111
    11
    1
     
    #48
  49. raghuveer prasad

    Joined:
    Oct 11, 2019
    Messages:
    10
    Likes Received:
    1
    # Assignment
    #1111
    # 111
    #  11
    #   1
    # use nested for loops

    Code:

    x <- 1
    for (i in 3:0) {
      for (j in 0:x) cat(" ")   # leading spaces
      for (j in 0:i) cat("1")   # the 1s
      cat("\n")
      x <- x + 1
    }
     
    #49
  50. keerti tsb

    keerti tsb Member

    Joined:
    Sep 9, 2019
    Messages:
    4
    Likes Received:
    0
    Hi all, please check my code below for Assignment 1 and reply if you see any issues.
    Thanks in advance.

    # Assignment 1
    preFix = "#"
    filler = " "
    repeater = "1"
    stringLen = 5

    # 0:(stringLen - 1) is what the original 1:stringLen-1 evaluates to, written explicitly
    for (i in 0:(stringLen - 1)) {
      strg = preFix
      for (j in (stringLen - i):stringLen)
        strg = paste(strg, filler, sep = "")
      for (j in 1:(stringLen - i))
        strg = paste(strg, repeater, sep = "")
      print(strg)
    }

    Output: (filler =" ")
    [1] "# 11111"
    [1] "#  1111"
    [1] "#   111"
    [1] "#    11"
    [1] "#     1"

    Output: (filler="*")
    [1] "#*11111"
    [1] "#**1111"
    [1] "#***111"
    [1] "#****11"
    [1] "#*****1"
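
    One point worth noting for anyone reviewing this: in R the `:` operator binds more tightly than `-`, so a range written as `1:stringLen-1` means `(1:stringLen) - 1`, not `1:(stringLen - 1)`. The program relies on the first meaning to produce its five rows. A short sketch of the difference:

```r
stringLen <- 5

a <- 1:stringLen - 1     # parsed as (1:stringLen) - 1
b <- 1:(stringLen - 1)   # parentheses change the range

a
b
```

    Writing the range as `0:(stringLen - 1)` gives the same loop values as `a` and makes the intent visible.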
     
    #50
    Last edited: Mar 18, 2020
