DS WITH R | JULY 13 - AUG 17 | Samridhi

Discussion in 'Big Data and Analytics' started by Kunal Guwalani, Jul 15, 2019.

  1. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    177
    Likes Received:
    14
    Hi Learners,

    Please use this thread for Data Science with R queries.

    Team Simplilearn!
     
    #1
  2. devkhivsare

    devkhivsare Member
    Alumni

    Joined:
    May 5, 2015
    Messages:
    3
    Likes Received:
    1
    Hi,
    Is this the same community that was created for the Jul 13 batch, which Samridhi is teaching on weekends?
     
    #2
  3. Kunal Guwalani

    Kunal Guwalani Well-Known Member
    Simplilearn Support

    Joined:
    Jul 17, 2018
    Messages:
    177
    Likes Received:
    14
    Yes, it is for the Jul 13 batch taught by Samridhi.
     
    #3
  4. TUSHAR PAHADE

    TUSHAR PAHADE Member

    Joined:
    Jun 29, 2019
    Messages:
    3
    Likes Received:
    0
    Hello Ma'am,
    I am Tushar. Please help me with this problem: "dplyr" is not working in my RStudio.
    "library(dplyr)
    Error in library(dplyr) : there is no package called ‘dplyr’"
    I get the same error for "hflights".
     
    #4
  5. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    Hello Ma'am,
    I need your guidance. I am trying to split a single-column, multi-row data set that is laid out in the following order:

    [1] Name1
    [2] Address1 line1
    [3] Address1 line2
    [4] Tel1
    [5] Fax1
    [6] Email website business1
    [7] Name2
    [8] Address2 line1
    [9] Address2 line2
    [10] Address2 line3.........

    into a multi-column, multi-row data set in the following order:
    Name Address Tel Fax Email Website Business
    1
    2
    3
    4
    5...

    So far, the closest I have come is separate() from the tidyverse, which is essentially strsplit() with a regex, but I cannot use it for my project. Please advise on how to proceed.
    I rechecked, and I can use dim(x) <- c(rows, columns). The issue is that I don't want to specify the number of rows. In some entries the address has two lines, in others three, and some entries include an added description. I also have several such files for the same task (24 in total). Is there a way to use for() over the working directory, or apply(), to process all the Excel sheets?
    I have attached a sample file. Please help.
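One way to avoid fixing the number of rows up front is to flag the lines that start a new record and let cumsum() number the records. The following is a minimal base R sketch on made-up data, not the attached file. It assumes that a record starts on an all-upper-case line without digits and that telephone and fax lines start with "Tel" and "Fax"; both rules are assumptions and would need adjusting to the real sheets.

```r
# Toy single-column input: record 1 has two address lines, record 2 has three.
lines <- c("ACME LTD", "12 Main St", "Algiers", "Tel: 111", "Fax: 222",
           "BETA SARL", "3 Rue A", "Zone B", "Oran", "Tel: 333", "Fax: 444")

# Assumption: a record starts on a line that is all upper case with no digits.
is_start <- toupper(lines) == lines & !grepl("[0-9]", lines)

# cumsum() turns the start flags into a record id for every line,
# so records may have any number of lines.
record_id <- cumsum(is_start)
records <- split(lines, record_id)

# Turn one record (a character vector) into a one-row data frame.
to_row <- function(rec) {
  tel <- grep("^Tel", rec, value = TRUE)
  fax <- grep("^Fax", rec, value = TRUE)
  address <- rec[-1]                                   # everything after the name
  address <- address[!grepl("^(Tel|Fax)", address)]    # keep address lines only
  data.frame(
    name = rec[1],
    address = paste(address, collapse = ", "),
    tel = if (length(tel)) tel[1] else NA_character_,
    fax = if (length(fax)) fax[1] else NA_character_,
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(records, to_row))
print(result)
```

Collapsing the address lines into one string sidesteps the two-versus-three-line problem; separate address columns would need a fixed maximum instead.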
     

    Attached Files:

    #5
    Last edited: Aug 8, 2019
  6. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    Tushar, sorry to interrupt, but please try running install.packages("dplyr") before library(dplyr). For hflights, please run install.packages("hflights") and then library(hflights).
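The "there is no package called" error means the package is not installed in any library that R searches. A small sketch for checking this before calling library() follows; `missing_packages` is a hypothetical helper name, not part of any package.

```r
# Return the subset of `pkgs` that is not installed in any library on .libPaths().
# system.file(package = p) returns "" when the package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(c("dplyr", "hflights"))
print(to_install)

# Uncomment to install whatever is missing, then load as usual:
# if (length(to_install) > 0) install.packages(to_install)
# library(dplyr)
# library(hflights)
```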
     
    #6
    TUSHAR PAHADE likes this.
  7. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    There is an error while loading the dplyr package and using its functions: "
    Error: package or namespace load failed for ‘dplyr’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
    namespace ‘rlang’ 0.3.0.1 is already loaded, but >= 0.4.0 is required"
    Please help.
     
    #7
  8. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,

    Please install the package on your system using the following commands:

    install.packages("dplyr")
    library("dplyr")

    Regards,
    Samridhi
     
    #8
  9. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,

    Please install the dataset first.

    Regards,
    Samridhi
     
    #9
  10. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    > library("dplyr")
    Error: package or namespace load failed for ‘dplyr’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
    namespace ‘rlang’ 0.3.0.1 is already loaded, but >= 0.4.0 is required
    Error in fetch(key) :
    lazy-load database '/home/rodriguezzude_gmail/R/x86_64-pc-linux-gnu-library/3.4/dplyr/help/dplyr.rdb' is corrupt
    > library(dplyr)
    Error: package or namespace load failed for ‘dplyr’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
    namespace ‘rlang’ 0.3.0.1 is already loaded, but >= 0.4.0 is required
    Error in fetch(key) :
    lazy-load database '/home/rodriguezzude_gmail/R/x86_64-pc-linux-gnu-library/3.4/dplyr/help/dplyr.rdb' is corrupt
    Screenshot_461.png Screenshot_462.png
     
    #10
  11. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,

    I am not allowed to provide feedback on this. Please submit the project report, and it will be evaluated by a different trainer. Also, please delete this post: we have kept this project for evaluation, and every learner has to attempt it on their own.

    Regards,
    Samridhi
     
    #11
  12. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    Hi,
    Here is the code to do this:

    library(readxl)
    library(stringr)

    # Read the single-column sheet without treating the first row as a header.
    dataset <- read_excel("C:\\Users\\Vaibhav\\Desktop\\BA\\simplilearn\\R\\1.Algeria\\1.Algeria.xlsx", col_names = FALSE)
    # Use a plain data frame so that dataset[i, 1] returns a single value.
    dataset <- as.data.frame(dataset)
    head(dataset)

    # Output table: one row per company. The helper column is dropped at the end.
    df = data.frame(1:nrow(dataset))
    head(df)
    df[, c('company_name', 'address', 'address_line2', 'address_line3', 'tel', 'fax', 'email', 'website')] = NA

    # TRUE where a value can be read as an integer
    isNumeric = function(x)
    {
      val = suppressWarnings(as.integer(x))
      !is.na(val)
    }

    sum(sapply(strsplit(split = "", x = "Samridhi 0:00"), FUN = isNumeric)) # to identify numerics in the company name

    j = 0 # index of the current company row in df
    i = 1 # index of the current line in dataset
    while(i < nrow(dataset))
    {
      line_num = 0 # offset from the company-name line
      dataset[i,1] = as.character(dataset[i,1])
      # A company name is all upper case and contains no digits, colons or quotes.
      if(!is.na(dataset[i,1]) && toupper(dataset[i,1]) == dataset[i,1] && !grepl(pattern = '[:0-9"]', dataset[i,1]))
      {
        j = j + 1
        df[j, 'company_name'] = dataset[i+line_num,1]

        # The first address line always follows the name.
        line_num = line_num + 1
        df[j, 'address'] = dataset[i+line_num,1]

        # Optional second and third address lines: keep reading until a "Tel" line.
        if(!grepl(pattern = "^Te", x = dataset[i+line_num+1,1]))
        {
          line_num = line_num + 1
          df[j, 'address_line2'] = dataset[i+line_num,1]

          if(!grepl(pattern = "^Te", x = dataset[i+line_num+1,1]))
          {
            line_num = line_num + 1
            df[j, 'address_line3'] = dataset[i+line_num,1]
          }
        }

        if(grepl(pattern = "^Te", dataset[i+line_num+1,1]))
        {
          line_num = line_num + 1
          df[j, "tel"] = dataset[i+line_num,1]
        }

        if(grepl(pattern = "Fax:", dataset[i+line_num+1,1]))
        {
          line_num = line_num + 1
          df[j, "fax"] = dataset[i+line_num,1]
        }

        # Email and website can share one line, so extract both from it.
        if(grepl(pattern = "^Email", dataset[i+line_num+1,1]))
        {
          line_num = line_num + 1
          df[j, "email"] = str_extract(string = dataset[i+line_num,1], pattern = "Email.+@.+\\.[a-zA-Z0-9]+?")
          df[j, "website"] = str_extract(string = dataset[i+line_num,1], pattern = "Website.+\\..+?\\..+?\\b")
        }

        if(grepl(pattern = "^Website", dataset[i+line_num+1,1]))
        {
          line_num = line_num + 1
          df[j, "website"] = dataset[i+line_num,1]
        }
        # Jump to the last line consumed for this company.
        i = i + line_num
      } else {
        i = i + 1
      }
    }

    df$X1.nrow.dataset. = NULL # drop the helper column created by data.frame(1:nrow(dataset))
    View(df)

    # CLEAN FURTHER USING REGULAR EXPRESSIONS
    df$email = gsub(pattern = "(Website.*)|(Email:)", replacement = "", x = df$email)

    I hope the further data cleaning will be simple.
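The code above handles one workbook. To run the same steps over all the files in a folder, a sketch along the following lines may help. It assumes the per-file logic has been wrapped in a function that takes a file path and returns a data frame; `process_folder` and `parse_one_file` are hypothetical names used only for illustration.

```r
# Apply one parsing function to every matching file in a folder.
# `parse_fun` takes a file path and returns a data frame.
process_folder <- function(dir, parse_fun, pattern = "\\.xlsx$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  results <- lapply(files, parse_fun)
  names(results) <- basename(files)   # remember which file each result came from
  results
}

# Hypothetical usage, assuming parse_one_file() wraps the single-file code:
# all_files <- process_folder("path/to/folder", parse_one_file)
# combined  <- do.call(rbind, all_files)
```

lapply() over list.files() avoids changing the working directory inside a for() loop, and the named list makes it easy to see which workbook produced which rows.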

    Regards,
    Samridhi
     
    #12
    TUSHAR PAHADE likes this.
  13. Samridhi Dutta

    Samridhi Dutta Well-Known Member
    Trainer

    Joined:
    Aug 16, 2017
    Messages:
    165
    Likes Received:
    20
    While installing the package, R asks several questions. Type y as the response.
    Try install.packages("dplyr", dependencies = TRUE) to force installation of the dependencies.
    The error can also happen because of a lack of space on the download drive.
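The message "namespace ‘rlang’ 0.3.0.1 is already loaded, but >= 0.4.0 is required" indicates that the rlang available in the session is older than the version dplyr needs. A sketch for comparing the installed version against a minimum follows; `needs_update` is a hypothetical helper, and the 0.4.0 threshold is taken from the error message above.

```r
# TRUE when `pkg` is missing or older than `min_version`.
# packageVersion() reads the installed DESCRIPTION file and does not load the package.
needs_update <- function(pkg, min_version) {
  if (!nzchar(system.file(package = pkg))) {
    return(TRUE)   # not installed at all
  }
  utils::packageVersion(pkg) < package_version(min_version)
}

rlang_outdated <- needs_update("rlang", "0.4.0")
print(rlang_outdated)

# If this prints TRUE, restarting R (so the old rlang is not loaded) and then
# running install.packages("rlang") before library(dplyr) is a reasonable next step.
```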

    Regards,
    Samridhi
     
    #13
    Last edited: Aug 16, 2019
  14. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    Thank you so much, ma'am. I will apply this code and work out how it applies to similar tasks.
     
    #14
  15. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    upload_2019-8-16_20-43-3.png
     
    #15
  16. Jude Rodriguez

    Joined:
    Jul 18, 2019
    Messages:
    10
    Likes Received:
    1
    Yes, ma'am. I have deleted the post. What do I need to put in the write-up? Should it describe how I am going to proceed step by step and which function is used for what purpose? In the R script, do I include the code I write in the top-left Source pane or the executed code from the bottom-left Console? And should the screenshots be of the plots or of the bottom-left Console?
     
    #16
  17. Arunanand Murmu

    Joined:
    Jun 27, 2019
    Messages:
    3
    Likes Received:
    0
    I need help completing at least one project. I am confused about what to do and what not to do, because I am new to command-based programming.
     
    #17
  18. Vinodkumar_6

    Vinodkumar_6 New Member

    Joined:
    Apr 28, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Ma'am,

    I am working on the healthcare project and am stuck on the 5th question. Here is some of my code; could you help me find the error?

    Model3 <- lm(LOS~AGE+FEMALE_Factor+Race_Factor,data = Hops)
    summary(Model3)

    Result :

    > summary(Model3)

    Call:
    lm(formula = LOS ~ AGE + FEMALE_Factor + Race_Factor, data = Hops)

    Residuals:
       Min     1Q Median     3Q    Max
    -3.211 -1.211 -0.857  0.143 37.789

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)     2.85687    0.23160  12.335   <2e-16 ***
    AGE            -0.03938    0.02258  -1.744   0.0818 .
    FEMALE_Factor1  0.35391    0.31292   1.131   0.2586
    Race_Factor2   -0.37501    1.39568  -0.269   0.7883
    Race_Factor3    0.78922    3.38581   0.233   0.8158
    Race_Factor4    0.59493    1.95716   0.304   0.7613
    Race_Factor5   -0.85687    1.96273  -0.437   0.6626
    Race_Factor6   -0.71879    2.39295  -0.300   0.7640
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.376 on 491 degrees of freedom
    Multiple R-squared: 0.008699, Adjusted R-squared: -0.005433
    F-statistic: 0.6156 on 7 and 491 DF, p-value: 0.7432


    I don't know where these Race_Factor2, Race_Factor3, ... terms are coming from.

    Please help..
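For context on the output above: when a predictor is a factor, lm() expands it into one indicator (dummy) column per level except the first, which becomes the baseline absorbed into the intercept. That is why terms such as Race_Factor2 to Race_Factor6 appear while the first level does not. A minimal sketch on made-up data follows; the column names mirror the post, but the values are invented for illustration.

```r
# Toy data; values are made up, only the column names follow the post.
toy <- data.frame(
  LOS = c(2, 3, 1, 4, 2, 5),
  Race_Factor = factor(c(1, 2, 3, 1, 2, 3))
)

# model.matrix() shows the columns lm() actually fits.
# Level "1" is the baseline, so only levels 2 and 3 get their own column.
cols_default <- colnames(model.matrix(LOS ~ Race_Factor, data = toy))
print(cols_default)

# relevel() changes the baseline; now level "2" is absorbed into the intercept.
toy$Race_Factor <- relevel(toy$Race_Factor, ref = "2")
cols_releveled <- colnames(model.matrix(LOS ~ Race_Factor, data = toy))
print(cols_releveled)
```

Each such coefficient is the estimated difference from the baseline level, holding the other predictors fixed.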
     
    #18
  19. Sujatha Devi KVV

    Joined:
    Jun 30, 2019
    Messages:
    2
    Likes Received:
    0
  20. Arunanand Murmu

    Joined:
    Jun 27, 2019
    Messages:
    3
    Likes Received:
    0
    Can you help me solve one project?
     
    #20
  21. TUSHAR PAHADE

    TUSHAR PAHADE Member

    Joined:
    Jun 29, 2019
    Messages:
    3
    Likes Received:
    0
    Thank you for your reply
     
    #21
  22. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    163
    Likes Received:
    20
    #22
