Data Science with R | Shikhar Parashar

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jul 7, 2019.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    222
    Likes Received:
    30
    #1
  2. Vinay Raut

    Vinay Raut Member

    Joined:
    May 13, 2019
    Messages:
    2
    Likes Received:
    0
  3. Vinay Raut

    Vinay Raut Member

    Joined:
    May 13, 2019
    Messages:
    2
    Likes Received:
    0
    I am stuck here.
     
    #3
  4. Vishal Shah_5

    Vishal Shah_5 New Member

    Joined:
    Jun 25, 2019
    Messages:
    1
    Likes Received:
    0
    Hey Nishant, what are the Electives in the Data Science Master's course?
     
    #4
  5. Marni Hiteshwar Chowdary

    Joined:
    Jun 27, 2019
    Messages:
    3
    Likes Received:
    0
    Type x and press Enter instead of typing the entire expression again. Let me know if it helps.
    Thanks,
    Hitesh
     
    #5
  6. Taranpreet Singh_2

    Taranpreet Singh_2 New Member

    Joined:
    Jun 29, 2019
    Messages:
    1
    Likes Received:
    0
    Just type x and press Enter.
     
    #6
  7. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    Hey Vinay, just type x and press Enter.
    In the previous step, x <- 5+7, x was assigned the value 12, so in order to see what x holds, just type x and press Enter.
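
A minimal sketch of what the replies above describe, assuming a fresh R session: assigning with <- stores the value silently, and typing the variable name on its own line prints it.

```r
# Store the result of 5 + 7 in x; nothing is printed by the assignment
x <- 5 + 7

# Typing the name alone auto-prints the stored value at the console
x
# [1] 12
```

swirl checks the exact expression it asked for, so typing 12 or 5 + 7 again would not count as the expected answer even though the printed value is the same.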
     
    #7
  8. Edwin Tekere Odero

    Edwin Tekere Odero New Member

    Joined:
    Jun 19, 2019
    Messages:
    1
    Likes Received:
    0
    I'm stuck here. Please assist.
    upload_2019-7-8_12-21-17.png
     
    #8
  9. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    It asks "What is your assignment token?"
    What should we enter here to get credit?
     
    #9
  10. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    @Shikhar Parashar(4707) When I use the seq() function as below:
    > seq(5,10, length=30)
    I get the following output:
    [1] 5.000000 5.172414 5.344828 5.517241 5.689655 5.862069 6.034483 6.206897 6.379310
    [10] 6.551724 6.724138 6.896552 7.068966 7.241379 7.413793 7.586207 7.758621 7.931034
    [19] 8.103448 8.275862 8.448276 8.620690 8.793103 8.965517 9.137931 9.310345 9.482759
    [28] 9.655172 9.827586 10.000000

    My question is: what is the logic behind generating these 30 numbers? Are they random?
     
    #10
  11. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    Friend, we need to read the instructions carefully and then type the commands or statements exactly as swirl asks. The tool seems to compare the exact input and output.
     
    #11
  12. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    It seems it's producing length=30 numbers between 5 and 10, inclusive of both ends and evenly spaced.

    print("We want a sequence of 30 evenly spaced numbers from 5 to 10")
    start <- 5
    end <- 10
    # 30 points that include both the start and the end leave 29 gaps
    step <- (end - start) / 29
    value <- start
    print(value)
    for (i in 1:29) {
      value <- value + step
      print(round(value, 6))
    }

    Another example:
    > seq(5,10, length=5)
    [1] 5.00 6.25 7.50 8.75 10.00
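
The same logic can be checked without a loop. This is a sketch, not the internal implementation of seq(): it rebuilds the 30 values by hand and compares them with what seq() returns. The names n, by_hand and from_seq are just for illustration. length.out is the full argument name; length = 30 works through R's partial argument matching.

```r
n <- 30

# Start at 5 and add the same step (10 - 5) / (n - 1) for each of the n - 1 gaps
by_hand <- 5 + (0:(n - 1)) * (10 - 5) / (n - 1)

# What seq() produces for the same request
from_seq <- seq(5, 10, length.out = n)

# TRUE when the two vectors agree up to floating-point tolerance
all.equal(by_hand, from_seq)
```

So the numbers are not random: each one is the previous value plus 5/29.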
     
    #12
  13. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    56
    Likes Received:
    8
    Hi Vishal,

    Electives are optional courses that serve as prerequisites to the main course. They are meant for those who are totally new to the field of data science.

    I hope this answers your query.

    Regards,
    Sunny
    Sr. Teaching assistant
     
    #13
  14. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    I am unable to proceed from here. I tried multiple times, including exiting swirl, but I still get the same error. Can someone help me?
    upload_2019-7-10_22-30-4.png
     
    #14
  15. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    Delete the # sign in the window above where you see "#x". Please find attached: upload_2019-7-11_12-6-25.png
     
    #15
  16. Subhrajit Pyne

    Joined:
    Jun 24, 2019
    Messages:
    2
    Likes Received:
    0
    I can't figure out the issue here. Prob_1.JPG
     
    #16
  17. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    125
    Likes Received:
    19
    It may be a formatting error. Could you please put the closing curly brace right after the x? It should also work if you assign the value or call print() inside the function.

    boring_function <- function(x){print(x)}
    boring_function(5)

    You can use the code above as a reference.
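
A minimal sketch, assuming the exercise only needs boring_function to hand back its argument unchanged (the screenshot is not available here, so this is a guess at what is being asked). In R the value of the last expression in the function body is what the function returns, so neither print() nor an explicit return() is required.

```r
# The body is just x, so the function returns whatever it was given
boring_function <- function(x) {
  x
}

# Capture the returned value instead of relying on console printing
result <- boring_function("My first function!")
result
```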
     
    #17
  18. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    56
    Likes Received:
    8
    #18
  19. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    Is there a way to read an .xlsx file so that the columns are Factors instead of Characters?
    I'm not sure whether there is an error in the Self Learning content as well.

    BankCustomer1 <- read.csv("Demo 1_ Identifying Data Structures.csv")
    #View(BankCustomer)
    str(BankCustomer1)
    BankCustomer <- read_excel("tmishra/inputDataSet/Demo 1_Identifying Data Structures.xlsm", stringsasFactors=TRUE)
    str(BankCustomer)
     

    Attached Files:

    #19
  20. Marni Hiteshwar Chowdary

    Joined:
    Jun 27, 2019
    Messages:
    3
    Likes Received:
    0
    Hi Buddies,
    Please help me sort out the error below.
    upload_2019-7-18_20-34-15.png
     
    #20
  21. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    Hi Marni,

    There are a couple of issues here.

    You are trying to assign multiple values to a single index.
    It seems your object is a vector such as
    dist <- c("JAN", "FEB","MAR"), so dist[4] is NA; hence the errors above.

    A possible solution that may help you:
    > dist <- matrix(c("JAN", "FEB","MAR", "APR","MAY","JUN", "JUL","AUG","SEPT","OCT","NOV","DEC"), 4,3)
    > dist[4]
    [1] "APR"
    > dist[4,]
    [1] "APR" "AUG" "DEC"
    > dist[4] <- c(1,2,3)
    Warning message:
    In dist[4] <- c(1, 2, 3) :
      number of items to replace is not a multiple of replacement length
    > dist[4,] <- c(1,2,3)
    > dist[4]
    [1] "1"
    > dist[4,]
    [1] "1" "2" "3"
    > dist
         [,1]  [,2]  [,3]
    [1,] "JAN" "MAY" "SEPT"
    [2,] "FEB" "JUN" "OCT"
    [3,] "MAR" "JUL" "NOV"
    [4,] "1"   "2"   "3"
     
    #21
  22. Marni Hiteshwar Chowdary

    Joined:
    Jun 27, 2019
    Messages:
    3
    Likes Received:
    0
     
    #22
  23. Rajendra prasad_4

    Joined:
    Jun 27, 2019
    Messages:
    2
    Likes Received:
    2
    Hi,
    Why am I seeing this error when I try to read an Excel file?
    read.xls("Demo 1_Identifying Data Structures.xlsm")
    Error in findPerl(verbose = verbose) :
    perl executable not found. Use perl= argument to specify the correct path.
    Error in file.exists(tfn) : invalid 'file' argument
    Do I need to install Perl on my machine?
     
    #23
  24. Rajendra prasad_4

    Joined:
    Jun 27, 2019
    Messages:
    2
    Likes Received:
    2
    I was able to fix the issue by installing Perl on my machine and executing the command below in the R shell:
    perl <-"C:/strawberry/perl/bin/perl.exe"
     
    #24
  25. _26955

    _26955 New Member

    Joined:
    Mar 23, 2018
    Messages:
    1
    Likes Received:
    0
    Hi,

    Where can I download the recordings of each session? So far, I could only download the recording of the July 7th session through a specific link. Aren't all the recordings available in one place? Please help!

    Regards,
    Aparna
     
    #25
  26. Kumar Akash_1

    Kumar Akash_1 Member

    Joined:
    Jun 25, 2019
    Messages:
    3
    Likes Received:
    0
    Hi Aparna,

    No need to wait for any link. You can download them on your own.

    Please go as follows:
    Data Science with R --> Live class --> in the Registered class box you will see DOWNLOADED RECORDING --> click on it and you will get all recorded sessions up to yesterday.

    Regards,
    Akash
     
    #26
  27. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    Go to the Live Classes tab; you will find the recordings there.
     
    #27
    Last edited: Jul 22, 2019
  28. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    Hi Shikhar,

    bplot <- barplot(xtabs(~visua_df$Continent), space = FALSE, main = "Countries", col = rainbow(length(visua_df$Continent)))

    Here visua_df is my data frame, and
    length(visua_df$Continent) is 6.

    My question is: when I used col = rainbow(6), the bars had different colors, but when I used
    col = rainbow(length(visua_df$Continent)), all the bars in my graph were red.
     
    #28
  29. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    I also faced this issue when Perl was not installed
     
    #29
    Rajendra prasad_4 likes this.
  30. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    I suspect there is something wrong with the visua_df$Continent values. col = rainbow(X) depends on the value of X: rainbow(X) spreads X colors over the hue range and barplot() uses only the first few, so if X is high, several bars are painted in nearly the same red.

    length(mtcars$cyl)
    # 32
    unique(mtcars$cyl)
    # 6 4 8
    length(unique(mtcars$cyl))
    # 3

    barplot(xtabs(~mtcars$cyl), space = FALSE, main = "Countries", col = rainbow(32)) # all bars look red
    barplot(xtabs(~mtcars$cyl), space = FALSE, main = "Countries", col = rainbow(3)) # three distinct colors
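
To see why this happens without drawing anything, here is a small sketch that looks at the colors themselves. It uses the built-in mtcars data from the example above; the variable names are only for illustration. With three bars, barplot() takes the first three entries of the col vector, so the comparison is between the first three colors of rainbow(32) and all of rainbow(3).

```r
n_rows   <- length(mtcars$cyl)          # 32 observations
n_groups <- length(unique(mtcars$cyl))  # 3 distinct cylinder counts

# First three colors of a 32-step rainbow: neighbouring hues close to red
first_of_many <- rainbow(n_rows)[1:n_groups]

# A 3-step rainbow: hues spread over the whole color wheel
one_per_group <- rainbow(n_groups)

# RGB components (0 to 255) of each set, one column per color
col2rgb(first_of_many)
col2rgb(one_per_group)
```

In the first matrix every column has the red channel at its maximum, which is why the bars look alike; in the second, each column peaks in a different channel.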

    Can you please share the dataset as .csv or .xls files?
     
    #30
  31. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0
    Please find attached the dataset file.
     

    Attached Files:

    #31
  32. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    I guessed correctly:

    > length(visua_df$Continent) # There are too many elements for rainbow() to give distinct colors
    [1] 32109
    > unique(visua_df$Continent)
    [1] "OC" "N.America" "AS" "EU"
    [5] "SA" "AF"

    # You need to use length(unique(visua_df$Continent)) to get 6 colors
    > bplot <- barplot(xtabs(~visua_df$Continent), space = FALSE, main = "Countries", col = rainbow(length(unique(visua_df$Continent))))
     
    #32
  33. _60145

    _60145 Member

    Joined:
    Mar 22, 2019
    Messages:
    9
    Likes Received:
    0

    Makes sense, mate. Thanks a lot for your explanation and time.
     
    #33
  34. Kumar Akash_1

    Kumar Akash_1 Member

    Joined:
    Jun 25, 2019
    Messages:
    3
    Likes Received:
    0

    Hi Marni,

    Let me understand your query first.

    1. Are you looking to replace the 4th component? If yes, then the following is a possible answer to your query:

    dlst[[4]]<-c(1,2,3)

    Output:

    $months
    [1] "JAN,FEB,MAR"

    $matrix
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6

    $msc
    [1] 1

    [[4]]
    [1] 1 2 3

    2. Are you trying to change the name of the 4th component? If so, the following is a possible answer. Here NewName is the new component name:

    names(dlst)<-list("month","matrix","msc","NewName")
    > dlst
    $month
    [1] "JAN","FEB","MAR"

    $matrix
    [,1] [,2] [,3]
    [1,] 1 3 5
    [2,] 2 4 6
    $msc
    [1] 1
    $NewName
    [1] 1

    Please post here if you have any doubts.

    Note: Please read more about the [ and [[ operators.
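
To illustrate the note above, here is a small sketch using a list shaped like the one in the output. The contents are assumed from that output, not taken from the original screenshot. Single brackets return a sub-list, double brackets return the element itself, and storing a whole vector as one component needs [[.

```r
# A list resembling the one discussed; the values are assumptions for illustration
dlst <- list(months = c("JAN", "FEB", "MAR"),
             matrix = matrix(1:6, nrow = 2),
             msc = 1)

class(dlst[1])    # "list": single brackets keep the list wrapper
class(dlst[[1]])  # "character": double brackets extract the element

# Store the whole vector as the 4th component
dlst[[4]] <- c(1, 2, 3)

# Rename only the 4th component and leave the other names untouched
names(dlst)[4] <- "NewName"
str(dlst)
```

By contrast, dlst[4] <- c(1, 2, 3) tries to fit three values into one slot, which is what produces the "number of items to replace" warning.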

    Regards,
    Akash
     
    #34
  35. Shikhar Parashar(4707)

    Alumni

    Joined:
    Feb 13, 2014
    Messages:
    5
    Likes Received:
    2

    Have you tried the stringsAsFactors argument in the read.csv function?
     
    #35
  36. Shikhar Parashar(4707)

    Alumni

    Joined:
    Feb 13, 2014
    Messages:
    5
    Likes Received:
    2
  37. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    stringsAsFactors works fine for read.csv() but not for read_excel().
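
One possible workaround, sketched under assumptions: read_excel() from readxl has no stringsAsFactors argument and returns text columns as character, so the conversion can be done after reading. The data frame below is a made-up stand-in for the result of read_excel(), so that the snippet runs without the Excel file; the column names and values are hypothetical.

```r
# Stand-in for the data returned by read_excel(); names and values are invented
bank_customer <- data.frame(
  name    = c("A", "B", "C"),
  segment = c("retail", "corporate", "retail"),
  balance = c(100, 250, 75),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors
is_text <- vapply(bank_customer, is.character, logical(1))
bank_customer[is_text] <- lapply(bank_customer[is_text], factor)

str(bank_customer)
```

Numeric columns are left as they are, which matches what read.csv(..., stringsAsFactors = TRUE) would give.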
     
    #37
  38. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    You need to shorten the project path (e.g. c:\abc\cdf\) if you get the error below when executing head(dataset) in an .Rmd file:

    Error in tempfile(pattern = "_rs_rdf_", tmpdir = outputFolder, fileext = ".rdf") : temporary name too long
     
    #38
  39. Kumar Akash_1

    Kumar Akash_1 Member

    Joined:
    Jun 25, 2019
    Messages:
    3
    Likes Received:
    0
    Hi all, could anyone help me understand the concept of tuning with an example, and how to interpret the accuracy of a model?
    For example, if the confusion matrix shows 82% accuracy, what does that say with respect to the problem statement?

    You can use any example to explain this, but for reference I am attaching a data set where the problem statement is
    "To Study a heart disease data set and to model a classifier for predicting whether a patient is suffering from any heart disease or not."
     
    #39
  40. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    I am starting this thread to arrive at the best solution for this assessment:

    DESCRIPTION

    Background and Objective:
    Every year, thousands of applications are submitted by international students for admission to colleges in the USA. It becomes an iterative task for the Education Department to know the total number of applications received and then compare that data with the total number of applications successfully accepted and visas processed. Hence, to make the entire process easy, the education department in the US analyzes the factors that influence the admission of a student into colleges. The objective of this exercise is to analyse the same.

    Domain: Education

    Dataset Description:
    Attribute     Description
    GRE           Graduate Record Exam scores
    GPA           Grade Point Average
    Rank          Prestige of the undergraduate institution. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.
    Admit         Response variable; admit/don't admit is a binary variable where 1 indicates that the student is admitted and 0 indicates that the student is not admitted.
    SES           Socioeconomic status: 1 - low, 2 - medium, 3 - high.
    Gender_male   0 -> Female, 1 -> Male
    Race          1, 2, and 3 represent Hispanic, Asian, and African-American

    Analysis Tasks: Analyze the historical data and determine the key drivers for admission.

    Predictive:

    Find the missing values. (if any, perform missing value treatment)
    Find outliers (if any, then perform outlier treatment)
    Find the structure of the data set and if required, transform the numeric data type to factor and vice-versa.
    Find whether the data is normally distributed or not. Use the plot to determine the same.
    Normalize the data if not normally distributed.
    Use variable reduction techniques to identify significant variables.
    Run logistic model to determine the factors that influence the admission process of a student (Drop insignificant variables)
    Calculate the accuracy of the model and run validation techniques.
    Try other modelling techniques like decision tree and SVM and select a champion model
    Determine the accuracy rates for each kind of model
    Select the most accurate model
    Identify other Machine learning or statistical techniques


    Descriptive:
    Categorize the average of grade point into High, Medium, and Low (with admission probability percentages) and plot it on a point chart.
    Cross grid for admission variables with GRE Categorization is shown below:
    GRE        Categorized
    0-440      Low
    440-580    Medium
    580+       High
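
A possible way to build this categorization, sketched with base R's cut(). The scores below are made up for illustration and are not from the assignment dataset. The table lists 440 and 580 in two bands each, so this sketch assumes the boundary value belongs to the lower band (440 is Low, 580 is Medium); adjust the right argument if the course intends otherwise.

```r
# Hypothetical GRE scores, invented purely to demonstrate the binning
gre <- c(300, 440, 500, 580, 700)

# Bands: [0, 440] Low, (440, 580] Medium, (580, Inf) High
gre_category <- cut(
  gre,
  breaks = c(0, 440, 580, Inf),
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

# Count how many scores fall in each band
table(gre_category)
```

The resulting factor can then be cross-tabulated against the admit column, for example with table() on the two variables.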
     
    #40
    Rajendra prasad_4 likes this.
  41. Subhrajit Pyne

    Joined:
    Jun 24, 2019
    Messages:
    2
    Likes Received:
    0
    Hello all, what are the steps in Factor Analysis? Please help me. I am stuck at the 3rd problem statement in the Internet project.

    Find out the probable factors from the dataset, which could affect the exits.
    Exit Page Analysis is usually required to get an idea about why a user leaves
    the website for a session and moves on to another one. Please keep in
    mind that exits should not be confused with bounces
     
    #41
  42. Tathagat Kishore Mishra

    Joined:
    Jun 14, 2019
    Messages:
    12
    Likes Received:
    8
    My project has been accepted successfully, and I got a certificate of appreciation after completing the assessment. Let me know if anyone needs any help.
     
    #42
    Last edited: Aug 13, 2019