Data Science with R | Jul 18 - Aug 16 | Pratul Goyal (2020)

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Jul 24, 2020.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    282
    Likes Received:
    99
    #1
  2. Omer sharif

    Omer sharif New Member

    Joined:
    Jul 17, 2020
    Messages:
    1
    Likes Received:
    0
    Good morning sir, how are you?
    Could you please upload the code?
     
    #2
  3. Jasmin Mathew

    Jasmin Mathew New Member

    Joined:
    Jul 13, 2020
    Messages:
    1
    Likes Received:
    0
    Hi Pratul,

    In the RStudio installed in the lab, the xlsx package is not available. I think it was mentioned that all these packages would be available.
    When I tried to install it using install.packages, it gave me the error "No permission to install to directory".
    Please find the screenshot attached.
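
    A permission error like this usually means the system-wide library is read-only. One possible workaround, sketched below, is to install into a personal library. This assumes the lab account can write to its home directory; the fallback path is hypothetical, and xlsx additionally needs Java (via rJava) to load.

```r
# Sketch: use a personal library when the system library is read-only.
# Assumption: the home directory is writable on the lab machine.
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
if (!nzchar(user_lib)) {
  # Hypothetical fallback location, used only if R_LIBS_USER is unset
  user_lib <- file.path(path.expand("~"), "R", "library")
}
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put the personal library first on the search path for this session.
.libPaths(c(user_lib, .libPaths()))

# Report whether xlsx is available; the install itself needs internet access.
if (!requireNamespace("xlsx", quietly = TRUE)) {
  message("xlsx not found; try: install.packages(\"xlsx\", lib = user_lib)")
}
```

    If the home directory is also locked down, the package would have to be installed by whoever administers the lab.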
     


    #3
  4. Anurag Talati

    Anurag Talati Member

    Joined:
    Jul 17, 2020
    Messages:
    4
    Likes Received:
    0
    Hello Pratul Sir,
    When I create a vector name <- c('John','Alison','Claire','Debra','Jack','Reed') along with other vectors (age, sex, height, weight, mem) and then combine them into a data frame, class(name) returns character instead of factor. Please let me know whether the two are the same thing.
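
    For reference, character and factor are different types, and a short sketch can show where each comes from. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so strings stay character unless converted. The age and sex values below are made up for illustration.

```r
name <- c("John", "Alison", "Claire", "Debra", "Jack", "Reed")
age  <- c(25, 31, 28, 40, 35, 22)  # made-up values for illustration

# A vector built from quoted strings is always character
class(name)

# The column type inside a data frame depends on stringsAsFactors
df_chr <- data.frame(name, age, stringsAsFactors = FALSE)
df_fct <- data.frame(name, age, stringsAsFactors = TRUE)
class(df_chr$name)  # character
class(df_fct$name)  # factor

# Convert explicitly when a column really is categorical
sex <- factor(c("M", "F", "F", "F", "M", "M"))
levels(sex)
```

    A factor stores a fixed set of levels, which suits a column like sex; free-text values such as names are usually left as character.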
     


    #4
  5. PAUL S PARAMBEL

    Joined:
    Dec 13, 2019
    Messages:
    4
    Likes Received:
    0
    Hi sir,
    Hope you are safe.
    Sir, I have a doubt about data normalization. As you said, we can assume the data is normal by checking whether it follows a bell curve. But when I do this, my code gets repeated: if I have 16 columns, I have to check each of those 16 columns with the same code. Is there a single piece of code (a function) that checks normality for an entire data set, which would improve the quality of my code?
    Thank you,
     
    #5
  6. pratul.goyal111

    pratul.goyal111 Active Member

    Joined:
    Apr 11, 2020
    Messages:
    38
    Likes Received:
    0
    yes
     
    #6
  7. pratul.goyal111

    pratul.goyal111 Active Member

    Joined:
    Apr 11, 2020
    Messages:
    38
    Likes Received:
    0
    Shapiro-Wilk test
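
    As a rough sketch of how the test could be run on every numeric column in one call instead of repeating code per column: check_normality is a hypothetical helper name, the built-in iris data stands in for the course data, and the 0.05 cutoff is only a common convention.

```r
# Hypothetical helper: run shapiro.test on each numeric column of a data frame.
# Note: shapiro.test needs between 3 and 5000 non-missing values per column.
check_normality <- function(df, alpha = 0.05) {
  num_cols <- df[sapply(df, is.numeric)]
  p_values <- sapply(num_cols, function(x) shapiro.test(x)$p.value)
  data.frame(
    column       = names(p_values),
    p_value      = unname(p_values),
    looks_normal = unname(p_values > alpha),
    row.names    = NULL
  )
}

# iris is used here only as a stand-in data set
result <- check_normality(iris)
print(result)
```

    A p-value above the cutoff means normality is not rejected; it does not prove that the column is normal.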
     
    #7
  8. Vaibhav Chitransh_1

    Vaibhav Chitransh_1 New Member

    Joined:
    Jul 14, 2020
    Messages:
    1
    Likes Received:
    0
    Hello Pratul,

    While working with frequency and contingency tables, I came across a doubt. Can you please explain the difference between the two pieces of code below, and when to use each?

    From the dataset "Car.csv" which you have provided -
    #Frequency
    feq <- table(dataset_example1$Make,dataset_example1$Type)
    View(feq)

    #Contingency
    cont <- table(dataset_example1$Make,dataset_example1$Type)
    View(cont)

    >The output of both is exactly the same.
    >Since the output is the same, how do I decide which one to use, and when?

    Screenshot of the output:
    upload_2020-8-2_1-8-24.png upload_2020-8-2_1-9-26.png

    Thank You!
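
    The two calls above are identical, so identical output is expected. As usually defined, a frequency table counts the values of one variable, while a contingency table cross-tabulates two or more. A small sketch of that distinction follows; cars_df is a made-up stand-in, since Car.csv is not included in the thread.

```r
# Made-up stand-in for Car.csv
cars_df <- data.frame(
  Make = c("Audi", "Audi", "BMW", "BMW", "BMW", "Honda"),
  Type = c("Sedan", "SUV", "Sedan", "Sedan", "SUV", "Sedan")
)

# One-way frequency table: counts for a single variable
freq <- table(cars_df$Make)

# Two-way contingency table: counts for each Make/Type combination
cont <- table(cars_df$Make, cars_df$Type)

print(freq)
print(cont)

# Proportions and totals are often derived from the contingency table
prop.table(cont, margin = 1)  # row-wise proportions
addmargins(cont)              # adds row and column totals
```

    With one argument, table() answers "how many cars per make?"; with two, it answers "how many cars per make and type?".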
     
    #8
  9. PAUL S PARAMBEL

    Joined:
    Dec 13, 2019
    Messages:
    4
    Likes Received:
    0
    Hi sir,
    Hope you are doing well.
    I still have a doubt about normalization of data. In the 5th project (college_admission) there is a task to normalize the data. The data is clean with no null values. I have treated the outliers, and there are only 2 numerical columns, but when I check with the Shapiro-Wilk test the data still does not come out as normal. I don't know whether this is because I haven't understood the idea behind normalization, as normalization is a vast area. If you could suggest any documentation on normalization of data, that would be very helpful for me, sir.
    Thank you sir,
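
    One possible source of confusion: "normalizing" often means rescaling values (min-max or z-score), which is a different thing from testing whether the data follow a normal distribution. The sketch below assumes that is what the task intends, which the thread does not confirm, and uses made-up GPA-like numbers.

```r
# Made-up values; the real college_admission data is not shown in the thread
gpa <- c(2.9, 3.1, 3.4, 3.6, 3.2, 3.8, 2.7, 3.5, 3.9, 3.3)

# Min-max scaling to the range [0, 1]
min_max <- (gpa - min(gpa)) / (max(gpa) - min(gpa))

# Z-score standardization: mean 0, standard deviation 1
z_score <- as.numeric(scale(gpa))

range(min_max)
round(c(mean(z_score), sd(z_score)), 10)

# Rescaling keeps the shape of the distribution, so the
# Shapiro-Wilk p-value is the same before and after scaling
p_raw    <- shapiro.test(gpa)$p.value
p_scaled <- shapiro.test(min_max)$p.value
all.equal(p_raw, p_scaled)
```

    In other words, rescaled data will not pass a normality test that the raw data fails, and rescaling does not require the data to be normal in the first place.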
     
    #9
  10. PAUL S PARAMBEL

    Joined:
    Dec 13, 2019
    Messages:
    4
    Likes Received:
    0
    Hi sir,
    Sorry for disturbing you with the same questions.
    I am stuck at normalization in the 5th project. Even though the mean and median are roughly equal for the numerical columns of the dataset after treating outliers, the Jarque-Bera normality test is not giving a satisfying result. Could you please help with this step? I am attaching the screenshots: desnity plot_jarque bera test.JPG box_plot_gpa.JPG mean=median.JPG
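
    A mean close to the median only suggests symmetry, whereas the Jarque-Bera statistic also depends on kurtosis. The sketch below computes the standard statistic by hand in base R to make that visible; jarque_bera is a hypothetical helper (the tseries package offers jarque.bera.test as a ready-made version), and the uniform sample is simulated, not the project data.

```r
# Hypothetical helper: JB = n/6 * (S^2 + (K - 3)^2 / 4), using biased moments
jarque_bera <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  m2 <- mean((x - m)^2)
  skew <- mean((x - m)^3) / m2^1.5
  kurt <- mean((x - m)^4) / m2^2
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  # Under normality the statistic is approximately chi-squared with 2 df
  p_value <- pchisq(stat, df = 2, lower.tail = FALSE)
  c(skewness = skew, kurtosis = kurt, statistic = stat, p_value = p_value)
}

set.seed(1)
# Simulated symmetric data: mean is near the median, but the shape is flat
symmetric_not_normal <- runif(500)
jb <- jarque_bera(symmetric_not_normal)

c(mean = mean(symmetric_not_normal), median = median(symmetric_not_normal))
print(round(jb, 4))
```

    Here the skewness is near zero but the kurtosis is far from 3, so the test rejects normality even though mean and median nearly agree.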
     
    #10
  11. Yuvraj Singh Shekhawat

    Joined:
    Sep 21, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Sir,

    Can you please guide me on which one is best for data science, R or Python?
    I am very confused.
     
    #11
  12. pratul.goyal111

    pratul.goyal111 Active Member

    Joined:
    Apr 11, 2020
    Messages:
    38
    Likes Received:
    0
    Both are equally important
     
    #12
  13. Anurag Talati

    Anurag Talati Member

    Joined:
    Jul 17, 2020
    Messages:
    4
    Likes Received:
    0
    Hello Sir,
    I have a question about the "Analysis of Sales Report of a Clothes Manufacturing Outlet" assignment. When I merge the two dataset files of 500 rows each by Dress ID (one for sales and one for attributes), the result has 550 rows. This is confusing.
    Also, should I take the total of sales to find which attributes contribute the most to sales?
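
    One common cause of extra rows after a merge is a repeated key: every matching pair of rows is returned, so duplicated IDs multiply. Whether that is what happens in the assignment files is an assumption; the tiny data frames and column names below are made up to show the effect and one possible fix.

```r
# Made-up miniature data; real column names and values may differ
sales <- data.frame(Dress_ID = c(1, 2, 2, 3), units = c(10, 5, 7, 3))
attrs <- data.frame(
  Dress_ID = c(1, 2, 2, 3),  # ID 2 appears twice in both tables
  Style    = c("Casual", "Party", "Party", "Vintage")
)

# 4 rows merged with 4 rows gives 6: ID 2 contributes 2 x 2 = 4 rows
merged <- merge(sales, attrs, by = "Dress_ID")
nrow(merged)

# Locate the repeated keys
dup_ids <- unique(attrs$Dress_ID[duplicated(attrs$Dress_ID)])
dup_ids

# One possible fix: keep a single attribute row per ID before merging
attrs_unique <- attrs[!duplicated(attrs$Dress_ID), ]
merged_clean <- merge(sales, attrs_unique, by = "Dress_ID")
nrow(merged_clean)

# Total sales per attribute value
by_style <- aggregate(units ~ Style, data = merged_clean, FUN = sum)
print(by_style)
```

    Checking duplicated() on the key column of each file before merging shows whether this explanation applies.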
     
    #13
