DATA SCIENCE WITH R | Nov 28 - Jan 10 | Sabyasanchi (2020)

Discussion in 'Big Data and Analytics' started by Hitesh H S, Nov 28, 2020.

  1. Arun Hosh

    Arun Hosh Active Member

    Joined:
    Apr 29, 2020
    Messages:
    46
    Likes Received:
    6
    Hi friends,
    Re: health cost project

    For Question 4, which approach is best: a linear model, or just categorizing and finding the sum of charges?

    #QUESTION 4
    #To properly utilize the costs, the agency has to analyze the severity of the hospital costs by age and gender for proper allocation of resources.
     
    #101
  2. Ananya Sharma_1

    Ananya Sharma_1 Active Member

    Joined:
    Oct 7, 2020
    Messages:
    41
    Likes Received:
    1
    Arun, I haven't prepared my model yet, so I can't comment properly on your result.
    I think both of your R^2 values are low, and all the independent variables (except Age) have quite high p-values and low estimates, so you could change any of these variables and see if you get better results.

    Also, as you mentioned, we don't have to treat outliers in this case, so by looking at your output values you can tell whether the variables affect the dependent variable or not.
     
    #102
    Last edited: Jan 8, 2021
  3. Ananya Sharma_1

    I did this using the latter option you mentioned, because sir also explained it that way.
     
    #103
  4. Ananya Sharma_1

    Yes, I got your point.
    I did the age outlier analysis because I wanted to improve the efficiency of the model (I'm quite unsure whether this works or not).
     
    #104
  5. Arun Hosh

    Thanks Ananya, you are giving me more insights, and I believe your knowledge of statistical interpretation is very good.
    I am a bit weak in this part.
    Can you help or guide me on how to interpret the summary output of the linear model (using the R^2 value, F-statistic, or p-value)? I really want to understand it. Can you point me to anything that would help (a link, PPT, ebook, etc.)?
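
A minimal sketch of where those three numbers live in the `summary()` of an `lm` fit. The data frame below is a toy example invented for illustration (it is not the project data set); only the column names AGE, FEMALE and TOTCHG are borrowed from this thread.

```r
# Toy data, invented for illustration only (not the project data set).
health <- data.frame(
  AGE    = c(0, 0, 1, 3, 5, 8, 10, 12, 15, 17),
  FEMALE = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  TOTCHG = c(2600, 1500, 3100, 900, 1200, 1000, 1400, 2100, 2500, 3000)
)

fit <- lm(TOTCHG ~ AGE + FEMALE, data = health)
s <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
# A small Pr(>|t|) (commonly below 0.05) suggests the predictor is
# related to TOTCHG after accounting for the other predictors.
coef_table <- s$coefficients
print(coef_table)

# R-squared: share of the variance in TOTCHG explained by the model (0 to 1).
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Overall F-test: does the model as a whole beat an intercept-only model?
f <- s$fstatistic  # named vector: value, numdf, dendf
f_p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

cat("R-squared:", round(r2, 3), " adjusted:", round(adj_r2, 3), "\n")
cat("F-test p-value:", round(f_p_value, 3), "\n")
```

As a rough reading guide: a low R^2 means the predictors explain little of the variation in cost, a small F-test p-value means the model as a whole is better than no model, and the per-variable p-values indicate which predictors carry that effect.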
     
    #105
  6. Arun Hosh

    Hi Ananya, also, in the health care project there is a "FEMALE" column in the data set. While building a linear model, how do we treat this binary data?
     
    #106
  7. Arun Hosh


    Ananya, your thought is good. Actually, we can't categorize age: if we categorize it, will it still come under a linear model?
    Also, there is an approach I can think of from your comment: if we can omit 5% of the data as outliers, then we can remove the least frequent values of age, I mean those that are not significant in the model or make the least change to the dependent variable. Do you think it will work?
     
    #107
  8. Arun Hosh

    Hi Ananya,
    I used the same option and got this result. Did you get the same result? (Just want to see if we are both on the same page, haha.)

      age_Group         Gender_Cat  Severity_Age_expenditure
      <fct>             <chr>                          <int>
    1 Age group(0-1)    Male                          408356  (highest value)
    2 Age group (>10)   Female                        317568
    3 Age group(0-1)    Female                        306350
    4 Age group (>10)   Male                          203045
    5 Age group (6-10)  Male                           77212
    6 Age group (2-5)   Male                           46778
    7 Age group (2-5)   Female                         25569
    8 Age group (6-10)  Female                          1160
     
    #108
  9. Ananya Sharma_1

    I don't have any knowledge of statistics; I'm doing all this analysis based on what sir taught in class.
     
    #109
  10. Ananya Sharma_1

    I am completely new to this course, so I'm not very sure about my way of moving ahead with these problems. Currently I'm just following what sir told us in class.
    Sir also had one binary variable, 'View', in his dataset; he converted it into a dummy variable and proceeded with model building.
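
A small sketch of the dummy-variable idea applied to the FEMALE column. It assumes FEMALE is coded 1 = female and 0 = male (an assumption, not confirmed in this thread), and it uses a toy data frame invented for illustration; GENDER is a hypothetical helper column.

```r
# Toy data, invented for illustration only (not the project data set).
health <- data.frame(
  AGE    = c(0, 0, 1, 3, 5, 8, 10, 12, 15, 17),
  FEMALE = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),   # assumed coding: 1 = female
  TOTCHG = c(2600, 1500, 3100, 900, 1200, 1000, 1400, 2100, 2500, 3000)
)

# Option 1: a 0/1 column is already a dummy variable, so it can go
# straight into the formula.
fit_numeric <- lm(TOTCHG ~ AGE + FEMALE, data = health)

# Option 2: convert it to a factor; lm() then builds the dummy itself,
# using the first level ("Male") as the reference category.
health$GENDER <- factor(health$FEMALE, levels = c(0, 1),
                        labels = c("Male", "Female"))
fit_factor <- lm(TOTCHG ~ AGE + GENDER, data = health)

# Both fits estimate the same effect: the expected difference in TOTCHG
# between female and male patients of the same age.
print(coef(fit_numeric)["FEMALE"])
print(coef(fit_factor)["GENDERFemale"])
```

For a two-level variable both options give the same coefficient. The factor route matters more for a column such as RACE, where numeric codes would otherwise be treated as a continuous quantity.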
     
    #110
  11. Harvinder Kaur

    Joined:
    Nov 11, 2020
    Messages:
    2
    Likes Received:
    1
    When I try to play the recorded live class sessions, the "network recording converter" tool asks for a site URL. Can anyone share the site URL?
     
    #111
  12. Arun Hosh

    Hi all,

    Hospital project, 3rd question.

    #QUESTION 3
    #To make sure that there is no malpractice, the agency needs to analyze if the race of the patient is related to the hospitalization costs.

    I ran ANOVA and got the result below:

    > summary(race_annova)
                            Df    Sum Sq  Mean Sq F value Pr(>F)
    as.factor(health$RACE)   5 1.859e+07  3718656   0.244  0.943
    Residuals              493 7.524e+09 15260687


    Below is my interpretation:
    #The ANOVA results show no evidence that race is related to the hospitalization cost. The high p-value (0.943) means we cannot reject the hypothesis that the mean hospitalization cost is the same across the race categories.


    Is this the right interpretation? Please comment.
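
A hedged sketch of how the p-value can be read programmatically. In a one-way ANOVA the null hypothesis is that all groups have the same mean, so a Pr(>F) of 0.943, far above the usual 0.05 cutoff, gives no evidence that cost differs by race. The data frame and the `interpret_p` helper below are made up for illustration.

```r
# Toy data, invented for illustration only (not the project data set).
health <- data.frame(
  RACE   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  TOTCHG = c(2600, 1500, 3100, 900, 1200, 2800, 1400,
             2100, 2500, 1000, 3000, 1100)
)

race_aov <- aov(TOTCHG ~ as.factor(RACE), data = health)
print(summary(race_aov))

# summary() of an aov fit is a list; its first element is the ANOVA table.
aov_table <- summary(race_aov)[[1]]
p_value <- aov_table[["Pr(>F)"]][1]

# Hypothetical helper: turn a p-value into a plain-language conclusion.
interpret_p <- function(p, alpha = 0.05) {
  if (p < alpha) {
    "reject H0: mean cost differs for at least one group"
  } else {
    "fail to reject H0: no evidence that mean cost differs between groups"
  }
}

cat("Toy data:", interpret_p(p_value), "\n")
cat("p = 0.943:", interpret_p(0.943), "\n")
```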
     
    #112
  13. Harvinder Kaur

    I am trying to play the live class recordings and have installed Network Recording Player. Do I need to convert the downloaded session to MP4? While doing so, it asks for a site URL, in the form meetingcenter.webex.com. Please suggest what "meetingcenter" should be here.
     
    #113
  14. Shashank AV

    Shashank AV Member

    Joined:
    Nov 20, 2020
    Messages:
    2
    Likes Received:
    0
    Hello all,
    I'm new here, so forgive me if I say something stupid. I need some help with the project. I am doing the health care analysis one. The 1st question says "to find the age category of people who frequently visit the hospital and has the maximum expenditure." I read the data set, selected the data, arranged it by age and hospital discharge cost (TOTCHG), and grouped it by age. Then I summarized TOTCHG by age group. For "frequently visit the hospital", do I just take the count? Can someone help? Thanks.

    Later edit:
    I used the aggregate function and the count function, but in two separate lines of code. How do I combine the aggregate and the count in one line of code?

    aggregate(TOTCHG~AGE,my_df,sum)
    count(my_df,AGE)

    I would like to show the count and the aggregate in one table to make it neater.
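
One possible way to get both numbers into a single table with base R only, sketched on a toy `my_df` invented for illustration; the output column names `visits` and `total_charges` are hypothetical.

```r
# Toy data, invented for illustration only (not the project data set).
my_df <- data.frame(
  AGE    = c(0, 0, 0, 1, 1, 5, 17, 17),
  TOTCHG = c(2600, 1500, 3100, 900, 1200, 1000, 2100, 2500)
)

# Total charges per age (same call as in the post above).
totals <- aggregate(TOTCHG ~ AGE, data = my_df, FUN = sum)
names(totals)[2] <- "total_charges"

# Number of records per age: length() counts the rows in each group.
counts <- aggregate(TOTCHG ~ AGE, data = my_df, FUN = length)
names(counts)[2] <- "visits"

# Join the two results on AGE so both appear in one table.
summary_tbl <- merge(counts, totals, by = "AGE")

# Most frequent ages first.
summary_tbl <- summary_tbl[order(-summary_tbl$visits), ]
print(summary_tbl)
```

If dplyr is loaded, `group_by(AGE)` followed by `summarise()` with `n()` and `sum(TOTCHG)` produces the same table in a single pipeline.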
     
    #114
    Last edited: Jan 11, 2021
  15. Arun Hosh


    Hi Harvinder, to find the frequency of the ages, we can simply draw a histogram, or you can use the count function in the "plyr" package.
    To calculate which age group has the maximum expenditure, categorize age into 4 or 5 categories using the cut function, then group by age category and summarize by the sum of total charges.
    This will give you the result.
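
The steps above can be sketched roughly as follows. The break points mirror the age groups that appear elsewhere in this thread (0-1, 2-5, 6-10, >10); the data frame itself is a toy example invented for illustration.

```r
# Toy data, invented for illustration only (not the project data set).
my_df <- data.frame(
  AGE    = c(0, 0, 1, 3, 5, 7, 10, 12, 17),
  TOTCHG = c(3000, 2500, 2000, 800, 900, 700, 600, 1500, 2200)
)

# cut() bins a numeric vector; with the default right = TRUE the bins are
# (-Inf, 1], (1, 5], (5, 10], (10, Inf], i.e. ages 0-1, 2-5, 6-10, >10.
my_df$age_group <- cut(
  my_df$AGE,
  breaks = c(-Inf, 1, 5, 10, Inf),
  labels = c("0-1", "2-5", "6-10", ">10")
)

# Frequency of visits per age group.
visits <- table(my_df$age_group)
print(visits)

# Sum of total charges per age group.
by_group <- aggregate(TOTCHG ~ age_group, data = my_df, FUN = sum)
print(by_group)

# Age group with the maximum expenditure.
top_group <- by_group[which.max(by_group$TOTCHG), ]
print(top_group)
```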
     
    #115
  16. Shashank AV

    Hello,
    For the final question in the healthcare cost analysis, "To perform a complete analysis, the agency wants to find the variable that mainly affects hospital costs.",
    is it enough to find the correlation?
    cr = cor(my_df)
    and then just display it,
    and then do a corrplot? Or do we make a heatmap?
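
A rough sketch of the correlation route: compute the matrix, then rank the other columns by the strength of their correlation with TOTCHG. The data is a toy example invented for illustration, and LOS (length of stay) is a hypothetical extra column added only so that there is something to rank.

```r
# Toy data, invented for illustration only (not the project data set).
# LOS is a hypothetical column, not mentioned in this thread.
my_df <- data.frame(
  AGE    = c(0, 0, 1, 3, 5, 8, 10, 12, 15, 17),
  FEMALE = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  LOS    = c(2, 1, 3, 1, 1, 1, 2, 3, 4, 5),
  TOTCHG = c(2600, 1500, 3100, 900, 1200, 1000, 1400, 2100, 2500, 3000)
)

# Pairwise Pearson correlations between all numeric columns.
cr <- cor(my_df)
print(round(cr, 2))

# Keep only the TOTCHG row, drop TOTCHG itself, and sort by absolute value
# so the most strongly correlated variable comes first.
cost_cor <- cr["TOTCHG", colnames(cr) != "TOTCHG"]
cost_cor <- cost_cor[order(abs(cost_cor), decreasing = TRUE)]
print(round(cost_cor, 2))

strongest <- names(cost_cor)[1]
cat("Most strongly correlated with TOTCHG:", strongest, "\n")

# Optional plots of the same matrix:
# heatmap(cr, symm = TRUE)   # base R heat map
# corrplot::corrplot(cr)     # needs the corrplot package installed
```

Correlation only looks at one pair of variables at a time, so a linear model with all predictors (and its coefficient p-values) is a reasonable cross-check before naming the variable that mainly affects cost.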
     
    #116
  17. Arpita Mitra_1

    Alumni

    Joined:
    Jul 14, 2017
    Messages:
    9
    Likes Received:
    0
    Hi Arun, I used a linear regression model here.
     
    #117
  18. Arun Hosh

    Hi Arpita, how did the model perform? Can you share the summary output, and how did you manipulate the data?
     
    #118
  19. Arun Hosh

    Try making a heat map
     
    #119
  20. Arpita Mitra_1

    Hi Arun, the p-value is low in this model. I have attached my Q4 summary for your reference.
     

    Attached Files:

    #120
