DATA SCIENCE WITH Python | Jan 09 - Feb 13 | Srikanth (2021)

srikanthdakoju

Well-Known Member
Hi All,
I have uploaded the PPT that I used in class, along with the assignment on central tendencies for your practice.
 

Ahmed_184

Member
Hi all,
Is anyone having a problem accessing the previous classes? I don't see the course anymore. Even for the classes that we registered for under Cohort 3, I'm no longer registered. Does anyone have the same problem?
 
I want to know the error in the following Python code:

#Plot Major complaint type Heating against location type to check for any pattern
plt.figure(figsize=(3,3))
plt.scatter(grp_data['Complaint Type'],grp_data['Location Type'])
plt.title='Plot complaint type Heating against location type'
plt.xlabel='Complaint Type'
plt.ylabel='Location Type'
plt.show()
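
One likely problem in the snippet above: `plt.title='...'` assigns a string to the name `plt.title` instead of calling the function, so no title is drawn, and any later `plt.title(...)` call in the same session fails because the name now holds a string. The same applies to `plt.xlabel` and `plt.ylabel`. Here is a minimal sketch of the corrected calls, assuming `plt` is `matplotlib.pyplot` and using a small made-up `grp_data` (the real DataFrame is not shown in the thread):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for grp_data; the real data is not shown in the thread.
grp_data = pd.DataFrame({
    "Complaint Type": ["HEATING", "HEATING", "HEATING"],
    "Location Type": ["Residential Building", "Store", "Residential Building"],
})

fig = plt.figure(figsize=(3, 3))
plt.scatter(grp_data["Complaint Type"], grp_data["Location Type"])
# title, xlabel and ylabel are functions: call them instead of assigning to them.
plt.title("Plot complaint type Heating against location type")
plt.xlabel("Complaint Type")
plt.ylabel("Location Type")
plt.show()
```

If the assignment version has already run in the same notebook, restart the kernel first so that `plt.title`, `plt.xlabel` and `plt.ylabel` are functions again.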
 

Sam James_1

New Member
The error "AttributeError: module 'matplotlib' has no attribute 'imshow'" can be resolved using:
import matplotlib.pyplot as plt

I used
import matplotlib as plt
at first which caused the issue.

Please ignore if already known.
 
Hello Srikanth,

The standard deviation calculation in the attached screenshots could be wrong. Can you please explain how you got the mean standard deviation as 0.1? Please find the screenshots attached.
 

Attachments

  • Screenshot 2021-01-27 at 5.01.24 PM.png
  • Screenshot 2021-01-27 at 5.01.04 PM.png
Hi Srikanth,

In the January 16th class, at video time 3:42:00, we did the yoga guru problem. Can you please let me know why we checked the 95% confidence level in the t-table instead of the 5% significance level? I have attached the page for your reference.
 

Attachments

  • Screenshot 2021-01-27 at 6.41.21 PM.png
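
On the t-table question above: a 95% confidence level and a 5% significance level describe the same cutoff, because confidence = 1 - alpha. A small sketch with `scipy.stats` shows both lookups giving the same critical value. The degrees of freedom and the two-tailed setup are assumptions for illustration; the numbers from the class problem are not given in the thread.

```python
from scipy import stats

alpha = 0.05             # 5% significance level
confidence = 1 - alpha   # 95% confidence level
dof = 10                 # hypothetical degrees of freedom, for illustration only

# Two-tailed critical value looked up by significance level.
t_from_alpha = stats.t.ppf(1 - alpha / 2, dof)

# The same value looked up by confidence level (upper end of the central 95% interval).
t_from_conf = stats.t.interval(confidence, dof)[1]

print(round(t_from_alpha, 3), round(t_from_conf, 3))
```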
Hi Srikanth,

In the January 16th class, in the casino problem for the chi-squared test, why did you choose the alpha value as 5%? Please find the screenshot attached for your reference.
 

Attachments

  • Screenshot 2021-01-29 at 6.24.41 PM.png

srikanthdakoju

Well-Known Member
If the problem doesn't mention a significance level, you can assume it is 5%, because it is common practice to use 5% significance.
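
To make the 5% convention concrete, here is a rough sketch of a chi-squared goodness-of-fit check at alpha = 0.05 with `scipy.stats`. The die-roll counts are invented for illustration; they are not the numbers from the class casino problem.

```python
from scipy import stats

alpha = 0.05  # default significance level when none is stated

# Hypothetical observed counts for the six faces of a die (100 rolls).
observed = [16, 18, 16, 14, 12, 24]

# With no expected counts given, chisquare() assumes equal frequencies.
chi2_stat, p_value = stats.chisquare(observed)

# Critical value for the upper 5% tail with (categories - 1) degrees of freedom.
dof = len(observed) - 1
critical = stats.chi2.ppf(1 - alpha, dof)

reject_h0 = chi2_stat > critical  # same decision as p_value < alpha
print(f"chi2 = {chi2_stat:.2f}, critical = {critical:.2f}, p = {p_value:.3f}")
```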
 

_85674

New Member
I am trying to write a query on the service request data to filter the rows where the complaint status is equal to Closed:
df2 = df2.query('Status==Closed')
I am getting this error:
UndefinedVariableError: name 'Closed' is not defined.
Has anyone tried to extract the same rows?
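
The error above most likely comes from `query` reading the unquoted `Closed` as a variable or column name rather than as text. A minimal sketch of the usual fix, on a made-up `df2` (the sample rows and the `Unique Key` column are assumptions, not the project data):

```python
import pandas as pd

# Hypothetical sample; the real service-request data is not shown in the thread.
df2 = pd.DataFrame({
    "Unique Key": [1, 2, 3],
    "Status": ["Closed", "Open", "Closed"],
})

# Quote the value inside the expression so it is compared as the string "Closed".
closed = df2.query('Status == "Closed"')

# Equivalent boolean-mask form, which avoids the expression string altogether.
closed_mask = df2[df2["Status"] == "Closed"]
```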
 
For Q4 of the Customer Services project, do we need to group by 'Location Type' and 'Complaint Type' and then sort by Request_Closing_Time? If so, how do we sort within a group?
 
In Q5 (t-test), the complaint type has more than two values. How do we calculate the difference of means when there are more than two groups?
 
For Q4 of the Customer Services project, do we need to group by 'Location Type' and 'Complaint Type' and then sort by Request_Closing_Time? If so, how do we sort within a group?
Hello Aseem, I have done something like the below.
Step 1: Calculate the time difference in seconds (this is only for the calculation; I believe we can omit this step as well).

df['Request_Closing_Time_insec'] = df['Request_Closing_Time'].dt.total_seconds()
df.head()

Step 2: Group by complaint type and location type, then take the mean.

group = df[['Complaint Type','Request_Closing_Time_insec','Location Type']].groupby(['Complaint Type','Location Type'])
group.mean()

And this is the result I got:

View attachment 13806
 
Or we can also group in the other order, as below:

# Sequence of groupby value is changed
group = df[['Complaint Type','Request_Closing_Time_insec','Location Type']].groupby(['Location Type','Complaint Type'])
group.mean()

And the result is below:

1612358606001.png
 
In Q5 (t-test), the complaint type has more than two values. How do we calculate the difference of means when there are more than two groups?
Srikanth, I'm not able to proceed on Q5 as I don't know how to perform a t-test on multiple groups. We have learnt only the two-sample t-test. How do I calculate the mean difference between groups when there are more than two?
 
Thanks Bibaswan. The output in the table above doesn't have the closing time sorted within each group, which is what I am also struggling with. I believe the question is asking us to sort the closing time within each location type/complaint type bucket.
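
One way to read "sort within a group" is sketched below on made-up data that reuses the column names from the earlier posts. Both options are assumptions about what Q4 wants: the first keeps every row and orders the closing times inside each location type/complaint type bucket, and the second orders the per-group averages within each location type.

```python
import pandas as pd

# Hypothetical sample with the column names used in the thread.
df = pd.DataFrame({
    "Location Type": ["Street", "Street", "Park", "Street", "Park"],
    "Complaint Type": ["Noise", "Noise", "Noise", "Parking", "Noise"],
    "Request_Closing_Time_insec": [7200.0, 3600.0, 5400.0, 1800.0, 900.0],
})

keys = ["Location Type", "Complaint Type"]

# Option 1: keep every row, ordered by closing time inside each group.
rows_sorted = df.sort_values(keys + ["Request_Closing_Time_insec"])

# Option 2: one average per group, ordered by that average within each location.
avg = df.groupby(keys)["Request_Closing_Time_insec"].mean().reset_index()
avg_sorted = avg.sort_values(["Location Type", "Request_Closing_Time_insec"])
```

Sorting by the group keys first and the value last gives the within-group order without needing a separate sort per group.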
 
Hi All,

Has anyone completed Q5? If yes, please help me; I have a few clarifications.
  • Whether the average response time across complaint types is similar or not (overall)
Based on the average response time, we can say that we reject the null hypothesis.

We have 24 complaint types in total, hence we can go with a t-test to identify the p-value.

There are different kinds of t-tests. Can you please tell me which t-test we can use to identify the p-value?

Thanks
Lakshmi
 

Satyam Suman

Member
Simplilearn Support
Alumni
You have to go with a one-way ANOVA test. You can use scipy.stats, or you can also look at pingouin. It's a one-liner in ping
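
A minimal sketch of the one-way ANOVA suggested above, using `scipy.stats.f_oneway`. The data is invented and has only three complaint types; the column names follow the earlier posts and are assumptions about the project DataFrame.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample: three complaint types with closing times in seconds.
df = pd.DataFrame({
    "Complaint Type": ["Noise"] * 4 + ["Parking"] * 4 + ["Heating"] * 4,
    "Request_Closing_Time_insec": [3600, 4000, 3800, 4200,
                                   7200, 7000, 7600, 6800,
                                   3900, 4100, 3700, 4300],
})

# One array of closing times per complaint type.
samples = [g["Request_Closing_Time_insec"].to_numpy()
           for _, g in df.groupby("Complaint Type")]

# H0: all complaint types have the same mean closing time.
f_stat, p_value = stats.f_oneway(*samples)

alpha = 0.05
reject_h0 = p_value < alpha
print(f"F = {f_stat:.2f}, p = {p_value:.4g}, reject H0: {reject_h0}")
```

Unlike a two-sample t-test, this compares all groups at once, so it works the same way with 24 complaint types. Rows with a missing closing time would need to be dropped first.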
 
Hello sir,

I was watching past lectures and downloaded some of the datasets and assignments provided, but they come in 7z format, and when I try to open one it says it cannot open the .ipynb file as an archive. How do I open it so that I can practice?
The Olympic data files also come as 7z, not Excel. How do I load them in an Anaconda notebook?
 

Attachments

  • SS.PNG
  • ss2.PNG