Data Science with Python | Rahul Aggarwal

Hi Rahul,

I have a doubt about the one-sample t-test. My t-statistic value is coming out as -367. Please look into this. I am attaching the file.
 

Attachments

  • 1. Statistics Session - 3 Hypothesis Testing.pdf
    201.4 KB · Views: 19

Mrudula Bhimavarapu

Active Member
Hi Renuka,

First of all, for the quarterly growth we only need the data for 2012 Q2 and 2012 Q3. As discussed in class with Rahul, we filter the data for these two quarters, total the sales in each quarter for every store, and subtract: 2012 Q3 - 2012 Q2.

Then, after subtracting, we divide the difference by the 2012 Q2 total. This gives us the rate of growth.

In your data above, right now you have data for all the quarters of all the years. Please filter it first and then proceed.

This is my understanding of the question. Please correct me if I am wrong. I am also working on the same project right now.

Regards,
Shreha.


Hello folks

I guess there are different ways to interpret the question "Which store/s has good quarterly growth rate in Q3’2012", i.e., identify the store that has the highest sales in Q3 2012. After reading the question a couple of times, that is how I interpreted it :) but I stand corrected.

regards
Mrudula
 

_12759

Active Member
But how do we filter the data? When we use the DataFrame, the columns get converted into the index...
Hi Payal,

I did the below steps:

1. First filter the data for the quarters '2012Q2' and '2012Q3':

sales_Q = df_sales.loc[(df_sales['quarter'] == '2012Q2') | (df_sales['quarter'] == '2012Q3') ]
sales_Q

[Out]
     Store  Date        Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  CPI         Unemployment  date_new    quarter
113  1      06-04-2012  1899676.88    0             70.43        3.891       221.435611  7.143         2012-04-06  2012Q2
114  1      13-04-2012  1621031.70    0             69.07        3.891       221.510210  7.143         2012-04-13  2012Q2
115  1      20-04-2012  1521577.87    0             66.76        3.877       221.564074  7.143         2012-04-20  2012Q2

2. Then I summed the weekly sales store-wise for these two quarters and used the 'unstack' function:

Qtr3_sales = pd.DataFrame(sales_Q.groupby(['Store','quarter'])['Weekly_Sales'].sum().unstack()).reset_index().rename_axis(None, axis=1)

[Out]
   Store  2012Q2       2012Q3
0  1      20978760.12  20253947.78
1  2      25083604.88  24303354.86
2  3      5620316.49   5298005.47

3. Now, to get the quarterly growth, we need to subtract the two columns ('2012Q3' - '2012Q2') and then divide the result by '2012Q2'.

I am currently stuck at step 3, as I am unable to access the columns above. I have posted my doubt in a post above too.

This is all my understanding and as discussed with Rahul in class.

Hope this helps.

Thanks and Regards,
Shreha.
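
For step 3 above, one possible way to reach the two quarter columns is sketched below. It is only an illustration on a toy copy of the step 2 output; the names qtr_sales and growth_pct are made up here. It assumes the quarter labels may be Period objects (for example, if 'quarter' was built with dt.to_period('Q')), in which case a string key such as '2012Q2' may not match until the labels are converted to strings.

```python
import pandas as pd

# Toy stand-in for the pivoted table from step 2 (values copied from the output above).
qtr_sales = pd.DataFrame({
    "Store": [1, 2, 3],
    "2012Q2": [20978760.12, 25083604.88, 5620316.49],
    "2012Q3": [20253947.78, 24303354.86, 5298005.47],
})

# Assumption: the column labels might be Period objects rather than strings.
# Converting every label to str makes bracket access with '2012Q2' work either way.
qtr_sales.columns = [str(c) for c in qtr_sales.columns]

# Growth rate in percent: (Q3 - Q2) / Q2 * 100
qtr_sales["growth_pct"] = (
    (qtr_sales["2012Q3"] - qtr_sales["2012Q2"]) / qtr_sales["2012Q2"] * 100
)

# Stores with the best quarterly growth come first.
print(qtr_sales.sort_values("growth_pct", ascending=False))
```

For the three stores shown, the growth comes out negative because Q3 sales are lower than Q2 sales.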
 

PAYAL_27

Member
Hi, thank you for your help. But I have already resolved this. Now I need to confirm whether I have done it correctly.
 

Renuka Badduri

Active Member
Hi All,

I need clarification on point #1 of project 3_Walmart_sales: after we calculate the standard deviation, do we also need to calculate the coefficient of mean to standard deviation?

Regards,
Renuka
 
Hello everyone,

I am doing the NYC complaint response time project. I am stuck on these two questions:
  1. Whether the average response time across complaint types is similar or not (overall). I grouped the complaint types by mean request closing time, but I do not understand which statistical test I should apply. If it is a t-test, how do I apply it?

  2. Are the type of complaint or service requested and the location related? Which statistical test should be used here?
Could someone please guide me and provide some hints for solving these questions?
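
A hedged pointer for the two questions above, not something confirmed in this thread: comparing a mean across more than two groups is commonly done with a one-way ANOVA, and checking whether two categorical columns are related is commonly done with a chi-square test of independence. The sketch below uses invented toy data; the column names 'Complaint Type', 'City' and 'Request_Closing_In_Hr' are assumptions about how the project data is laid out.

```python
import pandas as pd
from scipy import stats

# Invented toy data; the real project data will look different.
df = pd.DataFrame({
    "Complaint Type": ["Noise"] * 4 + ["Parking"] * 4 + ["Graffiti"] * 4,
    "City": ["Bronx", "Queens", "Bronx", "Queens",
             "Bronx", "Bronx", "Queens", "Queens",
             "Queens", "Bronx", "Queens", "Bronx"],
    "Request_Closing_In_Hr": [2.1, 3.4, 2.8, 3.0,
                              4.9, 5.3, 4.4, 5.8,
                              9.5, 8.7, 10.2, 9.1],
})

# Question 1: one-way ANOVA, one array of closing times per complaint type.
groups = [g["Request_Closing_In_Hr"].to_numpy()
          for _, g in df.groupby("Complaint Type")]
f_stat, p_anova = stats.f_oneway(*groups)
print("ANOVA p-value:", p_anova)

# Question 2: chi-square test of independence on the complaint type x city counts.
table = pd.crosstab(df["Complaint Type"], df["City"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print("Chi-square p-value:", p_chi2)
```

A small p-value in the first test suggests the average closing times differ across complaint types; a small p-value in the second suggests complaint type and location are related. ANOVA has its own assumptions (roughly normal groups with similar variance), so treat this as a starting point.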
 

PAYAL_27

Member
##Separated holiday and non-holiday data

Holiday_data=pd.DataFrame(data[data['Holiday_Flag']==1]).reset_index(drop=True)
nonHoliday_data=pd.DataFrame(data[data['Holiday_Flag']==0]).reset_index(drop=True)

#Calculated the mean for non-holiday data Storewise...

df1=pd.DataFrame(nonHoliday_data.groupby('Store')['Weekly_Sales'].mean())

#Performed join on both..

df3=df1.join(Holiday_data,on='Store',rsuffix='_holiday').drop(['Holiday_Flag','Store','Qtr','Date'],axis=1).reset_index(drop=True)
df3.head(10)

#How do we check the condition in Question 4? "Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together."

Can anyone tell me whether the above code is correct and how we can perform the next step?
Thank you in advance.
 

_12759

Active Member
Hi Payal,
The question says - 'Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together'.
Here is what I understood from this question and what I did:
1. Separated the holiday data and then further separated that dataset into 'superbowl', 'laborday', 'thanksgiving' and 'christmas'.
2. Then I calculated the total sales for each of these holidays separately and stored them in different variables.
3. Then I calculated the mean of the weekly sales in the non-holiday dataset.
4. Then I compared the total sales for each holiday with the mean I got from step 3.

But after doing that, I could not see any negative impact of the holidays on sales. The sales for all the holidays above were greater than the mean sales.

Hence I am not sure whether what I did was correct. But I could not think of any other way as of now.

Other batch members please help with your understanding too.

Thanks and Regards,
Shreha.
 

_12759

Active Member
Hi Ramanpreet,

Since you are doing this project, could you please help me with how to calculate the 'total Response time'?
 

Mrudula Bhimavarapu

Active Member
Hi Shreha

Yes, correct. I followed the same steps you mentioned above and found that the sales on holidays are greater than the mean sales of non-holidays; hence, no negative impact of the holidays on sales.

regards
mrudula
 

Renuka Badduri

Active Member
Hi Mrudula/Shreha

Thanks for sharing your thoughts to give more insight into it.

I have a doubt about calculating the non-holiday season sales for all stores. Does it include all non-holiday weekly sales for all stores, or is it any one period of weekly sales under non-holidays?

I am confused about the word 'season'. Please help clarify.

Regards,
Renuka.
 

Mrudula Bhimavarapu

Active Member
Hi Renuka

I second you completely; the questions have been drafted in a tricky way and we get confused very easily.

Coming back to your question, it would include weekly sales for all stores on all non-holidays.
Below is how I have shown the output:
On this Date " 07- 09-2012", Holiday Sales is greater than Non Holiday Sales
Hope this helps.

regards
Mrudula
 

_12759

Active Member
Hi Mrudula,

I did not understand the question from Renuka above.
Also, how are you able to calculate sales for a particular date? In the dataset, we just have the weekly sales for the non-holidays and holidays for all stores. All the weeks start/end on a Friday. Even the holiday dates they have given are all Fridays.

Please clarify. :)

Thanks and Regards,
Shreha.
 

Renuka Badduri

Active Member
Hi Shreha,

I think that when she mentioned "weekly sales" of all stores for non-holidays, she was talking about the "Weekly_Sales" column values.
I hope this helps to clarify.

@Mrudula: please correct me if my understanding is wrong.



Regards,
Renuka.
 

Mrudula Bhimavarapu

Active Member
Hi Renuka,

Yes, that's correct. I was referring to "Weekly_Sales" from the data set.

@ Shreha

I shall try to explain with this example:
In the question they have provided the holiday dates for Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
Corresponding dataset:
Store  Date        Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  CPI          Unemployment
1      12-02-2010  1641957.44    1             38.51        2.548       211.2421698  8.106
1      11-02-2011  1649614.93    1             36.39        3.022       212.9367046  7.742
1      10-02-2012  1802477.43    1             48.02        3.409       220.2651783  7.348

So the logic to sum all sales for a holiday (say, for Super Bowl) is:
1. Store all holidays in one DF
# Stores Holiday Sales
stores_holiday_sales = walmart_data[walmart_data['Holiday_Flag'] == 1]

2. Keep only the Super Bowl weeks
# dayfirst=True because the Date column is in dd-mm-yyyy format
superbowl_dates = pd.to_datetime(['12-02-2010', '11-02-2011', '10-02-2012', '08-02-2013'], dayfirst=True)
holiday_dates = pd.to_datetime(stores_holiday_sales['Date'], dayfirst=True)
stores_holiday_sales_superBowl = stores_holiday_sales[holiday_dates.isin(superbowl_dates)]

hope this helps !!!

regards
Mrudula
 

PAYAL_27

Member
Thank you, I will try again.
 

_12759

Active Member
Hi Mrudula,

I have done the same thing too... :)
These questions are very confusing.

Regards,
Shreha.
 

PAYAL_27

Member
Hi, thank you for the help. But can you please tell me how we should compare the values in step 4?
 
import matplotlib.pyplot as plt
plt.plot(monthly_wise_sales['new_date'],monthly_wise_sales['Weekly_Sales'])

This line gives an error:
TypeError: float() argument must be a string or a number, not 'Period'

I am converting the date into a string, but this still gives the error.


Remember that I have to use the matplotlib library. The plot can be generated using the pandas library, but I have to use matplotlib.
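
One possible way around the 'Period' error, sketched on invented toy data: convert the Period column to timestamps before handing it to matplotlib. The frame below only mimics monthly_wise_sales; that 'new_date' holds monthly periods is an assumption based on the error message.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for monthly_wise_sales (assumed: 'new_date' has a period dtype).
monthly_wise_sales = pd.DataFrame({
    "new_date": pd.period_range("2012-01", periods=4, freq="M"),
    "Weekly_Sales": [100.0, 120.0, 90.0, 130.0],
})

# matplotlib cannot turn Period objects into floats, so convert them to
# timestamps (start of each period) first.
x = monthly_wise_sales["new_date"].dt.to_timestamp()

fig, ax = plt.subplots()
ax.plot(x, monthly_wise_sales["Weekly_Sales"])
ax.set_xlabel("Month")
ax.set_ylabel("Weekly_Sales")
```

If plain labels are enough, passing monthly_wise_sales["new_date"].astype(str) as the x values should also avoid the error, at the cost of a categorical rather than a date axis.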
 

_12759

Active Member
Hello Shreha, Ramanpreet

After yesterday's session, I picked up this project (NYC311) and am almost done. Please do let me know if you need clarification on it.

regards
mrudula

Yeah Mrudula,
I too am done with 80% of the project. Just the hypothesis testing is pending. so working on that right now.

Could you confirm the answer for this question:

# Order the complaint types based on the average ‘Request_Closing_Time’, grouping them for different locations.

I got the below answer:


Out[45]:

Location Type    Complaint Type    Processing Time Float
Park             Animal in a Park  14.034780
Roadway Tunnel   Derelict Vehicle  0.748507
Street/Sidewalk  Graffiti          0.501563
Highway          Derelict Vehicle  0.341488

and then I plot it.

Regards,
Shreha.
 

_12759

Active Member
Hi Payal,
I have used separate If statements for the comparison.

if (superbowl_Tsales > mean_weekly_sales):
    print("Positive impact of SuperBowl on sales")
else:
    print("Negative impact of SuperBowl on sales")

Regards,
Shreha.
 

Mrudula Bhimavarapu

Active Member
Hi Shreha

This is how I did it:
# step 5 Group by City and Complaint Type, showing the average Request Closing time in hours
df_nyc311_grouped = df_nyc311.groupby(['City','Complaint Type']).agg({'Request_Closing_In_Hr': 'mean'})
print(df_nyc311_grouped)

and the output is

upload_2020-5-16_17-0-7.png
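
Since the question asks to order the complaint types, a sort can be added after the groupby. This is a minimal sketch on invented rows; it reuses the column names from the snippet above, and sorting slowest first within each city is only one reading of "order".

```python
import pandas as pd

# Invented rows that reuse the column names from the post above.
df_nyc311 = pd.DataFrame({
    "City": ["BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"],
    "Complaint Type": ["Noise", "Graffiti", "Noise", "Noise", "Graffiti", "Graffiti"],
    "Request_Closing_In_Hr": [2.0, 6.0, 4.0, 1.0, 5.0, 7.0],
})

# Average closing time per (City, Complaint Type), then order the complaint
# types inside each city from slowest to fastest.
ordered = (
    df_nyc311
    .groupby(["City", "Complaint Type"], as_index=False)["Request_Closing_In_Hr"]
    .mean()
    .sort_values(["City", "Request_Closing_In_Hr"], ascending=[True, False])
    .reset_index(drop=True)
)
print(ordered)
```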
 

PAYAL_27

Member
Thank you, thanks a lot.
 

Rahul_Aggarwal

Active Member
Alumni
Hello Amigos!!

I hope everyone finds this message in good health! Also I'm assuming that you too are working hard on this weekend to finish off the pending assignments/projects/tasks just like me. Just for your information -

(1.) I have uploaded the Walmart Retail Case Study Jupyter Notebook that we discussed during the class -
a. Zip Folder Name on Google Drive : 17. Machine Learning Project -1 Walmart Retail [Revised]
b. Made some code changes, Please go through it

(2.) I have also uploaded a very interesting book today in the E-book folder. It should help you build/improve intuition around a lot of data science & machine learning concepts such as Decision Tree, K-means, Neural Network etc.
a. Books Link - https://drive.google.com/open?id=1E5RAnkw4SD9J56gOGzCU9LumgVeqTcbk
b. A chapter takes hardly 15-20 minutes to complete. No prerequisite is required for this.

Have a happy learning! :)

Regards,
Rahul A
 

Renuka Badduri

Active Member
Hi All,

I have resolved Quarter 3 high growth sales. Highest growth rate 15.773764 with Store # 14 and so on..
Is my solution correct? please reply here.

Regards,
Renuka.

Hi All,

I had given a wrong result above because of the variables I used.
Now the result I get is: Store 7 has the highest growth.
Store  Weekly_Sales_Q2  Weekly_Sales_Q3  Q3-Q2      % Sales
7      7290859.27       8262787.39       971928.12  13.330776
16     6564335.98       7121541.64       557205.66  8.488378
35     10838313.00      11322421.12      484108.12  4.466637
26     13155335.57      13675691.91      520356.34  3.955478
39     20214128.46      20715116.23      500987.77  2.478404

Please confirm.
 

Renuka Badduri

Active Member
Hi Shreha/Mrudula,

I got the results below for step 4.
Do these figures match your numbers? Please let me know.

Super Bowl Sales 145682278.34 are higher than Avg non holiday sales 1041256.3802088564
Labour Day Sales 140727684.68 are higher than Avg non holiday sales 1041256.3802088564
Thanksgiving Sales 132414608.5 are higher than Avg non holiday sales 1041256.3802088564
Christmas Sales 86474980.03999999 are higher than Avg non holiday sales 1041256.3802088564

Regards,
Renuka
 
Hello, I have one query, please. Suppose I have SLA data in the format below, as an example, and I want to build insights and do the visualization. There are more than 20 SLA components for the given time range.

Can anyone please advise how we can do the following:
1. Visualization of the dataset below. Is a bar plot the only option?
2. Which predictive model can we use on the data below to predict the future count of SLA KPIs, given that there are different numbers of tickets per priority in different months?
3. Any other views, please.

upload_2020-5-17_0-49-52.png

Thanks,
Sandeep
 

Rahul_Aggarwal

Active Member
Alumni
Hi Rahul,

I have a doubt about the one-sample t-test. My t-statistic value is coming out as -367. Please look into this. I am attaching the file.

Hello Kanika,

I have gone through your code and found the mistake -
a. Incorrect : m=hr_attr.average_montly_hours.std() : you are computing the standard deviation here instead of the mean
b. Correct :
    m=hr_attr.average_montly_hours.mean()
    s=hr_attr.average_montly_hours.std()
    (m-200)/(s/np.sqrt(14999))
It should return the correct t-statistic value. Use this t-statistic to get the p-value.

Hope this helps!
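
The corrected formula can be cross-checked against SciPy. The sketch below uses synthetic numbers in place of the HR data (the array hours and its distribution are invented); only the hypothesised mean of 200 and the sample size of 14999 come from the post above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for hr_attr.average_montly_hours (invented values).
rng = np.random.default_rng(0)
hours = rng.normal(loc=201.0, scale=50.0, size=14999)

mu0 = 200                 # hypothesised population mean
m = hours.mean()          # sample mean
s = hours.std(ddof=1)     # sample std; pandas .std() uses ddof=1 by default, NumPy does not
n = hours.size

# One-sample t-statistic: (sample mean - mu0) / (s / sqrt(n))
t_manual = (m - mu0) / (s / np.sqrt(n))

# SciPy computes the same statistic and the two-sided p-value.
t_scipy, p_value = stats.ttest_1samp(hours, popmean=mu0)
print(t_manual, t_scipy, p_value)
```

If the manual value and the SciPy value agree, the formula is wired up correctly; a value as extreme as -367 usually points to a mix-up like the one described above.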
 

_12759

Active Member
Hi Renuka,
I have got the exact same results. :)
Regards,
Shreha.
 

_12759

Active Member
Great Mrudula.. I have also got the exact same result. Since the 'Location type' in the question was confusing, I had grouped on both the city, and also by the 'Location Type'

Regards,
Shreha
 

_12759

Active Member
Did you understand how to do the hypothesis testing?

Regards,
Shreha
 

_12759

Active Member
Shreha- Thanks for confirming.
Are you done with Walmart project? I am stuck at plotting monthly and semester results. Any idea?
Hi Renuka,

Yeah.. I have done this but not very sure about it. Since they are asking for monthly and semester sales, I was not sure whether they wanted it store wise or for the whole dataset. So, I tried for both.

You need to retrieve the month, and the semester from the weekly sales , and then do a groupby.

monthly_sales = df_sales.groupby(['Store','month'])['Weekly_Sales'].sum()
monthly_sales

Store  month
1      1        11203741.49
       2        19505306.58
       3        20380666.86
       4        21623140.34
       5        18505332.90

and the same way for semester.

And then I used a line plot to plot them.

Let me know what you are stuck on. Were you able to retrieve the month and semester from the dates? If you are stuck there, I will let you know how to do that.

Regards,
Shreha.
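
For retrieving the month and the semester from the dates, a minimal sketch on invented rows follows. It assumes the 'Date' column is in dd-mm-yyyy format (as in the outputs earlier in the thread) and that "semester" means January to June = 1 and July to December = 2; the project statement may define it differently.

```python
import pandas as pd

# Invented rows; only the column names follow the thread.
df_sales = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "Date": ["06-04-2012", "13-04-2012", "06-07-2012", "13-07-2012"],
    "Weekly_Sales": [100.0, 150.0, 200.0, 50.0],
})

# Parse dd-mm-yyyy strings explicitly so day and month are not swapped.
df_sales["date_new"] = pd.to_datetime(df_sales["Date"], format="%d-%m-%Y")

df_sales["month"] = df_sales["date_new"].dt.month
# Assumed definition: months 1-6 -> semester 1, months 7-12 -> semester 2.
df_sales["semester"] = (df_sales["date_new"].dt.month - 1) // 6 + 1

monthly_sales = df_sales.groupby(["Store", "month"])["Weekly_Sales"].sum()
semester_sales = df_sales.groupby(["Store", "semester"])["Weekly_Sales"].sum()
print(monthly_sales)
print(semester_sales)
```

Adding date_new.dt.year to the groupby keys keeps the same month of different years apart, if that is what the question wants.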
 

Renuka Badduri

Active Member
Hi Shreha,

Yes! I have calculated monthly and semester sales for all stores. I was not sure what "Units" meant in that statement.

So far the project is going well, and I calculated the accuracy scores below:

Accuracy of CPI is : 0.3055555555555556
Accuracy of Unemployment is: 0.9444444444444444
Accuracy of Fuel Price is: 0.6111111111111112

Please confirm whether you get the same values.

I used a scatter plot for the monthly sales records. Thanks very much for the quick help and understanding.

Regards,
Renuka.
 
Hi, I am solving the Customer Service Requests Analysis project 1. In the CSV file, the created date and closed date columns have different date formats, and I am not able to bring the two columns to the same date format.
Can anyone help me?
 

Attachments

  • Screenshot.png
    Screenshot.png
    38.6 KB · Views: 7

I am not getting the correct date format after importing the file as CSV. After reading it, the date formats of the created date and closed date columns are not the same. Can you help me? I have also attached a screenshot.
 

Attachments

  • Screenshot.png
    Screenshot.png
    38.6 KB · Views: 8

Hi Renuka,

I also got the same result for question 3.

regards,
gyaneswar
 
Hi Renuka,

There are only four holidays in a year, and your solution shows sales on all of them as higher than the average, even though the question states that some holidays have a negative effect. Please check it once.

regards
gyaneswar
 
Hi Mrudula/_12579,
Can you please help me solve this?
 

_12759

Active Member
Hi Gyaneswar,

I also got a positive impact on sales for all holidays. Were you able to find a negative impact on sales? If so, please share your approach.

Regards,
Shreha.
 

_12759

Active Member
Hi Harshil,

That's perfectly fine. If you check the Excel file, the dates are in this format there as well. Now you need to convert these dates to 'datetime' to get a proper date format, using the pd.to_datetime function.

Regards,
Shreha.
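
A rough sketch of the pd.to_datetime step, which also covers the earlier question about the total response time. The column names 'Created Date' and 'Closed Date' and the timestamp format are assumptions about the CSV; check the actual file before reusing the format string. The two rows are invented.

```python
import pandas as pd

# Invented rows; column names and timestamp format are assumptions about the CSV.
df_nyc311 = pd.DataFrame({
    "Created Date": ["12/31/2015 11:59:45 PM", "12/31/2015 11:59:44 PM"],
    "Closed Date": ["01/01/2016 12:55:15 AM", "01/01/2016 01:26:57 AM"],
})

# Parse both columns with the same explicit format so they end up as the same dtype.
fmt = "%m/%d/%Y %I:%M:%S %p"
for col in ["Created Date", "Closed Date"]:
    df_nyc311[col] = pd.to_datetime(df_nyc311[col], format=fmt)

# Response time as a timedelta, and as a float number of hours.
df_nyc311["Request_Closing_Time"] = df_nyc311["Closed Date"] - df_nyc311["Created Date"]
df_nyc311["Request_Closing_In_Hr"] = (
    df_nyc311["Request_Closing_Time"].dt.total_seconds() / 3600
)
print(df_nyc311)
```

Once both columns are datetime64, they print in the same format regardless of how they looked in the raw file.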
 

_12759

Active Member
Renuka,

How were you able to calculate these accuracies? This is the part I don't understand well. :(
Could you please share the steps you followed here?

Regards,
Shreha.
 
Hi Sreha,

I calculated sales for the individual holiday dates and compared them with the average non-holiday sales (for all stores together, as mentioned in the question).
I got this solution:

2010-02-12 1 48336677.63
2010-11-26 1 65821003.24
2011-02-11 1 47336192.79
2011-11-25 1 66593605.26
2012-02-10 1 50009407.92
2012-09-07 1 48330059.31

So on these holidays, sales are higher than the average non-holiday sales.

regards,
Gyaneswar
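
The per-date comparison described above could be sketched as follows. All numbers are invented toy values, and treating "mean sales in non-holiday season for all stores together" as the mean of the weekly totals across stores is an assumption about what the question intends. The point of the sketch is to compare like with like: a total summed over several stores and years will always exceed a per-store weekly mean.

```python
import pandas as pd

# Invented toy data with the same column names as the Walmart file.
walmart_data = pd.DataFrame({
    "Store": [1, 2, 1, 2, 1, 2, 1, 2],
    "Date": ["05-02-2010", "05-02-2010", "12-02-2010", "12-02-2010",
             "19-02-2010", "19-02-2010", "10-09-2010", "10-09-2010"],
    "Weekly_Sales": [100.0, 200.0, 150.0, 250.0, 110.0, 210.0, 90.0, 180.0],
    "Holiday_Flag": [0, 0, 1, 1, 0, 0, 1, 1],
})
walmart_data["Date"] = pd.to_datetime(walmart_data["Date"], format="%d-%m-%Y")

# Total sales of all stores together, one row per week.
weekly_totals = (
    walmart_data
    .groupby(["Date", "Holiday_Flag"], as_index=False)["Weekly_Sales"]
    .sum()
)

# Mean weekly total over the non-holiday weeks (assumed meaning of the question).
non_holiday_mean = weekly_totals.loc[
    weekly_totals["Holiday_Flag"] == 0, "Weekly_Sales"
].mean()

# Flag each holiday week as above or below that mean.
holiday_weeks = weekly_totals[weekly_totals["Holiday_Flag"] == 1].copy()
holiday_weeks["above_non_holiday_mean"] = holiday_weeks["Weekly_Sales"] > non_holiday_mean
print(non_holiday_mean)
print(holiday_weeks)
```

With this kind of comparison, holiday weeks that fall below the non-holiday mean show up as False, which is where a negative impact would become visible.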
 
Thanks Shreha,

Hi Shreha,

Thanks for your help.

I am now stuck at point 3 (Provide major insights/patterns).

1. Do I need to use matplotlib here?
2. How can I draw 4 conclusions?

Thanks in advance.
 

_12759

Active Member
Hmm... so you have calculated the total sales for all the 17 individual holiday weeks provided and then compared them with the average non-holiday sales. Now I am confused. :)

The questions are really ambiguous.

regards,
Shreha.
 