
Data Science with Python | Rahul Aggarwal

_12759

Active Member
Working on the Walmart project and facing the following issues:

A. Question number 5, i.e. "Provide a monthly and semester view of sales in units and give insights":
1. Not able to fetch the data semester-wise.
2. How do I increase the number of bins for a line chart?
3. To get the monthly data I used a PeriodIndex and did a groupby to get the total monthly sales. Then I did reset_index to change the format, as the Period format was not accepted as input to the plot command. Now the months displayed on the x-axis are 0-35, but I want them displayed in date format, i.e. Jan 2010, Feb 2010 and so on. How do I do that?

B. Statistical Model
1. I am not able to decide on the ML model for this project. Linear Regression is performing very badly on it, and the relationship between the features and the target variable is not linear.
2. How do I convert dates into days?

Because of these issues I am not able to proceed with the project work.
Please help.

Thanks & Regards
Shorya


Hi,
To retrieve the data semester-wise:

1. First, I created the quarter for every date like this:

# dt.quarter maps each date to its quarter (1-4)
df_sales['Quarter'] = df_sales['date_new'].dt.quarter


      Store  Date        Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  CPI         Unemployment  date_new    quarter  month  Quarter
6430  45     28-09-2012  713173.95     0             64.88        3.997       192.013558  8.684         2012-09-28  2012Q3   9      3

2. Then I created the semester, where Q1 and Q2 belong to the first semester, and Q3 and Q4 to the second:

import numpy as np

# Quarters 1 and 2 -> semester 1; quarters 3 and 4 -> semester 2
df_sales['semester'] = np.where(df_sales.Quarter.isin([1, 2]), 1, 2)
df_sales.tail()


      Store  Date        Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price  CPI         Unemployment  date_new    quarter  month  Quarter  Year  semester
6430  45     28-09-2012  713173.95     0             64.88        3.997       192.013558  8.684         2012-09-28  2012Q3   9      3        2012  2

Now that we have the semester, you can group your weekly sales by (Year, semester).

This is what I have done.
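For that last step, a minimal sketch of the groupby (the sum aggregation and print are just illustrative):

# Total weekly sales per (Year, semester)
semester_sales = df_sales.groupby(['Year', 'semester'])['Weekly_Sales'].sum()
print(semester_sales)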

Hope it helps. :)

Thanks and Regards,
Shreha.
 
Hi Ramanpreet,

I would suggest you use the Chi-Square test; it is much easier to use and code.

1. First, create a crosstab for 'City' and 'Complaint Type':

# Note: no margins=True here; including the 'All' totals row/column would distort the test
crosstab_data = pd.crosstab(df_dataset['City'], df_dataset['Complaint Type'])

2. Extract the observed values:

Observed_Values = crosstab_data.values

3. Now run the Chi-Square test:

from scipy import stats

chi_square, p_value, degrees_of_freedom, expected_frequencies = stats.chi2_contingency(crosstab_data)
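To read the result, compare p_value against a significance level (0.05 is the usual choice); a quick sketch:

# A small p-value means City and Complaint Type are unlikely to be independent
if p_value < 0.05:
    print('Reject the null hypothesis: City and Complaint Type are related.')
else:
    print('Fail to reject the null hypothesis: no evidence of a relationship.')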

This is what I have done. But please also explore the internet and the Jupyter file Rahul shared today, to build your own understanding as well.

Let me know if there is anything I am missing in my code too, as we are all exploring and learning.

Thanks and Regards,
Shreha.
Hi Shreha,

Thanks a lot for your suggestion. I have already submitted my project, but I will keep this code in my own notes. The correlation value came out to 0.101 for City and Complaint Type, which means there is practically no correlation.

Thanks and regards
Raman
 
Can anybody help with the below section of the MovieLens project? I have done the rest.

Use the genres column:

1. Find out all the unique genres (Hint: split the data in the genres column into lists, then process the data to find only the unique genre categories); a sketch follows after this list.

2. Create a separate column for each genre category with a one-hot encoding (1 and 0) indicating whether or not the movie belongs to that genre.

3. Determine the features affecting the ratings of any particular movie.

4. Develop an appropriate model to predict the movie ratings
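For steps 1 and 2, a minimal sketch along the lines of the hint (the file name, separator, and column names are assumptions; adjust them to your dataset):

import pandas as pd

# MovieLens movies file; the '::' separator and column names are assumptions
movies = pd.read_csv('movies.dat', sep='::', engine='python',
                     names=['MovieID', 'Title', 'Genres'], encoding='latin-1')

# Step 1: split the pipe-separated genres and collect the unique categories
unique_genres = sorted(set(g for row in movies['Genres'].dropna().str.split('|') for g in row))
print(unique_genres)

# Step 2: one 0/1 column per genre category
genre_dummies = movies['Genres'].str.get_dummies(sep='|')
movies = pd.concat([movies, genre_dummies], axis=1)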
 
Hi Everyone,

Can anyone help me with the NYC project? I am stuck at point 4: order the complaint types based on the average 'Request_Closing_Time', grouping them for different locations. I am using the following code but got the error "No numeric types to aggregate".
[attachment: screenshot of the failing code]

Hi Kanika,
Did your previous code for Request_Closing_Time run successfully? It seems there is some issue in there.
df['Request_Closing_Time']

OUTPUT:
0 3330.0
1 5233.0
2 17494.0
3 27927.0
4 12464.0
...
364553 37067.0
364554 8434.0
364555 1143.0
364556 9653.0
364557 10020.0
Name: Request_Closing_Time, Length: 362177, dtype: float64
 

_12759

Active Member

Hi Surendra,

Did you attend today's session with Rahul? He went through this project in detail.

Regards,
Shreha
 

Mrudula Bhimavarapu

Active Member
Hello Surendra

Yes, if you could revisit the recording you should be able to do it. If not, do let me know; I am more than happy to help, as I completed this project just today.

regards
Mrudula
 
Hi Ramanpreet,

Yes, it was not correct. I am still unable to fix it and am trying to resolve this.
 
Hi Kanika,

The main problem here is that you have not converted your 'Request_Closing_Time' column to a float value. As of now it is of type 'timedelta' (the difference of two datetimes).

Please convert the column to a float datatype; only then can you run aggregate functions on it. You need to express the column in a single unit (days, hours, or seconds).

I have converted my column to represent the total time in seconds as below:

df_dataset.Request_Closing_Time = df_dataset.Request_Closing_Time.apply(lambda x: x.total_seconds())
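If you want hours instead (e.g. a Request_Closing_In_Hr column like the one used later in this thread), the same idea works; a minimal sketch, run while the column is still a timedelta (i.e. before the line above overwrites it):

# 3600 seconds per hour; .dt.total_seconds() works on timedelta columns
df_dataset['Request_Closing_In_Hr'] = df_dataset['Request_Closing_Time'].dt.total_seconds() / 3600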

After doing this, when you run your above groupby statement it should work.

Thanks and Regards,
Shreha.
Thanks for the reply, Shreha. I am doing this but it's not working for me.
 

Hello,

There you go, see if this helps:

# Question 4: Order the complaint types based on the average 'Request_Closing_Time',
# grouping them for different locations.

# Step 1: check whether 'City' has missing values
df_nyc311['City'].isnull().sum()

# Step 2: fill all missing values with a default value; here I used 'Not Available'
df_nyc311['City'].fillna('Not Available', inplace=True)
df_nyc311['City'].head()

# Step 3: group by City and Complaint Type
df_nyc311_grouped = df_nyc311.groupby(['City', 'Complaint Type'])

# Step 4: take the mean of each group and keep the Request_Closing_In_Hr column
df_nyc311_mean = df_nyc311_grouped.mean()['Request_Closing_In_Hr']
df_nyc311_mean.isnull().sum()

# Step 5: group by City and Complaint Type, aggregating the average request closing time (in hours)
df_nyc311_grouped = df_nyc311.groupby(['City', 'Complaint Type']).agg({'Request_Closing_In_Hr': 'mean'})
print(df_nyc311_grouped)
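To get the ordering the question actually asks for, you can sort the grouped result; a sketch (pandas lets sort_values mix index level names like City with column names):

# Sort by City first, then by average closing time within each city
df_nyc311_sorted = df_nyc311_grouped.sort_values(by=['City', 'Request_Closing_In_Hr'])
print(df_nyc311_sorted)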


regards
Mrudula
Thank you so much Mrudula for your help, but I am still stuck converting the timedelta datatype to float. I am unable to convert it into hours.
 

_12759

Active Member

Hi Kanika,

Just wanted to make sure... have you converted the 'created date' and 'closed date' to datetime as below:

df_dataset['Created_Dt_n'] = pd.to_datetime(df_dataset['Created Date'])

df_dataset['Closed_Dt_n'] = pd.to_datetime(df_dataset['Closed Date'])

#Calculate the 'Request Closing Time' as the difference between the 'closed date' and 'created date'
df_dataset['Request_Closing_Time'] = df_dataset['Closed_Dt_n'] - df_dataset['Created_Dt_n']

#Convert the 'Request Closing Time' to a float value to represent in seconds. This will make our calculations easier.
df_dataset.Request_Closing_Time = df_dataset.Request_Closing_Time.apply(lambda x: x.total_seconds())

It should result in the following output:
0 3315.0
1 5176.0
2 17491.0
3 27914.0
4 12422.0

Hope this helps.

Regards,
Shreha
 

_12759

Active Member
Hi All,

Has anybody worked on the Comcast project? Though I have submitted my other projects, I was working on this one as well and need some pointers on the last question:

Provide the percentage of complaints resolved till date, which were received through the Internet and customer care calls.

If somebody has done it, please do share some pointers. I am just doing this project for my own learning.

Regards,
Shreha.
 

Renuka Badduri

Active Member
Hi All,

My project passed and my certificate was unlocked just now. We will receive a mail about it from Simplilearn.
All the BEST :)

How are you all doing? Have you enrolled in any further courses?

Regards,
Renuka.
 

Mrudula Bhimavarapu

Active Member
Hello Renuka
Hearty Congratulations !!!

@All, happy to share that both my projects have been evaluated successfully and I have unlocked my certificate too.

Regards
Mrudula
 

_12759

Active Member
Hi All,
I passed all my projects too... :) and unlocked the certificate. Thanks to you all for all the support, and especially Rahul; I have learnt a lot from you. Thanks a lot :)

Regards,
Shreha.
 

Rahul_Aggarwal

Active Member
Alumni
Hello All,

Congratulations to all of you for successfully completing the Data Science with Python course and passing the certification. You can post these achievements on LinkedIn, but don't forget to add what you covered/learnt in the course. Since I'm not actively watching this thread, you can reach out to me on a different thread for any help/suggestions/feedback: https://community.simplilearn.com/threads/data-science-with-rahul-aggarwal.51091/

My next course will be Data Science with R, starting from 13th June. Looking forward to seeing some of you again :)

Happy Learning!! :-D

Thanks and Regards,
Rahul A
 