
Data Science with Python | Rahul Aggarwal

_12759

Active Member
Hi Shreha,

thanks for your help.

I am now stuck at point 3 (Provide major insights/patterns).

1. Do I need to use matplotlib here?
2. How can I draw 4 conclusions?

Thanks in advance.

Yes Harshal,
You need to use Matplotlib here in order to plot the graphs. The 4 conclusions can be anything --
1. You can group the data based on Borough and complaint type and get the count. This way you can work out which borough has the most complaints and which complaint type is most common there.
2. You can find out which city/borough has the most complaints.
3. You can get the top 10 complaints from the dataset.
4. The total of open/closed complaints.

These are some of the things I have done. Hope this helps.
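As a rough sketch of conclusions 1, 3 and 4 (assuming the DataFrame is named df and has 'Borough', 'Complaint Type' and 'Status' columns; adjust to your dataset):

# 1. complaint counts per borough and complaint type, largest first
by_borough = df.groupby(['Borough', 'Complaint Type']).size().sort_values(ascending=False)
print(by_borough.head(10))

# 3. top 10 complaint types overall
print(df['Complaint Type'].value_counts().head(10))

# 4. open vs closed complaints
print(df['Status'].value_counts())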

Regards,
Shreha.
 

PAYAL_27

Member
Hi Renuka,

Yeah... I have done this, but I am not very sure about it. Since they are asking for monthly and semester sales, I was not sure whether they wanted it store-wise or for the whole dataset. So, I tried both.

You need to retrieve the month and the semester from the weekly sales data, and then do a groupby.

monthly_sales = df_sales.groupby(['Store','month'])['Weekly_Sales'].sum()
monthly_sales

Store  month
1      1        11203741.49
       2        19505306.58
       3        20380666.86
       4        21623140.34
       5        18505332.90

and the same way for semester.
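In case it helps, a minimal sketch for deriving the 'month' and 'semester' columns first (assuming a 'Date' column that pandas can parse; the semester label is one possible convention):

import pandas as pd

df_sales['Date'] = pd.to_datetime(df_sales['Date'], dayfirst=True)
df_sales['month'] = df_sales['Date'].dt.month
# semester 1 = Jan-Jun, semester 2 = Jul-Dec, tagged with the year
df_sales['semester'] = (df_sales['Date'].dt.year.astype(str) + '-S'
                        + ((df_sales['Date'].dt.month - 1) // 6 + 1).astype(str))

semester_sales = df_sales.groupby(['Store', 'semester'])['Weekly_Sales'].sum()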

And then I used a line plot to plot them.

Let me know what you are stuck on. Were you able to retrieve the month and semester from the dates? If you are stuck there, I will let you know how to do that.

Regards,
Shreha.
Hi,
We need to plot a graph between monthly sales and what?
I am confused... If anyone can help.
 

Renuka Badduri

Active Member
Renuka,

How were you able to calculate these accuracies? This is the part I don't understand well. :(
Could you please share the steps you followed here?

Regards,
Shreha.

Hi Shreha,

We need to find the accuracy score for CPI, Unemployment and Fuel Price.

1. We need to build the features and target/response sets with the above.
2. Split the data into the respective train and test data sets.
3. By using the train data sets, calculate the prediction values.
4. With the help of metrics:

from sklearn import metrics
metrics.accuracy_score(y_test_cpi, y_pred)
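A minimal end-to-end sketch of those steps (variable and column names are assumptions; note that accuracy_score expects discrete class labels, so for a continuous target like weekly sales a regression score such as R^2 is the usual stand-in):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# assumed feature and target columns
X = df_sales[['CPI', 'Unemployment', 'Fuel_Price']]
y = df_sales['Weekly_Sales']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 on the held-out test set
print(metrics.r2_score(y_test, y_pred))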

I still have a doubt on whether holiday sales are to be calculated per date or for a group of dates. Confusing!!

Regards,
Renuka
 

Renuka Badduri

Active Member
Yes Harshal,
You need to use Matplotlib here in order to plot the graphs. The 4 conclusions can be anything --
1. You can group the data based on Borough and complaint type and get the count. This way you can work out which borough has the most complaints and which complaint type is most common there.
2. You can find out which city/borough has the most complaints.
3. You can get the top 10 complaints from the dataset.
4. The total of open/closed complaints.

These are some of the things I have done. Hope this helps.

Regards,
Shreha.

Hi Shreha,

It seems you are working on multiple projects. Glad to know that.
You will get more exposure to the details and steps to follow.

How are you managing? Did you submit any of the projects yet?

Regards,
Renuka
 

Renuka Badduri

Active Member
Hi Shreha,

I calculated sales for individual holiday dates and compared them with the average non-holiday sales (for all stores together, as mentioned in the question).
I got this solution:

2010-02-12 1 48336677.63
2010-11-26 1 65821003.24
2011-02-11 1 47336192.79
2011-11-25 1 66593605.26
2012-02-10 1 50009407.92
2012-09-07 1 48330059.31

So on these holidays, sales are higher than the average sales on non-holiday dates.
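A sketch of that comparison (assuming the Walmart data has a 'Holiday_Flag' column marking holiday weeks; names are assumptions):

# average of per-date totals over non-holiday weeks, across all stores
non_holiday_avg = (df_sales[df_sales['Holiday_Flag'] == 0]
                   .groupby('Date')['Weekly_Sales'].sum().mean())

# total sales per holiday date, across all stores
holiday_sales = (df_sales[df_sales['Holiday_Flag'] == 1]
                 .groupby('Date')['Weekly_Sales'].sum())

# holiday dates whose sales beat the non-holiday average
print(holiday_sales[holiday_sales > non_holiday_avg])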

regards,
Gyaneswar
Hi Gyaneswar,

Did you calculate it for each holiday date given? Did you find any impact of the dates?
It seems you haven't taken the total weekly sales for all the holiday dates.

Regards,
Renuka
 

PAYAL_27

Member
Dear Learners,

Agenda for tomorrow's session is to learn more about Big Data Analytics - Hadoop/Apache, Map/Reduce, PySpark with Python and PandaSql.

Therefore, I request you all to come prepared before joining the live class by following the steps below -

(1.) Download the material uploaded on Google Drive : https://drive.google.com/open?id=1FugYbFcU7Urks54VsQ4w_FhMvJf5ZgEO

(2.) Install the following packages using the anaconda prompt :
(a.) conda install -c anaconda pandasql
(b.) conda install -c conda-forge findspark
(c.) conda update --all

(3.) Set up PySpark on your local machine in order to integrate it with Jupyter notebook
(a.) Follow the steps mentioned in this link : https://bigdata-madesimple.com/guide-to-install-spark-and-use-pyspark-from-jupyter-in-windows/
(b.) Link to download the apache software : http://spark.apache.org/downloads.html
(c.) There's a folder [Setup Guide] in the content folder for tomorrow's session - there are some instruction files and a winutil tool which will be required during the setup. Just set up PySpark on your machine, and in case you don't succeed, we will discuss it during the session.

Have a Good Night! :)

Regards, Rahul
 

_12759

Active Member
Hi Shreha,

It seems you are working on multiple projects. Glad to know that.
You will get more exposure to the details and steps to follow.

How are you managing? Did you submit any of the projects yet?

Regards,
Renuka

Yes my dear...
I have been working on all 4 projects simultaneously, and the major part where I get confused is building a model. I need to practice more. :)

Shreha.
 

_12759

Active Member
Monthly Sales or normal Weekly Sales?

Hi Payal,

I grouped the dataset based on month and calculated the sum of 'Weekly_Sales', and plotted a line plot for it. I also plotted a graph by grouping store, month and their weekly sales.
The same for semester too.
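A sketch of the first plot (assuming a 'month' column has already been derived from the dates):

import matplotlib.pyplot as plt

# total sales per month across all stores
monthly_totals = df_sales.groupby('month')['Weekly_Sales'].sum()

monthly_totals.plot(kind='line', marker='o')
plt.xlabel('Month')
plt.ylabel('Total Weekly Sales')
plt.title('Monthly Sales Across All Stores')
plt.show()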
Regards,
Shreha.
 

_12759

Active Member
Hi Shreha,

We need to find the accuracy score for CPI, Unemployment and Fuel Price.

1. We need to build the features and target/response sets with the above.
2. Split the data into the respective train and test data sets.
3. By using the train data sets, calculate the prediction values.
4. With the help of metrics:

from sklearn import metrics
metrics.accuracy_score(y_test_cpi, y_pred)

I still have a doubt on whether holiday sales are to be calculated per date or for a group of dates. Confusing!!

Regards,
Renuka


Hi Renuka,

But for building the model, have you not considered the dates as one of the features?
I am a bit confused, as the question says:
  • Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.

  • Change dates into days by creating new variable.
I am not sure why they have given the 2 bolded parts of the question. I used a LabelEncoder for converting the dates to 1 and so on...
Then I was thinking that we need to pass this in the features to build our model.
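A sketch of that encoding (LabelEncoder assigns 0-based codes in sorted order, so adding 1 makes the earliest date 1; the column names are assumptions):

from sklearn.preprocessing import LabelEncoder

# dates sort chronologically, so the earliest date receives code 0, hence the + 1
le = LabelEncoder()
df_sales['Date_Ordinal'] = le.fit_transform(df_sales['Date']) + 1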

Also, for changing dates into days... I am confused about why they have asked that. How is that related to the regression model? Or is it just a statistical question?

Regards,
Shreha.
 

Renuka Badduri

Active Member
Hi Renuka,

But for building the model, have you not considered the dates as one of the features?
I am a bit confused, as the question says:
  • Linear Regression – Utilize variables like date and restructure dates as 1 for 5 Feb 2010 (starting from the earliest date in order). Hypothesize if CPI, unemployment, and fuel price have any impact on sales.

  • Change dates into days by creating new variable.
I am not sure why they have given the 2 bolded parts of the question. I used a LabelEncoder for converting the dates to 1 and so on...
Then I was thinking that we need to pass this in the features to build our model.

Also, for changing dates into days... I am confused about why they have asked that. How is that related to the regression model? Or is it just a statistical question?

Regards,
Shreha.

Hi Shreha,

As per my understanding:
For Step 1: I did it as a separate step by using the index. But nowhere else is it mentioned to utilize it.
For Step 2: I did it as a separate step by using the datetime() function to find the days of the week. That's it.

Regards,
Renuka
 

Rahul_Aggarwal

Active Member
Alumni
Dear Learners,

Agenda for today's session is to learn more about Big Data Analytics - Hadoop/Apache, Map/Reduce, PySpark with Python and PandaSql. Therefore, I request you all to come prepared before joining the live class by following the steps below -

(1.) Download the material uploaded on Google Drive : https://drive.google.com/open?id=1FugYbFcU7Urks54VsQ4w_FhMvJf5ZgEO

(2.) Install the following packages using the anaconda prompt :
(a.) conda install -c anaconda pandasql
(b.) conda install -c conda-forge findspark
(c.) conda update --all

(3.) Set up PySpark on your local machine in order to integrate it with Jupyter notebook
(a.) Follow the steps mentioned in this link : https://bigdata-madesimple.com/guide-to-install-spark-and-use-pyspark-from-jupyter-in-windows/
(b.) Link to download apache software : http://spark.apache.org/downloads.html
(c.) There's a folder [Setup Guide] in the content folder for today's session - there are some instruction files and a winutil tool which will be required during the setup.

Just set up PySpark on your machine, and in case you don't succeed, we will discuss it during the session.

Happy Learning! :)

Regards,
Rahul
 

Mrudula Bhimavarapu

Active Member
Hi Shreha,

We need to find the accuracy score for CPI, Unemployment and Fuel Price.

1. We need to build the features and target/response sets with the above.
2. Split the data into the respective train and test data sets.
3. By using the train data sets, calculate the prediction values.
4. With the help of metrics:

from sklearn import metrics
metrics.accuracy_score(y_test_cpi, y_pred)

I still have a doubt on whether holiday sales are to be calculated per date or for a group of dates. Confusing!!

Regards,
Renuka


Hi Renuka

that's the correct approach for the accuracy

what's the doubt on holiday sales? sorry, didn't get that question

apologies, I was tied up in the weekend software release and hence couldn't monitor this thread

regards
Mrudula
 

Mrudula Bhimavarapu

Active Member
Hi Shreha,

As per my understanding:
For Step 1: I did it as a separate step by using the index. But nowhere else is it mentioned to utilize it.
For Step 2: I did it as a separate step by using the datetime() function to find the days of the week. That's it.

Regards,
Renuka


hi folks

I guess for the 2nd question we are asked to populate the day of the week, i.e. use the day_name() function to label each date as Monday, Wednesday, and so on.
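A one-line sketch (assuming the 'Date' column is already a pandas datetime):

# new variable holding the weekday label for each date
df_sales['Day'] = df_sales['Date'].dt.day_name()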

hope this helps

regards
Mrudula
 

Mrudula Bhimavarapu

Active Member
Yes Harshal,
You need to use Matplotlib here in order to plot the graphs. The 4 conclusions can be anything --
1. You can group the data based on Borough and complaint type and get the count. This way you can work out which borough has the most complaints and which complaint type is most common there.
2. You can find out which city/borough has the most complaints.
3. You can get the top 10 complaints from the dataset.
4. The total of open/closed complaints.

These are some of the things I have done. Hope this helps.

Regards,
Shreha.


Hi Shreha

The 4 insights mentioned above are examples; we need to understand the data and arrive at different insights ourselves.
@Harshal, hope this helps.

regards
Mrudula
 
Hi Shreha

The 4 insights mentioned above are examples; we need to understand the data and arrive at different insights ourselves.
@Harshal, hope this helps.

regards
Mrudula

Hi @Shreha/ Mrudula,

I am taking the max closing time as one of the 4 conclusions:

df1[df1.Request_Closing_Time == df1.Request_Closing_Time.max()]

So for matplotlib... how can I graph/plot this? I mean, what should I take as the x axis and the y axis?

Thanks in advance.
 

Renuka Badduri

Active Member
Hi Renuka

that's the correct approach for the accuracy

what's the doubt on holiday sales? sorry, didn't get that question

apologies, I was tied up in the weekend software release and hence couldn't monitor this thread

regards
Mrudula

Hi Mrudula,

I got confused about step 4, where we need to get store sales on holidays. Initially I calculated for all the Super Bowl dates and compared.
Later I realized I should calculate at each holiday DATE level to compare with the average of non-holiday sales.

I think I understand it now.

Regards,
Renuka.
 

Rahul_Aggarwal

Active Member
Alumni
Hey Guys,

I restarted my system and tried running the command 'bin\pyspark', and it worked for me. Could you all please try at your end after restarting the system? Ensure you first run this command in your anaconda prompt - cd C:\Users\USERNAME\Desktop\spark\spark-2.4.5-bin-hadoop2.7
i.e. cd [space] followed by the entire path of the spark-2.4.5-bin-hadoop2.7 folder. Refer to the screenshot below for reference.

[screenshot: anaconda prompt output]
 
Hi Ramanpreet,

Since you are doing this project, could you please help me with the part about how to calculate the 'total response time'?

Hi there,
If you are asking about the request closing time, please use the code below to calculate it (this assumes the two date columns are already parsed as datetimes):

import numpy as np

df['Request_Closing_Time'] = df['Closed Date'] - df['Created Date']
df['Request_Closing_Time'] = df['Request_Closing_Time'] / np.timedelta64(1, 's')  # timedelta -> seconds
 
NYC project problem:

Someone please help me to conduct an ANOVA test for checking whether the average response time across complaint types is similar or not. Do I need to make new columns for the different complaint types, or anything else for sub-setting? I am using the following code but it's giving an error for the complaint types!!!
mod=ols('Request_Closing_Time ~ Complaint Type',data=df).fit()
anova_table=sm.stats.anova_lm(mod)
anova_table

PLEASE HELP!!!

Thanks in advance.
 

Renuka Badduri

Active Member
Hey Guys,

I restarted my system and tried running the command 'bin\pyspark', and it worked for me. Could you all please try at your end after restarting the system? Ensure you first run this command in your anaconda prompt - cd C:\Users\USERNAME\Desktop\spark\spark-2.4.5-bin-hadoop2.7
i.e. cd [space] followed by the entire path of the spark-2.4.5-bin-hadoop2.7 folder. Refer to the screenshot below for reference.

Hi Rahul,

I succeeded with the following steps:
1. First I copied winutil.exe, as given in the setup notes, to SPARK_HOME\Hadoop\bin.
2. Next I updated the HADOOP_HOME environment variable in the Control Panel settings with the above folder path.
3. Later I opened the Anaconda command prompt and ran cd SPARK_HOME, then cd Hadoop\bin.
4. Finally I used 'jupyter notebook' instead of bin\pyspark. It worked and opened the notebook.

Regards,
Renuka
 

Mrudula Bhimavarapu

Active Member
Hi Mrudula,

I got confused about step 4, where we need to get store sales on holidays. Initially I calculated for all the Super Bowl dates and compared.
Later I realized I should calculate at each holiday DATE level to compare with the average of non-holiday sales.

I think I understand it now.

Regards,
Renuka.
hiya

yes, that's how it should be compared

let me know if you are still stuck

regards
Mrudula
 
Hello Kanika,

I have gone through your code and found the mistake -
a. Incorrect : m = hr_attr.average_montly_hours.std() : you are computing the standard deviation here instead of the mean.
b. Correct :
m = hr_attr.average_montly_hours.mean()
s = hr_attr.average_montly_hours.std()
(m - 200) / (s / np.sqrt(14999))

This should return the correct T-statistic value. Use this T-stat value to get the P-value.
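A sketch of that last step (assuming a two-sided one-sample test with n = 14999 observations):

from scipy import stats
import numpy as np

m = hr_attr.average_montly_hours.mean()
s = hr_attr.average_montly_hours.std()
n = 14999

t_stat = (m - 200) / (s / np.sqrt(n))
# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 1))
print(t_stat, p_value)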

Hope this helps!
Yes, this solves my problem. Thanks, and sorry for this silly mistake.
 

_12759

Active Member
Hi Mrudula,

Since you have completed the NYC project, could you please let me know what p-value you got after the ANOVA testing and the Chi-Square testing?
For both these tests I am getting a p-value of 0.0.

Regards,
Shreha.
 

Mrudula Bhimavarapu

Active Member
Hi Mrudula,

Since you have completed the NYC project, could you please let me know what p-value you got after the ANOVA testing and the Chi-Square testing?
For both these tests I am getting a p-value of 0.0.

Regards,
Shreha.
Hello Shreha

yes, even I am getting 0 for both the tests. I tried for hours together but gave up :( and submitted, assuming it's correct.

regards
Mrudula
 

_12759

Active Member
Hello Shreha

yes, even I am getting 0 for both the tests. I tried for hours together but gave up :( and submitted, assuming it's correct.

regards
Mrudula


Ohhh really... You have no idea... your statement above gives immense relief. :D I too have been trying for hours over the last 3 days... tried different methods... but got the same p-value.
Now I think even I will submit the same.

Thanks so much for the quick reply.

Regards,
Shreha.
 

_12759

Active Member
NYC project problem:

Someone please help me to conduct an ANOVA test for checking whether the average response time across complaint types is similar or not. Do I need to make new columns for the different complaint types, or anything else for sub-setting? I am using the following code but it's giving an error for the complaint types!!!
mod=ols('Request_Closing_Time ~ Complaint Type',data=df).fit()
anova_table=sm.stats.anova_lm(mod)
anova_table

PLEASE HELP!!!

Thanks in advance.

Hi Ramanpreet,

I had seen your query but I am not sure how to resolve it. I think you cannot pass 'Complaint Type' like that above...
I think the dataset should be in a different format.
I found this link which could be helpful:

https://reneshbedre.github.io/blog/anova.html

I have tried the f_oneway function for the ANOVA in NYC.

I was trying this ols summary option too, but haven't got through it yet. In case I am able to do it, I will let you know.

Thanks and Regards,
Shreha.
 

Mrudula Bhimavarapu

Active Member
Hi Ramanpreet,

I had seen your query but I am not sure how to resolve it. I think you cannot pass 'Complaint Type' like that above...
I think the dataset should be in a different format.
I found this link which could be helpful:

https://reneshbedre.github.io/blog/anova.html

I have tried the f_oneway function for the ANOVA in NYC.

I was trying this ols summary option too, but haven't got through it yet. In case I am able to do it, I will let you know.

Thanks and Regards,
Shreha.

NYC project problem:

Someone please help me to conduct an ANOVA test for checking whether the average response time across complaint types is similar or not. Do I need to make new columns for the different complaint types, or anything else for sub-setting? I am using the following code but it's giving an error for the complaint types!!!
mod=ols('Request_Closing_Time ~ Complaint Type',data=df).fit()
anova_table=sm.stats.anova_lm(mod)
anova_table

PLEASE HELP!!!

Thanks in advance.

hi ramanpreet

apologies, it took some time to revert. One way to do it is to pass the top 5 complaint types, drop all null values, and run as below:
stats.f_oneway(s1, s2, s3, s4, s5)

where s1, s2, ... are sample 1, sample 2, ... and so on
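A sketch of building those samples (column names assumed from the NYC 311 dataset):

from scipy import stats

# one sample of closing times per complaint type, with nulls dropped
top5 = df['Complaint Type'].value_counts().head(5).index
samples = [df.loc[df['Complaint Type'] == ct, 'Request_Closing_Time'].dropna()
           for ct in top5]

f_stat, p_value = stats.f_oneway(*samples)
print(f_stat, p_value)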

hope this helps

regards
mrudula
 

_12759

Active Member
hi ramanpreet

apologies, it took some time to revert. One way to do it is to pass the top 5 complaint types, drop all null values, and run as below:
stats.f_oneway(s1, s2, s3, s4, s5)

where s1, s2, ... are sample 1, sample 2, ... and so on

hope this helps

regards
mrudula

Hi Mrudula,

Do we pass only 5 complaint types?? I have passed all of them... And I had also confirmed with Rahul yesterday that we need to pass all the complaint types.

Regards,
Shreha.
 

Mrudula Bhimavarapu

Active Member
Hi Mrudula,

Do we pass only 5 complaint types?? I have passed all of them... And I had also confirmed with Rahul yesterday that we need to pass all the complaint types.

Regards,
Shreha.
hi Shreha

we can do it either way: take the top 5 and proceed further with the test, or pass all the complaint types. The reply to Ramanpreet above was just an example of how to execute the function.

hope this helps

regards
Mrudula
 

Mrudula Bhimavarapu

Active Member
Hi Rahul,

I succeeded with the following steps:
1. First I copied winutil.exe, as given in the setup notes, to SPARK_HOME\Hadoop\bin.
2. Next I updated the HADOOP_HOME environment variable in the Control Panel settings with the above folder path.
3. Later I opened the Anaconda command prompt and ran cd SPARK_HOME, then cd Hadoop\bin.
4. Finally I used 'jupyter notebook' instead of bin\pyspark. It worked and opened the notebook.

Regards,
Renuka

Agreed, even I am able to run & test the same.
 
hi ramanpreet

apologies, it took some time to revert. One way to do it is to pass the top 5 complaint types, drop all null values, and run as below:
stats.f_oneway(s1, s2, s3, s4, s5)

where s1, s2, ... are sample 1, sample 2, ... and so on

hope this helps

regards
mrudula
hi Shreha

we can do it either way: take the top 5 and proceed further with the test, or pass all the complaint types. The reply to Ramanpreet above was just an example of how to execute the function.

hope this helps

regards
Mrudula
Thanks Mrudula and Shreha for your replies. However, I tried it with this code and got results:

from statsmodels.formula.api import ols
import statsmodels.api as sm

mod = ols('Request_Closing_Time ~ Complaint_Type', data=df).fit()
anova_table = sm.stats.anova_lm(mod)
anova_table

I feel the space within 'Complaint Type' was the issue. Please check if the results are correct?

RESULTS:
[screenshot: ANOVA results table]
 

_12759

Active Member
Thanks Mrudula and Shreha for your replies. However, I tried it with this code and got results:

from statsmodels.formula.api import ols
import statsmodels.api as sm

mod = ols('Request_Closing_Time ~ Complaint_Type', data=df).fit()
anova_table = sm.stats.anova_lm(mod)
anova_table

I feel the space within 'Complaint Type' was the issue. Please check if the results are correct?

RESULTS: [screenshot: ANOVA results table]
Hey Ramanpreet,
Awesome... you were able to do it! Even I am getting the p-value as 0. It's 0 for the chi-square test too.
 
Hey Ramanpreet,
Awesome... you were able to do it! Even I am getting the p-value as 0. It's 0 for the chi-square test too.
Hello
For the last question, instead of the Chi-square test I used a correlation plot (after encoding the two categorical values). Not sure if it is correct?? :(

I was not able to figure out the code for performing the Chi-square test.

#CT for complaint type...... DOES THIS SHOW NO CORRELATION?
[screenshot: correlation plot]
 

_12759

Active Member
Hello
For the last question, instead of the Chi-square test I used a correlation plot (after encoding the two categorical values). Not sure if it is correct?? :(

I was not able to figure out the code for performing the Chi-square test.

#CT for complaint type...... DOES THIS SHOW NO CORRELATION?
[screenshot: correlation plot]

Hi Ramanpreet,

I would suggest you use the Chi-Square test. It is much easier to use and code.

1. First create a cross tab for 'City' and 'Complaint Type':

crosstab_data = pd.crosstab(df_dataset['City'],df_dataset['Complaint Type'],margins = True)

2. Observed_Values = crosstab_data.values

3. Now run the Chi Square test:

chi_square , p_value, degrees_of_freedom, expected_frequencies = stats.chi2_contingency(crosstab_data)
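Putting those steps together as one runnable sketch (note it omits margins=True, since the test expects only the observed counts, without the 'All' totals):

import pandas as pd
from scipy import stats

crosstab_data = pd.crosstab(df_dataset['City'], df_dataset['Complaint Type'])
chi_square, p_value, degrees_of_freedom, expected_frequencies = stats.chi2_contingency(crosstab_data)
print(p_value)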

This is what I have done. But please explore the internet and also the Jupyter file given by Rahul today, to come up with your own understanding as well.

Let me know if there is anything I am missing in my code too, as we are all exploring and learning.

Thanks and Regards,
Shreha.
 
Hi Everyone,

Can anyone help me with the NYC project? I am stuck at point 4: Order the complaint types based on the average 'Request_Closing_Time', grouping them for different locations. I am using the following code but got the error "No numeric types to aggregate".
[screenshot: groupby code and the error message]
 

Mrudula Bhimavarapu

Active Member
Hi Everyone,

Can anyone help me with the NYC project? I am stuck at point 4: Order the complaint types based on the average 'Request_Closing_Time', grouping them for different locations. I am using the following code but got the error "No numeric types to aggregate".
[screenshot: groupby code and the error message]


Hello

there you go, see if this helps

# Question 4: Order the complaint types based on the average 'Request_Closing_Time',
# grouping them for different locations
# step 1: check if there are any missing values
df_nyc311['City'].isnull().sum()
# step 2: fill all missing values with some default value; here 'Not Available' is used
df_nyc311['City'].fillna('Not Available', inplace=True)
df_nyc311['City'].head()
# step 3: group according to City and Complaint Type
df_nyc311_grouped = df_nyc311.groupby(['City', 'Complaint Type'])
# step 4: get the average of the city-wise grouped list, and keep the Request_Closing_In_Hr column
df_nyc311_mean = df_nyc311_grouped.mean()['Request_Closing_In_Hr']
df_nyc311_mean.isnull().sum()
# step 5: group by City and Complaint Type, showing the average request closing time in hours
df_nyc311_grouped = df_nyc311.groupby(['City', 'Complaint Type']).agg({'Request_Closing_In_Hr': 'mean'})
print(df_nyc311_grouped)


regards
Mrudula
 

_12759

Active Member
Hi Everyone,

Can anyone help me with the NYC project? I am stuck at point 4: Order the complaint types based on the average 'Request_Closing_Time', grouping them for different locations. I am using the following code but got the error "No numeric types to aggregate".
[screenshot: groupby code and the error message]

Hi Kanika,

The main problem here is that you have not converted your 'Request_Closing_Time' column to a float value. As of now it is of type timedelta (the result of subtracting the two date columns).

Please convert the column to a float datatype - only then can you perform an aggregate function on it.
You need to convert the column to a single unit (days, hours, or seconds).

I converted my column to represent the total time in seconds, as below:

df_dataset.Request_Closing_Time = df_dataset.Request_Closing_Time.apply(lambda x: x.total_seconds())

After doing this, your groupby statement above should work.
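For example (a sketch), the point-4 ordering then runs cleanly:

avg_closing = (df_dataset.groupby(['City', 'Complaint Type'])['Request_Closing_Time']
               .mean()
               .sort_values())
print(avg_closing)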

Thanks and Regards,
Shreha.
 

Rahul_Aggarwal

Active Member
Alumni
Hey Guys,

If we get time today, we will also discuss the H2O package and learn how to deploy a machine learning solution in the production environment. Request you to please install the H2O library before today's session.

Install H2O through the conda command prompt
1. Run the following command to remove any existing H2O module for Python -
pip uninstall h2o
2. Use pip to install this version of the H2O Python module -
pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o

Once installed, run these commands in Jupyter notebook
1. import h2o
2. h2o.init()

Happy Learning!

Regards,
Rahul A
 
Working on the Walmart project and facing the following issues:

Question number 5, i.e.
Provide a monthly and semester view of sales in units and give insights
1. Not able to fetch the data semester-wise.
2. How to increase the bins for the line chart?
3. To get the monthly data I used a PeriodIndex and did a groupby to get the total monthly sales.
Then I did reset_index to change the format, as the Period format was not accepted as an input to the plot command. Now the months displayed on the x axis are 0-35, but I want them to be displayed in date format, i.e. Jan 2010, Feb 2010 and so on.
How can I do that? (One possible approach is sketched below.)
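A sketch of the PeriodIndex grouping described above (names are assumptions); converting the periods to timestamps is one way to get 'Jan 2010'-style labels on the x axis instead of 0-35:

import pandas as pd
import matplotlib.pyplot as plt

# total sales per calendar month, grouped on a PeriodIndex
monthly = df_sales.groupby(pd.PeriodIndex(df_sales['Date'], freq='M'))['Weekly_Sales'].sum()
# convert the PeriodIndex to timestamps so matplotlib renders date labels
monthly.index = monthly.index.to_timestamp()

monthly.plot()
plt.xlabel('Month')
plt.ylabel('Total Weekly Sales')
plt.show()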

B. Statistical Model
1. I am not able to decide on the ML model for this project. Linear Regression is performing very badly on it, and the relationship between the features and the target variable is not linear.
2. How to change dates into days?

And thus I am not able to proceed with the project work.
Please help.

Thanks & Regards
Shorya
 

_12759

Active Member
Hi All,

Need help on the NYC project, step 5:

Thanks in advance..

Hi Harshal,
For the 5th question:
Point 1 - you need to perform an ANOVA test. Pass all the complaint types and their response times.
Point 2 - you need to perform a Chi-Square test. You can scroll up through the other posts in our thread; we have discussed chi-square there.

Regards,
Shreha.
 