Data Science with Python | Rahul Aggarwal

Rahul_Aggarwal

Active Member
Alumni
Hello All,

The content for today's class (Session 4) has been uploaded to Google Drive. Please download it before coming to the live session. It would also be wonderful if you could go through the NumPy self-paced videos.

Happy Learning!

Regards,
Rahul
 

Sumayya_1

Member
Hi Rahul

Please find the completed assignments attached.
Can you help me with the assignment on the filter function and lambda expressions? I'm unable to find a solution with
 

Attachments

  • Assignment.zip
    4.8 KB · Views: 32
The assignment from Session 3 is attached. I am also working on assignments 2 and 3 and have a few doubts. Please find assignment 1 attached.
 

Attachments

  • Session 3 - Python_For_DS_Assignment1.zip
    2.1 KB · Views: 17

Rahul_Aggarwal

Active Member
Alumni
Dear Learners,

I've uploaded the following content to Google Drive:

1.) Numpy_Xtra_Mentor_Content.zip - This includes the image processing demonstration and a NumPy basics Jupyter notebook with some visuals.
2.) Session 5 Mentor Content - Pandas.zip - Pandas Jupyter notebook along with a Python e-book and a puzzle for today.

The agenda for today's session is to learn the Pandas library and different ways to explore data. Please download the content and go through the Pandas self-paced videos before coming to the live session.

Happy Learning!

Regards,
Rahul
 

Rahul_Aggarwal

Active Member
Alumni
Dear Learners,

Hope you're doing great and have been enjoying the course so far! :)

I wanted to do a quick recap of what we covered last week. In a subsequent post I will also share the agenda for the upcoming week (Week 2) so that you can come prepared.

Week-1 Recap :

1. Introduction to Data Science : Using Wrangler Case-Study, Intro to Jupyter Notebook/Python
2. Basics of Python Part 1 : Calculator, Data Types, Comparison & Logical Operators, Branching using if, elif & else
3. Basics of Python Part 2 : Functions, For Loop, range function, lambda function, Containers - List, Set, Dictionary & Tuples
4. Advanced Python Library : NumPy - Array/Matrix, Built-in functions, Changing the dimension & Selecting/Searching Elements. I also demonstrated the wide range of applications of the NumPy library using the Super-Heroes image datasets.
5. Advanced Python Library : Pandas - Data Ingestion, Pandas DataFrame/Series, Various Operations, Data Traversing and Slicing.

As Week 1 is the foundation for the upcoming sessions, I request everyone to revisit the self-paced or mentor-recorded session videos (if you'd like). In addition, you must:

a. Execute the code yourself and understand how the different functionalities work.
b. Solve the assignments and practice problems provided by me to deepen your understanding - Submission is not required for these.
c. Solve and submit the practice projects linked in your dashboard - Screenshot attached for your reference.
d. Take the quizzes and knowledge checks linked in your dashboard - Screenshot attached for your reference.

I have already uploaded the Week 1 content to the Google shared drive, along with cheat sheets, e-books and the Super-Heroes NumPy demonstration data/code.

Please use this discussion forum to post your queries, and your assigned TA (SME) will do their best to resolve them.

Regards,
Rahul A

upload_2020-4-26_17-0-41.png
upload_2020-4-26_16-59-35.png
 

Attachments

  • upload_2020-4-26_17-3-25.png
    upload_2020-4-26_17-3-25.png
    284.5 KB · Views: 10

Rahul_Aggarwal

Active Member
Alumni
Learners,

Hope you had a relaxing weekend!

Below is the agenda for the current week. It focuses primarily on topics that are relevant from an industry and interview perspective.

1. Advanced Pandas - Treating missing values, Group By Operation, Import/Export options, Data Summarization & Aggregation
2. Data Visualization using Matplotlib & Seaborn libraries - Line/Bar/Pie Charts, Histograms, Heat maps etc.
3. Exploratory Data Analysis (EDA) Case Study - Understanding the data, Data Cleaning, Feature Engineering, Finding relationships between variables
4. Basic Statistics - Central Tendency, Dispersion/Variability, Introduction to Probability, Data Distributions, Central Limit Theorem
5. Advanced Statistics - Hypothesis Testing, Z-test, T-test, P-value, Type I & Type II Errors

I will upload the content for today's session before the live class.

“Anyone who stops learning is old, whether at twenty or eighty. Anyone who keeps learning stays young.” – Henry Ford

Regards,
Rahul A
 

Renuka Badduri

Active Member
In the current DS market, what kinds of projects are going on, and in which domains?
Which topics should we focus on to learn and build skills?
As a technical person, I know that we need to learn all the concepts in one go for client requirements. However, as a fresher in DS, which topics are good to know and practice initially?
 
Hi there,

I was wondering if someone can help me to convert text files into ipynb format/python .py format. The files provided in google drive are in txt format and I am unable to read them.

Thanks in advance!
 

sunny1637

Member
Alumni
You should open these files in Jupyter Notebook. If you still need a PDF, you can download it from Jupyter Notebook as .pdf.
 

PAYAL_27

Member
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'..
I am getting this error while plotting a barplot using the Seaborn library.
Code (tips is a built-in dataset):
sns.barplot(x=tips.sex, y=tips.total_bill);
 
Thanks Sunny for your help!
It worked. So, we need to download the zipped file, unzip it and then open jupyter notebook and fetch the file from download folder.
 
Hi,

I would like to request some help. I have installed Anaconda on my laptop 3 times, but each time, after installation, I am able to open it through the Start search option only once. After shutting down and restarting my laptop, I am not able to open it again. Every time I click Anaconda/Jupyter Notebook, a cmd window pops up for a fraction of a second and then disappears. After that there is no response, so I am not able to practise anything.
 

sunny1637

Member
Alumni
Click only once and wait for almost a minute. It's usually slow to start. It may work, as I encountered the same problem initially. Hope it works for you.
 
If you open the
Most probably the files were corrupted during installation. You might have to reinstall Anaconda, but unlike a normal uninstallation, you also have to delete the root files to uninstall it properly. The uninstallation documentation is at "https://docs.anaconda.com/anaconda/install/uninstall/". Follow the steps there and then install again.
 

_56018

Member
While trying to label the histogram of the blood sugar level, I get the following error. Could someone please let me know if my code is wrong? Thank you!
TypeError Traceback (most recent call last)
<ipython-input-46-c8683c070260> in <module>
1 #plt.hist(blood_sugar,bins=3)
2 plt.hist(blood_sugar,bins=3,rwidth=0.4)
----> 3 plt.xlabel(blood_sugar)
4 plt.ylabel(Number_patients)

TypeError: 'str' object is not callable

The code I wrote is :

plt.hist(blood_sugar,bins=3,rwidth=0.4)
plt.xlabel(blood_sugar)
plt.ylabel(Number_patients)
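
A possible explanation, offered as a sketch rather than a confirmed diagnosis: `'str' object is not callable` on `plt.xlabel(...)` usually means `plt.xlabel` was rebound to a string earlier in the same session (for example by running `plt.xlabel = "..."`), in which case restarting the kernel clears it. The labels should also be passed as quoted strings. The readings below are made up for illustration, since the real `blood_sugar` data is not shown in the post, and the Axes methods are used so the example does not depend on `plt.xlabel` at all.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Hypothetical readings, only for illustration.
blood_sugar = [113, 85, 90, 150, 149, 88, 93, 115, 135, 80, 77, 82, 129]

fig, ax = plt.subplots()
ax.hist(blood_sugar, bins=3, rwidth=0.4)

# Axis labels are text, so they are passed as strings.
ax.set_xlabel("blood_sugar")
ax.set_ylabel("Number_patients")
```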
 

Raja Siddharth

New Member
Sir, my code runs but still does not show any output (I tried restarting the kernel).
Task - To replace all the NaN values of column "FATAL FLAG" with "No".
Code - "Data2['FATAL_FLAG'].fillna(value = 'No',inplace = True)" (Data2 is the name of the DataFrame).
It's from the self-learning assignment (Pandas Assignment 1). In the demo video, similar code ran and showed the required output.
Looking forward to your response.
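
One likely reason nothing is displayed: with `inplace=True`, `fillna` returns `None`, so the notebook cell has nothing to print. A minimal sketch follows, assuming a small made-up `Data2` (the real assignment data is not shown here). It assigns the result back and then evaluates the column to see it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the assignment DataFrame.
Data2 = pd.DataFrame({"FATAL_FLAG": ["Yes", np.nan, np.nan]})

# fillna returns a new Series; assigning it back updates the column.
Data2["FATAL_FLAG"] = Data2["FATAL_FLAG"].fillna("No")

# Evaluating the column (or calling print) is what shows output in a notebook.
print(Data2["FATAL_FLAG"])
```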
 

_12759

Active Member
Hi Rahul,

I have a question on datetime formats as below:

1. I have a date stored in my DataFrame in int64 format. When I plot this date without converting it to datetime format, the graph looks like this:

[In]: df["Date_month_year"].value_counts().plot();

upload_2020-5-2_7-18-33.png

2. When I convert the same date field to datetime format, the graph looks like this:

upload_2020-5-2_7-19-28.png


So what I infer from this is that, when plotting dates, we should always convert them to datetime format; otherwise the graph shown would not be correct.

Please confirm my understanding.

Thanks and Regards,
Shreha.
 

Souptik Saha

New Member
I am unable to solve these two parts:

Read HR_Employee_Attrition_Data.csv file as pandas DataFrame

1. ## Find the Department where Attrition rate is highest

2. ## For people who Travel Rarely and from Sales department what is the average daily rate?
 

Rahul_Aggarwal

Active Member
Alumni
Hi,
The ecommerce purchases file in the Exercises folder for Session 6 is a text file, but we need a CSV for the assignment. Can anyone help?
Hello Payal,
You can still ingest a text file and create a pandas DataFrame.

Option 1 : Change the separator from ',' to '[space]'.
data = pd.read_csv('filename.txt', sep=" ", header=None)

Option 2 : Use read_fwf (fwf stands for fixed-width formatted lines).
df = pd.read_fwf('output_list.txt')
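
Both options can be tried without any file on disk. The sketch below feeds small made-up text snippets through `io.StringIO`; the contents and column names are assumptions for illustration, not the actual ecommerce file.

```python
import io
import pandas as pd

# Option 1: space-separated values, no header row.
space_text = "1 Alice 30\n2 Bob 25\n"
data = pd.read_csv(io.StringIO(space_text), sep=" ", header=None)

# Option 2: fixed-width columns; read_fwf infers the column boundaries.
fixed_text = (
    "id  name   age\n"
    "1   Alice  30\n"
    "2   Bob    25\n"
)
df = pd.read_fwf(io.StringIO(fixed_text))

print(data)
print(df)
```

For a real file, the `io.StringIO(...)` argument is simply replaced by the path to the .txt file.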

I would also encourage all of you to start googling your queries. Stack Overflow and Stack Exchange are good websites for getting your queries resolved. This will help you not only now but also later as a Data Science professional :)

Hope this helps. Happy Learning!

Regards,
Rahul A
 

Rahul_Aggarwal

Active Member
Alumni
I am unable to solve these two parts:

Read HR_Employee_Attrition_Data.csv file as pandas DataFrame

1. ## Find the Department where Attrition rate is highest

2. ## For people who Travel Rarely and from Sales department what is the average daily rate?

Souptik, has this been resolved?
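
For anyone stuck on the same two parts, here is a minimal sketch of one way to approach them. The column names (`Department`, `Attrition`, `BusinessTravel`, `DailyRate`) and the value `Travel_Rarely` are assumptions about the CSV, and the rows are made up; check them against `df.columns` after reading the real file.

```python
import pandas as pd

# Hypothetical rows standing in for HR_Employee_Attrition_Data.csv.
hr = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "HR", "HR", "R&D"],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
    "BusinessTravel": ["Travel_Rarely", "Travel_Rarely", "Travel_Frequently",
                       "Travel_Rarely", "Non-Travel", "Travel_Rarely"],
    "DailyRate": [400, 600, 900, 500, 700, 300],
})

# Part 1: attrition rate = share of "Yes" rows within each department.
attrition_rate = hr["Attrition"].eq("Yes").groupby(hr["Department"]).mean()
top_department = attrition_rate.idxmax()

# Part 2: combine two conditions with &, then average DailyRate.
mask = (hr["BusinessTravel"] == "Travel_Rarely") & (hr["Department"] == "Sales")
avg_daily_rate = hr.loc[mask, "DailyRate"].mean()

print(top_department, avg_daily_rate)
```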
 

Rahul_Aggarwal

Active Member
Alumni
Hi Rahul,

I have a question on datetime formats as below:

1. I have a date stored in my DF in int64 format. When I plot this date without converting it to 'Datetime' format the graph looks like this below:

[In]: df["Date_month_year"].value_counts().plot();

View attachment 9340

2. When I change the same date field in 'DateTime' format, the graph looks like this:

View attachment 9341


So what I infer from this is that , when plotting dates, we should always convert them into 'Datetime' format. otherwise the graph shown would not be correct.

Please confirm my understanding.

Thanks and Regards,
Shreha.

Yes Shreha, that's the correct understanding. Plot 1 visualizes the data incorrectly because the dates are not in sequence. As you can see from the plot, some of the June dates come before May. This issue gets resolved when you convert the field to datetime format. However, you could still plot the data correctly without converting to datetime format, as I demonstrated in the Matplotlib session.

Regards,
Rahul A
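
A small sketch of the point above, assuming the int64 dates look like YYYYMMDD (the actual format in the DataFrame is not shown, so adjust `format` accordingly). `value_counts()` orders by count, which is why dates can appear out of sequence; `sort_index()` puts them back in date order before plotting.

```python
import pandas as pd

# Hypothetical integer dates in YYYYMMDD form.
df = pd.DataFrame(
    {"Date_month_year": [20200601, 20200515, 20200601, 20200602, 20200601]}
)

# Convert the integers to real datetimes.
dates = pd.to_datetime(df["Date_month_year"].astype(str), format="%Y%m%d")

# Count per date, then order by date instead of by frequency.
counts = dates.value_counts().sort_index()
print(counts)

# counts.plot() would now draw the dates in chronological order.
```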
 
Hi guys,
Could you please tell me how to create a new DataFrame from an existing DataFrame that shows only the rows meeting a specific condition in a column? For example, where the column value = yes.
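
One common way to do this is a boolean mask. The sketch below uses a made-up DataFrame with a hypothetical `subscribed` column; replace the names with your own.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "subscribed": ["yes", "no", "yes"],
})

# The comparison produces True/False per row; indexing keeps the True rows.
# .copy() makes the result an independent DataFrame.
yes_rows = df[df["subscribed"] == "yes"].copy()
print(yes_rows)
```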
 

_12759

Active Member
Thanks a lot Rahul for your reply. :)

Regards,
Shreha.
 

K ASHOK_1

New Member
Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together.
Do we have to compare non-holiday vs holiday sales, or find which holidays have an impact (by holiday name)?


Which store has maximum standard deviation i.e., the sales vary a lot. Also, find out the coefficient of mean to standard deviation.
Do we have to find the store with the maximum standard deviation, or the coefficient of mean to standard deviation?
What is the coefficient of mean to standard deviation?
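
On the last question: "coefficient of mean to standard deviation" is commonly read as the coefficient of variation, i.e. standard deviation divided by mean. That reading is an interpretation and has not been confirmed by the instructor here. A minimal sketch with made-up numbers, assuming columns named `Store` and `Weekly_Sales`:

```python
import pandas as pd

# Hypothetical weekly sales for two stores.
sales = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Weekly_Sales": [10.0, 10.0, 10.0, 5.0, 10.0, 15.0],
})

# Mean and (sample) standard deviation of sales per store.
stats = sales.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])

# Coefficient of variation: spread relative to the average level.
stats["cv"] = stats["std"] / stats["mean"]

# Store whose sales vary the most in absolute terms.
most_variable_store = stats["std"].idxmax()
print(stats)
print(most_variable_store)
```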
 

Renuka Badduri

Active Member
Hi All,

While running "conda install -c anaconda graphviz" in the Anaconda command prompt, you may get an error like
"EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
environment location: C:\ProgramData\Anaconda3
".

To overcome the above error, open the Anaconda command prompt using "Run as administrator". Then run the same installation command again and it will work successfully.

Regards,
Renuka
 

Renuka Badduri

Active Member
Hi Rahul,

In the whitepaper I noticed the points below:
1. In the first image, the inner circle is labelled "DT", but I think it is supposed to be "Deep Learning (DL)".
2. The types of ML algorithms do not include the third type, i.e., "Reinforcement Algorithm", and related details.

Regards,
Renuka
 
@rahul, I have finished 70% of the first project, but I am facing problems with EDA in that project. How can I improve at that? It's very important to learn, and without EDA I will not be a good data scientist. Please help me with your experience.
 
Hi Rahul,

Attaching the file from the statistics session. Here, under the 1-sample t-test, my t-statistic is coming out incorrect, i.e. -367. Please see output [36]. Can you please help me with this?
 

Attachments

  • 1. Statistics Session - 3 Hypothesis Testing.pdf
    201.4 KB · Views: 14

PAYAL_27

Member
Which store/s has good quarterly growth rate in Q3’2012?
Has anyone found the solution? I couldn't write the complete code. Can anyone help?
 

_12759

Active Member
Hi Rahul,

Could you please help me with how to access the columns when we unstack after a groupby?

sales_Q = df_sales.loc[(df_sales['quarter'] == '2012Q2') | (df_sales['quarter'] == '2012Q3') ]
sales_Q


     Store        Date  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price         CPI  Unemployment    date_new quarter
113      1  06-04-2012    1899676.88             0        70.43       3.891  221.435611         7.143  2012-04-06  2012Q2
114      1  13-04-2012    1621031.70             0        69.07       3.891  221.510210         7.143  2012-04-13  2012Q2
115      1  20-04-2012    1521577.87             0        66.76       3.877  221.564074         7.143  2012-04-20  2012Q2
116      1  27-04-2012    1468928.37             0        67.23       3.814  221.617937         7.143  2012-04-27  2012Q2

# Now I am trying to do a groupby store and quarter, on the above dataset, and sum the sales as below:

Qtr3_sales = pd.DataFrame(sales_Q.groupby(['Store','quarter'])['Weekly_Sales'].sum().unstack()).reset_index().rename_axis(None, axis=1)
Qtr3_sales

   Store       2012Q2       2012Q3
0      1  20978760.12  20253947.78
1      2  25083604.88  24303354.86
2      3   5620316.49   5298005.47
3      4  28454363.67  27796792.46

Now, in order to subtract the sales for both the quarters, I am trying to access the above columns. But it is giving me an error.

Qtr3_sales['growth'] = Qtr3_sales['2012Q3'] - Qtr3_sales['2012Q2']

KeyError: '2012Q3'


How can I access my columns '2012Q2' and '2012Q3' in the DataFrame above after I have done an unstack()?

Please help. I am stuck on this for hours now. Tried various things but to no avail. Any pointers would be greatly appreciated.

Thanks and Regards,
Shreha.
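
A possible cause, stated as an assumption because the dtype of `quarter` is not shown in the post: if `quarter` was created with `.dt.to_period('Q')`, the unstacked column labels are `Period` objects that only print as `2012Q3`, so the string key `'2012Q3'` is not found. The sketch below rebuilds the situation on a few made-up rows and converts the labels to plain strings before subtracting.

```python
import pandas as pd

# Hypothetical rows; dates are day-month-year as in the post.
df_sales = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": ["06-04-2012", "06-07-2012", "06-04-2012", "06-07-2012"],
    "Weekly_Sales": [100.0, 110.0, 200.0, 180.0],
})
# Assumed way the quarter column was built (Period dtype).
df_sales["quarter"] = (
    pd.to_datetime(df_sales["Date"], format="%d-%m-%Y").dt.to_period("Q")
)

wide = (
    df_sales.groupby(["Store", "quarter"])["Weekly_Sales"]
    .sum()
    .unstack()
    .reset_index()
)

# Turn every column label into a plain string so '2012Q3' works as a key.
wide.columns = [str(c) for c in wide.columns]

wide["growth"] = wide["2012Q3"] - wide["2012Q2"]
wide["growth_rate"] = wide["growth"] / wide["2012Q2"]
print(wide)
```

Printing `list(Qtr3_sales.columns)` on the real DataFrame shows whether the labels are strings or `Period` objects.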
 

Renuka Badduri

Active Member
Hi Rahul,

I am also facing the same issue with finding quarterly sales after getting the individual quarters. How do I extract the data for specific quarters?

Q2_data=Walmart_data.groupby(['Store','Quarter'])['Weekly_Sales'].sum().sort_v
Q2_data.head(20)

Store Quarter
20 2010Q4 32573122.65
14 2010Q4 31925970.95
4 2011Q4 31478429.92
20 2011Q4 31152336.70
4 2010Q4 30889467.22
13 2010Q4 30248200.77
10 2010Q4 30151714.85
13 2011Q4 29747837.83
2 2010Q4 29673875.52
14 2011Q4 29437064.94
2010Q2 29039924.35
4 2011Q3 28888165.93
2012Q2 28454363.67
10 2011Q4 28282114.59
20 2011Q3 28196615.83
4 2012Q1 27930310.30
2012Q3 27796792.46
20 2012Q2 27524197.32
14 2011Q3 27509662.21
13 2011Q3 27499449.78


Regards,
Renuka
 

_12759

Active Member
Hi Renuka,

First of all, for the quarterly growth we only need the data around Q3’2012. As discussed in class with Rahul, we filter the data for 2012Q2 and 2012Q3. Then we total the sales in each of these two quarters for every store and subtract: 2012Q3 - 2012Q2.

Then, after subtracting, we divide the result by the 2012Q2 total. This gives us the growth rate.

In your data above, you currently have data for all the quarters of all the years. Please filter it first and then proceed.

This is my understanding of the question. Please correct me if I am wrong. I am also working on the same project right now.

Regards,
Shreha.
 

Renuka Badduri

Active Member
Hi Shreha,

Thanks much for helping here. Let me try again.

Regards,
Renuka.
 

PAYAL_27

Member
But how do we filter the data, given that after the groupby the DataFrame columns are converted into the index?
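
Two options that may help, sketched on made-up numbers. The names `Store`, `Quarter` and `Weekly_Sales` follow the posts above, and the quarter labels are assumed to be plain strings; if they are `Period` values, compare against `pd.Period("2012Q3")` instead.

```python
import pandas as pd

# Hypothetical rows for illustration.
walmart = pd.DataFrame({
    "Store": [4, 4, 4, 20, 20, 20],
    "Quarter": ["2012Q2", "2012Q2", "2012Q3", "2012Q2", "2012Q3", "2012Q3"],
    "Weekly_Sales": [10.0, 20.0, 25.0, 30.0, 15.0, 5.0],
})

# After groupby, Store and Quarter form the (multi-level) index.
totals = walmart.groupby(["Store", "Quarter"])["Weekly_Sales"].sum()

# Option 1: select one value of an index level directly.
q3_by_store = totals.xs("2012Q3", level="Quarter")

# Option 2: move the index back into columns, then filter as usual.
flat = totals.reset_index()
two_quarters = flat[flat["Quarter"].isin(["2012Q2", "2012Q3"])]

print(q3_by_store)
print(two_quarters)
```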
 