Data Science with Python | Rahul Aggarwal

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Apr 21, 2020.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    258
    Likes Received:
    71
    #1
    BAJRANG GUPTA likes this.
  2. Surendra Prabhakar Joshi

    Joined:
    Jul 24, 2019
    Messages:
    11
    Likes Received:
    1
    Nishant - My attendance for the 20th April class has not been marked. Could you please check and update it?
     
    #2
  3. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Hello All,

    The content for today's class (Session 4) has been uploaded to Google Drive. Please download it before coming to the live session. It would also be wonderful if you could go through the NumPy self-paced videos.

    Happy Learning!

    Regards,
    Rahul
     
    #3
  4. Sumayya_1

    Sumayya_1 Member

    Joined:
    Mar 28, 2020
    Messages:
    3
    Likes Received:
    0
    Hi Rahul

    Please find the completed assignments attached.
    Can you help me with the assignment on the filter function and lambda expressions? I'm unable to find a solution with
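
    A minimal sketch of how filter and a lambda expression fit together. The list of numbers is hypothetical, since the actual assignment question is not shown in this thread.

```python
# Hypothetical sample data; the actual assignment input is not shown in the thread.
numbers = [3, 8, 15, 22, 41, 60]

# filter() keeps the items for which the lambda returns True.
# It returns a lazy iterator, so wrap it in list() to see the values.
even_numbers = list(filter(lambda n: n % 2 == 0, numbers))

# The same selection written as a list comprehension, for comparison.
even_numbers_lc = [n for n in numbers if n % 2 == 0]

print(even_numbers)
```

    A common stumbling block is forgetting list(): printing the bare filter object only shows something like <filter object at ...>.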
     

    Attached Files:

    #4
  5. Surendra Prabhakar Joshi

    Joined:
    Jul 24, 2019
    Messages:
    11
    Likes Received:
    1
    The assignment from session 3 is attached. I am also working on assignments 2 and 3 and have a few doubts. Please find assignment 1 attached.
     

    Attached Files:

    #5
  6. Surendra Prabhakar Joshi

    Joined:
    Jul 24, 2019
    Messages:
    11
    Likes Received:
    1
    Assignments 2 & 3, session 3
     

    Attached Files:

    #6
  7. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Dear Learners,

    I've uploaded the following content to Google Drive

    1.) Numpy_Xtra_Mentor_Content.zip - Includes the image processing demonstration and the NumPy basics Jupyter notebook, which uses some visuals.
    2.) Session 5 Mentor Content - Pandas.zip - The Pandas Jupyter notebook along with a Python e-book and the puzzle for today.

    The agenda for today's session is to learn the Pandas library and different ways to explore data. Please download the content and go through the Pandas self-paced videos before coming to the live session.

    Happy Learning!

    Regards,
    Rahul
     
    #7
    Last edited: Apr 24, 2020
    MITESH DANAK(4580) likes this.
  8. BAJRANG GUPTA

    BAJRANG GUPTA Member

    Joined:
    Sep 23, 2019
    Messages:
    2
    Likes Received:
    0
    #8
  9. BAJRANG GUPTA

    BAJRANG GUPTA Member

    Joined:
    Sep 23, 2019
    Messages:
    2
    Likes Received:
    0
    Kindly upload the session 5 recording.
    Thanks & Regards
    Bajrang
     
    #9
  10. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Dear Learners,

    Hope you're doing great and have been enjoying the course so far! :)

    I just wanted to do a quick recap of what we covered last week. In the next post I will also share the agenda for the upcoming week, i.e. Week 2, so that you can come prepared.

    Week 1 Recap:

    1. Introduction to Data Science: Wrangler Case Study, Intro to Jupyter Notebook/Python
    2. Basics of Python Part 1: Calculator, Data Types, Comparison & Logical Operators, Branching using if, elif & else
    3. Basics of Python Part 2: Functions, For Loop, range function, lambda function, Containers - List, Set, Dictionary & Tuple
    4. Advanced Python Library: NumPy - Array/Matrix, Built-in Functions, Changing the Dimension & Selecting/Searching Elements. I also demonstrated the wide range of applications of the NumPy library using the Super-Heroes image datasets.
    5. Advanced Python Library: Pandas - Data Ingestion, Pandas DataFrame/Series, Various Operations, Data Traversing and Slicing.

    As Week 1 is the foundation for the upcoming sessions, I request everyone to revisit the self-paced or mentor-recorded session videos (if you'd like). You must also:

    a. Execute the code yourself and understand how the different functionalities work.
    b. Solve the assignments and practice problems I provided to deepen your understanding. Submission is not required for these.
    c. Solve and submit the practice projects linked in your dashboard (screenshot attached for your reference).
    d. Take the quizzes and knowledge checks linked in your dashboard (screenshot attached for your reference).

    I have already uploaded the Week 1 content to Google ShareDrive, along with cheat sheets, e-books and the Super-Heroes NumPy demonstration data/code.

    Please use this discussion forum to post your queries, and your assigned TA (SME) will do their best to resolve them.

    Regards,
    Rahul A

    upload_2020-4-26_17-0-41.png upload_2020-4-26_16-59-35.png
     

    Attached Files:

    #10
    Sandeep Talwar likes this.
  11. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Learners,

    Hope you had a relaxing weekend!

    Below is the agenda for the current week. It focuses primarily on topics that are relevant from an industry and interview perspective.

    1. Advanced Pandas - Treating missing values, Group By Operation, Import/Export options, Data Summarization & Aggregation
    2. Data Visualization using Matplotlib & Seaborn libraries - Line/Bar/Pie Charts, Histograms, Heat maps etc.
    3. Exploratory Data Analysis (EDA) Case Study - Understanding the data, Data Cleaning, Feature Engineering, Finding relationships between variables
    4. Basic Statistics - Central Tendency, Dispersion/Variability, Introduction to Probability, Data Distributions, Central Limit Theorem
    5. Advanced Statistics - Hypothesis Testing, Z-test, T-test, P-value, Type 1 & 2 Errors

    I will upload the content for today's session before the live class.

    “Anyone who stops learning is old, whether at twenty or eighty. Anyone who keeps learning stays young.” – Henry Ford

    Regards,
    Rahul A
     
    #11
    Sandeep Talwar likes this.
  12. Sevanthy Subramanian

    Joined:
    Jul 22, 2019
    Messages:
    1
    Likes Received:
    0
    My attendance for the 21st April class has not been marked. Could you please check and update it?
     
    #12
  13. Sriram V_2

    Sriram V_2 Member

    Joined:
    Nov 11, 2019
    Messages:
    3
    Likes Received:
    0
    Sir, I am unable to log in through my laptop or the lab.
     
    #13
  14. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    In the current DS market, what kinds of projects are going on, and in which domains?
    Which topic areas should we focus on to learn and build skills?
    As a technical person, I know that we need to learn all the concepts in one go for client requirements. However, as a fresher in DS, which topics are good to know and practice initially?
     
    #14
  15. Ramanpreet kaur

    Joined:
    Jul 25, 2019
    Messages:
    12
    Likes Received:
    0
    Hi there,

    I was wondering if someone can help me to convert text files into ipynb format/python .py format. The files provided in google drive are in txt format and I am unable to read them.

    Thanks in advance!
     
    #15
  16. sunny1637

    sunny1637 Member
    Alumni

    Joined:
    Jul 10, 2015
    Messages:
    3
    Likes Received:
    1
    You should open these files in Jupyter Notebook. If you still need a PDF, you can download the notebook from Jupyter Notebook as a .pdf.
     
    #16
    Rohan Joy Mathew likes this.
  17. PAYAL_27

    PAYAL_27 Member

    Joined:
    Mar 30, 2020
    Messages:
    13
    Likes Received:
    2
    I am getting this error while plotting a barplot using the Seaborn library:
    TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
    Code (tips is a built-in dataset):
    sns.barplot(x=tips.sex, y=tips.total_bill);
     
    #17
  18. Ramanpreet kaur

    Joined:
    Jul 25, 2019
    Messages:
    12
    Likes Received:
    0
    Thanks Sunny for your help!
    It worked. So, we need to download the zipped file, unzip it and then open jupyter notebook and fetch the file from download folder.
     
    #18
  19. VIVEK KUMAR AGARWAL

    VIVEK KUMAR AGARWAL New Member

    Joined:
    Feb 28, 2019
    Messages:
    1
    Likes Received:
    0
    Hi,

    Please help me out. I have installed Anaconda on my laptop three times, but each time I can open it through the Start search option only once after installation. After shutting down and restarting my laptop, I am not able to open it again. Every time I click Anaconda/Jupyter Notebook, a cmd window pops up for a fraction of a second and then disappears. After that there is no response, so I am not able to practise anything.
     
    #19
  20. Gurender Kumar Kush

    Alumni

    Joined:
    Sep 12, 2017
    Messages:
    2
    Likes Received:
    0
    How can I debug the code line by line in a function?
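
    One possible approach, sketched under the assumption that Python 3.7 or later is in use: the built-in breakpoint() call pauses execution and opens the standard pdb debugger, which also works inside a Jupyter cell. The function running_total and its debug flag are hypothetical names used only for illustration.

```python
def running_total(values, debug=False):
    """Sum the values, optionally pausing in the debugger on each iteration."""
    total = 0
    for value in values:
        if debug:
            # Opens the pdb prompt here. Useful commands:
            #   n = run next line, s = step into a call,
            #   p total = print a variable, c = continue, q = quit.
            breakpoint()
        total += value
    return total


# Runs normally; call running_total([1, 2, 3], debug=True) to step through it.
print(running_total([1, 2, 3]))
```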
     
    #20
  21. sunny1637

    sunny1637 Member
    Alumni

    Joined:
    Jul 10, 2015
    Messages:
    3
    Likes Received:
    1
    Click only once and wait for almost a minute. It's usually slow to start. This may work, as I encountered the same problem initially. Hope it works for you.
     
    #21
  22. Rohan Joy Mathew

    Joined:
    Feb 17, 2020
    Messages:
    2
    Likes Received:
    2
    If you open the




    Most probably the files got corrupted during installation. You might have to reinstall Anaconda, but unlike a normal uninstallation, you also have to delete its root files to uninstall it properly. The uninstallation documentation is at "https://docs.anaconda.com/anaconda/install/uninstall/". Follow the steps there and then install again.
     
    #22
  23. _56018

    _56018 Member

    Joined:
    Jan 15, 2019
    Messages:
    3
    Likes Received:
    1
    While trying to label the histogram for the blood sugar level, I get the following error. Could someone please let me know if my code is wrong? Thank you!
    TypeError Traceback (most recent call last)
    <ipython-input-46-c8683c070260> in <module>
    1 #plt.hist(blood_sugar,bins=3)
    2 plt.hist(blood_sugar,bins=3,rwidth=0.4)
    ----> 3 plt.xlabel(blood_sugar)
    4 plt.ylabel(Number_patients)

    TypeError: 'str' object is not callable

    The code I wrote is :

    plt.hist(blood_sugar,bins=3,rwidth=0.4)
    plt.xlabel(blood_sugar)
    plt.ylabel(Number_patients)
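
    A hedged sketch of one likely cause. "'str' object is not callable" on the plt.xlabel(...) line means that plt.xlabel is currently a string, which typically happens after an earlier assignment such as plt.xlabel = "..." in the same kernel session. Restarting the kernel restores the function. Separately, the axis labels should be passed as quoted strings rather than as the data variables. The readings below are hypothetical, since the original blood_sugar data is not shown.

```python
import matplotlib.pyplot as plt

# Hypothetical readings; the poster's blood_sugar data is not shown in the thread.
blood_sugar = [113, 85, 90, 150, 149, 88, 93, 115, 135, 80, 77, 82, 129]

plt.hist(blood_sugar, bins=3, rwidth=0.4)

# Call the labelling functions with strings. Never assign to them
# (plt.xlabel = "..." would replace the function with a string).
plt.xlabel("Blood sugar level")
plt.ylabel("Number of patients")
```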
     
    #23
  24. PAYAL_27

    PAYAL_27 Member

    Joined:
    Mar 30, 2020
    Messages:
    13
    Likes Received:
    2
    Hi,
    The ecommerce purchases file in the Exercises folder for Session 6 is a text file, but we need a CSV for the assignment. Can anyone help?
     
    #24
    Gautam Kumar_12 likes this.
  25. Raja Siddharth

    Raja Siddharth New Member

    Joined:
    Feb 28, 2020
    Messages:
    1
    Likes Received:
    1
    Sir, my code runs but does not show any output (I tried restarting the kernel).
    Task - To replace all the "Nan" values of column "FATAL FLAG" with "No".
    Code - "Data2['FATAL_FLAG'].fillna(value = 'No',inplace = True)" (Data2 is the name of the DataFrame).
    It's from the self-learning assignment (Pandas Assignment 1). In the demo video, similar code ran and showed the required output.
    Looking forward to your response.
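
    A minimal sketch of what may be happening, using a hypothetical stand-in for Data2. With inplace=True, fillna returns None, so the cell displays nothing even when the replacement succeeded. Assigning the result back and then displaying the column makes the change visible. It is also worth checking that the column name in the code matches the data exactly ("FATAL FLAG" vs "FATAL_FLAG"), and that the missing entries are real NaN values rather than the text "Nan", which fillna does not touch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Data2; the real assignment data is not shown.
data2 = pd.DataFrame({"FATAL_FLAG": ["Yes", np.nan, np.nan, "Yes"]})

# Assign the filled column back instead of relying on inplace=True.
data2["FATAL_FLAG"] = data2["FATAL_FLAG"].fillna("No")

# Display the result explicitly to confirm the replacement.
print(data2["FATAL_FLAG"].value_counts())
print("Remaining missing values:", data2["FATAL_FLAG"].isna().sum())
```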
     
    #25
    rahul91.aggarwal likes this.
  26. _12759

    _12759 Active Member

    Joined:
    Sep 22, 2017
    Messages:
    43
    Likes Received:
    5
    Hi Rahul,

    I have a question on datetime formats as below:

    1. I have a date stored in my DF in int64 format. When I plot this date without converting it to 'Datetime' format the graph looks like this below:

    [In]: df["Date_month_year"].value_counts().plot();

    upload_2020-5-2_7-18-33.png

    2. When I change the same date field in 'DateTime' format, the graph looks like this:

    upload_2020-5-2_7-19-28.png


    So what I infer from this is that, when plotting dates, we should always convert them into 'Datetime' format; otherwise the graph shown would not be correct.

    Please confirm my understanding.

    Thanks and Regards,
    Shreha.
     
    #26
    rahul91.aggarwal likes this.
  27. Souptik Saha

    Souptik Saha New Member

    Joined:
    Jan 17, 2020
    Messages:
    1
    Likes Received:
    0
    I am unable to solve these two parts:

    Read HR_Employee_Attrition_Data.csv file as pandas DataFrame

    1. ## Find the Department where Attrition rate is highest

    2. ## For people who Travel Rarely and from Sales department what is the average daily rate?
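
    A possible approach, sketched on a tiny hypothetical sample. The column names (Department, Attrition, BusinessTravel, DailyRate) and the values "Yes" and "Travel_Rarely" are assumptions about HR_Employee_Attrition_Data.csv and should be checked against the real file with df.columns and df["BusinessTravel"].unique().

```python
import pandas as pd

# Tiny hypothetical sample; replace with pd.read_csv("HR_Employee_Attrition_Data.csv").
hr = pd.DataFrame({
    "Department": ["Sales", "Sales", "HR", "HR", "Sales", "R&D"],
    "Attrition": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "BusinessTravel": ["Travel_Rarely", "Travel_Rarely", "Travel_Frequently",
                       "Travel_Rarely", "Non-Travel", "Travel_Rarely"],
    "DailyRate": [1000, 800, 500, 700, 900, 600],
})

# 1. Attrition rate per department = share of rows where Attrition is "Yes".
attrition_rate = (hr["Attrition"] == "Yes").groupby(hr["Department"]).mean()
top_department = attrition_rate.idxmax()

# 2. Average daily rate for Sales employees who travel rarely.
mask = (hr["BusinessTravel"] == "Travel_Rarely") & (hr["Department"] == "Sales")
avg_daily_rate = hr.loc[mask, "DailyRate"].mean()

print(attrition_rate)
print(top_department, avg_daily_rate)
```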
     
    #27
  28. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Hello Payal,
    You can still ingest a text file and create a pandas DataFrame.

    Option 1: Change the separator from ',' to a space.
    data = pd.read_csv('filename.txt', sep=" ", header=None)

    Option 2: Use read_fwf (fwf stands for fixed-width formatted lines).
    df = pd.read_fwf('output_list.txt')
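
    A small self-contained sketch of the same idea. The file contents are hypothetical, and io.StringIO stands in for a real path such as 'filename.txt'. If the .txt file already holds comma-separated values, read_csv reads it directly regardless of the extension.

```python
import io

import pandas as pd

# Hypothetical file contents standing in for the real text files.
comma_text = "Name,Amount\nAsha,10.5\nRavi,20.0\n"
space_text = "Asha 10.5\nRavi 20.0\n"

# A .txt file with comma-separated values and a header row.
df_comma = pd.read_csv(io.StringIO(comma_text))

# A space-separated file with no header row; column names are supplied by hand.
df_space = pd.read_csv(io.StringIO(space_text), sep=" ", header=None,
                       names=["Name", "Amount"])

# To save a .csv copy for the assignment (illustrative file name):
# df_comma.to_csv("ecommerce_purchases.csv", index=False)
print(df_comma)
print(df_space)
```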

    I would also encourage all of you to start googling your queries. Stack Overflow and Stack Exchange are good websites for getting your queries resolved. This will help you not only now but also later as a professional Data Scientist :)

    Hope this helps. Happy Learning!

    Regards,
    Rahul A
     
    #28
  29. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Souptik, has this been resolved?
     
    #29
  30. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Yes Shreha, that's the correct understanding. Plot 1 visualizes the data incorrectly because the dates are not in the proper sequence. As you can see from the plot, some of the June dates come before the May dates. This issue gets resolved when you convert the date field to datetime format. However, you could still plot the data correctly without converting to datetime format, as I demonstrated in the Matplotlib session.
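
    A minimal sketch of the conversion, assuming the integers encode dates as DDMMYYYY. That layout is an assumption, since the actual contents of Date_month_year are not shown. Note that value_counts() orders by frequency, so sorting by the date index before plotting matters as well.

```python
import pandas as pd

# Hypothetical data: dates stored as integers in DDMMYYYY form (an assumption).
df = pd.DataFrame(
    {"Date_month_year": [15052020, 2062020, 15052020, 30052020, 2062020, 15052020]}
)

# Integers drop leading zeros, so pad to 8 characters before parsing
# (2062020 becomes "02062020").
as_text = df["Date_month_year"].astype(str).str.zfill(8)
df["date"] = pd.to_datetime(as_text, format="%d%m%Y")

# Count per date, then put the dates in chronological order.
counts = df["date"].value_counts().sort_index()
print(counts)

# counts.plot() would now draw the points in date order.
```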

    Regards,
    Rahul A
     
    #30
  31. Deepak Kumar Bawa

    Joined:
    Apr 9, 2020
    Messages:
    2
    Likes Received:
    1
    Just give it some time. On my laptop it opens, but first the command prompt opens 3-4 times and disappears. So wait for some time.
     
    #31
    rahul91.aggarwal likes this.
  32. Deepak Kumar Bawa

    Joined:
    Apr 9, 2020
    Messages:
    2
    Likes Received:
    1
    Hi guys,
    Could you please tell me how to create a new DataFrame from an existing DataFrame that contains only the rows meeting a specific condition in a column? For example, where the column value = yes.
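
    One common way to do this is boolean indexing. The DataFrame and the column name approved below are hypothetical, used only to illustrate the pattern.

```python
import pandas as pd

# Hypothetical data; the actual column names are not given in the question.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "approved": ["yes", "no", "yes", "no"],
})

# The comparison builds a True/False mask; indexing with it keeps matching rows.
# .copy() makes the result an independent DataFrame rather than a view-like slice.
approved_df = df[df["approved"] == "yes"].copy()

print(approved_df)
```

    Several conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses.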
     
    #32
  33. _12759

    _12759 Active Member

    Joined:
    Sep 22, 2017
    Messages:
    43
    Likes Received:
    5

    Thanks a lot Rahul for your reply. :)

    Regards,
    Shreha.
     
    #33
    rahul91.aggarwal likes this.
  34. K ASHOK_1

    K ASHOK_1 New Member

    Joined:
    Apr 8, 2020
    Messages:
    1
    Likes Received:
    0
    Some holidays have a negative impact on sales. Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together.
    Do we have to compare non-holiday vs holiday sales, or find which holidays (by holiday name) have an impact?


    Which store has maximum standard deviation i.e., the sales vary a lot. Also, find out the coefficient of mean to standard deviation.
    Do we have to find the store with the maximum standard deviation, or the coefficient of mean to standard deviation?
    What is the coefficient of mean to standard deviation?
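
    A hedged sketch: "coefficient of mean to standard deviation" is most commonly read as the coefficient of variation, i.e. the standard deviation divided by the mean. That reading is an assumption about the project wording. The sales figures below are hypothetical; the column names Store and Weekly_Sales follow the Walmart data shown elsewhere in this thread.

```python
import pandas as pd

# Hypothetical weekly sales for three stores.
sales = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Weekly_Sales": [100.0, 110.0, 90.0, 200.0, 400.0, 300.0, 50.0, 50.0, 50.0],
})

# Mean and sample standard deviation of weekly sales per store.
per_store = sales.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])

# Coefficient of variation: spread relative to the average level of sales.
per_store["cv"] = per_store["std"] / per_store["mean"]

# Store whose sales vary the most in absolute terms.
most_variable_store = per_store["std"].idxmax()

print(per_store)
print(most_variable_store)
```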
     
    #34
  35. sandeep kumar singh_1

    Joined:
    Jan 13, 2020
    Messages:
    4
    Likes Received:
    1
    Capture.PNG
    @ Rahul Aggarwal, the white paper you shared says classification is unsupervised learning. As per my knowledge, it is a supervised learning technique. Please correct me if I am wrong.
     
    #35
    rahul91.aggarwal likes this.
  36. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Thanks for calling it out, my friend. This is just a typo.
     
    #36
  37. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    Hi All,

    While running "conda install -c anaconda graphviz" in the Anaconda Command Prompt, you may get an error like
    "EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
    environment location: C:\ProgramData\Anaconda3
    ".

    To overcome this error, open the Anaconda Command Prompt using "Run as administrator", then run the same installation command again and it will complete successfully.

    Regards,
    Renuka
     
    #37
    rahul91.aggarwal likes this.
  38. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    Hi Rahul,

    I noticed the following points in the whitepaper:
    1. In the first image, the inner circle is labelled "DT", but I think it is supposed to be "Deep Learning (DL)".
    2. The types of ML algorithms section doesn't include the third type, i.e. "Reinforcement Algorithm", and related details.

    Regards,
    Renuka
     
    #38
    rahul91.aggarwal likes this.
  39. _56018

    _56018 Member

    Joined:
    Jan 15, 2019
    Messages:
    3
    Likes Received:
    1
    Rahul, please post the project for discussion for tomorrow's session(05/12/2020)
     
    #39
    rahul91.aggarwal likes this.
  40. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    Hello Everyone,

    We will solve the "Retail Analysis with Walmart Data" project in today's session, i.e. 12th May 2020. Please try it at your end before coming to the session.

    Regards,
    Rahul A
     
    #40
  41. Harish Aseri

    Harish Aseri Member

    Joined:
    Dec 17, 2019
    Messages:
    4
    Likes Received:
    0
    @rahul, I have finished 70% of the first project. In that project I am facing problems with EDA. How can I improve at it? It's very important to learn, and without EDA I will not be a good Data Scientist. Please help me with your experience.
     
    #41
  42. Kanika Soni

    Kanika Soni Member

    Joined:
    Jul 9, 2019
    Messages:
    7
    Likes Received:
    0
    Hi Rahul,

    Attaching the file from the statistics session. Under the 1-sample t-test, my t-stat value is coming out incorrect, i.e. -367. Please see the output of line [36]. Can you please help me with this?
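
    One way to cross-check a one-sample t-statistic by hand, sketched with hypothetical numbers because the attached notebook is not visible here. The statistic is t = (sample mean - hypothesised mean) / (sample standard deviation / sqrt(n)). A very large magnitude usually means the hypothesised mean is far from the sample mean relative to the standard error, for example when the two are on different scales.

```python
import math
import statistics

# Hypothetical sample and hypothesised population mean.
sample = [10, 12, 14, 16, 18]
mu0 = 12

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# One-sample t-statistic.
t_stat = (sample_mean - mu0) / standard_error
print(round(t_stat, 4))
```

    The same statistic is available from scipy.stats.ttest_1samp(sample, popmean), which also returns the p-value.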
     

    Attached Files:

    #42
  43. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    Hey Rahul,

    Did you upload the latest notebook with the pseudocode for the project?


    Regards,
    Renuka.
     
    #43
  44. PAYAL_27

    PAYAL_27 Member

    Joined:
    Mar 30, 2020
    Messages:
    13
    Likes Received:
    2
    Which store/s has good quarterly growth rate in Q3’2012?
    Has anyone found its solution? I couldn't write the complete code. Can anyone help?
     
    #44
  45. _12759

    _12759 Active Member

    Joined:
    Sep 22, 2017
    Messages:
    43
    Likes Received:
    5
    Hi Rahul,

    Could you please help me with how to access the columns after we unstack, following a groupby?

    sales_Q = df_sales.loc[(df_sales['quarter'] == '2012Q2') | (df_sales['quarter'] == '2012Q3') ]
    sales_Q


         Store        Date  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price         CPI  Unemployment    date_new  quarter
    113      1  06-04-2012    1899676.88             0        70.43       3.891  221.435611         7.143  2012-04-06   2012Q2
    114      1  13-04-2012    1621031.70             0        69.07       3.891  221.510210         7.143  2012-04-13   2012Q2
    115      1  20-04-2012    1521577.87             0        66.76       3.877  221.564074         7.143  2012-04-20   2012Q2
    116      1  27-04-2012    1468928.37             0        67.23       3.814  221.617937         7.143  2012-04-27   2012Q2

    # Now I am trying to do a groupby store and quarter, on the above dataset, and sum the sales as below:

    Qtr3_sales = pd.DataFrame(sales_Q.groupby(['Store','quarter'])['Weekly_Sales'].sum().unstack()).reset_index().rename_axis(None, axis=1)
    Qtr3_sales

       Store       2012Q2       2012Q3
    0      1  20978760.12  20253947.78
    1      2  25083604.88  24303354.86
    2      3   5620316.49   5298005.47
    3      4  28454363.67  27796792.46

    Now, in order to subtract the sales for both the quarters, I am trying to access the above columns. But it is giving me an error.

    Qtr3_sales['growth'] = Qtr3_sales['2012Q3'] - Qtr3_sales['2012Q2']

    KeyError: '2012Q3'


    How can I access the columns '2012Q2' and '2012Q3' in my DataFrame above after I have done an unstack()?
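
    A hedged sketch of one likely cause. If the quarter column was created with dt.to_period("Q") (an assumption, since that step is not shown), the unstacked column labels are Period objects rather than the strings '2012Q2' and '2012Q3', so a string lookup can raise a KeyError. Printing Qtr3_sales.columns shows the actual labels. Converting the quarter to text before grouping gives plain string column names. The Q3 rows below are hypothetical.

```python
import pandas as pd

# Two rows copied from the sample above plus two hypothetical Q3 rows.
df_sales = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "Date": ["06-04-2012", "13-04-2012", "06-07-2012", "13-07-2012"],
    "Weekly_Sales": [1899676.88, 1621031.70, 1500000.00, 1600000.00],
})
df_sales["date_new"] = pd.to_datetime(df_sales["Date"], format="%d-%m-%Y")

# to_period("Q") yields Period values; astype(str) turns them into labels like "2012Q2".
df_sales["quarter"] = df_sales["date_new"].dt.to_period("Q").astype(str)

qtr_sales = (
    df_sales.groupby(["Store", "quarter"])["Weekly_Sales"]
    .sum()
    .unstack()
    .reset_index()
    .rename_axis(None, axis=1)
)

# The quarter columns are now ordinary strings and can be selected by name.
qtr_sales["growth"] = qtr_sales["2012Q3"] - qtr_sales["2012Q2"]
print(qtr_sales.columns.tolist())
print(qtr_sales)
```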

    Please help. I am stuck on this for hours now. Tried various things but to no avail. Any pointers would be greatly appreciated.

    Thanks and Regards,
    Shreha.
     
    #45
  46. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    Hi Rahul,

    I am also facing the same issue with finding quarterly sales after getting the individual quarters. How do I extract a specific quarter's data?

    Q2_data=Walmart_data.groupby(['Store','Quarter'])['Weekly_Sales'].sum().sort_v
    Q2_data.head(20)

    Store Quarter
    20 2010Q4 32573122.65
    14 2010Q4 31925970.95
    4 2011Q4 31478429.92
    20 2011Q4 31152336.70
    4 2010Q4 30889467.22
    13 2010Q4 30248200.77
    10 2010Q4 30151714.85
    13 2011Q4 29747837.83
    2 2010Q4 29673875.52
    14 2011Q4 29437064.94
    2010Q2 29039924.35
    4 2011Q3 28888165.93
    2012Q2 28454363.67
    10 2011Q4 28282114.59
    20 2011Q3 28196615.83
    4 2012Q1 27930310.30
    2012Q3 27796792.46
    20 2012Q2 27524197.32
    14 2011Q3 27509662.21
    13 2011Q3 27499449.78


    Regards,
    Renuka
     
    #46
  47. _12759

    _12759 Active Member

    Joined:
    Sep 22, 2017
    Messages:
    43
    Likes Received:
    5

    Hi Renuka,

    First of all, for the quarterly growth we need to filter the data for Q3’2012. As discussed in class with Rahul, we will filter the data for 2012 Q3 and 2012 Q2. Then we will total the sales for each of these two quarters for every store and subtract them: 2012Q3 - 2012Q2.

    After subtracting, we will divide the result by the 2012Q2 total. This will give us the rate of growth.
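
    The calculation described above can be sketched as follows, using the quarterly totals for the four stores posted earlier in this thread. The column name growth_rate is illustrative.

```python
import pandas as pd

# Quarterly totals copied from the unstacked output posted earlier in this thread.
qtr = pd.DataFrame({
    "Store": [1, 2, 3, 4],
    "2012Q2": [20978760.12, 25083604.88, 5620316.49, 28454363.67],
    "2012Q3": [20253947.78, 24303354.86, 5298005.47, 27796792.46],
})

# Growth rate = (Q3 sales - Q2 sales) / Q2 sales.
qtr["growth_rate"] = (qtr["2012Q3"] - qtr["2012Q2"]) / qtr["2012Q2"]

# Store with the highest growth rate among the rows shown.
best_store = qtr.loc[qtr["growth_rate"].idxmax(), "Store"]

print(qtr.sort_values("growth_rate", ascending=False))
print(best_store)
```

    This covers only the four stores shown in the thread; the full dataset would need the same steps applied to every store.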

    In your data above, you currently have data for all the quarters of all the years. Please filter it first and then proceed.

    This is my understanding of the question. Please correct me if I am wrong. I am also working on the same project right now.

    Regards,
    Shreha.
     
    #47
  48. Renuka Badduri

    Renuka Badduri Active Member

    Joined:
    Feb 11, 2020
    Messages:
    23
    Likes Received:
    3
    Hi Shreha,

    Thanks much for helping here. Let me try again.

    Regards,
    Renuka.
     
    #48
    _12759 likes this.
  49. PAYAL_27

    PAYAL_27 Member

    Joined:
    Mar 30, 2020
    Messages:
    13
    Likes Received:
    2
    But how do we filter the data when the DataFrame columns have been converted into the index?
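
    Two possible ways to filter after a groupby has moved Store and Quarter into the index, sketched on hypothetical numbers. The column names follow Renuka's post above.

```python
import pandas as pd

# Hypothetical data with the same column names as in the post above.
sales = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Quarter": ["2012Q2", "2012Q3", "2012Q2", "2012Q3"],
    "Weekly_Sales": [10.0, 12.0, 20.0, 18.0],
})
grouped = sales.groupby(["Store", "Quarter"])["Weekly_Sales"].sum()

# Option 1: turn the index levels back into ordinary columns, then filter as usual.
flat = grouped.reset_index()
q3_flat = flat[flat["Quarter"] == "2012Q3"]

# Option 2: keep the MultiIndex and filter on one of its levels.
q3_level = grouped[grouped.index.get_level_values("Quarter") == "2012Q3"]

print(q3_flat)
print(q3_level)
```

    Passing as_index=False to groupby is another option that keeps the grouping keys as columns from the start.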
     
    #49
  50. rahul91.aggarwal

    rahul91.aggarwal Active Member

    Joined:
    Apr 1, 2020
    Messages:
    16
    Likes Received:
    9
    #50
