Programming Basics and Data Analytics with Python | Anand | June 13 - July 18

Discussion in 'Big Data and Analytics' started by Raghavendra B M, Jun 13, 2020.

  1. Raghavendra B M

    Raghavendra B M Active Member
    Simplilearn Support

    Joined:
    Jan 6, 2020
    Messages:
    40
    Likes Received:
    30
    #1
  2. _80787

    _80787 Member

    Joined:
    May 30, 2020
    Messages:
    2
    Likes Received:
    0
    Dear Raghav,
    Somehow my WebEx session got closed, and when I try to join again it is not possible.
    upload_2020-6-13_21-52-56.png
    After restarting the system I get the screen below, but it still does not connect.
    upload_2020-6-13_22-7-32.png

    Solution found myself:
    Opening the meeting in IE 11 and using Java mode let me reconnect to the WebEx meeting.
     
    #2
    Last edited: Jun 13, 2020
  3. _80787

    _80787 Member

    Joined:
    May 30, 2020
    Messages:
    2
    Likes Received:
    0
    #3
  4. Sumit Patel_2

    Sumit Patel_2 New Member

    Joined:
    Jun 10, 2020
    Messages:
    1
    Likes Received:
    0
    Dear Raghav,


    I have not assigned a path for Jupyter (included in Anaconda), which I guess is a problem, because my files will not get saved in the path folder. I want to assign the path; please tell me how, if you know a way without reinstalling.
    Do I need to reinstall, or what else should I do? upload_2020-6-14_19-11-13.png
     
    #4
  5. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Hi Anand,

    Hope you are doing good.

    Please share the session 2 (14.06.2020) Jupyter notebook in Google Drive so that I can start practicing.
    A simple suggestion: if you could share the Jupyter notebook of each class before closing the session, it would be easier for us to work on the code from the next morning itself.

    Please don't take it otherwise; it is only a simple suggestion.

    Regards,
    Aparisim Saha
     
    #5
    Last edited: Jun 15, 2020
    VIPIN KUMAR_11 likes this.
  6. Anoop Tiwari_1

    Joined:
    Feb 15, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Anand,

    During the first session you taught us how to delete code in Jupyter. I followed the same method, i.e. press Esc and then Ctrl + Delete. However, Jupyter asks me to define a location to save the code, and nothing gets deleted. Kindly help.
     
    #6
  7. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Same for me as well; I am not able to delete a cell.
     
    #7
  8. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Hi Anand,
    The Jupyter notebook for session 2 (14.06.2020) has still not been saved in Google Drive.
    Kindly do the needful.

    Thanks
     
    #8
  9. _80580

    _80580 New Member

    Joined:
    May 30, 2020
    Messages:
    1
    Likes Received:
    0
    Reference: Chapter one theory and homework, Section 5 "Practical example: descriptive statistics", file Descriptive statistics_exercise.xlxs. The solution to Task 9 says the skew is positive: "We will only comment on the skew, as it is a bit tougher. The skew is right (positive). This means that most apartments are relatively cheap with a tiny portion that is more expensive."

    But as per my understanding, skew is judged relative to the center of symmetry, so most apartments would be relatively costlier, wouldn't they?
    Can you please help me understand this homework?
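
    A small numeric sketch may help with the skew question above. The prices are made-up values (an assumption, not the homework data), chosen so that a few expensive apartments stretch the right tail. With positive skew the mean is pulled above the median, so most values lie below the mean, which is what "most apartments are relatively cheap" refers to.

```python
import statistics

# Hypothetical apartment prices (made up for illustration): many cheap, a few expensive.
prices = [100, 110, 120, 120, 130, 140, 150, 160, 400, 900]

mean = statistics.mean(prices)
median = statistics.median(prices)
stdev = statistics.pstdev(prices)

# Population skewness: the average cubed z-score.
skewness = sum(((p - mean) / stdev) ** 3 for p in prices) / len(prices)

# Share of apartments priced below the mean.
below_mean = sum(p < mean for p in prices) / len(prices)

print(mean, median, round(skewness, 2), below_mean)
```

    Here the mean is larger than the median and the skewness is positive, while most of the prices sit below the mean.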
     
    #9
  10. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Can this be done today?
     
    #10
  11. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Dear Aparisim,
    As I mentioned in the class last week, I had already added the necessary notebooks to the "Python Getting Started Folder" even before the class. I also mentioned that the notebooks created during the classwork are just for solving some examples, and that all the examples are already in the original notebooks I had shared.
    While I have now added the classwork notebooks as well, I suggest you focus on the original notebooks I had shared.

    Hope this helps
     
    #11
  12. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Dear Aparisim,
    Last week, we covered till lists. All the necessary examples about lists and all other data types are already there in the notebooks i shared.
    Please take a look at them at least once and you will then understand what i am saying.
    The name of all the notebooks for this course will start with SL-13JUN18JUL2020. Please do not allow that to confuse you
     
    #12
  13. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Hi, can you please explain the source of your problem a little more clearly? Are you referring to a notebook or to material in Simplilearn?
     
    #13
  14. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Dear Anoop,
    When you want to delete a line in Jupyter:
    1. First go to that specific line in Jupyter.
    2. Press the Esc key. You will now see the color of the selection change from green to blue.
    3. Then press Ctrl + D to delete.
    4. If the above does not work, it means that Ctrl + D is mapped to something else on your system.
    5. Go to the keyboard shortcuts in Jupyter, where you can reset the keys to what you want.
    6. Please watch the video from the first session again.
     
    #14
  15. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Hi Sumit,
    From your screenshot, it's clear to me that you have installed Jupyter. The screenshot actually shows the directory (folder) where the Jupyter installation happened by default.

    As mentioned in the class, please follow the steps below:

    1. Go to C:/users/(you will see a folder with your name)/ ==> this is the actual folder whose contents are displayed in your screenshot.
    2. Now, if you have already created the folder called "My First Python code", then
    3. go to Google Drive and download your Jupyter notebooks to that folder.
    4. If you have not yet created the folder mentioned in step 2, create a new folder in the same directory.
    5. Now go to Google Drive and download the Jupyter notebooks (as shown in the video).
    6. The download process will ask you for a folder in which to save the notebooks.
    7. You can save them in the folder from step 2.
    8. Now open Jupyter as shown in the video.
    9. It will open a browser and show the contents of the working directory, as in your screenshot.
    10. You will see your folder and the Jupyter notebooks there.


    Please watch the video one more time to understand this better.
     
    #15
  16. VIPIN KUMAR_11

    Joined:
    Oct 10, 2019
    Messages:
    3
    Likes Received:
    0

    Good Suggestion
     
    #16
  17. _80262

    _80262 Member

    Joined:
    May 27, 2020
    Messages:
    2
    Likes Received:
    0
    Hello Anand,
    Last week (13.06.2020) you explained the Jupyter root node setting. I was not able to follow that.

    Can you please explain how to set up the Jupyter Notebook root node, and its purpose?

    Thanks
    Senthil
     
    #17
  18. _80262

    _80262 Member

    Joined:
    May 27, 2020
    Messages:
    2
    Likes Received:
    0
    Hello Anand S,

    Can you please help me to understand the purpose and difference between below two methods in Python
    - POP Method
    - POP ITEM Method

    Thanks
    Senthil
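
    A minimal sketch of the two methods asked about above, assuming the question is about dictionaries (the variable names and values are made up). pop removes a specific key and returns its value; popitem takes no key and removes the most recently inserted pair. Lists have pop as well, but no popitem.

```python
scores = {"k1": 10, "k2": 20, "k3": 30}

# pop(key) removes the given key and returns its value.
removed_value = scores.pop("k2")

# pop(key, default) returns the default instead of raising KeyError
# when the key is absent.
missing = scores.pop("k9", None)

# popitem() takes no key; it removes and returns the last inserted
# (key, value) pair (guaranteed order since Python 3.7).
last_pair = scores.popitem()

# Lists also have pop(index); with no argument it removes the last element.
items = [1, 2, 3]
last_item = items.pop()

print(removed_value, missing, last_pair, scores, last_item, items)
```

    pop is the usual choice when the key is known; popitem is handy for consuming a dictionary pair by pair until it is empty.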
     
    #18
  19. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Hi All,
    Fyi. The notebooks used in the class have been saved in google drive. kindly check and confirm
     
    #19
  20. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    break, pass, continue, file handling and exception handling (the finally part):
    you were working on the same notebook on 20th June before you jumped into functions.
    Please share that notebook as well.
     
    #20
    Last edited: Jun 22, 2020
  21. Anoop Tiwari_1

    Joined:
    Feb 15, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Anand, in the dictionary below I am not able to delete the key 'k4'. Is there any specific reason for that, given that we have been taught that dictionaries are mutable?
    mydict={'k1':[1,2,3],'k2':199,'k3':{'k1':19990,'k2':'a','k3':[1,2,3]},'k4':'i am string','k1':19289,'k1':1010101}

    The command I used is del mydict['k4']. I am able to delete all the keys except 'k4'.
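
    The dictionary above can be checked with a short sketch. It shows that del mydict['k4'] does work on a fresh copy of the dictionary. One possible cause of the problem (an assumption, since the error message is not shown in the post) is that the delete cell was run a second time, after 'k4' had already been removed, which raises KeyError.

```python
mydict = {'k1': [1, 2, 3], 'k2': 199,
          'k3': {'k1': 19990, 'k2': 'a', 'k3': [1, 2, 3]},
          'k4': 'i am string', 'k1': 19289, 'k1': 1010101}

# Repeated keys in a literal collapse into one entry; the last value wins.
keys_before = list(mydict)

del mydict['k4']          # works: dictionaries are mutable
keys_after = list(mydict)

# Deleting the same key again raises KeyError, because it is already gone.
try:
    del mydict['k4']
    second_delete_failed = False
except KeyError:
    second_delete_failed = True

print(keys_before, keys_after, second_delete_failed)
```

    Re-running the cell that defines mydict before deleting restores the key, so the delete succeeds again.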
     
    #21
  22. Anoop Tiwari_1

    Joined:
    Feb 15, 2020
    Messages:
    6
    Likes Received:
    0

    Hi Anand, I have checked my keyboard shortcuts and it appears that it is set to Ctrl + Del only, but I am still not able to delete the entire row. Attached is a snapshot for your reference. Please let me know if I have to enable any feature on my computer.
     

    Attached Files:

    #22
  23. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Anoop,
    press Esc Key
    then press ctrl + d
    for deleting a row.

    what you are trying is ctrl + del , what i am saying is ctrl + d


    please check this link for setting keyboard shortcuts in jupyter

    https://jupyter-notebook.readthedoc...o the ``Help,several shortcuts attached to it.
     
    #23
  24. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Shared. Please check the classwork folder in Google Drive.

    Also, as I keep mentioning, every example you see in the classwork is also in the specific notebooks. Please refer to those as well.
     
    #24
  25. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
     
    #25
  26. Priyanka TC_1

    Priyanka TC_1 Customer
    Customer

    Joined:
    Mar 19, 2020
    Messages:
    1
    Likes Received:
    0
    Hi Anand,

    I registered for the class running from June 13th to July 18th, but missed the last two weekends due to unavoidable circumstances. I have, however, gone through the recordings of all 4 classes. I would like to understand the procedure for obtaining the certificate for this course, as it says "Complete 1 class ( Attend 80% or more sessions of 1 class )". Am I still eligible for the certificate even though my class attendance is below the required percentage? Please confirm.

    Regards,
    Priyanka
     
    #26
  27. _79419

    _79419 Member

    Joined:
    May 19, 2020
    Messages:
    6
    Likes Received:
    0
    Hello Anand,

    As per my understanding, 'with open' keeps the file open until we close it with file.close(). If so, in the attached screenshot, why does code block 1 (highlighted in red) raise an error, while block 2, where I split the code, works fine? Can you help me understand the reason?

    Regards,
    SathishKumar, Sakthivel
     

    Attached Files:

    #27
  28. _79961

    _79961 New Member

    Joined:
    May 25, 2020
    Messages:
    1
    Likes Received:
    0
    Short cut key to delete a cell ---> Esc , D, and D again
     
    #28
  29. Tarun Khanna

    Tarun Khanna New Member

    Joined:
    Apr 14, 2020
    Messages:
    1
    Likes Received:
    0
    Hi i want to join this batch. How can I join it?
     
    #29
  30. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25


    Hi,
    As clearly mentioned during the class, the default mode for opening any file is read mode. So, in the first block, the file is opened in read mode and you are trying to write to it. That is why you get an error, which very clearly states "cannot write".

    In the second block, since I am only reading from the file, it works.

    Please go over the recordings alongside the notebooks and you will understand.
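
    The point about file modes can be reproduced with a short sketch. The file name and temporary folder are assumptions made for illustration. It also shows that a with block closes the file automatically when the block ends, so no explicit close() call is needed.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Mode "w" creates the file and allows writing.
with open(path, "w") as f:
    f.write("first line\n")

# No mode given means "r" (read-only), so write() fails.
try:
    with open(path) as f:
        f.write("second line\n")
    write_error = None
except io.UnsupportedOperation as exc:
    write_error = str(exc)

# Mode "a" appends to the existing content.
with open(path, "a") as f:
    f.write("second line\n")

with open(path) as f:
    content = f.read()

# After the with block ends, the file object is already closed.
file_closed = f.closed

print(write_error, repr(content), file_closed)
```

    Choosing "w", "a" or "r+" explicitly is the usual way to make the intent clear whenever a block needs to write.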
     
    #30
  31. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25

    Priyanka,
    Thank you for writing back. Yes, for the certificate you need to attend 80% of the weekend classes. Since you have attended less than this, you may need to check with the Simplilearn operations team on what can be done. You could call the Simplilearn helpline and explain your problem, so that they can provide some other option.
     
    #31
  32. _79494

    _79494 Member

    Joined:
    May 20, 2020
    Messages:
    6
    Likes Received:
    1
    approach_for_sizeofApp.png approach1.png approach2.png
    Hello Anand,

    Hope you are doing great!
    I have some doubts regarding the project.
    For app rate predictor, there are no clear instructions as to what approach to follow for apps with size 'Varies with Device'
    So i tried with two approaches
    1-> Replaced all records of 'Size' with 'Varies with Device' to 'NaN' and dropped them from further analysis.
    2-> Replaced all record of 'Size' with 'Varies with Device' to 'MEAN OF THE SIZE RECORD'
    Which would be the best approach?
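
    The two approaches described above can be sketched on a handful of made-up sizes (the numbers are assumptions, not the project data). None stands in for 'Varies with Device'. A median-based variation is included for comparison, since the median is less affected by a few very large apps.

```python
import statistics

# Made-up sizes in kB; None marks 'Varies with Device'.
sizes = [19000.0, 2500.0, None, 201.0, None, 8700.0]

known = [s for s in sizes if s is not None]

# Approach 1: drop the records with a missing size.
dropped = known

# Approach 2: replace the missing sizes with the mean of the known sizes.
mean_size = statistics.mean(known)
mean_filled = [mean_size if s is None else s for s in sizes]

# Variation: fill with the median instead of the mean.
median_size = statistics.median(known)
median_filled = [median_size if s is None else s for s in sizes]

print(len(dropped), mean_size, median_size)
```

    With pandas, the same ideas correspond to DataFrame.dropna and Series.fillna. Which one works better depends on how many rows are affected and on how skewed the size values are, so comparing the model scores for each option is a reasonable way to decide.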

    Also, my final predicted score seems to be pretty low ( for same number 'random_state' in each case for train_test_split of 70-30) with both approaches.

    Are there any other regression techniques, or other prediction techniques in general, that could improve my prediction score?

    Thanks in advance,
    Balachandar


     
    #32
    Last edited: Jul 3, 2020
  33. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
     
    #33
    Rahul_1078, _79494 and _76960 like this.
  34. _79494

    _79494 Member

    Joined:
    May 20, 2020
    Messages:
    6
    Likes Received:
    1
    Hello Anand,

    Thanks for your inputs.
    I tried to now use imputation approach 5 suggested by you, with median as my strategy. {I explored further into the links that you shared, they seem a lot interesting :) }
    This approach seems to reduce my final MSE for the linear regression model by at least 0.04 in each of my test and train data.

    However, my r2 score also has reduced. What does this mean? Hopefully this does not mean that my model is bad.
    Also, what is an acceptable r2 score for our problem description?
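
    A short sketch of how the two metrics relate may help here. On one fixed set of true values, R2 equals 1 minus MSE divided by the variance of the true values, so a lower MSE gives a higher R2. If both went down together, one possible explanation (an assumption, since the attached results are not visible) is that the rows being scored changed, for example because imputed rows are now kept instead of dropped. The numbers below are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - MSE / variance of the true values."""
    mean_y = sum(y_true) / len(y_true)
    variance = sum((t - mean_y) ** 2 for t in y_true) / len(y_true)
    return 1 - mse(y_true, y_pred) / variance


# Made-up ratings: set A is widely spread, set B is tightly clustered.
y_a, pred_a = [2.0, 3.0, 4.0, 5.0], [2.5, 3.0, 4.0, 4.5]
y_b, pred_b = [4.0, 4.2, 4.4, 4.6], [4.3, 4.3, 4.3, 4.3]

mse_a, r2_a = mse(y_a, pred_a), r2(y_a, pred_a)
mse_b, r2_b = mse(y_b, pred_b), r2(y_b, pred_b)

print(mse_a, r2_a, mse_b, r2_b)
```

    Set B has the smaller MSE but also the smaller R2, because its true values barely vary and the predictions explain none of that variation. A lower R2 on a different set of rows therefore does not by itself mean the model got worse.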

    I tried my luck with a few models such as SVM, Logistic Regression, etc.; here is how the results look:


    Another question: how do we select the model(s) based on the problem description and the data set analysis?

    Thanks,
    Balachandar
     

    Attached Files:

    #34
    _77025 likes this.
  35. _77025

    _77025 Member
    Alumni

    Joined:
    Apr 27, 2020
    Messages:
    3
    Likes Received:
    0
    Hello Anand,

    Good Morning. Hope you are well

    I tried to solve the Practice Problem (Bike-Sharing Demand Analysis). I have few doubts in that. Please help me to understand below points:

    1. What if the number of casual + registered bikes was not equal to the total count? How do we find which rows have junk values?

    2. The distribution of windspeed was not normally distributed (no proper bell curve). Why did we not remove the outliers, as they might cause incorrect predictions?


    3. For bivariate analysis, we see that the box plot of cnt vs. hour has outliers. Why did we not remove them?


    Similarly, we had outliers for cnt vs. mnth and cnt vs. season.


    4. As part of data preprocessing, why did we not club the values for weekday, as they showed very similar data?

    Please find the attached document for your reference

    Regards,
    Arun Mathew
     

    Attached Files:

    #35
  36. _79419

    _79419 Member

    Joined:
    May 19, 2020
    Messages:
    6
    Likes Received:
    0
    Hello Anand,

    It seems the classwork we worked on last Sunday (5th Jul) has still not been uploaded to Google Drive. Can you please share it and acknowledge?
     
    #36
  37. Pinak Das

    Pinak Das Member

    Joined:
    Oct 15, 2019
    Messages:
    5
    Likes Received:
    0
    Hi Anand ,

    Hope you are doing fine .
    I have few queries regarding Playstore Project

    1. The columns "Reviews" and "Price" are of dtype 'object'. I'm trying to change them to 'int' using astype(int), but I get the error "ValueError: invalid literal for int() with base 10: '3.0M'".
    Please see my syntax - dfplay['Reviews'] = dfplay['Reviews'].astype(int)

    Can you help me see where I'm going wrong?

    2. The project states "Drop records with nulls in any of the columns". As discussed in class, dropping values would lead to data loss, so should I drop the null values here or do the imputation?


    Regards
    Pinak
     
    #37
    Last edited: Jul 7, 2020 at 2:16 AM
  38. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Hi Anand,
    please share the class notebook for 5th july.
     
    #38
  39. _79419

    _79419 Member

    Joined:
    May 19, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Raghav,
    Can you please help in getting the Classwork file of 5th Jul. Seems its not uploaded to G.Drive yet.
     
    #39
  40. _77025

    _77025 Member
    Alumni

    Joined:
    Apr 27, 2020
    Messages:
    3
    Likes Received:
    0
    Hello Anand,


    Do we have any inbuilt method for shifting the values of the row by one cell.


    In the Project we have one row where data has to be shifted. Please find the attached screenshot for the same
     

    Attached Files:

    #40
  41. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    This is a case where the column values are incorrectly positioned and hence need to be shifted, not the rows.


    Please remember that the entire data frame is made of row and column indexes. So there is no specific built-in method to move one column's values to another, because the movement can be done easily using an index and a function.

    That said, in this case you cannot directly apply a function or method, because the data in the first column needs to be split. So we need to treat this row separately.


    My suggestion is to remove this row from the data frame and proceed with the others. When you have completed the others, add this row back into the data frame. Don't waste your time on this one row alone.
     
    #41
  42. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25


    It's placed in Google Drive under the folder with all classworks. Kindly check the last 2. The names may not be exactly July 4th and 5th. You will see 2 notebooks, one each for Saturday and Sunday.
     
    #42
  43. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25

    Done
     
    #43
  44. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25



    Pinak,
    As we discussed in the class, the int function will only convert a string to an integer if the contents of that string are digits.

    If you are trying to convert the string "3.0M" to an integer, note that "M" is a letter, and you first need to remove it before converting.

    Please make sure that you go over the concepts of strings and integers before jumping into the project.


    What you need to do is the following:

    1. Use string processing methods such as split or replace to remove the "M" from "3.0M", and then convert the remaining number separately. Note that "3.0" still contains a decimal point, so convert it with float before int.
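
    The string processing step could look like the sketch below. The helper name to_int, the sample values and the reading of "k" and "M" as thousands and millions are all assumptions for illustration.

```python
def to_int(text):
    """Convert strings such as '3.0M', '12k' or '1,234' to an integer."""
    multipliers = {"k": 1_000, "M": 1_000_000}  # assumed meaning of the suffixes
    cleaned = text.strip().replace(",", "")
    factor = 1
    if cleaned and cleaned[-1] in multipliers:
        factor = multipliers[cleaned[-1]]
        cleaned = cleaned[:-1]
    # float() handles the decimal point; int() then drops the fraction.
    return int(float(cleaned) * factor)


reviews = ["159", "3.0M", "87,510"]   # made-up sample values
converted = [to_int(r) for r in reviews]

print(converted)
```

    On a pandas column, a helper like this can be applied with Series.apply. Alternatively, pd.to_numeric with errors="coerce" turns unparseable entries into NaN so they can be inspected separately.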
     
    #44
  45. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    2. The project states "Drop records with nulls in any of the columns". As discussed in class, dropping values would lead to data loss, so should I drop the null values here or do the imputation?


    Yes, you are correct. So, if you do not want to drop the records directly, then start working on imputing the missing values.
     
    #45
  46. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    190
    Likes Received:
    25
    Please see my answers inline

    I tried to solve the Practice Problem (Bike-Sharing Demand Analysis). I have few doubts in that. Please help me to understand below points:

    Anand : - Glad that you are working on this problem. kindly also work on the project.


    1. What if the number of casual + registered bikes was not equal to the total count? How do we find which rows have junk values?

    Anand : - There could be such a possibility. But, as we discussed in the class, there is no direct way to find junk values. Please follow the steps we worked through in the class:
    - Identify missing values.
    - To find out whether there are junk values, check whether the values in the columns casual and registered pass isnumeric(). This will ascertain whether the values are numbers. Any non-numeric values can be extracted and processed separately.
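
    The checks above can be sketched on a few made-up rows, held as text the way csv.DictReader would return them. The column names casual, registered and cnt follow the thread; the values are assumptions.

```python
# Made-up rows as they might look when read as text from a CSV file.
rows = [
    {"casual": "3", "registered": "13", "cnt": "16"},
    {"casual": "8", "registered": "32", "cnt": "41"},   # does not add up
    {"casual": "x", "registered": "27", "cnt": "27"},   # non-numeric entry
]

non_numeric = []   # positions of rows with a non-numeric field
mismatched = []    # positions where casual + registered != cnt

for i, row in enumerate(rows):
    if not all(row[c].isnumeric() for c in ("casual", "registered", "cnt")):
        non_numeric.append(i)
    elif int(row["casual"]) + int(row["registered"]) != int(row["cnt"]):
        mismatched.append(i)

print(non_numeric, mismatched)
```

    Note that str.isnumeric() returns False for negative numbers and decimals, so it suits count columns like these. In pandas the same test is available as Series.str.isnumeric() on a column of strings.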


    2. The distribution of windspeed was not normally distributed (no proper bell curve). Why did we not remove the outliers, as they might cause incorrect predictions?
    Anand : - If you insist on working only on normally distributed data, then there is no need for data science. Data in this world is generally skewed. Non-normal data is not bad data. It only makes you aware that your analysis may not always be 100% accurate. Don't worry about why you were not given normal data. Worry about what you are going to do with the data given to you.


    3. For bivariate analysis, we see that the box plot of cnt vs. hour has outliers. Why did we not remove them?
    Anand : - We clearly discussed in the class that outliers are actually good data, and you cannot blindly remove them.
    But you do need to treat outliers separately. If someone has not done it, it could be because of the business need. The bike-sharing example was given to make you comfortable with the process of analysis. You can do your best to improve it further.



    Similarly, we had outliers for cnt vs mnth and cnt/season
    Anand - Asked and Answered


    4. As part of data preprocessing, why did we not club the values for weekday, as they showed very similar data?
    Anand - As I said, Simplilearn might have given this example for you to understand what data analytics is. So kindly improve it, if you feel you can do so.



    Please find the attached document for your reference

    Regards,
    Arun Mathew
     
    #46
  47. _79419

    _79419 Member

    Joined:
    May 19, 2020
    Messages:
    6
    Likes Received:
    0
    Dear Anand,

    Yes, we are able to see the 2 files. But the issue is that the Jul 4 notebook is updated only up to Saturday's session, and the topics we discussed on Sunday (Groupby, Boxplot and so on) are missing, i.e. the latest version of the file has still not been uploaded to Google Drive.

    Request you to kindly verify and do the needful. Thank you in advance.
     
    #47
  48. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    Hi Anand,
    5th july part of pandas class work is not there.
    you have worked on the 4th july class notebook.
    kindly check.
     
    #48
  49. _79419

    _79419 Member

    Joined:
    May 19, 2020
    Messages:
    6
    Likes Received:
    0
    Dear Anand,
    In the project requirement 4.1

    4. Variables seem to have incorrect type and inconsistent formatting. You need to fix them:

    4.1 Size column has sizes in Kb as well as Mb. To analyze, you’ll need to convert these to numeric.

    a. Extract the numeric value from the column

    b. Multiply the value by 1,000, if size is mentioned in Mb
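
    Steps a and b above could be sketched as follows. The helper name size_to_kb and the sample values are assumptions; returning None for entries without a number is one possible choice, not something the project text prescribes.

```python
def size_to_kb(size):
    """Return the size in kilobytes, or None when no number can be extracted."""
    text = str(size).strip()
    if text.endswith("M"):
        return float(text[:-1]) * 1000   # step b: multiply Mb values by 1,000
    if text.endswith("k"):
        return float(text[:-1])          # already in Kb
    return None                          # e.g. 'Varies with device'


sizes = ["19M", "2.5M", "201k", "Varies with device"]  # made-up sample values
sizes_kb = [size_to_kb(s) for s in sizes]

print(sizes_kb)
```

    Keeping the unparseable entries as missing values leaves the later decision open: they can be dropped or imputed once the column is numeric.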

    >> In the source file, we have 3 types of values in the Size column (xxM, xxk, Varies with device). When I treat the MB values and convert them to numeric, what should I do with the records whose value is 'Varies with device'? After treating and converting to numeric, we will have blank values for these. Should I exclude them from further processing, or keep the blank values as they are?
     
    #49
  50. Aparisim Saha

    Aparisim Saha Active Member

    Joined:
    Mar 9, 2020
    Messages:
    19
    Likes Received:
    1
    I am using the syntax below to split the Size column so that I can convert it to numeric.

    playstore['Size'].split(sep='M')

    getting the below error.
    AttributeError: 'numpy.int64' object has no attribute 'split'

    A few of the records in the Size column have string values such as 'Varies with device'.
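
    The error message suggests that the value being split is a number rather than a string, since split is a method of str only. The sketch below shows a guard for that; the helper name split_size and the sample values are made up, and the cause in the actual notebook is an assumption.

```python
def split_size(value):
    """Split on 'M' only when the value is a string."""
    if isinstance(value, str):
        return value.split(sep="M")
    return [value]   # numbers (int, numpy.int64, ...) have no split method


mixed = ["19M", "Varies with device", 1000]   # made-up mix of strings and a number
parts = [split_size(v) for v in mixed]

print(parts)
```

    On a whole pandas column of strings, the string methods are reached through the .str accessor, for example Series.str.split or Series.str.replace, rather than by calling split on the Series itself.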
     
    #50
