Programming Basics and Data Analytics with Python | Anand

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Apr 25, 2020.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Staff Member Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    340
    Likes Received:
    126
    #1
  2. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    #2
  3. Fatema ARSIWALA

    Joined:
    Jan 16, 2020
    Messages:
    2
    Likes Received:
    0
    While doing the project in the class lab, I am not able to load my project data into my notebook. I imported pandas and then ran: df = pd.read_excel(r'c:/path/file.xlsx')
     
    #3
  4. Yogendra Meena

    Yogendra Meena New Member

    Joined:
    Mar 10, 2020
    Messages:
    1
    Likes Received:
    0
    I have a data set with 7 lakh rows. It has a unique ID column that may contain duplicate entries. I would like to split it into two data sets: one with the duplicates and one with the clean data. Is this possible in Python?
     
    #4
  5. MONALISA DASH

    MONALISA DASH Member

    Joined:
    Nov 11, 2019
    Messages:
    3
    Likes Received:
    0
    Good morning Sir,
    Thank you for the files shared on the Google Drive. I could find only the classwork of 25th April. Kindly upload the classwork of 26th April.
    Monalisa Dash.
     
    #5
  6. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    Thank you for the heads-up. I have added the classwork for Python basics. That notebook covers:
    - arithmetic
    - variables
    - data types (lists, strings, tuples, dictionaries and sets)
    - the for loop and control statements (if, else)
     
    #6
  7. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    Hi All,
    Nishant, who is our TA, confirmed to me that all of you have access to the google drive.
    I have updated the drive with the necessary notebooks. Kindly practice during the week.
     
    #7
  8. Kalindi Dharamsey

    Joined:
    Nov 25, 2019
    Messages:
    14
    Likes Received:
    0
    #8
  9. Kalindi Dharamsey

    Joined:
    Nov 25, 2019
    Messages:
    14
    Likes Received:
    0
    Hello Sir,
    Only the first file is on the Google Drive. Could you please check whether you have posted the Jupyter notebooks for all the classes? Thanks, Kalindi
     
    #9
  10. Kalindi Dharamsey

    Joined:
    Nov 25, 2019
    Messages:
    14
    Likes Received:
    0
    The notebook files for the May classes cannot be seen. -Kalindi
     
    #10
  11. varnika vasisth

    Joined:
    Oct 12, 2019
    Messages:
    2
    Likes Received:
    0
    I am trying to print the values that are less than 50 and even, and I want to use the 'and' operator for it. When I run this statement, it neither gives any output nor raises any error. Is there anything that I am doing wrong?

    var=0
    while var<50 & var%2==0:
        print("var is even and less than 50",var)
        var=var+1
     
    #11
  12. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    The code works perfectly for me in my local Jupyter.

    Are you sure you are using the right path?
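
    One way to rule out a path problem before pandas is involved is to check that the file exists first. This is only a sketch: load_excel is a hypothetical helper name, and the path you pass in is whatever your own file location is.

```python
from pathlib import Path

import pandas as pd


def load_excel(path_str):
    """Load an Excel file, failing early with a clear message if the path is wrong."""
    path = Path(path_str)
    if not path.is_file():
        # Show the fully resolved path so a wrong working directory is easy to spot
        raise FileNotFoundError(f"No file at {path.resolve()}; check the folder and file name")
    # Reading .xlsx files also needs an Excel engine such as openpyxl installed
    return pd.read_excel(path)
```

    If the helper raises FileNotFoundError, the problem is the path rather than pandas.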
     
    #12
  13. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
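    Yes, this is doable.

    You can use pandas to find the duplicate unique IDs with duplicated() (older pandas versions also had Index.get_duplicates(), which has been removed from recent releases), and then split the data set into different files.

    If you want to do it in plain Python, you will need logic along these lines:

    - read the first record
    - save its unique ID in a temporary variable
    - read the next record
    - compare whether the unique ID in the first record is the same as the unique ID in the second

    You need to repeat this for every record. Comparing neighbouring records only finds all the duplicates if the data is sorted by the unique ID first.

    The simpler option is to use pandas.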
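
    A minimal pandas sketch of the split, assuming the ID column is called "unique_id" (a made-up name, as is the sample data):

```python
import pandas as pd

# Hypothetical data; "unique_id" is an assumed column name
df = pd.DataFrame({
    "unique_id": [101, 102, 102, 103, 104, 104, 104],
    "value": ["a", "b", "c", "d", "e", "f", "g"],
})

# keep=False marks every row whose ID occurs more than once
is_dup = df.duplicated(subset="unique_id", keep=False)

dups = df[is_dup]    # all rows that share an ID with another row
clean = df[~is_dup]  # rows whose ID appears exactly once

# Alternative "clean" set: keep the first row of each ID instead
deduped = df.drop_duplicates(subset="unique_id", keep="first")
```

    Each of the resulting frames can then be written out separately, for example with to_csv.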
     
    #13
  14. Gaurav Kilania

    Joined:
    Feb 29, 2020
    Messages:
    4
    Likes Received:
    0
    Hi Anand sir,

    You didn't upload the jupyter workbook which we did this Sunday.
    Topic- File handling
     
    #14
  15. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    Please check the Google Drive one more time. All the necessary Jupyter notebooks were added to the respective folders early this week.
     
    #15
  16. anand.s.subramaniam

    anand.s.subramaniam Well-Known Member
    Alumni

    Joined:
    Mar 28, 2018
    Messages:
    201
    Likes Received:
    25
    A while loop stops as soon as its condition is false, so combining conflicting conditions often ends it early. Here you increment var and test whether it is even in the same condition, so with 'and' the loop stops after the first iteration (var becomes 1, which is odd). Also note that & is the bitwise operator, not the logical 'and'. It binds more tightly than < and ==, so your condition is already false for var = 0 and nothing is printed.

    The best way to code this is as below

    var=0
    while var<50:
        if var%2 == 0:
            print('var is even',var)
        else:
            print('var is odd',var)
        var = var + 1
    else:
        print('var is no longer less than 50',var)



    There are ways to overcomplicate the while statement, as you have done, and achieve the same result. But I prefer whatever is easier and more logical.
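
    To see the difference between & and 'and' for yourself, here is a small sketch. evens_below is a hypothetical helper name, not something from the class notebooks.

```python
def evens_below(limit):
    """Collect the even numbers below limit, filtering inside the loop."""
    found = []
    var = 0
    while var < limit:
        if var % 2 == 0:
            found.append(var)
        var += 1
    return found


var = 0
# '&' binds tighter than '<' and '==', so this is parsed as
# var < (50 & (var % 2)) == 0
with_ampersand = var < 50 & var % 2 == 0
with_and = var < 50 and var % 2 == 0

print(with_ampersand)   # False
print(with_and)         # True
print(evens_below(10))  # [0, 2, 4, 6, 8]
```

    Keeping only the bound in the while condition and moving the even check into an if inside the loop avoids both problems.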
     
    #16
  17. Sahil Singla

    Sahil Singla New Member

    Joined:
    Apr 20, 2020
    Messages:
    1
    Likes Received:
    1
    Hi Anand,

    I am facing a problem in my Python code.
    Please see the attached screenshot.
     

    Attached Files:

    #17
    ssbabu71 likes this.
  18. Arvindh Gunasekar

    Arvindh Gunasekar New Member

    Joined:
    Oct 11, 2019
    Messages:
    1
    Likes Received:
    0
    Can someone guide me on how to find the PPTs used in the class trainings? (Peers told me they are in the LMS.)

    Thanks
    Arvindh G
     
    #18
  19. Gaurav Kilania

    Joined:
    Feb 29, 2020
    Messages:
    4
    Likes Received:
    0
    Anand Sir, can you please upload this Sunday's codes?
     
    #19
  20. Sonali Dhamija

    Sonali Dhamija New Member

    Joined:
    Apr 6, 2020
    Messages:
    1
    Likes Received:
    0
    Built-in function for strings: isalnum
    I tried the following code but am receiving an error. Please help me find what is wrong with the code I am typing.

    str1 = "123456"
    str2 = "abcde"
    str3 = "abc123"
    str4 = "/*%$"
    str5 = "ABcde"
    str6 = "129.36"

    str1.isalnum()

    ---------------------------------------------------------------------------
    NameError Traceback (most recent call last)
    <ipython-input-1-6842788f0ec1> in <module>
    ----> 1 str1.isalnum()

    NameError: name 'str1' is not defined
     
    #20
  21. varnika vasisth

    Joined:
    Oct 12, 2019
    Messages:
    2
    Likes Received:
    0
    I don't know why, but I am still unable to access the file handling notes and this week's notes on the Google Drive link.
     
    #21
  22. Davis Nelaturi

    Joined:
    Feb 25, 2020
    Messages:
    2
    Likes Received:
    0
    The recordings of Mr. Anand's sessions 5 and 6 (10th May and 11th May) are still not available to download.
     
    #22
  23. Joanna Quintero

    Joined:
    Dec 2, 2019
    Messages:
    9
    Likes Received:
    2
    Hi Anand,
    Is this the correct forum for the Python Live Classes April-May 2020?
    I can only find old messages from 2018-2019.... The latest are from February 2020, so I'm afraid that I am in the wrong forum.
    Please let me know before I start posting my questions here.
    Thank you so much
     
    #23
  24. ssbabu71

    ssbabu71 Member
    Alumni

    Joined:
    Jan 20, 2020
    Messages:
    3
    Likes Received:
    0
    Put the numbers inside the brackets and they will be recorded. Check immediately after creating a string, for example with print(str1).
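
    For the NameError in the isalnum question, a likely cause (an assumption, since only part of the notebook is shown) is that the cell with the assignments was not run in the current kernel session. A minimal sketch with the assignments and the calls in the same cell, using some of the strings from that post:

```python
str1 = "123456"
str3 = "abc123"
str4 = "/*%$"
str6 = "129.36"

# Printing right after the assignment confirms the name exists in this session
print(str1)

# isalnum() is True only if every character is a letter or a digit
results = {s: s.isalnum() for s in (str1, str3, str4, str6)}
print(results)
```

    isalnum is a built-in string method, so no extra package needs to be installed for it.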
     
    #24
  25. ssbabu71

    ssbabu71 Member
    Alumni

    Joined:
    Jan 20, 2020
    Messages:
    3
    Likes Received:
    0
    I too faced the same problem, and some other functions after this are also not working. We don't know whether we need to install any packages, as in R.
     
    #25
  26. Harshwardhan Gawande

    Joined:
    May 17, 2020
    Messages:
    2
    Likes Received:
    0
    Hello Anand,
    Please help me understand how to apply a log transformation to a dataframe (as specified in the project). Why do we use a log transformation when we already have the values? What is its practical use?

    Regards,
    Harsh
     
    #26
  27. Sanjeev Kaul

    Sanjeev Kaul Member

    Joined:
    Apr 18, 2020
    Messages:
    3
    Likes Received:
    0
    I am unable to run Python and Jupyter.
     
    #27
  28. Harshwardhan Gawande

    Joined:
    May 17, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Anand,

    I need help with the project, specifically question
    11. Model building
    • Use linear regression as the technique
    When I did dummy encoding, it created almost 194 columns. Which regression should I use here, simple linear or multiple linear regression, and which columns should I consider for the regression?
     
    #28
  29. Soumi Roy

    Soumi Roy Member

    Joined:
    Feb 24, 2020
    Messages:
    3
    Likes Received:
    0
    Hi Anand
    Did you upload yesterday's (17th May) notebook to your Google Drive? Could you please do it ASAP, as we need to start the assignment?

    Thanks
    Soumi
     
    #29
  30. Shishir Dwarkanath

    Joined:
    Dec 9, 2019
    Messages:
    5
    Likes Received:
    1
    Hi Anand,
    For this
    Size column has sizes in Kb as well as Mb. To analyze, you’ll need to convert these to numeric.

    1. Extract the numeric value from the column

    2. Multiply the value by 1,000, if size is mentioned in Mb
    I tried to replace M with 10**3 and cast to int, but got the message (invalid literal for int() with base 10: '19M').

    Next I tried to replace M with 1000. That worked, but I could not convert the column from the object type. Now I am stuck with values such as 2.6*1000.

    Could you please help.

    Thanks
    Shishir
     
    #30
    Sakthi Vijaya Kumar likes this.
  31. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    When I tried to create the price boxplot, I ended up with a line, showing that basically all the data is considered an outlier. By the time I reach this step, the majority of the values appear to be 0, and even my quartiles show that. I don't know whether removing all the outliers would be the right approach, or whether this was the expected result.
    upload_2020-5-18_10-16-1.png


    -- just wanted to update this to say that I believe this is probably a correct representation of the box plot based on the data after further review.
     
    #31
    Last edited: May 21, 2020
  32. Firoz Syed

    Firoz Syed Member

    Joined:
    Mar 9, 2020
    Messages:
    14
    Likes Received:
    0
    Hi Anand ,

    We are still waiting for the Jupyter notebooks for yesterday's class (pandas, plots...). Please share them at your earliest convenience, as we need to start the project assessment.
     
    #32
  33. Shishir Dwarkanath

    Joined:
    Dec 9, 2019
    Messages:
    5
    Likes Received:
    1
    Hi Anand,

    I was able to convert 'Reviews', 'Installs' and 'Price' to int. However, when I call the info function, they still show up as object.
    Could you please help.

    Thanks
    Shishir
     

    Attached Files:

    #33
  34. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    Maybe you have already figured out the answer, but since nothing was posted I figured I'd share with you and @ssbabu71.

    Python uses indentation to identify a block of code that operates under the same condition. The indentation should happen automatically when you delete and retype this function, but it seems that in your for loop there is no indent, which causes that error. Hitting Tab once on all the lines below your for loop will put them at the right indentation level.

    In other words, this is properly indented:
    upload_2020-5-18_12-59-28.png
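
    In case the screenshot does not display, here is the same idea as text. The data and variable names are made up for illustration.

```python
numbers = [1, 2, 3]  # hypothetical data
total = 0

for n in numbers:
    # Both lines below are indented one level, so they belong to the loop body
    total += n
    print("running total:", total)

# Back at the left margin: this runs once, after the loop finishes
print("final total:", total)
```

    Removing the indent from the loop body is what produces an IndentationError.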
     
    #34
  35. Sakthi Vijaya Kumar

    Sakthi Vijaya Kumar Active Member

    Joined:
    Apr 27, 2020
    Messages:
    26
    Likes Received:
    6
    Hi Anand,
    I have a doubt in the Python project. In the googleplaystore data, the 'Size' column has sizes in MB and KB, and also the value 'Varies with device' (do we consider this as null?). Do we need to drop those rows, or how can we assign a size to them? There are 1695 entries of this type. Can you please suggest how to clear this step?
     
    #35
  36. Siva Shankar Biswal

    Joined:
    Mar 23, 2020
    Messages:
    3
    Likes Received:
    0
    Hi,

    I am stuck at step 4 (parts 1 and 2) of the project. Could you please help with it?
     
    #36
  37. Sakthi Vijaya Kumar

    Sakthi Vijaya Kumar Active Member

    Joined:
    Apr 27, 2020
    Messages:
    26
    Likes Received:
    6
    Hi Shishir,
    Have you tried splitting the column and multiplying by 1000?
     
    #37
  38. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    I have the same question as you; previously I converted them to null. However, if you see my question about the price, I believe it could be causing me some problems with that column. About 76 of these rows have a price but no size, so removing them may be problematic.
    The only issue is, I am not sure yet how I would fix the rows with this size. I will group by device and a couple of other fields to see if it makes sense to replace the values, while waiting to hear back on this as well.

    Update: I marked my values as NaN in the case of "Varies with device" and decided to fill in the size values based on a groupby of Android Ver. The fillna function worked for setting them to the mean value. However, keeping those rows rather than dropping them didn't help my price column. I personally feel the dataset is stronger this way than after deleting so many NaN values, so I believe this to be the best option.
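
    A rough sketch of the groupby-then-fillna idea described in the update, on made-up sample data. It shows one way to do it, not necessarily the exact code used for the project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real project data has many more rows and columns
df = pd.DataFrame({
    "Android Ver": ["4.1 and up", "4.1 and up", "4.1 and up", "5.0 and up", "5.0 and up"],
    "Size": [19000.0, 21000.0, np.nan, 8000.0, np.nan],
})

# Mean Size within each Android Ver group, aligned back to the original rows
group_mean = df.groupby("Android Ver")["Size"].transform("mean")

# Only the missing sizes are replaced; existing values are untouched
df["Size"] = df["Size"].fillna(group_mean)
```

    Using transform keeps the result the same length as the dataframe, so it can be passed straight to fillna.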
     
    #38
    Last edited: May 20, 2020
  39. Sakthi Vijaya Kumar

    Sakthi Vijaya Kumar Active Member

    Joined:
    Apr 27, 2020
    Messages:
    26
    Likes Received:
    6
    Did you change the 'Size' column from M to K by multiplying by 1000? How did you achieve that?
     
    #39
  40. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    Yes I did, by using string functions to look at the last character of the string (x[-1], for example). Then you just need to check whether the last character is M and, if so, remove it, convert the rest to float and multiply the value by 1000. If it's k, you just need to remove the last character and convert to float.
    Using a function, or maybe even just lambda notation, you should be able to achieve it. I'd post my code here as an example, but I'm not sure that is allowed for the project.

    Edit: I don't recall in which class we covered these topics, but I do think you can find how to create the function on Google.
     
    #40
    Last edited: May 20, 2020
    Sakthi Vijaya Kumar likes this.
  41. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    I was curious about your question, but noticed you posted an empty zip file, so I couldn't see what you posted. Could you screenshot your code instead? One thing I will say about converting: all the information in that column has to be numeric (no special characters like commas or $, and no words), or you will run into issues with the conversion.
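
    As an illustration of that point, here is a sketch on made-up values in the style of the project data. It also assigns the converted columns back to the dataframe, which is one common reason info() keeps showing object (an assumption here, since the original code could not be seen).

```python
import pandas as pd

# Hypothetical sample values
df = pd.DataFrame({
    "Installs": ["10,000+", "500,000+", "1,000+"],
    "Price": ["0", "$4.99", "$0.99"],
})

# Strip the non-numeric characters first (regex=False treats them literally)
installs = df["Installs"].str.replace(",", "", regex=False).str.replace("+", "", regex=False)
price = df["Price"].str.replace("$", "", regex=False)

# The result must be assigned back; otherwise the column stays as object
df["Installs"] = installs.astype(int)
df["Price"] = price.astype(float)

print(df.dtypes)
```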
     
    #41
  42. Sheel Jayesh Patel

    Sheel Jayesh Patel New Member

    Joined:
    Jan 26, 2020
    Messages:
    1
    Likes Received:
    1
    A log transformation helps reduce the skewness in the data when the given values are not normally distributed.
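
    A small sketch of what that looks like in pandas. The column name and values are made up, and np.log1p is used here (rather than plain np.log) only so that zeros do not become -inf.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column; "Installs" is an assumed name
df = pd.DataFrame({"Installs": [0, 10, 100, 1_000, 1_000_000]})

# log1p computes log(1 + x), so zero values map to 0 instead of -inf
df["Installs_log"] = np.log1p(df["Installs"])

# Compare the skewness before and after the transformation
print(df["Installs"].skew(), df["Installs_log"].skew())
```

    The transformed column keeps the same ordering of values but pulls the very large ones in, which is why the skewness drops.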
     
    #42
    Sedric Hibler likes this.
  43. Joanna Quintero

    Joined:
    Dec 2, 2019
    Messages:
    9
    Likes Received:
    2
    Hi Anand,
    It seems that part 4.1 from the project is a common question with no answers.
    I would appreciate it if you could help us with converting the Size data from Mb to Kb, considering that we have the value "Varies with device"...
    Please help
    Thank you
     
    #43
  44. Joanna Quintero

    Joined:
    Dec 2, 2019
    Messages:
    9
    Likes Received:
    2
    Did you get help? I'm stuck in the same point
     
    #44
  45. Joanna Quintero

    Joined:
    Dec 2, 2019
    Messages:
    9
    Likes Received:
    2
    Were you able to do it?
     
    #45
  46. Sakthi Vijaya Kumar

    Sakthi Vijaya Kumar Active Member

    Joined:
    Apr 27, 2020
    Messages:
    26
    Likes Received:
    6
    No, I tried, but I couldn't, since its dtype is str.
    But we can achieve it with a for loop and an if condition: if x[-1] is M, remove it, change the dtype to float and multiply by 1000. If it is k, remove the k and leave the value as it is.
     
    #46
    Last edited: May 19, 2020
  47. jeeban Patro

    jeeban Patro Member

    Joined:
    Feb 16, 2016
    Messages:
    3
    Likes Received:
    0
    Hi Anand,
    Good morning!!

    I got an error while performing the simple operation below. Please help me with the reason.

    mydict = {"a": 102, "b": 222, "c": 322, "d": 422}
    data = list(mydict.items())
    print(data)


    The above code gives the error below.

    1 mydict = {"a": 102, "b": 222, "c": 322, "d": 422}
    2
    ----> 3 data = list(mydict.items())
    4
    5 print(data)

    TypeError: 'numpy.ndarray' object is not callable
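
    A possible cause, stated as an assumption because the earlier notebook cells are not shown: a variable named list was assigned a NumPy array earlier in the session, which hides the built-in list. The sketch below reproduces the message inside a hypothetical function (reproduce_error) so that the shadowing stays local.

```python
import numpy as np

mydict = {"a": 102, "b": 222, "c": 322, "d": 422}


def reproduce_error(mapping):
    """Shadow the built-in list inside this function only, then try to call it."""
    list = np.array([1, 2, 3])  # assumed earlier mistake: a variable named "list"
    try:
        return list(mapping.items())
    except TypeError as err:
        return str(err)


print(reproduce_error(mydict))

# Outside the function the built-in is untouched, so the original code works
data = list(mydict.items())
print(data)
```

    In a notebook, running del list or restarting the kernel removes such a variable and makes the built-in available again.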
     
    #47
  48. SAURABH MALVI

    SAURABH MALVI Member

    Joined:
    Feb 6, 2020
    Messages:
    3
    Likes Received:
    0
    Hi Sir,

    I am stuck at step 4 as mentioned below.
    1. Size column has sizes in Kb as well as Mb. To analyze, you’ll need to convert these to numeric.
      1. Extract the numeric value from the column
      2. Multiply the value by 1,000, if size is mentioned in Mb
    I wrote the code for the above step and got the output dtype as object as shown below.

    0 19000
    1 14000
    2 8700
    3 25000
    4 2800
    ...
    10834 2600
    10836 53000
    10837 3600
    10839 0
    10840 19000
    Name: Size, Length: 9360, dtype: object

    Now I want to convert the dtype to int. I used the following code and got an error:

    df.Size.astype(int)

    ValueError: invalid literal for int() with base 10: '19M'

    I am stuck here. Please help me resolve this error. I need to convert the column from object to integer.
     
    #48
  49. Sedric Hibler

    Sedric Hibler Member

    Joined:
    Mar 30, 2020
    Messages:
    14
    Likes Received:
    5
    and for @Joanna Quintero
    Yep, you could create a loop, but I accomplished this step by defining my own function, which takes a couple of steps.

    First, in examining the column, we see that some values end with k, some end with M, and some rows have "Varies with device". Therefore you need three conditions in an if statement to handle all situations and convert them all to float. General next steps:

    1. Take the string value from that column as an argument to the function.
    2. Read the last letter of the string using x[-1].
    3. Then the function just needs to return a value based on each condition. If it's k, return the string without the last character, x[:-1], converted to float. If it's M, remove the last character and return float(x[:-1]) * 1000. If it's "Varies with device", you can convert it to np.nan, 0, the mean or something else; it's a personal decision unless Anand gives advice on what to do.

    I didn't need a loop at all, but I believe you can probably do these steps with a loop too. Using a function just felt like the better approach.
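
    The steps above can be sketched roughly as follows. This is an illustration on made-up values, not the code submitted for the project; size_to_kb is a hypothetical name, and returning np.nan for "Varies with device" is just one of the choices mentioned.

```python
import numpy as np
import pandas as pd


def size_to_kb(x):
    """Convert a Size string such as '19M' or '201k' to a float in Kb."""
    if x == "Varies with device":
        return np.nan  # one possible choice; 0 or a mean are alternatives
    if x[-1] == "M":
        return float(x[:-1]) * 1000
    if x[-1] == "k":
        return float(x[:-1])
    return np.nan  # anything unexpected is treated as missing


# Hypothetical sample values in the style of the Size column
sizes = pd.Series(["19M", "2.6M", "201k", "Varies with device"])
size_kb = sizes.apply(size_to_kb)
print(size_kb)
```

    Because every branch returns a float or NaN, the resulting column is numeric and needs no further astype call.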

    Btw, when I get stuck on syntax and my class notes are not that great, I google it to find code examples. For example, my search "python remove character from string in dataframe" returned some ideas for this as well.
     
    #49
    Last edited: May 20, 2020
  50. Sakthi Vijaya Kumar

    Sakthi Vijaya Kumar Active Member

    Joined:
    Apr 27, 2020
    Messages:
    26
    Likes Received:
    6
    Yes, thanks for the idea. I achieved it with a loop. For 'Varies with device' I tried filling in the median, but the box plot for price still shows almost everything as outliers. Check my graph:

    upload_2020-5-20_17-56-5.png
     
    #50
    Sedric Hibler likes this.
