Data Science with Python | Ayushi | June 20 - July 25 (2020)

Discussion in 'Big Data and Analytics' started by Raghavendra B M, Jun 20, 2020.

  1. Raghavendra B M

    Raghavendra B M Active Member
    Simplilearn Support

    Joined:
    Jan 6, 2020
    Messages:
    45
    Likes Received:
    34
    #1
  2. Abdul basit yekeen

    Joined:
    May 21, 2020
    Messages:
    3
    Likes Received:
    0
    upload_2020-6-22_23-39-20.png
    This is Data Science with Python, Project 2.
    What do you mean by the instruction below?
    upload_2020-6-22_23-40-45.png
     


    #2
  3. Pritam Paul_3

    Pritam Paul_3 Member

    Joined:
    Jun 1, 2020
    Messages:
    7
    Likes Received:
    0
    Hello, please help with the code below; it throws an error. Screenshot (67).png
     
    #3
  4. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Abdul,

    After you combine the data,
    you need to take the genres column, split up the genres listed in each row,
    and create a list of the unique genres, something like:
    ['Musical', 'Animation', 'Action', 'Mystery', 'Comedy', 'Romance', 'Drama', 'Adventure', "Children's", 'Fantasy', 'Western', 'War', 'Thriller', 'Horror', 'Sci-Fi', 'Film-Noir', 'Documentary', 'Crime']
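A minimal sketch of that step, assuming the merged frame has a column named `Genres` whose values are separated by `|` (the MovieLens convention). The column name, separator and sample rows are assumptions; adjust them to your data.

```python
import pandas as pd

# Hypothetical merged data: each row lists several genres separated by '|'
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genres": ["Animation|Children's|Comedy", "Action|Crime|Thriller", "Comedy|Romance"],
})

# split() turns each string into a list, explode() gives one row per genre,
# unique() keeps the first occurrence of each genre in order
unique_genres = df["Genres"].str.split("|").explode().unique().tolist()
print(unique_genres)
```

With the full dataset the same chain should produce the 18-genre list shown above.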

     
    #4
  5. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Pritam,

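    The sklearn.cross_validation module has been replaced by sklearn.model_selection (it was deprecated in scikit-learn 0.18 and removed in 0.20). Please import from sklearn.model_selection instead.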
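A short sketch of the updated import with `train_test_split`; the arrays are toy data, not the course dataset.

```python
import numpy as np

# Old import, which fails on current scikit-learn:
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features
y = np.arange(10)

# Hold out 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```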
     
    #5
  6. Jitendra solanki_1

    Joined:
    Jun 9, 2020
    Messages:
    4
    Likes Received:
    0
    I am not able to open the 2nd class link. It's asking for credentials. Kindly help me out.
     
    #6
  7. Ashish Sinha_5

    Joined:
    Jun 3, 2020
    Messages:
    6
    Likes Received:
    0
    Hi All,

    PFB (please find below) the assignment solutions.
    @Aayushi, please let me know if the solution looks good to you.

    Thanks,
    Ashish
     


    #7
    Last edited: Jun 27, 2020
  8. Anshul Khairari

    Joined:
    Apr 8, 2020
    Messages:
    3
    Likes Received:
    0
    Hi Aayushi,

    Please review my assignment answers.
     


    #8
  9. _83274

    _83274 Member

    Joined:
    Jun 4, 2020
    Messages:
    2
    Likes Received:
    0
    I have a question from the class dated June 27, 2020.
    If we look at a sample of men's heights in a class and find that
    mean = 160 cm
    median = 163 cm
    mode = 165 cm
    standard deviation = 3 cm

    what does this standard deviation of 3 cm mean?
    What information does the number 3 give me here?
     
    #9
    Last edited: Jun 28, 2020
  10. _83274

    _83274 Member

    Joined:
    Jun 4, 2020
    Messages:
    2
    Likes Received:
    0
    Hello Aayushi,

    I have tried to optimize Anshul's responses.
    Can you please review this too and give your valuable feedback?

    Thank you.
     


    #10
  11. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0
    @Aayushi - Can you please post the answers for the assignment problems? Thanks
     
    #11
  12. _79478

    _79478 New Member

    Joined:
    May 19, 2020
    Messages:
    1
    Likes Received:
    0
    Hello Aayushi,
    PFA Python assignment's answers.
     


    #12
  13. _19902

    _19902 Member

    Joined:
    Jan 9, 2018
    Messages:
    2
    Likes Received:
    0
    Hi Aayushi, [I already wrote this message to Aayushi_6.]

    I am an engineer with more than 30 years of experience and a solid background in statistics, C++ and Python programming, and data architecture in the field of energy transport and distribution.

    I subscribed to the Data Science with Python course. I have followed 98% of the course, submitted all the exercises, and also completed project #3, 'Comcast Telecom Consumer Complaints'. One of the rules for getting the certificate is to attend at least 80% of the live classes, which is why I applied to your class ending July 25th.

    I kindly ask the following:

    1) For the 80% attendance requirement, may I make up the days I missed (June 20, 21 and 27) by watching the recorded lessons?

    2) As I said, I followed the self-learning content and have already solved one of the 4 projects. How does this affect the live-class attendance requirement and the final score for the certificate? For example, would it be better to solve another of the 4 projects at the end of the live class?


    I am happy to follow your live class; see you this [CEST] afternoon.


    Regards,

    Giuliano Basso




     
    #13
  14. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Please raise a ticket on Help & Support.
     
    #14
  15. Ravi Rajana

    Ravi Rajana Customer
    Customer

    Joined:
    Jun 1, 2020
    Messages:
    1
    Likes Received:
    0
    Hello Aayushi,
    Could you please have a look at the assignment I completed?

    -Ravi
     


    #15
  16. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Okay, Ravi.
     
    #16
  17. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Couldn't understand your point. Please elaborate.
     
    #17
  18. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0
    Hi Aayushi,
    While trying to filter on multiple conditions in pandas, I am getting the error shown in the attached screenshot.
    Can you please check it and let me know?
    Thanks. Multiple conditions in pandas.JPG
     
    #18
  19. _80474

    _80474 Member

    Joined:
    May 29, 2020
    Messages:
    7
    Likes Received:
    0
    Hi Ayushi,

    Can you explain the issue below: why is fillna not working for a particular row/index?

    upload_2020-7-3_18-40-58.png
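The screenshot is not reproduced here, so the exact cause can't be confirmed. A frequent reason fillna appears to "not work" is that it returns a new object instead of changing the frame in place, or that the fill goes through chained indexing such as `df.loc[1]["Age"] = ...`, which may modify a temporary copy. A minimal sketch with hypothetical columns `Age` and `City`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25.0, np.nan, 40.0], "City": ["A", None, "C"]})

# fillna() returns a new DataFrame; without assigning the result, df is unchanged
df.fillna(0)
print(df.isna().sum().sum())  # still 2 missing values

# Fill one cell (row label 1, column "Age") by assigning through a single .loc call
df.loc[1, "Age"] = df["Age"].mean()  # hypothetical choice: fill with the column mean

# Fill a whole column and assign the result back
df["City"] = df["City"].fillna("Unknown")
print(df)
```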
     
    #19
  20. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Reeta,

    Please use the & operator, with each condition wrapped in parentheses, to avoid the error above,
    like this:
    # df[(condition1) & (condition2)]
    df[(df['sepal length'] <= 5.0) & (df['sepal width'] <= 3.0)]
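A self-contained sketch of why `&` is needed, assuming the screenshot showed the common "truth value of a Series is ambiguous" error; the rows are toy values using the column names from the reply.

```python
import pandas as pd

df = pd.DataFrame({
    "sepal length": [4.9, 5.8, 4.7, 6.3],
    "sepal width": [3.0, 2.7, 3.2, 2.5],
})

# Python's `and` needs one True/False for a whole Series, which pandas refuses
try:
    df[(df["sepal length"] <= 5.0) and (df["sepal width"] <= 3.0)]
except ValueError as err:
    print("ValueError:", err)

# & compares element-wise; parentheses are required because & binds tighter than <=
both = df[(df["sepal length"] <= 5.0) & (df["sepal width"] <= 3.0)]

# | is the element-wise "or" (and ~ is the element-wise "not")
either = df[(df["sepal length"] <= 5.0) | (df["sepal width"] <= 3.0)]
print(both)
```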
     
    #20
  21. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Gokul,

    A low standard deviation indicates that the data points tend to be close to the mean; a high standard deviation indicates that they are spread out over a wide range of values.
    In this case the SD is very low: a typical height differs from the mean (160 cm) by only about 3 cm.
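To make "3 cm" concrete, a sketch that assumes the heights are roughly normally distributed (the question's mean < median < mode hints at a slight left skew, so treat these as approximations). Under that assumption about 68% of men fall within one SD of the mean and about 95% within two SDs:

```python
from statistics import NormalDist

mean, sd = 160, 3  # values from the question, in cm

heights = NormalDist(mu=mean, sigma=sd)

# Share of a normal distribution within 1 and 2 standard deviations of the mean
within_1sd = heights.cdf(mean + sd) - heights.cdf(mean - sd)          # 157 to 163 cm
within_2sd = heights.cdf(mean + 2 * sd) - heights.cdf(mean - 2 * sd)  # 154 to 166 cm
print(f"{within_1sd:.1%} within 157-163 cm, {within_2sd:.1%} within 154-166 cm")
```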

    Hope it helps!
     
    #21
  22. _79264

    _79264 Member

    Joined:
    May 18, 2020
    Messages:
    2
    Likes Received:
    0
    X = healthData.iloc[:, :-1].values
    y = healthData.iloc[:, 3].values

    In X, what does :-1 represent? How does it fetch the values at positions 0 to 2?
     
    #22
  23. _79264

    _79264 Member

    Joined:
    May 18, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Aayushi,

    Here, what does categorical_features=[0] mean in this code?

    from sklearn.preprocessing import OneHotEncoder
    X_onehotencoder = OneHotEncoder(categorical_features=[0])

    X = X_onehotencoder.fit_transform(X).toarray()
    print(X)
     
    #23
  24. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0

    Thanks Aayushi - it worked now.
     
    #24
  25. _78016

    _78016 Member

    Joined:
    May 5, 2020
    Messages:
    4
    Likes Received:
    0
    Hi Aayushi, I am getting an error on the following import:
     


    #25
  26. Satyajit Datta

    Satyajit Datta Active Member
    Simplilearn Support

    Joined:
    Jan 31, 2017
    Messages:
    22
    Likes Received:
    5
    Imputation based on other columns

    There are 3 columns, col1, col2 and col3, in a data frame.
    col1 has missing values in some rows. I want to fill them with the col1 value from another row that has matching col2 and col3 values.
    How can I do this?

    Example:
    How can I fill the NaN in row3/col1 with A, since row3's col2 and col3 match row1's col2 and col3?

            col1  col2  col3
    row1    A     1     11
    row2    B     2     22
    row3    NaN   1     11
    row4    C     3     33
     
    #26
  27. Ashish Sinha_5

    Joined:
    Jun 3, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Aayushi,

    In the Comcast project, the complaint types are similar but are counted separately by value_counts. For example, DATA CAPS, DATA CAP, COMCAST DATA CAPS and COMCAST DATA CAP are the same. How can we handle this?

    Anyone, please feel free to answer this.

    Thanks,
    Ashish
     
    #27
  28. Pritam Paul_3

    Pritam Paul_3 Member

    Joined:
    Jun 1, 2020
    Messages:
    7
    Likes Received:
    0
    Hello Ayushi! This is the assignment on the BigMart dataset given on 6 July 2020. Kindly look into it and help me with my mistakes. I was also not able to convert Outlet_Establishment_Year into age, or to clean Item_Fat_Content. Please help me with these as well.
     


    #28
    Last edited: Jul 8, 2020
  29. Ashish Sinha_5

    Joined:
    Jun 3, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Aayushi,

    In the Comcast project, I created a DataFrame like this:
    upload_2020-7-7_23-52-58.png

    Now I want to plot this as a bar plot so that, for each state, the numbers of closed and open complaints appear in the same graph. How can I achieve this? I have the following:
    upload_2020-7-7_23-54-26.png

    Thanks in advance.

    Ashish
     
    #29
  30. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi,

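    In iloc[:, :-1], the first : selects all rows and :-1 selects every column except the last one (Python slices stop before the end index, and -1 refers to the last column). With four columns, :-1 is the same as 0:3, i.e. positions 0, 1 and 2. We leave out the last column because it is the target variable, which we want in y.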
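A small sketch of the slicing, using a hypothetical four-column frame; only Ethnicity is mentioned in this thread, the other column names are made up for illustration.

```python
import pandas as pd

# Hypothetical 4-column frame; the last column is the target
healthData = pd.DataFrame({
    "Ethnicity": ["African", "American", "European"],
    "Age": [45, 32, 28],
    "Income": [4000, 5000, 3500],
    "Illness": ["Yes", "No", "No"],
})

X = healthData.iloc[:, :-1].values  # all rows; columns 0, 1, 2 (stops before the last)
y = healthData.iloc[:, 3].values    # all rows; column 3 only (the target)

# With 4 columns, :-1 selects exactly the same columns as 0:3
same = (healthData.iloc[:, :-1].columns == healthData.iloc[:, 0:3].columns).all()
print(X.shape, y.shape, same)
```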

    Hope it helps!
     
    #30
  31. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi,

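    It means we are applying the one-hot encoder only to the first column, i.e. column [0], which is Ethnicity in the health data.
    In other words, categorical_features specifies which columns contain the categorical features to encode. Note that this parameter was deprecated in scikit-learn 0.20 and removed in 0.22; in newer versions the column is selected with ColumnTransformer instead.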
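A sketch of the equivalent of `categorical_features=[0]` on current scikit-learn, using `ColumnTransformer` to one-hot encode column 0 and pass the other columns through. The rows are hypothetical stand-ins for the health data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows: column 0 is categorical (e.g. Ethnicity), the rest numeric
X = np.array([
    ["African", 45, 4000],
    ["American", 32, 5000],
    ["European", 28, 3500],
    ["African", 50, 4200],
], dtype=object)

# Encode column 0 only; keep the remaining columns unchanged.
# sparse_threshold=0 forces a dense array, like the old .toarray() call.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The three categories become three dummy columns (in sorted order: African, American, European) placed before the passthrough columns.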

    Hope it helps!
     
    #31
  32. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi,

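    You need to install scikit-learn with pip install scikit-learn (the package is named scikit-learn on PyPI, even though it is imported as sklearn). Also, make sure it is installed into the same environment (site-packages folder) as your other libraries, i.e. the one your Jupyter kernel uses.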
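A quick check, run inside the notebook, of which Python interpreter is active and whether scikit-learn is visible to it. Installing with that interpreter's own `-m pip` is one way to make sure the package lands in the right site-packages.

```python
import importlib.util
import sys

# The interpreter this notebook/script is actually running on
print("Python executable:", sys.executable)

if importlib.util.find_spec("sklearn") is None:
    # Install into this interpreter's site-packages, e.g. from a terminal
    print(f'scikit-learn not found. Run: "{sys.executable}" -m pip install scikit-learn')
else:
    import sklearn
    print("scikit-learn", sklearn.__version__, "found at", sklearn.__file__)
```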

    Hope it helps!
     
    #32
  33. _78909

    _78909 Member

    Joined:
    May 13, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Aayushi,

    I was practising the preprocessing programs on the Health.csv data and found that, after splitting into train and test data,
    the shape of X_train in the notebook you shared is (7, 5), but I am getting (7, 3). Could you please tell me what my mistake is? health_csv3.png health_csv1.png health_csv2.png
     
    #33
  34. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26

    Hi Ashish,

    You can handle this in the following ways:
    1. pandas DataFrame replace function
    df['column_name'] = df['column_name'].replace(['ABC','AB'], 'A')

    where ['ABC','AB'] is the list of values to be replaced
    and 'A' is the new value

    2. pandas DataFrame isin
    df.loc[df['column_name'].isin(['ABC','AB']), 'column_name'] = 'A'
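A toy check that the two methods give the same result; the column names and values are placeholders. The column label inside `.loc` matters: without it, every column of the matching rows would be overwritten with 'A'.

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["ABC", "AB", "XYZ"], "other": [1, 2, 3]})

# Method 1: replace() returns a new Series, so assign it back
m1 = df.copy()
m1["column_name"] = m1["column_name"].replace(["ABC", "AB"], "A")

# Method 2: boolean mask from isin(), restricted to one column via .loc[rows, column]
m2 = df.copy()
m2.loc[m2["column_name"].isin(["ABC", "AB"]), "column_name"] = "A"

print(m1.equals(m2))
```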

    Hope it helps!
     
    #34
  35. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Ashish,

    Please try the hue argument of seaborn's barplot to plot both columns in the same graph.
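A sketch of the hue approach, assuming a wide table with one row per state and separate Open and Closed count columns (the states and counts are hypothetical). `hue` works on long-format data, so the table is first reshaped with `melt`.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical wide table: one row per state, one column per complaint status
wide = pd.DataFrame({
    "State": ["State A", "State B", "State C"],
    "Open": [12, 7, 9],
    "Closed": [30, 25, 18],
})

# hue needs long format: one row per (State, Status) pair
long = wide.melt(id_vars="State", value_vars=["Open", "Closed"],
                 var_name="Status", value_name="Complaints")

# One group of bars per state, one colour per status
ax = sns.barplot(data=long, x="State", y="Complaints", hue="Status")
ax.set_title("Open vs closed complaints per state")
plt.tight_layout()  # in a notebook, plt.show() then displays the figure
```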
    Hope it helps!
     
    #35
    Ashish Sinha_5 likes this.
  36. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Pritam,

    To convert Outlet_Establishment_Year into the outlet's age (2020 being the current year):
    df["age"] = 2020 - df["Outlet_Establishment_Year"]

    Item_Fat_Content can be cleaned in two ways:

    1. pandas DataFrame replace function
    df['Item_Fat_Content'] = df['Item_Fat_Content'].replace(['ABC','AB'], 'A')

    where ['ABC','AB'] is the list of values to be replaced
    and 'A' is the new value

    2. pandas DataFrame isin
    df.loc[df['Item_Fat_Content'].isin(['ABC','AB']), 'Item_Fat_Content'] = 'A'
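A sketch putting both steps together on a few hypothetical rows. It assumes the inconsistent spellings are 'LF', 'low fat' and 'reg'; check `value_counts()` on your copy of the data first. A dict passed to `replace` handles several mappings in one call.

```python
import pandas as pd

# Hypothetical sample rows in the BigMart format
df = pd.DataFrame({
    "Item_Fat_Content": ["Low Fat", "LF", "low fat", "Regular", "reg"],
    "Outlet_Establishment_Year": [1999, 2009, 1987, 1985, 2002],
})

# Outlet age relative to 2020
df["age"] = 2020 - df["Outlet_Establishment_Year"]

# Map each variant spelling to one canonical label
df["Item_Fat_Content"] = df["Item_Fat_Content"].replace(
    {"LF": "Low Fat", "low fat": "Low Fat", "reg": "Regular"}
)
print(df["Item_Fat_Content"].value_counts())
```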

    Hope it helps!
     
    #36
    Pritam Paul_3 likes this.
  37. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Satyajit,

    Please explore the numpy.where() function; it will help.
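A sketch on the example table from the question. It pairs `numpy.where` with a pandas `groupby(...).transform("first")` lookup (a technique added here, not part of the original reply): the groupby finds the first non-missing col1 for each (col2, col3) pair, and `np.where` uses it only where col1 is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": ["A", "B", np.nan, "C"], "col2": [1, 2, 1, 3], "col3": [11, 22, 11, 33]},
    index=["row1", "row2", "row3", "row4"],
)

# For each (col2, col3) group, the first non-null col1, broadcast back to every row
group_value = df.groupby(["col2", "col3"])["col1"].transform("first")

# Keep existing col1 values; take the group value only where col1 is missing
df["col1"] = np.where(df["col1"].isna(), group_value, df["col1"])
print(df)
```

Rows whose (col2, col3) pair has no non-missing col1 anywhere would stay NaN.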
     
    #37
  38. Pritam Paul_3

    Pritam Paul_3 Member

    Joined:
    Jun 1, 2020
    Messages:
    7
    Likes Received:
    0
    Thank you, ma'am, the solution worked fine. At the end I dropped Outlet_Establishment_Year. I am attaching the file as well; kindly let me know if the procedure was correct.
     


    #38
  39. Ashish Sinha_5

    Joined:
    Jun 3, 2020
    Messages:
    6
    Likes Received:
    0

    Thanks for replying, Aayushi.

    Actually, I was looking for something more concise. Data Caps and Data Cap, for example, are the same complaint type, and there are 100+ complaint types, so even if there are only about 20 unique complaint types in total, I would have to call replace 20 times. Is there a better way?

    Thanks in advance :)

    Ashish
     
    #39
  40. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Ashish,

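    I completely agree. In that case, use regular expressions (Python's re module); several pandas string methods, such as str.contains() and str.replace(), accept regex patterns too. Explore them, you will like them for sure.
    Other options are string functions such as:
    str.startswith()
    str.endswith()
    str.contains()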
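A sketch of normalising complaint types with a couple of regex rules instead of one replace per value. The rules (lowercase, strip a leading "comcast", drop a trailing "s") are a simplistic choice for illustration and the sample labels are hypothetical; review the rules against the real data.

```python
import pandas as pd

complaints = pd.Series([
    "DATA CAPS", "Data Cap", "COMCAST DATA CAPS", "Comcast data cap",
    "Internet speed", "Comcast Internet Speeds", "Billing",
])

# Lowercase, remove a leading 'comcast ' and a trailing plural 's'
normalised = (
    complaints.str.lower()
    .str.replace(r"^comcast\s+", "", regex=True)
    .str.replace(r"s$", "", regex=True)
    .str.strip()
)
counts = normalised.value_counts()
print(counts)

# str.contains with a pattern can also flag one family of complaints directly
is_data_cap = complaints.str.contains(r"data\s*caps?", case=False)
print(is_data_cap.sum())
```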

    Hope it helps!
     
    #40
  41. Jitendra solanki_1

    Joined:
    Jun 9, 2020
    Messages:
    4
    Likes Received:
    0
    Hi Aayushi,
    Reading a CSV file with pandas gives an error; can you kindly have a look at it? Annotation 2020-07-09 195152.png
     
    #41
  42. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Jitendra,

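    You need to use forward slashes in the file path (here they are backslashes). In a normal Python string, a backslash starts an escape sequence (for example \t becomes a tab), which corrupts a Windows path; prefixing the string with r to make it a raw string also works.
    Alternatively, put the dataset in the same folder as your Jupyter notebook and read it directly with pd.read_csv("iris_dataset.csv"), without specifying the path.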
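A sketch showing how backslashes break a path and three equivalent ways to write it. The folder and file names are hypothetical; `PureWindowsPath` is used only to compare the paths without needing the file to exist.

```python
from pathlib import PureWindowsPath

# In a normal string, "\n" and "\t" become a newline and a tab
broken = "C:\new_data\table.csv"
print(repr(broken))

# Three ways to write the same Windows path
forward = "C:/new_data/table.csv"   # forward slashes
raw = r"C:\new_data\table.csv"      # raw string: backslashes kept literally
escaped = "C:\\new_data\\table.csv" # each backslash doubled

print(PureWindowsPath(forward) == PureWindowsPath(raw))
# df = pd.read_csv(raw)  # any of the three forms can be passed to pandas
```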

    Hope it helps!
     
    #42
  43. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0
    Hi Aayushi,
    In data preprocessing example 1 (health data),
    you used two methods for the train/test split: one from the cross_validation module and another from the model_selection module.
    Can we use either of them?
    Thanks.
    Train _test split.JPG
     
    #43
  44. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Reeta,

    The first method (sklearn.cross_validation) should not be used: it is deprecated, as the warning message below it shows, and it was removed in scikit-learn 0.20.
    The second method (sklearn.model_selection) is always recommended.

    Hope it helps!
     
    #44
  45. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0
    Thanks, Aayushi, for the clarification.
     
    #45
  46. Reeta Elangovan

    Joined:
    Jun 2, 2020
    Messages:
    10
    Likes Received:
    0
    Hi Aayushi,
    In data preprocessing (health data), I am getting the error below during standardization.
    Can you please help? Thanks
    E1.jpg

    E2.jpg
     
    #46
  47. BHARPUR SINGH BAWA_2

    Joined:
    Apr 16, 2020
    Messages:
    4
    Likes Received:
    0
    Hi Aayushi
    I am getting the attached error when running the OneHotEncoder command. Please suggest a solution.

    One_Hot_Encoder_Error.PNG
     
    #47
  48. Amit Gupta_32

    Amit Gupta_32 Member
    Alumni

    Joined:
    Jun 11, 2020
    Messages:
    3
    Likes Received:
    0
    upload_2020-7-10_21-22-3.png

    Hey Ayushi,

    Please suggest how I can get the store number here. If I put ['Store'] in the query in cell 37, it throws the error 'float' object is not subscriptable.
     
    #48
  49. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Amit,

    You have already included Store in cell 37, so I couldn't understand your question. Please elaborate.
     
    #49
  50. Aayushi_6

    Aayushi_6 Well-Known Member

    Joined:
    Sep 19, 2016
    Messages:
    201
    Likes Received:
    26
    Hi Reeta,

    Please make sure that you perform label encoding/one-hot encoding before standardization,
    because standardization is a numeric calculation. Here, "African" still appears in the error message, which means string values are still present.
    Please encode first and then standardize.
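A sketch of "encode first, then standardize" in one step, with hypothetical health-style columns (only Ethnicity and the value African come from this thread). Scaling only the numeric columns and leaving the 0/1 dummies unscaled is a design choice made here, not necessarily what the course notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    "Ethnicity": ["African", "American", "European", "African"],
    "Age": [45, 32, 28, 50],
    "Income": [4000, 5000, 3500, 4200],
})

# Scaling the raw frame fails because of the text column
try:
    StandardScaler().fit_transform(X)
except ValueError as err:
    print("Error:", err)

# Encode the text column; standardize only the numeric columns
prep = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(), ["Ethnicity"]),
        ("scale", StandardScaler(), ["Age", "Income"]),
    ],
    sparse_threshold=0,  # return a dense array
)
X_ready = prep.fit_transform(X)
print(X_ready)
```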

    Hope it helps!
     
    #50
