DS with Python | Jun 6 | Vaishali

Discussion in 'Big Data and Analytics' started by Mukesh Sahu, Jun 7, 2020.

  1. Mukesh Sahu

    Mukesh Sahu Moderator
    Simplilearn Support

    Joined:
    Feb 2, 2011
    Messages:
    850
    Likes Received:
    84
    Hi all,

    Please use this thread for Python queries.
     
    #1
  2. Savaridasan

    Savaridasan Customer
    Customer

    Joined:
    May 12, 2020
    Messages:
    3
    Likes Received:
    0
    Good morning, Ma'am, and all friends. Happy Data Science with Python!
     
    #2
  3. Shrawan_Kumar_Sahu

    Joined:
    May 11, 2020
    Messages:
    5
    Likes Received:
    0
    Hello Vaishali Ma'am, how do I split data into a training set and a test set?
     
    #3
  4. _79228

    _79228 New Member

    Joined:
    May 17, 2020
    Messages:
    1
    Likes Received:
    0
    Is it possible to reuse the training data set along with the testing data set for testing the ML model?
     
    #4
  5. Nishant Ranjan Verma

    Joined:
    May 31, 2020
    Messages:
    1
    Likes Received:
    0
    Sure
     
    #5
  6. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    281
    Likes Received:
    98
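    Yes, this is possible, but a score measured on data the model was trained on is optimistically biased, so keep a separate held-out test set for the final evaluation.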
     
    #6
    _79228 likes this.
  7. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    281
    Likes Received:
    98
    Hi Learner,

    You can do the train/test split with the function imported below:

    # Python Import for train_test_split :
    from sklearn.model_selection import train_test_split

    I hope that this will help.
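
    A minimal sketch of how that import is typically used. The toy arrays, the 80/20 ratio, and the random_state value are assumptions chosen for illustration, not something from the course material.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and one binary label per sample.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

    The model is then fitted on X_train and y_train only, and scored on X_test and y_test.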

    Regards,
    Nishant Singh
    Sr. Global Teaching Assistant
    Simplilearn
     
    #7
    Naveen B likes this.
  8. Raja Jessayen

    Raja Jessayen New Member

    Joined:
    Oct 31, 2019
    Messages:
    1
    Likes Received:
    0
    Please share the Google Drive link for this batch.
     
    #8
  9. Shrawan_Kumar_Sahu

    Joined:
    May 11, 2020
    Messages:
    5
    Likes Received:
    0
    test_score=np.array([[81,71,57,63],[54,68,82,45]])
    How do I access 71 and 82?
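
    One way to read those two values, shown as a short sketch. NumPy indexing is zero-based and takes the row first, then the column.

```python
import numpy as np

test_score = np.array([[81, 71, 57, 63], [54, 68, 82, 45]])

# Indexing is zero-based: [row, column].
first = test_score[0, 1]   # row 0, column 1
second = test_score[1, 2]  # row 1, column 2
print(first, second)  # 71 82

# Both values in one call, using paired row and column index lists.
both = test_score[[0, 1], [1, 2]]
print(both)  # [71 82]
```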
     
    #9
    Last edited: Jun 25, 2020
  10. Kanak Das

    Kanak Das Member

    Joined:
    May 24, 2020
    Messages:
    2
    Likes Received:
    0
    72 is not present in the array
     
    #10
  11. Kanak Das

    Kanak Das Member

    Joined:
    May 24, 2020
    Messages:
    2
    Likes Received:
    0

    Sir/Madam,

    Please help me make a box plot and a KDE plot with seaborn. I know the code, but my data set has two variables: steel production (t/day) and fuel consumption (t/day). With daily data for the last 5 years, each variable has about 5 years * 365 days = 1825 data points.

    With this many data points, the box plot looks like a scatter plot; no box appears.

    The difficulty with the KDE plot is that the data range (min to max) is completely different for fuel rate (530 to 730) and steel production (32000 to 40000).

    Please help me handle these different ranges and the large number of data points.

    When I use only 20 days of data, the box plot comes out well, but not with more data.

    The KDE plot is difficult because of the wide variation in data range.

    I apologize for asking this, although it has not been covered in class yet.
     
    #11
  12. Brittany Audia

    Joined:
    Mar 27, 2020
    Messages:
    2
    Likes Received:
    0
    Hello. I'm stuck on the last part of the NYC 311 Service Request Project.
    5. Perform a statistical test for the following:
    Please note: For the below statements you need to state the Null and Alternate and then provide a statistical test to accept or reject the Null Hypothesis along with the corresponding ‘p-value’.
    - Whether the average response time across complaint types is similar or not (overall)

    Null Hypothesis = No difference between means of different complaint types
    Alt Hypothesis = Some difference between means of different complaint types

    upload_2020-6-24_20-43-43.png

    After getting the average request closing time grouped by each complaint type, I'm not sure how to test the hypothesis. I would appreciate any help. I tried to use the F-test below, but I must not have something set up correctly.

    upload_2020-6-24_20-45-11.png

    - Are the type of complaint or service requested and location related?
    I haven't even attempted this part yet since I haven't been able to figure out the other part yet.
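
    For the second question, a chi-square test of independence on a contingency table is one common choice. The sketch below uses a tiny made-up sample, and the column names "Complaint Type" and "Borough" are assumptions rather than the project's confirmed schema.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Tiny made-up sample; the column names are assumptions.
df = pd.DataFrame({
    "Complaint Type": ["Noise", "Noise", "Noise", "Noise",
                       "Parking", "Parking", "Parking", "Parking"],
    "Borough": ["BRONX", "BRONX", "BRONX", "QUEENS",
                "QUEENS", "QUEENS", "QUEENS", "BRONX"],
})

# H0: complaint type and location are independent.
table = pd.crosstab(df["Complaint Type"], df["Borough"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

    A p-value below the chosen significance level (0.05 is a common convention) would suggest rejecting the null hypothesis of independence.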
     
    #12
  13. Shrawan_Kumar_Sahu

    Joined:
    May 11, 2020
    Messages:
    5
    Likes Received:
    0
    Hello Ma'am,
    Please explain the sum output. Why are NaN, 13 and 25 coming as output?
    The expected output is 11, 22, 33, 44, 55.

    upload_2020-6-27_18-46-53.png
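
    The screenshot is not available, so this is only a guess at the cause: when two pandas Series are added, values are matched by index label, not by position, and labels that exist in only one Series produce NaN. The Series and labels below are hypothetical, chosen to reproduce a NaN, 13, 25 pattern.

```python
import pandas as pd

# Hypothetical Series; the labels are chosen only to reproduce the pattern.
s1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
s2 = pd.Series([10, 20, 30, 40, 50], index=["c", "e", "f", "g", "h"])

# Addition aligns on index labels; labels present in only one Series give NaN.
aligned = s1 + s2
print(aligned)  # only "c" (3 + 10 = 13) and "e" (5 + 20 = 25) are not NaN

# Adding by position instead: drop the labels of s2 and reuse those of s1.
by_position = s1 + s2.to_numpy()
print(by_position.tolist())  # [11, 22, 33, 44, 55]
```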
     
    #13
  14. Prem V Jejurkar

    Prem V Jejurkar New Member

    Joined:
    Apr 20, 2020
    Messages:
    1
    Likes Received:
    0
    Please help me run this code successfully.
    The code is below:
    dataframe = pd.DataFrame({
    "Cricket":[1,2,np.nan,4,6,7,2,np.nan],
    "Baseball":[5,np.nan,np.nan,5,7,2,5],
    'Tennis':[1,2,3,4,5,6,7,8]})

    I get the following error after executing the code:
    ---------------------------------------------------------------------------
    ValueError Traceback (most recent call last)
    <ipython-input-66-d1030fc57241> in <module>
    2 "Cricket":[1,2,np.nan,4,6,7,2,np.nan],
    3 "Baseball":[5,np.nan,np.nan,5,7,2,5],
    ----> 4 'Tennis':[1,2,3,4,5,6,7,8]})
    ~\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    433 )
    434 elif isinstance(data, dict):
    --> 435 mgr = init_dict(data, index, columns, dtype=dtype)
    436 elif isinstance(data, ma.MaskedArray):
    437 import numpy.ma.mrecords as mrecords

    ~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in init_dict(data, index, columns, dtype)
    252 arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    253 ]
    --> 254 return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    255
    256

    ~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
    62 # figure out the index, if necessary
    63 if index is None:
    ---> 64 index = extract_index(arrays)
    65 else:
    66 index = ensure_index(index)

    ~\anaconda3\lib\site-packages\pandas\core\internals\construction.py in extract_index(data)
    363 lengths = list(set(raw_lengths))
    364 if len(lengths) > 1:
    --> 365 raise ValueError("arrays must all be same length")
    366
    367 if have_dicts:

    ValueError: arrays must all be same length
     
    #14
  15. Shrawan_Kumar_Sahu

    Joined:
    May 11, 2020
    Messages:
    5
    Likes Received:
    0
    I wanted to calculate the roots of the polynomial function Y defined below, but I am unable to get the output. Please let me know where I am going wrong.
    upload_2020-7-4_10-30-37.png
     
    #15
  16. Shrawan_Kumar_Sahu

    Joined:
    May 11, 2020
    Messages:
    5
    Likes Received:
    0
    0.5 should be the single output. What does the value 5.551115123125783e-15 signify?
    upload_2020-7-4_10-41-27.png
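
    The value 5.551115123125783e-15 is scientific notation for roughly 0.0000000000000055. It is a small multiple of double-precision machine epsilon (about 2.2e-16), so it is most likely floating-point round-off and can be treated as zero. Since the screenshot is not available, the polynomial below is a hypothetical one with roots 0.5 and 1, used only to show the effect.

```python
import numpy as np

# Hypothetical polynomial with roots 0.5 and 1: y = 2x^2 - 3x + 1.
coefficients = [2, -3, 1]
roots = np.roots(coefficients)
print(roots)

# Plugging a computed root back in does not always give exactly 0 in floating point.
residuals = np.polyval(coefficients, roots)
print(residuals)

# Compare with a tolerance instead of testing for exact equality.
print(np.isclose(residuals, 0.0, atol=1e-9))
```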
     
    #16
  17. _79635

    _79635 Member

    Joined:
    May 21, 2020
    Messages:
    2
    Likes Received:
    1
    Hello,

    I'm doing the Comcast Telecom Customer Complaints project.
    In that there is one question: 'Which complaint types are maximum i.e., around internet, network issues, or across any other domains.'

    There is no complaint types column in the dataset. Are we expected to categorize the 'customer complaints' column into different types?

    If yes, how do we decide the basis for separating it into types? Since each row in the 'customer complaints' column has a different string value, how do we classify those into separate categories?
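
    One simple option is keyword matching on the free-text column. This is a rough sketch, not the project's official solution: the sample texts, the column name, and the keyword lists are all made up for illustration.

```python
import pandas as pd

# Made-up complaint texts; the column name is an assumption.
df = pd.DataFrame({
    "Customer Complaint": [
        "Comcast internet speed is slow",
        "Billing overcharge on my account",
        "Network outage for three days",
        "Rude customer service",
    ]
})

# Hypothetical keyword lists; the first matching category wins.
keywords = {
    "Internet": ["internet", "speed", "wifi"],
    "Network": ["network", "outage"],
    "Billing": ["bill", "charge"],
}

def categorize(text):
    """Return the first category whose keyword appears in the text."""
    lowered = text.lower()
    for category, words in keywords.items():
        if any(word in lowered for word in words):
            return category
    return "Other"

df["Complaint Type"] = df["Customer Complaint"].apply(categorize)
print(df["Complaint Type"].value_counts())
```

    The order of the dictionary decides ties, so a complaint that mentions both "internet" and "billing" lands in whichever category is listed first.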
     
    #17
    Ajinkya Kurlekar likes this.
  18. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    Hi Shrawan ,
    I guess we have discussed enough about this in our LVC sessions now :)
     
    #18
  19. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    #19
  20. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    #20
  21. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    Hi Prem ,

    The baseball key has only 7 values. Please introduce one more value so that all the keys have 8 values. This error is due to uneven lengths.
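
    A sketch of two ways to apply this. Where the missing Baseball value really belongs is not known from the post, so padding the end with np.nan is an assumption.

```python
import numpy as np
import pandas as pd

# "Baseball" padded with one np.nan so every column has 8 values.
# Where the missing entry really belongs is an assumption; adjust as needed.
dataframe = pd.DataFrame({
    "Cricket": [1, 2, np.nan, 4, 6, 7, 2, np.nan],
    "Baseball": [5, np.nan, np.nan, 5, 7, 2, 5, np.nan],
    "Tennis": [1, 2, 3, 4, 5, 6, 7, 8],
})
print(dataframe.shape)  # (8, 3)

# Alternative: wrap each list in a Series and let pandas pad the short one.
padded = pd.DataFrame({
    "Cricket": pd.Series([1, 2, np.nan, 4, 6, 7, 2, np.nan]),
    "Baseball": pd.Series([5, np.nan, np.nan, 5, 7, 2, 5]),
    "Tennis": pd.Series([1, 2, 3, 4, 5, 6, 7, 8]),
})
print(padded["Baseball"].isna().sum())  # 3
```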
     
    #21
  22. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    I have already explained this in class, Shrawan. Please revisit the session recording.
     
    #22
  23. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    Hi ,
    Complaint types column is available. Please check.
     
    #23
  24. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    Hi Brittany,

    This is a simple mean comparison test. Please write your inferences from the mean values.
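
    If a formal test with a p-value is needed, a one-way ANOVA is a common choice for comparing means across several groups. The sketch below runs it on the individual response times per complaint type rather than on the already averaged values. The data and column names are made up for illustration.

```python
import pandas as pd
from scipy.stats import f_oneway

# Made-up closing times in hours; the column names are assumptions.
df = pd.DataFrame({
    "Complaint Type": ["Noise"] * 4 + ["Parking"] * 4 + ["Animal"] * 4,
    "Request_Closing_Hours": [2.1, 3.4, 2.8, 3.0,
                              4.9, 5.2, 4.4, 5.8,
                              1.2, 1.9, 1.5, 1.1],
})

# One array of response times per complaint type.
groups = [
    g["Request_Closing_Hours"].to_numpy()
    for _, g in df.groupby("Complaint Type")
]

# H0: all complaint types share the same mean response time.
f_stat, p_value = f_oneway(*groups)
print(f"F={f_stat:.2f}, p={p_value:.4g}")
if p_value < 0.05:
    print("Reject H0: mean response time differs across complaint types.")
else:
    print("Fail to reject H0.")
```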
     
    #24
  25. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0
    #25
  26. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    30
    Likes Received:
    0

    Hi Kanak,

    Please find below a useful link.
    https://python-graph-gallery.com/30-basic-boxplot-with-seaborn/
     
    #26
  27. _79635

    _79635 Member

    Joined:
    May 21, 2020
    Messages:
    2
    Likes Received:
    1
    The "Complaint types" column is not present (screenshot below).
    There is only a "Customer complaint" column, but that column has different data for each observation. It is a description of the complaint, not the complaint type.

    upload_2020-7-11_19-10-32.png
     

    #27
  28. Parameswar Sahoo

    Parameswar Sahoo New Member

    Joined:
    Mar 10, 2020
    Messages:
    1
    Likes Received:
    0
    Hi
    I am doing the Comcast Telecom Consumer Complaints project. I am able to get the month-wise number of complaints. However, I am not able to sort the result by month for the line chart. Can someone help with this? Here is the code I have written.

    df = pd.read_csv (r'D:\file')
    monthly_complaints=pd.DataFrame(df,columns=['Ticket #','Date_month_year'])
    monthly_complaints['Month']=monthly_complaints['Date_month_year'].dt.strftime('%B')
    frq_complaints_monthly=monthly_complaints.groupby(monthly_complaints['Month'],sort=True).size()
    print((frq_complaints_monthly))

    Thanks,
    Parameswar
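
    A possible fix, sketched on a small stand-in table because the CSV is not available. Grouping by month name sorts the labels alphabetically (April, August, December, ...), so grouping by month number and relabelling afterwards keeps calendar order. The sample rows and the ISO date format are assumptions; also note that read_csv leaves dates as strings unless they are converted, and the .dt accessor needs a datetime column.

```python
import calendar

import pandas as pd

# Small stand-in for the CSV; column names follow the post, values are made up.
df = pd.DataFrame({
    "Ticket #": ["1", "2", "3", "4", "5"],
    "Date_month_year": ["2015-04-04", "2015-06-26", "2015-04-13",
                        "2015-05-05", "2015-06-30"],
})

# Convert the text column to datetime before using the .dt accessor.
df["Date_month_year"] = pd.to_datetime(df["Date_month_year"])

# Group by month number so the result is in calendar order, then relabel.
monthly = df.groupby(df["Date_month_year"].dt.month).size()
monthly.index = [calendar.month_name[m] for m in monthly.index]
print(monthly)
```

    Calling monthly.plot(kind="line") on this result then draws the months left to right in calendar order.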
     
    #28
  29. Shailesh Tumma

    Joined:
    Apr 30, 2020
    Messages:
    4
    Likes Received:
    0
    Hi Vaishali,

    For the MovieLens project, how can I do the following:
    Create a separate column for each genre category with a one-hot encoding ( 1 and 0) whether or not the movie belongs to that genre.

    The existing Genre column has multiple values separated by "|". I have used str.split("|",expand=True) to split the data, which gives me a DF of size (1000209 rows × 6 columns). When I use the one-hot encoder I get the error "TypeError: argument must be a string or number". Below is the code:

    binary_encoder = OneHotEncoder(categories = 'auto')
    genres_1hot = binary_encoder.fit_transform(genres)
    genres_1hot_mat = genres_1hot.toarray()
    genres_DF = pd.DataFrame(genres_1hot_mat)
    genres_DF.head()

    When I use get_dummies it gives a dataframe of size 1000209 rows × 64 columns. But there are only 19 unique genres. Below is the code:

    genres_DF= pd.get_dummies(genres, drop_first=True)
    genres_DF

    Please help.

    Regards,
    Shailesh Tumma
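
    One way to get a single column per genre is pandas' Series.str.get_dummies, which splits on the separator and encodes in one step. Calling pd.get_dummies on the already split frame creates a dummy for each (position, genre) combination, which would explain 64 columns instead of 19; the TypeError is likely caused by the empty cells that the split leaves behind. The three rows and the column names below are made up.

```python
import pandas as pd

# Three made-up rows in the "|"-separated format described in the post.
movies = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Genres": ["Animation|Comedy", "Action|Adventure|Comedy", "Drama"],
})

# One 0/1 column per distinct genre, regardless of its position in the string.
genre_flags = movies["Genres"].str.get_dummies(sep="|")
movies = pd.concat([movies, genre_flags], axis=1)
print(genre_flags.columns.tolist())
```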
     
    #29
  30. Ankita Sahu_2

    Ankita Sahu_2 New Member

    Joined:
    Feb 29, 2020
    Messages:
    1
    Likes Received:
    0
    Hello,
    I need more time for the projects (Data Science with Python).
     
    #30
