Machine Learning | Vaishali | June 27 - Aug 01, 2020

Discussion in 'Big Data and Analytics' started by Support Simplilearn(4685), Jun 26, 2020.

  1. Support Simplilearn(4685)

    Staff Member Alumni

    Joined:
    Feb 11, 2010
    Messages:
    277
    Likes Received:
    30
    #1
  2. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
  3. Neena Shankar

    Neena Shankar Member

    Joined:
    Jul 13, 2019
    Messages:
    3
    Likes Received:
    0
    Hi Vaishali,

    When trying to plot the heatmap in Jupyter Notebook, I was still getting an error. I did not want to disturb the flow of the class. I have also attached a screenshot of the error. Could you please help me with this?

    Below is the code I used:

    import matplotlib as plt
    import seaborn as sns

    correlations = salary_df.corr()
    correlations

    sns.heatmap(correlations,square = True, cmap='bwr')
    plt.yticks(rotation=0)
    plt.xticks(rotation=90)

    But I ended up getting this error:
    ---------------------------------------------------------------------------
    AttributeError Traceback (most recent call last)
    <ipython-input-17-6984e21040cf> in <module>
    1 sns.heatmap(correlations,square = True, cmap='bwr')
    ----> 2 plt.yticks(rotation=0)
    3 plt.xticks(rotation=90)

    AttributeError: module 'matplotlib' has no attribute 'yticks'
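
The error message points at the import line: `yticks` and `xticks` live in the `matplotlib.pyplot` submodule, not in the top-level `matplotlib` package, so `import matplotlib as plt` binds the wrong module to the name `plt`. A minimal sketch of a corrected cell follows. The small `salary_df` is a made-up stand-in (the class dataset is not shown in the thread), and the Agg backend is an assumption so that the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs outside Jupyter

import matplotlib.pyplot as plt  # pyplot, not the top-level matplotlib package
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the class dataset, which is not shown in the thread.
salary_df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1],
    "Age": [22, 24, 27, 29, 33, 36],
    "Salary": [39000, 43000, 57000, 61000, 81000, 98000],
})

correlations = salary_df.corr()

ax = sns.heatmap(correlations, square=True, cmap="bwr")
plt.yticks(rotation=0)   # works because plt is now matplotlib.pyplot
plt.xticks(rotation=90)
plt.tight_layout()
```

In a notebook the `matplotlib.use("Agg")` line can be dropped; only the `import matplotlib.pyplot as plt` line changes relative to the code in the post.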
     

    Attached Files:

    #3
  4. Manish Mundra

    Manish Mundra Member

    Joined:
    Apr 23, 2020
    Messages:
    5
    Likes Received:
    0
    Please share the code from today's class, and please let me know about the Anaconda installation, because it throws an error named "failed to create menus". Please let me know how to install it properly, as I do not have much of an IT background.
     
    #4
  5. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Manish,

    Please find below a link that will be useful for the Anaconda installation:
    https://docs.anaconda.com/anaconda/install/
     
    #5
  6. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Manish,

    Please check the Google Drive link of your batch.
     
    #6
  7. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Neena,

    Please check the Google Drive link of your batch. I have uploaded the solution.
     
    #7
  8. Vinay Chitrakathi

    Joined:
    Mar 19, 2020
    Messages:
    6
    Likes Received:
    0
    Hi Raghu. Vinay here.
     
    #8
  9. _37815

    _37815 Member

    Joined:
    Aug 18, 2018
    Messages:
    4
    Likes Received:
    1
    For the voice dataset and the horse dataset, I predicted values and computed the accuracy and confusion matrix using both SVM and DT.

    Very interesting!!
     

    Attached Files:

    #9
    Vaishali_26 likes this.
  10. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    hello Vinay
     
    #10
  11. _57428

    _57428 New Member

    Joined:
    Jan 24, 2019
    Messages:
    1
    Likes Received:
    1
    Hello Vaishali,

    This is Hari Shankar.

    I have coded the prediction model on the voice dataset using SVM (SVC).

    Please refer to my attached code. My accuracy score is only 72%, but in your code (DAY_4 in Google Drive) the accuracy score is 98%.

    Please let me know what can be improved in my code.

    ------------------------------------------------------------------------------

    Posted on 7th July:

    I tried standardization on the voice data and my accuracy is now 98%; without standardization it was 72%.

    Standardization makes a lot of difference :)
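
The effect described above can be reproduced in a few lines. This is a sketch under assumptions: scikit-learn's built-in wine data stands in for the voice dataset (which is not included in the thread), and the default `SVC` settings are used, so the exact numbers will differ from the 72% and 98% reported in the post.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the built-in wine data stands in for the voice dataset used in class.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# SVC on raw features: columns with large numeric ranges dominate the kernel distances.
raw_model = SVC()
raw_model.fit(X_train, y_train)
acc_raw = raw_model.score(X_test, y_test)

# Same SVC behind a scaler; the scaler is fitted on the training split only.
scaled_model = make_pipeline(StandardScaler(), SVC())
scaled_model.fit(X_train, y_train)
acc_scaled = scaled_model.score(X_test, y_test)

print(f"accuracy without standardization: {acc_raw:.2f}")
print(f"accuracy with standardization:    {acc_scaled:.2f}")
```

Putting the scaler inside a pipeline keeps the test split out of the scaling statistics, which avoids leaking information from the test data into training.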
     

    Attached Files:

    #11
    Last edited: Jul 7, 2020
    Vaishali_26 likes this.
  12. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    Hello Vaishali, hello all. I have a question.
    Salary_df['Salary'].values converts the Salary column into a NumPy array. Does it convert the column into a NumPy array permanently, or only temporarily to perform the calculation?
     
    #12
  13. PMI SwaroopmRamalingam(1567)

    Joined:
    Jul 23, 2014
    Messages:
    1
    Likes Received:
    0
    When we do label encoding for the labels, let us say the categories are Male, Female and Transgender, so the result comes out as 0, 1, 2. How do we find which is which? Which category is 0, which is 1, and which is 2?
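
One way to recover the mapping, assuming the encoding was done with scikit-learn's `LabelEncoder` (the post does not say which encoder was used): the fitted encoder keeps the sorted class names in `classes_`, and the position of each name is the integer it was encoded as. The label values below are taken from the question.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Male", "Female", "Transgender", "Female", "Male"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted; the index of each class is the integer code it received.
mapping = {str(label): int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original labels.
decoded = [str(label) for label in encoder.inverse_transform(encoded)]
print(decoded)
```

Because the classes are sorted alphabetically, the codes do not follow the order in which the labels first appear in the data.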
     
    #13
  14. Aman kumar verma

    Alumni

    Joined:
    Oct 27, 2019
    Messages:
    11
    Likes Received:
    1
    No, it doesn't, although we can assign the converted NumPy values to a variable so that we can continue our work.
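
A short sketch of that behaviour, using a made-up `salary_df` because the class data is not in the thread: `.values` returns a NumPy array that can be stored in a variable, while the column inside the DataFrame remains a pandas Series.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Salary_df used in class.
salary_df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2],
    "Salary": [39000, 43000, 57000],
})

salary_array = salary_df["Salary"].values  # a NumPy array we can keep working with

print(type(salary_array))         # the extracted values
print(type(salary_df["Salary"]))  # the column itself is unchanged
```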
     
    #14
    Hitesh Kumar_4 likes this.
  15. Harish S_3

    Harish S_3 Member

    Joined:
    Dec 14, 2019
    Messages:
    2
    Likes Received:
    0
    @vaishali, so I clearly understand that we fill NA values, look at the correlations, remove highly correlated features, fit the model, train the model, test the model, and so on. Is that all? What other main statistical analysis do we have to do?
     
    #15
  16. GAURAV MATHUR_1

    GAURAV MATHUR_1 Active Member

    Joined:
    Mar 15, 2020
    Messages:
    16
    Likes Received:
    0
    Hi Vaishali

    I have attached Logistic Regression code where the Iris dataset was used. This code is taken from the Simplilearn reference material on Supervised Learning. In In[39], I am getting an "unindent does not match any outer indentation level" error. I also made changes to the code, but every change results in another error.

    What is the fix for this one?
     

    Attached Files:

    #16
  17. GAURAV MATHUR_1

    GAURAV MATHUR_1 Active Member

    Joined:
    Mar 15, 2020
    Messages:
    16
    Likes Received:
    0
    Hi Vaishali,

    Let's say we have a dataset with 4 features and 500 rows. If the number of outliers is high (say 180) because of one of the features, is it advisable to drop that many rows from the dataset, or should we drop the feature with the high number of outliers?
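
Before choosing between dropping rows and dropping a feature, it can help to count the outliers per feature. The sketch below is only one possible way to do that: it assumes the common 1.5 x IQR rule (the thread does not say how outliers were defined) and uses synthetic data in which the hypothetical feature `d` is heavy-tailed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: 500 rows, 4 features; feature "d" is heavy-tailed.
df = pd.DataFrame({
    "a": rng.normal(size=500),
    "b": rng.normal(size=500),
    "c": rng.normal(size=500),
    "d": rng.standard_cauchy(size=500),
})

q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# True where a value lies outside the 1.5 * IQR fences of its own column.
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

outliers_per_feature = is_outlier.sum()
rows_with_any_outlier = int(is_outlier.any(axis=1).sum())

print(outliers_per_feature)
print("rows that would be dropped:", rows_with_any_outlier)
```

If one column accounts for most of the flagged rows, the count makes the cost of each option visible: dropping rows loses that many observations for every feature, while dropping the column loses one feature for every row.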
     
    #17
  18. GAURAV MATHUR_1

    GAURAV MATHUR_1 Active Member

    Joined:
    Mar 15, 2020
    Messages:
    16
    Likes Received:
    0
    Hi Vaishali,

    Is there any different treatment for a date variable in a classification problem, in case it holds significance?
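
One common treatment, sketched here as an illustration rather than as the course's recommended method, is to parse the date and split it into numeric parts that a classifier can use. The column name `application_date` and the values are hypothetical.

```python
import pandas as pd

# Hypothetical column name and values; the thread does not name a dataset.
df = pd.DataFrame({"application_date": ["2020-06-27", "2020-07-07", "2020-08-01"]})

dates = pd.to_datetime(df["application_date"])
df["year"] = dates.dt.year
df["month"] = dates.dt.month
df["day_of_week"] = dates.dt.dayofweek               # Monday=0 ... Sunday=6
df["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)

# The raw date string is dropped once its parts have been extracted.
df = df.drop(columns=["application_date"])
print(df)
```

Which parts are worth keeping depends on the problem; for example, a weekly pattern would show up in `day_of_week` and a seasonal one in `month`.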
     
    #18
  19. kedar_17

    kedar_17 Member

    Joined:
    Apr 3, 2020
    Messages:
    2
    Likes Received:
    0
    Hi Vaishali,
    I am having a problem with PCA. When PCA is performed, we get features that have high variance and features that have low variance. Can we know which feature has high variance after we get variance_ratio as an output from PCA? Please help me; I am new to programming and to machine learning as well.
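
A sketch that may help with this, assuming scikit-learn's `PCA` and using the built-in iris data as a stand-in for the class dataset: `explained_variance_ratio_` describes the principal components, not the original features, and the weight of each original feature in a component can be read from `components_`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the dataset used in class.
iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)
pca.fit(X_scaled)

# Share of the total variance captured by each principal component.
print(pca.explained_variance_ratio_)

# Rows are components, columns are original features; a large absolute value
# means that feature contributes strongly to that component.
loadings = pd.DataFrame(
    pca.components_, columns=iris.feature_names, index=["PC1", "PC2"]
)
print(loadings.round(2))

# Original feature with the largest absolute weight in each component.
top_feature = loadings.abs().idxmax(axis=1)
print(top_feature)
```

Each component is a weighted mix of all original features, so the loadings show which features matter most to a component rather than giving a one-to-one match.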
     
    #19
  20. Sonali Bapte

    Sonali Bapte Member

    Joined:
    Mar 25, 2020
    Messages:
    9
    Likes Received:
    1
    Hi Vaishali,
    I created a model for Loan_Risk_analysis, which is a practice project in supervised learning classification. In the dataset, the target variable has these counts:
    0    8045
    1    1533
    This means the classes are not balanced, so we should not look at the accuracy score; we should look at the F1 score. I am not able to interpret the F1 score. Please let me know how I can interpret it; I am attaching the classification report as well as the confusion matrix.
    Please let me know what a possible solution to this problem would be.
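
As a rough guide to reading these numbers, the sketch below computes precision, recall and F1 for the minority class from a tiny set of hypothetical labels (the attached report is not reproduced in the thread, so none of these values come from the loan project).

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: class 1 is the minority class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here accuracy is 0.75, yet only half of the minority class is found (recall 0.50) and F1 is about 0.57, which is why F1 is the more informative number for imbalanced classes. One option sometimes tried for the imbalance itself is the `class_weight="balanced"` setting that several scikit-learn classifiers accept; whether it helps on the loan data is not something this sketch can show.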
     

    Attached Files:

    #20
  21. Sonali Bapte

    Sonali Bapte Member

    Joined:
    Mar 25, 2020
    Messages:
    9
    Likes Received:
    1
    Hi Vaishali,
    When a categorical feature has several categories, as in our decision tree classification example on the horse.csv data, where the feature 'temp_of_extremities' has categories ['cool', nan, 'normal', 'cold', 'warm'], one-hot encoding with the get_dummies method still gives output containing only 0 and 1. Why is that? Shouldn't the output depend on the number of categories in the feature?
    Please explain.

    Thanks
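
A small sketch of what `pd.get_dummies` returns for such a column: the number of categories shows up as the number of new columns, and each column holds only 0 or 1 to mark whether the row belongs to that category. The five values are the ones quoted in the post; the real horse.csv column has many more rows.

```python
import numpy as np
import pandas as pd

# Values quoted in the post; the real column is much longer.
animals = pd.DataFrame(
    {"temp_of_extremities": ["cool", np.nan, "normal", "cold", "warm"]}
)

# dtype=int asks for 0/1 instead of the booleans newer pandas versions return.
dummies = pd.get_dummies(animals["temp_of_extremities"], dtype=int)
print(dummies)
```

With the default settings the missing value gets no column of its own, so that row is 0 in every indicator column.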
     
    #21
  22. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    Hello Vaishali, in the horse.csv dataset, while building the decision tree, you used

    for category in category_variables:
        animals[category] = pd.get_dummies(animals[category])

    for one-hot encoding. Can you please tell me what we are assigning here?
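
In that loop the right-hand side is a DataFrame with one indicator column per category, while the left-hand side names a single column, so the assignment cannot store the whole encoding under that one name; how pandas handles the mismatch depends on the version and is not verified here. A commonly used alternative, sketched below with a made-up miniature of the data (every column name except `temp_of_extremities` is hypothetical), is to pass the whole frame to `pd.get_dummies` with the `columns` argument.

```python
import pandas as pd

# Hypothetical miniature of the horse data.
animals = pd.DataFrame({
    "temp_of_extremities": ["cool", "normal", "cold", "warm"],
    "surgery": ["yes", "no", "yes", "no"],
    "pulse": [66, 88, 40, 164],
})
category_variables = ["temp_of_extremities", "surgery"]

# Each listed column is replaced by one 0/1 column per category;
# columns that are not listed (pulse) are left as they are.
encoded = pd.get_dummies(animals, columns=category_variables, dtype=int)
print(encoded.columns.tolist())
```

The new columns are named `<original column>_<category>`, which keeps track of where each indicator came from.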
     
    #22
  23. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    While using the Imputer function I am getting this error. Please help me.

    upload_2020-7-15_15-17-0.png
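
The screenshot is not reproduced here, so the actual error is unknown. One common cause with recent scikit-learn releases is that `sklearn.preprocessing.Imputer` was removed in version 0.22 in favor of `sklearn.impute.SimpleImputer`. Assuming that is the situation, a minimal sketch of the replacement on made-up data looks like this:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

`SimpleImputer` always works column by column, so code that relied on the old `axis` argument of `Imputer` needs to be adjusted.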
     
    #23
  24. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    Aman, I need your help with the Imputer function.
     
    #24
  25. Saju PC

    Saju PC Member

    Joined:
    Aug 6, 2019
    Messages:
    6
    Likes Received:
    1
    Vaishali,
    I found some data that I find interesting. Using it, we have to forecast the raw material requirement. I found many parameters such as inventory, lead time, existing orders (i.e., material in the pipeline), quantity of material required for production, etc. I have attached sample data for one material (there are approximately 70-80 materials; I took only one for now).
    Details of the data and the problem statement are below. Can you please help me create a model for this data? I feel we need to use multiple linear regression or Elastic Net, but I am not sure. Kindly help me. This is a live example I got from a source.

    This is Material Requirement as on 25-05-2017

    Description of the data:
    Columns
    Material No : Raw material part no
    Req Dt : Requirement Date
    Req Qty : Quantity Required
    safetyinvt_qty : Safety Inventory
    Current Inventory : Inventory as on date (25-05-2017)
    Order Qty : Current Order Quantity
    Expect Date : Date of arrival of material
    PO Qty : Quantity of Material generated on 25/05/2017


    ============================================================

    a) The opening inventory of Material '1' on 25-05-2017 is 1098.
    b) The sales/production department states the material requirement; the date and quantity of each requirement are given in the Req Dt and Req Qty columns.
    c) There must be a safety stock of 846 every day.
    d) Order Qty: this is the current order quantity, and the expected date of arrival of the material is given in Expect Date. E.g., delivery of 1200 pieces is expected on 25/05/2017.
    e) PO Qty: this much material is required on the "Req Dt" dates.

    Prepare a model to predict the day-wise order for the material required.
     

    Attached Files:

    #25
    Last edited: Jul 15, 2020
  26. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    Hello Vaishali, I know no one bothers to look at this community thread to answer our queries, but I am asking again.
    In agglomerative clustering (on the shopping_data.csv dataset), we first found the number of clusters with a dendrogram and then viewed the clusters in a scatter plot. If we want the records of all the people who fall in a particular cluster, how can we get that data? I hope you understand my question.

    upload_2020-7-17_0-31-31.png
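
One way to get those records, sketched under assumptions: the data below is synthetic and its column names are made up (shopping_data.csv is not included in the thread), and scikit-learn's `AgglomerativeClustering` is assumed to be the clustering class used. The cluster label of every row is stored as a new column, after which an ordinary filter returns the members of any cluster.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Hypothetical stand-in for shopping_data.csv; column names are assumptions.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "CustomerID": range(1, 61),
    "AnnualIncome": np.concatenate(
        [rng.normal(30, 3, 20), rng.normal(60, 3, 20), rng.normal(90, 3, 20)]
    ),
    "SpendingScore": np.concatenate(
        [rng.normal(20, 3, 20), rng.normal(50, 3, 20), rng.normal(80, 3, 20)]
    ),
})

features = customers[["AnnualIncome", "SpendingScore"]]
model = AgglomerativeClustering(n_clusters=3, linkage="ward")

# fit_predict returns one cluster label per row, in the same order as the rows.
customers["cluster"] = model.fit_predict(features)

# All records that fall in cluster 0.
cluster_0 = customers[customers["cluster"] == 0]
cluster_sizes = customers["cluster"].value_counts().sort_index()

print(cluster_0.head())
print(cluster_sizes)
```

The same filter works for any label, and `customers.groupby("cluster")` gives per-cluster summaries if those are needed instead of the raw records.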
     
    #26
  27. Juliet_1

    Juliet_1 Member
    Alumni

    Joined:
    Dec 13, 2015
    Messages:
    7
    Likes Received:
    3
  28. Juliet_1

    Juliet_1 Member
    Alumni

    Joined:
    Dec 13, 2015
    Messages:
    7
    Likes Received:
    3
    Hi Vaishali,

    Do we need to know the steps for each model by heart, or only the concept?
    Please advise.

    Thanks
     
    #28
  29. Juliet_1

    Juliet_1 Member
    Alumni

    Joined:
    Dec 13, 2015
    Messages:
    7
    Likes Received:
    3
    Hi,
    The link below may help you:
    https://datatofish.com/k-means-clustering-python/
     
    #29
    Hitesh Kumar_4 likes this.
  30. GAURAV MATHUR_1

    GAURAV MATHUR_1 Active Member

    Joined:
    Mar 15, 2020
    Messages:
    16
    Likes Received:
    0
    Hi Vaishali,

    Attached is the code related to cluster image processing. Below are a few things in it that I could not understand:
    1. We are creating labels here (in clustering) and then creating the image back, which is a blurred one. I also increased the number of clusters up to 20, but didn't see any improvement. I am not sure why we used the labels to create the image back and why increasing the number of clusters does not improve the quality of the image.
    2. What is a blob?
    3. How can we compare two images using clustering?
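
On the first point, a sketch may help, assuming the attached code follows the usual K-means color quantization pattern (the attachment is not in the thread, and the random image below is hypothetical). Every pixel is assigned a cluster label, and the image is rebuilt by replacing each pixel with the center of its cluster, so the rebuilt image can contain at most as many distinct colors as there are clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 32x32 RGB image of random colors; the class image is not in the thread.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

pixels = image.reshape(-1, 3)  # one row per pixel, columns are R, G, B
k = 8
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)

# Each pixel is replaced by the center of its cluster, so at most k colors survive.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)

n_colors_before = len(np.unique(pixels, axis=0))
n_colors_after = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(n_colors_before, "colors before,", n_colors_after, "after")
```

Under this assumption, going from a handful of clusters to 20 still leaves only 20 colors, far fewer than a typical photograph contains, which would explain why the change is hard to notice.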
     

    Attached Files:

    #30
  31. Vidhya Nagarajan

    Vidhya Nagarajan New Member

    Joined:
    Oct 23, 2019
    Messages:
    1
    Likes Received:
    0
    Hi Vaishali,

    When trying the XGBoost code, at the line
    import xgboost as xgb

    I get the error below.
    ---------------------------------------------------------------------------
    ModuleNotFoundError Traceback (most recent call last)
    <ipython-input-43-5943d1bfe3f1> in <module>
    ----> 1 import xgboost as xgb

    ModuleNotFoundError: No module named 'xgboost'

    Kindly help.
    Thanks,
    Vidhya
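
This traceback means the `xgboost` package is not installed in the environment that the notebook kernel runs in. A small sketch for checking this from inside the notebook and printing an install command for the same interpreter; the helper name `is_installed` is made up for illustration.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("xgboost"):
    print("xgboost is available in this environment")
else:
    # sys.executable is the interpreter behind the running kernel, so this
    # command installs into the same environment the notebook uses.
    print(f"Not installed. Try: {sys.executable} -m pip install xgboost")
```

In a Jupyter cell, `%pip install xgboost` does the same job; the kernel usually has to be restarted before the import succeeds.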
     
    #31
  32. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    Hello ma'am, could you please help with model deployment? Could you send any reference material so that I can learn about it?
     
    #32
  33. Manish Mundra

    Manish Mundra Member

    Joined:
    Apr 23, 2020
    Messages:
    5
    Likes Received:
    0
    Ma'am,

    Can you please explain the column names in the dataset of the Income Qualification project? Some of the codes are not interpretable. How should I understand them?
     
    #33
  34. Manish Mundra

    Manish Mundra Member

    Joined:
    Apr 23, 2020
    Messages:
    5
    Likes Received:
    0
    Ma'am,

    When I run the code df.info(), it shows the output below:
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 9557 entries, 0 to 9556
    Columns: 143 entries, Id to Target
    dtypes: float64(8), int64(130), object(5)
    memory usage: 10.2+ MB

    Why is it not showing the dtype details for each variable?
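
The short summary appears because the frame has 143 columns, which is more than the pandas option `display.max_info_columns` (100 by default), so `df.info()` switches to the truncated form. A sketch on a hypothetical frame with the same number of columns shows that passing `verbose=True` brings back the per-column listing.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame with 143 columns, matching the column count in the post.
df = pd.DataFrame(
    np.zeros((5, 143), dtype=int),
    columns=[f"col_{i}" for i in range(143)],
)

short_buf = io.StringIO()
df.info(buf=short_buf)                   # truncated summary, as in the post

verbose_buf = io.StringIO()
df.info(verbose=True, buf=verbose_buf)   # one line per column

short_text = short_buf.getvalue()
verbose_text = verbose_buf.getvalue()
print(short_text)
```

In a notebook the buffers are not needed; `df.info(verbose=True)` prints the full listing directly.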
     
    #34
  35. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    Guys! Do you know the accuracy score of Amazon's recommendation system?
     
    #35
  36. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2


    There is a data dictionary available as well. Use it and you'll come to know about the columns.


    upload_2020-7-22_1-31-12.png
     
    #36
  37. Sonali Bapte

    Sonali Bapte Member

    Joined:
    Mar 25, 2020
    Messages:
    9
    Likes Received:
    1
    Hi,
    In project 2, i.e. "Income Qualification", there are some features that contain a mix of datatypes such as string, int and float. How can we convert them into numeric form?

    Please reply.
    Thanks
     
    #37
  38. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2

    First find the columns with categorical data in both the train and test sets. After finding them, do the preprocessing (histogram chart) on all those columns and replace the values with 0 or the mean, mode or median, whichever you want.
     
    #38
  39. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
  40. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    Ma'am, I am getting an error while applying the model. Could you please let me know what I am doing wrong? I don't understand where the problem is or why I am getting this error.
     
    #40
  41. _37815

    _37815 Member

    Joined:
    Aug 18, 2018
    Messages:
    4
    Likes Received:
    1
    In Project 2, regarding income: is "Dependency Rate" the target?
     
    #41
  42. _37815

    _37815 Member

    Joined:
    Aug 18, 2018
    Messages:
    4
    Likes Received:
    1
    Is Dependency Rate the output column?
     
    #42
  43. Hitesh Kumar_4

    Hitesh Kumar_4 Active Member

    Joined:
    Feb 11, 2020
    Messages:
    20
    Likes Received:
    2
    No, there is a specific column named 'Target'.
     
    #43
    _37815 likes this.
  44. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Yes, Hari Shankar. It does.
     
    #44
  45. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Raghavendra,

    Please check the solution that I have uploaded in Google Drive.
     
    #45
  46. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Sonali,

    Please analyse the features individually and then take appropriate decisions. For example, if a numerical column contains one or two string values, replace those string values with integer values.
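
A sketch of that kind of replacement, under stated assumptions: the column name, its values, and the choice of 1 for "yes" and 0 for "no" are all hypothetical and should be checked against the project's data dictionary before being used.

```python
import pandas as pd

# Hypothetical column: mostly numbers stored as text, plus a few "yes"/"no" strings.
df = pd.DataFrame({"dependency": ["0.5", "yes", "2", "no", "1.5"]})

# Assumption for illustration only: "yes" means 1 and "no" means 0.
cleaned = df["dependency"].replace({"yes": "1", "no": "0"})

# errors="coerce" turns anything that is still not a number into NaN.
df["dependency"] = pd.to_numeric(cleaned, errors="coerce")
print(df.dtypes)
```

Counting the NaN values after the conversion is a quick way to see whether any unexpected strings were left in the column.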
     
    #46
  47. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Naveen,

    Please refer to the blogs below. They'll answer all your questions here :)

    https://towardsdatascience.com/transforming-skewed-data-73da4c2d0d16

    https://www.analyticsvidhya.com/blo...chine-learning-normalization-standardization/

    https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
     
    #47
  48. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    It should show, Manish. Please try rerunning the code.
     
    #48
  49. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Saju,
    Yes, the problem looks interesting. I would recommend that you use multiple linear regression first and check the RMSE value. If it is large, then please go for a regularized regression technique like Elastic Net.
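
The workflow suggested above can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic (the material-requirement file is not available in the thread), and the `alpha` and `l1_ratio` values are arbitrary starting points, not tuned settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the material-requirement file.
X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def rmse(model):
    """Fit on the training split and return the RMSE on the test split."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, predictions)))


rmse_linear = rmse(LinearRegression())
rmse_elastic = rmse(ElasticNet(alpha=0.1, l1_ratio=0.5))

print(f"linear regression RMSE: {rmse_linear:.2f}")
print(f"elastic net RMSE:       {rmse_elastic:.2f}")
```

RMSE is in the units of the target, so whether it is large has to be judged against the typical size of the quantities being predicted.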
     
    #49
  50. Vaishali_26

    Vaishali_26 Active Member

    Joined:
    Sep 12, 2019
    Messages:
    45
    Likes Received:
    2
    Hi Hitesh,
    This is not a permanent conversion.
     
    #50