MACHINE LEARNING | Ashish

Discussion in 'Big Data and Analytics' started by SUNNY BHAVEEN CHANDRA, Sep 13, 2019.

  1. SUNNY BHAVEEN CHANDRA

    SUNNY BHAVEEN CHANDRA Well-Known Member

    Joined:
    Feb 4, 2019
    Messages:
    80
    Likes Received:
    13
    #1
    Last edited: Sep 13, 2019
  2. Ashish Kumar_33

    Joined:
    Jul 2, 2019
    Messages:
    4
    Likes Received:
    0
  3. kamal_71

    kamal_71 Member

    Joined:
    Feb 11, 2019
    Messages:
    2
    Likes Received:
    0
    Please share the link to the (For merge) folder, because it is not showing up in Google Drive:
    user_usage.csv
    user_device.csv
    android_devices.csv
     
    #3
  4. RAVI RANJAN CHAUBEY

    Joined:
    Mar 29, 2019
    Messages:
    7
    Likes Received:
    0
    Hello Sir,
    One of the projects from the Supervised Learning chapter is Bigmart Outlet Sales. I am getting a Mean Absolute Error of about 800, but the distribution curve of the residuals, the regression plot of actual vs. predicted values, and the line plot of predictions against actuals all suggest the model is doing fine. Is an error of that size a concern? I am sharing the Jupyter notebooks here. There are 2 files: 1) Cleaning and 2) Models. Please see Models.

    https://drive.google.com/open?id=1dQ-latYaPTOiu7piv0hpUVjebrfIBCC3

    You can also see these files in lab under temp> 1.Ravi ML
     
    #4
    Last edited: Oct 6, 2019
  5. RAVI RANJAN CHAUBEY

    Joined:
    Mar 29, 2019
    Messages:
    7
    Likes Received:
    0
    Hello Sir,
    I was running a Random Forest Classifier on the Loan Borrower data from Chapter 6. I am getting 84% accuracy, but my precision is 60%, my recall is only 0.02%, and even the AUC score is 50%. Could you please check where I am going wrong? Compared with the solution given in the PPT, the only difference is that I have also treated outliers; maybe that is the reason. Please check when you have time.

    You can see the model evaluation part at the end of the notebook: https://drive.google.com/open?id=1ptcK57E_6vpIKXRZnifU-oR6rt5zO4rs
     
    #5
  6. Anuraj Agrawal

    Joined:
    Mar 20, 2019
    Messages:
    5
    Likes Received:
    0
    Which approach should we take to remove variables (columns) that have zero variance?
     
    #6
  7. Ashish Kumar_33

    Joined:
    Jul 2, 2019
    Messages:
    4
    Likes Received:
    0

    Compare it with the average of the actual Y values. Suppose that average is 4000. Then the error percentage would be 800/4000 = 20%. The lower, the better. It means the predictions deviate from the actual values by about 20% on average.
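    The comparison above can be sketched in a few lines of Python. The function name `relative_mae` and the sample values are hypothetical, chosen only so that the MAE comes out at 800 and the mean target at 4000, matching the numbers in this reply.

```python
import numpy as np

def relative_mae(y_true, y_pred):
    """Return MAE divided by the mean of the actual values (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
    return mae / np.mean(y_true)            # error as a share of the average target

# Made-up values: every prediction is off by 800 and the mean target is 4000.
y_true = np.array([3000.0, 4000.0, 5000.0])
y_pred = np.array([3800.0, 3200.0, 5800.0])

ratio = relative_mae(y_true, y_pred)
print(f"MAE as a share of the mean target: {ratio:.0%}")
```

    This ratio is only meaningful when the target is positive and its mean is well away from zero, which is assumed here for a sales figure.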
     
    #7
  8. Ashish Kumar_33

    Joined:
    Jul 2, 2019
    Messages:
    4
    Likes Received:
    0
    What have you done to treat the outliers? And what definition of outlier have you applied? A lot depends on that. You might have deleted some important data.
     
    #8
  9. Ashish Kumar_33

    Joined:
    Jul 2, 2019
    Messages:
    4
    Likes Received:
    0
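    varcs = df.var(axis=0, numeric_only=True)  # variance of each numeric column
    varcs = varcs[varcs == 0]                  # keep only the entries whose variance is zero
    This will give you all the numeric variables with zero variance.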
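    To go from identifying such columns to removing them, a minimal sketch follows. The DataFrame and its column names are made up for illustration. The `nunique` check is an assumption added here so that constant non-numeric columns, which have no variance as such, are caught as well.

```python
import pandas as pd

# Hypothetical data: "constant" and "city" never change, "a" and "b" do.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "constant": [5, 5, 5, 5],
    "b": [0.1, 0.4, 0.2, 0.9],
    "city": ["x", "x", "x", "x"],
})

# Numeric columns whose variance is exactly zero.
variances = df.var(axis=0, numeric_only=True)
zero_var_cols = variances[variances == 0].index.tolist()

# Any column (numeric or not) that holds a single distinct value.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]

# Drop them and keep only the informative columns.
df_reduced = df.drop(columns=constant_cols)
print(zero_var_cols, constant_cols, list(df_reduced.columns))
```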
     
    #9
  10. Koushik Radhakrishnan

    Joined:
    Apr 29, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Ashish,

    I was not able to attend the 19/10 class, so I was going through the recording, and I would like to know whether 0.96 is the upper confidence level you are referring to. I have quoted the relevant line from the code in the Air Passengers.ipynb file:

    The ACF curve crosses the upper confidence value when the lag value is between 0 and 1. Thus, optimal value of q in the ARIMA model must be 0 or 1

    Regards,
    Koushik R
     
    #10
  11. Koushik Radhakrishnan

    Joined:
    Apr 29, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Ashish,

    Could you please upload the Excel sheet that you showed us while teaching the time series concepts? It would make them easier to remember when I am working on time series.

    Regards,
    Koushik R
     
    #11
  12. RAVI RANJAN CHAUBEY

    Joined:
    Mar 29, 2019
    Messages:
    7
    Likes Received:
    0
    Hello Sir,

    In the simulation test there is a question:

    Can decision trees be used for performing clustering?
    Ans : True

    A decision tree is a supervised learning algorithm, so how can that be?
     
    #12
  13. RAVI RANJAN CHAUBEY

    Joined:
    Mar 29, 2019
    Messages:
    7
    Likes Received:
    0
    Hello Sir,

    In the simulation test there is a question:

    Which of the following is true for white noise?

    I marked "Mean is 0", but according to the answer key this option is wrong. However, you told us in one lecture that the mean is 0 and the variance is constant?
     
    #13
  14. Koushik Radhakrishnan

    Joined:
    Apr 29, 2019
    Messages:
    8
    Likes Received:
    0
    Hi Sir,

    When I try to implement the LIME algorithm, I get the error "LIME does not currently support classifier models without probability scores. If this conflicts with your use case, please let us know: https://github.com/datascienceinc/lime/issues/16".

    I have added the code below. Kindly look into this and assist me.

    Code:
    from lime.lime_tabular import LimeTabularExplainer
    explainer = LimeTabularExplainer(xtrain_sm.values, mode="classification", feature_names=xtrain_sm.columns)
    i = 6
    X_observation = xtest_sm.iloc[[i], :]
    explanation = explainer.explain_instance(X_observation.values[0], lr.predict,num_features=5)

    Note: I have performed SMOTE and fitted a logistic regression model.

    Regards,
    Koushik R
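
    A likely cause of the error above is that `lr.predict` returns hard class labels, while LIME's classification mode expects a function that returns one probability per class, such as scikit-learn's `predict_proba`. The sketch below uses made-up data (the arrays `X` and `y` are hypothetical stand-ins for the SMOTE-resampled training set) to show the difference in output shape. It does not reproduce the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the resampled training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

lr = LogisticRegression().fit(X, y)

labels = lr.predict(X[:3])       # shape (3,): one hard label per row
probs = lr.predict_proba(X[:3])  # shape (3, 2): one probability per class

print(labels.shape, probs.shape)

# With LIME, the prediction function passed to explain_instance would then be
# the probability function rather than lr.predict, for example:
# explanation = explainer.explain_instance(
#     X_observation.values[0], lr.predict_proba, num_features=5)
```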
     
    #14
