ML | Amit Jain |

Discussion in 'Big Data and Analytics' started by Vikas Kumar_18, Jun 14, 2019.

  1. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    124
    Likes Received:
    19
    This is the dedicated community thread for discussion with your peers, the TA, and the trainer.
     
    #1
  2. Vikas Kumar_18

    Vikas Kumar_18 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Dec 17, 2018
    Messages:
    124
    Likes Received:
    19
    #2
    _3292 and Veronica Leong like this.
  3. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    #3
  4. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    My question is about correlation. After plotting a heatmap of the correlations between columns (features), on what basis can we decide whether a particular column is useful for prediction? My understanding so far is that if two columns are correlated (either positively or negatively), we keep only one of them. But I need more clarification on this.
    Do we need to use Pearson correlation for this? I also observed that Pearson correlation is only useful for continuous numeric variables. Is that true?
    In the Titanic data problem, 'sex_male' and 'sex_female' are perfectly negatively correlated, yet we kept both columns for prediction. What criterion did we use there?
    Please help me with this.
    Thank you.
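
A minimal sketch of the 'sex_male' / 'sex_female' situation, using a made-up toy column rather than the real Titanic data. It shows that two complementary dummy columns have a Pearson correlation of -1, so one of them carries no extra information; keeping only one is the usual choice (pandas `get_dummies` has a `drop_first=True` option for this). The array names below are illustrative assumptions, not the course notebook's code.

```python
import numpy as np

# Toy stand-in for the Titanic 'Sex' column (values are illustrative only).
sex = np.array(["male", "female", "female", "male", "male", "female", "male"])

# One-hot encode by hand: each dummy is the complement of the other.
sex_male = (sex == "male").astype(float)
sex_female = (sex == "female").astype(float)

# Pearson correlation between the two dummy columns.
r = np.corrcoef(sex_male, sex_female)[0, 1]
print(f"corr(sex_male, sex_female) = {r:.3f}")

# Because sex_female == 1 - sex_male, one column already holds all the information.
redundant = np.array_equal(sex_female, 1.0 - sex_male)
print("sex_female is fully determined by sex_male:", redundant)

# Keep a single dummy column as the model feature.
features = sex_male.reshape(-1, 1)
```

Pearson correlation can still be computed on 0/1 columns like these, but for multi-level categorical columns it is generally not meaningful without encoding them first.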
     
    #4
  5. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    I am currently working on the Titanic dataset and getting the following confusion matrix:
    array([[142,  19],
           [ 23,  83]], dtype=int64)

    So my question is: how do I interpret this matrix? Which of its entries are TP, FP, TN, and FN?
    Are they 142 - TP, 83 - TN, 19 - FN, and 23 - FP?
    Thank You.
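
A short sketch of how this matrix would be read, assuming it came from scikit-learn's `confusion_matrix` with labels 0 and 1, where 1 ("survived") is treated as the positive class. In that layout the rows are actual classes and the columns are predicted classes, so flattening the matrix gives TN, FP, FN, TP in that order. If the matrix was built some other way, the mapping would differ.

```python
import numpy as np

# Confusion matrix from the post above.
cm = np.array([[142, 19],
               [23, 83]])

# Assumption: scikit-learn layout (rows = actual, columns = predicted),
# classes ordered [0, 1], with 1 as the positive class.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Under that assumption, 142 is TN, 19 is FP, 23 is FN, and 83 is TP.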
     
    #5
  6. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    In the Titanic dataset problem, I fit the model twice, once with the 'Fare' column and once without it. The roc_auc_score is 0.7431094383323683 in the first case (with 'Fare') and 0.83250322278214 in the second case (without 'Fare'). As we discussed in our session, the ROC curve plots the True Positive Rate against the False Positive Rate. From these values, can I say that the model predicts less accurately with 'Fare' as a feature and more accurately without it? Or can I say that the model overfits when 'Fare' is used, and that is why it scores lower on the test data? Am I correct, or am I doing something wrong?
    Thank You.
     
    #6
    Last edited: Jul 5, 2019
  7. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    In the example above, I get auc_score = 0.7431 in the first case (with the 'Fare' column) and auc_score = 0.8325 in the second case (without it). Can I interpret these scores as the fraction of the area under the curve that is covered? That is, the ideal auc_score is 1, or 100% of the area under the curve, so the first case covers about 74% and the second about 83%. Is this interpretation correct?
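
A small sketch of what the AUC number measures, on made-up labels and scores rather than the Titanic results. The area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, so 1.0 is a perfect ranking and 0.5 is chance level. The helper `auc_by_ranking` is a hypothetical name written for this illustration; `roc_auc_score` in scikit-learn computes the same quantity.

```python
import numpy as np

def auc_by_ranking(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 0, 1]

# Scores that separate the classes perfectly.
auc_perfect = auc_by_ranking(y_true, [0.1, 0.2, 0.9, 0.8, 0.3, 0.7])

# Scores with one positive ranked below one negative.
auc_probs = auc_by_ranking(y_true, [0.1, 0.6, 0.8, 0.4, 0.3, 0.9])

print(f"perfect ranking AUC = {auc_perfect:.3f}")
print(f"imperfect ranking AUC = {auc_probs:.3f}")
```

Reading 0.83 as "83% of the area" is numerically fine, but the useful baseline is 0.5 rather than 0, since a random ranking already covers half of the area.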
     
    #7
    Last edited: Jul 6, 2019
  8. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    I am confused about how we choose the root node in a decision tree using entropy and information gain.
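
A minimal sketch of the idea, on a tiny made-up table (not the real Titanic data): compute the entropy of the target, then for each candidate feature compute how much that entropy drops after splitting on it (the information gain), and pick the feature with the largest gain as the root. The function names and rows are assumptions for illustration; scikit-learn's `DecisionTreeClassifier(criterion="entropy")` applies the same criterion, but to binary splits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

# Hypothetical toy rows.
rows = [
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 0},
    {"sex": "male", "pclass": 3, "survived": 0},
    {"sex": "male", "pclass": 1, "survived": 0},
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 1},
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
]

gains = {f: information_gain(rows, f, "survived") for f in ("sex", "pclass")}
root = max(gains, key=gains.get)

for feature, gain in gains.items():
    print(f"{feature}: information gain = {gain:.4f}")
print("root node feature:", root)
```

The same procedure is then repeated inside each branch, using only the rows that reached that branch.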
     
    #8
  9. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    What is the use of the 'n_jobs' parameter when creating a model object?
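
A brief sketch, assuming the models in question are scikit-learn estimators. There, `n_jobs` sets how many parallel jobs (CPU cores) the estimator may use: `1` runs serially, `-1` uses all available cores, and the default `None` normally behaves like `1`. It affects speed, not what the model learns. The dataset below is synthetic and only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, only to have something to fit.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Same model settings; only the number of parallel jobs differs.
serial = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0).fit(X, y)
parallel = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0).fit(X, y)

# Compare the predictions of the two fitted models on the training data.
agreement = (serial.predict(X) == parallel.predict(X)).mean()
print(f"share of identical predictions: {agreement:.2f}")
```

Not every estimator accepts `n_jobs`; it is available mainly where the work splits naturally, such as the trees of a random forest or the folds of a cross-validation.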
     
    #9
  10. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    I am currently working on the income_qualification project and facing some issues. The shape of the dataset is (9557, 143), and it is hard to work out which columns to keep and which to drop. As we discussed in our sessions, correlation can help decide this. But how can I find the correlations among such a large number of columns? Is there any other way to do this?
    Please help me with this.
    Thanks in advance.
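
One possible approach, sketched on synthetic data: with many columns a heatmap is hard to read, so the correlation matrix can be scanned programmatically for pairs above a threshold, dropping the later column of each such pair. The column names and the 0.9 threshold are assumptions for illustration; with a pandas DataFrame, `df.corr()` would provide the same matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 500

# Synthetic columns: two near-duplicates of others, plus independent ones.
a = rng.normal(size=n_rows)
b = rng.normal(size=n_rows)
c = rng.normal(size=n_rows)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.normal(size=n_rows), b, c, -b])
names = ["a", "a_scaled", "b", "c", "b_negated"]  # hypothetical column names

# Pearson correlation between every pair of columns.
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # assumed cut-off; tune for the real data
to_drop = []
for j in range(len(names)):
    for i in range(j):
        if names[i] in to_drop:
            continue  # already removed, no need to compare against it
        if abs(corr[i, j]) > threshold:
            to_drop.append(names[j])
            break

kept = [n for n in names if n not in to_drop]
print("drop:", to_drop)
print("keep:", kept)
```

This only removes redundant features; whether a remaining column is useful for prediction is a separate question, usually checked against the target.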
     
    #10
  11. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    My second question is: how do I check for biases in the dataset? Can you give me a hint on how to approach this?
    Thank You.
     
    #11
  12. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    I am struggling to reduce the number of feature columns in the income_qualification dataset. Based on our class discussions, this falls under feature engineering. The only technique I can recall at the moment is PCA, but I don't know how to use it. I have been looking for the class video that covers it but can't find it. Can you tell me in which session we covered this?
    Thank You.
     
    #12
  13. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Vikas Kumar Sir,
    Is this link still active? Should I post further questions here?
     
    #13
  14. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    #14
  15. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    I have checked all of our session recordings for the feature engineering discussion but could not find one. In session 5 or 6, you said that you would cover this topic after completing 'Supervised Learning', but I think we forgot to discuss it.
    So, can you cover it in the upcoming session?
    Without feature engineering I can't proceed further with my project.
    Thanks.
     
    #15
  16. sharmistha datta

    Joined:
    Apr 21, 2016
    Messages:
    3
    Likes Received:
    0
    Hi, while discussing machine learning techniques, you mentioned "Categorization" under "Machine Learning Techniques". Do you have any code that illustrates this?
     

    #16
  17. sharmistha datta

    Joined:
    Apr 21, 2016
    Messages:
    3
    Likes Received:
    0
    I want to read about the model life cycle for machine learning. Could you please suggest any study material on it? Thanks.
     
    #17
  18. sharmistha datta

    Joined:
    Apr 21, 2016
    Messages:
    3
    Likes Received:
    0
    Hi, I have checked with the Simplilearn team about the project submission deadline. They said that I can submit any time before my course expiry date. As I will not be able to attend tomorrow's session, I will post any doubts I have here while working on the project submission.
     
    #18
  19. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    What exactly does "Check for biases in the dataset" mean? Does it mean "Check for outliers in the feature columns"?
    I am extremely sorry for troubling you with questions again and again.
    --Tejas
     
    #19
  20. Amit_505

    Amit_505 Member
    Trainer

    Joined:
    Jan 8, 2018
    Messages:
    3
    Likes Received:
    1
    Hi Tejas,

    It means doing exploratory data analysis to see whether the algorithm tends to consistently learn the wrong thing, either by failing to take the complete information into account or by taking wrong information (noise) into account. It is part of data pre-processing.
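
One common exploratory check that fits this description is looking at how the target classes are distributed, since a heavily skewed target can push a model toward always predicting the majority class. This is an assumed reading of the advice, not something stated in the thread, and the labels below are made up for illustration.

```python
from collections import Counter

# Hypothetical target labels; the real project column is not shown in the thread.
target = [4, 4, 4, 2, 4, 3, 4, 1, 4, 2, 4, 4]

counts = Counter(target)
total = len(target)

# Share of rows per class, in label order.
shares = {label: count / total for label, count in sorted(counts.items())}
for label, share in shares.items():
    print(f"class {label}: {counts[label]} rows ({share:.0%})")

# A large ratio between the biggest and smallest class signals imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"largest class is {imbalance_ratio:.1f}x the smallest")
```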
     
    #20
    _3292 likes this.
  21. Amit_505

    Amit_505 Member
    Trainer

    Joined:
    Jan 8, 2018
    Messages:
    3
    Likes Received:
    1
    As we discussed in this class
     
    #21
  22. Amit_505

    Amit_505 Member
    Trainer

    Joined:
    Jan 8, 2018
    Messages:
    3
    Likes Received:
    1
    This does not look like an error, but rather a warning. It essentially means that some of the functions/libraries being used may have been upgraded, so it prompts you to update them. But it is just a warning and ca
    Alright, Sharmi. Thanks. Wish you the best.
     
    #22
  23. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    In which situations do we use standardization, and in which do we use normalization?
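
A small sketch of the two transformations on a made-up column, to make the difference concrete. Standardization (z-score) rescales to mean 0 and standard deviation 1; min-max normalization rescales to the range 0 to 1. As a rough rule of thumb rather than a fixed rule, standardization is often preferred when the data contains outliers or the algorithm assumes centered features, and min-max scaling when a bounded range is needed. In scikit-learn these correspond to `StandardScaler` and `MinMaxScaler`.

```python
import numpy as np

# Made-up feature values with one large outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Standardization: subtract the mean, divide by the standard deviation.
standardized = (x - x.mean()) / x.std()

# Min-max normalization: map the minimum to 0 and the maximum to 1.
normalized = (x - x.min()) / (x.max() - x.min())

print("standardized:", np.round(standardized, 3))
print("normalized:  ", np.round(normalized, 3))
```

With min-max scaling the outlier squeezes the other four values close to 0, which is one reason the choice depends on the data.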
     
    #23
  24. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    In our 'Income Qualification' project there is a column named 'dependency' that is categorical in nature (yes and no), but it also contains some numerical values (.5, 2, 1.5, .33333334, and so on).
    So, should I keep it as it is?
    My intuition is that after normalization all of those values will be converted into the range 0 to 1. Am I correct?
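
A sketch of one way to handle such a mixed column, using a few made-up values in the same style as those quoted above. A scaler cannot process the strings "yes" and "no", so they have to be converted to numbers first. The mapping yes -> 1 and no -> 0 used here is an assumption; the dataset's own documentation should be checked before relying on it.

```python
import numpy as np

# Made-up sample of the mixed 'dependency' column, stored as strings.
raw = ["yes", "no", ".5", "2", "1.5", ".33333334", "no", "yes"]

# Assumption: "yes" means 1 and "no" means 0; verify against the data description.
mapping = {"yes": 1.0, "no": 0.0}
numeric = np.array([mapping[v] if v in mapping else float(v) for v in raw])

# Only now can min-max normalization be applied.
scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())

print("numeric:", numeric)
print("scaled: ", np.round(scaled, 3))
```

After the conversion, min-max normalization does bring every value into the range 0 to 1, as the post expects.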
     
    #24
    Last edited: Jul 16, 2019
  25. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Sir,
    After applying PCA to the income qualification project dataset, my feature columns are drastically reduced, from 143 to 50. But the problem now is that the new data frame has no column names. How do I interpret this new data frame? I am attaching screenshots of the previous df (before applying PCA) and the new df (after applying PCA).
    1. DF before PCA:
    [IMG]


    2. DF after PCA:
    [IMG]

    So what should I do next in this situation? Or shall I start building the model?
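
A minimal sketch of why the columns lose their names, using random data and NumPy's SVD instead of the project dataset. Each principal component is a weighted mix of all the original columns, so it has no single original name; the usual convention is to label the outputs PC1, PC2, and so on, and to inspect the weights (loadings) to see which original features dominate each one. The feature names and the choice of 3 components are assumptions; scikit-learn's `PCA` exposes the same information through `components_` and `explained_variance_ratio_`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # make two columns nearly redundant
feature_names = [f"feature_{i}" for i in range(6)]  # hypothetical names

# Standardize so every column contributes on the same scale.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via singular value decomposition of the standardized data.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

k = 3  # assumed number of components to keep
scores = Xs @ Vt[:k].T  # the new data frame: one column per component
component_names = [f"PC{i + 1}" for i in range(k)]
loadings = Vt[:k]  # rows = components, columns = original features

for name, row, ratio in zip(component_names, loadings, explained_ratio):
    top = feature_names[int(np.argmax(np.abs(row)))]
    print(f"{name}: {ratio:.1%} of variance, largest loading on {top}")
```

The component columns can be passed to a model directly as features; interpretability is reduced, but the loadings show how each component relates to the original columns.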
     
    #25
  26. _3292

    _3292 TEJAS_PHASE
    Alumni

    Joined:
    Nov 2, 2016
    Messages:
    33
    Likes Received:
    3
    Hello Amit Sir,
    I want to submit my project, but Simplilearn's submission system only accepts source code in pdf, doc, ppt, jpg, or png format. I tried to convert my .ipynb notebook to PDF (via LaTeX) using Jupyter, but it gives me an error.
    What should I do in this case?
    Waiting for your reply.
     
    #26
