Machine Learning | Kaustubh Sakhare

Discussion in 'Big Data and Analytics' started by Nishant_Singh, Mar 29, 2020.

  1. Nishant_Singh

    Nishant_Singh Well-Known Member
    Staff Member Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    307
    Likes Received:
    107
    #1
  2. Subham Sarkar

    Subham Sarkar Member

    Joined:
    Nov 8, 2019
    Messages:
    11
    Likes Received:
    0
  3. Aditya Vyas_1

    Aditya Vyas_1 Active Member

    Joined:
    Jan 9, 2020
    Messages:
    19
    Likes Received:
    0
    hello all!
     
    #3
  4. Lu Zhao

    Lu Zhao New Member

    Joined:
    Jun 30, 2016
    Messages:
    1
    Likes Received:
    0
    hello
     
    #4
  5. Aditya Vyas_1

    Aditya Vyas_1 Active Member

    Joined:
    Jan 9, 2020
    Messages:
    19
    Likes Received:
    0
    Hello Kaustubh

    Kindly help me with following two queries:


    1) Is there any comprehensive list where we can go through all the kinds of packages we can download?
    e.g. from sklearn.metrics, sklearn.linear_model... how many more are there?


    2) The .describe function gives output as 25%, 50%, 75% and max. What does it mean? What is the relevance of these figures? I have not been able to understand.
     
    #5
  6. Gautham M C

    Gautham M C New Member

    Joined:
    Oct 25, 2019
    Messages:
    1
    Likes Received:
    0
    I have downloaded the recordings and was going through, looks like Session 4 recording is updated in place of Session 3. Please check and update session 3 recordings
     
    #6
  7. Aditya Vyas_1

    Aditya Vyas_1 Active Member

    Joined:
    Jan 9, 2020
    Messages:
    19
    Likes Received:
    0
    In the following code (for normalization before performing logistic regression), given along with the lesson 4 insurance claim project, why has the mean been calculated only on the train data? Since we are normalizing the test data too, shouldn't the mean and stds of the test data be used for it?
    Can we also take the mean and stds of the entire data?
    Kindly explain this.

    upload_2020-4-8_16-25-57.png
     
    #7
  8. PRATYUSHA VUNNAVA

    Customer

    Joined:
    Jan 29, 2020
    Messages:
    1
    Likes Received:
    0
    Hi,

    I see no header row in the Boston dataset. How do we assign column names in such a case?
     
    #8
  9. Subham Sarkar

    Subham Sarkar Member

    Joined:
    Nov 8, 2019
    Messages:
    11
    Likes Received:
    0
  10. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Hello Shubham
     
    #10
  11. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Hello Aditya
     
    #11
  12. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Hello Lu Zhao
     
    #12
  13. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0

    It describes how the sample values in our dataset are distributed.

    For example, let's say I have 450 samples in total.

    The 25%, 50%, and 75% figures give the values below which that share of the samples falls.
     
    #13
  14. Aditya Vyas_1

    Aditya Vyas_1 Active Member

    Joined:
    Jan 9, 2020
    Messages:
    19
    Likes Received:
    0
    WhatsApp Image 2020-04-18 at 15.19.53.jpeg


    this is the code where we compressed tiger.png...
    pls explain what this for loop is doing.
     
    #14
  15. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    Sir, could you explain the cyclic seasonal pattern part and the decomposition part?
     
    #15
  16. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    Hello sir, could you please review my query and tell me about the cyclic seasonal patterns and the decomposition part in time series?
     
    #16
  17. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    data['Month'] = data['Month'].apply(lambda x: dt(int(x[:4]), int(x[5:]), 15))

    Sir, could you elaborate on this line? When I use 3 in place of int(x[:4]), I get a different output, which I have attached, and if I use 6 or any other number in place of int(x[5:]), it throws an error:
    ValueError: month must be in 1..12

    Why is that? Could you explain the meaning of this line and why it is throwing an error?
     

    Attached Files:

    #17
  18. Nishant_Singh

    Nishant_Singh Well-Known Member
    Staff Member Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    307
    Likes Received:
    107
  19. Shuvankar Goutam

    Shuvankar Goutam Active Member

    Joined:
    May 6, 2019
    Messages:
    15
    Likes Received:
    0
    Hi, can I have the final assessment supporting video link for the Mercedes-Benz Greener Manufacturing project, please?
     
    #19
  20. KUNTAL CHOWDHURY

    Joined:
    Oct 14, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Kaustubh,

    ML - Project 1 - Mercedes-Benz Greener Manufacturing

    # 1

    after doing PCA i am getting the below :
    print(X_train.shape) =========================> (6734, 579)
    print(X_train_transformed.shape) ===============> (6734, 138)
    print(X_test.shape) ==========================> (1684, 579)
    print(X_test_transformed.shape) =================> (1684, 138)

    Here the dimension is reduced to 138, but should I pass all these 138 columns to the regression model? Could you please suggest what to do here?
    NOTE: all these 138 columns are binary (1 or 0)

    # 2

    Also, should we build Ridge / Lasso / ElasticNet Regression with XGBoost?

    Thanks,
    Kuntal
     
    #20
  21. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Dear Raghvendra, I have added the document to the additional documents folder on Google Drive.

    The PDF also gives a link to a website with the derivation for Time Series Analysis.
     
    #21
  22. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0

    Dear Aditya

    1) There is no particular guideline or limitation on the packages used in this regard.

    Normally, for the curriculum, the sklearn package can be explored in great detail.

    There are a few things to take into account:

    1. Packages required for incorporating various ML models, regressors, and classifiers
    2. Packages required for preprocessing, dataset selection, and cross-validation
    3. Packages required for performance analysis




    2) The describe function gives descriptive statistics about the dataset,
    e.g. the values below which 25%, 50%, and 75% of the data fall.
    The same can be explored using various plots too.
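
    As a rough illustration of what those rows mean, here is a minimal sketch on a made-up column (the name "value" and the numbers 1 to 9 are assumptions, not taken from the course data). It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical sample: the integers 1..9 in a single column named "value".
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# describe() returns count, mean, std, min, 25%, 50%, 75% and max.
summary = df["value"].describe()
print(summary)

# 25% of the values lie at or below q1, 50% at or below the median,
# and 75% at or below q3.
q1, median, q3 = summary["25%"], summary["50%"], summary["75%"]
```

    The 50% row is the median, and the gap between the 25% and 75% rows shows how spread out the middle half of the data is.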
     

    Attached Files:

    #22
  23. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0

    Thanks for the observation, Gautham.

    I will share the concern with the coordination team.
     
    #23
  24. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Dear Aditya, we have chosen to generate between 1 and 16 clusters.

    Once the pixel values lie in the range of a cluster,

    the loop will give us the optimum number of clusters required for the dataset.
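
    The loop from the attached image is not reproduced in this thread, so the following is only a sketch of one common pattern, assuming scikit-learn's KMeans is used: fit the model for each cluster count from 1 to 16 and record the inertia, then compress by replacing each pixel with its cluster centre. The random pixels stand in for tiger.png and are an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the image: 300 random RGB pixels, one row per pixel.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(300, 3)).astype(float)

inertias = []
for k in range(1, 17):  # try 1 to 16 clusters
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # inertia_ is the sum of squared distances from each pixel
    # to its nearest cluster centre
    inertias.append(model.inertia_)

# Compression idea: replace every pixel with the centre of its cluster,
# so the image uses at most 16 distinct colours.
model = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)
compressed = model.cluster_centers_[model.labels_]
```

    Plotting inertias against the cluster count and looking for the point where the curve flattens (the "elbow") is one way to choose the number of clusters.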
     
    #24
  25. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Yes Aditya, we can normalize the test data too.
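
    On why the mean is taken from the train data only: the usual practice is to estimate the mean and std on the training set and reuse those same values on the test set, so that nothing about the test data leaks into preprocessing. A minimal sketch with made-up numbers (the arrays below are assumptions, not the insurance claim data):

```python
import numpy as np

# Hypothetical feature matrices (rows = samples, columns = features).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Statistics are estimated from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# The same mean and std are then applied to both sets.
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std
```

    Using the mean and std of the entire data would let the test set influence the scaling, which tends to make the test score look better than it would on truly unseen data.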
     
    #25
  26. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Replied. Also check the additional material folder on Google Drive.
     
    #26
  27. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Dear Kuntal

    The PCA part is working appropriately.

    The columns can be sorted and passed to the regression analysis.

    I would recommend checking the multicollinearity present in the dataset using a technique like a heat map,

    and then you may apply Ridge/Lasso with XGBoost.
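
    A heat map is usually drawn from a correlation matrix, so a minimal sketch of that check could look like this. The columns "a", "b", "c" and the 0.9 threshold are assumptions for illustration only, not part of the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical features: "b" is almost a copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

corr = df.corr().abs()  # absolute pairwise Pearson correlations

# Collect feature pairs whose correlation exceeds a hypothetical threshold.
threshold = 0.9
cols = list(corr.columns)
high_pairs = [
    (cols[i], cols[j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if corr.iloc[i, j] > threshold
]
print(high_pairs)
```

    If seaborn is available, passing corr to seaborn.heatmap draws the heat map itself. Highly correlated pairs are the ones where Ridge or Lasso regularization tends to help.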
     
    #27
  28. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    The lambda function is written to convert the value into the date format.
    The month is read starting at index position 5.

    If you change the positions, it will give an error or a wrong answer.
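
    To make the slicing concrete, here is a small sketch. It assumes that dt is datetime.datetime and that the Month column holds strings such as "1949-01"; both are assumptions, since the import and the data are not shown in the thread.

```python
from datetime import datetime as dt  # assumption: the lesson imports dt this way

def to_date(x):
    # x is assumed to look like "1949-01": characters 0-3 are the year,
    # character 4 is the dash, and everything from index 5 on is the month.
    return dt(int(x[:4]), int(x[5:]), 15)  # the day is fixed to the 15th

parsed = to_date("1949-01")

# Slicing with x[:3] keeps only "194", so the year becomes 194 instead of 1949.
short_year = dt(int("1949-01"[:3]), 1, 15).year

try:
    bad = "1949-10"
    # bad[6:] is "0", and month 0 is outside the allowed range 1..12.
    dt(int(bad[:4]), int(bad[6:]), 15)
    month_error = None
except ValueError as exc:
    month_error = str(exc)
```

    So a shifted year slice gives a wrong date without any error, while a shifted month slice can produce a number outside 1..12 and raise the ValueError.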

     
    #28
  29. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Dear Pratyusha

    explore the syntax

    import pandas as pd

    # header=None: the file has no header row; names= supplies the column names
    Frame = pd.read_csv('Filename', header=None, names=["Column1", "Column2", "Column3", "Column4"])

    Frame.to_csv("Outputfilename", sep='\t')
     
    #29
  30. Kaustubh Sakhare

    Kaustubh Sakhare Active Member
    Alumni

    Joined:
    Jul 2, 2019
    Messages:
    24
    Likes Received:
    0
    Can you share the output of the line before this cell,

    or try to print

    df['pickup_datetime']

    I hope it is working appropriately.
     
    #30
  31. KUNTAL CHOWDHURY

    Joined:
    Oct 14, 2019
    Messages:
    4
    Likes Received:
    0
    Thanks Kaustubh.
     
    #31
  32. Raghvendra Singh Chauhan

    Raghvendra Singh Chauhan Active Member

    Joined:
    Sep 30, 2019
    Messages:
    18
    Likes Received:
    0
    Sir, in the Mercedes-Benz project there are two datasets, one train and one test. Do we need to split them further into x_train, y_train, x_test, and y_test?
     
    #32
  33. Aditya Vyas_1

    Aditya Vyas_1 Active Member

    Joined:
    Jan 9, 2020
    Messages:
    19
    Likes Received:
    0
    Hello Kaustubh

    I had submitted the project on 2-May-2020. It has still not been assessed and the certificate is still locked. Please have the project assessed so that my certificate is unlocked.
     
    #33
  34. Shuvankar Goutam

    Shuvankar Goutam Active Member

    Joined:
    May 6, 2019
    Messages:
    15
    Likes Received:
    0
    Hi Kaustubh

    Can you please guide me on how to handle outliers with a winsorizer? What library is required for that? Can you please provide demo code in your reply?
     
    #34
