Machine Learning - Label Encoding

Discussion in 'Masters Program - Customers only' started by Dipesh_Deb, Jan 15, 2020.

  1. Dipesh_Deb

    Dipesh_Deb Guest

    Machine Learning
    1) For label encoding, do I need to merge the Train and Test data?
    If so,
    2) Since I drop the columns whose variance is 0 from the Train DataFrame, must I drop the same columns from the Test DataFrame as well?
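A minimal sketch of one common way to handle question 2, assuming both frames are pandas DataFrames with the same feature columns: decide which columns to drop by looking at the training data only, then drop exactly that list from both frames so their columns stay aligned. The helper `zero_variance_columns` and the toy data are hypothetical, made up for illustration.

```python
import pandas as pd


def zero_variance_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns holding a single distinct value (zero variance)."""
    counts = df.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()


# Hypothetical toy data: X2 is constant in train but not in test.
train_df = pd.DataFrame({"X1": [1, 0, 1], "X2": [0, 0, 0], "X3": ["a", "b", "a"]})
test_df = pd.DataFrame({"X1": [0, 1, 1], "X2": [0, 1, 0], "X3": ["b", "b", "c"]})

# Decide on the training data only, then apply the same list to both frames.
to_drop = zero_variance_columns(train_df)
train_df = train_df.drop(columns=to_drop)
test_df = test_df.drop(columns=to_drop)

print(to_drop)                 # ['X2']
print(list(train_df.columns))  # ['X1', 'X3']
print(list(test_df.columns))   # ['X1', 'X3']
```

Dropping the same list from both frames matters because a model fitted on the reduced training columns expects exactly those columns at prediction time.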
     
  2. Dipesh Deb

    Dipesh Deb Member
    Alumni

    Joined:
    Oct 19, 2015
    I need to update my questions as follows (please help):
    Machine Learning
    1) The zero-variance columns differ between the Train and Test DataFrames:

    Train DataFrame zero-variance columns: ['X11' 'X93' 'X107' 'X233' 'X235' 'X268' 'X289' 'X290' 'X293' 'X297' 'X330' 'X347']
    Test DataFrame zero-variance columns: ['X257' 'X258' 'X295' 'X296' 'X369']

    If I combine both DataFrames, there are NO zero-variance columns.

    Now the question is which of these options gives the best model:

    (1) Drop the Test DataFrame's zero-variance columns from both DataFrames > ['X257' 'X258' 'X295' 'X296' 'X369']

    (2) Drop the Train DataFrame's zero-variance columns from both DataFrames > ['X11' 'X93' 'X107' 'X233' 'X235' 'X268' 'X289' 'X290' 'X293' 'X297' 'X330' 'X347']

    (3) Drop the zero-variance columns of both the Train and the Test DataFrame from both DataFrames > ['X257' 'X258' 'X295' 'X296' 'X369'] + ['X11' 'X93' 'X107' 'X233' 'X235' 'X268' 'X289' 'X290' 'X293' 'X297' 'X330' 'X347']

    (4) Combine both DataFrames first, and then drop any zero-variance columns found.
    (Here there are NO zero-variance columns if I combine both DataFrames first.)
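To see how the four options differ, here is a minimal sketch on hypothetical toy frames (the column names `A`, `B`, `C` and the helper `zero_variance_columns` are made up) that reproduces the situation described: each frame has its own constant column, while the combined data has none.

```python
import pandas as pd


def zero_variance_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns holding a single distinct value (zero variance)."""
    counts = df.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()


# Hypothetical toy data: A is constant only in train, B only in test.
train_df = pd.DataFrame({"A": [5, 5, 5], "B": [1, 2, 3], "C": [0, 1, 0]})
test_df = pd.DataFrame({"A": [5, 6], "B": [7, 7], "C": [1, 0]})

test_only = zero_variance_columns(test_df)                # option (1)
train_only = zero_variance_columns(train_df)              # option (2)
union = sorted(set(train_only) | set(test_only))          # option (3)
combined = zero_variance_columns(
    pd.concat([train_df, test_df], ignore_index=True)     # option (4)
)

print(test_only)   # ['B']
print(train_only)  # ['A']
print(union)       # ['A', 'B']
print(combined)    # []
```

Which option yields the best model is not settled by this sketch. Options (1), (3) and (4) all let the test data influence which features are kept; a common convention is to base such preprocessing decisions on the training data alone, which corresponds to option (2).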


    2) For label encoding, do I need to merge the Train and Test data first and then label encode?
     
  3. Yajna Brata Brahma

    Joined:
    Jun 25, 2019
    Hi Dipesh,

    Label encoding has to be done on the object columns after we drop the zero-variance columns:

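    # Apply label encoding to the object (categorical) columns.
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    # Non-numeric columns; assumes test_df has the same categorical columns.
    label_columns = train_df.select_dtypes(exclude="number").columns
    le = LabelEncoder()
    for col in label_columns:
        # Fit on train + test values (Series.append was removed in pandas 2.0),
        # so categories that appear only in test_df still get a code.
        le.fit(pd.concat([train_df[col], test_df[col]]))
        train_df[col] = le.transform(train_df[col])
        test_df[col] = le.transform(test_df[col])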
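A small usage sketch of the loop above, on hypothetical toy frames where the category "c" appears only in the test set. It shows why the encoder is fitted on the train and test values together.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames: category "c" in X0 appears only in the test set.
train_df = pd.DataFrame({"X0": ["a", "b", "a"], "X1": ["x", "x", "y"]})
test_df = pd.DataFrame({"X0": ["b", "c"], "X1": ["y", "x"]})

label_columns = train_df.select_dtypes(exclude="number").columns
le = LabelEncoder()
for col in label_columns:
    # Classes are the sorted union of train and test values.
    le.fit(pd.concat([train_df[col], test_df[col]]))
    train_df[col] = le.transform(train_df[col])
    test_df[col] = le.transform(test_df[col])

print(train_df["X0"].tolist())  # [0, 1, 0]
print(test_df["X0"].tolist())   # [1, 2]
```

If the encoder were fitted on `train_df` alone, `le.transform(test_df["X0"])` would raise a `ValueError` because "c" is an unseen label. scikit-learn's `OrdinalEncoder` with `handle_unknown="use_encoded_value"` is an alternative that fits on the training data only and maps unseen categories to a chosen `unknown_value`.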
     
