ML

Discussion in 'General Discussions' started by _44015, Oct 19, 2018.

  1. _44015

    _44015 New Member

    Joined:
    Oct 15, 2018
    Messages:
    1
    Likes Received:
    0
    Is there any alternative to OneHotEncoder?
     
    #1
  2. Nishant_Singh

    Nishant_Singh Well-Known Member
    Simplilearn Support

    Joined:
    Aug 1, 2018
    Messages:
    166
    Likes Received:
    9
    Hi Learner,

    As far as I know, there are 7 widely used encoding methods. They are as follows:

    1. Ordinal: each category is mapped to an integer (for example 1, 2, 3), which implies an order between the categories.

    2. One-Hot: one column per category, with a 1 in a cell if the row belongs to that column’s category and a 0 otherwise.

    3. Binary: first the categories are encoded as ordinal, then those integers are converted into binary code, then the digits from that binary string are split into separate columns. This encodes the data in fewer dimensions than one-hot, but with some distortion of the distances.

    4. Sum: compares the mean of the dependent variable for a given level to the overall mean of the dependent variable over all the levels. That is, it uses contrasts between each of the first k-1 levels and level k. For example, with k=4 levels, level 1 is compared to all the others, level 2 to all the others, and level 3 to all the others.

    5. Polynomial: The coefficients taken on by polynomial coding for k=4 levels are the linear, quadratic, and cubic trends in the categorical variable. The categorical variable here is assumed to be represented by an underlying, equally spaced numeric variable. Therefore, this type of encoding is used only for ordered categorical variables with equal spacing.

    6. Backward Difference: the mean of the dependent variable for a level is compared with the mean of the dependent variable for the prior level. This type of coding may be useful for a nominal or an ordinal variable.

    7. Helmert: the mean of the dependent variable for a level is compared to the mean of the dependent variable over all previous levels. This is why it is sometimes called ‘reverse’ Helmert coding, to distinguish it from forward Helmert coding.
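
    To make the first three methods concrete, here is a minimal sketch in plain Python. The `colors` column and all variable names are made up for illustration, and the alphabetical ordering used for the ordinal codes is just an assumption; real libraries may number the categories differently.

```python
# Hypothetical toy column; the values are made up for illustration.
colors = ["red", "green", "blue", "green"]

# Ordinal: give each distinct category an integer.
# Alphabetical order is an arbitrary choice made for this sketch.
categories = sorted(set(colors))
code_of = {c: i for i, c in enumerate(categories)}
ordinal = [code_of[c] for c in colors]

# One-hot: one 0/1 column per category.
one_hot = [[int(c == cat) for cat in categories] for c in colors]

# Binary: write each ordinal code in base 2, one column per bit
# (most significant bit first).
n_bits = max(1, (len(categories) - 1).bit_length())
binary = [[(code >> b) & 1 for b in reversed(range(n_bits))] for code in ordinal]

print(ordinal)
print(one_hot)
print(binary)
```

    With three categories, one-hot needs three columns while binary needs only two, which is the dimension saving mentioned in point 3.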

    You can use any one of the above as per your requirement.
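
    The Sum, Backward Difference and Helmert methods are contrast codings: each of the k levels is replaced by a row of k-1 numbers. The sketch below builds those rows for a small k under one common convention. The function names are hypothetical, and libraries may scale or order the columns differently, so treat it as an illustration rather than a reference implementation.

```python
from fractions import Fraction

def sum_contrast(k):
    """Sum coding: level i gets a 1 in column i; the last level is -1 everywhere."""
    rows = [[1 if i == j else 0 for j in range(k - 1)] for i in range(k - 1)]
    rows.append([-1] * (k - 1))
    return rows

def backward_difference_contrast(k):
    """Column j (1-based) contrasts level j+1 with level j."""
    return [
        [Fraction(-(k - j), k) if i <= j else Fraction(j, k) for j in range(1, k)]
        for i in range(1, k + 1)
    ]

def helmert_contrast(k):
    """Reverse Helmert: column j contrasts level j+1 with all previous levels."""
    return [
        [-1 if i <= j else (j if i == j + 1 else 0) for j in range(1, k)]
        for i in range(1, k + 1)
    ]

# Rows are levels, columns are the k-1 contrast variables.
for name, build in [("sum", sum_contrast),
                    ("backward difference", backward_difference_contrast),
                    ("helmert", helmert_contrast)]:
    print(name)
    for row in build(3):
        print("  ", [str(x) for x in row])
```

    Each column in these matrices sums to zero, which is what makes the fitted coefficients comparisons between level means rather than the raw means themselves.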

    There are others as well, but I believe that one of the above 7 will help you. In data science there are many ways of approaching a problem, so apart from my suggestion you can use any other approach as well; the choice depends on your data and your model.

    I hope that this will help you.

    Regards,
    Nishant Singh
    Global Teaching Assistant
     
    #2
