Marketing Analysis Project

Discussion in 'Big Data and Analytics' started by Ritam Mishra, Sep 13, 2017.

  1. Ritam Mishra

    Hi,

    Every time I create a DataFrame and run it, I get the following error:

    17/09/13 20:19:37 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, ec2-52-202-171-254.compute-1.amazonaws.com): java.lang.NumberFormatException: For input string: """"age"""

    I tried removing the header row both before and after creating the DataFrame, but I keep getting the same error.

    The code is as follows:

    scala> val file = sc.textFile("/user/ritamskmishra_gmail/bank-full edited.csv")

    file.take(2)
    res1: Array[String] = Array("""age"";""job"";""marital"";""education"";""default"";""balance"";""housing"";""loan"";""contact"";""day"";""month"";""duration"";""campaign"";""pdays"";""previous"";""poutcome"";""y""", "58;""management"";""married"";""tertiary"";""no"";2143;""yes"";""no"";""unknown"";5;""may"";261;1;-1;0;""unknown"";""no""")

    case class Bank(age:Int, job:String, marital:String, education:String, default:String, balance:Int, housing:String, loan:String, contact:String, day:Int, month:String, duration:Int, campaign:Int, pdays:Int, previous:Int, poutcome:String, y:String)

    val input_split = file.map(x => x.split(";"))

    val bankrdd = input_split.map(x => Bank(x(0).toInt, x(1), x(2), x(3), x(4), x(5).toInt, x(6), x(7), x(8), x(9).toInt, x(10), x(11).toInt, x(12).toInt, x(13).toInt, x(14).toInt, x(15), x(16)))

    val bankDF = bankrdd.toDF()

    bankDF.first()
     
    #1
  2. Sridhar_57

    The input data has some extra quote characters. You need to cleanse the data by removing the unwanted quotes from the input.
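
    A minimal sketch of that cleansing step in plain Scala, so it can be tried without Spark. It assumes every double-quote character in the file is unwanted and that no field contains a semicolon. `QuoteCleaning` and `cleanFields` are illustrative names, not part of any library, and the sample row is the one shown in the first post.

```scala
object QuoteCleaning {
  // Drop every double-quote character, then split on the ';' delimiter.
  // Assumption: quotes never carry meaning inside a field.
  def cleanFields(line: String): Array[String] =
    line.replace("\"", "").split(";")

  // Data row from the first post, written with ' in place of " for
  // readability and converted back, so the literal needs no escaping.
  val rawRow: String =
    "'58;''management'';''married'';''tertiary'';''no'';2143;''yes'';''no'';''unknown'';5;''may'';261;1;-1;0;''unknown'';''no'''"
      .replace('\'', '"')

  def main(args: Array[String]): Unit = {
    val fields = cleanFields(rawRow)
    println(fields.mkString(" | "))
    // The first field converts to Int once the leading quote is gone.
    println(fields(0).toInt)
  }
}
```

    In the spark-shell session from the first post, the same function should be usable in place of `x.split(";")` inside `file.map(...)`.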
     
    #2
  3. Vinayak Nayak

    How do I remove the unwanted quotes from the input data? I am getting the same kind of error. I have also tried the second approach of loading the file directly into a DataFrame, but I think the dataset needs to be cleaned first, and I am not sure how to do that. It would be great if someone could help.
     
    #3
  4. AnupriyaT

    Sridhar,
    Yes, you're right. We need to clean the input dataset.

    Ritam,
    Please find the detailed explanation below.

    First of all, kudos to you, for what you've tried.

    There is no need to escape quotes and it is not good to do so because regular expressions are not the same everywhere. However, it is necessary to check if all the fields are enclosed in the same way, and the entire row is also enclosed by quotes.

    Here is the row that needs to be interpreted as the first row:

    "age;""job"";""marital"";""education"";""default"";""balance"";""housing"";""loan"";""contact"";""day"";""month"";""duration"";""campaign"";""pdays"";""previous"";""poutcome"";""y"""

    Include the quotes around the age field so that the data can be interpreted correctly. Replace the above line in the file with the one below:

    """age"";""job"";""marital"";""education"";""default"";""balance"";""housing"";""loan"";""contact"";""day"";""month"";""duration"";""campaign"";""pdays"";""previous"";""poutcome"";""y"""

    Try your commands now and let us know what happened.

    Please refer to the thread below for more insight on this:
    http://community.simplilearn.com/th...ect-1-for-submission-val-df.24098/#post-45742
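
    As an alternative to editing the file by hand, here is a hedged sketch of a parser that tolerates the header row. It is plain Scala, it assumes the 17-field layout of the `Bank` case class from the first post, and it treats any line whose numeric fields do not convert as a line to skip. `BankParsing` and `parseLine` are hypothetical names chosen for illustration.

```scala
import scala.util.Try

object BankParsing {
  // Same fields, in the same order, as the Bank case class in the first post.
  case class Bank(age: Int, job: String, marital: String, education: String,
                  default: String, balance: Int, housing: String, loan: String,
                  contact: String, day: Int, month: String, duration: Int,
                  campaign: Int, pdays: Int, previous: Int, poutcome: String,
                  y: String)

  // Returns None for any line that does not yield 17 fields with valid
  // integers in the numeric positions; the header row falls in this group.
  def parseLine(line: String): Option[Bank] = {
    val x = line.replace("\"", "").split(";")
    if (x.length != 17) None
    else Try(
      Bank(x(0).toInt, x(1), x(2), x(3), x(4), x(5).toInt, x(6), x(7), x(8),
           x(9).toInt, x(10), x(11).toInt, x(12).toInt, x(13).toInt,
           x(14).toInt, x(15), x(16))
    ).toOption
  }

  // The two lines shown in the first post, with ' standing in for "
  // so that the literals need no escaping.
  val header: String =
    "'''age'';''job'';''marital'';''education'';''default'';''balance'';''housing'';''loan'';''contact'';''day'';''month'';''duration'';''campaign'';''pdays'';''previous'';''poutcome'';''y'''"
      .replace('\'', '"')
  val row: String =
    "'58;''management'';''married'';''tertiary'';''no'';2143;''yes'';''no'';''unknown'';5;''may'';261;1;-1;0;''unknown'';''no'''"
      .replace('\'', '"')

  def main(args: Array[String]): Unit = {
    println(parseLine(header)) // header is skipped
    println(parseLine(row))    // data row becomes a Bank
  }
}
```

    On the RDD from the first post, something like `file.flatMap(line => BankParsing.parseLine(line)).toDF()` should be the analogous call. Note that silently dropping unparseable lines is a design choice; counting them first may be safer on a dataset you do not know well.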
     
    #4
  5. Ritam Mishra

    Thank you all! I was able to create the DataFrame successfully.
     
    #5
  6. AnupriyaT

    Cheers Ritam :)
     
    #6
  7. NAVIN KUMAR_3

    I am not able to create the DataFrame.
     
    #7
