I'm working in Project 1 for submission. val df = ...

Discussion in 'Big Data and Analytics' started by shefali bisht, Jul 26, 2017.

  1. shefali bisht

    shefali bisht Member

    I'm working on Project 1 for submission.
    val df = sqlContext.read.json("hdfs:///user/shefalibisht00_gmail/project1/marketingData.json")

    This gives an error:
    java.io.IOException: No input paths specified in job

    I have provided the correct path of the JSON file, which is in HDFS.

    It still shows the error. Please help me out!
     
    #1
  2. AnupriyaT

    AnupriyaT Well-Known Member
    Alumni

    Hi Shefali,

    Greetings from Simplilearn!

    The file at the path (/user/shefalibisht00_gmail/project1/marketingData) is not a JSON file; it is a semicolon-delimited CSV file. Hence the above error.
    Kindly follow the steps below to get the desired output:
    Step 1: Log in to the web console of your cloud lab.
    Step 2: Run this command: spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
    Step 3: You will now be in the Spark shell, at the scala> prompt.
    Now run the commands below:
    val df = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").option("delimiter",";").load("/user/shefalibisht00_gmail/project1/marketingData")

    df.show()

    Please find the output screenshot attached.

    Hope this helps :)

    Regards,
    Anupriya
     

    #2
  3. Srividya Ramaraju

    Hi Anupriya,

    I tried the above commands but did not get the output shown in your screenshot.

    Do I have to convert the CSV file with Text to Columns and specify the data types in order to get the output shown in your screenshot?

    Please see the screenshot below and suggest whether I have to follow this process.

    upload_2017-8-17_0-32-2.png



    Thanks
    Srividya
     
    #3
  4. AnupriyaT

    Hi Srividya,

    Greetings for the day!

    It was nice speaking to you.

    The above issue occurs because your bank-full.csv file does not contain proper input data.

    Please find the input data set attached. Work through it with the commands and steps I've mentioned above, and I'm sure you'll be able to get the desired output.

    Happy Learning!
     

    #4
  6. AADHARSHA MANOHARAN

    Hi AnupriyaT,

    Could you please explain this command: sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("inferSchema","true").option("delimiter",";").load("/user/shefalibisht00_gmail/project1/marketingData")
     
    #6
  7. Sameer Aggarwal

    Alumni

    Hi Anupriya,

    I did exactly what you described in your post, but the data is still not loading properly.
    As far as I can tell, the header is not being split on the semicolon (;), so the whole header is taken as a single column, and the data therefore also ends up in a single column.

    I'm attaching a screenshot of this; please help.
     

    #7
    Last edited: Nov 11, 2017
  8. _13121

    _13121 Active Member

    Hi Anupriya,

    I tried the command you gave above, but it does not give me the output you mentioned.
    It seems the command is not able to recognize the delimiter and is treating each line as a single column. I suspect that the input file has some issue.

    Could you please help debug this?
    Delimit_Issue.JPG
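    One way to test the suspicion about the input file is to look at a raw line directly, before it goes through the CSV reader. The following is only a minimal sketch in plain Scala; the object and method names are hypothetical and not part of Spark or spark-csv. It counts the fields a naive split on ";" would give and flags doubled quotes such as ""job"", the problem reported further down in this thread.

```scala
object DelimiterCheck {
  // Number of fields a naive split on ';' produces for this line.
  // The -1 limit keeps trailing empty fields instead of dropping them.
  def fieldCount(line: String): Int = line.split(";", -1).length

  // True when the line contains doubled quotes such as ""job"".
  def hasDoubledQuotes(line: String): Boolean = line.contains("\"\"")
}
```

    If the header of this data set splits into 17 fields here but the DataFrame still shows one column, the delimiter itself is probably fine and the quoting is the more likely culprit. In spark-shell, the raw lines to feed into these helpers could be fetched with sc.textFile(...).take(5).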
     
    #8
  9. _13121

    Hi Again,

    After trying multiple combinations of data cleansing, reviewing other posts and the Scala documentation, and spending some time on Google articles, I was able to get the desired output. However, I had to change the format of the Project 1 test data by removing the quotes and replacing the semicolons with underscores. I am not sure whether we are supposed to do this; could you confirm that it's fine?
    I referred to this post (http://community.simplilearn.com/threads/marketing-analysis-project.26528/) for the data cleansing.
    I'm attaching screenshots of this. It would be great if you could confirm soon.
    Test_After Data Cleansing.JPG

    Test_Data Cleansing.JPG

    It seems that in Spark 2.1 some data is processed correctly without the data cleansing, but the desired output is still not achieved (I forgot to take a screenshot of it). In addition, in Spark 1.6 (for me at least) the data is not split properly on the semicolon. I am guessing this is due to the combination of quotes and semicolons, but someone more technical who reviews this data may be able to explain it better.

    For now, please let us know if we can do the data cleansing ourselves and load the result into Spark for the project.
    Awaiting a positive response :)
     
    #9
  10. AnupriyaT

    Hi Vaibhav,

    Yes, you can proceed. It's just that you have changed the delimiter from ";" to "_" and that's not an issue.
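    For reference, the cleansing described above (removing the quotes and replacing ";" with "_") could be sketched in plain Scala as follows. This is only an illustration: the object and function names are made up, and it assumes that no field value itself contains a quote, a semicolon, or an underscore.

```scala
object UnderscoreCleansing {
  // Strip every double quote, then swap the ';' delimiter for '_'.
  // Assumption: no field value contains '"', ';' or '_' on its own.
  def cleanLine(line: String): String =
    line.replace("\"", "").replace(';', '_')
}
```

    A file cleaned this way would then be loaded with .option("delimiter","_") instead of ";".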
     
    #10
  11. AnupriyaT

    Hi Sameer,

    The issue is not in the code. Please check that your input data set is in the correct format.
    The thread below gives some insight into this:
    http://community.simplilearn.com/threads/marketing-analysis-project.26528/
     
    #11
  12. AnupriyaT

    Hi Aadharsha,

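    The above Spark code reads a file (here a semicolon-delimited CSV) as a DataFrame.
    I have restructured the code to help you understand it:
    val df = sqlContext.read
      .format("com.databricks.spark.csv")   // use the spark-csv data source
      .option("header","true")              // first line holds the column names
      .option("inferSchema","true")         // infer column types instead of reading all as strings
      .option("delimiter",";")              // fields are separated by ';' rather than ','
      .load("/user/shefalibisht00_gmail/project1/marketingData")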
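    To make the three options more concrete, here is a rough plain-Scala sketch of what they mean for a couple of lines of this file. It is not how spark-csv is implemented: the names are hypothetical, quotes are simply stripped, and types are guessed per value (Int or String only), whereas the real reader infers one type per column.

```scala
import scala.util.Try

object CsvOptionsSketch {
  // "delimiter" -> ";" : split each line on semicolons.
  // Surrounding quotes are stripped for simplicity; a real CSV parser
  // treats quoting more carefully.
  def splitLine(line: String): Vector[String] =
    line.split(";", -1).toVector.map(_.trim.stripPrefix("\"").stripSuffix("\""))

  // "inferSchema" -> "true" : guess a type instead of keeping every
  // value as a string (only Int versus String in this sketch).
  def infer(value: String): Any =
    Try(value.toInt).getOrElse(value)

  // "header" -> "true" : the first line supplies column names, not data.
  def parse(lines: Seq[String]): (Vector[String], Seq[Vector[Any]]) = {
    val header = splitLine(lines.head)
    val rows = lines.tail.map(l => splitLine(l).map(infer))
    (header, rows)
  }
}
```

    With "header" set to "false", the first line would instead be treated as an ordinary data row.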
     
    #12
  13. Md Masood Ali

    Md Masood Ali New Member

    The data is not loading properly. See the attached screenshot. Please help.
     

    #13
  14. Sameer Aggarwal

    Hi Anupriya,

    I am still unable to load data properly.
    I cleaned the data as you described: I added quotes to the "age" field and also added quotes to the first line.

    But I am still getting the same data as before, now with extra quotes showing as well.

    Kindly help with this.
     

    #14
  15. Karthik Shivana

    Karthik Shivana Moderator
    Simplilearn Support Alumni

    Hi Sameer,

    Please refer to the recording below and let us know if you need further assistance.

    https://simplilearnsolutions.webex....lsr.php?RCID=e0e046ef692900f0c9a74cf3258f47fd
     
    #15
  16. Sameer Aggarwal

    Hi Karthik,

    Thanks for the reply. I went through the recording, but I am unable to see any video; it is audio only.
    That makes it difficult to figure out what exactly is being done to clean the data so that the data frame is created properly.

    Kindly provide the recording with both video and audio.
     
    #16
  17. AnupriyaT

    Hi Sameer,

    Please re-download the recording using the above link. It has both audio and video. Ensure you have Windows Media Player or another compatible player.
     
    #17
  18. Sameer Aggarwal

    Hi Anupriya,

    No, the re-downloaded file still has audio only. I also tried it in Windows Media Player, but no luck.
    Please check at your end and provide the correct file.

    Thanks
     
    #18
  19. AnupriyaT

    Hi Sameer,

    Please try once more with a different player. It is working fine at our end; I double-checked it.

    If you still can't get it to work, we can connect remotely.
     
    #19
  20. Ganesan Udayasooriyan

    Hi Anupriya,

    The CSV file given to students for download is in the wrong format: each field is wrapped in two double quotes, e.g. ""income"", whereas it should be "income". I manually edited the input CSV file and it now works properly. Kindly ask your team to provide the data set in the correct format going forward, as it took a lot of time to understand the problem and come up with a solution. I tried to upload the corrected CSV file here, but it does not seem to be possible.

    sample
    -----------

    age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"
    58;"management";"married";"tertiary";"no";2143;"yes";"no";"unknown";5;"may";261;1;-1;0;"unknown";"no"
    44;"technician";"single";"secondary";"no";29;"yes";"no";"unknown";5;"may";151;1;-1;0;"unknown";"no"
    33;"entrepreneur";"married";"secondary";"no";2;"yes";"yes";"unknown";5;"may";76;1;-1;0;"unknown";"no"
    47;"blue-collar";"married";"unknown";"no";1506;"yes";"no";"unknown";5;"may";92;1;-1;0;"unknown";"no"
    33;"unknown";"single";"unknown";"no";1;"no";"no";"unknown";5;"may";198;1;-1;0;"unknown";"no"
    35;"management";"married";"tertiary";"no";231;"yes";"no";"unknown";5;"may";139;1;-1;0;"unknown";"no"
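    The manual edit described above could also be automated. The following is a minimal plain-Scala sketch, with hypothetical names, that collapses each doubled quote into a single one. It assumes the file contains no legitimately empty quoted fields, since those would also be written as "".

```scala
object QuoteRepair {
  // Collapse each doubled quote ("") into a single quote (").
  // Assumption: there are no empty quoted fields, which would also
  // appear as "" and be damaged by this replacement.
  def fixDoubledQuotes(line: String): String =
    line.replace("\"\"", "\"")
}
```

    Lines that are already in the correct format, like the sample above, pass through unchanged.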
     
    #20
  21. _26090

    _26090 Member
    Alumni

    I am doing the Banking Project, but I can't figure out how to load the file.
     
    #21