Social Media Project

Discussion in 'Big Data and Analytics' started by _2413, Oct 27, 2016.

  1. _2413

    _2413 Member
    Alumni

    Joined:
    Sep 1, 2016
    Messages:
    8
    Likes Received:
    0
    Hello,

    I am working on the Social Media Project: average time to answer questions.

    How do I solve this?
    Do we have to take the average of the column 'at'? (That does not make sense.)
    OR
    Do we have to find the first answer time of each question and then take the average?
    Your help is appreciated.
    Thank you,
    Mehul
     
    #1
  2. Richard_62

    Richard_62 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Sep 9, 2016
    Messages:
    190
    Likes Received:
    35
    Hey Mehul,

    Excellent question.

    In any data processing, start by knowing the data.
    I believe you'll want to look closer at both the at and the qt tags.
    You will also need to read up on epoch time as part of the problem (dividing a value in seconds by 3600 gives hours).
    Then, yes, average.
    Even though you do not need to solve this in all three forms (Pig, Hive, and MapReduce in Java), if you have the time I would suggest doing so, so that you can understand the differences between the three.
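
    A minimal sketch of the arithmetic described above, written in Python rather than Pig, Hive, or MapReduce. It assumes that qt and at are epoch timestamps in seconds for a question and its answer; the sample rows and the function names are made up for illustration and are not project data.

```python
SECONDS_PER_HOUR = 3600

# Hypothetical sample rows: (qt, at) as epoch seconds. Not real project data.
rows = [
    (1477526400, 1477530000),  # answered 3600 seconds after the question
    (1477526400, 1477533600),  # answered 7200 seconds after the question
]


def hours_to_answer(qt, at):
    """Elapsed time between question and answer, in hours."""
    return (at - qt) / SECONDS_PER_HOUR


def average_hours(rows):
    """Mean answer time in hours over all (qt, at) pairs."""
    elapsed = [hours_to_answer(qt, at) for qt, at in rows]
    return sum(elapsed) / len(elapsed)


print(average_hours(rows))  # prints 1.5
```

    Averaging the difference, rather than the raw at column, is what makes the result a duration instead of a point in time.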

    Happy learning and good luck.
     
    #2
  3. _2413

    _2413 Member
    Alumni

    Joined:
    Sep 1, 2016
    Messages:
    8
    Likes Received:
    0
    Hey Richard,
    Thanks for the prompt reply. I am doing this exercise in Hive. I am thinking of getting each distinct qid together with min(at); that is how I get the first response to each question. After that, I extract the time from the epoch time and then average. Do you think I am on the right track?
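
    One way to sketch the "first response per question" idea, shown in Python instead of Hive. It assumes each row carries a qid plus qt and at as epoch seconds; the row layout, sample values, and helper names are hypothetical.

```python
SECONDS_PER_HOUR = 3600

# Hypothetical rows: (qid, qt, at) with epoch seconds. Not real project data.
rows = [
    ("q1", 1000, 4600),   # first answer to q1
    ("q1", 1000, 8200),   # later answer to q1, should be ignored
    ("q2", 2000, 12800),  # only answer to q2
]


def first_answers(rows):
    """Keep, for each qid, the row with the smallest at (like min(at) per group)."""
    first = {}
    for qid, qt, at in rows:
        if qid not in first or at < first[qid][1]:
            first[qid] = (qt, at)
    return first


def average_first_answer_hours(rows):
    """Average, over questions, of the hours until the first answer."""
    first = first_answers(rows)
    hours = [(at - qt) / SECONDS_PER_HOUR for qt, at in first.values()]
    return sum(hours) / len(hours)


print(average_first_answer_hours(rows))  # prints 2.0
```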
    Thanks again.
    Mehul
     
    #3
  4. Richard_62

    Richard_62 Well-Known Member
    Simplilearn Support Alumni

    Joined:
    Sep 9, 2016
    Messages:
    190
    Likes Received:
    35
    Hey Mehul,

    You are on the right track. You don't actually need the id of the line, just the elapsed time, stopTime - startTime.
    The first time I did it in Hive, I had 3 lines (NOT including create/load table/data).

    With an inner query and by combining the steps, you can get it down to a single line (an added challenge for you, if you wish).

    Richard
     
    #4
    _2413 likes this.
  5. _2413

    _2413 Member
    Alumni

    Joined:
    Sep 1, 2016
    Messages:
    8
    Likes Received:
    0
    Thanks Richard.

    I have another question, about the tags of questions that were answered within 1 hour.
    I tried to solve it and got approximately 12K tags for questions answered within 1 hour. How can I submit such a large result as part of my project?
    Thanks again.
     
    #5
  6. Reuben L Owens

    Alumni

    Joined:
    Apr 6, 2017
    Messages:
    13
    Likes Received:
    0
    I think the posted solution to this needs to be corrected. It solves the average time by just taking the AVG of at, instead of converting both qt and at to DateTime and then using MinutesBetween to get the time span between qt and at.
    Can you please confirm this?
     
    #6
  7. Reuben L Owens

    Alumni

    Joined:
    Apr 6, 2017
    Messages:
    13
    Likes Received:
    0
    Also, the project states, "Average time to answer questions". This led me to believe it was looking for the OVERALL average time to answer all the questions, not the average time to answer each question.
     
    #7
  8. Megha_42

    Megha_42 Well-Known Member
    Simplilearn Support

    Joined:
    Dec 15, 2016
    Messages:
    206
    Likes Received:
    9
    Hi Reuben,

    If you can do both, there's nothing like it!
    Hint:
    You can group your data by question ID and get the average answer time for each question, and then average all of those results.
    That way, you have both the per-question and the overall average time.
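
    A rough Python sketch of this hint, assuming the answer times have already been reduced to (question ID, hours) pairs; the data and names are hypothetical. It also computes the plain average over all answers, because the average of per-question averages can differ from it when questions have different numbers of answers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (qid, hours_to_answer) pairs. Not real project data.
answers = [("q1", 1.0), ("q1", 3.0), ("q2", 5.0)]


def per_question_average(answers):
    """Group answer times by question ID and average each group."""
    groups = defaultdict(list)
    for qid, hours in answers:
        groups[qid].append(hours)
    return {qid: mean(values) for qid, values in groups.items()}


per_question = per_question_average(answers)
average_of_averages = mean(per_question.values())
overall_average = mean(hours for _, hours in answers)

print(per_question)         # per-question averages
print(average_of_averages)  # each question weighted equally
print(overall_average)      # each answer weighted equally
```

    With this sample, q1 has two answers and q2 has one, so the two overall figures come out differently (3.5 versus 3.0); which one the project expects is not stated in the thread.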

    Happy learning!!
     
    #8
