Big Data and Hadoop Administrator | Sarthak | Sep 22 - Oct 21

Discussion in 'Big Data and Analytics' started by ruhi.jain, Oct 13, 2018 at 12:25 PM.

  1. ruhi.jain

    ruhi.jain Well-Known Member
    Simplilearn Support

    Joined:
    Jun 7, 2018
    Messages:
    93
    Likes Received:
    0
    Hi Guys,

    This is a dedicated thread for the learners from the batch of Big Data and Hadoop Administrator, dated Sep 22 - Oct 21

    Please go ahead and post your queries here:)

    Happy Learning!
     
    #1
  4. Miitesh devle

    Miitesh devle Member
    Alumni

    I'm hitting an error while working with the lab and finding it difficult to cope:
    scala> movieDF.filter($"year" >= 1952 && $"year" <= 1968 && $"subject" === "Horror").show()
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ip-10-0-1-20.ec2.internal:8020/user/miiteshdevle_gmail/Movies
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:314)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2853)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2837)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2836)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2366)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:245)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:644)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:603)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:612)
    ... 48 elided
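    The InvalidInputException above means Spark could not find the input path hdfs://ip-10-0-1-20.ec2.internal:8020/user/miiteshdevle_gmail/Movies, i.e. the Movies dataset was never uploaded to (or was removed from) your HDFS home directory. A minimal sketch of how to check and fix this from the lab terminal, assuming the dataset file is named Movies and sits in your local working directory:

    ```shell
    # Check whether the Movies dataset exists in your HDFS home directory
    hdfs dfs -ls /user/miiteshdevle_gmail

    # If it is missing, upload it from the local filesystem first
    # (assumes the file "Movies" is in the current local directory)
    hdfs dfs -put Movies /user/miiteshdevle_gmail/Movies

    # Verify the upload before re-running the Spark query
    hdfs dfs -ls /user/miiteshdevle_gmail/Movies
    ```

    After the upload, re-create movieDF in the spark-shell before calling filter/show again, since the original DataFrame was built lazily over the missing path.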
     
    #4
  5. _31801

    _31801 New Member

    ticket: #00285203


    I uploaded files via FTP to the HDP sandbox's local filesystem, but I cannot see them when logged in as root. I also need to copy these files to HDFS. Please resolve this as soon as possible, since I cannot do the practicals until then. Thanks!
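    Files sent over FTP usually land in the home directory of the FTP login user, not in /root, which is why they seem to vanish after switching to the root login. A sketch of how to locate the uploaded file on the sandbox and push it into HDFS (the filename data.csv and the user maria_dev are placeholders; substitute your own):

    ```shell
    # Locate the uploaded file on the sandbox's local filesystem
    # ("data.csv" is a placeholder for your actual filename)
    find / -name "data.csv" 2>/dev/null

    # Create an HDFS home directory for your user if it does not exist yet
    # ("maria_dev" is a placeholder; use the user you log in with)
    hdfs dfs -mkdir -p /user/maria_dev

    # Copy the file from the local filesystem into HDFS
    hdfs dfs -put /home/maria_dev/data.csv /user/maria_dev/

    # Confirm the file is now in HDFS
    hdfs dfs -ls /user/maria_dev
    ```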
     
    #5
  6. Shalini Rana

    Shalini Rana Well-Known Member
    Simplilearn Support

    Hi Learner,

    A response has been sent on the ticket.

    Thanks!
     
    #6
