Big Data Hadoop and Spark Developers | Shikher | JUNE 22 - JULY 28

Discussion in 'Big Data and Analytics' started by Yogitha Koppal, Jun 23, 2019.

  1. Yogitha Koppal

    Yogitha Koppal Active Member
    Simplilearn Support

    Joined:
    Dec 28, 2018
    Messages:
    19
    Likes Received:
    1
    Hi All,

    This thread is for you to discuss queries and concepts related to Big Data Hadoop and Spark Developers.

    Happy Learning !!

    Regards,
    Team Simplilearn
     
    #1
  2. Shikher

    Shikher Member

    Joined:
    Jun 4, 2019
    Messages:
    2
    Likes Received:
    2
    Hi Team, please find the generic commands used in the last session.

    How to list the files in an HDFS directory:

    hdfs dfs -ls /user/your_user_name/

    How to list an HDFS directory recursively, including its sub-directories:

    hdfs dfs -ls -R /user/your_user_name/

    How to make a directory in HDFS (-p also creates any missing parent directories):

    hdfs dfs -mkdir -p /user/your_user_name/Directory_name/Sub_Directory_Name

    How to remove a file in HDFS (-r is needed only when removing a directory):

    hdfs dfs -rm -r /user/your_user_name/Directory_name/file_name

    How to put a file from the local file system into an HDFS directory:

    hdfs dfs -put file_name /user/your_user_name/Directory_Name
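
For anyone who wants to script these commands, here is a minimal Python sketch that only builds the argument lists and prints them. Nothing is executed, so it runs without a Hadoop client. The helper names hdfs_cmd and session_commands are made up for this illustration, and the user, directory and file names are the same placeholders used in the commands above.

```python
import shlex

def hdfs_cmd(*args):
    """Return the argument list for one `hdfs dfs` call (nothing is executed)."""
    return ["hdfs", "dfs"] + list(args)

def session_commands(user, directory, local_file):
    """Build the commands from the session for a given user and directory."""
    base = "/user/" + user
    target = base + "/" + directory
    return [
        hdfs_cmd("-ls", base + "/"),                  # list the directory
        hdfs_cmd("-ls", "-R", base + "/"),            # list recursively
        hdfs_cmd("-mkdir", "-p", target),             # create with parents
        hdfs_cmd("-put", local_file, target),         # upload a local file
        hdfs_cmd("-rm", target + "/" + local_file),   # remove a single file
    ]

# Placeholder values, exactly as in the commands above.
commands = session_commands("your_user_name", "Directory_Name", "file_name")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

On a machine where the hdfs client is installed, each list could be handed to subprocess.run(argv, check=True) instead of being printed.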
     
    #2
  3. _62919

    _62919 Member

    Joined:
    Jun 1, 2019
    Messages:
    12
    Likes Received:
    0
    Hello!
    I am trying to import a table through Sqoop using the following command in the terminal in VirtualBox.
    sqoop import --connect jdbc:mysql://localhost/training --username training --password training --table countries;

    I am getting this error:
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'sqoop import --connect jdbc:mysql://localhost/training --username training --pas' at line 1

    Please help me out.
     
    #3
  4. Shikher

    Shikher Member

    Joined:
    Jun 4, 2019
    Messages:
    2
    Likes Received:
    2
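    ERROR 1064 is raised by MySQL, not by Sqoop, which means the command was typed at the mysql> prompt. Exit the MySQL client and run it from the Linux shell, without the trailing semicolon. You can also mention a --target-dir location to choose where the imported data lands in HDFS.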
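
As an illustration, here is a small Python sketch that assembles the same import with an explicit --target-dir, assuming it is launched from the operating system shell and not from inside the MySQL client. The function name sqoop_import_argv and the HDFS path /user/training/countries are hypothetical. The connection details are the ones from the question.

```python
def sqoop_import_argv(jdbc_url, username, password, table, target_dir):
    """Return the argument list for a `sqoop import` call (nothing is executed)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password", password,
        "--table", table,
        "--target-dir", target_dir,
    ]

argv = sqoop_import_argv(
    "jdbc:mysql://localhost/training",
    "training",
    "training",
    "countries",
    "/user/training/countries",  # hypothetical HDFS output directory
)
# Print the command line to paste into the shell (no trailing semicolon).
print(" ".join(argv))
```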
     
    #4
  5. Lakshmi Priya_1

    Joined:
    Apr 27, 2019
    Messages:
    2
    Likes Received:
    0
    Hello Shikher, could you please share the WordCount.jar file? I need it for practice.
     
    #5
  6. Surjit Choudhury

    Joined:
    Jun 11, 2019
    Messages:
    2
    Likes Received:
    0
    Hi Shikher,

    Is there a command similar to pwd in HDFS?
     
    #6
    Last edited: Jul 25, 2019
  7. Koyel Sinha Chowdhury

    Koyel Sinha Chowdhury Well-Known Member

    Joined:
    Feb 14, 2019
    Messages:
    54
    Likes Received:
    5
    Hi Surjit,
    Neither "hdfs dfs -pwd" nor "hdfs dfs -cd" exists, because there is no "working directory" concept in HDFS when you run commands from the command line. A path that does not start with "/" is always resolved relative to your HDFS home directory (/user/your_user_name).
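
The resolution rule can be sketched in a few lines of Python. This is only an illustration of the rule described above, not code taken from Hadoop. The function name resolve_hdfs_path and the user name "alice" are made up, and the home prefix is assumed to be the usual /user.

```python
import posixpath

def resolve_hdfs_path(path, user, home_prefix="/user"):
    """Mimic how a path given on the command line is interpreted.

    Absolute paths are used as they are. Anything else is taken
    relative to the user's home directory, <home_prefix>/<user>.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(home_prefix, user, path))

print(resolve_hdfs_path("data/input.txt", "alice"))  # relative to home
print(resolve_hdfs_path("/tmp/input.txt", "alice"))  # absolute, unchanged
print(resolve_hdfs_path("", "alice"))                # no path: the home directory
```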
     
    #7
  8. Koyel Sinha Chowdhury

    Koyel Sinha Chowdhury Well-Known Member

    Joined:
    Feb 14, 2019
    Messages:
    54
    Likes Received:
    5
    #8
  9. _45037

    _45037 Member

    Joined:
    Oct 24, 2018
    Messages:
    2
    Likes Received:
    0
    Hi Shikher,
    I am in the middle of the "Project2-k-means" project and am stuck on the error below, which appears while importing "VectorAssembler" from the ML package.
    >>> from pyspark.ml.feature import VectorAssembler
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/ml/__init__.py", line 22, in <module>
        from pyspark.ml.base import Estimator, Model, Transformer
      File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/ml/base.py", line 21, in <module>
        from pyspark.ml.param import Params
      File "/opt/cloudera/parcels/SPARK2-2.2.0.cloudera2-1.cdh5.12.0.p0.232957/lib/spark2/python/pyspark/ml/param/__init__.py", line 26, in <module>
        import numpy as np
    ImportError: No module named numpy

    I cannot install numpy myself because I do not have root access: "sudo su -" asks for a password. Please advise.
    Please also find attached a screenshot of the commands in the web console.
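
A quick way to see which modules the interpreter can actually load, before starting a longer job, is a small check like the sketch below. It only reports availability and installs nothing. The helper name is_importable is made up for this illustration.

```python
import importlib

def is_importable(module_name):
    """Return True if the module can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True

# math ships with Python itself; numpy is a separate package.
for name in ("math", "numpy"):
    status = "available" if is_importable(name) else "missing"
    print("%s: %s" % (name, status))
```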
     

    #9
  10. Lakshmi Priya_1

    Joined:
    Apr 27, 2019
    Messages:
    2
    Likes Received:
    0
    #10
  11. Fabian Julian

    Fabian Julian Member

    Joined:
    Jun 8, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Shikher,
    I am trying to run the code below (K-means example), but it throws an error because the numpy module is not available. How do I install the numpy and math packages? Please help.

    from numpy import array
    from math import sqrt

    from pyspark.mllib.clustering import KMeans, KMeansModel

    # Load and parse the data
    data = sc.textFile("data/mllib/kmeans_data.txt")
    parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))

    # Build the model (cluster the data)
    clusters = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")

    # Evaluate clustering by computing Within Set Sum of Squared Errors
    def error(point):
        # Distance from the point to the center of its assigned cluster
        center = clusters.centers[clusters.predict(point)]
        return sqrt(sum([x**2 for x in (point - center)]))

    WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
    print("Within Set Sum of Squared Error = " + str(WSSSE))

    # Save and load model
    clusters.save(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")
    sameModel = KMeansModel.load(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")


    Regards
    Fabian
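
To see what the error function in the code above measures without needing numpy or a Spark context, here is a minimal pure-Python sketch of the same calculation. The points and centers are made-up values standing in for parsedData and clusters.centers, and the helper squared_distance is introduced only for this illustration. Like the posted code, it adds up the Euclidean distance from each point to its nearest center.

```python
from math import sqrt

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def error(point, centers):
    """Distance from a point to the nearest of the given centers."""
    nearest = min(centers, key=lambda c: squared_distance(point, c))
    return sqrt(squared_distance(point, nearest))

# Made-up sample data: two obvious groups and one center for each.
points = [(0.0, 0.0), (0.0, 3.0), (10.0, 10.0), (10.0, 14.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]

total_error = sum(error(p, centers) for p in points)
print("Within Set Sum of Squared Error = " + str(total_error))
# prints "Within Set Sum of Squared Error = 7.0"
```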
     
    #11
  13. Koyel Sinha Chowdhury

    Koyel Sinha Chowdhury Well-Known Member

    Joined:
    Feb 14, 2019
    Messages:
    54
    Likes Received:
    5
    Hi Fabian,

    Please add these import statements to the program:

    >>> import numpy as np
    >>> import math
     
    #13
  14. Fabian Julian

    Fabian Julian Member

    Joined:
    Jun 8, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Koyel,

    Please find the screenshot attached. "import math" worked, but "import numpy" fails with "No module named numpy". Please advise.

    Thanks
    Fabian
    screenshot_numpy.png
     
    #14
  15. muralidharan1307(4260201)

    Joined:
    Nov 19, 2014
    Messages:
    1
    Likes Received:
    0
    Hi Fabian,

    I assume you are using our Practice Lab. The required packages are not installed in the lab, so you will not be able to use them there. Please use Scala instead of Python in spark-shell.
     
    #15
  16. Fabian Julian

    Fabian Julian Member

    Joined:
    Jun 8, 2019
    Messages:
    4
    Likes Received:
    0
    Hi Koyel,

    I am trying to explore PySpark. Is there any way to make these packages (numpy/math) available in the virtual labs? Please advise.

    Regards
    Fabian
     
    #16
  17. Koyel Sinha Chowdhury

    Koyel Sinha Chowdhury Well-Known Member

    Joined:
    Feb 14, 2019
    Messages:
    54
    Likes Received:
    5
    #17
