
WIPRO - BIGDATA ACADEMY

Lakshmi Prasanna

Member
Customer
Hi DeshDeep,

I have completed the assignments on MapReduce (Word Count, Word Length, and Doctor Patient) and on Scala (Arrays and flatMap). Please find the programs, input, and output files attached.
 

Attachments

  • MapReduce - Assignments.zip
    19.7 KB · Views: 10
  • Scala - Assignments.zip
    153.5 KB · Views: 10

Kasim Khan H

Member
Customer
Attached are screenshots for creating an RDD from a collection using parallelize in Scala, and for using flatMap to find the distinct words in a text file.
 

Attachments

  • flapmap.png
    56.7 KB · Views: 21
  • parallize.png
    26.1 KB · Views: 19
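For reference, a minimal spark-shell sketch of the two techniques described above (the sample data here is made up for illustration):

Code:
scala> // RDD from a collection using parallelize
scala> val nums = sc.parallelize(List(1, 2, 3, 4, 5))
scala> nums.collect

scala> // flatMap to find the distinct words in a piece of text
scala> val words = sc.parallelize(Seq("to be or not to be")).flatMap(line => line.split(" ")).distinct
scala> words.collect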

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

I have gone through your assignments and they look good; you were able to implement the concepts correctly.
 

MEGHA AHLUWALIA

Member
Customer
Hi Deshdeep,

PFA screenshots for the flatMap word count and RDD-from-collections assignments, as discussed yesterday!

Thanks,
 

Attachments

  • flatmapwrdcount.jpg
    166.2 KB · Views: 16
  • rddfromcollectn.jpg
    84.1 KB · Views: 16
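For reference, a minimal spark-shell sketch of the flatMap word count described above (the input path is illustrative, not from the post):

Code:
scala> val counts = sc.textFile("input.txt").flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.collect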

Srinivas N_1

New Member
Alumni
Customer
MR Assignments.
 

Attachments

  • DoctorPatientCount .txt
    2.2 KB · Views: 3
  • output.jpg
    111.4 KB · Views: 15
  • WordLengthOccurrence .txt
    2.3 KB · Views: 1

Srinivasulu Kuruva

Member
Customer
Hi Deshdeep,

Please find the screenshot of the Day 6 assignment.

Regards,
Srinivas
 

Attachments

  • Assignment_1(Day6).JPG
    71.5 KB · Views: 18
  • Assignment_2(Day6).JPG
    60.1 KB · Views: 17
  • Assignment_3(Day6).JPG
    24.3 KB · Views: 18

Ravi_272

Member
Customer
Hi all,

A very happy new year to all.

Below are the solutions to the assignment given to Batch 2 on Jan 3rd.

1) Reverse a map from (k,v) to (v,k)

Code:
scala> val userCountMap = sc.textFile("problem2.txt").filter(line1 => line1.contains(".html")).map(line2 => (line2.split(" ")(2), 1))
userCountMap: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[19] at map at <console>:27
scala> val UserCountRedMap = userCountMap.reduceByKey(_+_)
UserCountRedMap: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[20] at reduceByKey at <console>:29
scala> val invertedMap = UserCountRedMap.map({case(k, v) => v -> k})
invertedMap: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[21] at map at <console>:31

2) Sort the above map by key and find top elements

Code:
scala> val sortedMap = invertedMap.sortByKey()
sortedMap: org.apache.spark.rdd.RDD[(Int, String)] = ShuffledRDD[24] at sortByKey at <console>:33
scala> sortedMap.top(4)
res5: Array[(Int, String)] = Array((2,69827), (2,2489), (2,21475), (1,4712))

3) Group the values for each key for the above map

Code:
scala> val groupedMap = sortedMap.groupByKey()
groupedMap: org.apache.spark.rdd.RDD[(Int, Iterable[String])] = MapPartitionsRDD[35] at groupByKey at <console>:35
scala> groupedMap.collect
res6: Array[(Int, Iterable[String])] = Array((1,CompactBuffer(4712)), (2,CompactBuffer(69827, 2489, 21475)))
 

Lakshmi Prasanna

Member
Customer
Hi DeshDeep,

I have completed the assignment on Scala - Swap and Sort. Please find the attachment.

Regards
Prasanna
 

Attachments

  • Swap n Sort.JPG
    107.7 KB · Views: 19

MEGHA AHLUWALIA

Member
Customer
Hi Deshdeep,

Can you please verify the output for sortByKey() and groupByKey()?

I could not do it with the complete file, as I am unable to access the Google Drive that Shivang showed yesterday. Can you please specify where in the LMS I can download all the content?

Thanks,
 

Attachments

  • sort and group.png
    142.8 KB · Views: 16

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Deshdeep,
Can you please verify the output for sortByKey() and groupByKey()?

I could not do it with the complete file, as I am unable to access the Google Drive that Shivang showed yesterday. Can you please specify where in the LMS I can download all the content?

Thanks,

Hi Megha,

The file is available in the Learning Tools Downloads section, inside the Practice Files zip file.

Download that file and you will find all the details inside, along with the files.
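For the sortByKey() and groupByKey() part of the question, a minimal spark-shell sketch on a made-up pair RDD (not the course file) shows what each call returns:

Code:
scala> val pairs = sc.parallelize(Seq((2, "69827"), (1, "4712"), (2, "2489")))

scala> // sortByKey: keeps every (k, v) pair, ordered by key
scala> pairs.sortByKey().collect

scala> // groupByKey: one entry per key, with the values gathered into an Iterable (CompactBuffer)
scala> pairs.groupByKey().collect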
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
How to get RDD size?

Hi Ajaya,

If you are simply looking to count the number of rows in the RDD, do:

Code:
val distFile = sc.textFile(file)
println(distFile.count)

If you are interested in the size in memory, you can use the SizeEstimator (it returns an estimate in bytes):

Code:
import org.apache.spark.util.SizeEstimator

println(SizeEstimator.estimate(distFile))
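Since SizeEstimator.estimate returns bytes, a small follow-on sketch for converting the estimate to MB:

Code:
scala> val sizeInBytes = SizeEstimator.estimate(distFile)
scala> println(sizeInBytes / (1024.0 * 1024.0) + " MB")   // byte estimate converted to MB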
 

VATTIPALLI ARUNA

Active Member
Customer
Hi,

I am trying to execute the Flume use case we discussed, but it looks like I am making some basic mistake. Could anyone check and help?

[aruna.karri_wipro@ec2-52-86-42-143 ~]$ flume-ng agent --conf /etc/flume-ng/conf \
> --conf-file /flume1.conf \
> --name myAgent -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/usr/bin/hadoop) for HDFS access
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/tez/lib/slf4j-api-1.7.5.jar from classpath
Info: Including HBASE libraries found via (/usr/bin/hbase) for HBASE access
Info: Excluding /usr/hdp/2.4.0.0-169/hbase/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/tez/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-api-1.6.1.jar from classpath
Info: Excluding /usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-log4j12-1.6.1.jar from classpath
Info: Including Hive libraries found via () for Hive access
+ exec /usr -Xmx20m -Dflume.root.logger=INFO,console -cp '/etc/flume-ng/conf:/usr/hdp/2.4.0.0-169/flume/lib/*:/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/lib/activation-1.1.jar:/usr/hdp/2.4.0.0-169/hadoop/lib/apacheds-

adoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native org.apache.flume.node.Application --conf-file /home/aruna.karri_wipro/flume1.conf --name myAgent
/usr/hdp/2.4.0.0-169/flume/bin/flume-ng.distro: line 247: /usr: is a directory
/usr/hdp/2.4.0.0-169/flume/bin/flume-ng.distro: line 247: exec: /usr: cannot execute: Success


---------------------------------------------------------------------------------------------------------------------------------------------------------------

[aruna.karri_wipro@ec2-52-86-42-143 ~]$ ls
derby.log
flume
flume1.conf
num123
[aruna.karri_wipro@ec2-52-86-42-143 ~]$ pwd
/home/aruna.karri_wipro
[aruna.karri_wipro@ec2-52-86-42-143 ~]$ cat flume1.conf
myAgent.sources=anyName-source
myAgent.sinks=hdfs
myAgent.channels=memory
myAgent.sources.anyName-source.type=spooldir
myAgent.sources.anyName-source.spooldir=/home/aruna.karri_wipro/flume/source
myAgent.sources.anyName-source.channels=memory-channel
myAgent.sinks.hdfs.type=hdfs
myAgent.sinks.hdfs.hdfs.path=/flume/log
myAgent.sinks.hdfs.channel=memory-channel
myAgent.sinks.hdfs.hdfs.rollInterval=0
myAgent.sinks.hdfs.hdfs.rollSize=524288
myAgent.sinks.hdfs.hdfs.rollCount=0
myAgent.sinks.hdfs.hdfs.fileType=DataStream
myAgent.channels.memory.type=memory
myAgent.channels.memory.capacity=1000
myAgent.channels.memory.transactionCapacity=1000
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

This is a gentle reminder about getting certified in the Big Data Hadoop and Spark Developer course. Please make sure that all of the below requirements are met:

  • 85% attendance.
  • 85% of the e-learning completed.
  • 4 simulation exams.
  • 2 projects.
Please let me know in case of any queries or if you need any help.
 


Soumit Jana

Member
Customer
Task for Hive UDF -

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hive UDF that returns the upper-case form of its input (null-safe)
public class HiveExtension extends UDF {

    private Text result = new Text();

    public Text evaluate(String input) {
        if (input == null) {
            return null;
        } else {
            result.set(input.toUpperCase());
            return result;
        }
    }
}

[Screenshot attached: upload_2017-1-5_16-11-46.png]
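A hedged sketch of how a UDF like this could be used from a spark-shell with Hive support (HiveContext); the jar path, function name, and the customers table here are illustrative assumptions, not from the post above:

Code:
scala> // Assumption: the compiled UDF has been packaged into a jar at this (illustrative) path
scala> sqlContext.sql("ADD JAR /path/to/HiveExtension.jar")
scala> sqlContext.sql("CREATE TEMPORARY FUNCTION to_upper AS 'HiveExtension'")
scala> // Illustrative query against a hypothetical 'customers' table
scala> sqlContext.sql("SELECT to_upper(firstname) FROM customers LIMIT 5").show()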
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi,

I am trying to execute the Flume use case we discussed, but it looks like I am making some basic mistake. Could anyone check and help?

[flume-ng agent command, console output, and flume1.conf quoted above]

Hi Aruna,

I request you to try this in your own VM, as it looks like there is some restriction issue. There is no error in your code; note that even at the end it says "exec: /usr: cannot execute: Success", i.e. the user cannot execute it.
 

praveen.rachapally

Member
Customer
Hi,
I have a jar file of size 5.91 MB, which contains the program for a Pig UDF and the Pig-related jars.
I am not able to upload the jars to HDFS from my local Windows machine.
Please do the needful.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi,
I have a jar file of size 5.91 MB, which contains the program for a Pig UDF and the Pig-related jars.
I am not able to upload the jars to HDFS from my local Windows machine.
Please do the needful.


Hi Praveen,

Jar files can be uploaded to the local file system of the CloudLab using an FTP application. I have checked just now and it is working fine; the file can be of any size.

Once uploaded, you can move it to HDFS using the copyFromLocal command.

Also, you can upload files to HDFS using the File Browser, and there the file size doesn't matter either.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Code:
Custno, firstname, lastname, age, profession
4000001,Kristina,Chung,55,Pilot
4000002,Paige,Chen,74,Teacher
4000003,Sherri,Melton,34,Firefighter
4000004,Gretchen,Hill,66,Computer hardware engineer
4000005,Karen,Puckett,74,Lawyer
4000006,Patrick,Song,42,Veterinarian
4000007,Elsie,Hamilton,43,Pilot
4000008,Hazel,Bender,63,Carpenter
4000009,Malcolm,Wagner,39,Artist
4000010,Dolores,McLaughlin,60,Writer
4000011,Francis,McNamara,47,Therapist
4000012,Sandy,Raynor,26,Writer
4000013,Marion,Moon,41,Carpenter
4000014,Beth,Woodard,65,
4000015,Julia,Desai,49,Musician
4000016,Jerome,Wallace,52,Pharmacist
4000017,Neal,Lawrence,72,Computer support specialist
4000018,Jean,Griffin,45,Childcare worker
4000019,Kristine,Dougherty,63,Financial analyst
Step 1: Create an HBase table ‘customers’ with column family ‘customers_data’ from the HBase shell.

# Enter into HBase shell

[training@localhost ~]$ hbase shell
# Create a table ‘customers’ with column family ‘customers_data’

hbase(main):001:0> create 'customers', 'customers_data'
# List the tables

hbase(main):002:0> list
# Exit from HBase shell

hbase(main):003:0> exit
Step 2: Write the following Pig script to load data into the ‘customers’ table in HBase.

-- Name your script Load_HBase_Customers.pig
-- Load dataset 'customers' from HDFS location

raw_data = LOAD 'hdfs:/user/training/customers' USING PigStorage(',') AS (
           custno:chararray,
           firstname:chararray,
           lastname:chararray,
           age:int,
           profession:chararray
);

-- To dump the data from PIG Storage to stdout
/* dump raw_data; */

-- Use HBase storage handler to map data from PIG to HBase
--NOTE: In this case, custno (first unique column) will be considered as row key.


STORE raw_data INTO 'hbase://customers' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'customers_data:firstname
customers_data:lastname
customers_data:age
customers_data:profession'
);
 

Manish Pundir

Member
Customer
Getting error while loading data from sqlContext.
val sql=sqlContext.load("sample.txt")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
[Stage 0:> (0 + 0) / 8]17/01/09 06:01:31 ERROR Executor: Exception in task 7.0 in stage 0.0
(TID 7)
java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://cloudlabns/user/manish.pundir_wipro/sample.txt is not a Parquet file.
expected magic number at tail [80, 65, 82, 49] but found [48, 48, 10, 10]
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Getting error while loading data from sqlContext.
val sql=sqlContext.load("sample.txt")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
[Stage 0:> (0 + 0) / 8]17/01/09 06:01:31 ERROR Executor: Exception in task 7.0 in stage 0.0
(TID 7)
java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://cloudlabns/user/manish.pundir_wipro/sample.txt is not a Parquet file.
expected magic number at tail [80, 65, 82, 49] but found [48, 48, 10, 10]

Hi Manish,

Let's discuss this in the class now. Please join the class.
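For reference, sqlContext.load with just a path assumes the Parquet format by default, which is why a plain text file fails with the "expected magic number" error above. A minimal sketch (assuming Spark 1.6, and that sample.txt is comma-separated; the column names are illustrative) of reading it another way:

Code:
scala> // Read as a plain-text DataFrame (a single "value" column)
scala> val textDF = sqlContext.read.text("sample.txt")

scala> // Or build a DataFrame yourself from the raw lines
scala> import sqlContext.implicits._
scala> val df = sc.textFile("sample.txt").map(_.split(",")).map(a => (a(0), a(1))).toDF("col1", "col2")
scala> df.show()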
 

praveen.rachapally

Member
Customer
Hi Team,
I am not able to connect to the Hive shell from the CloudLab.
I am getting the below exception/error. Please check this issue.

... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=2048, maxMemory=1024
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:265)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:391)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:335)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:282)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:582)
    at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
    at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
    at org.apache.hadoop.ipc.Client.call(Client.java:1427)
    at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy25.submitApplication(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:236)
    ... 21 more
[praveen.rachapally_wipro@ip-172-31-100-98 ~]$
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
I am unable to run the sqoop import command. Getting exceptions. Kindly help me with that.

View attachment 1404

View attachment 1405
Hi Neetu,

Check your command and update it as below.

Use the below credentials to log in:

mysql -h (public ip) -u labuser -p
(press Enter, then on the new line enter the password: simplilearn)

MySQL server access details (this is a separate set-up on a data node to avoid colliding with the Hive DB):
hostname: ec2-23-23-9-126.compute-1.amazonaws.com
public ip: 23.23.9.126
private ip: 172.31.54.174
username: labuser
password: simplilearn

Sample command to import a MySQL database to HDFS:

sqoop import --connect jdbc:mysql://172.31.54.174/pot --driver com.mysql.jdbc.Driver --username labuser --password simplilearn --table potluck --m 1
 

VATTIPALLI ARUNA

Active Member
Customer
Hi Desh,

Would you please check this? Is there any mistake in the command?

[aruna.karri_wipro@ip-172-31-100-98 ~]$ mysql -h 23.23.9.126 -u labuser -p
Enter password:

ERROR 2003 (HY000): Can't connect to MySQL server on '23.23.9.126' (110)
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Desh,

Would you please check this? Is there any mistake in the command?

[aruna.karri_wipro@ip-172-31-100-98 ~]$ mysql -h 23.23.9.126 -u labuser -p
Enter password:

ERROR 2003 (HY000): Can't connect to MySQL server on '23.23.9.126' (110)


Hi,

MySQL server access details (this is a separate set-up on a data node to avoid colliding with the Hive DB):
hostname: ec2-23-23-9-126.compute-1.amazonaws.com
public ip: 23.23.9.126
private ip: 172.31.54.174
username: labuser
password: simplilearn

Sample command (note that it uses the private IP):

mysql -h 172.31.54.174 -u labuser -p

Enter the password simplilearn when prompted.
 

praveen.rachapally

Member
Customer
Cloud_Big Data Hadoop_Real World Project_Social Media

#1# Top 10 most commonly used tags in this data set
REGISTER hdfs://cloudlabns/simplilearn/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

a = load 'hdfs://cloudlabns/simplilearn/answers.csv2' USING CSVLoader(',') AS (sno:int, qid:long, i:long, qs:int, qt:long, tags:chararray, qvc:int, qac:int, aid:long, j:long, answerScore:int, ansTime:long);
b = FOREACH a GENERATE sno, qid, i, qs, qt, tags;
b_group_data_tags = GROUP b by tags;
c = FOREACH b_group_data_tags GENERATE group, b.qid;
d = FOREACH b_group_data_tags GENERATE group, COUNT(b.qid) as counters;
d_ordered = ORDER d BY counters DESC;
d_10 = limit d_ordered 10;
dump d_10;

#2# Average time to answer questions.
REGISTER hdfs://cloudlabns/simplilearn/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

a_time = load 'hdfs://cloudlabns/simplilearn/answers.csv2' USING CSVLoader(',') AS (sno:int, qid:long, i:long, qs:int, qt:long, tags:chararray, qvc:int, qac:int, aid:long, j:long, answerScore:int, ansTime:long);
b_time = FOREACH a_time GENERATE sno,qid,i,qs,qt,tags,qvc,qac,aid,j,answerScore,ansTime;
timeTakenTOAnswer = FOREACH b_time GENERATE qid, (ansTime-qt) as tttans;
timeTakenTOAnswerGroup = GROUP timeTakenTOAnswer by qid;
timeTakenTOAnswerGroup_avg = FOREACH timeTakenTOAnswerGroup GENERATE group, AVG(timeTakenTOAnswer.tttans) as avg;
dump timeTakenTOAnswerGroup_avg;

#3# Number of questions which got answered within 1 hour.
REGISTER hdfs://cloudlabns/simplilearn/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

a_time = load 'hdfs://cloudlabns/simplilearn/answers.csv2' USING CSVLoader(',') AS (sno:int, qid:long, i:long, qs:int, qt:long, tags:chararray, qvc:int, qac:int, aid:long, j:long, answerScore:int, ansTime:long);

b_time = FOREACH a_time GENERATE sno,qid,i,qs,qt,tags,qvc,qac,aid,j,answerScore,ansTime;

timeTakenTOAnswer = FOREACH b_time GENERATE qid, (ansTime-qt) as tttans;

timeTakenTOAnswerGroup = GROUP timeTakenTOAnswer by qid;

timeTakenTOAnswerGroup_avg = FOREACH timeTakenTOAnswerGroup GENERATE group, (int)(AVG(timeTakenTOAnswer.tttans)) as avg;

test1 = FOREACH timeTakenTOAnswerGroup_avg generate *,FLATTEN(avg) as (avg_temp:chararray);

test2 = FOREACH test1 GENERATE *, GetHour(ToDate(avg_temp)) as hours;

filter_data_lt_1 = FILTER test2 BY hours < 1;
dump filter_data_lt_1;

#4# Tags of questions which got answered within 1 hour.
REGISTER hdfs://cloudlabns/simplilearn/piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

a_time_tags = load 'hdfs://cloudlabns/simplilearn/answers.csv2' USING CSVLoader(',') AS (sno:int, qid:long, i:long, qs:int, qt:long, tags:chararray, qvc:int, qac:int, aid:long, j:long, answerScore:int, ansTime:long);

b_time_tags = FOREACH a_time_tags GENERATE sno,qid,i,qs,qt,tags,qvc,qac,aid,j,answerScore,ansTime;

timeTakenTOAnswer_tags = FOREACH b_time_tags GENERATE qid,tags, (ansTime-qt) as tttans;

timeTakenTOAnswerGroup_tags = GROUP timeTakenTOAnswer_tags by tags;

timeTakenTOAnswerGroup_avg_tags = FOREACH timeTakenTOAnswerGroup_tags GENERATE group, (int)(AVG(timeTakenTOAnswer_tags.tttans)) as avg;

test1_tags = FOREACH timeTakenTOAnswerGroup_avg_tags generate *,FLATTEN(avg) as (avg_temp:chararray);

test2_tags = FOREACH test1_tags GENERATE *, GetHour(ToDate(avg_temp)) as hours;

filter_data_lt_1_tags = FILTER test2_tags BY hours < 1;
dump filter_data_lt_1_tags;
explain filter_data_lt_1_tags;

illustrate filter_data_lt_1_tags;
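For comparison, the first query (top 10 most commonly used tags) could also be sketched in spark-shell against the same file. This is a rough sketch: the naive split on commas ignores quoted fields, and the tag column index is taken from the Pig schema above.

Code:
scala> val answers = sc.textFile("hdfs://cloudlabns/simplilearn/answers.csv2")
scala> val tagCounts = answers.map(line => (line.split(",")(5), 1)).reduceByKey(_ + _)   // tags is the 6th field
scala> tagCounts.map(_.swap).top(10)   // (count, tag) pairs, largest counts first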
 

Manish Pundir

Member
Customer
Hi Team,
I can't find the link to the Google Drive where all the PDF files for the Scala commands and the mini project are stored. Can you share that again?
Thanks,
 

Manish Pundir

Member
Customer
Hi, I am not able to join the RDDs; I guess I need to do some more work on RDDs. Can you help?
val emp = sc.textFile("/user/manish.pundir_wipro/test/emp.txt")
val empRDD = emp.map(x =>(x.split(",")(0),x.split(",")(1)))

val salary = sc.textFile("/user/manish.pundir_wipro/test/salary.txt")
val salaryRDD = salary.map(x =>(x.split(",")(0),x.split(",")(1)))

val joinedDF = emp.join(salaryRDD)
joinedDF.collect
error: value join is not a member of org.apache.spark.rdd.RDD[String]
val joinedDF = emp.join(salaryRDD)
 

VATTIPALLI ARUNA

Active Member
Customer
Hi,

Why is there this difference in the outputs below? How do these calls function?

scala> sc.textFile("file:///home/aruna.karri_wipro/emp").collect
res30: Array[String] = Array(E01,Lokesh, E02,Bhupesh, E03,Amit, E04,Ratan, E05,Dinesh, E06,Pavan, E07,Tejas, E08,Sheela, E09,Kumar, E10,Venkat)
scala> sc.textFile("file:///home/aruna.karri_wipro/emp").foreach(println)
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat

scala> sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).first
res27: Array[String] = Array(E01, Lokesh)
scala> sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).foreach(println)
[Ljava.lang.String;@3e4dbb83
[Ljava.lang.String;@2678fd53
[Ljava.lang.String;@6fa72c8f
[Ljava.lang.String;@f4d789b
[Ljava.lang.String;@5c8f5b52
[Ljava.lang.String;@7f036b2e
[Ljava.lang.String;@5e9d3331
[Ljava.lang.String;@31091923
[Ljava.lang.String;@ade6d68
[Ljava.lang.String;@6ee03d06
scala> sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).collect
res29: Array[Array[String]] = Array(Array(E01, Lokesh), Array(E02, Bhupesh), Array(E03, Amit), Array(E04, Ratan), Array(E05, Dinesh), Array(E06, Pavan), Array(E07, Tejas), Array(E08, Sheela), Array(E09, Kumar), Array(E10, Venkat))
 

VATTIPALLI ARUNA

Active Member
Customer
Got the difference between collect and foreach(println): the latter here is printing the address of each Array[String].

Yet another question:

Why is the below not working? When foreach(println) is allowed, why not collect?

scala> sc.textFile("file:///home/aruna.karri_wipro/emp").collect
res30: Array[String] = Array(E01,Lokesh, E02,Bhupesh, E03,Amit, E04,Ratan, E05,Dinesh, E06,Pavan, E07,Tejas, E08,Sheela, E09,Kumar, E10,Venkat)
scala> sc.textFile("file:///home/aruna.karri_wipro/emp").foreach(println)
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat

scala> sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).foreach(_.foreach(println))
E06
Pavan
E01
Lokesh
E07
Tejas
E02
E08
Sheela
Bhupesh
E09
Kumar
E03
E10
Venkat
Amit
E04
Ratan
E05
Dinesh

scala> sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).foreach(_.collect)
<console>:28: error: erroneous or inaccessible type
       sc.textFile("file:///home/aruna.karri_wipro/emp").map(l=>l.split("\\W")).foreach(_.collect)
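On the println outputs: println on an Array uses the JVM's default toString, which is why the rows show up as [Ljava.lang.String;@..., and collect inside that closure is the Scala Array.collect method (which expects a partial function), not the RDD action. A minimal sketch of printing the split fields readably, using mkString and collecting to the driver first:

Code:
scala> val parts = sc.textFile("file:///home/aruna.karri_wipro/emp").map(l => l.split("\\W"))
scala> parts.map(arr => arr.mkString(",")).collect.foreach(println)   // prints E01,Lokesh etc. in file order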

 

VATTIPALLI ARUNA

Active Member
Customer
You gave it as:
val joinedDF = emp.join(salaryRDD)
I guess it should be:
val joinedDF = empRDD.join(salaryRDD)

RDD[String] represents a normal RDD; RDD[(String, String)] represents a pair RDD, which is what join is defined on. :)



Hi not able to join the RDD i guess need to do some more in RDD can you help,
val emp = sc.textFile("/user/manish.pundir_wipro/test/emp.txt")
val empRDD = emp.map(x =>(x.split(",")(0),x.split(",")(1)))

val salary = sc.textFile("/user/manish.pundir_wipro/test/salary.txt")
val salaryRDD = salary.map(x =>(x.split(",")(0),x.split(",")(1)))

val joinedDF = emp.join(salaryRDD)
joinedDF.collect
error: value join is not a member of org.apache.spark.rdd.RDD[String]
val joinedDF = emp.join(salaryRDD)
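A minimal corrected sketch of the join, assuming the same two-column emp.txt and salary.txt files (join is available on pair RDDs via PairRDDFunctions):

Code:
scala> val empRDD = sc.textFile("/user/manish.pundir_wipro/test/emp.txt").map(x => (x.split(",")(0), x.split(",")(1)))
scala> val salaryRDD = sc.textFile("/user/manish.pundir_wipro/test/salary.txt").map(x => (x.split(",")(0), x.split(",")(1)))
scala> val joined = empRDD.join(salaryRDD)   // key = first column, value = (emp field, salary field)
scala> joined.collect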
 

adarsh pattar

Member
Customer
Hi, I tried running a Spark Streaming program by executing all the commands from a single script. I wrote the streaming code in a file named scala_test.scala and executed it using the command: spark-shell -i scala-test.scala
The results are given in the attached file.
 

Attachments

  • scal1.png
    202.6 KB · Views: 9

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi, I tried running a Spark Streaming program by executing all the commands from a single script. I wrote the streaming code in a file named scala_test.scala and executed it using the command: spark-shell -i scala-test.scala
The results are given in the attached file.

Hi Adarsh,

Good work. Let's discuss this with Shivank during Monday's class.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

For CentOS, use this link:

https://wiki.centos.org/Download

Download the 6.8 version.

Once you click, based on your machine, i.e. 64-bit (x86_64) or 32-bit (i386), click on RPMS, then follow the folders and choose ISOs; that should take you to a mirror for your location, from where you can download the Linux disc image (ISO) file.
 