Time sensitive: Data for end of course project | Big Data Hadoop and Spark Developer | Stock Exchange Data Analysis

To Backend support and Simplilearn Support:

I have searched the entire forum and found that this problem has been reported several times by learners. However, no definitive solution was provided. So, I am re-framing the problem to describe the exact information needed to proceed with our project work.

THE PROBLEM:
Data for Project 1 (NYSE data analysis): Big Data Hadoop and Spark Developer


A link to the New York Stock Exchange data file was provided, but the task requires us to use sqoop to import the data from MySQL into Hive.
To build the required sqoop pipeline that pulls the STOCK_PRICES and STOCK_COMPANIES tables from the BDHS_PROJECT database, learners need the following information:

1) Database host: the full hostname or IP address of the MySQL host, which must be reachable from all the Hadoop/Hive nodes.

2) Database credentials: we are given the name of the database, but nothing is said about how to authenticate against it, so we also need the "username" and "password" that grant access to the database.

Please NOTE 1: The task of the project includes creating a data pipeline using sqoop to move the data from MySQL to Hive. So, manually creating a Hive table and populating it with the file from the provided link is not the solution.

In summary, the three placeholder values in the following command (host, username, and password) are missing from the assignment question:

sqoop import --connect jdbc:mysql://FULLHOSTNAME-or-IPADDRESS/BDHS_PROJECT --username UUUUUU --password PPPPPPP --table STOCK_PRICES --hive-import
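For reference, once the missing values are known, the imports for both tables could be scripted roughly as below. This is only a sketch: DB_HOST and DB_USER still hold the same placeholders as above (the real values are exactly what I am asking for), build_import_cmd is just a helper name I made up, and I have swapped --password for sqoop's -P option so the password is prompted for instead of being typed on the command line. The script only prints the commands; nothing is run against the database.

```shell
#!/bin/sh
# Placeholders only: the real host and username are what this post asks support to provide.
DB_HOST="${DB_HOST:-FULLHOSTNAME-or-IPADDRESS}"
DB_USER="${DB_USER:-UUUUUU}"
DB_NAME="BDHS_PROJECT"

# Hypothetical helper: prints the sqoop import command for one table ($1).
# -P makes sqoop prompt for the password on the console.
build_import_cmd() {
  printf 'sqoop import --connect jdbc:mysql://%s/%s --username %s -P --table %s --hive-import\n' \
    "$DB_HOST" "$DB_NAME" "$DB_USER" "$1"
}

# Dry run: show the command for each table named in the project.
for table in STOCK_PRICES STOCK_COMPANIES; do
  build_import_cmd "$table"
done
```

With the real values exported as DB_HOST and DB_USER, each printed line can be copied and run as is.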

Please NOTE 2: Yes, the database may exist somewhere on the system, and the backend team can see and access it. But we learners are non-root users, so we need credentials to authenticate to BDHS_PROJECT.

Please advise.
 