
WIPRO - BIGDATA ACADEMY

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Deshdeep,
Where can we download the docs/PPTs given by Shivang?

Hi Neetu,

All the material provided by Shivank is in the Downloads folder under the practice files.

The PPTs are included in the e-Book, which contains all the material Shivank covered during the class.
 

MEGHA AHLUWALIA

Member
Customer
Hi All,

For CentOS, use this link:

https://wiki.centos.org/Download

Download the 6.8 version.

Once you click through, pick the build for your machine, i.e. 64-bit (x86_64) or 32-bit (i386), click on RPMS, then the following folder, and choose ISOs; that should take you to a mirror for your location, from where you can download the Linux disc image (ISO) file.

Hi Deshdeep, which of the following has to be downloaded? Please help.

Thanks,
Megha
 

Attachments

  • centos ISO.png (203.9 KB)

Neetu Gupta

Member
Alumni
Customer
Hi,
I downloaded CentOS-6.8-x86_64-bin-DVD1, but when I choose it for storage it throws an error:

[screenshot: upload_2017-1-17_21-15-28.png]

Please help me.
 

Manish Pundir

Member
Customer
Not able to connect to the MySQL database:

mysql -utraining -ptraining
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Not able to connect to the MySQL database:

mysql -utraining -ptraining
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)


Hi Manish,

MySQL server access details (this is a separate set-up on a data node, to avoid colliding with the Hive DB):
hostname: ec2-23-23-9-126.compute-1.amazonaws.com
public IP: 23.23.9.126
private IP: 172.31.54.174
username: labuser
password: simplilearn

Sample command:

mysql -h 172.31.54.174 -u labuser -p

Enter the password (simplilearn) when prompted.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Deshdeep,

I am not able to access the internet inside the virtual box created today. Please suggest if any changes are required.

Thanks,
Rahamath.

Hi Rahamath,

Please check your network adapter settings to resolve this issue.

Add the following to /etc/sysconfig/network-scripts/ifcfg-enp0s3:

DNS1=8.8.8.8
DNS2=8.8.4.4
# Note: this was set to no
ONBOOT=yes

After saving, restart the network service (service network restart) or reboot the VM so the change takes effect.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please find the recordings below:

Batch 1:

Date Batch Link
19/Dec Batch 1 – Day1 https://s3.amazonaws.com/downloads....ta Academy/Batch 1/BIG DATA ACADEMY Day 2.mp4
20/Dec Batch 1 – Day2 https://s3.amazonaws.com/downloads.... Big Data Academy/Batch 1/Batch 1 - day 2.mp4
21/Dec Batch 1 - Day3 https://s3.amazonaws.com/downloads.... Big Data Academy/Batch 1/Batch 1 - Day 3.mp4
22/Dec Batch 1 - Day4 https://s3.amazonaws.com/downloads.... Big Data Academy/Batch 1/Batch 1 - day 4.mp4
23/Dec Batch 1 - Day5 https://s3.amazonaws.com/downloads.... - Big Data Academy/Batch 1/Batch 1 day 5.mp4
02/Jan Batch 1 - Day6 https://s3.amazonaws.com/downloads.... - Big Data Academy/Batch 1/Batch 1 Day 6.mp4
03/Jan Batch 1 - Day7 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/Batch%201%20day%207.mp4
04/Jan Batch 1 - Day8 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/batch%201%20day%208.mp4
05/Jan Batch 1 - Day9 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/batch%201%20day%209.mp4
06/Jan Batch 1 - Day10 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/batch%201%20day%2010.mp4
16/Jan Batch 1 - Day11 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/batch%201%20day%2011.mp4
17/Jan Batch 1 - Day12 https://s3.amazonaws.com/downloads....Academy/BIG DATA ACADEMY Batch 1 17th JAN.mp4
18/Jan Batch 1 - Day13 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/BIG%20data%20academy%20batch%201_18%20Jan.mp4
Doubt-clearing sessions:

Date Batch Link
09/Jan Batch 1 – Day1 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/batch%201%20doubt%20clearing%201.mp4
11/Jan Batch 1 – Day2 https://s3.amazonaws.com/downloads.... Academy/Batch 1/batch 1 doubt clearing 2.mp4
13/Jan Batch 1 – Day3 https://s3.amazonaws.com/downloads.... Academy/Batch 1/batch 1 doubt clearing 3.mp4

Batch 2:

Date Batch Link
19/Dec Batch 2 – Day1 https://s3.amazonaws.com/downloads....Big Data Academy/Batch 2/BIG DATA ACADEMY.mp4
20/Dec Batch 2 – Day2 https://s3.amazonaws.com/downloads....ta Academy/Batch 2/BIG DATA ACADEMY Day 2.mp4
21/Dec Batch 2 – Day3 https://s3.amazonaws.com/downloads....r 2016/B2B/BIG DATA ACADEMY 21st Dec 2016.mp4
22/Dec Batch 2 – Day4 https://s3.amazonaws.com/downloads....o - Big Data Academy/Batch 2/Day 4 Part 2.mp4
23/Dec Batch 2 – Day5 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/batch%202%20day%205.mp4
02/Jan Batch 2 – Day6 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/Batch%202%20Day%206.mp4
03/Jan Batch 2 – Day7 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/Batch%202%20Day%207.mp4
04/Jan Batch 2 – Day8 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/batch%202%20day%208.mp4
05/Jan Batch 2 – Day9 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/batch%202%20day%209.mp4
06/Jan Batch 2 – Day10 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/Batch%202%20Day%2010.mp4
16/Jan Batch 2 – Day11 https://s3.amazonaws.com/downloads....- Big Data Academy/Batch 2/batch 2 day 11.mp4
17/Jan Batch 2 – Day12 https://s3.amazonaws.com/downloads....o - Big Data Academy/BIG DATA ACADEMY (2).mp4
18/Jan Batch 2 – Day13 https://s3.amazonaws.com/downloads....- Big Data Academy/Batch 2/batch 2 day 12.mp4
Doubt-clearing sessions:

Date Batch Link
09/Jan Batch 2 – Day1 https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%202/batch%202%20doubt%20clearing%201.mp4
11/Jan Batch 2 – Day2 https://s3.amazonaws.com/downloads.... Academy/Batch 2/batch 2 doubt clearing 2.mp4
 

Ravi_272

Member
Customer
I am facing a problem while using the -put command to place a file in HDFS.

I have installed Hadoop 2.x in my VirtualBox machine running CentOS.

A) The following files are configured:
1) hadoop-env.sh
2) hdfs-site.xml
3) core-site.xml
4) mapred-site.xml.template
5) masters
6) slaves
7) yarn-site.xml

B) Created the directory and assigned rights to it.
C) Ran the format command to format the namenode.
D) Verified the 'current' folder is created in the path.
E) Executed start-all.sh.
F) Verified all the Java processes.
G) Executed hadoop-daemon.sh start datanode to start the datanode.
H) Ran the hadoop dfs -put filename.txt command and got the error below:

WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Cannot lock storage /wiprop2. The directory is already locked
2017-01-20 13:39:49,780 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to m2/10.156.238.207:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:478)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1338)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1304)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:314)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:226)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:867)
at java.lang.Thread.run(Thread.java:745)
2017-01-20 13:39:49,788 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to m2/10.156.238.207:9000
2017-01-20 13:39:49,798 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2017-01-20 13:39:51,798 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-01-20 13:39:51,799 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-01-20 13:39:51,800 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

I have also attached the logs from the Hadoop logs directory. Can you please help?
 

Attachments

  • hadoop-hduser-namenode-m2.txt (40.2 KB)

Manish Pundir

Member
Customer
Not able to load a CSV file with the load API; tried both:

scala> val cars = sqlContext.load("/user/manish.pundir_wipro/spark/emp.csv","com.databricks.spark.csv")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org

scala> val cars = sqlContext.load("/user/manish.pundir_wipro/spark/emp.csv","csv")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
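The ClassNotFoundException above usually means the spark-csv package is not on the classpath; Spark 1.x ships no built-in csv data source, so the "csv" shorthand fails too. A minimal sketch of the usual fix, assuming Scala 2.10 and internet access from the node (the package version here is an assumption):

Code:
$ spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

scala> val cars = sqlContext.read.
     |   format("com.databricks.spark.csv").  // name the data source explicitly
     |   option("header", "true").            // assumes emp.csv has a header row
     |   load("/user/manish.pundir_wipro/spark/emp.csv")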
 

Manish Pundir

Member
Customer
Hi Team,
The data/input for Project 2, "A mobile phone service provider has introduced a new Open Network campaign. The company has invited the users to raise a request to initiate...", is not correct. Can you provide the updated input?
 


Megha_42

Well-Known Member
Simplilearn Support
Hi All,

Please find the documents that were used in the class of 24 Jan attached to this post.

Thanks,
Megha
 

Attachments

  • Linux.txt (523 bytes)
  • Hdfs tasks.pdf (18.5 KB)

VATTIPALLI ARUNA

Active Member
Customer
Hi Team,
The data/input for Project 2, "A mobile phone service provider has introduced a new Open Network campaign. The company has invited the users to raise a request to initiate...", is not correct. Can you provide the updated input?

Please share quickly if new data is available.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
I am facing a problem while using the -put command to place a file in HDFS... [Ravi's post, quoted in full above]

Hi Ravi,

2015-05-28 21:41:57,544 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/dfs/datanode: namenode clusterID = CID-e77ee39a-ab4a-4de1-b1a4-9d4da78b83e8; datanode clusterID = CID-6c250e90-658c-4363-9346-972330ff8bf9

Your namenode and datanode cluster IDs do not match.

Open your /usr/local/hadoop/dfs/datanode/current/VERSION file and change:

clusterID=CID-6c250e90-658c-4363-9346-972330ff8bf9

to

clusterID=CID-e77ee39a-ab4a-4de1-b1a4-9d4da78b83e8

NOTE: Whenever you format your namenode, check the VERSION files of both the namenode and the datanode. They should have the same clusterID and namespaceID; otherwise your datanode won't start.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Team,
The data/input for Project 2, "A mobile phone service provider has introduced a new Open Network campaign. The company has invited the users to raise a request to initiate...", is not correct. Can you provide the updated input?

Hi Manish,

We are working on it; this will be resolved within 24 hours. The new data set will be made available with the course project data sets.
 

Ganesh Padaiyachi

Member
Customer
Hello All, I received a mail from Sandesh asking to complete the case study before 31st Jan. Is this the two projects (Banking & Telecom) on the LMS he is talking about, or has something else been assigned?
 

Megha_42

Well-Known Member
Simplilearn Support
Hi All,

Glad to see you all so immersed in today's session for the MapReduce hands-on!
Please find all the documents for today's MapReduce in Java attached here.

Happy hands-on!

Thanks,
Megha
 

Attachments

  • Java_Mapred_First_hands_on.txt (8.7 KB)
  • Mapreduce Steps.txt (1.2 KB)
  • MapReduce_simple_example.jpg (163.1 KB)

vamsi jampana

Member
Customer
Hi Deshdeep,

Regarding Project 1 - Banking case study.
Can you please give the criteria to work on for the items below / give more details on these questions?

1. Check if age matters in marketing subscription for deposit - do we need to consider any age group / do we need to find which ages said "no" to the deposit?

2. Check if marital status mattered for subscription to deposit.
3. Check if age and marital status together mattered for subscription to the deposit scheme.

Thanks
Vamsi
 

Ravi_272

Member
Customer
Your namenode and datanode cluster IDs do not match. Open your /usr/local/hadoop/dfs/datanode/current/VERSION file and change the clusterID... [DeshDeep's reply, quoted in full above]

Where can I check the clusterID for the namenode and the datanode? I know the location of the VERSION file for the datanode, but what is the location of the namenode directory?
 

Arun_4681

Member
Customer
Hi Deshdeep,
We didn't receive the recordings for the last two doubt-clearing sessions for Batch 1, on Jan 20th and 24th.
Could you send the links?

Thanks.
 

VATTIPALLI ARUNA

Active Member
Customer
Friends, not sure if you have come across this... I spent quite a lot of time getting past this error, so thought of sharing (pyspark):

>>> rdd3=rdd2.collect()
>>> for t1 in rdd3:
... print t1, 'ok'
File "<stdin>", line 2
print t1, 'ok'
^
IndentationError: expected an indented block

After giving a space before print, it worked:

>>> for t1 in rdd3:
... print t1, 'ok'
...
2 ok
4 ok
6 ok
 

vamsi jampana

Member
Customer
Deshdeep,

I am unable to create a dataframe using a CSV file; please suggest.

val df = sqlContext.read.csv("test.csv")

Hi Rahamath,
Please try this: read the file with sc.textFile("file.csv") and then convert from RDD to DF.

(OR)

spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val DF = sqlContext.read.format("com.databricks.spark.csv").
  option("header", "true").
  option("inferSchema", "true").
  load("file.csv")
 

Ravi_272

Member
Customer
Hi,

I came across a problem while working on Project 01.

The data set provided is not in the correct format, because of which, when I create a data frame from the data and call a simple 'SELECT' statement, I get an error message.

Code:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter",";").load("project01/*.csv")
df: org.apache.spark.sql.DataFrame = [age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y: string]

I am running select command on above data frame and then getting error message as below
Code:
scala> df.select("age").show();

Code:
org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y];
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42

Can you please check whether the header format of the data set is correct?
 

Ravi_272

Member
Customer
Adding to my post above:

The count statement on the same data set is running correctly and giving the output

Code:
scala> df.count()
res20: Long = 45211
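For reference: count() succeeding while select("age") fails is consistent with the schema shown above; the rows parsed, but the entire header was read as one column, so no column named age exists. A hedged sketch worth trying, making sure both the delimiter and quote options reach the reader (not verified against the exact project file):

Code:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").
     |   option("header", "true").
     |   option("inferSchema", "true").
     |   option("delimiter", ";").   // the bank data set is semicolon-separated
     |   option("quote", "\"").      // header fields are wrapped in double quotes
     |   load("project01/*.csv")
scala> df.printSchema()              // each field should now be its own column

If the columns still come back fused, it is worth checking which spark-csv version was pulled in with --packages.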
 

Ravi_272

Member
Customer
Hi, can anyone enlighten me on two of the problems asked in Project 1? I am not able to understand what we need to achieve.

3. Maximum, Mean, and Minimum age of average targeted customer. What does 'average targeted customer' mean here? Does it just mean to take all customers and find their max, min, and mean age?

8. Do feature engineering for column—age and find right age effect on campaign.

Thanks
Ravi
 

Ravi_272

Member
Customer
Regarding Project 1 - Banking case study, can you give the criteria / more details for:
1. Check if age matters in marketing subscription for deposit.
2. Check if marital status mattered for subscription to deposit.
3. Check if age and marital status together mattered for subscription to the deposit scheme.
[quoted from Vamsi's post above]

1. We need to check how many people of a particular age group said yes to the deposit: get the age, and the count per age of those who said yes.
2. Same as above, but grouped by marital status.
3. Same as above, but grouped by age and marital status together.
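A minimal spark-shell sketch of that aggregation, assuming a data frame df loaded from the banking data set with the columns age, marital, and y seen in the posts above:

Code:
scala> // 1. count of "yes" subscriptions per age
scala> df.filter(df("y") === "yes").groupBy("age").count().orderBy("age").show()

scala> // 2. and 3. the same aggregation, keyed on marital status, then on both
scala> df.filter(df("y") === "yes").groupBy("marital").count().show()
scala> df.filter(df("y") === "yes").groupBy("age", "marital").count().show()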
 

vamsi jampana

Member
Customer
Hi, can anyone enlighten me on the problems asked in Project 1?
8. Do feature engineering for column—age and find right age effect on campaign.
[quoted from Ravi's post above]

Hi Ravi,
This is how I understood it:
8. Do feature engineering for the age column and find the right age effect: find, for each age, the count of people who said yes; the age with the maximum count is the 'right age' effect on the campaign.
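One hedged way to turn that reading into code is to derive an age-group feature and count the "yes" responses per group; the bucket boundaries below are illustrative assumptions, not part of the project statement:

Code:
scala> import org.apache.spark.sql.functions._
scala> val withGroup = df.withColumn("age_group",
     |   when(col("age") < 30, "under 30").
     |   when(col("age") < 60, "30-59").
     |   otherwise("60+"))                     // illustrative buckets
scala> withGroup.filter(col("y") === "yes").
     |   groupBy("age_group").count().
     |   orderBy(desc("count")).show()         // biggest group first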
 

Ganesh Padaiyachi

Member
Customer
Hi,

I created a dataframe for Project 1:

Code:
val df = sqlContext.read.format("com.databricks.spark.csv").
option("header", "true").
option("inferSchema", "true").
load("project1.csv")

I tried to read the data:

Code:
df.select("y").show()

Got the below error:

org.apache.spark.sql.AnalysisException: cannot resolve 'y' given input columns age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y";

Please assist.
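The error shows every field fused into one column name, the same symptom as in Ravi_272's posts above: the bank data set is semicolon-delimited, so with the default comma delimiter the whole header parses as a single column. A sketch of the likely fix (not verified against the exact project file):

Code:
scala> val df = sqlContext.read.format("com.databricks.spark.csv").
     |   option("header", "true").
     |   option("inferSchema", "true").
     |   option("delimiter", ";").   // without this, the header stays one column
     |   load("project1.csv")
scala> df.select("y").show()         // 'y' should now resolve as its own column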
 