
WIPRO - BIGDATA ACADEMY

mahesh_622

Administrator
Staff member
Simplilearn Support
This BigData community forum thread is built exclusively to support WIPRO learners.

Post your queries/questions/doubts here and get them answered by our trainers and experts (global teaching assistants).
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please go ahead and download and install Cloudera with VirtualBox using the links below. Minimum system requirements:

Processor: Intel i3, with 64-bit architecture

RAM: 6 GB

Disk: 30 GB of free space on the C drive

To complete the installation, download the files from the links below. I have also attached a link showing how to complete the installation in a few quick, easy steps.

Please download VirtualBox using this link:
https://www.virtualbox.org/wiki/Downloads

And Cloudera using this link:
http://www.cloudera.com/downloads/quickstart_vms/5-8.html

Also, please follow the video below to install this on your machine:
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

To install the VM and Hadoop on your machine, I have prepared a document and attached it here; it will help you not only with downloading the files but also with the installation itself.

The document is self-explanatory.
 

Attachments

  • Pseudo distributed mode 2 7 1..pdf
    229.1 KB · Views: 30

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Below is the code from Shivank's class for the example with a Combiner and a Partitioner.


Code:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Partitioner;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WithPartitioner {

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);

            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                output.collect(value, new IntWritable(1));

                // // I am fine I am fine
                // v
                // I 1
                // am 1
                // fine 1
                // I 1
                // am 1
                // fine 1

                // I (1,1)

            }

        }
    }

    // Output types of Mapper should be same as arguments of Partitioner
    public static class MyPartitioner implements Partitioner<Text, IntWritable> {

        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {

            String myKey = key.toString().toLowerCase();

            if (myKey.equals("hadoop")) {
                return 0;
            }
            if (myKey.equals("data")) {
                return 1;
            } else {
                return 2;
            }
        }

        @Override
        public void configure(JobConf arg0) {

            // Gives you a new instance of JobConf if you want to change Job
            // Configurations

        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
                // sum = sum + 1;
            }

            // beer,3

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(WithPartitioner.class);
        conf.setJobName("wordcount");

        // Forcing program to run 3 reducers
        conf.setNumReduceTasks(3);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setPartitionerClass(MyPartitioner.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

         FileInputFormat.setInputPaths(conf, new Path(args[0]));
         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        
        JobClient.runJob(conf);
    }
}
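A quick note on running this one: the job forces three reduce tasks, and the partitioner sends the key "hadoop" to reducer 0, "data" to reducer 1, and every other word to reducer 2, so you should see three output files (part-00000, part-00001, part-00002) in the output directory, one per partition. It can be packaged and run the same way as the other jars in this thread, e.g. hadoop jar <your_jar> WithPartitioner <input_path> <output_path> (the jar name here is just a placeholder).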
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please check the attached document along with the code below for the sample on Counters.


Code:
import java.io.IOException;
import java.util.Date;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class MyCounter {
   
    public static enum MONTH{
        DEC,
        JAN,
        FEB
    };
   
   
   
    public static class MyMapper extends Mapper<LongWritable,Text, Text, Text> {
        private Text out = new Text();
        protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
            String line = value.toString();
            String[]  strts = line.split(",");
            long lts = Long.parseLong(strts[1]);
            Date time = new Date(lts);
            int m = time.getMonth();
           
            if(m==11){
                context.getCounter(MONTH.DEC).increment(10);   
            }
            if(m==0){               
                    context.getCounter(MONTH.JAN).increment(20);
            }
            if(m==1){
                    context.getCounter(MONTH.FEB).increment(30);
            }
                out.set("success");
            context.write(out,out);
        } 
}
   
   
  public static void main(String[] args)
                  throws IOException, ClassNotFoundException, InterruptedException {
   
    Job job = new Job();
    job.setJarByClass(MyCounter.class);
    job.setJobName("CounterTest");
    job.setNumReduceTasks(0);
    job.setMapperClass(MyMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
   
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
   
    job.waitForCompletion(true);
   
    Counters counters = job.getCounters();
   
    Counter c1 = counters.findCounter(MONTH.DEC);
    System.out.println(c1.getDisplayName()+ " : " + c1.getValue());
    c1 = counters.findCounter(MONTH.JAN);
    System.out.println(c1.getDisplayName()+ " : " + c1.getValue());
    c1 = counters.findCounter(MONTH.FEB);
    System.out.println(c1.getDisplayName()+ " : " + c1.getValue());
   
  }
}
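A note on running this sample: as the mapper code above shows, each input line is expected to be comma-separated with an epoch timestamp in milliseconds as the second field (strts[1]), the job is map-only (zero reduce tasks), and the month-wise counter totals are printed to the console only after the job completes.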
 

Attachments

  • Custom Counters Problem Statement.pdf
    65.7 KB · Views: 9
  • inputdata.txt
    77 bytes · Views: 8

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please find the complete content along with the problem statement for the distributed cache example.

Step 1: Create an abc.dat file and put the content below in it (make sure the columns are tab-separated, since the code splits on "\t").
Code:
up    Uttar_Pradesh
ma    Maharashtra
bi    Bihar
wb    WestBengal

Step 2: Create a file named dcinput and put the content below in it (again tab-separated).

Code:
up    199654321
ma    112328654
bi    103876487
wb    91765349

MyDC

Code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class MyDC {
   
   
    public static class MyMapper extends Mapper<LongWritable,Text, Text, Text> {
       
       
        private Map<String, String> abMap = new HashMap<String, String>();
                private Text outputKey = new Text();
                private Text outputValue = new Text();
       
        protected void setup(Context context) throws java.io.IOException, InterruptedException{
            Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
           
           
            for (Path p : files) {
                if (p.getName().equals("abc.dat")) {
                    BufferedReader reader = new BufferedReader(new FileReader(p.toString()));
                    String line = reader.readLine();
                    while(line != null) {
                        String[] tokens = line.split("\t");
                        String ab = tokens[0];
                        String state = tokens[1];
                        abMap.put(ab, state);
                        line = reader.readLine();
                    }
                }
            }
            if (abMap.isEmpty()) {
                throw new IOException("Unable to load Abbrevation data.");
            }
        }

       
        protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
           
           
            String row = value.toString();
            String[] tokens = row.split("\t");
            String inab = tokens[0];
            String state = abMap.get(inab);
            outputKey.set(state);
            outputValue.set(row);
                context.write(outputKey,outputValue);
        } 
}
   
   
  public static void main(String[] args)
                  throws IOException, ClassNotFoundException, InterruptedException {
   
    Job job = new Job();
    job.setJarByClass(MyDC.class);
    job.setJobName("DCTest");
    job.setNumReduceTasks(0);
   
    try{
    DistributedCache.addCacheFile(new URI("/abc.dat"), job.getConfiguration());
    }catch(Exception e){
        System.out.println(e);
    }
   
    job.setMapperClass(MyMapper.class);
   
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
   
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
   
    job.waitForCompletion(true);
   
   
  }
}
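A short note on running this one: the mapper looks up the cache file by the name abc.dat and the driver registers it with the HDFS URI "/abc.dat", so copy abc.dat to the root of HDFS first (for example with hadoop fs -put abc.dat /), then pass the dcinput path and an output path as the two program arguments, e.g. hadoop jar <your_jar> MyDC dcinput dcoutput (the jar and output names are just placeholders). As noted in the steps above, both files must be tab-separated, since the code splits on "\t".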
 

Attachments

  • DistributedCache.pdf
    73.4 KB · Views: 9
  • ProblemStatement.pdf
    53.9 KB · Views: 9

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Below is the code for the Reduce-side join program.

Step 1: Create a file named custs and add the content below to it.

Code:
4000001,Kristina,Chung,55,Pilot
4000002,Paige,Chen,74,Teacher
4000003,Sherri,Melton,34,Firefighter
4000004,Gretchen,Hill,66,Computer hardware engineer
4000005,Karen,Puckett,74,Lawyer
4000006,Patrick,Song,42,Veterinarian
4000007,Elsie,Hamilton,43,Pilot
4000008,Hazel,Bender,63,Carpenter
4000009,Malcolm,Wagner,39,Artist
4000010,Dolores,McLaughlin,60,Writer

Step 2: Create a file named txns and save the content below into it.

Code:
00000000,06-26-2011,4000001,040.33,Exercise & Fitness,Cardio Machine Accessories,Clarksville,Tennessee,credit
00000001,05-26-2011,4000002,198.44,Exercise & Fitness,Weightlifting Gloves,Long Beach,California,credit
00000002,06-01-2011,4000002,005.58,Exercise & Fitness,Weightlifting Machine Accessories,Anaheim,California,credit
00000003,06-05-2011,4000003,198.19,Gymnastics,Gymnastics Rings,Milwaukee,Wisconsin,credit
00000004,12-17-2011,4000002,098.81,Team Sports,Field Hockey,Nashville  ,Tennessee,credit
00000005,02-14-2011,4000004,193.63,Outdoor Recreation,Camping & Backpacking & Hiking,Chicago,Illinois,credit
00000006,10-28-2011,4000005,027.89,Puzzles,Jigsaw Puzzles,Charleston,South Carolina,credit
00000007,07-14-2011,4000006,096.01,Outdoor Play Equipment,Sandboxes,Columbus,Ohio,credit
00000008,01-17-2011,4000006,010.44,Winter Sports,Snowmobiling,Des Moines,Iowa,credit
00000009,05-17-2011,4000006,152.46,Jumping,Bungee Jumping,St. Petersburg,Florida,credit
00000010,05-29-2011,4000007,180.28,Outdoor Recreation,Archery,Reno,Nevada,credit
00000011,06-18-2011,4000009,121.39,Outdoor Play Equipment,Swing Sets,Columbus,Ohio,credit
00000012,02-08-2011,4000009,041.52,Indoor Games,Bowling,San Francisco,California,credit
00000013,03-13-2011,4000010,107.80,Team Sports,Field Hockey,Honolulu  ,Hawaii,credit
00000014,02-25-2011,4000010,036.81,Gymnastics,Vaulting Horses,Los Angeles,California,credit
00000015,10-20-2011,4000001,137.64,Combat Sports,Fencing,Honolulu  ,Hawaii,credit
00000016,05-28-2011,4000010,035.56,Exercise & Fitness,Free Weight Bars,Columbia,South Carolina,credit
00000017,10-18-2011,4000008,075.55,Water Sports,Scuba Diving & Snorkeling,Omaha,Nebraska,credit
00000018,11-18-2011,4000008,088.65,Team Sports,Baseball,Salt Lake City,Utah,credit
00000019,08-28-2011,4000008,051.81,Water Sports,Life Jackets,Newark,New Jersey,credit
00000020,06-29-2011,4000005,041.55,Exercise & Fitness,Weightlifting Belts,New Orleans,Louisiana,credit
00000021,02-14-2011,4000005,045.79,Air Sports,Parachutes,New York,New York,credit
00000022,10-10-2011,4000009,019.64,Water Sports,Kitesurfing,Saint Paul,Minnesota,credit
00000023,05-02-2011,4000009,099.50,Gymnastics,Gymnastics Rings,Springfield,Illinois,credit
00000024,06-10-2011,4000003,151.20,Water Sports,Surfing,Plano,Texas,credit
00000025,10-14-2011,4000009,144.20,Indoor Games,Darts,Phoenix,Arizona,credit
00000026,10-11-2011,4000009,031.58,Combat Sports,Wrestling,Orange,California,credit
00000027,09-29-2011,4000010,066.40,Games,Mahjong,Fremont,California,credit
00000028,05-12-2011,4000008,079.78,Team Sports,Cricket,Lexington,Kentucky,credit
00000029,06-03-2011,4000001,126.90,Outdoor Recreation,Hunting,Phoenix,Arizona,credit
00000030,03-14-2011,4000001,047.05,Water Sports,Swimming,Lincoln,Nebraska,credit
00000031,11-28-2011,4000008,005.03,Games,Dice & Dice Sets,Los Angeles,California,credit
00000032,01-29-2011,4000008,020.13,Team Sports,Soccer,Springfield,Illinois,credit
00000033,06-15-2011,4000008,154.15,Outdoor Recreation,Lawn Games,Nashville  ,Tennessee,credit
00000034,05-06-2011,4000008,098.96,Team Sports,Indoor Volleyball,Atlanta,Georgia,credit
00000035,04-12-2011,4000008,185.26,Games,Board Games,Centennial,Colorado,credit
00000036,10-13-2011,4000007,035.66,Team Sports,Football,Saint Paul,Minnesota,credit
00000037,04-19-2011,4000007,020.20,Outdoor Recreation,Shooting Games,San Diego,California,credit
00000038,08-05-2011,4000007,150.60,Outdoor Recreation,Camping & Backpacking & Hiking,Hampton  ,Virginia,credit
00000039,03-12-2011,4000006,174.36,Outdoor Play Equipment,Swing Sets,Pittsburgh,Pennsylvania,credit
00000040,11-07-2011,4000005,165.10,Team Sports,Cheerleading,Reno,Nevada,credit
00000041,04-16-2011,4000004,028.11,Indoor Games,Bowling,Westminster,Colorado,cash
00000042,09-10-2011,4000004,038.52,Outdoor Recreation,Tetherball,Denton,Texas,cash
00000043,04-22-2011,4000004,032.34,Water Sports,Water Polo,Las Vegas,Nevada,cash
00000044,09-11-2011,4000001,135.37,Water Sports,Surfing,Seattle,Washington,credit
00000045,11-27-2011,4000001,090.04,Exercise & Fitness,Abdominal Equipment,Honolulu  ,Hawaii,credit
00000046,05-27-2011,4000001,052.29,Gymnastics,Vaulting Horses,Cleveland,Ohio,credit
00000047,10-23-2011,4000008,100.10,Outdoor Play Equipment,Swing Sets,Everett,Washington,credit
00000048,09-27-2011,4000007,157.94,Exercise & Fitness,Exercise Bands,Philadelphia,Pennsylvania,credit
00000049,07-12-2011,4000010,144.59,Jumping,Jumping Stilts,Cambridge,Massachusetts,credit
00000050,10-20-2011,4000010,055.93,Jumping,Pogo Sticks,Everett,Washington,credit
00000051,02-17-2011,4000002,032.65,Water Sports,Life Jackets,Columbus,Georgia,cash
00000052,02-04-2011,4000005,044.82,Outdoor Play Equipment,Lawn Water Slides,Hampton  ,Virginia,cash
00000053,06-12-2011,4000004,044.46,Water Sports,Scuba Diving & Snorkeling,Charleston,South Carolina,cash
00000054,10-03-2011,4000007,154.87,Outdoor Recreation,Running,Long Beach,California,credit
00000055,12-16-2011,4000006,106.11,Water Sports,Swimming,New York,New York,credit
00000056,06-21-2011,4000002,176.63,Outdoor Recreation,Geocaching,Boston,Massachusetts,credit
00000057,12-20-2011,4000003,178.20,Outdoor Recreation,Skating,San Jose,California,credit
00000058,12-29-2011,4000002,194.86,Water Sports,Windsurfing,Oklahoma City,Oklahoma,credit
00000059,11-07-2011,4000001,021.43,Winter Sports,Snowboarding,Philadelphia,Pennsylvania,cash

ReduceJoin

Code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceJoin {

    public static class CustsMapper extends
            Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            String[] parts = record.split(",");
            context.write(new Text(parts[0]), new Text("custs\t" + parts[1]));
        }
    }

    public static class TxnsMapper extends
            Mapper<Object, Text, Text, Text> {
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            String[] parts = record.split(",");
            context.write(new Text(parts[2]), new Text("txns\t" + parts[3]));
        }
    }

    public static class ReduceJoinReducer extends
            Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String name = "";
            double total = 0.0;
            int count = 0;
            for (Text t : values) {
                String parts[] = t.toString().split("\t");
                if (parts[0].equals("txns")) {
                    count++;
                    total += Float.parseFloat(parts[1]);
                } else if (parts[0].equals("custs")) {
                    name = parts[1];
                }
            }
            String str = String.format("%d\t%f", count, total);
            context.write(new Text(name), new Text(str));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Reduce-side join");
        job.setJarByClass(ReduceJoin.class);
        job.setReducerClass(ReduceJoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
       
   
        MultipleInputs.addInputPath(job, new Path(args[0]),TextInputFormat.class, CustsMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),TextInputFormat.class, TxnsMapper.class);
        Path outputPath = new Path(args[2]);
       
       
        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
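Note that, unlike the earlier two-argument jobs, this one takes three arguments: the custs file path, the txns file path, and the output directory (args[0], args[1] and args[2] respectively). The driver also deletes the output directory if it already exists, so it can be re-run without manually removing the old output.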
 

Attachments

  • Joins Problem Statement.pdf
    70.8 KB · Views: 12

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please find the Hadoop installation guide attached to this post.
 

Attachments

  • Hadoop Installation Guide (1).pdf
    956.2 KB · Views: 17

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hello All,

I have sent the jar files in a mail to all of you from both batches. Kindly check your mailbox for the jar files discussed with Shivang during the class.

In case you have not received them, please reply here so that I can share them immediately.
 

Kedarnath B

New Member
Customer
Hi All,

I have completed the assignment given by Shivang today. Here are the supporting docs.

Command used to execute jar: hadoop jar Hadoop_Practice.jar com.wipro.hadoop.practice.WordOccurenceCount sample.txt output1

Screen-shot of the output content and also the sample input:
upload_2016-12-21_22-11-22.png

I have attached the code written as well.
 

Attachments

  • WordOccuranceMapper.txt
    799 bytes · Views: 15
  • WordOccurenceCount.txt
    2.1 KB · Views: 12
  • WordOccuranceReducer.txt
    577 bytes · Views: 10

Anandita Choudhury

Member
Customer
Completed Assignment 1. Solution attached.
 

Attachments

  • assignment1.jpg
    assignment1.jpg
    98.6 KB · Views: 27
  • WordCount.java.txt
    2.2 KB · Views: 2
  • words.txt
    13 bytes · Views: 1
Last edited:

Ravi_272

Member
Customer
Hi,

Solution for the problem raised in yesterday's afternoon session.

Problem: Find each word's length and print how many words exist for each length.

Code:
package com.training;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;


public class WordCountByLength {

    /**
     * @param args
     * @throws IOException
     * @throws IllegalArgumentException
     * @throws InterruptedException
     * @throws ClassNotFoundException
     */
    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountByLength.class);
      
        job.setMapperClass(MapCount.class);
        job.setReducerClass(ReduceCount.class);
      
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
      
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
      
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setInputFormatClass(TextInputFormat.class);
      
      
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
      
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

    public static class MapCount extends Mapper<LongWritable, Text, IntWritable, IntWritable>{

        @Override
        protected void map(LongWritable key, Text value,
                Context context)
                        throws IOException, InterruptedException {

            String line = value.toString();
            String[] words = line.split("[ ]");

            for (String word : words) {

                context.write(new IntWritable(word.length()), new IntWritable(1));
            }
        }
    }

    public static class ReduceCount extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable>{

        protected void reduce(IntWritable key, Iterable<IntWritable> values,
                Context context)
                        throws IOException, InterruptedException {

            int sum = 0;

            for(IntWritable value : values){
                sum+=value.get();
            }

            context.write(key, new IntWritable(sum));
        }
    }
}

Input file as below

A quick brown fox jumps over the lazy dog
9gag is fun, 9gag is life
Knowledge makes you confident

Output as below

1 1
2 2
3 4
4 6
5 4
9 2
 

Shashidhar Jambanour

Member
Customer
I am unable to launch the Cloudera VM in VirtualBox.
Getting the error attached.

Could anyone please help in resolving this issue?
 

Attachments

  • Virtual_Box_Error.png
    Virtual_Box_Error.png
    133.7 KB · Views: 26
  • Virtual_Box_error_details.jpg
    Virtual_Box_error_details.jpg
    194 KB · Views: 24
  • VBoxHardening.txt
    17.2 KB · Views: 2

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please find below the MySQL DB server access details for the Cloud Lab DB host. This DB is for lab users to practise Sqoop commands.

mysql db host ip : 172.31.54.174
db user name : labuser
db password : simplilearn
db port : 3306

mysql -h 172.31.54.174 -u labuser -psimplilearn
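Once you have created a database and table of your own on this host, a typical Sqoop import to pull that table into HDFS would look something like the line below (the database, table and HDFS directory names are placeholders; substitute your own):

sqoop import --connect jdbc:mysql://172.31.54.174/<your_db> --driver com.mysql.jdbc.Driver --username labuser --password simplilearn --table <your_table> --target-dir <hdfs_dir> -m 1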
 

Shashidhar Jambanour

Member
Customer
Assignment submission:

Story: Find word length and corresponding number of words from input data.

Please find attached source code, input and output files.
 

Attachments

  • LengthAndWordCount.txt
    2.2 KB · Views: 8
  • Input.jpg
    Input.jpg
    151.2 KB · Views: 23
  • Output.jpg
    Output.jpg
    152.4 KB · Views: 20
Last edited:

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
I am unable to launch the Cloudera VM in VirtualBox.
Getting the error attached.

Could anyone please help in resolving this issue?

Hi Shashidhar,

This is either because your Cloudera VM did not load properly or there is a version problem.

Try again after downloading it from the links below:
Please download VirtualBox using this link:
https://www.virtualbox.org/wiki/Downloads
And Cloudera using this link:
http://www.cloudera.com/downloads/quickstart_vms/5-8.html
Also, please follow the video below to install this on your machine:
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi, can you please provide the link to download all the Hadoop 2.7 jars required for debugging?


Hi Ganesh,

To install the VM and Hadoop on your machine, I have prepared a document and attached it here; it will help you not only with downloading the files but also with the installation itself.
The document is self-explanatory.
 

Attachments

  • Pseudo distributed mode 2 7 1..pdf
    229.1 KB · Views: 8

Nirmal kanna

Member
Customer
I used the hadoop-core-1.2.1 and hadoop-common-2.2.0 jars with JDK 1.8. I got an error at context.write. Please check the attached file for more details.

upload_2016-12-22_14-23-11.png
 

Elavarasan_1

Member
Customer
Assignment result attached. The code is on my personal machine; I will upload it this evening.
 

Attachments

  • hue.png
    hue.png
    95 KB · Views: 17
  • inputTextfile.png
    inputTextfile.png
    81.8 KB · Views: 16
  • output.png
    output.png
    90.4 KB · Views: 16
  • result.png
    result.png
    83.3 KB · Views: 16

Ganesh Padaiyachi

Member
Customer
Assignment-1 Solution.

Input - Hi this is a test
Output -
1 1
2 2
4 2

Command: hadoop jar WordLength.jar learning.WordCount /user/ganesh.padaiyachi_wipro/input.txt /user/ganesh.padaiyachi_wipro/output1
 

Attachments

  • WordCount.txt
    2.2 KB · Views: 2

Manish Pundir

Member
Customer
Getting a null age every time while loading data.
The values are there in the input, but the age column still comes out null in every row.
Please find the input in the attached file.
 

Attachments

  • query.txt
    1.7 KB · Views: 7

Manish Pundir

Member
Customer
Write a Hive query to find the number of cases per location and break the count down by the reason for taking the loan (a possible query sketch follows the expected output below).
+---------+--------------------+-----------+
| id      | reason             | location  |
+---------+--------------------+-----------+
| 1077501 | credit_card        | AZ        |
| 1077430 | car                | GA        |
| 1077175 | small_business     | IL        |
| 1076863 | other              | CA        |
| 1075358 | other              | OR        |
| 1075269 | wedding            | AZ        |
| 1069639 | debt_consolidation | NC        |
| 1072053 | car                | CA        |
| 1071795 | small_busines      | CA        |
+---------+--------------------+-----------+
output should be
+----------+-------------+-----+----------------+----------+---------+-------+
| location | credit_card | car | small_business | other... | wedding | total |
+----------+-------------+-----+----------------+----------+---------+-------+
| AZ       | 1           | 0   | 0              | 0        | 1       | 2     |
| CA       | 0           | 1   | 1              | 0        | 0       | 2     |
+----------+-------------+-----+----------------+----------+---------+-------+
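No solution has been posted for this one yet, so here is a minimal sketch of one common way to do this kind of pivot in Hive, using conditional aggregation. The table name loan_cases and the column names id, reason and location below are only assumptions matching the sample data, and only the reason values visible in the sample are spelled out; adjust them to your actual table.

Code:
-- Pivot the reason column into per-reason counts for each location (sketch only)
SELECT location,
       SUM(CASE WHEN reason = 'credit_card'    THEN 1 ELSE 0 END) AS credit_card,
       SUM(CASE WHEN reason = 'car'            THEN 1 ELSE 0 END) AS car,
       SUM(CASE WHEN reason = 'small_business' THEN 1 ELSE 0 END) AS small_business,
       SUM(CASE WHEN reason = 'other'          THEN 1 ELSE 0 END) AS other_reason,
       SUM(CASE WHEN reason = 'wedding'        THEN 1 ELSE 0 END) AS wedding,
       COUNT(*)                                                   AS total
FROM loan_cases
GROUP BY location;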
 

Anandita Choudhury

Member
Customer
Please find attached the solution to
Assignment 2: Find the total number of patients seen by each doctor
 

Attachments

  • Doctor-Patient.txt
    102 bytes · Views: 6
  • DistinctCount.java.txt
    2.3 KB · Views: 5
  • assignment2.jpg
    assignment2.jpg
    88 KB · Views: 17

Anandita Choudhury

Member
Customer
Assignment 4 : Export data from HDFS to MySQL

Solution:
sqoop export --connect jdbc:mysql://172.31.54.174/ananditaDb --username labuser --password simplilearn --table stud1 -m 1 --export-dir /user/anandita.choudhury_wipro/student.txt --driver com.mysql.jdbc.Driver
 

Attachments

  • error.jpg
    error.jpg
    130.2 KB · Views: 14

Ravi_272

Member
Customer
Hi all,

Solution for assignment #1.

Problem
Find out how many patients are treated by each doctor.

Input file hospital_data.txt as below

D1 P1
D1 P2
D1 P3
D2 P4
D2 P5
D3 P6
D3 P7
D3 P8
D3 P9
D3 P10
D4 P11
D5 P12
D5 P13

Output file as below
D1 3
D2 2
D3 5
D4 1
D5 2

Source code for the solution.

Code:
package com.training;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;


public class Assg_01_DoctorToPatient {

    /**
     * @param args
     * @throws IOException
     * @throws IllegalArgumentException
     * @throws InterruptedException
     * @throws ClassNotFoundException
     */
    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Assg_01_DoctorToPatient.class);
      
        job.setMapperClass(MapCount.class);
        job.setReducerClass(ReduceCount.class);
      
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
      
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
      
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setInputFormatClass(TextInputFormat.class);
      
      
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
      
        System.exit(job.waitForCompletion(true) ? 0 : 1);

    }

    public static class MapCount extends Mapper<LongWritable, Text, Text, IntWritable>{

        @Override
        protected void map(LongWritable key, Text value,
                Context context)
                        throws IOException, InterruptedException {

            String line = value.toString();
            String[] words = line.split("[ ]");

            context.write(new Text(words[0]), new IntWritable(1));
          
        }
    }

    public static class ReduceCount extends Reducer<Text, IntWritable, Text, IntWritable>{

        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context)
                        throws IOException, InterruptedException {

            int sum = 0;

            for(IntWritable value : values){
                sum+=value.get();
            }

            context.write(key, new IntWritable(sum));
        }
    }
}
 

Attachments

  • hospital_data.txt
    82 bytes · Views: 2

praveen.rachapally

Member
Customer
Requirement:
Write a MapReduce program for the details below.

Input:
i am the
you is i

Output:
(1,1)
(2,2)
(3,3)
 

Attachments

  • test2.zip
    1.2 KB · Views: 2
  • part-r-00000 - File Viewer.pdf
    107.6 KB · Views: 1

praveen.rachapally

Member
Customer
assignment: doctor - patient

input:
D1, P1
D2, P2
D3, P3
D1, P2
D2, P3
D3, P1

output:
D1, 2
D2, 2
D3, 2
 

Attachments

  • doctor_patient.zip
    1.5 KB · Views: 6
  • part-r-00000 - File Viewer_dp.pdf
    109.5 KB · Views: 4

Khuzema Challawala

Member
Customer
assignment: doctor - patient

Code:
package mapr.test;

import java.io.IOException;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class PatientCount {

    public static class Map extends Mapper<LongWritable,Text,Text,IntWritable>{
           private final static IntWritable one = new IntWritable(1);
           private Text Doc = new Text();
          
           public void map(LongWritable key,Text value,Context cxt) throws IOException, InterruptedException {
               String[] result = value.toString().split(",");
               Doc.set(result[0]);
               cxt.write(Doc, one);
           }
          
    }
    public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable> {
         private IntWritable result = new IntWritable();
        
         public void reduce(Text key, Iterable<IntWritable> values,Context context)throws IOException, InterruptedException {
             int sum = 0;
             for (IntWritable val : values) {
                 sum += val.get();
             }
             result.set(sum);
             context.write(key, result);
         }
    }

    public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          Job job = Job.getInstance(conf, "Patient Count");
          job.setJarByClass(PatientCount.class);
          job.setMapperClass(Map.class);
          job.setReducerClass(Reduce.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
 

praveen.rachapally

Member
Customer
assignment:
sqoop export

sqoop export --connect jdbc:mysql://172.31.54.174/simplilearn --driver com.mysql.jdbc.Driver --username labuser --password simplilearn --table customer_tbl --export-dir input/customer.txt
 

Attachments

  • sqoop1.pdf
    60.3 KB · Views: 12
  • sqoop2.pdf
    58 KB · Views: 5
  • table.pdf
    56.1 KB · Views: 7
  • customer.txt
    371 bytes · Views: 6

Gaurav Khandelwal_2

Member
Customer
Hi Team,
Getting an error while connecting to MySQL:
[gaurav.khandelwal1_wipro@ec2-52-86-42-143 ~]$ mysql -h jdbc:mysql://172.31.54.174 -ulabuser -psimplilearn
ERROR 2005 (HY000): Unknown MySQL server host 'jdbc:mysql://172.31.54.174' (1)
Can anyone help me?
Thanks
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Team,
Getting an error while connecting to MySQL:
[gaurav.khandelwal1_wipro@ec2-52-86-42-143 ~]$ mysql -h jdbc:mysql://172.31.54.174 -ulabuser -psimplilearn
ERROR 2005 (HY000): Unknown MySQL server host 'jdbc:mysql://172.31.54.174' (1)
Can anyone help me?
Thanks

Hi Gaurav,

The command is incorrect; check this one:

mysql -h 172.31.54.174 -u labuser -p

(press Enter, then type the password simplilearn at the prompt)
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,

Please complete the assignments mentioned by Shivank during the class:

1) Dataset like:
D1 P1
D2 P2
D1 P3
D1 P4

D1 denotes a doctor ID and P1 denotes a patient ID. Calculate how many patients each doctor treated.

2) Debug all the code using the debugger methods.

3) Use the Sqoop export method for data in HDFS.
 

Ravi_272

Member
Customer
Hi All,

Please complete the assignments mentioned by Shivank during the class:

1) Dataset like:
D1 P1
D2 P2
D1 P3
D1 P4

D1 denotes a doctor ID and P1 denotes a patient ID. Calculate how many patients each doctor treated.

2) Debug all the code using the debugger methods.

3) Use the Sqoop export method for data in HDFS.
Hi Deshdeep, Can you please post the HDFS file that we have to use to export to mysql DB?
 

Shivank_4

Moderator
Hi All,

I have completed the assignment given by Shivang today. Here are the supporting docs.

Command used to execute jar: hadoop jar Hadoop_Practice.jar com.wipro.hadoop.practice.WordOccurenceCount sample.txt output1

Screen-shot of the output content and also the sample input:
View attachment 1242

I have attached the code written as well.
I liked your approach. Please also try a similar approach for the doctor-patient assignment.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi Deshdeep, Can you please post the HDFS file that we have to use to export to mysql DB?
Hi Ravi,

I have shared those files via Google Drive, to which I already gave you access yesterday. Please go ahead and download them from there.
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,


Please find below the recordings of all WIPRO – Big Data Academy Batch 1 classes conducted till date.


19-Dec Batch 1 – Day1

https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/BIG%20DATA%20ACADEMY%20Day%202.mp4

20-Dec Batch 1 – Day2

https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/Batch%201%20-%20day%202.mp4

21-Dec Batch 1 - Day3

https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/Batch%201%20-%20Day%203.mp4

22-Dec Batch 1 - Day4

https://s3.amazonaws.com/downloads.simplilearn.com/lvc/December%202016/B2B/Wipro%20-%20Big%20Data%20Academy/Batch%201/Batch%201%20-%20day%204.mp4


These recordings can be played online as soon as you enter the URL or click the recording links.


Also, to save these recordings: right-click --> Save Video As --> choose a path to save the file, and the download will start.

Regards
Desh
 

DeshDeep Singh

Well-Known Member
Simplilearn Support
Alumni
Hi All,


Please find below the recordings of all WIPRO – Big Data Academy Batch 2 classes conducted till date.


19-Dec Batch 2 – Day1

https://s3.amazonaws.com/downloads....Big Data Academy/Batch 2/BIG DATA ACADEMY.mp4


20-Dec Batch 2 – Day2
https://s3.amazonaws.com/downloads....ta Academy/Batch 2/BIG DATA ACADEMY Day 2.mp4


21-Dec Batch 2 – Day3
https://s3.amazonaws.com/downloads....r 2016/B2B/BIG DATA ACADEMY 21st Dec 2016.mp4



22-Dec Batch 2 – Day4
https://s3.amazonaws.com/downloads....o - Big Data Academy/Batch 2/Day 4 Part 2.mp4


These recordings can be played online as soon as you enter the URL or click the recording links.


Also, to save these recordings: right-click --> Save Video As --> choose a path to save the file, and the download will start.

Regards
Desh
 