[AWS] Implementing a Word Counter with Hadoop MapReduce

컴공생 누르지 마세요! 컴공생 울어요.


당도최고치악산멜론 2022. 12. 19. 13:10

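In this post, I adapt the previous post, "[AWS] Hadoop MapReduce 기반 알파벳 Counter 구현하기" (Implementing an Alphabet Counter with Hadoop MapReduce), to build a word counter.

All of the AWS setup is the same as in the previous post, and only the Java code needs small changes, so this post covers just the word counter code.

If you want to see how AWS and Hadoop MapReduce were set up, refer to the previous post: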

https://kwonppo.tistory.com/31


Word Counter Code

Create the following three Java files on the AWS EC2 instance (for example with vim):

(1) WordCount.java

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount{
    public static void main(String[] args) throws IOException {
        // 1. Create the job and register the mapper, combiner, and reducer
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        // 2. Final output key type & value type
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // 3. in/output format
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // 4. Set the input and output paths from the command-line arguments
        // input path: args[0]
        // output path: args[1] (must not exist yet, or the job fails)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // 5. run job
        JobClient.runJob(conf);
    }
}
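The driver registers WordCountReducer as both the combiner and the reducer. This works because addition is associative and commutative: summing each mapper's local counts first and then summing those partial sums gives the same total as summing every 1 directly. The plain-Java sketch below (no Hadoop classes; the two "mapper outputs" are hypothetical values) checks that equivalence.

```java
import java.util.List;

// Plain-Java illustration (no Hadoop classes): the combiner runs the same
// summing logic on each mapper's local output before the shuffle.
public class CombinerCheck {
    // Same logic as WordCountReducer.reduce: add up all values for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical (word, 1) values for the word "the", emitted by two mappers.
        List<Integer> mapper1 = List.of(1, 1, 1);
        List<Integer> mapper2 = List.of(1, 1);

        // Without a combiner: the reducer receives all five 1s.
        int withoutCombiner = sum(List.of(1, 1, 1, 1, 1));

        // With a combiner: each mapper pre-sums locally, then the reducer sums the partial sums.
        int withCombiner = sum(List.of(sum(mapper1), sum(mapper2)));

        System.out.println(withoutCombiner + " " + withCombiner); // prints "5 5"
    }
}
```

A combiner like this only reduces the amount of data shuffled between mappers and reducers; it does not change the final counts.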

(2) WordCountMapper.java

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

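// Splits each input line on whitespace and emits (word, 1) for every token.
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused across calls instead of allocating a new IntWritable for every token
    private static final IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    // key: byte offset of the line in the input file (unused), value: the line itself
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, ONE);
        }
    }
}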
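To see what the mapper actually emits, here is a plain-Java sketch that runs the same StringTokenizer loop on one line. Hadoop's Text, IntWritable, and OutputCollector are replaced by String, a fixed ",1" suffix, and a List (an assumption made so the example runs without Hadoop on the classpath). Note that the mapper does not merge duplicates: "the" is emitted twice, and counting is left to the combiner and reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java sketch of what WordCountMapper.map emits for one input line.
public class MapSketch {
    // Returns the (word, 1) pairs the mapper would collect, formatted as "word,1".
    static List<String> map(String line) {
        List<String> emitted = new ArrayList<>();
        // Default delimiters: space, tab, newline, carriage return, form feed.
        // Consecutive delimiters are skipped, so no empty tokens are produced.
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add(tokenizer.nextToken() + ",1");
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map("the cat saw the dog"));
        // prints [the,1, cat,1, saw,1, the,1, dog,1]
    }
}
```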

(3) WordCountReducer.java

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Receives each word with all the counts emitted for it and outputs (word, total).
// Also used as the combiner, which is safe because summing is associative and commutative.
public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        // Add up every count for this word (1s from the mapper, or partial sums from the combiner)
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
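Between the mapper and the reducer, Hadoop groups all values with the same key and hands each reducer call one key with an iterator over its values, in sorted key order. The sketch below imitates that shuffle with a TreeMap and then applies the reducer's summing loop. It is a plain-Java illustration of the data flow under that simplification, not Hadoop's actual shuffle implementation (which is distributed and sorts Text keys byte-wise; for plain ASCII words the order is the same).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the shuffle + reduce steps; not Hadoop's implementation.
public class ShuffleReduceSketch {
    // Shuffle: group a 1 for every emitted word under that word.
    // TreeMap keeps keys sorted, mimicking the key order a reducer receives.
    static Map<String, List<Integer>> shuffle(List<String> words) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : words) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: same logic as WordCountReducer.reduce, summing the values per key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output for the line "the cat saw the dog".
        List<String> mapped = List.of("the", "cat", "saw", "the", "dog");
        Map<String, List<Integer>> grouped = shuffle(mapped);
        System.out.println(grouped);         // {cat=[1], dog=[1], saw=[1], the=[1, 1]}
        System.out.println(reduce(grouped)); // {cat=1, dog=1, saw=1, the=2}
    }
}
```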

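After creating the Java files, compile them and run the job with the input path as the first argument (args[0]) and the output directory as the second (args[1]). The output directory then contains how many times each word appears in the input file.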
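Because the job uses TextOutputFormat, each output line is the word, a tab, and its count. The mapper splits only on whitespace, so counting is case-sensitive and punctuation stays attached to words. The in-memory sketch below (a hypothetical local stand-in for the whole job, with made-up sample lines) shows what that means for the output; for plain ASCII words the sort order matches Hadoop's.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// In-memory sketch of the whole word-count job on a few lines of text.
// It mirrors the mapper's whitespace tokenizing and returns "word<TAB>count"
// lines, similar to what TextOutputFormat writes to the job's output files.
public class LocalWordCount {
    static String run(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // merge() adds 1 to the running count: map, shuffle, and reduce in one step.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Hadoop", "hadoop", and "hadoop." end up as three different keys.
        System.out.print(run("Hadoop counts words", "hadoop counts hadoop."));
        // prints:
        // Hadoop	1
        // counts	2
        // hadoop	1
        // hadoop.	1
        // words	1
    }
}
```

If case-insensitive counts are wanted, the mapper would have to normalize each token (for example by lowercasing it) before emitting it; the original code does not do this.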
