Мы пытаемся разработать простую программу, в которой цель состоит в том, чтобы прочитать патентные данные из файла и проверить, ссылаются ли другие страны на этот патент или нет, это из текста книга 'Hadoop in Action'
от 'chuck Lam'
, где мы пытаемся узнать о advanced map/reduce programming
.Программа распределенного кэширования Hadoop не генерирует выход
Распределение hadoop, которое у нас установлено, Local Node
, и мы выполняем программу в Windows environment
, используя cygwin
.
Это URL http://www.nber.org/patents/
, из которого мы скачали файлы: apat63_99.txt
и cite75_99.txt
.
В качестве файлов распределенного кеша мы используем , а 'cite75_99.txt'
находится в папке input
, которую мы передаем из параметров командной строки.
Проблема в том, что программа не генерирует выходные данные, выходные файлы, которые мы видим, не содержат данных.
Мы попытались с фазой отображения, а также выход фазы редуктора, и оба они пустые.
Вот код, который мы разработали для этой задачи:
package com.sample.patent;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class country_cite {
private static Hashtable<String, String> joinData
= new Hashtable<String, String>();
public static class Country_Citation_Class extends
Mapper<Text, Text, Text, Text> {
Path[] cacheFiles;
public void configure(JobConf conf) {
try {
cacheFiles = DistributedCache.getLocalCacheArchives(conf);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public void map(Text key, Text value, Context context)
throws IOException, InterruptedException {
if (cacheFiles != null && cacheFiles.length > 0) {
String line;
String[] tokens;
BufferedReader joinReader = new BufferedReader(new FileReader(
cacheFiles[0].toString()));
try {
while ((line = joinReader.readLine()) != null) {
tokens = line.split(",");
joinData.put(tokens[0], tokens[4]);
}
} finally {
joinReader.close();
}
}
if (joinData.get(key) != null)
context.write(key, new Text(joinData.get(key)));
}
}
public static class MyReduceClass extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
String patent_country = joinData.get(key);
if (patent_country != null) {
for (Text val : values) {
String cited_country = joinData.get(val);
if (cited_country != null
&& !cited_country.equals(patent_country)) {
context.write(key, new Text(cited_country));
}
}
}
}
}
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new Path(args[0]).toUri(),
conf);
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 3) {
System.err.println("Usage: country_cite <in> <out>");
System.exit(2);
}
Job job = new Job(conf,"country_cite");
job.setJarByClass(country_cite.class);
job.setMapperClass(Country_Citation_Class.class);
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
// job.setReducerClass(MyReduceClass.class);
job.setNumReduceTasks(0);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Инструмент Eclipse
и Hadoop's version
, которые мы используем, 1.2.1
.
Эти параметры командной строки для запуска задания:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
Это след, который получает генерируется во время выполнения программы:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
Patch for HADOOP-7682: Instantiating workaround file system
14/06/22 12:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging to 0700
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001 to 0700
14/06/22 12:39:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/06/22 12:39:21 INFO input.FileInputFormat: Total input paths to process : 1
14/06/22 12:39:21 WARN snappy.LoadSnappy: Snappy native library not loaded
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.split": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.split to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.splitmetainfo": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.splitmetainfo to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.xml": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.xml to 0644
14/06/22 12:39:23 INFO filecache.TrackerDistributedCacheManager: Creating fileapat63_99.txt in /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806 with rwxr-xr-x
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "/tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\local\archive\7067728792316735217_-679065598_1881640498-work-5016028422992714806 to 0755
14/06/22 12:40:06 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:08 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:09 INFO mapred.JobClient: Running job: job_local1277400315_0001
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Waiting for map tasks
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:10 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:0+33554432
14/06/22 12:40:10 INFO mapred.JobClient: map 0% reduce 0%
14/06/22 12:40:15 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000000_0 is done. And is in the process of commiting
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task attempt_local1277400315_0001_m_000000_0 is allowed to commit now
14/06/22 12:40:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000000_0' to output
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000000_0' done.
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:15 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:33554432+33554432
14/06/22 12:40:16 INFO mapred.JobClient: map 12% reduce 0%
14/06/22 12:40:21 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000001_0 is done. And is in the process of commiting
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task attempt_local1277400315_0001_m_000001_0 is allowed to commit now
14/06/22 12:40:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000001_0' to output
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000001_0' done.
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:21 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:67108864+33554432
14/06/22 12:40:21 INFO mapred.JobClient: map 25% reduce 0%
14/06/22 12:40:26 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000002_0 is done. And is in the process of commiting
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task attempt_local1277400315_0001_m_000002_0 is allowed to commit now
14/06/22 12:40:26 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000002_0' to output
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000002_0' done.
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:26 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:100663296+33554432
14/06/22 12:40:26 INFO mapred.JobClient: map 37% reduce 0%
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.JobClient: map 42% reduce 0%
14/06/22 12:40:29 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000003_0 is done. And is in the process of commiting
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task attempt_local1277400315_0001_m_000003_0 is allowed to commit now
14/06/22 12:40:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000003_0' to output
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000003_0' done.
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:29 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:134217728+33554432
14/06/22 12:40:30 INFO mapred.JobClient: map 50% reduce 0%
14/06/22 12:40:30 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000004_0 is done. And is in the process of commiting
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task attempt_local1277400315_0001_m_000004_0 is allowed to commit now
14/06/22 12:40:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000004_0' to output
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000004_0' done.
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:30 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:167772160+33554432
14/06/22 12:40:31 INFO mapred.JobClient: map 62% reduce 0%
14/06/22 12:40:31 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000005_0 is done. And is in the process of commiting
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task attempt_local1277400315_0001_m_000005_0 is allowed to commit now
14/06/22 12:40:31 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000005_0' to output
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000005_0' done.
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:31 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:201326592+33554432
14/06/22 12:40:32 INFO mapred.JobClient: map 75% reduce 0%
14/06/22 12:40:32 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000006_0 is done. And is in the process of commiting
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task attempt_local1277400315_0001_m_000006_0 is allowed to commit now
14/06/22 12:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000006_0' to output
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000006_0' done.
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:33 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:234881024+29194407
14/06/22 12:40:33 INFO mapred.JobClient: map 87% reduce 0%
14/06/22 12:40:35 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000007_0 is done. And is in the process of commiting
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task attempt_local1277400315_0001_m_000007_0 is allowed to commit now
14/06/22 12:40:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000007_0' to output
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000007_0' done.
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/06/22 12:40:35 INFO mapred.JobClient: map 100% reduce 0%
14/06/22 12:40:35 INFO mapred.JobClient: Job complete: job_local1277400315_0001
14/06/22 12:40:35 INFO mapred.JobClient: Counters: 9
14/06/22 12:40:35 INFO mapred.JobClient: File Output Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Written=64
14/06/22 12:40:35 INFO mapred.JobClient: FileSystemCounters
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_READ=5009033659
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3820489832
14/06/22 12:40:35 INFO mapred.JobClient: File Input Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Read=264104103
14/06/22 12:40:35 INFO mapred.JobClient: Map-Reduce Framework
14/06/22 12:40:35 INFO mapred.JobClient: Map input records=16522439
14/06/22 12:40:35 INFO mapred.JobClient: Spilled Records=0
14/06/22 12:40:35 INFO mapred.JobClient: Total committed heap usage (bytes)=708313088
14/06/22 12:40:35 INFO mapred.JobClient: Map output records=0
14/06/22 12:40:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=952
Пожалуйста, дайте нам знать, где мы будем неправильно , если я пропустил какую-либо важную информацию, дайте мне знать.
Спасибо и наилучшими пожеланиями