2014-06-22

Hadoop distributed cache program generates no output

We are trying to develop a simple program whose goal is to read patent data from a file and check whether other countries cite a given patent. The exercise comes from the book 'Hadoop in Action' by Chuck Lam, which we are using to learn about advanced map/reduce programming.

The Hadoop distribution we have installed runs as a Local Node, and we execute the program in a Windows environment using Cygwin.

We downloaded the files apat63_99.txt and cite75_99.txt from http://www.nber.org/patents/.

We use 'apat63_99.txt' as the distributed cache file, and 'cite75_99.txt' is in the input folder, which we pass via the command-line parameters.

The problem is that the program generates no output: the output files we see contain no data.

We checked the output of the map phase as well as the output of the reduce phase, and both are empty.

Here is the code we developed for this task:

package com.sample.patent; 

import java.io.BufferedReader; 
import java.io.FileReader; 
import java.io.IOException; 
import java.util.Hashtable; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.filecache.DistributedCache; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.JobConf; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.Mapper; 
import org.apache.hadoop.mapreduce.Reducer; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; 
import org.apache.hadoop.util.GenericOptionsParser; 


public class country_cite { 

    private static Hashtable<String, String> joinData 
        = new Hashtable<String, String>(); 

    public static class Country_Citation_Class extends 
     Mapper<Text, Text, Text, Text> { 


     Path[] cacheFiles; 

     public void configure(JobConf conf) { 
      try { 

       cacheFiles = DistributedCache.getLocalCacheArchives(conf); 
      } catch (IOException e) { 
       // TODO Auto-generated catch block 
       e.printStackTrace(); 
      } 

     } 

     public void map(Text key, Text value, Context context) 
       throws IOException, InterruptedException { 
      if (cacheFiles != null && cacheFiles.length > 0) { 
       String line; 
       String[] tokens; 
       BufferedReader joinReader = new BufferedReader(new FileReader(
         cacheFiles[0].toString())); 
       try { 
        while ((line = joinReader.readLine()) != null) { 
         tokens = line.split(","); 
         joinData.put(tokens[0], tokens[4]); 
        } 
       } finally { 
        joinReader.close(); 
       } 

      } 

      if (joinData.get(key) != null) 
       context.write(key, new Text(joinData.get(key))); 
     } 

    } 

    public static class MyReduceClass extends Reducer<Text, Text, Text, Text> { 

     public void reduce(Text key, Iterable<Text> values, Context context) 
       throws IOException, InterruptedException { 

      String patent_country = joinData.get(key); 
      if (patent_country != null) { 
       for (Text val : values) { 
        String cited_country = joinData.get(val); 
        if (cited_country != null 
          && !cited_country.equals(patent_country)) { 
         context.write(key, new Text(cited_country)); 
        } 
       } 
      } 
     } 
    } 

    public static void main(String[] args) throws Exception { 
     // TODO Auto-generated method stub 
     Configuration conf = new Configuration(); 

     DistributedCache.addCacheFile(new Path(args[0]).toUri(), 
       conf); 

     String[] otherArgs = new GenericOptionsParser(conf, args) 
       .getRemainingArgs(); 
     if (otherArgs.length != 3) { 
      System.err.println("Usage: country_cite <in> <out>"); 
      System.exit(2); 
     } 
     Job job = new Job(conf,"country_cite");  
     job.setJarByClass(country_cite.class); 
     job.setMapperClass(Country_Citation_Class.class); 
     job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class); 
     // job.setReducerClass(MyReduceClass.class); 
     job.setNumReduceTasks(0); 
     job.setOutputKeyClass(Text.class); 
     job.setOutputValueClass(Text.class); 
     FileInputFormat.addInputPath(job, new Path(otherArgs[1])); 
     FileOutputFormat.setOutputPath(job, new Path(otherArgs[2])); 
     System.exit(job.waitForCompletion(true) ? 0 : 1); 

    } 

} 

The tools we use are Eclipse and Hadoop version 1.2.1.

These are the command-line parameters used to run the job:

/cygdrive/c/cygwin64/usr/local/hadoop 
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output 

This is the trace generated while the program runs:

/cygdrive/c/cygwin64/usr/local/hadoop 
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output 
Patch for HADOOP-7682: Instantiating workaround file system 
14/06/22 12:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging to 0700 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001 to 0700 
14/06/22 12:39:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 
14/06/22 12:39:21 INFO input.FileInputFormat: Total input paths to process : 1 
14/06/22 12:39:21 WARN snappy.LoadSnappy: Snappy native library not loaded 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.split": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.split to 0644 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.splitmetainfo": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.splitmetainfo to 0644 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.xml": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.xml to 0644 
14/06/22 12:39:23 INFO filecache.TrackerDistributedCacheManager: Creating fileapat63_99.txt in /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806 with rwxr-xr-x 
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "/tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\local\archive\7067728792316735217_-679065598_1881640498-work-5016028422992714806 to 0755 
14/06/22 12:40:06 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt 
14/06/22 12:40:08 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt 
14/06/22 12:40:09 INFO mapred.JobClient: Running job: job_local1277400315_0001 
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Waiting for map tasks 
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000000_0 
14/06/22 12:40:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:10 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:0+33554432 
14/06/22 12:40:10 INFO mapred.JobClient: map 0% reduce 0% 
14/06/22 12:40:15 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000000_0 is done. And is in the process of commiting 
14/06/22 12:40:15 INFO mapred.LocalJobRunner: 
14/06/22 12:40:15 INFO mapred.Task: Task attempt_local1277400315_0001_m_000000_0 is allowed to commit now 
14/06/22 12:40:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000000_0' to output 
14/06/22 12:40:15 INFO mapred.LocalJobRunner: 
14/06/22 12:40:15 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000000_0' done. 
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000000_0 
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000001_0 
14/06/22 12:40:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:15 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:33554432+33554432 
14/06/22 12:40:16 INFO mapred.JobClient: map 12% reduce 0% 
14/06/22 12:40:21 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000001_0 is done. And is in the process of commiting 
14/06/22 12:40:21 INFO mapred.LocalJobRunner: 
14/06/22 12:40:21 INFO mapred.Task: Task attempt_local1277400315_0001_m_000001_0 is allowed to commit now 
14/06/22 12:40:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000001_0' to output 
14/06/22 12:40:21 INFO mapred.LocalJobRunner: 
14/06/22 12:40:21 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000001_0' done. 
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000001_0 
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000002_0 
14/06/22 12:40:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:21 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:67108864+33554432 
14/06/22 12:40:21 INFO mapred.JobClient: map 25% reduce 0% 
14/06/22 12:40:26 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000002_0 is done. And is in the process of commiting 
14/06/22 12:40:26 INFO mapred.LocalJobRunner: 
14/06/22 12:40:26 INFO mapred.Task: Task attempt_local1277400315_0001_m_000002_0 is allowed to commit now 
14/06/22 12:40:26 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000002_0' to output 
14/06/22 12:40:26 INFO mapred.LocalJobRunner: 
14/06/22 12:40:26 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000002_0' done. 
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000002_0 
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000003_0 
14/06/22 12:40:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:26 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:100663296+33554432 
14/06/22 12:40:26 INFO mapred.JobClient: map 37% reduce 0% 
14/06/22 12:40:29 INFO mapred.LocalJobRunner: 
14/06/22 12:40:29 INFO mapred.JobClient: map 42% reduce 0% 
14/06/22 12:40:29 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000003_0 is done. And is in the process of commiting 
14/06/22 12:40:29 INFO mapred.LocalJobRunner: 
14/06/22 12:40:29 INFO mapred.Task: Task attempt_local1277400315_0001_m_000003_0 is allowed to commit now 
14/06/22 12:40:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000003_0' to output 
14/06/22 12:40:29 INFO mapred.LocalJobRunner: 
14/06/22 12:40:29 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000003_0' done. 
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000003_0 
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000004_0 
14/06/22 12:40:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:29 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:134217728+33554432 
14/06/22 12:40:30 INFO mapred.JobClient: map 50% reduce 0% 
14/06/22 12:40:30 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000004_0 is done. And is in the process of commiting 
14/06/22 12:40:30 INFO mapred.LocalJobRunner: 
14/06/22 12:40:30 INFO mapred.Task: Task attempt_local1277400315_0001_m_000004_0 is allowed to commit now 
14/06/22 12:40:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000004_0' to output 
14/06/22 12:40:30 INFO mapred.LocalJobRunner: 
14/06/22 12:40:30 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000004_0' done. 
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000004_0 
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000005_0 
14/06/22 12:40:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:30 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:167772160+33554432 
14/06/22 12:40:31 INFO mapred.JobClient: map 62% reduce 0% 
14/06/22 12:40:31 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000005_0 is done. And is in the process of commiting 
14/06/22 12:40:31 INFO mapred.LocalJobRunner: 
14/06/22 12:40:31 INFO mapred.Task: Task attempt_local1277400315_0001_m_000005_0 is allowed to commit now 
14/06/22 12:40:31 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000005_0' to output 
14/06/22 12:40:31 INFO mapred.LocalJobRunner: 
14/06/22 12:40:31 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000005_0' done. 
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000005_0 
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000006_0 
14/06/22 12:40:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:31 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:201326592+33554432 
14/06/22 12:40:32 INFO mapred.JobClient: map 75% reduce 0% 
14/06/22 12:40:32 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000006_0 is done. And is in the process of commiting 
14/06/22 12:40:32 INFO mapred.LocalJobRunner: 
14/06/22 12:40:32 INFO mapred.Task: Task attempt_local1277400315_0001_m_000006_0 is allowed to commit now 
14/06/22 12:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000006_0' to output 
14/06/22 12:40:32 INFO mapred.LocalJobRunner: 
14/06/22 12:40:32 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000006_0' done. 
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000006_0 
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000007_0 
14/06/22 12:40:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null 
14/06/22 12:40:33 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:234881024+29194407 
14/06/22 12:40:33 INFO mapred.JobClient: map 87% reduce 0% 
14/06/22 12:40:35 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000007_0 is done. And is in the process of commiting 
14/06/22 12:40:35 INFO mapred.LocalJobRunner: 
14/06/22 12:40:35 INFO mapred.Task: Task attempt_local1277400315_0001_m_000007_0 is allowed to commit now 
14/06/22 12:40:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000007_0' to output 
14/06/22 12:40:35 INFO mapred.LocalJobRunner: 
14/06/22 12:40:35 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000007_0' done. 
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000007_0 
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Map task executor complete. 
14/06/22 12:40:35 INFO mapred.JobClient: map 100% reduce 0% 
14/06/22 12:40:35 INFO mapred.JobClient: Job complete: job_local1277400315_0001 
14/06/22 12:40:35 INFO mapred.JobClient: Counters: 9 
14/06/22 12:40:35 INFO mapred.JobClient: File Output Format Counters 
14/06/22 12:40:35 INFO mapred.JobClient:  Bytes Written=64 
14/06/22 12:40:35 INFO mapred.JobClient: FileSystemCounters 
14/06/22 12:40:35 INFO mapred.JobClient:  FILE_BYTES_READ=5009033659 
14/06/22 12:40:35 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=3820489832 
14/06/22 12:40:35 INFO mapred.JobClient: File Input Format Counters 
14/06/22 12:40:35 INFO mapred.JobClient:  Bytes Read=264104103 
14/06/22 12:40:35 INFO mapred.JobClient: Map-Reduce Framework 
14/06/22 12:40:35 INFO mapred.JobClient:  Map input records=16522439 
14/06/22 12:40:35 INFO mapred.JobClient:  Spilled Records=0 
14/06/22 12:40:35 INFO mapred.JobClient:  Total committed heap usage (bytes)=708313088 
14/06/22 12:40:35 INFO mapred.JobClient:  Map output records=0 
14/06/22 12:40:35 INFO mapred.JobClient:  SPLIT_RAW_BYTES=952 

Please let us know where we are going wrong. If I have left out any important information, let me know.

Thanks and best regards

Answer

I think the error is in the line if (joinData.get(key) != null). joinData uses String as its key type, but you pass a Text as the argument to get, so get returns null every time. Try replacing that line with if (joinData.get(key.toString()) != null).
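The lookup problem can be reproduced without Hadoop on the classpath. The sketch below is only an illustration: FakeText is a hypothetical stand-in for org.apache.hadoop.io.Text (an object that wraps a string but is not a String), and the table entry is a made-up sample value, not taken from the patent files.

```java
import java.util.Hashtable;
import java.util.Map;

/** Shows why a String-keyed table finds nothing when queried with a non-String key. */
final class TextKeyLookupDemo {

    /** Hypothetical stand-in for Hadoop's Text: wraps a string, but is not a String. */
    static final class FakeText {
        private final String value;

        FakeText(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Builds a tiny table shaped like joinData (patent number -> country). */
    static Map<String, String> sampleJoinData() {
        Map<String, String> joinData = new Hashtable<String, String>();
        joinData.put("100", "US"); // illustrative entry only
        return joinData;
    }

    public static void main(String[] args) {
        Map<String, String> joinData = sampleJoinData();
        FakeText key = new FakeText("100");

        // String.equals(FakeText) is false, so no entry matches the wrapper object.
        System.out.println(joinData.get(key));            // prints "null"

        // Converting the key to a String first makes the lookup succeed.
        System.out.println(joinData.get(key.toString())); // prints "US"
    }
}
```

The same reasoning applies to joinData.get(val) in the reducer, where val is also a Text.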

Another mistake is that every Mapper and every Reducer runs in its own JVM, so the Reducer and the Mapper cannot share data through static objects, and joinData is empty for every Reducer.
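One way to work around this is for each task to build its own copy of the table when it starts. In the new org.apache.hadoop.mapreduce API the per-task hook is setup(Context); configure(JobConf) belongs to the old org.apache.hadoop.mapred API and is not called on a new-API Mapper. The sketch below shows only the loading step in plain Java so it can be run without Hadoop. JoinTableLoader and load are hypothetical names, and the column positions (0 for the patent number, 4 for the country) are taken from the question's code rather than verified against the data files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Builds a per-task lookup table instead of relying on shared static state. */
final class JoinTableLoader {

    /**
     * Reads comma-separated lines and maps column 0 to column 4.
     * Lines with fewer than five columns are skipped.
     */
    static Map<String, String> load(BufferedReader reader) {
        Map<String, String> table = new HashMap<String, String>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.split(",");
                if (tokens.length > 4) {
                    table.put(tokens[0], tokens[4]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up sample rows; in a real task the reader would wrap the
        // locally cached copy of the file, opened once in setup(Context).
        String sample = "1,1963,1096,,US\n2,1963,1096,,BE\nshort,line\n";
        Map<String, String> table = load(new BufferedReader(new StringReader(sample)));
        System.out.println(table.get("1"));     // prints "US"
        System.out.println(table.get("short")); // prints "null"
    }
}
```

In a mapper, the returned map would be stored in an instance field and queried from map(), so the file is read once per task rather than once per record. A reducer that needs the same table would have to load it in its own setup(Context) as well.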
