2013-10-04 2 views
2

I am reading the Hive chapter of Tom White's O'Reilly book, Hadoop: The Definitive Guide. I am trying to create a bucketed table, but I cannot get Hive to create the buckets. I can create the table and load data into it, but all of the data then ends up in a single file. Hive is not enforcing bucketing.

I am running a pseudo-distributed Hadoop cluster, using Hadoop 1.2.1 and Hive 0.10.0 with a MySQL metastore.

The data (shown below) starts out in a table called "users". It should end up in a table with 4 buckets, i.e. one user per bucket.

select * from users; 
OK 
id  name 
0  Nat 
2  Joe 
3  Kay 
4  Ann 

I set the properties below in an attempt to enforce bucketing (I don't think setting mapred.reduce.tasks explicitly is necessary, but I included it just in case).

set hive.enforce.bucketing=true; 
set mapred.reduce.tasks=4; 

I then create the "bucketed_users" table and load the data into it.

CREATE TABLE bucketed_users (id INT, name STRING) 
CLUSTERED BY (id) 
SORTED BY (id ASC) INTO 4 BUCKETS; 

INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users; 
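As a sanity check on expectations: Hive assigns each row to bucket `hash(id) mod 4`, and for an INT clustering column the hash is simply the integer value itself. A small Python sketch (an illustration of that rule, not Hive code) of where these ids should land:

```python
# Sketch of Hive's bucket assignment for an INT clustering column,
# where hash(x) == x, so bucket = id % num_buckets.
NUM_BUCKETS = 4
users = [(0, "Nat"), (2, "Joe"), (3, "Kay"), (4, "Ann")]

buckets = {b: [] for b in range(NUM_BUCKETS)}
for user_id, name in users:
    buckets[user_id % NUM_BUCKETS].append((user_id, name))

for b, rows in buckets.items():
    print(f"00000{b}_0: {rows}")
```

Note that ids 0 and 4 both map to bucket 0, so even with bucketing working correctly, these four rows would occupy three non-empty files rather than one per bucket.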

Output:

Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks determined at compile time: 4 
In order to change the average load for a reducer (in bytes): 
    set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
    set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
    set mapred.reduce.tasks=<number> 
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. 
Execution log at: /tmp/katrina/katrina_20131003204949_a56048f5-ab2f-421b-af45-9ec3ff85731c.log 
Job running in-process (local Hadoop) 
Hadoop job information for null: number of mappers: 0; number of reducers: 0 
2013-10-03 20:49:34,011 null map = 0%, reduce = 0% 
2013-10-03 20:49:35,026 null map = 0%, reduce = 100% 
Ended Job = job_local1250355097_0001 
Execution completed successfully 
Mapred Local Task Succeeded . Convert the Join into MapJoin 
Loading data to table records.bucketed_users 
Deleted hdfs://localhost/user/hive/warehouse/records/bucketed_users 
Table records.bucketed_users stats: [num_partitions: 0, num_files: 1, num_rows: 4, total_size: 24, raw_data_size: 20] 
OK 
id name 
Time taken: 8.527 seconds 

The data was loaded into "bucketed_users" correctly (SELECT * FROM bucketed_users shows all the users), but only 1 file was created (num_files: 1 above) rather than the desired 4. Looking at the bucketed_users directory in HDFS (dfs -ls /user/hive/warehouse/records/bucketed_users;) likewise shows only a single file, 000000_0. How can I enforce bucketing?
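One detail worth noting in the full log: the map task reports numReduceTasks: 1, and each reduce task writes exactly one output file, so the reducer count, not the declared bucket count, determines how many files appear. A toy Python sketch of that shuffle behavior (an illustration, not Hadoop code):

```python
# Toy MapReduce shuffle: each reducer writes one output file,
# so the number of output files equals the number of reduce tasks.
def shuffle(rows, num_reduce_tasks):
    files = {r: [] for r in range(num_reduce_tasks)}
    for key, value in rows:
        files[key % num_reduce_tasks].append((key, value))
    return files

rows = [(0, "Nat"), (2, "Joe"), (3, "Kay"), (4, "Ann")]
print(len(shuffle(rows, 4)))  # 4 output files, as intended
print(len(shuffle(rows, 1)))  # 1 output file, as observed
```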

The full log is below:

2013-10-03 20:49:30,769 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Execution log at: /tmp/katrina/katrina_20131003204949_a56048f5-ab2f-421b-af45-9ec3ff85731c.log 
2013-10-03 20:49:31,139 INFO exec.ExecDriver (ExecDriver.java:execute(328)) - Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat 
2013-10-03 20:49:31,144 INFO exec.ExecDriver (ExecDriver.java:execute(350)) - adding libjars: file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar 
2013-10-03 20:49:31,144 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(852)) - Processing alias users 
2013-10-03 20:49:31,145 INFO exec.ExecDriver (ExecDriver.java:addInputPaths(870)) - Adding input file hdfs://localhost/user/hive/warehouse/records/users 
2013-10-03 20:49:31,145 INFO exec.Utilities (Utilities.java:isEmptyPath(1900)) - Content Summary not cached for hdfs://localhost/user/hive/warehouse/records/users 
2013-10-03 20:49:31,365 WARN util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(52)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
2013-10-03 20:49:32,410 INFO exec.ExecDriver (ExecDriver.java:createTmpDirs(219)) - Making Temp Directory: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000 
2013-10-03 20:49:32,420 WARN mapred.JobClient (JobClient.java:copyAndConfigureFiles(746)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
2013-10-03 20:49:32,648 WARN snappy.LoadSnappy (LoadSnappy.java:<clinit>(46)) - Snappy native library not loaded 
2013-10-03 20:49:32,655 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit creating pool for hdfs://localhost/user/hive/warehouse/records/users; using filter path hdfs://localhost/user/hive/warehouse/records/users 
2013-10-03 20:49:32,661 INFO mapred.FileInputFormat (FileInputFormat.java:listStatus(199)) - Total input paths to process : 1 
2013-10-03 20:49:32,716 INFO io.CombineHiveInputFormat (CombineHiveInputFormat.java:getSplits(411)) - number of splits 1 
2013-10-03 20:49:32,847 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(423)) - Creating hive-builtins-0.10.0.jar in /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632 with rwxr-xr-x 
2013-10-03 20:49:32,850 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(435)) - Extracting /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632/hive-builtins-0.10.0.jar to /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar-work--7485859847513724632 
2013-10-03 20:49:32,870 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:downloadCacheObject(463)) - Cached file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar as /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar 
2013-10-03 20:49:32,880 INFO filecache.TrackerDistributedCacheManager (TrackerDistributedCacheManager.java:localizePublicCacheObject(486)) - Cached file:///Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar as /tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar 
2013-10-03 20:49:32,987 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Job running in-process (local Hadoop) 
2013-10-03 20:49:33,034 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(340)) - Waiting for map tasks 
2013-10-03 20:49:33,037 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(204)) - Starting task: attempt_local1250355097_0001_m_000000_0 
2013-10-03 20:49:33,073 INFO mapred.Task (Task.java:initialize(534)) - Using ResourceCalculatorPlugin : null 
2013-10-03 20:49:33,077 INFO mapred.MapTask (MapTask.java:updateJobWithSplit(455)) - Processing split: Paths:/user/hive/warehouse/records/users/users.txt:0+24InputFormatClass: org.apache.hadoop.mapred.TextInputFormat 

2013-10-03 20:49:33,093 INFO io.HiveContextAwareRecordReader (HiveContextAwareRecordReader.java:initIOContext(154)) - Processing file hdfs://localhost/user/hive/warehouse/records/users/users.txt 
2013-10-03 20:49:33,093 INFO mapred.MapTask (MapTask.java:runOldMapper(419)) - numReduceTasks: 1 
2013-10-03 20:49:33,099 INFO mapred.MapTask (MapTask.java:<init>(949)) - io.sort.mb = 100 
2013-10-03 20:49:33,541 INFO mapred.MapTask (MapTask.java:<init>(961)) - data buffer = 79691776/99614720 
2013-10-03 20:49:33,542 INFO mapred.MapTask (MapTask.java:<init>(962)) - record buffer = 262144/327680 
2013-10-03 20:49:33,550 INFO ExecMapper (ExecMapper.java:configure(69)) - maximum memory = 2088435712 
2013-10-03 20:49:33,551 INFO ExecMapper (ExecMapper.java:configure(74)) - conf classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/] 
2013-10-03 20:49:33,551 INFO ExecMapper (ExecMapper.java:configure(76)) - thread classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/] 
2013-10-03 20:49:33,585 INFO exec.MapOperator (MapOperator.java:setChildren(387)) - Adding alias users to work list for file hdfs://localhost/user/hive/warehouse/records/users 
2013-10-03 20:49:33,587 INFO exec.MapOperator (MapOperator.java:setChildren(402)) - dump TS struct<id:int,name:string> 
2013-10-03 20:49:33,588 INFO ExecMapper (ExecMapper.java:configure(91)) - 
<MAP>Id =10 
    <Children> 
    <TS>Id =0 
     <Children> 
     <SEL>Id =1 
      <Children> 
      <RS>Id =2 
       <Parent>Id = 1 null<\Parent> 
      <\RS> 
      <\Children> 
      <Parent>Id = 0 null<\Parent> 
     <\SEL> 
     <\Children> 
     <Parent>Id = 10 null<\Parent> 
    <\TS> 
    <\Children> 
<\MAP> 
2013-10-03 20:49:33,588 INFO exec.MapOperator (Operator.java:initialize(321)) - Initializing Self 10 MAP 
2013-10-03 20:49:33,588 INFO exec.TableScanOperator (Operator.java:initialize(321)) - Initializing Self 0 TS 
2013-10-03 20:49:33,588 INFO exec.TableScanOperator (Operator.java:initializeChildren(386)) - Operator 0 TS initialized 
2013-10-03 20:49:33,589 INFO exec.TableScanOperator (Operator.java:initializeChildren(390)) - Initializing children of 0 TS 
2013-10-03 20:49:33,589 INFO exec.SelectOperator (Operator.java:initialize(425)) - Initializing child 1 SEL 
2013-10-03 20:49:33,589 INFO exec.SelectOperator (Operator.java:initialize(321)) - Initializing Self 1 SEL 
2013-10-03 20:49:33,592 INFO exec.SelectOperator (SelectOperator.java:initializeOp(58)) - SELECT struct<id:int,name:string> 
2013-10-03 20:49:33,594 INFO exec.SelectOperator (Operator.java:initializeChildren(386)) - Operator 1 SEL initialized 
2013-10-03 20:49:33,595 INFO exec.SelectOperator (Operator.java:initializeChildren(390)) - Initializing children of 1 SEL 
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (Operator.java:initialize(425)) - Initializing child 2 RS 
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (Operator.java:initialize(321)) - Initializing Self 2 RS 
2013-10-03 20:49:33,595 INFO exec.ReduceSinkOperator (ReduceSinkOperator.java:initializeOp(112)) - Using tag = -1 
2013-10-03 20:49:33,606 INFO exec.ReduceSinkOperator (Operator.java:initializeChildren(386)) - Operator 2 RS initialized 
2013-10-03 20:49:33,606 INFO exec.ReduceSinkOperator (Operator.java:initialize(361)) - Initialization Done 2 RS 
2013-10-03 20:49:33,606 INFO exec.SelectOperator (Operator.java:initialize(361)) - Initialization Done 1 SEL 
2013-10-03 20:49:33,606 INFO exec.TableScanOperator (Operator.java:initialize(361)) - Initialization Done 0 TS 
2013-10-03 20:49:33,607 INFO exec.MapOperator (Operator.java:initialize(361)) - Initialization Done 10 MAP 
2013-10-03 20:49:33,637 INFO exec.MapOperator (MapOperator.java:cleanUpInputFileChangedOp(494)) - Processing alias users for file hdfs://localhost/user/hive/warehouse/records/users 
2013-10-03 20:49:33,638 INFO exec.MapOperator (Operator.java:forward(774)) - 10 forwarding 1 rows 
2013-10-03 20:49:33,638 INFO exec.TableScanOperator (Operator.java:forward(774)) - 0 forwarding 1 rows 
2013-10-03 20:49:33,639 INFO exec.SelectOperator (Operator.java:forward(774)) - 1 forwarding 1 rows 
2013-10-03 20:49:33,641 INFO ExecMapper (ExecMapper.java:map(148)) - ExecMapper: processing 1 rows: used memory = 114294872 
2013-10-03 20:49:33,642 INFO exec.MapOperator (Operator.java:close(549)) - 10 finished. closing... 
2013-10-03 20:49:33,643 INFO exec.MapOperator (Operator.java:close(555)) - 10 forwarded 4 rows 
2013-10-03 20:49:33,643 INFO exec.MapOperator (Operator.java:logStats(845)) - DESERIALIZE_ERRORS:0 
2013-10-03 20:49:33,643 INFO exec.TableScanOperator (Operator.java:close(549)) - 0 finished. closing... 
2013-10-03 20:49:33,643 INFO exec.TableScanOperator (Operator.java:close(555)) - 0 forwarded 4 rows 
2013-10-03 20:49:33,643 INFO exec.SelectOperator (Operator.java:close(549)) - 1 finished. closing... 
2013-10-03 20:49:33,644 INFO exec.SelectOperator (Operator.java:close(555)) - 1 forwarded 4 rows 
2013-10-03 20:49:33,644 INFO exec.ReduceSinkOperator (Operator.java:close(549)) - 2 finished. closing... 
2013-10-03 20:49:33,644 INFO exec.ReduceSinkOperator (Operator.java:close(555)) - 2 forwarded 0 rows 
2013-10-03 20:49:33,644 INFO exec.SelectOperator (Operator.java:close(570)) - 1 Close done 
2013-10-03 20:49:33,644 INFO exec.TableScanOperator (Operator.java:close(570)) - 0 Close done 
2013-10-03 20:49:33,644 INFO exec.MapOperator (Operator.java:close(570)) - 10 Close done 
2013-10-03 20:49:33,645 INFO ExecMapper (ExecMapper.java:close(215)) - ExecMapper: processed 4 rows: used memory = 114767288 
2013-10-03 20:49:33,647 INFO mapred.MapTask (MapTask.java:flush(1289)) - Starting flush of map output 
2013-10-03 20:49:33,659 INFO mapred.MapTask (MapTask.java:sortAndSpill(1471)) - Finished spill 0 
2013-10-03 20:49:33,661 INFO mapred.Task (Task.java:done(858)) - Task:attempt_local1250355097_0001_m_000000_0 is done. And is in the process of commiting 
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - hdfs://localhost/user/hive/warehouse/records/users/users.txt:0+24 
2013-10-03 20:49:33,668 INFO mapred.Task (Task.java:sendDone(970)) - Task 'attempt_local1250355097_0001_m_000000_0' done. 
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(229)) - Finishing task: attempt_local1250355097_0001_m_000000_0 
2013-10-03 20:49:33,668 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(348)) - Map task executor complete. 
2013-10-03 20:49:33,680 INFO mapred.Task (Task.java:initialize(534)) - Using ResourceCalculatorPlugin : null 
2013-10-03 20:49:33,680 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - 
2013-10-03 20:49:33,690 INFO mapred.Merger (Merger.java:merge(408)) - Merging 1 sorted segments 
2013-10-03 20:49:33,695 INFO mapred.Merger (Merger.java:merge(491)) - Down to the last merge-pass, with 1 segments left of total size: 70 bytes 
2013-10-03 20:49:33,695 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - 
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(100)) - maximum memory = 2088435712 
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(105)) - conf classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/] 
2013-10-03 20:49:33,697 INFO ExecReducer (ExecReducer.java:configure(107)) - thread classpath = [file:/tmp/hadoop-katrina/mapred/local/76384558/archive/-2634153638864376244_689726567_810621743/file/Users/katrina/Code/hive/hive-0.10.0/lib/hive-builtins-0.10.0.jar/] 
2013-10-03 20:49:33,698 INFO ExecReducer (ExecReducer.java:configure(149)) - 
<OP>Id =3 
    <Children> 
    <FS>Id =4 
     <Parent>Id = 3 null<\Parent> 
    <\FS> 
    <\Children> 
<\OP> 
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initialize(321)) - Initializing Self 3 OP 
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initializeChildren(386)) - Operator 3 OP initialized 
2013-10-03 20:49:33,698 INFO exec.ExtractOperator (Operator.java:initializeChildren(390)) - Initializing children of 3 OP 
2013-10-03 20:49:33,698 INFO exec.FileSinkOperator (Operator.java:initialize(425)) - Initializing child 4 FS 
2013-10-03 20:49:33,699 INFO exec.FileSinkOperator (Operator.java:initialize(321)) - Initializing Self 4 FS 
2013-10-03 20:49:33,701 INFO exec.FileSinkOperator (Operator.java:initializeChildren(386)) - Operator 4 FS initialized 
2013-10-03 20:49:33,701 INFO exec.FileSinkOperator (Operator.java:initialize(361)) - Initialization Done 4 FS 
2013-10-03 20:49:33,701 INFO exec.ExtractOperator (Operator.java:initialize(361)) - Initialization Done 3 OP 
2013-10-03 20:49:33,706 INFO ExecReducer (ExecReducer.java:reduce(243)) - ExecReducer: processing 1 rows: used memory = 117749816 
2013-10-03 20:49:33,707 INFO exec.ExtractOperator (Operator.java:forward(774)) - 3 forwarding 1 rows 
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(458)) - Final Path: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000/000000_0 
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(460)) - Writing to temp file: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_task_tmp.-ext-10000/_tmp.000000_0 
2013-10-03 20:49:33,707 INFO exec.FileSinkOperator (FileSinkOperator.java:createBucketFiles(481)) - New Final Path: FS hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000/000000_0 
2013-10-03 20:49:33,737 INFO ExecReducer (ExecReducer.java:close(301)) - ExecReducer: processed 4 rows: used memory = 118477400 
2013-10-03 20:49:33,737 INFO exec.ExtractOperator (Operator.java:close(549)) - 3 finished. closing... 
2013-10-03 20:49:33,737 INFO exec.ExtractOperator (Operator.java:close(555)) - 3 forwarded 4 rows 
2013-10-03 20:49:33,737 INFO exec.FileSinkOperator (Operator.java:close(549)) - 4 finished. closing... 
2013-10-03 20:49:33,737 INFO exec.FileSinkOperator (Operator.java:close(555)) - 4 forwarded 0 rows 
2013-10-03 20:49:33,990 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Hadoop job information for null: number of mappers: 0; number of reducers: 0 
2013-10-03 20:49:34,011 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - 2013-10-03 20:49:34,011 null map = 0%, reduce = 0% 
2013-10-03 20:49:34,111 INFO jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(137)) - Stats publishing for key hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000/000000 
2013-10-03 20:49:34,143 INFO exec.FileSinkOperator (Operator.java:logStats(845)) - TABLE_ID_1_ROWCOUNT:4 
2013-10-03 20:49:34,143 INFO exec.ExtractOperator (Operator.java:close(570)) - 3 Close done 
2013-10-03 20:49:34,145 INFO mapred.Task (Task.java:done(858)) - Task:attempt_local1250355097_0001_r_000000_0 is done. And is in the process of commiting 
2013-10-03 20:49:34,146 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(466)) - reduce > reduce 
2013-10-03 20:49:34,147 INFO mapred.Task (Task.java:sendDone(970)) - Task 'attempt_local1250355097_0001_r_000000_0' done. 
2013-10-03 20:49:35,026 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - 2013-10-03 20:49:35,026 null map = 0%, reduce = 100% 
2013-10-03 20:49:35,030 INFO exec.ExecDriver (SessionState.java:printInfo(392)) - Ended Job = job_local1250355097_0001 
2013-10-03 20:49:35,033 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1361)) - Moving tmp dir: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000 to: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000.intermediate 
2013-10-03 20:49:35,036 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1372)) - Moving tmp dir: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/_tmp.-ext-10000.intermediate to: hdfs://localhost/tmp/hive-katrina/hive_2013-10-03_20-49-28_110_131412476548383989/-ext-10000 

Bucketing works for me on Hadoop 1.2.1 and on CDH4, but on Hadoop 2.2.0 I see this exact problem.

Answers

0

I cannot reproduce that:

hive> INSERT OVERWRITE TABLE bucketed_users SELECT * FROM unbucketed_users; 
Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks determined at compile time: 4 
In order to change the average load for a reducer (in bytes): 
    set hive.exec.reducers.bytes.per.reducer=<number> 
In order to limit the maximum number of reducers: 
    set hive.exec.reducers.max=<number> 
In order to set a constant number of reducers: 
    set mapred.reduce.tasks=<number> 
Starting Job = job_1384565454792_0070, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1384565454792_0070/ 
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1384565454792_0070 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 4 
2013-11-16 05:04:12,290 Stage-1 map = 0%, reduce = 0% 
2013-11-16 05:04:33,868 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.16 sec 
MapReduce Total cumulative CPU time: 7 seconds 160 msec 
Ended Job = job_1384565454792_0070 
Loading data to table default.bucketed_users 
rmr: DEPRECATED: Please use 'rm -r' instead. 
Moved: 'hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/bucketed_users' to trash at: hdfs://sandbox.hortonworks.com:8020/user/hue/.Trash/Current 
Table default.bucketed_users stats: [num_partitions: 0, num_files: 4, num_rows: 0, total_size: 24, raw_data_size: 0] 
MapReduce Jobs Launched: 
Job 0: Map: 1 Reduce: 4 Cumulative CPU: 7.16 sec HDFS Read: 259 HDFS Write: 24 SUCCESS 
Total MapReduce CPU Time Spent: 7 seconds 160 msec 
OK 
Time taken: 19.291 seconds 
hive> dfs -ls /apps/hive/warehouse/bucketed_users;       
Found 4 items 
-rw-r--r-- 3 hue hdfs   12 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000000_0 
-rw-r--r-- 3 hue hdfs   0 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000001_0 
-rw-r--r-- 3 hue hdfs   6 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000002_0 
-rw-r--r-- 3 hue hdfs   6 2013-11-16 05:04 /apps/hive/warehouse/bucketed_users/000003_0 
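The file sizes above line up with id % 4 bucketing if each row is serialized as id, a tab, the name, and a newline (6 bytes per row for these values): bucket 0 holds ids 0 and 4 (12 bytes), bucket 1 is empty, and buckets 2 and 3 hold one row each. A quick Python check (assuming that text layout):

```python
# Verify the bucket file sizes against id % 4 assignment,
# assuming each row is serialized as "id\tname\n".
users = [(0, "Nat"), (2, "Joe"), (3, "Kay"), (4, "Ann")]
sizes = [0, 0, 0, 0]
for user_id, name in users:
    sizes[user_id % 4] += len(f"{user_id}\t{name}\n".encode())
print(sizes)  # [12, 0, 6, 6] matches the listing above
```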

It is very odd that you see a conversion to a MapJoin; you should not see that, since there are no joins in your query. Is that really your query? If you are seeing that, I suggest setting:

set hive.auto.convert.join=false; 

If that fixes it, you should file a bug.

0

Odd, this works for me. However, since you specified that your table is sorted, you should also set

set hive.enforce.sorting=true; 

in addition to

set hive.enforce.bucketing=true; 

I wonder if the combination of a bucketed/sorted table and setting only one of the two enforce options is somehow confusing it.
