2014-11-15 2 views

ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master

I have an HBase cluster (0.96.1.1-cdh5.0.2) on AWS, managed by Cloudera, with 4 region servers and 1 ZooKeeper server. The ZooKeeper server runs on the same host as the HBase master. The problem I am facing is that 3 of the 4 region servers are down because they cannot connect to ZooKeeper. The only region server that stays up is the one running on the same host as the master and ZooKeeper. Below is the relevant section of the log from one of the failing region servers.

2014-11-14 15:46:59,871 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-10-146-188-157.ec2.internal:2181 sessionTimeout=60000 watcher=regionserver:60020,  quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase 
2014-11-14 15:46:59,915 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=regionserver:60020 connecting to ZooKeeper ensemble=ip-10-146-188-157.ec2.internal:2181 
2014-11-14 15:46:59,920 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error) 
2014-11-14 15:47:00,649 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020 
2014-11-14 15:47:59,948 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60041ms for sessionid 0x0, closing socket connection and attempting reconnect 
2014-11-14 15:48:00,067 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
2014-11-14 15:48:00,072 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0... 
2014-11-14 15:48:01,067 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error) 
2014-11-14 15:49:00,123 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60057ms for sessionid 0x0, closing socket connection and attempting reconnect 
2014-11-14 15:49:00,224 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
2014-11-14 15:49:00,224 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1... 
2014-11-14 15:49:01,224 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error) 
2014-11-14 15:50:00,259 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60035ms for sessionid 0x0, closing socket connection and attempting reconnect 
2014-11-14 15:50:00,360 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
2014-11-14 15:50:00,360 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 4000ms before retry #2... 
2014-11-14 15:50:01,360 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error) 
2014-11-14 15:51:00,408 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60048ms for sessionid 0x0, closing socket connection and attempting reconnect 
2014-11-14 15:51:00,509 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
2014-11-14 15:51:00,509 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 8000ms before retry #3... 
2014-11-14 15:51:01,509 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-10-146-188-157.ec2.internal/10.146.188.157:2181. Will not attempt to authenticate using SASL (unknown error) 
2014-11-14 15:52:00,559 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60051ms for sessionid 0x0, closing socket connection and attempting reconnect 
2014-11-14 15:52:00,659 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ip-10-146-188-157.ec2.internal:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
2014-11-14 15:52:00,660 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 attempts 
2014-11-14 15:52:00,661 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020, quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase Unable to set watcher on znode /hbase/master 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) 
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199) 
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425) 
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772) 
    at java.lang.Thread.run(Thread.java:744) 
2014-11-14 15:52:00,687 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020, quorum=ip-10-146-188-157.ec2.internal:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) 
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199) 
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425) 
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772) 
    at java.lang.Thread.run(Thread.java:744) 
2014-11-14 15:52:00,692 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 0.0.0.0,60020,1415998019646: Unexpected exception during initialization, aborting 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) 
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) 
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199) 
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425) 
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:671) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:644) 
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:772) 
    at java.lang.Thread.run(Thread.java:744) 
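
The `sessionid 0x0` in the timeout lines above suggests that no ZooKeeper session was ever established, so a raw connectivity check run from one of the failing hosts may help tell a network-level problem apart from an HBase one. A minimal sketch, assuming Python 3 is available on the host; `zk_ruok` is a hypothetical helper (not part of HBase or ZooKeeper) that sends ZooKeeper's `ruok` four-letter command and returns whatever comes back:

```python
import socket


def zk_ruok(host, port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command and return the raw reply.

    A healthy server is expected to answer 'imok'. Connection errors and
    timeouts propagate as OSError, which is itself useful diagnostic output.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        chunks = []
        while True:
            data = sock.recv(64)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

On a failing region server this could be called as `zk_ruok("ip-10-146-188-157.ec2.internal")`, using the host name from the log. A timeout or refused connection here would point at name resolution, routing, or firewall rules rather than at HBase itself.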

I suspect this may be related to the /etc/hosts configuration, but I cannot figure out what the problem is. The /etc/hosts file on each of the instances in the cluster:

127.0.0.1    localhost.localdomain localhost 
::1    localhost6.localdomain6 localhost6 
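
This file has no entry for the ZooKeeper host `ip-10-146-188-157.ec2.internal`, so that name would have to be resolved some other way, typically through DNS. A minimal sketch for checking which names a hosts file covers; `parse_hosts` and `missing_names` are hypothetical helpers, and the sample text is the file shown above:

```python
def parse_hosts(text):
    """Map each host name or alias in hosts-file text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry wins
    return table


def missing_names(text, required):
    """Return the required names that have no entry in the hosts text."""
    table = parse_hosts(text)
    return [name for name in required if name.lower() not in table]


HOSTS = """\
127.0.0.1    localhost.localdomain localhost
::1    localhost6.localdomain6 localhost6
"""

# Prints ['ip-10-146-188-157.ec2.internal']
print(missing_names(HOSTS, ["localhost", "ip-10-146-188-157.ec2.internal"]))
```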

The part of hbase-site.xml that deals with ZooKeeper is:

<property> 
    <name>zookeeper.znode.parent</name> 
    <value>/hbase</value> 
</property> 
<property> 
    <name>zookeeper.znode.rootserver</name> 
    <value>root-region-server</value> 
</property> 
<property> 
    <name>hbase.zookeeper.quorum</name> 
    <value>ip-10-146-188-157.ec2.internal</value> 
</property> 
<property> 
    <name>hbase.zookeeper.property.clientPort</name> 
    <value>2181</value> 
</property> 
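
To confirm that these properties produce the `connectString` seen in the log, the file can be read programmatically. A minimal sketch, assuming the properties sit inside the usual `<configuration>` root element of hbase-site.xml (the fragment above omits it); `zk_connect_string` is a hypothetical helper, and the fallback port of 2181 is an assumption used only when the property is absent:

```python
import xml.etree.ElementTree as ET


def zk_connect_string(site_xml):
    """Build a 'host:port,host:port' string from hbase-site.xml text."""
    root = ET.fromstring(site_xml)
    props = {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
    }
    port = props.get("hbase.zookeeper.property.clientPort", "2181")
    quorum = props.get("hbase.zookeeper.quorum", "")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return ",".join(f"{h}:{port}" for h in hosts)


SITE = """
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-146-188-157.ec2.internal</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
"""

# Prints ip-10-146-188-157.ec2.internal:2181
print(zk_connect_string(SITE))
```

The result matches the `connectString` in the region server log, which suggests the region servers are reading the intended quorum setting and that the failure lies in reaching that host.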

Any help would be greatly appreciated.

Answer


Have you given your host machines fully qualified domain names (FQDNs)? If not, assign them, and try replacing each relevant occurrence of "localhost" or an IP address in the configuration files with the FQDN.
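
One way to act on this suggestion is to scan the host values in the configuration for entries that are not fully qualified. A rough sketch; `needs_fqdn` is a hypothetical helper, and its rules (treat `localhost`, literal IP addresses, and names without a dot as suspect) are an assumption made for illustration, not an HBase requirement:

```python
import ipaddress


def needs_fqdn(value):
    """Return True if a host value is 'localhost', a bare IP, or has no dot."""
    host = value.strip().lower()
    if host in ("localhost", "localhost.localdomain"):
        return True
    try:
        ipaddress.ip_address(host)
        return True  # a literal IP address rather than a name
    except ValueError:
        pass
    return "." not in host  # short name without a domain part
```

For the values in the question, `needs_fqdn("ip-10-146-188-157.ec2.internal")` returns `False`, while `needs_fqdn("0.0.0.0")`, the address the aborting region server reports for itself in the FATAL log line, returns `True`.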


Also, please post your ZooKeeper configuration file here. – Vikas
