
I have several Samza jobs that I want to run. I can get the first one running fine. However, the second job seems to sit in the ACCEPTED state and never transitions to the RUNNING state until I kill the first job. Why does the YARN job not move to RUNNING?

Here is the view from the YARN UI:

[screenshot: YARN UI]

Here are the details for the second job, where you can see that no node has been allocated:

[screenshot: application details]

I have 2 datanodes, so I should be able to run multiple jobs. Here is the relevant section of my yarn-site.xml (the only other configuration in the file relates to the HA setup, ZooKeeper, and so on):

<property> 
    <name>yarn.scheduler.minimum-allocation-mb</name> 
    <value>128</value> 
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description> 
</property> 
<property> 
    <name>yarn.scheduler.maximum-allocation-mb</name> 
    <value>2048</value> 
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description> 
</property> 
<property> 
    <name>yarn.scheduler.minimum-allocation-vcores</name> 
    <value>1</value> 
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description> 
</property> 
<property> 
    <name>yarn.scheduler.maximum-allocation-vcores</name> 
    <value>2</value> 
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description> 
</property> 
<property> 
    <name>yarn.nodemanager.resource.memory-mb</name> 
    <value>4096</value> 
    <description>Physical memory, in MB, to be made available to running containers</description> 
</property> 
<property> 
    <name>yarn.nodemanager.resource.cpu-vcores</name> 
    <value>4</value> 
    <description>Number of CPU cores that can be allocated for containers.</description> 
</property> 

EDIT:

I can see this in the resource manager logs:

2015-11-01 17:47:37,151 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1446300861747_0018_000001 container=Container: [ContainerId: container_1446300861747_0018_01_000002, NodeId: yarndata-01:41274, NodeHttpAddress: yarndata-01:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: null, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1024, vCores:1>, usedCapacity=0.125, absoluteUsedCapacity=0.125, numApps=1, numContainers=1 clusterResource=<memory:8192, vCores:8> 
2015-11-01 17:47:37,151 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:2048, vCores:2>, usedCapacity=0.25, absoluteUsedCapacity=0.25, numApps=1, numContainers=2 
2015-11-01 17:47:37,151 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used=<memory:2048, vCores:2> cluster=<memory:8192, vCores:8> 
2015-11-01 17:47:37,658 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : yarndata-01:41274 for container : container_1446300861747_0018_01_000002 
2015-11-01 17:47:37,659 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1446300861747_0018_01_000002 Container Transitioned from ALLOCATED to ACQUIRED 
2015-11-01 17:47:39,154 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1446300861747_0018_01_000002 Container Transitioned from ACQUIRED to RUNNING 
2015-11-01 17:48:03,821 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new applicationId: 19 
2015-11-01 17:48:04,339 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 19 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead. 
2015-11-01 17:48:04,339 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 19 submitted by user www-data 
2015-11-01 17:48:04,339 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=www-data IP=192.168.2.81 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1446300861747_0019 
2015-11-01 17:48:04,340 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1446300861747_0019 
2015-11-01 17:48:04,340 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1446300861747_0019 State change from NEW to NEW_SAVING 
2015-11-01 17:48:04,340 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1446300861747_0019 
2015-11-01 17:48:04,342 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1446300861747_0019 State change from NEW_SAVING to SUBMITTED 
2015-11-01 17:48:04,342 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1446300861747_0019 user: www-data leaf-queue of parent: root #applications: 2 
2015-11-01 17:48:04,342 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1446300861747_0019 from user: www-data, in queue: default 
2015-11-01 17:48:04,343 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1446300861747_0019 State change from SUBMITTED to ACCEPTED 
2015-11-01 17:48:04,343 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1446300861747_0019_000001 
2015-11-01 17:48:04,343 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1446300861747_0019_000001 State change from NEW to SUBMITTED 
2015-11-01 17:48:04,343 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: not starting application as amIfStarted exceeds amLimit 
2015-11-01 17:48:04,343 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1446300861747_0019 user: org.apache.hadoo[email protected]202c5cd5, leaf-queue: default #user-pending-applications: 1 #user-active-applications: 1 #queue-pending-applications: 1 #queue-active-applications: 1 

What am I doing wrong?

Answer

The answer is that the resource manager was saying there were not enough resources to create a new Samza container and application master. The telling line in the log above is "not starting application as amIfStarted exceeds amLimit": the CapacityScheduler caps the total share of the cluster that application masters may use, and with an 8192 MB cluster and the default cap of 0.1, only about 819 MB is reserved for AMs, which the first job's 1024 MB AM had already used up.

I changed the value of yarn.scheduler.capacity.maximum-am-resource-percent in capacity-scheduler.xml to be larger than the default of 0.1.

The documentation for this parameter states:

Maximum percent of resources in the cluster which can be used to run 
application masters i.e. controls number of concurrent running applications. 
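
For reference, here is a minimal sketch of the override in capacity-scheduler.xml. The property name and its 0.1 default come from the YARN documentation quoted above; the value of 0.5 is just an example, and any value large enough to fit the application masters of all the jobs you want to run concurrently will do:

<property> 
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name> 
    <value>0.5</value> 
    <description>Maximum percent of cluster resources that can be used to run application masters. Raised from the 0.1 default (example value) so that more than one AM can run at once.</description> 
</property> 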