2013-07-08
0

Cassandra node OOM, OpsCenter agent shows operation queue is full

We moved Cassandra 1.1.7 into production today, but just before that we saw two Cassandra nodes go down with OOM. We have seen this error before in load tests and tuned nofiles accordingly so that it would not recur. Note also that this time the error occurred while there was no load on the infrastructure.

ERROR [Thread-22] 2013-07-08 16:31:50,905 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-22,5,main] 
java.lang.OutOfMemoryError: unable to create new native thread 
at java.lang.Thread.start0(Native Method) 
at java.lang.Thread.start(Thread.java:640) 
at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703) 
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:652) 
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581) 
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155) 
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113) 

We could not bring the Cassandra server back up (it kept hitting the OOM error). Only after we stopped the OpsCenter agent (Enterprise 2.1.3) were we able to start Cassandra again, and then restart the agent. Below is agent.log from around the time the Cassandra node died. It shows a lot of "Thrift operation queue is full" warnings, with operations being dropped. We are also NOT using secondary indexes. Any thoughts are welcome.

WARN [pool-4-thread-1] 2013-07-08 16:31:41,395 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367168 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367169 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367170 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 367171 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 367172 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367173 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367174 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367175 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 367176 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 367177 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367178 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367179 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367180 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 367181 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 367182 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 367183 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 367184 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367185 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367186 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367187 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 367188 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 367189 operations dropped so far. 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation 
WARN [pool-4-thread-1] 2013-07-08 16:31:41,405 367190 operations dropped so far. 
ERROR [Thread-4] 2013-07-08 16:31:45,347 Error when proccessing thrift call me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level. 
ERROR [pool-5-thread-1] 2013-07-08 16:31:47,793 Error connecting via JMX: java.io.IOException: Cannot run program "cat": java.io.IOException: error=11, Resource temporarily unavailable 
ERROR [Thread-4] 2013-07-08 16:31:50,348 Error when proccessing thrift call me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level. 
INFO [pool-5-thread-1] 2013-07-08 16:31:52,794 New JMX connection (127.0.0.1:7199) 
ERROR [pool-5-thread-1] 2013-07-08 16:31:52,857 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
java.net.ConnectException: Connection refused] 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,127 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 367191 operations dropped so far. 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 367192 operations dropped so far. 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367193 operations dropped so far. 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367194 operations dropped so far. 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367195 operations dropped so far. 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,130 Thrift operation queue is full, discarding thrift operation 
WARN [pool-3-thread-4] 2013-07-08 16:31:53,130 367196 operations dropped so far. 
ERROR [Thread-4] 2013-07-08 16:31:55,350 Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<16.211.56.72:9160-3> 
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe 
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) 
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156) 
at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:98) 
at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:26) 
at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:311) 
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260) 
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97) 
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243) 
at clj_hector.core$put.doInvoke(core.clj:164) 
at clojure.lang.RestFn.invoke(RestFn.java:470) 
at opsagent.cassandra$store_rollup.invoke(cassandra.clj:107) 
at clojure.lang.AFn.applyToHelper(AFn.java:161) 
at clojure.lang.AFn.applyTo(AFn.java:151) 
at clojure.core$apply.invoke(core.clj:540) 
at opsagent.cassandra$async_call$fn__582$fn__583.invoke(cassandra.clj:164) 
at opsagent.cassandra$process_queue$fn__587.invoke(cassandra.clj:170) 
at opsagent.cassandra$process_queue.invoke(cassandra.clj:169) 
at opsagent.cassandra$setup_cassandra$fn__595.invoke(cassandra.clj:203) 
at clojure.lang.AFn.run(AFn.java:24) 
at java.lang.Thread.run(Thread.java:662) 
Caused by: java.net.SocketException: Broken pipe 
at java.net.SocketOutputStream.socketWrite0(Native Method) 
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) 
at java.net.SocketOutputStream.write(SocketOutputStream.java:136) 
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) 
... 19 more 
ERROR [Thread-4] 2013-07-08 16:31:55,351 MARK HOST AS DOWN TRIGGERED for host 16.211.56.72(16.211.56.72):9160 
ERROR [Thread-4] 2013-07-08 16:31:55,351 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 0; NumBeforeExhausted: 0 
INFO [Thread-4] 2013-07-08 16:31:55,351 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160} 
INFO [Thread-4] 2013-07-08 16:31:55,351 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160} 
INFO [Thread-4] 2013-07-08 16:31:55,352 Host detected as down was added to retry queue: 16.211.56.72(16.211.56.72):9160 
WARN [Thread-4] 2013-07-08 16:31:55,392 Could not fullfill request on this host CassandraClient<16.211.56.72:9160-3> 
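
For anyone triaging a similar agent.log, the drop counter can be summarized with a short script instead of being read by eye. This is a rough sketch, not OpsCenter tooling: the regular expression assumes the exact line layout shown in the excerpt above, and `summarize_drops` is a name made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367168 operations dropped so far.
DROP_RE = re.compile(
    r"^\s*WARN \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<count>\d+) operations dropped so far\."
)


def summarize_drops(lines):
    """Return the first/last drop counter and the time window between them.

    Returns None when no "operations dropped so far" line is present.
    """
    samples = []
    for line in lines:
        match = DROP_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((ts, int(match.group("count"))))
    if not samples:
        return None
    samples.sort()  # order by timestamp, then by counter value
    (first_ts, first_count), (last_ts, last_count) = samples[0], samples[-1]
    return {
        "first": first_count,
        "last": last_count,
        "dropped_in_window": last_count - first_count,
        "window_seconds": (last_ts - first_ts).total_seconds(),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_drops.py < agent.log
    print(summarize_drops(sys.stdin))
```

In the excerpt above the counter runs from 367168 (16:31:41,396) to 367196 (16:31:53,130), so only 28 operations were dropped in that window of roughly 11.7 seconds; the bulk of the 367,000 drops happened earlier than the pasted portion of the log.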

Answers

2

"unable to create new native thread" is your smoking gun. Things that will help include:

  • Increasing your kernel thread limits
  • Switching Thrift to hsha instead of thread-per-connection
  • Upgrading to the latest OpsCenter agent
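
To see which limits are actually in force before raising them, something like the following can be run as the user that owns the Cassandra process. It is a minimal sketch that assumes a Linux host with `/proc` mounted; the helper names (`read_int`, `thread_limits`, `thread_count`, `describe`) are made up for illustration. The `error=11, Resource temporarily unavailable` line in the agent log is errno 11 (`EAGAIN` on Linux), which is what a failed fork or thread creation usually returns when one of these limits has been reached.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the first integer in a file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def thread_limits():
    """Collect limits that commonly cap native thread creation on Linux."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nproc_soft": soft,  # per-user process/thread limit (ulimit -u)
        "nproc_hard": hard,
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
    }


def thread_count(pid="self"):
    """Return the thread count of a process from /proc/<pid>/status."""
    try:
        for line in Path(f"/proc/{pid}/status").read_text().splitlines():
            if line.startswith("Threads:"):
                return int(line.split()[1])
    except (OSError, ValueError, IndexError):
        pass
    return None


def describe(value):
    """Render a limit for display."""
    if value is None:
        return "n/a"
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value)


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name:14} {describe(value)}")
    # Pass the Cassandra PID instead of "self" to inspect the running node.
    print(f"{'threads(self)':14} {describe(thread_count())}")
```

If the thread count of the Cassandra JVM sits close to `nproc_soft`, the per-user limit is the likely culprit; note that the limit is shared by every process running as that user, which may be why stopping the agent made a difference here.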
0

I ran into this problem as well and went through a number of changes and experiments (and a lot of reading, since some suggestions worked right away on my Ubuntu Linux server but not on my development machine, which runs Mac OS). I increased the OS "maxfiles", "maxfilesperproc", etc., as most people suggest ... but the problem persisted on Mac OS. I finally managed to get rid of it when (as jbellis also suggested) I changed the Cassandra configuration (/conf/cassandra.yaml), specifically the following parameters:

... 
native_transport_min_threads: 16 
native_transport_max_threads: 128 
... 
rpc_server_type: hsha #changed from sync to hsha 
... 
rpc_min_threads: 16 
rpc_max_threads: 2048 
... 

You may need to tune these for your own setup (mainly the thread counts) ... but these settings appear to be the way (possibly together with the OS kernel max-files limits) to resolve the "java.lang.OutOfMemoryError: unable to create new native thread" problem.
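
To confirm which of these values a node is really configured with, the settings can be pulled out of cassandra.yaml with a few lines of standard-library Python. This is only a sketch: it is a line-based reader rather than a YAML parser, it looks only at the five setting names listed above, and `read_thread_settings` and `is_thread_per_connection` are illustrative names, not part of any Cassandra tooling.

```python
import re

# Settings named in the answer above; everything else in the file is ignored.
THREAD_KEYS = (
    "rpc_server_type",
    "rpc_min_threads",
    "rpc_max_threads",
    "native_transport_min_threads",
    "native_transport_max_threads",
)

# Unindented "key: value" with an optional trailing "# comment".
LINE_RE = re.compile(r"^(?P<key>[A-Za-z_]+):\s*(?P<value>[^#]*?)\s*(?:#.*)?$")


def read_thread_settings(text):
    """Extract the thread-related top-level settings from cassandra.yaml text.

    Commented-out lines (starting with '#') and indented lines are skipped.
    """
    found = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("key") in THREAD_KEYS:
            found[match.group("key")] = match.group("value")
    return found


def is_thread_per_connection(settings):
    """True when the file explicitly selects the 'sync' Thrift server."""
    return settings.get("rpc_server_type") == "sync"


if __name__ == "__main__":
    import sys

    # Usage: python read_thread_settings.py /path/to/cassandra.yaml
    with open(sys.argv[1], encoding="utf-8") as handle:
        settings = read_thread_settings(handle.read())
    for key in THREAD_KEYS:
        print(f"{key}: {settings.get(key, '(not set)')}")
    if is_thread_per_connection(settings):
        print("rpc_server_type is sync: one thread per Thrift connection")
```

A setting reported as "(not set)" is either commented out or absent, in which case Cassandra falls back to its built-in default for that version.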
