Sabyasachi Sahoo

Why are the threads going into the WAITING state?

All threads are slowly moving into the WAITING state and are never released. This causes downtime in the system.


Report URL - https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjIvMTEvNC9hcGktOTlkNThmODEtMDc2MS00YjMxLTk3MTItMDM2OGRhZjhkMWQ5LnR4dA==

  • threads waiting state

  • causing downtime

  • Thread leak

  • Too many Kafka heartbeat threads


Ankita

Hi Sabyasachi,

 

Here is my observation on your report:

 

1) In Dump-4 of your application, the 'kafka-coordinator-heartbeat-thread | meta' thread is parked in sun.misc.Unsafe.park(), waiting to acquire a ReentrantLock inside ConsumerNetworkClient.poll(). Before it parked, this thread obtained 1 lock (the org.apache.kafka.clients.consumer.internals.ConsumerCoordinator lock) and has not released it. Because of that, 1 other thread is BLOCKED. If threads stay BLOCKED for a prolonged period, your application can become unresponsive. Below is the stack trace of the 'kafka-coordinator-heartbeat-thread | meta' thread.

 

 

java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000557212f40> (a java.util.concurrent.locks.ReentrantLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:249)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:304)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1036)
- locked <0x0000000552067b10> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
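
If it helps to confirm this pattern from inside the running JVM, here is a minimal sketch using the standard java.lang.management API. It lists threads that hold an object monitor while they are themselves waiting for another lock, which is the shape of the trace above. The class name LockWaitReport, the demo() method and the thread name "demo-holder" are made up for illustration; they are not part of Kafka or of your application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class LockWaitReport {

    /** Lists threads that hold at least one object monitor while waiting for some other lock. */
    static List<String> holdersThatAreWaiting() {
        List<String> report = new ArrayList<>();
        // (true, true) asks the JVM to include locked monitors and ownable synchronizers.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            String waitingFor = info.getLockName(); // null if the thread is not waiting on a lock
            if (info.getLockedMonitors().length > 0 && waitingFor != null) {
                report.add(String.format("%s holds %s, waits for %s (owner: %s)",
                        info.getThreadName(),
                        info.getLockedMonitors()[0],
                        waitingFor,
                        info.getLockOwnerName())); // null if the owner is unknown
            }
        }
        return report;
    }

    /** Hypothetical reproduction: a thread takes a monitor, then parks on a lock held by the caller. */
    static List<String> demo() throws InterruptedException {
        Object monitor = new Object();
        ReentrantLock lock = new ReentrantLock(true); // fair lock, as in the trace above
        lock.lock();                                  // the calling thread owns the lock
        Thread holder = new Thread(() -> {
            synchronized (monitor) {                  // monitor is held from here on
                lock.lock();                          // parks here until the caller unlocks
                lock.unlock();
            }
        }, "demo-holder");
        holder.start();
        try {
            while (holder.getState() != Thread.State.WAITING) {
                Thread.sleep(10);                     // wait until the thread is parked
            }
            return holdersThatAreWaiting();
        } finally {
            lock.unlock();                            // let the demo thread finish
            holder.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

In a real application you would call holdersThatAreWaiting() from a diagnostic endpoint or a scheduled task. The "owner" column then tells you which thread currently holds the lock that the heartbeat thread is parked on.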

 

2) From Dump-6 onwards, more than 150 threads are in the WAITING state, and they all have the same stack trace:

 

java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1025)
- locked <0x0000000558e00528> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

 

3) 378 threads were executing sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) when the thread dump was captured. This can slow down transactions. Examine their stack traces; the one below is a Netty NioEventLoop thread sitting in select().

 

java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000005529f47c8> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000005529f48b8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000005529f47e0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)

 

Ram Lakshmanan

Hello Sahoo!

 

 Greetings. I would like to share a few more observations in addition to what Ankita has shared above.

 

a. Thread leak in New Relic?

 It looks like your application is using New Relic. I could see 315 'NewRelicMetricsReporter' threads in the TIMED_WAITING state. They all have the same stack trace and are not doing anything. Why are there so many New Relic monitoring threads in the application? Maybe there is a bug in New Relic, or it is misconfigured. You might want to check with the New Relic support team. Below is the stack trace of one of the 315 New Relic threads:

 

NewRelicMetricsReporter-1
PRIORITY : 5
THREAD ID : 0X00007F7B48E2A800
NATIVE ID : 0X3E07
NATIVE ID (DECIMAL) : 15879
STATE : TIMED_WAITING

stackTrace:
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000559000998> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
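
The trace above is an idle ScheduledThreadPoolExecutor worker waiting for its next scheduled task. One common way to end up with hundreds of such threads is to create an object that owns a scheduler over and over without ever shutting it down. The dump does not confirm that this is what happens here, so treat the following only as a sketch of the general pattern. Reporter and the "DemoMetricsReporter" thread name are hypothetical and are not New Relic classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ReporterLeakSketch {
    static final String PREFIX = "DemoMetricsReporter"; // hypothetical name, not New Relic's

    /** Hypothetical reporter that owns one scheduler thread until close() is called. */
    static final class Reporter implements AutoCloseable {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, PREFIX);
                    t.setDaemon(true);
                    return t;
                });

        Reporter() {
            // Scheduling a periodic task starts the worker thread; between runs it sits idle.
            scheduler.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.MINUTES);
        }

        @Override
        public void close() {
            scheduler.shutdownNow(); // without this call the worker thread lives on
        }
    }

    /** Counts live threads whose name starts with the reporter prefix. */
    static long liveReporterThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(PREFIX))
                .count();
    }

    public static void main(String[] args) {
        List<Reporter> reporters = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            reporters.add(new Reporter()); // each instance adds one idle thread
        }
        System.out.println("live reporter threads: " + liveReporterThreads());
        for (Reporter r : reporters) {
            r.close(); // closing releases the threads again
        }
    }
}
```

If something in your code or configuration registers the metrics reporter more than once (for example once per Kafka client), a pattern like this could explain the count. That is an assumption worth checking, not a diagnosis.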

 

 

b. Too many Kafka heartbeat threads

 Your application has 283 Kafka heartbeat threads (i.e. kafka-coordinator-heartbeat-thread). I have typically seen only 1 Kafka heartbeat thread per JVM instance. It's not clear why you end up with 283 of them. Since the Kafka client starts one heartbeat thread per consumer instance that joins a group, it is also worth checking whether consumers are being created repeatedly without being closed. You might have to check your Kafka configuration. Below is the stack trace of one of the Kafka heartbeat threads:

 

kafka-coordinator-heartbeat-thread | ignition-control-flow-1
PRIORITY : 5
THREAD ID : 0X00007F7BB8038000
NATIVE ID : 0X3E16
NATIVE ID (DECIMAL) : 15894
STATE : TIMED_WAITING

stackTrace:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1061)
- locked <0x0000000558e00528> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
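
To watch whether these counts keep growing between dumps, a small histogram of live threads grouped by name can be logged from inside the JVM. This is only a sketch: the class ThreadNameHistogram and its bucketing rule (drop everything after " | ", then drop a trailing "-<number>") are my own assumptions, chosen to fit the thread names seen in this report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ThreadNameHistogram {

    /** Collapses names such as "pool-3" or "heartbeat | group-1" into one bucket per family. */
    static String bucket(String threadName) {
        int bar = threadName.indexOf(" | ");
        String base = bar >= 0 ? threadName.substring(0, bar) : threadName;
        return base.replaceAll("-\\d+$", ""); // strip a trailing "-<digits>" counter
    }

    /** Counts how many of the given thread names fall into each bucket. */
    static Map<String, Integer> countByBucket(Iterable<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(bucket(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        // Print one line per thread family, e.g. to compare counts a few minutes apart.
        countByBucket(names).forEach((family, count) ->
                System.out.println(count + "\t" + family));
    }
}
```

If the kafka-coordinator-heartbeat-thread or NewRelicMetricsReporter bucket grows steadily while traffic stays flat, that points to a leak rather than to normal pool sizing.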
