Hi Sabyasachi,
Here are my observations on your report:
1) In Dump-4, the 'kafka-coordinator-heartbeat-thread | meta' thread is parked in sun.misc.Unsafe.park(), waiting to acquire a ReentrantLock inside ConsumerNetworkClient.poll(). Before parking, this thread obtained one lock (the org.apache.kafka.clients.consumer.internals.ConsumerCoordinator monitor) and had not released it. Because of that, one other thread is BLOCKED, as shown in the screenshot below. If threads stay BLOCKED for a prolonged period, your application can become unresponsive. Below is the stack trace of the 'kafka-coordinator-heartbeat-thread | meta' thread.
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000557212f40> (a java.util.concurrent.locks.ReentrantLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:249)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:304)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1036)
    - locked <0x0000000552067b10> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
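To make the pattern in this trace easier to recognise, here is a minimal JDK-only sketch of it. It is not Kafka code: the `coordinator` object and `clientLock` are hypothetical stand-ins for the ConsumerCoordinator monitor and the ReentrantLock seen in the dump, and the thread names are made up. One thread takes the monitor and then parks on the lock, so a second thread that needs the monitor ends up BLOCKED.

```java
import java.util.concurrent.locks.ReentrantLock;

class HeldMonitorDemo {
    // Hypothetical stand-ins for the ConsumerCoordinator monitor and the client's fair ReentrantLock.
    private static final Object coordinator = new Object();
    private static final ReentrantLock clientLock = new ReentrantLock(true);

    /** Returns the observed states of the "heartbeat" and "caller" threads, in that order. */
    static Thread.State[] run() {
        Thread heartbeat = new Thread(() -> {
            synchronized (coordinator) {   // takes the monitor first
                clientLock.lock();         // then parks here while still holding the monitor
                clientLock.unlock();
            }
        }, "demo-heartbeat");
        Thread caller = new Thread(() -> {
            synchronized (coordinator) {   // cannot enter until the heartbeat thread leaves
                // nothing to do; entering the monitor is the point
            }
        }, "demo-caller");

        Thread.State[] states;
        try {
            clientLock.lock();             // this thread plays whoever currently owns the lock
            try {
                heartbeat.start();
                awaitState(heartbeat, Thread.State.WAITING);
                caller.start();
                awaitState(caller, Thread.State.BLOCKED);
                states = new Thread.State[] { heartbeat.getState(), caller.getState() };
            } finally {
                clientLock.unlock();       // releasing the lock lets both threads finish
            }
            heartbeat.join();
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return states;
    }

    // Polls until the thread reaches the wanted state, giving up after about five seconds.
    private static void awaitState(Thread t, Thread.State wanted) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 5000;
        while (t.getState() != wanted) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException(t.getName() + " did not reach " + wanted);
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) {
        Thread.State[] s = run();
        System.out.println("demo-heartbeat: " + s[0] + ", demo-caller: " + s[1]);
    }
}
```

In your dumps, the question this raises is which thread owns <0x0000000557212f40> and why it holds that lock for so long; the heartbeat thread is only the visible symptom.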
2) From Dump-6 onwards, more than 150 threads are in the WAITING state, and they all have the same stack trace:
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:1025)
    - locked <0x0000000558e00528> (a org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
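If you want to watch this count from inside the application rather than from captured dumps, a rough sketch like the one below groups the live threads of a JVM by identical stack trace. It uses only the JDK (`Thread.getAllStackTraces()`); the class and method names are hypothetical, and it looks at the running JVM, not at the dump files you uploaded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StackGrouper {
    /** Snapshot of the current JVM: "threadName#id" mapped to its stack trace as text. */
    static Map<String, String> liveStacks() {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StringBuilder sb = new StringBuilder();
            for (StackTraceElement frame : e.getValue()) {
                sb.append("at ").append(frame).append('\n');
            }
            Thread t = e.getKey();
            out.put(t.getName() + "#" + t.getId(), sb.toString());
        }
        return out;
    }

    /** Groups thread names by identical stack text; the largest group comes first. */
    static Map<String, List<String>> groupByStack(Map<String, String> stackByThread) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stackByThread.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<Map.Entry<String, List<String>>> sorted = new ArrayList<>(groups.entrySet());
        // Sort by descending group size; List.sort is stable, so ties keep insertion order.
        sorted.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> en : sorted) {
            result.put(en.getKey(), en.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        groupByStack(liveStacks()).forEach((stack, names) ->
                System.out.println(names.size() + " thread(s): " + names + "\n" + stack));
    }
}
```

A group of heartbeat threads that keeps growing between runs would suggest that consumers are being created and not closed, but that is only a guess until you confirm it in your code.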
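3) 378 threads were in sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) when the thread dump was captured. The frames show that these are Netty NioEventLoop threads waiting in a selector; although they are reported as RUNNABLE, a thread in epollWait is normally idle, waiting for I/O. That many threads can still slow down transactions, so examine their stack trace: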
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000005529f47c8> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x00000005529f48b8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000005529f47e0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)