Hello Bibek!
Greetings. I reviewed your thread dump report. Here are my observations:
a. High number of GC threads - CPU spike
Your application has a high number of Garbage Collection (GC) threads: 66. Too many GC threads become counterproductive and can cause CPU spikes. Note: You might not have explicitly configured a high number of GC threads; by default, the GC thread count is derived from the number of cores/CPUs available on the device/container. To learn more about how the default GC thread count is chosen, you may refer to this post. Reducing the number of GC threads has the potential to reduce CPU consumption.
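To give a feel for how the default grows with core count, here is a minimal sketch of the heuristic HotSpot is commonly documented to use for the default value of -XX:ParallelGCThreads: one thread per CPU up to 8 CPUs, then 5/8 of a thread for each additional CPU. The function name is hypothetical, and the exact default can differ by JVM version, collector, and container settings, so treat the numbers as an approximation rather than a statement about your JVM.

```python
def default_parallel_gc_threads(cpus):
    """Approximate HotSpot's default ParallelGCThreads for a given CPU count.

    Assumption: the widely documented heuristic (all CPUs up to 8, then
    5/8 of each additional CPU). Your JVM version may behave differently.
    """
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    if cpus <= 8:
        return cpus
    # Beyond 8 CPUs, integer arithmetic: 8 + ((cpus - 8) * 5) // 8
    return 8 + (cpus - 8) * 5 // 8


if __name__ == "__main__":
    for cpus in (4, 8, 16, 64, 96):
        print(cpus, "CPUs ->", default_parallel_gc_threads(cpus), "parallel GC threads")
```

If the computed default is far more than your workload needs, the count can be capped explicitly with the -XX:ParallelGCThreads and -XX:ConcGCThreads JVM options; pick the values by measuring, not by formula.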
b. Kafka threads - consuming high CPU
You have uploaded 5 thread dumps. Each thread dump report has a section called 'CPU consuming threads'. In this section, FastThread's algorithms make a speculative estimate of which threads are likely to be consuming CPU. Here are a few threads reported in this section that I feel can consume high CPU. To learn how to accurately detect the CPU-consuming threads, refer to 'C. Right way to detect CPU' below.
1. Kafka thread - traversing scala collection, consuming high CPU
kafka-request-handler-34 PRIORITY : 5 THREAD ID : 0X00007FCA9339C000 STATE : RUNNABLE stackTrace: java.lang.Thread.State: RUNNABLE at scala.collection.TraversableOnce$class.nonEmpty(TraversableOnce.scala:111) at scala.collection.AbstractTraversable.nonEmpty(Traversable.scala:104) at scala.collection.generic.Growable$class.loop$1(Growable.scala:52) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.getTopicMetadata(MetadataCache.scala:127) at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:1003) at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:1087) at kafka.server.KafkaApis.handle(KafkaApis.scala:116) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748)
2. Kafka Thread - ThreadLocal, consuming high CPU
kafka-request-handler-33 PRIORITY : 5 STATE : RUNNABLE stackTrace: java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:481) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:249) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.kafka$server$MetadataCache$$getAliveEndpoint(MetadataCache.scala:118) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1$$anonfun$apply$1.apply(MetadataCache.scala:76) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1$$anonfun$apply$1.apply(MetadataCache.scala:73) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1.apply(MetadataCache.scala:73) at 
kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1.apply(MetadataCache.scala:72) at scala.Option.map(Option.scala:146) at kafka.server.MetadataCache.kafka$server$MetadataCache$$getPartitionMetadata(MetadataCache.scala:72) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:129) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:128) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.getTopicMetadata(MetadataCache.scala:127) at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:1003) at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:1087) at kafka.server.KafkaApis.handle(KafkaApis.scala:116) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748)
3. Kafka thread - traversing scala collection, consuming high CPU
kafka-request-handler-32 PRIORITY : 5 THREAD ID : 0X00007FCA93398800 stackTrace: java.lang.Thread.State: RUNNABLE at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.copyToBuffer(TraversableOnce.scala:275) at scala.collection.AbstractTraversable.copyToBuffer(Traversable.scala:104) at scala.collection.IndexedSeqLike$class.toBuffer(IndexedSeqLike.scala:95) at scala.collection.mutable.ArrayBuffer.toBuffer(ArrayBuffer.scala:48) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7$$anonfun$apply$8.apply(MetadataCache.scala:130) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7$$anonfun$apply$8.apply(MetadataCache.scala:129) at scala.Option.map(Option.scala:146) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:129) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:128) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.getTopicMetadata(MetadataCache.scala:127) at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:1003) at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:1087) at 
kafka.server.KafkaApis.handle(KafkaApis.scala:116) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748)
4. Kafka Thread - ThreadLocal, consuming high CPU
kafka-request-handler-24 PRIORITY : 5 STATE : RUNNABLE stackTrace: java.lang.Thread.State: RUNNABLE at java.lang.ThreadLocal$ThreadLocalMap.getEntryAfterMiss(ThreadLocal.java:444) at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:419) at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:298) at java.lang.ThreadLocal.get(ThreadLocal.java:163) at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:423) at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.kafka$server$MetadataCache$$getAliveEndpoint(MetadataCache.scala:118) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1$$anonfun$apply$1.apply(MetadataCache.scala:76) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1$$anonfun$apply$1.apply(MetadataCache.scala:73) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:130) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1.apply(MetadataCache.scala:73) at 
kafka.server.MetadataCache$$anonfun$kafka$server$MetadataCache$$getPartitionMetadata$1.apply(MetadataCache.scala:72) at scala.Option.map(Option.scala:146) at kafka.server.MetadataCache.kafka$server$MetadataCache$$getPartitionMetadata(MetadataCache.scala:72) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:129) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1$$anonfun$apply$7.apply(MetadataCache.scala:128) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.getTopicMetadata(MetadataCache.scala:127) at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:1003) at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:1087) at kafka.server.KafkaApis.handle(KafkaApis.scala:116) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748)
5. Kafka Thread - scala collection, consuming high CPU
kafka-request-handler-14 PRIORITY : 5 STATE : RUNNABLE stackTrace: java.lang.Thread.State: RUNNABLE at scala.collection.generic.Growable$class.loop$1(Growable.scala:53) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.server.MetadataCache$$anonfun$getTopicMetadata$1.apply(MetadataCache.scala:128) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257) at kafka.server.MetadataCache.getTopicMetadata(MetadataCache.scala:127) at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:1003) at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:1087) at kafka.server.KafkaApis.handle(KafkaApis.scala:116) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69) at java.lang.Thread.run(Thread.java:748)
C. Right way to detect CPU
In order to accurately pinpoint the lines of code causing the CPU spike, you need to analyze not only thread dumps but also the output of the 'top -H -p {PID}' command, where {PID} is the process ID of the Java application that is experiencing the CPU spike. When you issue this 'top' command with the given arguments, it lists all the threads running in the application and the amount of CPU each thread consumes. Once you have both sets of data, you can identify the high CPU-consuming threads and the lines of code they are executing.
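The correlation works because 'top -H' reports each thread's native ID in decimal, while a raw thread dump (for example from jstack) reports the same ID in hexadecimal as the nid=0x... field on each thread's header line. A minimal sketch of that matching step follows; the function names, the sample dump lines, and the sample CPU figures are all hypothetical and only illustrate the idea.

```python
import re

# Header line of a thread in a raw jstack dump, e.g.
# "name" #52 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [0x...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def threads_by_native_id(jstack_text):
    """Map native thread ID (as a decimal int) to Java thread name."""
    names = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            names[int(match.group("nid"), 16)] = match.group("name")
    return names


def hottest_threads(top_rows, jstack_text, limit=5):
    """Rank (native_thread_id, cpu_percent) rows from 'top -H' and attach thread names."""
    names = threads_by_native_id(jstack_text)
    ranked = sorted(top_rows, key=lambda row: row[1], reverse=True)
    return [(tid, cpu, names.get(tid, "<not in dump>")) for tid, cpu in ranked[:limit]]


# Hypothetical sample data, not taken from the uploaded dumps.
SAMPLE_DUMP = '''"kafka-request-handler-34" #52 daemon prio=5 os_prio=0 tid=0x00007fca9339c000 nid=0x1a2b runnable [0x00007fc9f0a0b000]
   java.lang.Thread.State: RUNNABLE
"kafka-request-handler-33" #51 daemon prio=5 os_prio=0 tid=0x00007fca9339a000 nid=0x1a2a runnable [0x00007fc9f0b0c000]
   java.lang.Thread.State: RUNNABLE
'''
SAMPLE_TOP = [(6698, 41.5), (6699, 87.0), (7000, 3.2)]

if __name__ == "__main__":
    for tid, cpu, name in hottest_threads(SAMPLE_TOP, SAMPLE_DUMP):
        print(f"{cpu:5.1f}%  {tid} (nid={tid:#x})  {name}")
```

Capture the 'top -H' output and the thread dump as close together in time as possible; otherwise the busy thread in one may no longer be busy in the other.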
You can use the open source yCrash script, which captures 360-degree application-level artifacts (like GC logs, 3 snapshots of thread dumps, heap dumps) and system-level artifacts (like top, top -H, netstat, vmstat, iostat, dmesg, disk usage, kernel parameters...). Once you have this data, you can either analyze it manually or upload it to the yCrash tool. The tool analyzes all of these data sets and generates an instant root cause analysis report pointing out the exact line of code causing the CPU spike. Here is more detailed information on how to diagnose a high CPU spike.