Pooja Mehta

Java Spring Boot API showing sudden increase in CPU consumption

Hello,

 

We have an API (Java Spring Boot) that shows a sudden increase in CPU consumption, from less than 10% to 60-70%, a few minutes into the load test. We have captured some memory dumps when the CPU increases, and analyzing the threads in them, we see a lot of them WAITING at jdk.internal.misc.Unsafe.park(Native Method). I understand that these threads are simply waiting for a task to be assigned, but could this be the cause? Could there be a blocking/locking issue with the queue? If so, how do I proceed from here?

 

Attaching some snapshots from Dynatrace that show SSLHandshake as the top consumer:

Regards,



Report URL - https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjEvMDkvMjcvLS0yLnR4dC0tMTMtNTItNTI7Oy0tMS50eHQtLTEzLTUyLTUy

  • cpu

  • suddencpuincrease

  • api

  • javaspringboot

  • memorydumps

  • waiting

  • blockingissue

  • threaddumps

  • cpuspikes

  • cpuconsumption

  • heapdumps

  • threadswaitingstate


Raja K

By any chance, is the timeout used in the poll method of LinkedBlockingQueue specified in TimeUnit.NANOSECONDS?
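
For context, a minimal sketch of why that matters: if a worker loop polls the queue with a nanosecond-scale timeout, an idle thread wakes up almost continuously and burns CPU, even though thread dumps show it parked in Unsafe.park() (the queue and timeout values below are illustrative assumptions, not taken from the application):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollTimeoutSketch {

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        while (true) {
            // With TimeUnit.NANOSECONDS an idle worker returns from poll() almost
            // immediately and loops again, consuming CPU even though snapshots
            // show it waiting inside Unsafe.park().
            Runnable task = queue.poll(100, TimeUnit.NANOSECONDS);

            // With a coarser timeout the idle thread actually sleeps between checks:
            // Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);

            if (task != null) {
                task.run();
            }
        }
    }
}
```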



Ram Lakshmanan

Hello Pooja!

 

 Greetings. Here are my initial thoughts:

 

1. CPU spikes are always caused by threads in the RUNNABLE state. Threads in other states (WAITING, TIMED_WAITING, BLOCKED, NEW, TERMINATED) will not cause a CPU spike.

 

2. It looks like the data was captured while a heap dump capture was in progress through the Dynatrace OneAgent. You should capture thread dumps before capturing heap dumps. When a heap dump is captured, it freezes the JVM, so we can't tell which threads are actually causing the problem.

 

3. Sometimes garbage collection running in the background can also cause the CPU to spike.

 

In order to accurately pinpoint the lines of code causing the CPU spike, you need to analyze not only thread dumps but also the output of 'top -H -p {PID}', where {PID} is the process id of the Java application experiencing the CPU spike. When you issue this 'top' command with the given arguments, it lists all the threads running in the application and the amount of CPU each thread consumes. Once you have both sets of data, you can identify the high-CPU-consuming threads and the lines of code they are executing. You can either capture these artifacts and analyze them manually, or use the 14-day trial version of the yCrash tool. yCrash automatically captures application-level data (thread dump, heap dump, garbage collection log) and system-level data (netstat, vmstat, iostat, top, top -H, dmesg, ...). It marries these two datasets and generates an instant root cause analysis report pointing out the exact line of code causing the CPU spike. Here is more information on how to diagnose a high CPU spike.
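
As an illustration of matching the two datasets: 'top -H' reports thread ids in decimal, while a thread dump prints the same id as a hexadecimal 'nid', so a small conversion links a high-CPU thread to its stack trace (the thread id below is a made-up example):

```java
public class TopThreadToNid {

    public static void main(String[] args) {
        // Example thread id taken from `top -H -p <PID>` output (hypothetical value).
        int tidFromTop = 24501;

        // Thread dumps print the same id in hex, e.g. nid=0x5fb5.
        String nid = "0x" + Integer.toHexString(tidFromTop);

        System.out.println("Search the thread dump for: nid=" + nid);
    }
}
```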


Pooja Mehta

Hi Ram,

 

Thanks for your reply.

The issue is that the service runs on Fargate in AWS, so there is currently no way for me to connect to the task and manually capture thread dumps. The only feasible way is through Dynatrace, and it gives the option of memory dumps only.


Ram Lakshmanan

Hello Pooja!

 

I see. In that case, can you capture thread dumps separately, capture heap dumps a few minutes later, and analyze them? Don't capture them together as was done earlier.

 

 a. You can analyze the captured thread dumps in fastThread.

 

b. You can use Eclipse MAT or HeapHero to analyze heap dumps. Here is a video clip that may assist you in analyzing heap dumps in Eclipse MAT.

 

You can also check with Amazon support whether there is a way to trigger a shell script on demand in the Fargate service. Let me also check whether there are any such options.
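
Since shell access to the Fargate task isn't available, one possible workaround (suggested here only as a sketch, not something mentioned earlier in this thread) is to capture a thread dump programmatically from inside the application using the standard ThreadMXBean API, and log it or expose it through an internal-only endpoint:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ProgrammaticThreadDump {

    /** Builds a jstack-style dump of every live thread, including lock information. */
    public static String dump() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threadMXBean.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Could also be written to the application log on a timer.
        System.out.println(dump());
    }
}
```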


Pooja Mehta

Hi Ram, 

 

Thanks a lot for the pointers, I will check those today.

Meanwhile, what stands out as a hotspot with respect to CPU consumption is sun.security.rsa, as shown in the snip below.

When CPU spikes to 60%, around 20 minutes into the test, the https-jsse-nio-*-exec threads consume the most CPU, and their stack traces show the sun.security.rsa methods mentioned above.

Do you have any pointers around this?

 

Regards,

 


Ram Lakshmanan

Hello Pooja!

 

The screenshot you have shared indicates that the CPU consumption originates from the SSL handshake. But SSL handshakes shouldn't consume this much CPU. Here is an excellent article which notes that when Google switched their Gmail service from HTTP to HTTPS, it added an overhead of only about 1% CPU. So I can think of only two reasons why SSL (i.e. HTTPS) would add so much CPU overhead:

 

a. Your connections are not persistent. Either your load testing tool, your application, or the AWS Fargate service is not keeping the HTTPS connections persistent, and is tearing down the connections after every transaction.

b. I have seen monitoring tools report the wrong methods as CPU suspects.

 

Most likely it is #a in your case. Check your HTTPS connection settings.
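
To illustrate point (a): whether connections are reused depends on the client side of each hop. As a minimal sketch, Java's built-in java.net.http.HttpClient keeps HTTPS connections alive by default when a single client instance is shared across requests, so the TLS handshake is paid once per connection rather than once per request (the URL below is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KeepAliveCheckSketch {

    // One shared client: its connection pool reuses HTTPS connections, so the
    // TLS handshake happens once per connection instead of once per request.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/health")) // placeholder endpoint
                .GET()
                .build();

        for (int i = 0; i < 3; i++) {
            HttpResponse<String> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        }
    }
}
```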

 

Thanks.


Pooja Mehta

Hi Ram,

 

These are TCP connections to the database. We do have connection pooling enabled using HikariCP.

hikari.connectionTimeout=2000

hikari.initializationFailTimeout=0

hikari.maximumPoolSize=5
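
For reference, the same pool settings expressed programmatically as a HikariConfig (the JDBC URL and the added maxLifetime line are illustrative assumptions, not values from our actual setup):

```java
import java.util.concurrent.TimeUnit;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolConfigSketch {

    public static HikariDataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/appdb"); // placeholder URL
        config.setConnectionTimeout(2000);       // hikari.connectionTimeout=2000
        config.setInitializationFailTimeout(0);  // hikari.initializationFailTimeout=0
        config.setMaximumPoolSize(5);            // hikari.maximumPoolSize=5
        // Assumption: keeping maxLifetime below any intermediate idle timeout avoids
        // the pool quietly re-creating connections (and re-doing the TLS handshake
        // if the JDBC connection itself uses SSL).
        config.setMaxLifetime(TimeUnit.MINUTES.toMillis(30));
        return new HikariDataSource(config);
    }
}
```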

 

We have tried running the test with different pool sizes, but the CPU issue still persists.

Could it be related to some other configuration that is missing?


Ram Lakshmanan

Hello Pooja!

How do you know these are connections to the database? Couldn't they be connections from your load test tool to the application server? What is your application server? You need to make sure the "keep-alive" parameter is ON.
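
If the application server turns out to be Spring Boot's embedded Tomcat (an assumption at this point in the thread), keep-alive is on by default but can be tuned explicitly. A sketch of such a customizer, with illustrative values:

```java
import org.apache.coyote.ProtocolHandler;
import org.apache.coyote.http11.AbstractHttp11Protocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KeepAliveTuningSketch {

    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> keepAliveCustomizer() {
        return factory -> factory.addConnectorCustomizers(connector -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractHttp11Protocol) {
                AbstractHttp11Protocol<?> http = (AbstractHttp11Protocol<?>) handler;
                http.setKeepAliveTimeout(60_000);  // hold idle connections open for 60 s (illustrative)
                http.setMaxKeepAliveRequests(-1);  // -1 = no per-connection request limit (illustrative)
            }
        });
    }
}
```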


Pooja Mehta

https://heaphero.io/my-heap-report.jsp?p=YXJjaGl2ZWQvMjAyMS8xMC8xLy0tZHVtcC5ocHJvZi03LTIwLTEuanNvbg==


Ram Lakshmanan

Heap dump looks normal.


Pooja Mehta

Hi Ram,

 

Thanks for the reply. The application server is Tomcat and the DB is PostgreSQL.

I was referring to DB connections because, while looking at the hotspots, I found a stack where SocketInputStream.socketRead0 takes considerable network time, which doesn't look normal. Mostly I have seen this time under 1%.

 

 


Pooja Mehta

Here, you can see that even SSLSocketImpl is taking 15.9% of the overall time.

This behavior is specific to only one query. When I check the query execution time, it is between 4-5 ms.


Ram Lakshmanan

Hello Pooja!

 

This reporting is confusing to me. I thought it was reporting CPU utilization time, but now are you saying it is reporting response time? When we are debugging a CPU spike, we need to focus on CPU time and not on response time.

 


Pooja Mehta

Hi Ram,

 

Sorry for the confusion. Both of these issues are from the same test run, so I wanted to highlight them.

We were able to solve the CPU issue to some extent by changing the logging levels, increasing the connection pool size, and making some code-level changes where a lot of mapper objects were being created.

However, the performance is still not up to expectations; I think the long time spent in the Socket.read() operation, as well as in SSL as highlighted in the snip below, is the cause.

At the moment, we are not able to find any reason why it could be taking so long, since it's not a big query that returns a huge chunk of data.

We also see similar behavior with commitTransaction, where drilling down shows the same trace in which socketRead0 is consuming significant time.

What is more peculiar is that this is not seen with all the SQL statements.
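
On the mapper-object change mentioned above: assuming those were Jackson ObjectMapper instances being created per request (an assumption, since the code isn't shown here), the usual fix is to reuse one shared, thread-safe instance, roughly like this:

```java
import com.fasterxml.jackson.databind.ObjectMapper;

public class MapperReuseSketch {

    // ObjectMapper is thread-safe once configured, so a single shared instance
    // avoids re-creating it (and its internal caches) on every request.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String toJson(Object value) throws Exception {
        // Before: new ObjectMapper().writeValueAsString(value) on every call.
        return MAPPER.writeValueAsString(value);
    }
}
```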


Jerome

Hi Pooja,

we are encountering a similar issue in one of our Kafka producer apps. I would appreciate it if you could elaborate on your fix for the sudden CPU spike. Thanks
