billlf

Full GC cycle (using G1GC) causes a Java web app to see intermittent HTTP 503 errors

We are currently running OpenJDK 11.0.14.1, and after each periodic full GC cycle we see intermittent performance interference in our Java app (Jenkins LTS 2.332.1): the app's web UI periodically reports HTTP 503 errors as the heap begins to grow again after the full GC.  We start the JVM with NO arguments (taking all the JDK 11 defaults).  Any thoughts or suggestions as to why the app is hitting these frequent/intermittent HTTP 503 errors?  The errors continue for approximately 48 hours after the full GC and then disappear until the next full GC occurs.
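
For context, a quick way to confirm which collector and startup options the JVM actually ended up with under the defaults is to query the standard java.lang.management MXBeans. This is only a minimal sketch; the class name is illustrative:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmDefaultsCheck {
    public static void main(String[] args) {
        // Show the arguments the JVM was actually started with
        // (expected to be empty, since we pass no args).
        System.out.println("JVM args: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());

        // List the active collectors; with JDK 11 defaults this should print
        // "G1 Young Generation" and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }
    }
}
```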

 

Report URL - https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjIvMDQvNS8tLWp2bS0wNDA1QS5sb2ctLTE1LTAtMjU=&channel=WEB

  • fullgccycle

  • g1gc

  • javawebapp

  • intermittenthttp503errors

  • jenkinslts



Ram Lakshmanan

Hello Bill!

 

Greetings. I reviewed your GC log analysis report. Below is the GC pause time graph from your GC report:

[GC pause time graph]

I see only two occurrences of Full GC: one at 05:38 pm and another at 06:30 pm. Even then, they paused the JVM for only 1.2 seconds and 1.7 seconds. I suspect pauses that small are unlikely to trigger HTTP 503 errors (unless you have attached some other GC log). You may want to confirm whether the HTTP 503 errors were actually happening around those reported time frames.
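
If it helps, one way to verify that is to line up GC activity against the 503 timestamps in your access logs. Below is a minimal sketch that polls the standard GarbageCollectorMXBean counters; the class name and 30-second polling interval are illustrative and not part of GCeasy or yCrash:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.LocalTime;

public class GcActivityLogger {
    public static void main(String[] args) throws InterruptedException {
        // Every 30 seconds, print cumulative collection counts and times so
        // spikes can be lined up against the timestamps of the HTTP 503 errors.
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s %s: collections=%d, totalTimeMs=%d%n",
                        LocalTime.now(), gc.getName(),
                        gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(30_000);
        }
    }
}
```

Note that these counters belong to the JVM the code runs in, so it would have to run inside the Jenkins JVM; comparing the GC log timestamps directly against the access log works just as well.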

 

An HTTP 503 error indicates that the application is not ready to handle the request. There could be several reasons for this:

 

 

  • Garbage collection pauses
  • Threads getting BLOCKED
  • Network connectivity
  • Load balancer routing issue
  • Heavy CPU consumption of threads
  • Operating System running with old patches
  • Memory Leak
  • DB not responding properly


 

So a thread dump alone is not enough to diagnose the problem. You have captured only a thread dump, and only a single snapshot of it. It's good practice to capture three thread dumps with a gap of 10 seconds between each one, so you can see how thread states change over time. Besides thread dumps, you may have to capture other logs/artifacts to do a thorough analysis.
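
For illustration, here is a minimal sketch of that three-snapshot cadence using the standard ThreadMXBean API. In practice you would more likely run jstack against the Jenkins process ID three times; this in-process version only demonstrates the 10-second spacing and is not part of the yCrash tooling:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreeThreadDumps {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        // Capture three snapshots, 10 seconds apart, so threads stuck in
        // BLOCKED/WAITING states show up consistently across snapshots.
        for (int snapshot = 1; snapshot <= 3; snapshot++) {
            System.out.println("=== Thread dump " + snapshot + " ===");
            for (ThreadInfo info : threadBean.dumpAllThreads(true, true)) {
                System.out.print(info);  // includes state, locks and a stack trace
            }
            if (snapshot < 3) {
                Thread.sleep(10_000);
            }
        }
    }
}
```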

 

You can use the open source yCrash script, which captures 360-degree application-level artifacts (GC logs, three snapshots of thread dumps, heap dumps) and system-level artifacts (top, top -H, netstat, vmstat, iostat, dmesg, disk usage, kernel parameters...). Once you have this data, you can either analyze it manually or upload it to the yCrash tool, which analyzes all of these artifacts and generates one unified root cause analysis that marries them together, pointing you to the root cause of the problem.

 


billlf

 

Hello Ram,

 

I have reviewed your answer to my post and found the suggestions for potential root causes very helpful. I will shift my troubleshooting emphasis to some of them if the GC cycle interference turns out to be a red herring.

 

I followed your suggestion and downloaded the open source yCrash script (for Windows), then successfully executed it inside the Docker container the Java app runs in, which generated a 235 MB ZIP file.  I have been trying to upload the ZIP file to the gceasy.io web site for online analysis assistance, but the upload fails, stating that the file size is too large.

 

Do you have any suggestions on how to upload for online analysis?

Should the ZIP file be uploaded to gceasy.io or to ycrash.io?

If yCrash.io, does the analysis require a paid subscription?


Ram Lakshmanan

Hello Bill!

 

 Greetings. 

 

You can sign in to yCrash from this page, using the same GCeasy credentials to log in. The basic yCrash tier is free, so you can use that.

 

Use the 'Bundle Upload' feature to upload the yc-*.zip file generated by the script and review the results. Keep us posted if you see any issues. Thanks.


billlf

Ram,

 

Thanks for the assist.  I was able to upload the ZIP file to the yCrash web site and reviewed the resulting report.  I will check out the application issues flagged on the summary page as possible root causes of the problem the Java app is seeing.  The report contains a lot of good, useful information about the Java app that I did not know.  So for the moment, I now have all the details about the problem and will need time to analyze and explore.  Thanks again for your support in enabling me to complete the Java app analysis.

 

Regards
