Hello Pedro!
I can see a cluster of GC pauses in the time window you pointed out, and each of those pauses takes several seconds to complete. See the GC Pause Duration Graph from your report:
For the same time window, please look at the Heap usage graph below. It shows that heap consumption is very high. Consumption this high can happen when one of the following occurs:
a. The volume of incoming traffic to your application increased
b. A Batch Job started to run
c. Or the traffic volume stayed the same, but some transactions demanded a heavier workload. For example, I used to work for a bank. Most customers had 2 to 4 bank accounts, but there were outliers with around 250 accounts. When such outlier customers started using the online banking application, our system would come under stress.
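If you want to confirm the heap pressure from inside the application while one of the conditions above is happening, a minimal sketch using the standard java.lang.management API could look like the following. The class name HeapUsageProbe and the method usedHeapPercent are made up for illustration; they are not part of your application or of the report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapUsageProbe {

    // Returns the used heap as a percentage of the maximum heap,
    // or -1 if the JVM does not report a maximum heap size.
    static double usedHeapPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        // Print a single sample; a real probe would log this periodically.
        System.out.printf("Heap used: %.1f%%%n", usedHeapPercent());
    }
}
```

Logging this value periodically during the batch window or the traffic peak should show whether the heap stays close to its limit at the times the long pauses occur.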
Given the situation, here are some solutions you can pursue:
1. Normalize the traffic:
a. If a spike in traffic seems to be the cause, consider whether you can normalize the traffic (e.g. run the Batch Job with fewer threads so that the overall traffic volume is reduced).
b. Try adding more JVMs during the peak traffic period so that the traffic handled by each JVM is reduced.
2. Switch away from the CMS GC algorithm:
You are currently running with the CMS GC algorithm. CMS was deprecated in Java 9 and removed in Java 14. It is also known for long GC pauses. You may therefore consider switching from CMS to G1 and see whether it handles the pauses more gracefully.
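On HotSpot JVMs, CMS is normally enabled with -XX:+UseConcMarkSweepGC and G1 with -XX:+UseG1GC. I have not seen your actual JVM arguments, so treat that as an assumption about your setup. After changing the flag, you can check which collectors the JVM is really using with a small sketch like this one (the class name ActiveCollectors is made up for illustration):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {

    // Collects the names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each collector with its collection count and accumulated time so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

With G1 active, the names typically start with "G1" (for example "G1 Young Generation"), while a CMS setup typically reports "ParNew" and "ConcurrentMarkSweep". Exact names can differ between JDK versions and vendors.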