Bhagyac

G1GC performance tuning

We have a Spring Batch job that has long-running issues in production. On checking the GC logs, we found that too many Full GCs are occurring frequently. We are using the G1GC collector on Java 8.

 

Below are the JVM arguments:

-Xms6g -Xmx20g -XX:NewSize=3g -XX:+UseG1GC -Xloggc:/local/apps/stock.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:-HeapDumpOnOutOfMemoryError

The GC log file was analyzed using the GCViewer tool; the report screenshots are below.

[GCViewer report screenshots]

 

GC Pauses:

 

  1. GC cleanup
  2. GC pause (G1 Evacuation Pause) (mixed)
  3. GC pause (G1 Evacuation Pause) (mixed) (to-space exhausted)
  4. GC pause (G1 Evacuation Pause) (young)
  5. GC pause (G1 Evacuation Pause) (young) (initial-mark)
  6. GC pause (G1 Evacuation Pause) (young) (to-space exhausted)
  7. GC pause (G1 Humongous Allocation) (young)
  8. GC pause (G1 Humongous Allocation) (young) (initial-mark)
  9. GC pause (G1 Humongous Allocation) (young) (to-space exhausted)
  10. GC pause (GCLocker Initiated GC) (young)
  11. GC pause (Metadata GC Threshold) (young) (initial-mark)
  12. GC remark; GC ref-proc

 

Full GC Pauses:

 

  1. Full GC (Allocation Failure); Eden; Metaspace

Job Runtime: 30 minutes

 

JRE: 1.8.0_45-b14

 

Reproducing the Full GC issue in the test environment is not possible, so I am trying to resolve it on the basis of the report alone. After going through the GC log report and various blogs on the G1 collector, I believe the following tuning is required and that the parameters below need to be added or modified; a combined sketch of the resulting argument line follows the list.

 

  1. Remove -XX:NewSize=3g
  2. Add -XX:+DisableExplicitGC to disable System.gc() calls.
  3. Increase the heap size (-Xmx) and set -Xms and -Xmx to the same value.
  4. Add -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics
  5. Set -XX:MaxGCPauseMillis to a higher value to maximize throughput.
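Purely as an illustration of how those changes might combine (the 24g heap, the 500 ms pause goal, and leaving the other existing flags unchanged are assumptions to be validated by testing, not recommendations), the resulting argument line could look something like:

-Xms24g -Xmx24g -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+DisableExplicitGC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics -Xloggc:/local/apps/stock.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:-HeapDumpOnOutOfMemoryError

Note that -XX:+UseStringDeduplication only takes effect with G1 and requires Java 8u20 or later, which 8u45 satisfies; whether it actually helps depends on how many duplicate long-lived Strings the job retains.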

 

Can you suggest whether any more parameters need to be added, what the correct values for these parameters should be, and how to decide on the correct parameters and values?



  • g1gc

  • java8

  • jvm

  • g1gcperformancetuning

  • springbatchjob

  • toomanyfullgcsoccuring

  • g1gccollector


Charlie Arehart

While you await a reply from the ycrash folks, I'll offer some thoughts based on experience, without having reviewed anything more than what you offered in your note.

 

I don't think fiddling with JVM knobs is the answer. GCs are a reflection of how the heap is being used. Usually it's a simple matter of picking the right heap size for your app, especially for a batch job doing essentially "one thing".

 

Before considering RAISING the heap, there's always the possibility that it's spending time doing GCs simply because you made the heap much larger than needed and the JVM lazily used it. So first: have you run it with far less heap and hit OutOfMemory heap errors?

 

Second, you haven't said how long this job runs, or what JVM version it is. Both can influence how to analyze things. (Sorry if those are indicated somewhere and I missed them. On a phone, once we're answering, we can't easily see the original note.)

 

Third, if it does need more heap, it may run better with even just 1 GB more (seriously), or it may need 5 GB more, or 10 or 20. No one can say; only testing will determine what the app needs.

 

Finally, another way to go about resolving things is to lower the heap requirements of the app. That can be harder, of course. (For some, doubling the heap and running it may work and be trivial. Others may not have that memory available. Or doubling may not be enough.)

 

In that case, you'd need to find out WHAT in the app is a) using so much heap, and b) holding on to it so that it can't be GC'ed. Again, that is harder, and it also needs to be assessed at points within the processing of the app, as things could change dramatically over time during processing.
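As one hedged, concrete way to start that investigation on Java 8: a heap dump taken while the job is running shows which objects dominate the heap and what keeps them reachable. For example (the file path is just an illustration, and <pid> is the batch job's process id):

jmap -dump:live,format=b,file=/tmp/batchjob-heap.hprof <pid>

The dump can then be opened in a heap analyzer such as Eclipse MAT. Flipping the existing -XX:-HeapDumpOnOutOfMemoryError flag to -XX:+HeapDumpOnOutOfMemoryError (optionally with -XX:HeapDumpPath=<dir>) would also capture a dump automatically if the job ever does run out of memory.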


Charlie Arehart

Anyone else have thoughts? Bhagyashri, or ycrash folks? 


Ram Lakshmanan

Charlie - you can see my observations below


Charlie Arehart

Yes, Ram. Thanks for hopping in. Glad I was able to get Bhagyashri a bit more attention. As for your response, I'll say that while it's detailed in ways mine wasn't, it seems we both came to approximately the same conclusion. :-) Will look forward to seeing how things may sort out for them here.


Bhagyac

Hi Charlie, thanks for your reply.

I have updated the question with the job runtime and Java version.


Ram Lakshmanan

Hello BhagyaShri!

 

In the metrics you have shared, I can see that your application's GC throughput is reported as 59.6%. This is very poor throughput. It indicates your application is spending only 59.6% of its time processing customer transactions and the remaining 40.4% doing garbage collection. That means that over a full day, your application would spend close to 10 hours in garbage collection.
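(For reference, the arithmetic behind that estimate: 40.4% of a 24-hour day is 0.404 × 24 ≈ 9.7 hours, which is close to 10 hours.)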

 

 This kind of poor throughput can happen only under two circumstances:

 

a. Your application is suffering from memory leak

b. Your application is suffering from consecutive Full GC problem

 

I recommend reading 'Pattern #4 Consecutive Full GC pattern' and 'Pattern #5 Memory Leak Pattern' in this blog post. For these two situations, just adjusting the JVM arguments you have mentioned will not work. So what is the solution?

 

a. Solution to Memory Leak:

 

You can use root cause analysis tools like yCrash, which automatically captures application-level data (thread dump, heap dump, Garbage Collection log) and system-level data (netstat, vmstat, iostat, top, top -H, dmesg, kernel parameters…). Besides capturing the data automatically, it marries these two datasets and generates an instant root cause analysis report. Below is the report generated by the yCrash tool for a sample program that suffers from a memory leak:

 

Fig: yCrash tool pointing out the root cause of OutOfMemoryError

 

You can see that the yCrash tool precisely points out the root cause of the memory leak. You can register here to get a 14-day trial of yCrash.

 

b. Addressing Consecutive Full GCs

Consecutive Full GCs run because of a lack of memory. Maybe your application's traffic volume has grown and it isn't able to cope with the currently allocated memory. This can be addressed by increasing -Xmx. For more details, refer here.
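Purely as an illustration of increasing -Xmx (the value below is an assumption, not a recommendation; only testing against production-like load can pick the right number), the change might look like going from the current -Xms6g -Xmx20g to something along the lines of:

-Xms24g -Xmx24g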

 


Bhagyac

Hi Ram, thanks for your reply.


Bhagyac

 

  1. For the memory leak issue: I will try the solution you have provided.
  2. Regarding the Full GCs, I have somehow managed to reproduce the Full GC issue in the test environment and ran the experiments below to check the throughput.

As per the report below, experiment #18 shows good throughput. Could you please suggest whether it is good to go ahead with those settings?

 

Report after reproducing the issue:

[GCViewer report screenshot]

Experiments:

[experiment result screenshots]
Any suggestions would be appreciated.

Thanks

 

 


Ram Lakshmanan

Hello Bhagya!

 

Greetings. It is very good to see you conducting such exhaustive test scenarios. I would go with settings #10, because they have the best GC throughput and good pause-time characteristics.

 

However, I assume you have conducted these tests in your test lab with simulated synthetic traffic. Real production traffic will always be quite different from synthetic traffic. So I would recommend rolling this setting out to a partial set of servers in production and observing the GC behaviour for 24 hours, so that you can see both high-tide and low-tide traffic, and then propagating the change to all servers in production.

 

Good progress. Keep it up.
