Sajal Gupta

Hystrix thread pool goes into a waiting state until timeout during ZGC activity

The Hystrix thread pool goes into a waiting state and doesn't resume once GC activity is done. I am using JRE 17 with ZGC.

hystrix
zgc
Hystrix threadpool
waiting state
ZGC activity
safe point duration
thread pause times
cache heavy
high scale system

Ram Lakshmanan

Hello Sajal!

Greetings.

This is from my personal experience: I have seen ZGC pause application threads (in your case, the Hystrix thread pool threads) to reduce the object creation rate, which in turn reduces the garbage collection pause time.
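One rough way to check for this kind of stalling is to scan the GC log for stall events. The sketch below is only an illustration: it assumes the log was written with unified logging and that stall events appear as lines containing the text "Allocation Stall". The class name, method name, and log path are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class AllocationStallScan {

    // Counts log lines that mention an allocation stall.
    // Assumption: the GC log marks such events with the text "Allocation Stall".
    static long countStalls(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("Allocation Stall"))
                .count();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Hypothetical usage; the log file name is up to you.
            System.out.println("usage: java AllocationStallScan <gc-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("Allocation stall lines: " + countStalls(lines));
    }
}
```

A non-zero count would suggest the application threads were held back by allocation pressure rather than by the GC pauses themselves.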

I have a couple of questions for you:

a. What is your heap size, i.e. -Xmx? ZGC is claimed to work better only for large heap sizes. If your heap size is < 100GB, you can consider alternative GC algorithms such as G1 or Shenandoah. (CMS was removed in JDK 14, so it is not available on JRE 17.)

b. I am curious to learn how you concluded that the Hystrix threads aren't resuming even after the ZGC event completes. What metrics or data are you using to come to this conclusion?
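As one example of the kind of data that helps here, the JVM exposes per-collector counters through the standard java.lang.management API. The sketch below just prints them; the class and method names are hypothetical, and what the reported time covers (pauses only, or concurrent work as well) depends on the collector, so treat the numbers as a rough signal.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {

    // Sums the accumulated collection time (ms) over all collector beans.
    // getCollectionTime() returns -1 when the value is undefined, so skip those.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("Total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

Sampling these values just before and just after a Hystrix timeout would show whether any GC work actually happened in that window.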

Sajal Gupta

Hello Ram,

Thanks for the reply. My heap size is 31 GB. The ZGC pause time is less than 2 ms in every case (less than 1 ms in most cases), yet my Hystrix thread pool is still timing out on a 3-second timeout.

Ram Lakshmanan

Hello Sajal!

I would like to look at the safepoint duration and thread pause times. Can you share your GCeasy report? You can click on the 'Share Report' hyperlink in the top-left corner of the report. It will generate a URL. Can you paste that URL in this thread?

Sajal Gupta

Please find the link below:

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjIvMDgvMzEvZ2MtMzEtLTMtMTMtMjI=&channel=WEB

Ram Lakshmanan

Hello Sajal!

I reviewed the GC log. There doesn't seem to be a problem with GC activity. You have an excellent GC throughput of 99.999%. I suspect the Hystrix threads are pausing for some other reason. It may be one of the following, among other possibilities:

Threads getting BLOCKED
Network connectivity
Load balancer routing issue
Heavy CPU consumption of threads
Operating System running with old patches
DB not responding properly

You may want to capture a thread dump and analyze it. Even better, you can use the open source yCrash script, which captures 360-degree application-level artifacts (like GC logs, 3 snapshots of thread dumps, heap dumps) and system-level artifacts (like top, top -H, netstat, vmstat, iostat, dmesg, disk usage, kernel parameters...). Once you have this data, you can either analyze it manually or upload it to the yCrash tool, which will analyze all these artifacts and generate a root cause analysis report. It has the potential to indicate the root cause of the problem.
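For a quick first look before a full capture, thread states can also be read from inside the JVM with the standard ThreadMXBean API. This is a minimal sketch, not a replacement for a real thread dump: the class and method names are hypothetical, and the "hystrix-" name prefix is an assumption about how your pool threads are named, so adjust it to match what you see in your own dumps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

public class PoolThreadStates {

    // Counts live threads per state whose name starts with the given prefix.
    static Map<Thread.State, Integer> countStates(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip lock details to keep the snapshot cheap.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            // Entries can be null if a thread died while the snapshot was taken.
            if (info != null && info.getThreadName().startsWith(namePrefix)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "hystrix-" is an assumed prefix; pass your own as the first argument.
        String prefix = args.length > 0 ? args[0] : "hystrix-";
        System.out.println(prefix + " thread states: " + countStates(prefix));
    }
}
```

This only sees threads of the JVM it runs in, so it would need to be called from within the application (for example from a diagnostic endpoint). Many threads in BLOCKED or TIMED_WAITING at the moment of a timeout would point toward lock contention or a slow downstream call rather than GC.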

Sajal Gupta

Here is one more GC log, from a different time:

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjIvMDgvMzEvZ2MtMzEuMS0tMTAtMjktMzg=&channel=WEB

The system is cache-heavy and high-scale, and it also uses Hystrix for rate limiting and thread pool management.
