Sajal Gupta

Hystrix thread pool goes into a waiting state until timeout during ZGC activity

The Hystrix thread pool goes into a waiting state and doesn't resume once GC activity is done. I am using JRE 17 + ZGC.

  • hystrix

  • zgc

  • Hystrix threadpool

  • waiting state

  • ZGC activity

  • safe point duration

  • thread pause times

  • cache heavy

  • high scale system



Ram Lakshmanan

Hello Sajal!

Greetings.

In my personal experience, I have seen ZGC pause application threads (in your case, the Hystrix thread pool threads) to slow down the object allocation rate, which in turn reduces garbage collection pause time. ZGC reports these stalls as "Allocation Stall" events in the GC log.

 

I have a couple of questions for you:

a. What is your heap size, i.e. -Xmx? ZGC is generally said to work best with large heaps. If your heap size is < 100GB, you can consider alternative GC algorithms such as G1 or Shenandoah. (CMS is not an option on JRE 17, since it was removed in JDK 14.)

b. I am curious to learn how you concluded that the Hystrix threads don't resume even after the ZGC event completes. What metrics or data are you using to reach this conclusion?
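Both questions can be answered from inside the JVM with the standard java.lang.management API. The minimal sketch below prints the maximum heap (close to the effective -Xmx), the names of the registered garbage collector MXBeans (which reveal the active collector), and a per-state count of threads whose names start with "hystrix-". The prefix is an assumption based on Hystrix's default thread factory, which, as far as I know, names pool threads hystrix-<poolKey>-<n>. Sampling these counts periodically would give a concrete metric for whether pool threads stay WAITING after a GC cycle. The class and method names are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class JvmQuickCheck {
    /** Roughly the effective -Xmx: the maximum heap the JVM will try to use, in bytes. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Names of the collector MXBeans registered by the running JVM (they reveal the active GC). */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /**
     * Counts live threads whose name starts with namePrefix, grouped by Thread.State.
     * Assumption: Hystrix pool threads are named "hystrix-<poolKey>-<n>".
     */
    public static Map<Thread.State, Integer> countByState(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.printf("Max heap: %.1f GB%n", maxHeapBytes() / (1024.0 * 1024 * 1024));
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Hystrix threads by state: " + countByState("hystrix-"));
    }
}
```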


Sajal Gupta

Hello Ram,

Thanks for the reply. My heap size is 31 GB. ZGC pause time is less than 2 ms in every case (less than 1 ms in most cases), yet my Hystrix thread pool still times out at the 3-second timeout.
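To put those numbers side by side: with every pause under 2 ms, it would take roughly 1,500 back-to-back pauses (3000 ms / 2 ms) to use up a 3-second timeout, so a pool thread that sits in a waiting state until the timeout is more likely parked on something else, such as a slow dependency or a lock. The minimal sketch below uses only java.util.concurrent and models a hypothetical dependency that never answers as a CountDownLatch that is never released. It shows the caller's Future.get timing out while the pool thread is in the WAITING state. The class and method names are illustrative, not Hystrix APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

public class TimeoutDemo {
    /**
     * Runs a task that waits on a dependency which never answers, gives up after timeoutMs,
     * and returns the pool thread's state at that moment (null if the task finished in time).
     */
    public static Thread.State stateAtTimeout(long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch dependency = new CountDownLatch(1); // hypothetical dependency, never released
        AtomicReference<Thread> worker = new AtomicReference<>();
        Future<Object> result = pool.submit(() -> {
            worker.set(Thread.currentThread());
            dependency.await(); // parks the pool thread: Thread.State.WAITING
            return null;
        });
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return null; // finished within the timeout
        } catch (TimeoutException e) {
            Thread t = worker.get();
            return t == null ? null : t.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            result.cancel(true); // interrupt the parked pool thread
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Pool thread state at timeout: " + stateAtTimeout(3000));
    }
}
```

Running main should print WAITING after about three seconds, without any GC involvement.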

 

 


Ram Lakshmanan

Hello Sajal!

 

I would like to look at the safepoint duration and thread pause times. Can you share your GCeasy report? Click the 'Share Report' hyperlink in the top-left corner of the report; it will generate a URL. Can you paste that URL in this thread?
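Alongside the safepoint data in the GC log, one way to see the pauses application threads actually experience is a probe in the spirit of jHiccup: a thread repeatedly sleeps for a fixed interval and records how much longer than requested each sleep took. The overshoot captures any cause that keeps the thread from running (GC pauses, allocation stalls, safepoints, CPU starvation), not only GC, and it also includes ordinary OS timer slack. This is a minimal sketch; the interval and sample count are arbitrary assumptions.

```java
import java.util.concurrent.TimeUnit;

public class PauseProbe {
    /**
     * Sleeps intervalMs, `samples` times, and returns the worst overshoot in ms,
     * i.e. the longest time the thread could not run after its sleep expired.
     */
    public static long worstOvershootMs(long intervalMs, int samples) {
        long expectedNanos = TimeUnit.MILLISECONDS.toNanos(intervalMs);
        long worstNanos = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling if asked to
                break;
            }
            long overshoot = System.nanoTime() - start - expectedNanos;
            worstNanos = Math.max(worstNanos, overshoot);
        }
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }

    public static void main(String[] args) {
        // Assumption: 1 ms interval and 10,000 samples (a little over 10 seconds of observation).
        System.out.println("Worst observed hiccup: " + worstOvershootMs(1, 10_000) + " ms");
    }
}
```

If this probe, run inside the service, shows hiccups in the seconds range while the GC log shows sub-2 ms pauses, the delay is coming from somewhere other than GC.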


Sajal Gupta

Please find the link:

 

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjIvMDgvMzEvZ2MtMzEtLTMtMTMtMjI=&channel=WEB


Ram Lakshmanan

Hello Sajal! 

 

 I reviewed the GC log. There doesn't seem to be a problem with GC activity; you have an excellent GC throughput of 99.999%. I suspect the Hystrix threads are pausing for some other reason. Maybe it is one of these:

 

  • Threads getting BLOCKED
  • Network connectivity
  • Load balancer routing issue
  • Heavy CPU consumption of threads
  • Operating System running with old patches
  • DB not responding properly
  • :
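For the first item, BLOCKED threads, the JDK's ThreadMXBean can list which threads are BLOCKED and which thread holds the monitor each one is waiting for, without any external tool. A minimal sketch (class and method names are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {
    /** One line per BLOCKED thread: which monitor it wants and which thread holds it. */
    public static List<String> describeBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (false, false): skip locked-monitor and synchronizer details; lock/owner info is still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // findDeadlockedThreads() returns null when no deadlock is found.
        long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        System.out.println("Deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
        describeBlocked().forEach(System.out::println);
    }
}
```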

 

 Maybe you want to capture thread dumps and analyze them. Even better, you can use the open-source yCrash script, which captures 360-degree application-level artifacts (like GC logs, 3 snapshots of thread dumps, heap dumps) and system-level artifacts (like top, top -H, netstat, vmstat, iostat, dmesg, disk usage, kernel parameters...). Once you have this data, you can either analyze it manually or upload it to the yCrash tool, which analyzes all these artifacts and generates a root cause analysis report. It can potentially point to the root cause of the problem.
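The value of taking several thread dump snapshots is that a thread showing the same non-RUNNABLE state at the same stack frame in every snapshot is likely stuck rather than momentarily waiting. The sketch below does that comparison in-process with ThreadMXBean. The snapshot count of 3 follows the snapshots mentioned above; the 10-second gap is an arbitrary assumption, and this only illustrates the idea, not the yCrash script's implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StuckThreadFinder {
    /** What a thread is doing: its state plus its top stack frame. */
    private static String signature(ThreadInfo info) {
        StackTraceElement[] stack = info.getStackTrace();
        return info.getThreadState() + " @ " + (stack.length > 0 ? stack[0] : "<no frames>");
    }

    /** Thread id -> signature for every live thread that is not RUNNABLE; also records names. */
    private static Map<Long, String> snapshot(ThreadMXBean mx, Map<Long, String> names) {
        Map<Long, String> sigs = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() != Thread.State.RUNNABLE) {
                sigs.put(info.getThreadId(), signature(info));
                names.put(info.getThreadId(), info.getThreadName());
            }
        }
        return sigs;
    }

    /**
     * Takes `snapshots` dumps `gapMs` apart and returns the names of threads whose
     * non-RUNNABLE state and top frame were identical in every snapshot.
     */
    public static Set<String> findStuck(int snapshots, long gapMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> names = new HashMap<>();
        Map<Long, String> stuck = snapshot(mx, names);
        for (int i = 1; i < snapshots; i++) {
            try {
                Thread.sleep(gapMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            Map<Long, String> current = snapshot(mx, names);
            // Keep only threads that look exactly as they did in the first snapshot.
            stuck.entrySet().removeIf(e -> !e.getValue().equals(current.get(e.getKey())));
        }
        Set<String> result = new TreeSet<>();
        for (Long id : stuck.keySet()) {
            result.add(names.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: 3 snapshots, 10 seconds apart.
        System.out.println("Possibly stuck: " + findStuck(3, 10_000));
    }
}
```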


Sajal Gupta

Here is one more GC log, from a different time:

 

https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjIvMDgvMzEvZ2MtMzEuMS0tMTAtMjktMzg=&channel=WEB

 

The system is cache-heavy and high-scale, and it also uses Hystrix for rate limiting and thread pool management.
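Since Hystrix is used here for thread pool management, it may help to recall what thread-pool isolation amounts to: each dependency gets a fixed number of threads and a bounded queue, and work beyond that is rejected rather than piling up. The sketch below is a plain java.util.concurrent analogue of that idea, not Hystrix's own API. The thread count and queue capacity play roughly the role of Hystrix's coreSize and maxQueueSize thread-pool properties; the class name and values are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Bulkhead {
    /** Fixed-size pool with a bounded queue; extra work is rejected instead of queued forever. */
    public static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Submits `tasks` tasks that block until released and returns how many were rejected. */
    public static int countRejections(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = create(threads, queueCapacity);
        CountDownLatch release = new CountDownLatch(1); // keeps every accepted task busy
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool and queue are both full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("Rejected: " + countRejections(2, 3, 10));
    }
}
```

With 2 threads, a queue of 3, and 10 tasks that stay busy, 5 tasks are accepted (2 running, 3 queued) and the other 5 are rejected, so main prints "Rejected: 5". A pool that is saturated this way is one more place where callers can end up waiting or timing out independently of GC.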
