nathan.miller

sys time greater than user time, but none of the common causes appear to be the issue

I have an issue where I'm seeing many occurrences of sys time greater than user time. This started when we moved an application over to GCP. The application didn't change and the memory settings are all the same, but every instance appears to be suffering from this problem.

 

I have checked:

the memory on the VM, seems to be enough

Memory for the physical/hypervisor is good

Disk i/o seems good

CPU seems alright

 

 I’m not sure where else to check.

 

Would appreciate any advice anyone can offer.

  • sys

  • systime

  • user time

  • gcp


Ram Lakshmanan

Hello Nathan!

 

 Good question. 

 

A garbage collection event spends a certain amount of time in the JVM layer and a certain amount of time in the kernel layer. The time it spends in the JVM layer is reported as 'user' time, and the time it spends in the kernel layer is reported as 'sys' time.
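To find the affected events yourself, you can scan the GC log for the timing summary on each event. The sketch below assumes the classic HotSpot format written with `-XX:+PrintGCDetails`, where each event ends in `[Times: user=... sys=..., real=... secs]`. The function name and the sample log lines are made up for illustration; other log formats would need a different pattern.

```python
import re

# Timing summary appended to each GC event in the classic HotSpot log format,
# e.g. "[Times: user=0.05 sys=0.12, real=0.03 secs]".
TIMES_RE = re.compile(
    r"\[Times: user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs\]"
)


def find_sys_heavy_events(lines):
    """Return (line_number, user, sys, real) for events where sys > user."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TIMES_RE.search(line)
        if match is None:
            continue  # not a GC event line, or a different log format
        user, sys_time, real = (float(group) for group in match.groups())
        if sys_time > user:
            hits.append((number, user, sys_time, real))
    return hits


# Illustrative sample lines (not taken from a real log).
sample_log = [
    "1.234: [GC (Allocation Failure) 4096K->3200K(8192K), 0.0100000 secs] "
    "[Times: user=0.04 sys=0.00, real=0.01 secs]",
    "9.876: [GC (Allocation Failure) 6144K->3300K(8192K), 0.3300000 secs] "
    "[Times: user=0.02 sys=0.31, real=0.33 secs]",
    "Some unrelated log line",
]

sys_heavy = find_sys_heavy_events(sample_log)
for number, user, sys_time, real in sys_heavy:
    print(f"line {number}: user={user} sys={sys_time} real={real}")
```

The line numbers it reports point to the events whose timestamps are worth lining up against system-level metrics.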

 

If 'sys' time is greater than 'user' time, it indicates an environmental or kernel-level issue that is driving the 'sys' time up. This is evident in your case, since the problem surfaces only in the GCP environment and not in your on-premise data center, so something in your GCP deployment is the likely cause. Glad to see that you are looking at the right places:

  • the memory on the VM
  • Memory for the physical/hypervisor
  • Disk i/o
  • CPU

Can you also check the kernel parameters?

 

GCeasy reports the exact timestamps at which 'sys' time was greater than 'user' time. You might want to check how the environment was behaving at those exact times.

 

a. Do you have any APM tools with which you can go back in time and look at the system behaviour? Note that not all APM tools provide all the environmental data that we are looking for.

 

b. You can consider configuring the yCrash open source script on one of the GCP instances to run every 5 minutes (without capturing a heap dump). It will capture 360-degree data without adding any overhead. You can then analyze the captured output and see which environmental issue is triggering this problem.

   

 


nathan.miller

I will check the kernel parameters. Is there anything specific I should look out for or focus on?

We have an APM tool but it's only at the VM level and doesn't provide all the environment data of the physical layer.

Thank you


Ram Lakshmanan

Hello Nathan!

 

I am not sure of any specific kernel parameter that needs to be compared. If you trigger the yCrash script, it will capture all the kernel parameters, so you can compare all of them.
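If you want to compare the parameters by hand, one option is to save the output of `sysctl -a` on an old host and on a GCP instance and diff the two. The sketch below assumes the usual `key = value` line format of that output. The function names and the sample values are hypothetical, and it only shows which parameters differ, not which of them matters.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, as printed by `sysctl -a`, into a dict."""
    params = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # skip lines that contain no '=' at all
            params[key.strip()] = value.strip()
    return params


def diff_sysctl(old_params, new_params):
    """Return {key: (old_value, new_value)} for every parameter that differs.

    A parameter that exists on only one side is reported with None
    on the other side.
    """
    differences = {}
    for key in sorted(set(old_params) | set(new_params)):
        old_value = old_params.get(key)
        new_value = new_params.get(key)
        if old_value != new_value:
            differences[key] = (old_value, new_value)
    return differences


# Made-up sample dumps, for illustration only.
on_premise_dump = """\
vm.swappiness = 60
vm.dirty_ratio = 20
kernel.numa_balancing = 0
"""

gcp_dump = """\
vm.swappiness = 10
vm.dirty_ratio = 20
vm.overcommit_memory = 1
"""

differences = diff_sysctl(parse_sysctl(on_premise_dump), parse_sysctl(gcp_dump))
for key, (old_value, new_value) in differences.items():
    print(f"{key}: on-premise={old_value} gcp={new_value}")
```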

 

Yes, APM tools don't provide environmental data for the physical layer. The yCrash script might help you.
