ISHITA VIRMANI

Will the problem be solved if I increase the thread pool max value, which is currently 400?

Our Zeppelin process often gets stuck with the kind of thread dump attached below. Line #55 of the org.apache.zeppelin.interpreter.remote.PooledRemoteClient file, in the getClient() method, is blocking 396 threads. Is this the reason our Zeppelin server gets stuck, or am I looking in the wrong direction?


Report URL - https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjQvMDcvMTkvMjA1NTVfMjAyNC0wNy0xNi0xMjozMDowMS5vdXQtLTctNTItMjE=


    Kousika M

    Hello Ishita Virmani,

    Greetings!

    >>Will the problem be solved if I increase the thread pool max value which is currently 400

    Increasing the thread pool value cannot solve the problem you are facing now. 


    On reviewing your report, I could see that almost 40% of the threads (396 threads) are in the BLOCKED state. When threads are blocked for a prolonged period, your application becomes unresponsive.

    The thread 'qtp1906565212-13756' is stuck in the socketConnect() method of java.net.PlainSocketImpl. Before getting stuck, this thread acquired 2 locks (on a java.net.SocksSocketImpl and on the org.apache.zeppelin.interpreter.remote.PooledRemoteClient instance), and as a result 396 threads are BLOCKED.

    Please find below the stack trace of the blocking thread 'qtp1906565212-13756', from which you can identify the lines of code that are causing the performance problem in your application.

     

    qtp1906565212-13756 Stack Trace is:
    java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    - locked <0x000000071978b570> (a java.net.SocksSocketImpl)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.lambda$new$0(RemoteInterpreterProcess.java:58)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess$$Lambda$339/1765457639.getWithIO(Unknown Source)
    at org.apache.zeppelin.interpreter.remote.RemoteClientFactory.create(RemoteClientFactory.java:52)
    at org.apache.zeppelin.interpreter.remote.RemoteClientFactory.create(RemoteClientFactory.java:31)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.zeppelin.interpreter.remote.PooledRemoteClient.getClient(PooledRemoteClient.java:55)
    - locked <0x00000004b391b5a8> (a org.apache.zeppelin.interpreter.remote.PooledRemoteClient)
    at org.apache.zeppelin.interpreter.remote.PooledRemoteClient.callRemoteFunction(PooledRemoteClient.java:99)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:100)
    at org.apache.zeppelin.interpreter.InterpreterSettingManager.getAllResourcesExcept(InterpreterSettingManager.java:702)
    at org.apache.zeppelin.interpreter.InterpreterSettingManager.getAllResources(InterpreterSettingManager.java:684)
    at org.apache.zeppelin.helium.Helium.suggestApp(Helium.java:397)
    at org.apache.zeppelin.rest.HeliumRestApi.suggest(HeliumRestApi.java:141)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$229/1434527698.invoke(Unknown Source)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1651)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
    at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
    at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
    at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
    at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
    at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:450)
    at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
    at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
    at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
    at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
    at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
    at org.apache.zeppelin.server.CorsFilter.doFilter(CorsFilter.java:64)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at io.micrometer.core.instrument.binder.jetty.TimedHandler.handle(TimedHandler.java:120)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:501)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
    at org.eclipse.jetty.server.HttpChannel$$Lambda$287/539823937.dispatch(Unknown Source)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
    at java.lang.Thread.run(Thread.java:748)


    Thanks.


    ISHITA VIRMANI

    Hello Kousika M,

     

    Thanks for your response.

     

    You are right: until we identify the faulty socket connection, the server will keep getting stuck no matter how many threads we add to the pool. I was looking in that direction as well, but since the stack trace mentions so many components, I couldn't find the piece of code that initiates this socket connection.

     

    Support of any kind would be highly appreciated. We are using open-source Apache Zeppelin.

     


    Ram Lakshmanan

    Hello Ishita!

    You are looking in the right direction: line #55 of the org.apache.zeppelin.interpreter.remote.PooledRemoteClient file, in the getClient() method, is blocking 396 threads. This is a synchronized method. It looks like the thread that entered this method is making a call to an external system, and that call is taking quite some time to return.
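
To illustrate the pattern, here is a minimal sketch, not Zeppelin code. `SlowClientPool` and its latches are hypothetical names made up for this example: the first thread to enter the synchronized `getClient()` holds the monitor while it waits (standing in for the slow external call), and every other caller ends up in the BLOCKED state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SynchronizedBlockingDemo {

    // Hypothetical stand-in for a pool whose getClient() is synchronized.
    static class SlowClientPool {
        private final CountDownLatch entered = new CountDownLatch(1);
        private final CountDownLatch release = new CountDownLatch(1);

        synchronized String getClient() throws InterruptedException {
            entered.countDown();  // signal that one thread now owns the monitor
            release.await();      // simulates a slow external call; the monitor stays held
            return "client";
        }
    }

    // Starts the given number of callers and returns how many were seen BLOCKED
    // while the first caller was still inside getClient().
    static int countBlockedWaiters(int threads) {
        SlowClientPool pool = new SlowClientPool();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    pool.getClient();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers.add(t);
            t.start();
        }
        try {
            pool.entered.await();
            // Poll (for at most 5 seconds) until all other callers are BLOCKED.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            int blocked = 0;
            while (System.nanoTime() < deadline) {
                blocked = 0;
                for (Thread t : workers) {
                    if (t.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == threads - 1) {
                    break;
                }
                Thread.sleep(10);
            }
            pool.release.countDown();  // let the "external call" return
            for (Thread t : workers) {
                t.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing workers", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED callers: " + countBlockedWaiters(5));
    }
}
```

Adding more threads to such a pool only adds more BLOCKED callers, because they all queue behind the same monitor until the call inside it returns.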

     

    Since it's a synchronized method, only one thread at a time is allowed to enter it, so every other thread that tries to enter gets stuck. You need to investigate why the call to the external system is taking such a long time, and check whether timeouts are configured for this external system call.
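
As a rough sketch of what a connect timeout looks like at the plain JDK level (this is not Zeppelin's or Thrift's actual code), `java.net.Socket.connect(SocketAddress, int)` accepts a timeout in milliseconds, and a value of 0 is treated as an infinite timeout. The class name, method names, and the 500 ms value below are assumptions chosen for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutSketch {

    // Tries to connect and gives up after timeoutMillis instead of waiting indefinitely.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused as well as SocketTimeoutException,
            // which is a subclass of IOException.
            return false;
        }
    }

    // Opens a throwaway local listener and connects to it with a 500 ms timeout.
    static boolean connectsToLocalListener() {
        try (ServerSocket server = new ServerSocket(0)) {
            return canConnect("127.0.0.1", server.getLocalPort(), 500);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + connectsToLocalListener());
    }
}
```

With a bounded timeout, a caller that cannot reach the remote side fails after a known interval and releases whatever lock it holds, rather than keeping the other callers waiting for as long as the operating system takes to give up.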
