
I am currently investigating long-running connections in a Java/Tomcat-based web application. After ruling out any internal or application-level causes, I am now down to the network layer. The reason I am investigating this is that we see seemingly random spikes in our response-time monitoring. While investigating, I found that this behavior is not random at all, but is triggered by certain client HTTP requests. The special thing about those connections is that they all originate from the same IP address and seem to go through a Bluecoat proxy, because I see an x-bluecoat-via HTTP header.

As I said, the application itself performs normally; only the end of the connection (from Tomcat's point of view) seems to be delayed somehow. The server does not talk directly to the client but sits behind an F5 load balancer, which should actually cache the responses (which might not happen because of an Accept-Encoding: identity header and the actual response being too large for the buffer).

I got a TCP dump; due to an unfortunate mistake, I can currently only see the packets from the LB to the appserver, not the actual packets sent by the appserver.

The dump contains multiple requests on the same TCP/IP connection, which is due to connection pooling done by the F5. The last HTTP request on this connection is the one that was flagged as long-running (925836.442 ms) in our logging. What I see are the request packets, then a series of ACKs, which leads me to believe that the appserver is writing its answer, and finally two FIN, ACK packets followed by an RST, ACK, which is the last packet sent by the F5.

From a timing point of view, this all happens in the course of 250 ms; the last packet is sent 15 minutes and 13 seconds before I see the response log entry on the appserver, which is written once Tomcat believes the response to be finished.

I'm kind of out of ideas at the moment and have a couple of open questions:

Is there any reason Linux would keep a connection open that has received an RST and not tell the application layer?

Is there any other timeout that could lead to this behavior? If it were the TCP retransmission timeout, I would expect to see more RSTs from the LB.

Any other idea why a closed connection on the wire would lead to a still open connection in the application layer?

How can something that happens in the application layer (a special HTTP request) lead to reproducible behavior in the transport layer?

Maybe I'm completely on the wrong track and this is a connection keep-alive issue inside Tomcat?

  • I guess you should change your approach; as you said, the problem seems to affect certain clients only. You should look at what is going on between the clients (proxy server) and the LB. Since the LB may cache some things, monitoring traffic between the LB and the web server doesn't make sense to me. If you want the Linux server to close dead-looking connections, you can play with the tcp_keepalive parameters in /proc/sys/net/ipv4/
    – Mehdi
    Commented Sep 13, 2013 at 14:56
  • You need to get simultaneous captures at various points of the communication. On the F5 you can get front-end and back-end captures, and you can actually do the same on a Bluecoat. I've got the feeling you're not sending keep-alives on that TCP connection, and a device between the client and the server (e.g. the F5, the BlueCoat, a firewall, etc.) is silently removing the connection from its table, hence dropping any further packets and leading to an eventual timeout (see the keep-alive sketch after these comments). Commented Feb 15, 2015 at 11:48
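To make the keep-alive suggestion from the comments concrete, here is a minimal Java sketch (the class name, host, and port are made up for the example) that enables SO_KEEPALIVE on a client socket. It only demonstrates the mechanism; whether the sockets in your Tomcat/F5 chain actually have keep-alive enabled is a separate question.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class KeepAliveProbe {
        public static void main(String[] args) throws Exception {
            // Hypothetical backend host/port, purely for illustration.
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;

            try (Socket socket = new Socket()) {
                // SO_KEEPALIVE makes the kernel send keep-alive probes on an
                // idle connection; on Linux the probe timing comes from
                // /proc/sys/net/ipv4/tcp_keepalive_time, _intvl and _probes.
                socket.setKeepAlive(true);
                socket.connect(new InetSocketAddress(host, port), 5_000);

                System.out.println("keep-alive enabled: " + socket.getKeepAlive());
                // A stateful middlebox (F5, Bluecoat, firewall) that silently
                // drops idle connections from its table would now see periodic
                // probes instead of total silence, so a dead peer is noticed
                // much sooner by the endpoint.
            }
        }
    }

Note that the Linux defaults are very conservative (tcp_keepalive_time is 7200 seconds), so the kernel parameters mentioned in the first comment usually need lowering before keep-alives help against a middlebox that drops idle connections.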

1 Answer


I can't really help on the networking layer, but on the Tomcat side there are several places where you could configure that: http://tomcat.apache.org/connectors-doc/reference/workers.html. You could try to override the timeout and configure it to close the connection after a certain amount of time.
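The linked page describes the mod_jk workers.properties timeouts. As a rough sketch of the same idea for a plain HTTP connector, the following embedded-Tomcat example (class name and timeout values are made up, and it assumes a Tomcat 9-style embedded API) sets the relevant connector attributes programmatically:

    import org.apache.catalina.connector.Connector;
    import org.apache.catalina.startup.Tomcat;

    public class TimeoutDemo {
        public static void main(String[] args) throws Exception {
            Tomcat tomcat = new Tomcat();

            Connector connector = new Connector("HTTP/1.1");
            connector.setPort(8080);
            // Drop a connection if nothing arrives from the client/LB for 20 s ...
            connector.setProperty("connectionTimeout", "20000");
            // ... and close idle keep-alive connections after 15 s.
            connector.setProperty("keepAliveTimeout", "15000");
            // Limit how many requests one pooled connection may carry.
            connector.setProperty("maxKeepAliveRequests", "100");

            tomcat.getService().addConnector(connector);
            tomcat.start();
            tomcat.getServer().await();
        }
    }

The same attributes (connectionTimeout, keepAliveTimeout, maxKeepAliveRequests) can be set on the Connector element in server.xml if you run a standalone Tomcat.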

The link also covers load-balancer configuration options, which might be helpful in your scenario.

