HTTP Server responsiveness

I'm serving an HTML page consisting of static resources compiled into my application (no FileX) and some callbacks that JavaScript on the page calls to fill in dynamic values.

I noticed that the server can become unresponsive under high load, e.g. when I repeatedly press F5, usually for 1-2 minutes or so. During that time the client keeps retransmitting packets which the server does not acknowledge (monitored with Wireshark).

The HTTP server then reports some connection failures in its nx_http_server_connection_failures property. After waiting for said time the server is responsive again.

The strange thing is that I have only observed this behaviour on three notebooks running Windows 10/Firefox. On a Linux PC/Firefox I cannot reproduce it, and I can keep hitting F5 without disturbing the communication. Two notebooks are from our company and one is from a university, so it seems unlikely that this is caused by specific settings on these devices. On the Linux PC the outdated connections get cancelled by sending RST to the server, which I cannot observe on the Windows devices when under high load.

Is there anything I can do about this? Maybe decrease the chance of this happening, or decrease the recovery time? I already tried to play with some of the webserver settings, e.g. timeouts, max connections in queue, ... without too much success.

Below the webserver statistics after some testing on both devices:

{
    "server": {
        "connections pending": 0,
        "allocation errors": 0,
        "connection failures": 333,
        "connection successes": 2671,
        "get requests": 2661,
        "Invalid HTTP headers": 6,
        "total bytes received": 0,
        "total bytes sent": 0,
        "unknown requests": 0
    },
    "ip": {
        "invalid packets": 0,
        "invalid transmit packets": 0,
        "packets forwarded": 0,
        "packets reassembled": 0,
        "raw packet suspended count": 0,
        "raw received packet count": 0,
        "reassembly failures": 0,
        "receive checksum errors": 0,
        "receive packets dropped": 131,
        "send packets dropped": 60,
        "successful fragment requests": 0,
        "TCP active connections": 6,
        "TCP bytes received": 1571584,
        "TCP bytes sent": 8172739,
        "TCP checksum errors": 0,
        "TCP connections": 3043,
        "TCP connections dropped": 11,
        "TCP created sockets count": 1,
        "TCP disconnections": 2983,
        "TCP invalid packets": 0,
        "TCP packets received": 3246,
        "TCP packets sent": 6834,
        "TCP passive connections": 3048,
        "TCP receive packets dropped": 58,
        "TCP received packets count": 0,
        "TCP resets received": 497,
        "TCP resets sent": 9,
        "TCP retransmit packets": 30
    }
}
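To put the counters above in proportion, a few ratios can be computed from them. The following is only a rough sketch in Python: the numbers are copied from the statistics above, while the function name `summarize` and the choice of ratios are illustrative assumptions, not part of NetX.

```python
def summarize(server, ip):
    """Return a few ratios derived from the posted counters."""
    attempts = server["connection failures"] + server["connection successes"]
    return {
        # Share of HTTP connection attempts that the server counted as failed
        "failure_rate": server["connection failures"] / attempts,
        # TCP resets received relative to the number of TCP connections
        "resets_per_connection": ip["TCP resets received"] / ip["TCP connections"],
    }


# Values copied from the statistics above
server_stats = {"connection failures": 333, "connection successes": 2671}
ip_stats = {"TCP resets received": 497, "TCP connections": 3043}

ratios = summarize(server_stats, ip_stats)
for name, value in ratios.items():
    print(f"{name}: {value:.1%}")
```

Roughly one in nine connection attempts failed during this test run, which includes both the Windows and the Linux sessions.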

HTTP Server configuration:

IP configuration:

Shared packet pool between IP, HTTP Server and NetX BSD Support configuration:
  • In reply to ChrisS:

    By the way, I'm also seeing this issue in my bootloader, which only serves a very simple one request page for uploading a new firmware, but I will need to refresh a few more times in a row, so it seems to be somewhat related to webpage complexity or number of requests made.
  • In reply to ChrisS:

    Hello,

    attached are four wireshark dumps, two with Firefox, two with Chrome. 

    bootloader_success_chrome.pcapng and bootloader_success_firefox.pcapng show the loading of the bootloader page once which works as expected.

    In bootloader_refresh_problems_firefox.pcapng I repeatedly refresh the page by holding F5 until I get packet retransmissions. After waiting some time the webserver works again as expected. In bootloader_success_chrome.pcapng I tried to do the same, but was not able to provoke this behaviour. However, I noticed that with Chrome I need to press the button again for each refresh, while in Firefox I can hold it down to continuously refresh, so maybe the frequency was too low here. I also see this behaviour with my more complex web page in my firmware, so I don't really think that this is the reason.

    traces.zip

  • In reply to ChrisS:

    This sounds very much like packet pool starvation. If so, he has checked everything but the 'right' places. He needs to check the packet pool statistics (nx_packet_pool_info) and adjust his TCP settings (reduce the transmit queue length). Ideally he should have a separate packet pool for his HTTP server (only used to transmit packets). That way, if that packet pool is depleted, NetX can still in theory receive packets. If it cannot even receive packets, such as ACKs from the browser, which would free up packets sitting on the transmit queue waiting to be ACKed or retransmitted, then the system gets stuck.

    Warren Miller and I just wrote an App Note for packet pool (and TCP) management. I will look for the link. If packet pool starvation is the problem, there are several useful suggestions for diagnosing it and mitigating it.

    I think the reason Linux does not jam up the server is that it sends RSTs, which kill the connection and free the packets trapped on the transmit queue.

    Thanks,
    Janet
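The starvation mechanism described in this reply can be illustrated with a toy model. This is a deliberately simplified sketch in Python, not NetX code: the `PacketPool` class, the `completed_connections` function and all pool sizes are hypothetical, and the model assumes that receiving an ACK needs exactly one free packet and releases everything the connection holds on its transmit queue.

```python
class PacketPool:
    """Fixed-size pool that only counts free packets."""

    def __init__(self, total):
        self.total = total
        self.free = total

    def allocate(self):
        """Take one packet; return False when the pool is empty."""
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self, count=1):
        """Return packets to the pool."""
        self.free = min(self.total, self.free + count)


def completed_connections(tx_pool, rx_pool, connections, tx_queue_length):
    """Queue unacknowledged data on every connection, then process one ACK each.

    Returns how many connections received their ACK and released their packets.
    """
    held = []
    for _ in range(connections):
        count = 0
        for _ in range(tx_queue_length):
            if tx_pool.allocate():
                count += 1
        held.append(count)

    completed = 0
    for count in held:
        # Receiving the ACK needs one free packet from the receive pool
        if not rx_pool.allocate():
            continue  # ACK dropped, transmit packets stay queued
        rx_pool.release()        # ACK processed, receive packet returned
        tx_pool.release(count)   # acknowledged data leaves the transmit queue
        completed += 1
    return completed


# One shared pool, transmit queues use every packet: no ACK can be received
shared = PacketPool(20)
stuck = completed_connections(shared, shared, connections=5, tx_queue_length=4)

# Same shared pool, shorter transmit queue: some packets stay free for ACKs
shorter = PacketPool(20)
short_queue = completed_connections(shorter, shorter, connections=5, tx_queue_length=3)

# Separate transmit and receive pools: ACKs can always be received
separate = completed_connections(PacketPool(20), PacketPool(4), connections=5, tx_queue_length=4)

print(stuck, short_queue, separate)
```

In this model the shared pool with full transmit queues completes no connection, while either shortening the transmit queue or using a separate receive pool lets all five connections complete. A real stack also retransmits and times out, so the model only shows why the server can stall, not how long it takes to recover.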

  • In reply to Janet Christiansen:

    Chris, never mind, I just downloaded the trace file.
  • In reply to Janet Christiansen:

    Here is the Knowledge Base article referenced:
    en-support.renesas.com/.../18188850

    Warren
  • In reply to WarrenM:

    Thanks Warren. Chris, please send this link to your customer. I think it will be very helpful to them. If not, I would want to look at his project to see what might be causing the problem.

    Janet
  • In reply to Janet Christiansen:

    I now use a separate pool for the HTTP server again.
    My observations so far:
    When I limit "Maximum number of connections in queue" to a value below both "Maximum number of queued transmit packets (units)" and the HTTP server packet pool size, the problem where the server becomes completely unresponsive for some time after holding down F5 can be fixed, but the performance of the server degrades. In particular, I am currently using "Maximum number of connections in queue" = 2 with both "Maximum number of queued transmit packets (units)" and the HTTP server packet pool size = 20. When I do the opposite, the communication breaks down for obvious reasons. I've tried various combinations of these values between 5 and 20 so far, and only the attempt with "Maximum number of connections in queue" = 2 completely prevented the problem from happening, although with the other combinations I was able to reduce the time during which the server wouldn't respond. I guess I'll have to try some more to find an optimum here.

    I initially thought that the reason why this only happens on Firefox might be that it doesn't appear to respect the caching header I transmit ("Cache-Control: public, max-age=31536000") when pressing F5, while Chrome does. Firefox only uses its cache when clicking in the URL bar and pressing return. Using the cache results in fewer packets being transmitted, which prevents this issue.

    However, even when I disable the cache in Chrome I cannot recreate this behaviour, so it must relate to how the browsers handle their connections when refreshing.