HTTP Server responsiveness

I'm serving an HTML page consisting of static resources compiled into my application (no FileX) and some callbacks which are called by Javascript on the page to fill in some dynamic values.

I noticed that the server can become unresponsive for some time, usually 1-2 minutes or so when under high load, e.g. by repeatedly pressing F5. The client will keep retransmitting packets for some time which are not acknowledged by the server (monitored with Wireshark).

The HTTP server then reports some connection failures in its nx_http_server_connection_failures property. After waiting for said time the server is responsive again.

The strange thing is that I only observed this behaviour on three notebooks running Windows 10/Firefox, on a Linux PC/Firefox I can not reproduce this behaviour and I can keep hitting F5 without disturbing the communication. Two notebooks are from our company, one from a university, so it seems unlikely that this is caused by specific settings on these devices. On the Linux PC the connections get cancelled by sending RST to the server for the outdated connections, which I cannot observe on the Windows devices when under high load.

Anything I can do about this? Maybe decrease the chance of this happening, or decrease the recovery time? I already tried playing with some of the webserver settings (timeouts, max connections in queue, ...) without much success.

Below are the webserver statistics after some testing on both devices:

{
    "server": {
        "connections pending": 0,
        "allocation errors": 0,
        "connection failures": 333,
        "connection successes": 2671,
        "get requests": 2661,
        "Invalid HTTP headers": 6,
        "total bytes received": 0,
        "total bytes sent": 0,
        "unknown requests": 0
    },
    "ip": {
        "invalid packets": 0,
        "invalid transmit packets": 0,
        "packets forwarded": 0,
        "packets reassembled": 0,
        "raw packet suspended count": 0,
        "raw received packet count": 0,
        "reassembly failures": 0,
        "receive checksum errors": 0,
        "receive packets dropped": 131,
        "send packets dropped": 60,
        "successful fragment requests": 0,
        "TCP active connections": 6,
        "TCP bytes received": 1571584,
        "TCP bytes sent": 8172739,
        "TCP checksum errors": 0,
        "TCP connections": 3043,
        "TCP connections dropped": 11,
        "TCP created sockets count": 1,
        "TCP disconnections": 2983,
        "TCP invalid packets": 0,
        "TCP packets received": 3246,
        "TCP packets sent": 6834,
        "TCP passive connections": 3048,
        "TCP receive packets dropped": 58,
        "TCP received packets count": 0,
        "TCP resets received": 497,
        "TCP resets sent": 9,
        "TCP retransmit packets": 30
    }
}

HTTP Server configuration:

IP configuration:

Shared packet pool between IP, HTTP Server and NetX BSD Support configuration:
  • I tested some more; it's really only Firefox that causes this issue under Windows. IE, Edge and Chrome are fine under Windows, and Firefox works under Linux. I'm still interested in possible fixes/improvements though.
  • In reply to ChrisS:

    Hi,

    I wonder if it is a packet starvation issue. I see you have a lot of packets of suitable size, but as they are shared between all services, it could be that one of them is starving the others and causing the bottleneck. Try adding separate packet pools and see if it makes a difference. Then start to combine them again and see when things change, as this may give some clue to what is happening. A colleague mentioned that Firefox may be opening lots of parallel connections, causing a problem.

    Regards,

    Ian.
  • In reply to Ian:

    Hi Ian,
    I'm going to give that a try. I just tried changing Firefox's about:config to limit the max number of connections, but this didn't have any effect on the problem. I think it's more related to how and if Firefox cancels pending connections that are no longer needed. This behaviour looks different in Wireshark when comparing Firefox under Windows and Linux.

  • In reply to ChrisS:

    I split my packet pool into three separate pools:
    IP: 40 x 2048 bytes
    HTTP Server: 30 x 2048 bytes
    BSD Sockets: 16 x 2048 bytes

    Unfortunately this did not help.
  • In reply to ChrisS:

    Hi,

    This is probably no help but I have a test system using SSP v1.5.2 serving Javascript gauges with periodic updates. All the page content is fetched from USB mass storage using FileX. This runs fine under high load with Firefox on Windows and Linux (Ubuntu). Firefox Linux version is 61.0.1 64-bit. So, the difference with your system apart from the lack of FileX is the use of BSD sockets. So, this may be an area to look into to track down the issue.

    Regards,

    Ian.
  • In reply to Ian:

    Disabling the BSD sockets thread doesn't help either, unfortunately. I think this is mostly tied to packet pool size: with a larger pool I can spam it a bit more with requests before timing out, but holding the F5 key will definitely cause the issue on Firefox/Windows.
  • In reply to ChrisS:

    First, there is no need to set the packet payload above 1568 bytes. No harm done if you do, but the extra size is never used on a device with an MTU of 1536 bytes or thereabouts.

    Second, check which packet pool may be depleted. Use the nx_packet_pool_info_get() service to get the number of free packets in the pool and the number of empty requests (failed packet allocations). A more direct approach is to query the pool directly: g_packet_pool0[or whatever the name of your packet pool is].nx_packet_pool_available.
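
    As a sketch of that second suggestion (assuming a pool instance named g_packet_pool0, as the Synergy configurator generates — adjust to your pool's name), a check could look like:

```c
/* Sketch: query g_packet_pool0 for signs of depletion.
   The pool name is an assumption taken from this thread. */
ULONG total_packets;
ULONG free_packets;
ULONG empty_pool_requests;     /* failed allocations since pool creation */
ULONG empty_pool_suspensions;  /* allocations that had to wait for a packet */
ULONG invalid_packet_releases;

UINT status = nx_packet_pool_info_get(&g_packet_pool0,
                                      &total_packets,
                                      &free_packets,
                                      &empty_pool_requests,
                                      &empty_pool_suspensions,
                                      &invalid_packet_releases);
if ((status == NX_SUCCESS) && (empty_pool_requests > 0))
{
    /* The pool ran dry at least once since creation -
       a counter check like this cannot miss brief depletions
       the way periodic sampling can. */
}
```

    The empty-requests counter accumulates since pool creation, so unlike polling nx_packet_pool_available it catches even very short depletion windows.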

    Third, you can limit packet pool depletion by limiting the size of the transmit queue of the HTTP server socket. There is a document on 'packet pool tips' which I can forward to you (not sure how to do this in Renesas Rulz, but I'll copy the last paragraph of it for you here):

    "A useful utility for helping to manage packet pools is the nx_tcp_socket_transmit_configure() service. It will allow an application to:

    1. Set the timeout shift to zero so that there is a constant retransmission interval rather than an exponentially increasing one (the default). This shortens the time between retransmissions and, when necessary, the time before the socket resets if no response is received from the TCP peer.
    2. Set the size of the transmit queue. The transmit queue can be adjusted during testing to whatever size is empirically best.
    3. Set the maximum number of retries (retransmissions). The default is 10 but this could be adjusted to reduce the time before bailing on a connection.
    4. Set the socket timeout value. (Not recommended unless absolutely necessary, because the default value is optimal for performance and for preventing network congestion, and changing it introduces unnecessary overhead.) But this is an option and, with some extra testing, can be a safe one.

    This function can be called at run time too, so these TCP parameters do not have to be determined at compile time."
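
    Taken together, the four points above map onto a single call. A sketch, assuming the server socket is reachable through the NX_HTTP_SERVER control block's nx_http_server_socket member (and with example values meant to be tuned empirically, not recommendations):

```c
/* Sketch: apply the tips from the quoted document to the HTTP
   server's TCP socket. g_http_server matches the instance name
   seen in the map file earlier in this thread. */
NX_TCP_SOCKET *sock = &g_http_server.nx_http_server_socket;

UINT status = nx_tcp_socket_transmit_configure(
    sock,
    5,   /* max_queue_depth: a smaller queue strands fewer packets (tip 2) */
    200, /* timeout: ticks before the first retransmission (tip 4; default
            is usually best, value here is purely illustrative) */
    3,   /* max_retries: fewer than the default 10, to bail out sooner (tip 3) */
    0);  /* timeout_shift: 0 = constant retransmission interval (tip 1) */
```

    Since this can be called at run time, different values can be tried between test runs without rebuilding.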

    Let me know if that helps. There was a customer who ran into similar problems after moving his TCP server behind a router/VPN, which happened to delay packets longer than a local network would; under high traffic this was sufficient to cause packet depletion.
  • In reply to Janet Christiansen:

    @All-
    The full document on Packet Pool Size and Optimization Tips and Techniques is available in the Knowledge Base here:
    en-support.renesas.com/.../18188850

    Warren
  • In reply to Janet Christiansen:

    Thanks for the tips! I added some monitoring for the minimal nx_packet_pool_available value and it didn't drop below 52 of 70 (using one packet pool again) when this problem occurs, so this cannot be the reason. I also tried setting the binary left-shift multiplier to 0, setting the max number of retries to 3 instead of 10, and increasing the transmit queue and the max number of listen requests. However, with none of these changes have I managed to refresh 3 times or more in quick succession. I'll try to rework my webpage to reduce the number of requests; maybe this helps a bit. Firefox reports that the page loads 18 resources in total right now. I have set a cache-control header, but in Firefox it only applies when one presses Enter in the URL field, not when pressing F5.

  • In reply to ChrisS:

    Hi Chris,

    What board/MCU is your project running on? What is the phy used? Are ethernet pins set to maximum drive capacity available? What is the SSP version?

    Regards
  • In reply to ChrisS:

    Hi Chris-
    Some additional pointers from Express Logic:
    ++++
    Do not increase transmit queue length. The idea is to decrease it to reduce the number of packets stranded on the socket queue.

    Do not increase the max listen requests yet. This will not deplete the transmit queue but might keep a higher number of received connection requests (SYN packets) on the socket receive queue.

    Recommend calling nx_packet_pool_info_get which will report if at any time a packet pool allocation failed. Monitoring it manually may miss the period of time when the packet pool is depleted.

    Is the packet pool of interest the one the HTTP server uses to transmit packets? The dynamics of the web page should be that the transmit packet pool gets the heaviest usage. However, if the receive packet pool is depleted, the web server performance would decrease dramatically.

    Can you send a packet trace where the web server bogs down?
    ++++
    Thanx,
    Warren
  • In reply to WarrenM:

    Hi,

    Can you let us know the Synergy device being used? Also, can you attach the map file from the linker to a forum post please? There can be an issue when packet pools cross between internal SRAM boundaries.

    Regards,

    Ian.
  • In reply to Renesas Karol:

    Hello Karol,

    This is a custom board with an S5D5 and a Broadcom BCM5241 PHY. Drive capacity is max, SSP version is 1.5.0 rc1. I'm going to try to break this down to a minimum example, but I'm very busy right now unfortunately, so it may take a while.

    Thanks,
    Christian
  • In reply to Ian:

    Hi Ian,
    I'm not sure if I can append the whole map file, are these the relevant lines?

    _tx_thread_preempt_disable
    0x4 ./synergy/ssp/src/framework/el/tx/tx_src/tx_thread_initialize.o
    g_packet_pool0_pool_memory
    0x23e38 ./src/synergy_gen/common_data.o
    ctr_drbg 0x4 ./src/cloud/connector/src/pctp/mbedTLS/pctp_mbedtls.o


    g_http_server 0x288 ./src/synergy_gen/http_thread.o
    g_packet_pool0 0x3c ./src/synergy_gen/common_data.o
    _tx_block_pool_created_count
    0x4 ./synergy/ssp/src/framework/el/tx/tx_src/tx_initialize_high_level.o
    _tx_timer_created_count
    0x4 ./synergy/ssp/src/framework/el/tx/tx_src/tx_timer_initialize.o

    .bss.nx_bsd_addrinfo_pool_memory
    0x1ffe05d8 0x2d0 ./synergy/ssp/src/framework/el/nx_bsd/nx_bsd.o
    .bss.nx_bsd_socket_pool_memory
    0x1ffe08a8 0x2780 ./synergy/ssp/src/framework/el/nx_bsd/nx_bsd.o

    COMMON 0x1ffeb108 0x26b68 ./src/synergy_gen/common_data.o
    0x1ffeb108 g_fx_media0
    0x1ffeb10c g_packet_pool0_pool_memory
    0x2000ef44 nx_record0

    0x2000f29c g_ip0
    0x2000fa34 g_packet_pool0
    0x2000fa70 nx_bsd_stack_memory

    Best regards,
    Christian
  • In reply to WarrenM:

    Hi Warren,

    I will look into this, I'm a bit low on time right now unfortunately, but I will definitely get back to this. I'll also create some traces of Firefox vs. Chrome behaviour.

    Right now I only use one packet pool for everything. I had tried separating it into multiple pools (see above) without an improvement.

    I'm currently monitoring the minimum number of available packets in a loop of one thread (accessing the variable directly instead of calling nx_packet_pool_info_get, though). It always remained at something like 50 available packets of 70 total, but of course it might miss moments when lots of packets are allocated and deallocated again quickly.