NetX HTTP, ping, general TCP issues when connecting to a device via VPN

Hello Forum,

We have been working on a product using an S7G2 part on SSP 1.4.0, built with the IAR tools.  We have discovered a weirdness we can't explain when we connect to the device through a VPN.  We have been working on the code base for months using an internal network with no issues; multiple machines hitting the device, pounding on several different ports, without any trouble.  Today we connected to it via a VPN, specifically a Ubiquiti EdgeRouter Lite running 1.10.7.  The VPN client joins the network with a 192.168.3.201 IP address.  Once in, the VPN client computer can ping the device (192.168.3.240) and also get data from port 502.  But if we point a web browser at the device, the web pages do not load or only partially load, the pings time out, and then the destination host becomes unreachable.  All of the other local machines lose access to the device as well.  We have our packet pool packet size set to 1568, with 32 packets in the pool.  We are using 8 receive and 32 transmit buffer descriptors.  Our gateway is configured for .3.1.

The zipped logs are just over the limit for attaching to the post, but you can get to the zip file through this link. (If it does not work, please let me know.)  What is weird is that when the connection is to a local computer, everything looks nice and clean: you see the requests, then the packets and their ACKs.  When the connection is through the VPN, you see lots of packets going out and then lots of ACKs coming back in groups, not nice and orderly like with the local machine.  Eventually you see the device drop off the network.  We are wondering if we are running out of packets.  If that is the case, would the files eventually load and the ACKs come back?  Unless all the packets eventually get lost, in which case no ACKs would come back.  Would that cause the web server, and TCP for that matter, to completely go offline and not recover?  If this is running out of packets in the pool, is there a setting to shorten the timeout for the outstanding packets, or should the packet pool be set to a larger number?  Is there a recommended setting based on the size of the files we are sending?  We have about 0.5 MB of data that is sent to an uncached browser, and the largest file is around 120 KB.

Has anyone had any experiences like this, and do you have any recommendations?

Thank you 

Matt

  • Hi Matt-
    Express Logic looked over the log file and has some suggestions.

    ++++
    There were 19 unACKed packets in the TCP transmit queue, 8 packets used by the receive buffer descriptors, and 5 SYN packets (connection requests) in the TCP server socket listen queue. That is exactly 32 packets, the size of the packet pool. So most likely there was packet pool depletion.

    There are two ways to avoid that.

    1) Use a separate packet pool for the HTTP server. Typically the IP instance uses g_packet_pool0. If another packet pool is added, e.g. g_packet_pool1, that packet pool would be used only by the HTTP server:

    status = nx_http_server_create(&my_server, "My HTTP Server", &server_ip, &ram_disk,
                                   pointer, 2048, &g_packet_pool1,
                                   authentication_check, NX_NULL);

    2) Increase the number of packets in g_packet_pool0. Even so, it is more memory-efficient, especially if the plan is to transmit large files, to use a separate packet pool for the HTTP server and size it as needed.
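
    For illustration, creating that dedicated pool by hand might look like the sketch below. In SSP the configurator normally generates the pool; the pool name, packet count, and payload size here are assumptions (the 1568-byte payload matches the size mentioned in the original post):

    /* Sketch only: a dedicated packet pool for the HTTP server. 64 packets
       with a 1568-byte payload are assumed values; each packet also carries
       an NX_PACKET header, hence the approximate sizing below. */
    static UCHAR http_pool_memory[64 * (1568 + sizeof(NX_PACKET))];
    NX_PACKET_POOL g_packet_pool1;

    status = nx_packet_pool_create(&g_packet_pool1, "HTTP Server Pool", 1568,
                                   http_pool_memory, sizeof(http_pool_memory));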
    ++++

    Let us know if this helps.

    Warren
  • In reply to WarrenM:

    Hello Warren,

    Thanks for responding to this post. I've been curious how to size the HTTP server packet pool to work with large files. One thing I noticed from these captures is that in one of them the packets sent were quickly ACKed, so the pool size could be small; but in the one that failed, eight or ten packets went out before ACKs came back, and then the pool was eventually depleted and transmission stopped working.

    How does NetX handle the depletion? Shouldn't it stop sending new packets until some have been ACKed, which would free up space?

    If I need to send a 1 MB file, how do I size the packet pool to handle scenarios where the client is slow to respond, without risking locking up the web server?

    Thanks for your help.

    Marc
  • In reply to Bowmanm:

    Hi Warren and Marc,

    Thank you for your responses.

    So Warren, I may have a misunderstanding of how NetX works with TCP connections, so please correct me where I'm wrong. My mental model is this: we have a pool of packets. A request is made from a client web browser, and the requested files and/or data are then sent back using packets from the pool. Each packet has a payload of around 1500 bytes. The transmission process consumes the packets available in the pool. When the pool runs out of packets, the TCP stack waits for ACKs to come back and release specific packets. That is a good transmission. If a packet from the web server, or the responding ACK, is damaged in transit and the round trip does not make it back to the TCP stack, that packet is lost and the stack waits for a set period of time before resending.

    Eventually, if all of the packets in the pool are lost or are waiting for ACKs to return, the stack runs out of packets to do anything with and stops responding entirely, as we saw. This is a function of the overall size of the data to be sent. But at some point I would expect the TCP stack to declare the packets it never received an ACK for as gone and resend them. What I saw instead was that the TCP stack totally stopped responding when it ran out of packets. I'm guessing there is a setting that determines how long before those lost packets get resent. Is that a setting we have access to, and is it a setting we should ever touch? I'm thinking it may be controlled by some TCP standard the stack complies with.

    To Marc's point: when a large file is sent, let's say 1 MB, and you have only a 32-packet pool, more packets would give faster transmission. But if you had limited memory and were willing to take the hit in overall throughput, is there a way to have fewer packets in the pool and, furthermore, if packets are lost, to recover from that kind of event more quickly or gracefully? Specifically, without the TCP stack stopping all communications. I'm thinking that timeout (if it is actually a setting) may be accounting for worst-case timing from days long ago, when the internet had limited bandwidth and paths might have had many hops over very slow links. If that is the case, are there more modern standards with updated timing for resending lost packets? This all presumes my understanding is at all correct.

    Thank you Warren and Marc!
    Matt
  • In reply to rupertm:

    Hi Matt-
    Here is an initial response from EL. They are doing more analysis and will have more information when that is completed.

    +++++++
    I may have a misunderstanding of how NetX works with TCP connections. Please correct me where I'm wrong. My mental model is this: we have a pool of packets. A request is made from a client web browser, and the requested files and/or data are then sent back using packets from the pool.

    Each packet has a payload of around 1500 bytes. The transmission process consumes the packets available in the pool. When the pool runs out of packets, the TCP stack waits for ACKs to come back and release specific packets. That is a good transmission. If a packet from the web server, or the responding ACK, is damaged in transit and the round trip does not make it back to the TCP stack, that packet is lost and the stack waits for a set period of time before resending.

    <Correct>

    Eventually, if all of the packets in the pool are lost or are waiting for ACKs to return, the stack runs out of packets to do anything with and stops responding entirely, as we saw. This is a function of the overall size of the data to be sent. But at some point I would expect the TCP stack to declare the packets it never received an ACK for as gone and resend them. What I saw instead was that the TCP stack totally stopped responding when it ran out of packets.

    <There is a socket timeout for when packets get retransmitted, which I believe defaults to 1 second. I wouldn’t mess with this. There is also a maximum number of retries (10, the default value of NX_TCP_MAXIMUM_RETRIES). You can try reducing this number in lieu of increasing the packet pool size. When a socket times out the maximum number of times, it is automatically reset (the connection is closed and transmit queues are flushed).>
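
    For reference, that retry limit is a compile-time define with an #ifndef guard, so it can be overridden, e.g. in nx_user.h or the project's preprocessor settings; the value 6 below is only an assumed example:

    /* Sketch only: lower the retry count from the default of 10 so a dead
       connection is reset, and its queued packets freed, sooner. */
    #define NX_TCP_MAXIMUM_RETRIES 6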


    I'm guessing there is a setting that determines how long before those lost packets get resent. Is that a setting we have access to, and is it a setting we should ever touch? I'm thinking it may be controlled by some TCP standard the stack complies with. To Marc's point: when a large file is sent, let's say 1 MB, and you have only a 32-packet pool, more packets would give faster transmission. But if you had limited memory and were willing to take the hit in overall throughput, is there a way to have fewer packets in the pool and, furthermore, if packets are lost, to recover from that kind of event more quickly or gracefully?

    <Increasing the packet pool will improve performance only up to the point where
    1) the TCP peer's receive window fills up (with a Windows PC this virtually never happens), or
    2) flow control limits how fast a TCP peer can send data without acknowledgment of previously transmitted data, to prevent network traffic congestion.

    On the receive side, the TCP application is obliged to receive and process TCP data and promptly release packets back to the packet pool. This needs to be looked at further, and we will report back what we find.>
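
    As an illustration of that receive-side obligation, a typical NetX receive path releases each packet as soon as its data has been consumed (the socket name and wait option below are assumptions):

    /* Sketch only: process received data and return the packet promptly. */
    NX_PACKET *packet_ptr;
    UINT status = nx_tcp_socket_receive(&client_socket, &packet_ptr, NX_WAIT_FOREVER);
    if (status == NX_SUCCESS)
    {
        /* ... consume the data at packet_ptr->nx_packet_prepend_ptr ... */
        nx_packet_release(packet_ptr);   /* give the packet back to the pool */
    }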

    Specifically, without the TCP stack stopping all communications. I'm thinking that timeout (if it is actually a setting) may be accounting for worst-case timing from days long ago, when the internet had limited bandwidth and paths might have had many hops over very slow links. If that is the case, are there more modern standards with updated timing for resending lost packets? This is, of course, presuming any of my understanding is at all correct.

    <Yes, and NetX uses modern variations of TCP (Reno, among others) for proper flow control and packet loss handling. The default timeouts for NetX TCP socket retransmissions are in step with ‘modern’ network flow control, and it is not recommended to modify them unless you really know your TCP and congestion control protocols.>
    ++++++
    Look for more info as analysis progresses.

    Warren
  • In reply to WarrenM:

    Hi Matt-
    More info from EL that should help solve the issue:

    +++

    • As long as the user application uses the same packet pool as the IP instance, there is no way to prevent the user application, in this case the web server, from exhausting the packet pool. And once there are no packets for IP thread internal operations, such as receiving ACK packets and ARP requests, NetX’s whole network interface is effectively down. So a good solution is to use separate packet pools for the user application and the IP instance. In that case, even if the user application's packet pool is depleted, NetX can still respond to ACKs, which in turn results in packets on the retransmit queue being released back to the user application's pool. And then the web server is back in business.

    • The difference between the local and VPN networks is the delay in receiving ACKs from the TCP client. For the local network the delay is much smaller than for the VPN network. In the local packet trace, the stream of data packets sent by the web server without an ACK is typically 2-3 packets. On the VPN network the stream is much longer, 5 to 18 packets. Consequently, on the local network TCP doesn't queue as many packets waiting to be ACKed, and there is no packet pool depletion.

    • Contrary to the earlier suggestions, there are no flow control limitations in play here.

    • As for the customer's suggestion of changing TCP: the TCP protocol has nothing to do with packet allocation. NetX does not attempt to manage the user application's use of packets. It is left to the user application to check that its packet pool is not empty before sending the next packet (g_packet_pool.nx_packet_pool_available). Note this is more easily done if the application has its own packet pool. Further, the TCP protocol does not permit dropping packets that require retransmission.
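
    A minimal sketch of that check, assuming HTTP data is sent from a dedicated g_packet_pool1, might be:

    /* Sketch only: hold off sending when the application's pool is empty,
       giving in-flight packets time to be ACKed and released. */
    if (g_packet_pool1.nx_packet_pool_available > 0)
    {
        /* allocate the next packet and send it */
    }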

    A good utility for this situation is the nx_tcp_socket_transmit_configure service (a usage sketch follows the list). It allows an application to:

    1. Set the timeout shift to zero so that there is a constant retransmission interval rather than an exponentially increasing one. This shortens the time between retransmissions and, if necessary, the time before the socket resets when no response is received from the TCP peer.
    2. Set the size of the transmit queue. It appears this socket currently permits 18 or more packets on that queue. It should be shortened to 6 or 8, or whatever is empirically best.
    3. Set the maximum number of retries (retransmissions). The default is 10, but lowering it reduces the time before bailing on a dead connection.
    4. Set the socket timeout value. I don’t recommend messing with this, because the default value is optimal for performance while preventing network congestion and unnecessary overhead, but this would be the safe way to do so. This value is in timer ticks.

    This function can also be called at run time, so these TCP parameters do not have to be fixed at compile time.
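
    Putting items 1-3 together, a call might look like the sketch below (the socket name and the values are assumptions; the parameter order follows the nx_tcp_socket_transmit_configure prototype: queue depth, timeout in ticks, maximum retries, timeout shift):

    /* Sketch only: cap the transmit queue at 6 packets, keep a 1-second
       timeout (NX_IP_PERIODIC_RATE ticks), allow 6 retries, and use a
       shift of 0 for a constant retransmission interval. */
    status = nx_tcp_socket_transmit_configure(&server_socket,
                                              6,                   /* max transmit queue depth */
                                              NX_IP_PERIODIC_RATE, /* timeout in timer ticks  */
                                              6,                   /* maximum retries         */
                                              0);                  /* timeout shift           */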
    +++
    Let us know if this helps.

    Warren
  • In reply to WarrenM:

    Hi Warren,

    Thank you and please thank Express Logic for me as well.

    This gives us some things to try and sheds some light on how all that stuff works.

    I'll let you know how things go. Have a great weekend.
    Thanks
    Matt
  • In reply to rupertm:

    Here is a new Knowledge Base article that organizes the above information.
    en-support.renesas.com/.../18188850

    Warren