Running out of TCP packets when using the TLS stack

Hi,

I am implementing an HTTPS client with the NetX Duo TLS session framework. It works well most of the time, but I am having a lot of trouble making it robust under some circumstances. The issue is always that the IP packet pool runs out of packets. I suspect there is a memory leak somewhere, but I cannot find one in my own code.

Currently, I run a continuous loop that sends an HTTPS GET request and receives its response. The application loops correctly forever as long as the end of the TLS session (Close Notify alert) is initiated and sent by the TLS stack. When the Close Notify alert is initiated by the host instead, the TLS stack seems to consume a packet that is never returned to the pool. After a couple of tries, the pool is empty and nothing works anymore...

I really need some help tracking down the cause of this leak because I am running out of ideas. Any help would be greatly appreciated.
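One way to narrow down a leak like this is to sample the pool's free-packet count at the same quiet point of every loop iteration (for example right after the disconnect step) and flag any iteration where the count dropped. The sketch below only shows the bookkeeping. `LeakMonitor` and its functions are hypothetical names, and the free count is passed in as a plain number; on NetX Duo it would presumably come from nx_packet_pool_info_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: tracks the free-packet count observed at the same
 * quiescent point of every loop iteration. */
typedef struct
{
    unsigned long baseline;     /* free count seen on the first sample */
    unsigned long lastFree;     /* free count seen on the latest sample */
    unsigned long leakedTotal;  /* packets missing compared to the baseline */
    bool          hasBaseline;  /* false until the first sample is taken */
} LeakMonitor;

void leakMonitor_init(LeakMonitor *m)
{
    assert(m != NULL);
    m->baseline = 0UL;
    m->lastFree = 0UL;
    m->leakedTotal = 0UL;
    m->hasBaseline = false;
}

/* Records one sample and returns how many packets went missing since the
 * previous sample (0 if the count is unchanged or went up). */
unsigned long leakMonitor_sample(LeakMonitor *m, unsigned long freePackets)
{
    unsigned long lostNow = 0UL;

    assert(m != NULL);

    if (!m->hasBaseline)
    {
        m->baseline = freePackets;
        m->hasBaseline = true;
    }
    else if (freePackets < m->lastFree)
    {
        lostNow = m->lastFree - freePackets;
    }

    m->lastFree = freePackets;
    m->leakedTotal = (freePackets < m->baseline) ? (m->baseline - freePackets) : 0UL;

    return lostNow;
}
```

Logging the return value together with the result of the last TLS call makes it easier to see which exit path (normal close, remote alert, timeout) costs a packet.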

Here are the steps I perform in a continuous loop.

1- I reinitialize my TLS session before each new connection attempt. I found through experimentation that reusing a session (even after calling nx_secure_tls_session_end() or nx_secure_tls_session_delete()) does not clear everything, in particular the certificate stores.

static void tlsSessionReinit(Handle H)
{
    int i;
    
    /* Delete session and reinit it */
    nx_secure_tls_session_delete(&g_tls_session);

    /* Clear structure because nx_secure_tls_session_start() does not clear everything... */
    memset(&g_tls_session, 0, sizeof(NX_SECURE_TLS_SESSION));
    tls_dtls_session_init(&g_tls_session);
    
    /* Configure TLS reassembly buffer. Per RFC 5246, TLS records may have up to 2^14 bytes. */
    ERROR_CHECK(nx_secure_tls_session_packet_buffer_set(&g_tls_session, &H->tlsRecordBuffer[0], TLS_RECORD_SIZE_MAX));

    /* Allocate space for storing the remote (host) certificates */
    for (i = 0; i < REMOTE_CERTIFICATE_COUNT_MAX; ++i)
    {
        ERROR_CHECK(nx_secure_tls_remote_certificate_allocate(&g_tls_session,
                                                              &H->RemoteCertifs[i].Instance,
                                                              H->RemoteCertifs[i].buffer,
                                                              blkPl_getBlockSize(BLOCK_POOL_CERTIFICATE)));
    }

    /* Set callback for the TLS module to get timestamp */
    ERROR_CHECK(nx_secure_tls_session_time_function_set(&g_tls_session, cbGetTimestamp));

    /* Set callback for TLS handshake certificate validation */
    ERROR_CHECK(nx_secure_tls_session_certificate_callback_set(&g_tls_session, cbCertificateValidate));
}

2- Next, I configure my certificate stores with certificateInitialize(), nx_secure_tls_trusted_certificate_add() and nx_secure_tls_local_certificate_add().

3- I bind my TCP socket, connect it and start my TLS session:

static bool tlsConnect(Handle H, const char* hostName, unsigned int port)
{
    bool retVal = false;
    NXD_ADDRESS Address = { .nxd_ip_version = 0 };

    /* Resolve server address */
    retVal = net_resolveAddress(hostName, &Address, true);

    /* Try to connect socket to server */
    if (retVal)
    {
        UINT err;

        /* Rebind socket with a new local port number.  */
        ERROR_CHECK(nx_tcp_client_socket_unbind(&H->TcpSocket));
        if (H->tcpLocalPort++ == TCP_LOCAL_PORT_MAX)
            H->tcpLocalPort = TCP_LOCAL_PORT_MIN;
        ERROR_CHECK(nx_tcp_client_socket_bind(&H->TcpSocket, H->tcpLocalPort, NX_NO_WAIT));

        err = nxd_tcp_client_socket_connect(&H->TcpSocket, &Address, port, HTTPS_DEFAULT_TIMEOUT);

        if (err != NX_SUCCESS)
            PRINTF_DBG("https: ERROR! Could not connect socket to host %s (err %u)" CRLF, hostName, err);
        else
        {
            PRINTF_DBG("https: Socket connected with host %s" CRLF, hostName);

            H->isTcpSocketConnected = true;

            /* Start TLS session using that socket */
            err = nx_secure_tls_session_start(&g_tls_session, &H->TcpSocket, HTTPS_DEFAULT_TIMEOUT);
            if (err != NX_SUCCESS)
                PRINTF_DBG("TLS: ERROR! Handshake failed (err %u)." CRLF, err);
            else
                H->isTlsSessionStarted = true;
        }

        retVal = (err == NX_SUCCESS);
    }

    if (!retVal)
        tlsDisconnect(H);

    return retVal;
}
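The local port rotation in tlsConnect() relies on a post-increment inside the comparison, which is easy to misread. A minimal sketch of the same wrap-around as a separate function is shown below. `nextLocalPort` is a hypothetical name, and pulling out-of-range values back to the minimum is an added assumption, not something the posted code does.

```c
#include <assert.h>

/* Hypothetical helper: returns the local port to use for the next bind.
 * Ports advance by one and wrap from max back to min. A value outside
 * [min, max] (e.g. an uninitialized counter) is pulled back to min. */
unsigned int nextLocalPort(unsigned int current, unsigned int min, unsigned int max)
{
    assert(min <= max);

    if ((current < min) || (current >= max))
        return min;

    return current + 1U;
}
```

With this helper the rebind step would read `H->tcpLocalPort = nextLocalPort(H->tcpLocalPort, TCP_LOCAL_PORT_MIN, TCP_LOCAL_PORT_MAX);` before the call to nx_tcp_client_socket_bind().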

4- I send a GET request to the host:

static bool httpGet(Handle H, const char * urn)
{
    UINT err;
    char *buffer;
    size_t bufferSize = 0U;
    utils_SBuffer Buffer;
    NX_PACKET *PPacket = NULL;
    size_t dataSize;

    /* Allocate a work buffer */
    buffer = blkPl_allocate(BLKPL_ID_2KB, &bufferSize);
    utils_bufferInit(&Buffer, buffer, bufferSize, 1U);

    /* Build the GET request */
    utils_bufferAppendf(&Buffer, "GET %s HTTP/1.1" CRLF, urn);
    utils_bufferAppendf(&Buffer, "Host: %s:%u" CRLF, H->Config.hostName, H->Config.port);
    utils_bufferAppendf(&Buffer, "Authorization: Basic %s" CRLF, H->credentials);
    utils_bufferAppendf(&Buffer, "Accept: application/json" CRLF);
    utils_bufferAppendf(&Buffer, CRLF);

    /* Buffer must be large enough by design including the null-space */
    dataSize = strlen(buffer);
    ASSERT((dataSize > 0U) && (dataSize < (bufferSize - 1)));

    /* Allocate a TLS packet for this request */
    ERROR_CHECK(nx_secure_tls_packet_allocate(&g_tls_session, &g_packet_pool, &PPacket, UTILS_DEFAULT_TIMEOUT));


    /* Append data to packet */
    ERROR_CHECK(nx_packet_data_append(PPacket, buffer, dataSize, &g_packet_pool, UTILS_DEFAULT_TIMEOUT));

    /* Send request to host */
    err = nx_secure_tls_session_send(&g_tls_session, PPacket, UTILS_DEFAULT_TIMEOUT);

    /* According to doc, must manually release the packet if the operation failed. */
    if (err != NX_SUCCESS)
    {
        nx_packet_release(PPacket);
        PRINTF_DBG("HTTPS: Failed to send packet (err=%u)" CRLF, err);
    }

    blkPl_release(buffer);

    return (err == NX_SUCCESS);
}

5- Then I wait for the response:

static bool httpResponse(Handle H, char *bufferOut, size_t bufferSize)
{
    bool retVal = false;
    NX_PACKET *PPacket = NX_NULL;
    size_t blockSize = 0U;
    ULONG timeout = UTILS_DEFAULT_TIMEOUT;
    int packetSize;
    char *packetBuffer;
    UINT err;

    ASSERT_PRE(bufferSize > 0U);
    ASSERT_PRE(bufferOut != NULL);

    PRINTF_DBG("HTTPS: Waiting for GET response" CRLF);

    /* Allocate a temp buffer for extracting packet data. Since tcp packets are at most 1568, a 2KB buffer is enough */
    packetBuffer = blkPl_allocate(BLKPL_ID_2KB, &blockSize);

    /* Loop until parsing is completed */
    while (!H->HttpParserResults.isCompleted)
    {
        /* Wait for the next packet to come */
        err = nx_secure_tls_session_receive(&g_tls_session, &PPacket, timeout);
        if (err == NX_SUCCESS)
        {
            ULONG bytesCopied = 0U;

            /* Extract the received data */
            ERROR_CHECK(nx_packet_data_retrieve(PPacket, packetBuffer, &bytesCopied));

            err = nx_packet_release(PPacket);
            if (err != NX_SUCCESS)
                PRINTF_DBG("HTTPS: nx_packet_release() failed (err=%u)" CRLF, err);

            /* Parse data */
            (...)

            /* Check for errors */
            if (parseError)
            {
                PRINTF_DBG("HTTPS: Response parsing failed. %s" CRLF,
                retVal = false;
                break;
            }
        }

    }

    blkPl_release(packetBuffer);
    return retVal;
}
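As posted, the loop in httpResponse() calls the receive function again whenever it returns an error, so a timeout or a remote alert before the parser completes would keep it spinning. The sketch below models a loop that stops on the first receive error. The receive call is replaced by a scripted mock, and `MockLink`, `mockReceive` and `receiveUntilComplete` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scripted stand-in for the TLS session and the HTTP parser state. */
typedef struct
{
    const unsigned int *statuses;         /* scripted receive results, 0 = success */
    size_t              count;            /* number of scripted results */
    size_t              next;             /* index of the next result to return */
    size_t              chunksToComplete; /* parser is done after this many chunks */
    size_t              chunksParsed;     /* chunks handed to the parser so far */
} MockLink;

/* Stand-in for the receive call: returns the next scripted status. */
unsigned int mockReceive(MockLink *link)
{
    assert(link != NULL);

    if (link->next >= link->count)
        return 1U; /* script exhausted: behave like a timeout */

    return link->statuses[link->next++];
}

/* Returns true only if the response was parsed completely. Gives up on the
 * first receive error instead of calling receive again. */
bool receiveUntilComplete(MockLink *link)
{
    assert(link != NULL);

    while (link->chunksParsed < link->chunksToComplete)
    {
        if (mockReceive(link) != 0U)
            return false; /* alert, timeout or disconnect: stop here */

        /* A real loop would copy the payload out and release the packet here. */
        link->chunksParsed++;
    }

    return true;
}
```

Whether to retry on a timeout is a design choice; the point is only that every non-success status needs an exit condition of some kind.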

6- Finally, I end the TLS session and disconnect the TCP port:

static void tlsDisconnect(Handle H)
{
    UINT err;

    /* Close tls session. We unconditionally end the session no matter if the TLS was started or not and simply let
     * the function return an error code if it was not started. This is because tests showed that some IP packets could
     * be left allocated in the TLS stack after a failing handshake. */
    // See comment above -- if (H->isTlsSessionStarted)
    {
        err = nx_secure_tls_session_end(&g_tls_session, HTTPS_DEFAULT_TIMEOUT);
        H->isTlsSessionStarted = false;
        PRINTF_DBG("TLS: Session ended (err=%u)" CRLF, err);
    }

    /* Unconditionally close the socket just to make sure that everything is cleared and let the function return an
     * error code if the socket was not opened */
    // See comment above -- if (H->isTcpSocketConnected)
    {
        err = nx_tcp_socket_disconnect(&H->TcpSocket, HTTPS_DEFAULT_TIMEOUT);
        H->isTcpSocketConnected = false;
        PRINTF_DBG("HTTPS: TCP socket disconnected (err=%u)" CRLF, err);
    }
}

Thanks for your help

Franck

  • Hi Franck,

    Regarding one comment in your code:
    "Allocate a temp buffer for extracting packet data. Since tcp packets are at most 1568, a 2KB buffer is enough"

    And then there is a call:
    /* Extract the received data */
    ERROR_CHECK(nx_packet_data_retrieve(PPacket, packetBuffer, &bytesCopied));

    A packet can be chained to other packets, so nx_packet_data_retrieve() can copy more than 2 KB into the buffer (check bytesCopied). I suggest using nx_packet_data_extract_offset() instead, which is safer because it takes the buffer size as well as the buffer.

    I'm not sure this solves the issue, but an overrun there can certainly corrupt memory. Please check whether the issue still occurs.

    Regards,
    adboc
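To illustrate the point about chained packets, here is a minimal sketch of a bounded copy over a packet chain. It mimics the idea behind nx_packet_data_extract_offset() (offset, destination buffer, buffer length, bytes copied) but runs on a mock structure. `MockPacket` and `boundedExtract` are hypothetical and are not part of NetX Duo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a chained packet: each node holds one payload fragment. */
typedef struct MockPacket
{
    const char        *data;    /* fragment payload */
    size_t             length;  /* payload length in bytes */
    struct MockPacket *next;    /* next fragment in the chain, or NULL */
} MockPacket;

/* Copies at most bufferSize bytes, starting at `offset` bytes into the chain.
 * Returns the number of bytes actually copied. */
size_t boundedExtract(const MockPacket *packet, size_t offset, char *buffer, size_t bufferSize)
{
    size_t copied = 0U;

    assert(buffer != NULL);

    for (; (packet != NULL) && (copied < bufferSize); packet = packet->next)
    {
        if (offset >= packet->length)
        {
            offset -= packet->length; /* fragment lies entirely before the offset */
            continue;
        }

        size_t avail = packet->length - offset;
        size_t room  = bufferSize - copied;
        size_t n     = (avail < room) ? avail : room;

        memcpy(buffer + copied, packet->data + offset, n);
        copied += n;
        offset  = 0U; /* later fragments are read from their start */
    }

    return copied;
}
```

Because the copy is capped by the buffer size, a response larger than the work buffer is truncated (and can be read in several passes with an increasing offset) instead of overrunning the buffer.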
  • In reply to adboc:

    Hi adboc,

    Good catch, it looks like it's worth commenting the code :) I will certainly rework this part of the code and use nx_packet_data_extract_offset() to avoid any corruption.

    The GET request that I am using for my loop test results in a response much smaller than 2 KB. Just to make sure, I have added a pre-check with nx_packet_length_get() and an assert on the returned bytesCopied value. Both show that the response packet size is always very small.

    So unfortunately this is not the cause of my issue. Any other suggestions?

    Thanks
    Franck
  • In reply to Franck:

    I have run tests over the weekend and I am still running out of packets after some time. After some debugging, I might have found the source of the issue which, I think, resides in the TLS stack.

    I am more and more convinced that I am getting lost packets (not released) when the host side sends TLS alerts like the one shown below in Wireshark:

    I believe that the faulty packet allocation resides in the function _nx_secure_tls_session_receive() on line 190:

    During normal operation, this packet gets released when _nx_secure_tls_send_record() is called successfully on line 214. The problem is that this line is not reached when the remote side sends an alert, because of the if condition on line 211. When this happens, I don't see any other place where the packet is released with a call to nx_packet_release(). I cannot confirm that this IS the root cause of my leak because I cannot modify this protected code and apply the fix, but I am pretty sure it is.

    Renesas experts, does it sound right to you?

    Franck

  • In reply to Franck:

    Hi Franck,

    Renesas has acknowledged that this is an issue. They are investigating, and a fix or workaround will be supplied as soon as possible.

    Regards,
    adboc
  • In reply to adboc:

    Thank you, adboc, for the follow-up. I hope you can provide the fix or workaround as soon as possible. In the meantime, we had no choice but to rewrite the _nx_secure_tls_session_receive.c file from scratch in order to apply a temporary fix.

    Regards,
    Franck

  • In reply to Franck:

    Still no fix in the latest patch release, SSP v1.3.3... This is very annoying since this is a major issue and the fix is only a two-line modification:

            /* If we could allocate a packet, and the error is not because we received an alert from the remote host,
            then send an alert to the remote host. */
            if (status == NX_SUCCESS && error_number != NX_SECURE_TLS_ALERT_RECEIVED)
            {
                _nx_secure_tls_send_alert(tls_session, send_packet, (UCHAR)alert_number, (UCHAR)alert_level);
                status = _nx_secure_tls_send_record(tls_session, send_packet, NX_SECURE_TLS_ALERT, wait_option);
                if(status != NX_SUCCESS)
                    nx_packet_release(send_packet);
            }
            else
                nx_packet_release(send_packet);

    I really hope this will be part of the next patch release. This source file is protected, cannot be edited, and is part of the Synergy generated folder that is automatically populated.

    Franck
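The leak and the fix described above can be modelled with a counting mock pool. This is only a model of the behaviour reported in this thread, not the library's actual code. `MockPool`, `mockSendRecord` and `handleReceiveError` are hypothetical names, and a successful send is modelled as the packet going straight back to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counting stand-in for a packet pool. */
typedef struct
{
    unsigned int freePackets;
} MockPool;

bool mockAllocate(MockPool *pool)
{
    assert(pool != NULL);
    if (pool->freePackets == 0U)
        return false;
    pool->freePackets--;
    return true;
}

void mockRelease(MockPool *pool)
{
    assert(pool != NULL);
    pool->freePackets++;
}

/* Stand-in for sending the alert record. On success the stack owns the packet
 * and returns it to the pool (modelled as an immediate release). On failure
 * the caller still owns the packet. */
bool mockSendRecord(MockPool *pool, bool sendSucceeds)
{
    if (sendSucceeds)
        mockRelease(pool);
    return sendSucceeds;
}

/* Error path of a receive call. applyFix = false models the reported
 * behaviour; applyFix = true adds the two releases from the proposed change. */
void handleReceiveError(MockPool *pool, bool alertFromRemote, bool sendSucceeds, bool applyFix)
{
    if (!mockAllocate(pool))
        return; /* nothing was allocated, so nothing to release */

    if (!alertFromRemote)
    {
        /* Local error: send an alert to the remote host. */
        if (!mockSendRecord(pool, sendSucceeds) && applyFix)
            mockRelease(pool); /* send failed: caller still owns the packet */
    }
    else if (applyFix)
    {
        mockRelease(pool); /* no alert is sent back: return the unused packet */
    }
}
```

In this model every remote alert costs one packet until the pool is empty, which matches the symptom in the original post; with the two extra releases the free count stays constant on every path.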

  • In reply to Franck:

    Franck,

    Thank you for highlighting this issue. This fix is tied to several other fixes and enhancements needed for the NetX Secure update in v5.11. It will be included in the SSP 1.4.0 release scheduled for mid-March. A customer early access version will be available around Feb. 19th. If you need access to this early release, please make the request to your local Renesas sales person or through the online Synergy chat.

    Best Regards,

    Peter.
  • In reply to Peter Carbone:

    Thank you, Peter, for this follow-up; it is appreciated. We are also glad to hear that this new NetX Secure version will come with several other fixes and enhancements. We look forward to getting access to this new release.

    Regards,
    Franck