Receiving large MQTT messages

We are still in our investigation phase and are interested in maximizing our MQTT receive packet size. This effort follows on from http://renesasrulz.com/synergy/f/synergy---forum/15788/receiving-mqtt-messages.

As part of this investigation, it came to my attention that if a message exceeds the size of an Ethernet frame (taking into consideration the MQTT header and whatnot), NetX is unable to handle the resulting two packets correctly on receipt. What worries me most is that after about four of these incorrectly handled packets are received, MQTT fails altogether and the (mosquitto) broker assumes a timeout-based disconnect. So, my questions are two-fold:

  1. Is this a defined size limitation? Is it documented somewhere? I've only found a couple of PDFs on this, and neither really goes into significant detail (r11um0068eu0511-synergy-nxd-mqtt.pdf and r11an0344eu0100-synergy-nxd-mqtt-mod-guide.pdf).
  2. Is there any knowledge or understanding about this multiple too-large-messages failure?

I have more details of things that I don't understand about how this plays out that I'll try to cover below.

First, I have a dedicated MQTT Thread with a NetX Duo MQTT stack using the following settings:

  • MQTT Client : Topic Name Max Length = 32
  • MQTT Client : Message Max Length = 3000
  • MQTT Client : Stack Size = 4096
  • g_packet_pool0 : Packet Size = 2000
  • g_packet_pool0 : Number of Packets in Pool = 16
  • g_packet_pool1 : Packet Size = 3104
  • g_packet_pool1 : Number of Packets in Pool = 16

For my testing purposes, I've been using a message of 1450 characters.

When this large message comes in, I can see _nxd_mqtt_packet_receive_process() kick in two times, once for each packet, but even though it evaluates the packets as being of packet_type MQTT_CONTROL_PACKET_TYPE_PUBLISH, it never goes into _nxd_mqtt_process_publish(). Frustratingly, I don't feel like I can really trust the debugger to show me what's happening, because the relationships it draws between the assembly and the source don't always make sense. Regardless, I created a number of breakpoints with log-and-resume actions on basically every source line that has an address associated with it (see the image below for an idea of what I'm talking about). What I can see is that execution appears to reach the switch case where this should happen, yet it never enters _nxd_mqtt_process_publish().

If the first message the system gets is the 1450 message, I can see that for the first packet packet_ptr->nx_packet_length is 1460 (i.e. the maximum that will fit in a TCP segment within one Ethernet frame), and for the second packet it is the length of the remaining message (4). Messages received after that all show a length of 4, regardless of whether they fit in one packet or not, though, interestingly, single-packet messages do still seem to trigger all the appropriate functions and callbacks.
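The 1460 + 4 split can be reproduced with a little arithmetic. The following is a minimal sketch based on the MQTT 3.1.1 packet layout, not on the NetX source; the function names are hypothetical, and the 9-byte topic length at QoS 0 is only an inference from the logs (1458 bytes on the wire for a 1444-character message implies 14 bytes of MQTT overhead).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to encode the MQTT "remaining length" field
 * (MQTT 3.1.1: 7 data bits per byte, at most 4 bytes). */
size_t mqtt_remaining_length_bytes(size_t remaining_length)
{
    size_t n = 1;

    assert(remaining_length <= 268435455UL); /* largest encodable value */
    while (remaining_length > 127) {
        remaining_length >>= 7;
        n++;
    }
    return n;
}

/* Total on-the-wire size of a PUBLISH packet:
 * 1 type/flags byte + remaining-length field + 2-byte topic length
 * + topic + optional 2-byte packet identifier (QoS 1 or 2) + payload. */
size_t mqtt_publish_size(size_t topic_len, size_t payload_len, int qos)
{
    size_t remaining = 2 + topic_len + (qos > 0 ? 2 : 0) + payload_len;

    return 1 + mqtt_remaining_length_bytes(remaining) + remaining;
}

/* Bytes that spill into a second TCP segment for a given MSS. */
size_t spill_after_first_segment(size_t packet_size, size_t mss)
{
    return packet_size > mss ? packet_size - mss : 0;
}
```

Under those assumptions, mqtt_publish_size(9, 1444, 0) is 1458 and mqtt_publish_size(9, 1450, 0) is 1464, so spill_after_first_segment(1464, 1460) is 4, which matches the lengths printed in the logs.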

The example logs below contain primarily line identifiers. Where a log prints 3 '\003', that is the packet_type, where 3 is MQTT_CONTROL_PACKET_TYPE_PUBLISH; where it prints a bare number, that is packet_ptr->nx_packet_length. Lastly, bear in mind that the g_packet_pool0_pool_memory location depends on network traffic and is effectively random, so I'm not sure that the location being the same is noteworthy in any way.

Message size: 1444

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffe73a4 <g_packet_pool0_pool_memory+10300>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffe73a4 <g_packet_pool0_pool_memory+10300>
1458
Made it to _nxd_mqtt_process_publish()
Made it to _nxd_mqtt_process_publish()
_nxd_mqtt_packet_receive_process:1843
1458
_nxd_mqtt_packet_receive_process:1852

Message size: 1450 (first ever received message)

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffe8798 <g_packet_pool0_pool_memory+12360>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffe8798 <g_packet_pool0_pool_memory+12360>
1460
_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffebfec <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffebfec <g_packet_pool0_pool_memory+26780>
4

Message size: 1450 (later received message)

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
4
4
_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
4
4
  • From "NetX Duo™ MQTT (NetX Duo MQTT) for clients User Guide" in the X-Ware documentation :-

     

  • In reply to Jeremy:

    Hi Jeremy,

    I did see this, but when I found that it is able to publish messages that require more than one packet, I was obliged to consider the reverse. Finding that receiving messages that require multiple packets crashes MQTT concerns me: if there were a malicious entity on the network, it could easily crash my MQTT client (intentionally or not).
  • In reply to elene.trull:

    Elene,

    You should *never* set the MAX MESSAGE LENGTH to greater than the packet payload minus the TCP and IP headers. If you do that, you will have no protection against MQTT crashing. MQTT does not support packet chaining, so it cannot handle such large messages anyway. Your only protection against oversized messages is to keep the MAX MESSAGE LENGTH within what will fit in the packet payload.
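    As a rough illustration of that ceiling, here is a minimal sketch. The function name is hypothetical, the header sizes are the minimum IPv4/IPv6 and TCP header sizes without options, and exactly what counts as "packet payload" (for example, whether it must also hold the link-layer header) is an assumption to check against the NetX documentation.

```c
#include <assert.h>
#include <stddef.h>

enum {
    IPV4_HEADER_BYTES = 20, /* minimum IPv4 header, no options */
    IPV6_HEADER_BYTES = 40, /* fixed IPv6 header */
    TCP_HEADER_BYTES  = 20  /* minimum TCP header, no options */
};

/* Upper bound for the message length setting: whatever is left of the
 * packet payload once the IP and TCP headers are accounted for.
 * Returns 0 if the headers alone do not fit. */
size_t max_message_length(size_t packet_payload,
                          size_t ip_header_bytes,
                          size_t tcp_header_bytes)
{
    size_t headers;

    assert(ip_header_bytes >= IPV4_HEADER_BYTES);
    assert(tcp_header_bytes >= TCP_HEADER_BYTES);

    headers = ip_header_bytes + tcp_header_bytes;
    return packet_payload > headers ? packet_payload - headers : 0;
}
```

    For a 1500-byte payload over IPv4 this gives 1460, the same figure as the TCP MSS seen on Ethernet. The MQTT fixed header and topic also have to fit within that budget, so the usable message body is somewhat smaller.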

    If you still see peculiar results after reducing your MAX MESSAGE LENGTH, we need to look into that as it might be a possible bug in MQTT worthy of opening up a support ticket or starting a new Rulz post.

    Janet
  • In reply to JanetC:

    Hi Janet,

    Thanks for the information, and sorry for the delayed response! I have set up the following configuration, and I still see that after several large messages are received, the NetX Duo client crashes.

  • In reply to elene.trull:

    Sorry, to follow up... I realized that I may have miscalculated, so I bumped the max length down to 1300 and double checked, and I still see problems with that configuration.
  • In reply to elene.trull:

    Your new configuration looks reasonable. Is MQTT still crashing or just failing to handle a large message (spanning two packets) properly?

    I don't understand this nx_packet_length = 4 in your previous debug output. That looks like the packet type rather than the packet length; a packet type of 4 would be PUBACK. If this so-called packet of length 4 does not go to _nxd_mqtt_process_publish(), where does it end up going?

    Can you send me a packet trace (Wireshark)? I need to see exactly what is received by your NetX device, and what is going out. Debug output doesn't really tell me what I need to know just yet.

    Thanks,
    Janet
  • In reply to JanetC:

    Hi Janet,

    The nx_packet_length = 4 is explicitly the remainder of the message that doesn't fit in one TCP frame. If the output were meant to be the packet type, it would print as it did when I printed the packet type MQTT_CONTROL_PACKET_TYPE_PUBLISH (3 '\003'), but instead it printed an ASCII 4.

    I'm attaching a Wireshark capture taken under the following circumstances (I dragged and dropped the file in, but I'm not sure it took it, so we'll see after I post....):

    host machine (Windows) running mosquitto broker

    PK-S5D9 publishes -t "/bay/00:11:22:33:44:55:66:77/TestTopic" -m "TestMessage"

    host machine (Windows) running mosquitto sub -t "/bay/00:11:22:33:44:55:66:77/TestTopic"

    host machine (Windows) running mosquitto pub -t "/bay/00:11:22:33:44:55:66:77/example" -m "012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901"

    Having moved on from the original effort, I no longer have the instrumented breakpoints from the debugging described earlier in the thread. The code has also evolved some, and the failure doesn't seem exactly the same. Regardless, with the above circumstances, it appears that I get an NXD_MQTT_INTERNAL_ERROR because the length determined by _nxd_mqtt_read_remaining_length() (called via _nxd_mqtt_packet_receive_process() -> _nxd_mqtt_process_publish()) is larger than the packet size.

    However, after receiving a handful of too-long messages, the devkit gets disconnected. The broker does not initiate this; it only reports the disconnect after the keepalive times out.
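    The length check mentioned above can be illustrated outside NetX. This is a minimal sketch of decoding the MQTT 3.1.1 fixed header and comparing the declared length with the bytes actually present; the function names are hypothetical and this is not the NetX implementation of _nxd_mqtt_read_remaining_length().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the MQTT fixed header at the start of buf.
 * Returns 0 on success, -1 if the header is truncated or malformed. */
int mqtt_parse_fixed_header(const uint8_t *buf, size_t len,
                            uint8_t *type, uint32_t *remaining_length,
                            size_t *header_len)
{
    uint32_t value = 0;
    uint32_t multiplier = 1;
    size_t i = 1;

    assert(buf != NULL);
    if (len < 2)
        return -1;

    *type = (uint8_t)(buf[0] >> 4); /* 3 = PUBLISH, 4 = PUBACK */

    for (;;) {
        uint8_t b;

        if (i >= len || i > 4)      /* at most 4 length bytes */
            return -1;
        b = buf[i++];
        value += (uint32_t)(b & 0x7F) * multiplier;
        if ((b & 0x80) == 0)        /* continuation bit clear: done */
            break;
        multiplier *= 128;
    }

    *remaining_length = value;
    *header_len = i;
    return 0;
}

/* Returns 1 if the whole MQTT packet is contained in buf, else 0. */
int mqtt_packet_is_complete(const uint8_t *buf, size_t len)
{
    uint8_t type;
    uint32_t remaining;
    size_t header_len;

    if (mqtt_parse_fixed_header(buf, len, &type, &remaining, &header_len) != 0)
        return 0;
    return header_len + remaining <= len;
}
```

    A PUBLISH whose fixed header declares 1461 remaining bytes cannot be complete in a 1460-byte segment, so a receiver that does not reassemble across packets has to reject it or wait for more data.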

  • In reply to elene.trull:

    Hi Elene

    The best way to debug this is to use the pcap data to simulate exactly what your Windows server is sending the MQTT Client and track down this 4-byte packet. Please feel free to email your packet trace to my email address, jchristiansen@expresslogic.com, if that is easier.

    In the meantime, I'll look at your project to see how your MQTT client is set up.

    Regards,
    Janet
  • In reply to JanetC:

    Oops, misunderstanding. I do have your packet trace. I thought this zip file was your project.
    Janet
  • In reply to JanetC:

    One other item: since you are using separate packet pools for the MQTT client and the underlying IP thread task, try increasing the number of packets in the IP thread task pool to 32. There is some weirdness on the part of the server (it is sending 1463-byte TCP packets, in violation of the 1460 MSS option), but it also looks like your receive packet pool might be depleted.

    Janet
  • In reply to JanetC:

    I'm attaching the old project where I'd started this investigation, in case it proves useful. It's really bare bones and doesn't implement any of the notify callbacks or anything. The production-related code would take additional effort to scrub; otherwise we'll need an NDA.

    MQTTInvestigation.zip (I did have to strip the synergy folder to meet the size constraints, so that will have to be re-generated.)

  • In reply to JanetC:

    Changing this allowed me to send ~13 too-large packets before the MQTT client stops working (as opposed to ~4 with the smaller packet pool).
  • In reply to elene.trull:

    When it stops working, can you check both packet pools for packet availability? The easiest way to do so is to call nx_packet_pool_info_get() and check the empty_pool_requests value it returns. Also, open up the instances of the packet pools in your debugger and check nx_packet_pool_available for the current state of the packet pool. If there are empty pool requests, or the number of available packets is zero, packet pool depletion might be the problem.

    There was a packet leak fixed where invalid packets (e.g. chained or too-large packets) were not properly released. I will verify this and get back to you.
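    The depletion test described here can be written down as a small helper. This is a sketch only: pool_snapshot_t and pool_depletion_suspected() are hypothetical names, and the snapshot is assumed to be filled from the counters that nx_packet_pool_info_get() reports (or from the pool instance fields seen in the debugger).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one packet pool's counters. In a real project
 * these would be copied from the values reported by
 * nx_packet_pool_info_get() for g_packet_pool0 or g_packet_pool1. */
typedef struct {
    unsigned long total_packets;       /* packets the pool was created with */
    unsigned long free_packets;        /* packets currently available */
    unsigned long empty_pool_requests; /* allocations attempted on an empty pool */
} pool_snapshot_t;

/* Depletion is suspected if nothing is available right now, or if any
 * allocation request has ever hit an empty pool. */
bool pool_depletion_suspected(const pool_snapshot_t *s)
{
    assert(s != NULL);
    return s->free_packets == 0 || s->empty_pool_requests > 0;
}

/* Packets currently held somewhere. A value that keeps growing after each
 * oversized message would be consistent with a leak. */
unsigned long pool_packets_in_use(const pool_snapshot_t *s)
{
    assert(s != NULL);
    return s->total_packets - s->free_packets;
}
```

    Taking one snapshot per pool after each oversized message and comparing pool_packets_in_use() between snapshots would show whether packets are being consumed and never returned.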
  • In reply to JanetC:

    nxd_mqtt_client.txt

    Hi Elene

     

    This is the nxd_mqtt_client.c file which has a patch to correct a packet leak in the MQTT Client.  If you want to test this fix, you should be able to substitute this into the synergy folder in your project (save the original first).  In my test project, that would be here:

     

    C:\Users\jachr\e2_studio_7_5_1\Workspace_1\test_MQTT\synergy\ssp\src\framework\el\nxd_application_layer\nxd_mqtt_client

    Then before you rebuild the project, right click on your project and choose properties -> Builder and uncheck Synergy Builder or it will clobber this patched MQTT Client file.  Let me know if you need help with this. 

    Regards,

    Janet