Receiving large MQTT messages

We're still in our investigation phase, and we're interested in maximizing our MQTT receive packet size. This effort basically follows on from http://renesasrulz.com/synergy/f/synergy---forum/15788/receiving-mqtt-messages.

As part of this investigation, it came to my attention that if I exceed the size of an Ethernet frame (taking the MQTT header and so on into consideration), NetX is unable to handle the resulting two packets correctly on receipt. What worries me most is that after about four of these incorrectly handled packets are received, MQTT fails altogether and the (mosquitto) broker assumes a timeout-based disconnect. So, my questions are two-fold:

  1. Is this a defined size limitation? Is it documented somewhere? I've only found a couple of PDFs on this, and neither really goes into significant detail (r11um0068eu0511-synergy-nxd-mqtt.pdf and r11an0344eu0100-synergy-nxd-mqtt-mod-guide.pdf).
  2. Is there any existing knowledge or understanding of this failure after multiple too-large messages?

I have more details of things that I don't understand about how this plays out that I'll try to cover below.

First, I have a dedicated MQTT Thread with a NetX Duo MQTT stack using the following settings (a rough sketch of the equivalent create calls follows the list):

  • MQTT Client : Topic Name Max Length = 32
  • MQTT Client : Message Max Length = 3000
  • MQTT Client : Stack Size = 4096
  • g_packet_pool0 : Packet Size = 2000
  • g_packet_pool0 : Number of Packets in Pool = 16
  • g_packet_pool1 : Packet Size = 3104
  • g_packet_pool1 : Number of Packets in Pool = 16
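
In case it helps to see it as code rather than configurator properties, here is roughly how I understand those settings to map onto the underlying create calls. This is only a sketch based on the NetX / NetX Duo MQTT documentation, not the generated common code; the IP instance, client ID, thread priority, and buffer names are placeholders, and the exact nxd_mqtt_client_create() argument list should be double-checked against your SSP version.

#include <string.h>
#include "nxd_mqtt_client.h"   /* pulls in nx_api.h */

/* Topic Name Max Length (32) and Message Max Length (3000) are, as I understand
 * it, compile-time limits (NXD_MQTT_MAX_TOPIC_NAME_LENGTH /
 * NXD_MQTT_MAX_MESSAGE_LENGTH) used by the MQTT client source rather than
 * arguments to any create call.                                               */

NX_PACKET_POOL  g_packet_pool0;   /* 16 x 2000-byte payloads */
NX_PACKET_POOL  g_packet_pool1;   /* 16 x 3104-byte payloads */
NXD_MQTT_CLIENT g_mqtt_client0;

static UCHAR pool0_memory[16 * (2000 + sizeof(NX_PACKET))];  /* rough sizing only;      */
static UCHAR pool1_memory[16 * (3104 + sizeof(NX_PACKET))];  /* the configurator's pool */
static UCHAR mqtt_stack[4096];                               /* memory size is exact    */

static void mqtt_resources_create(NX_IP *ip_ptr)  /* ip_ptr: placeholder IP instance */
{
    /* Return codes omitted for brevity. */
    nx_packet_pool_create(&g_packet_pool0, "g_packet_pool0", 2000,
                          pool0_memory, sizeof(pool0_memory));
    nx_packet_pool_create(&g_packet_pool1, "g_packet_pool1", 3104,
                          pool1_memory, sizeof(pool1_memory));

    /* Client ID and thread priority are placeholder values. */
    nxd_mqtt_client_create(&g_mqtt_client0, "g_mqtt_client0",
                           "example_client", strlen("example_client"),
                           ip_ptr, &g_packet_pool0,
                           mqtt_stack, sizeof(mqtt_stack),
                           3, NX_NULL, 0);
}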

For my testing purposes, I've been using a message of 1450 characters.
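
Here's my back-of-the-envelope math for why 1450 characters no longer fits in one TCP segment (QoS 0 and a 9-byte topic name are assumptions on my part, chosen because they happen to reproduce the lengths that show up in the logs below):

/* Rough on-the-wire size of a QoS 0 PUBLISH (no packet identifier):
 * 1 byte fixed header + 2 bytes Remaining Length (2 bytes covers 128..16383)
 * + 2-byte topic length prefix + topic + payload.                           */
static unsigned int publish_wire_size(unsigned int topic_len, unsigned int payload_len)
{
    return 1u + 2u + 2u + topic_len + payload_len;
}

/* With the assumed 9-byte topic:
 *   publish_wire_size(9, 1444) = 1458 -> fits in one 1460-byte TCP segment
 *   publish_wire_size(9, 1450) = 1464 -> splits into 1460 + 4 across two segments
 * where 1460 = 1500 (Ethernet MTU) - 20 (IP header) - 20 (TCP header).            */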

When this large message comes in, I can see _nxd_mqtt_packet_receive_process() kick in twice, once for each packet, but even though it evaluates the packets as being of packet_type MQTT_CONTROL_PACKET_TYPE_PUBLISH, it never goes into _nxd_mqtt_process_publish(). Frustratingly, I don't feel I can really trust the debugger to show me what's happening, because the relationships it draws between the assembly and the source don't always make sense. Regardless, I created a number of breakpoints with log-and-resume actions for basically every source line that has an address associated with it (see the image below for the area I'm talking about).

What I can see is that it appears to reach the switch case where this should happen, yet it never enters _nxd_mqtt_process_publish(). If the first message the system receives is the 1450-character message, the first packet's packet_ptr->nx_packet_length is 1460 (i.e. the maximum that fits in an Ethernet TCP frame), and the second packet has the length of the remaining message (4). All subsequently received messages then show a length of 4, regardless of whether they fit in one packet or not; interestingly, though, single-packet messages do still seem to trigger all the appropriate functions and callbacks.
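
To convince myself that the first packet really only carries part of the PUBLISH, I also hacked together a small diagnostic that decodes the MQTT Remaining Length from the fixed header and compares it against nx_packet_length. This is purely my own sketch, not anything from the middleware, and it assumes the whole fixed header lands in the first, unchained packet:

#include "nx_api.h"

/* Return NX_TRUE if this packet already contains the complete MQTT control
 * packet, NX_FALSE if the remainder must be in a following TCP segment.    */
static UINT mqtt_packet_is_complete(NX_PACKET *packet_ptr)
{
    UCHAR *data       = packet_ptr->nx_packet_prepend_ptr;
    ULONG  remaining  = 0;
    ULONG  multiplier = 1;
    UINT   header_len = 1;   /* control packet type/flags byte */

    /* Remaining Length is a variable-length integer of 1..4 bytes, 7 bits per
     * byte, with the MSB meaning "another byte follows" (MQTT 3.1.1, 2.2.3). */
    do
    {
        remaining  += (ULONG)(data[header_len] & 0x7Fu) * multiplier;
        multiplier *= 128u;
        header_len++;
    } while ((data[header_len - 1u] & 0x80u) && (header_len <= 4u));

    return (packet_ptr->nx_packet_length >= (header_len + remaining)) ? NX_TRUE : NX_FALSE;
}

For the 1460-byte first packet described above, this works out to a 3-byte header plus a Remaining Length of 1461, i.e. 1464 > 1460, so the packet is incomplete, which matches the 4 leftover bytes arriving in the second packet.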

The example logs below consist primarily of line identifiers. Where they print 3 '\003', that refers to the packet_type, 3 being MQTT_CONTROL_PACKET_TYPE_PUBLISH; where they print bare numbers, those are packet_ptr->nx_packet_length values. Lastly, bear in mind that the offset into g_packet_pool0_pool_memory depends somewhat randomly on network traffic, so I'm not sure the location being the same is noteworthy in any way.

Message size: 1444

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffe73a4 <g_packet_pool0_pool_memory+10300>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffe73a4 <g_packet_pool0_pool_memory+10300>
1458
Made it to _nxd_mqtt_process_publish()
Made it to _nxd_mqtt_process_publish()
_nxd_mqtt_packet_receive_process:1843
1458
_nxd_mqtt_packet_receive_process:1852

Message size: 1450 (first ever received message)

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffe8798 <g_packet_pool0_pool_memory+12360>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffe8798 <g_packet_pool0_pool_memory+12360>
1460
_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffebfec <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffebfec <g_packet_pool0_pool_memory+26780>
4

Message size: 1450 (later received message)

_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
4
4
_nxd_mqtt_packet_receive_process:1764
_nxd_mqtt_packet_receive_process:1776
_nxd_mqtt_packet_receive_process:1777
_nxd_mqtt_packet_receive_process:1780
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1783
_nxd_mqtt_packet_receive_process:1786
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
_nxd_mqtt_packet_receive_process:1794
3 '\003'
_nxd_mqtt_packet_receive_process:1794
3 '\003'
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
0x1ffeb404 <g_packet_pool0_pool_memory+26780>
4
4
  • In reply to JanetC:

    Here is the state of the pools after the too-large messages wreak their havoc:

      While I was trying to capture this, the empty requests value kept going up.
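
    For reference, this is roughly how I'm reading those counters: just nx_packet_pool_info_get() on the pool in question (the printf is stand-in debug output; mostly I watch the variables in the debugger):

    #include <stdio.h>
    #include "nx_api.h"

    extern NX_PACKET_POOL g_packet_pool0;   /* the pool from the original post */

    static void log_pool0_state(void)
    {
        ULONG total, free_count, empty_requests, empty_suspensions, invalid_releases;

        /* empty_requests is the counter I can see climbing with each too-large message. */
        nx_packet_pool_info_get(&g_packet_pool0, &total, &free_count,
                                &empty_requests, &empty_suspensions, &invalid_releases);

        printf("pool0: total=%lu free=%lu empty_requests=%lu suspensions=%lu\r\n",
               total, free_count, empty_requests, empty_suspensions);
    }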

  • In reply to JanetC:

    I was able to run with the guidance you gave and with both packet pools set to a quantity of 16. With the modified file, the packet pool doesn't see empty_requests increase, and publishing and subscribing both continue even while the too-large messages are essentially treated as though they never arrived, as far as I can tell.

    I think this patch is helpful. What's the course of action to get it officially included? I'm a little worried about patching, given that this lives in the generated code, and the workflow you described, while it works, is hazard-prone.
  • In reply to elene.trull:

    Elene,

    You can continue to use the patch, but a 'hot fix' or official patch from Renesas can only happen after they go through their marketing review process. I will create a "work request" to get that process started. But to raise the visibility of this work request, I need to tie it to a customer support ticket.

    So the best thing you can do is create a support ticket summarizing the problem of the lost MQTT client connection, noting that there is no workaround on your end and no way to move forward without a fix. I am not sure you can request that I be assigned the ticket, so when you know the ticket number, please let me know what it is so I can add it to the work request.

    Hopefully you can continue with the rest of your development in the meantime, until a formal solution is made available.

    Regards,
    Janet
  • In reply to JanetC:

    Tickets submitted to both Renesas (311546) and Express Logic (2692).