Thread safe memory allocations?

We're using S5D9/D5 devices with SSP 1.60 and e2studio 7.3.0 and our project uses ThreadX.

It seems the standard malloc/free routines are not thread safe, and when they are called from multiple threads the heap gets corrupted sooner or later. As a test I defined __malloc_lock() and __malloc_unlock() routines that disable/enable interrupts, and the problems disappeared. However, I'm not sure that is sufficient, and disabling interrupts is overkill (I'd use a mutex if that is the right way). I searched the docs and found virtually nothing on this subject, which probably means I'm missing something, because it is a really important topic.
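
A sketch of that interrupt-based test (not the exact code; __disable_irq()/__enable_irq() are CMSIS intrinsics, and newlib declares these hooks with a struct _reent * parameter that is omitted here for brevity):

    void __malloc_lock(void) {
        __disable_irq();    /* mask all interrupts around the allocation (crude, test only) */
    }

    void __malloc_unlock(void) {
        __enable_irq();     /* unconditionally re-enables interrupts, another reason this is only a test */
    }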

Questions:

- what is the correct and supported way to handle this?

- can you point me to the right documentation?

- should we use a big ThreadX byte pool instead of standard heap?

- also, why doesn't SSP handle this internally? Or does it?

Thanks,

Michal

  • Usually the C standard library functions can't handle multithreading,
    so the OS provides its own memory management functions.
    You should use "tx_byte_allocate" or "tx_block_allocate".
    The former allows allocations of any size, but has lower performance
    (a short sketch follows the manual quote below).

    From the ThreadX User Manual:
    ---------------------------------
    The performance of this service is a function of the block size and the
    amount of fragmentation in the pool. Hence, this service should not be
    used during time-critical threads of execution.
    --------------------------------
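
    For illustration, a minimal byte-pool sketch (the pool name, sizes and wrapper
    function names below are placeholders, not from SSP or the ThreadX sources):

    #include "tx_api.h"

    static TX_BYTE_POOL heap_pool;
    static UCHAR        heap_area[16 * 1024];    /* backing memory; size is an assumption */

    /* Create the pool once, e.g. in tx_application_define() */
    void heap_pool_init(void) {
        tx_byte_pool_create(&heap_pool, "heap pool", heap_area, sizeof(heap_area));
    }

    /* malloc-like allocation of an arbitrary size */
    void *heap_alloc(ULONG size) {
        void *p = NULL;
        return (tx_byte_allocate(&heap_pool, &p, size, TX_WAIT_FOREVER) == TX_SUCCESS) ? p : NULL;
    }

    /* free-like release of a buffer obtained from heap_alloc() */
    void heap_free(void *p) {
        tx_byte_release(p);
    }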
  • Hi Michal,

    >>- what is the correct and supported way to handle this?

    >>- can you point me to the right documentation?

    As I mentioned earlier, you should use the ThreadX memory management functions.
    See the ThreadX User's Manual.

    >>- should we use a big ThreadX byte pool instead of standard heap?
    Yes.

    >>- also, why doesn't SSP handle this internally? Or does it?
    Basically, the SSP modules are independent of the OS library.
    That is why you can select either with an OS or without an OS.
  • Renesas has mentioned this:

    Can I use malloc() in a Synergy Platform project? | Renesas Customer Hub
    en.na4.teamsupport.com/.../18069293

    Sorry for posting a bunch of times.
  • The SSP code does not directly use malloc(), and the use of malloc() in an embedded system is discouraged.
    If you must use malloc, then have a look at :-

    embeddedartistry.com/.../implementing-malloc-with-threadx

    or at the end of this Rulz post :-

    renesasrulz.com/.../when-malloc-fails-allocation-it-doesn-t-return-null
  • In reply to tenballs:

    Hi tenballs,

    thanks for your responses. I'm aware of the ThreadX memory pools, but I don't see any real advantages over a properly implemented malloc/free in our case. I do see some disadvantages, though. Also, I don't think a note somewhere on a web site is proper documentation. This is an important thing, and if people aren't aware of it, it can lead to rare, random and very hard to debug problems. I'd expect at least a small chapter in the standard documentation explaining the possibilities, recommendations and typical use cases.
  • In reply to Jeremy:

    Jeremy,

    yes, I noticed that SSP uses memory pools, and it leads to problems in our case. For example, we use the crypto library extensively and there isn't a good way to find the proper crypto pool size. Oversizing wastes memory and undersizing can easily lead to a deadlock. As a result, we keep all crypto instances open all the time and use the maximum pool size necessary for them, to avoid unpredictable behavior. That wastes memory, and I can only hope it is implemented reasonably and there aren't new allocations during crypto operations. It looks that way, but again I'm missing good documentation here.

    My experience with malloc is the opposite. For example, I've used FreeRTOS with a proper thread safe malloc port, which is what I expected here as well. Well, it is different.

    I've already found your code and tried the __malloc_lock/unlock solution, which seems to work. However, your code is more complicated; could you explain why? For me it was enough to write the two functions above and they were called when needed. What's the purpose of the wrappers you have there?

    Also, you're handling mutex re-entry from the same thread. Why? Aren't ThreadX mutexes recursive? I haven't tried it yet, but it would be a major drawback if they weren't...
  • In reply to Michal:

    How can malloc be 'opposite' to our crypto library? Regardless of which memory allocation tools you use, you have to 'plan for the worst' when it comes to allocating memory for cryptography (or, more technically, certificate processing).

    Also, how is malloc preferable to the ThreadX memory services? Both just allow allocating and releasing memory. I'm curious because the first thing I ever learned about thread-safe programming is DON'T USE MALLOC. And yet so many Synergy programmers cannot seem to bear parting with malloc.

    Janet
  • In reply to JanetC:

    Janet,

    the problem with the crypto library is that I don't know what the worst case is. Is it documented anywhere what the memory requirements are per crypto algorithm? If so, I could simply allocate the sum for all used contexts. Currently I've found it by examining the crypto pool after initialization, but I can't be sure there aren't additional allocations during use. Our tests don't indicate it, but that isn't reliable enough. In addition, there is a wait time parameter for crypto. It is used for both the crypto mutex and the pool allocations. I'd be perfectly fine if crypto returned a "no memory" error when the pool is insufficient (well, it returns a timeout in that case, but I can live with it), but then I'd have to set the wait time to the minimum, which is unusable in a multithreaded environment because it often fails on the crypto lock. So I have to use "infinite" instead and risk a deadlock if the pool is insufficient. I believe there should be two independent parameters.

    Yes, a thread safe malloc is equivalent to the pool services, except that malloc can use all available memory while a ThreadX pool has to be set to a fixed size. Well, maybe the pool could be allocated from the heap during initialization; I haven't tried.

    Well, I don't insist on using malloc, although we would have to change a lot of the code we are porting from a different platform where malloc was perfectly fine. I asked what the correct and supported way is because I haven't found it in the documentation. However, if I use the __malloc_lock/unlock solution, is there any disadvantage compared to one big ThreadX pool? To me it looks the same.

    Michal

  • In reply to JanetC:

    I tested it and checked the ThreadX sources to see whether mutexes are recursive, and they are. So I tried:

    #include "tx_api.h"

    /* MallocMutex is created once with tx_mutex_create(), e.g. in tx_application_define(),
       before the first malloc() call */
    extern TX_MUTEX MallocMutex;

    void __malloc_lock(void) {
        tx_mutex_get(&MallocMutex, TX_WAIT_FOREVER);
    }

    void __malloc_unlock(void) {
        tx_mutex_put(&MallocMutex);
    }

    and it seems to work (and yes, I used asserts to check the return codes). The multithreaded test which previously failed because of heap corruption now passes.

    So... is it a correct and sufficient solution?

  • In reply to Michal:

    Hi Michal,

    >>However, if I use the __malloc_lock/unlock solution, is there any disadvantage compared to one big ThreadX pool?
    Strictly speaking, if you use malloc you must lock/unlock at every buffer access,
    because freeing a buffer in one thread can crash an access in another thread.
    (If you never call free() except at program termination, you don't have to consider that.)

    Furthermore, all allocations are serialized by the single mutex.

    A ThreadX memory pool keeps owner-thread information,
    so you don't have to worry about pool access from any thread
    (a short block-pool sketch follows).
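
    For illustration, a minimal fixed-size block-pool sketch, which is the faster of the two
    services (the names and sizes are placeholders):

    #include "tx_api.h"

    #define MSG_BLOCK_SIZE   64u
    #define MSG_BLOCK_COUNT  32u

    static TX_BLOCK_POOL msg_pool;
    /* each block carries a small pointer-sized overhead, hence the extra space */
    static UCHAR         msg_area[(MSG_BLOCK_SIZE + sizeof(void *)) * MSG_BLOCK_COUNT];

    void msg_pool_init(void) {
        tx_block_pool_create(&msg_pool, "msg pool", MSG_BLOCK_SIZE, msg_area, sizeof(msg_area));
    }

    /* callable from any thread; the pool service does its own internal protection */
    void *msg_alloc(void) {
        void *p = NULL;
        return (tx_block_allocate(&msg_pool, &p, TX_NO_WAIT) == TX_SUCCESS) ? p : NULL;
    }

    void msg_free(void *p) {
        tx_block_release(p);
    }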
  • In reply to Michal:

    In response to your question about the quick example that I posted in this thread :-

    renesasrulz.com/.../when-malloc-fails-allocation-it-doesn-t-return-null

    I stated "however it looks like newlib nano (the default for a Synergy project) doesn't actually make calls to __malloc_lock or __malloc_unlock." this was using GCC compiler 4.9.3.20150529. Hence the need for the wrapper function.

    Using GCC compiler version 7.2.1, malloc from newlib-nano does use the lock functions, so no wrapper functions are required when using GCC 7.2.1.
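
    For reference, a minimal sketch of that wrapper approach for the older toolchain, using the
    GNU linker's --wrap option (build with -Wl,--wrap=malloc -Wl,--wrap=free); the mutex name is
    a placeholder and this is an assumption about the mechanism, not the exact code from the
    linked post (calloc/realloc would need the same treatment):

    #include <stdlib.h>
    #include "tx_api.h"

    extern TX_MUTEX MallocMutex;    /* created with tx_mutex_create() before any thread allocates */

    extern void *__real_malloc(size_t size);
    extern void  __real_free(void *p);

    void *__wrap_malloc(size_t size) {
        void *p;
        tx_mutex_get(&MallocMutex, TX_WAIT_FOREVER);
        p = __real_malloc(size);
        tx_mutex_put(&MallocMutex);
        return p;
    }

    void __wrap_free(void *p) {
        tx_mutex_get(&MallocMutex, TX_WAIT_FOREVER);
        __real_free(p);
        tx_mutex_put(&MallocMutex);
    }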

    With respect to the mutex, the ThreadX documentation states that a thread that already owns a mutex can obtain it again (the ownership count is simply incremented, and a matching number of tx_mutex_put calls is then required), so my example could have been simplified.

  • In reply to Jeremy:

    Thanks. Simplified in the way I did above?

    It seems to work well, but I'm trying to figure out whether I'm missing something, because malloc "works" even without it until extensive multithreaded testing is performed.
  • In reply to tenballs:

    > Strictly speaking, if you use malloc you must lock/unlock at every buffer access.
    Do you mean buffer allocation/deallocation? Sure, but the same applies to memory pools.

    > Furthermore, all allocations are serialized by the single mutex.
    Only for a short while during each call. I guess the same applies to a memory pool, but I haven't checked the implementation.

    It is quite possible there is a small performance difference between a thread safe malloc and the pools, but for our purposes it is unimportant. We don't have real-time constraints in most of the code, and where we do, we use preallocated memory.

    > A ThreadX memory pool keeps owner-thread information.
    I believe that can be useful in some cases, but we don't have such cases.
  • In reply to Michal:

    Getting back to the original complaint of 'wasted memory', you can use the nx_secure_metadata_size_calculate API to calculate the exact memory requirements for crypto. TLS and crypto are expensive operations in time and space, but if you need security you have to deal with these costs.
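
    For illustration, a minimal sketch of that call (the exact prototype and the name of the
    generated ciphersuite table should be checked against the NetX Secure TLS documentation for
    your SSP version; both are assumptions here):

    #include "nx_secure_tls_api.h"

    extern const NX_SECURE_TLS_CRYPTO nx_crypto_tls_ciphers;    /* generated ciphersuite table */

    ULONG tls_crypto_metadata_bytes(void) {
        ULONG metadata_size = 0;
        /* status check omitted in this sketch */
        nx_secure_metadata_size_calculate(&nx_crypto_tls_ciphers, &metadata_size);
        return metadata_size;
    }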

    Note that certificate buffers, e.g. for handling remote certificates and maintaining a trusted certificate store, are not covered by the metadata calculation. But most servers will be economical with their certificates, as the larger the certificate the greater the cost.

    Our TLS implementation is otherwise very memory efficient. Comparisons with mbed TLS and with TLS on FreeRTOS are favorable. Microsoft did some comparison testing: they were not able to run two simultaneous TLS connections on their device with either mbed TLS or FreeRTOS, but were able to with ThreadX/NetX Secure TLS (no dynamic memory allocation required). You won't find a 'leaner' TLS implementation elsewhere.

    Malloc and heap:
    In the opinion of our TLS developer, who has something like 15 years of experience in embedded security development, no and no. malloc isn't really the memory saver most programmers think it is. Under the hood, it is much more expensive than it appears. And you do not need to use heap memory for dynamic memory in the ThreadX environment.

    So my guess is that, as far as NetX Secure/crypto is concerned, you are putting yourself through way too much work trying to out-think our TLS stack. You should be able to work from our NetX Secure demo examples to fine-tune your application.

    Janet
  • In reply to JanetC:

    We are not using your TLS stack. We're using the lower level crypto algorithms such as AES, RSA and SHA256 directly. I can't describe the details here, but our device has rather special requirements and we have to implement the security handling ourselves. However, it was only an example. The memory pools used by SSP are exclusive to a given module, and that memory can't be used by anything else. That may not be a disadvantage in some cases, but it can be in others. We have cases where we need all the crypto algorithms at once, so the pool size has to cover that. In other cases we don't need any of them but need a lot of memory for something else. If crypto used heap memory, it could be reused there, but with a pool it can't be. That's my point.

    > malloc isn't really the memory saver most programmers think it is. Under the hood, it is much more expensive than it appears.

    I don't understand. To me it seems the (memory) overhead here is the same for both approaches. There can be a performance difference but, as I said, that isn't important for us.

    Michal