SSP_ERR_CMD_LOCKED during data flash write and erase

We're using an S5D9 device, and when we use the data flash heavily (we use it for a file system), we get an SSP_ERR_CMD_LOCKED error sooner or later. It usually happens on write, but sometimes also on the erase call. For a write, the sequence is: erase the affected blocks, then write them, so they are guaranteed to be erased. We're NOT using BGO; everything is synchronous and no other thread accesses the flash.

We're using SSP 1.60 with e2studio 7.3.0 on S5D9 development boards, and the error occurs on more than one board.

The error originates in the function HW_FLASH_HP_operation_status_check(), at line 1244, for both write and erase errors:

I don't see what we could be doing wrong there: we just call the APIs and check all return codes, and this is always the first error. It occurs at a different address on every test run. What can we do about it?
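A minimal sketch of the write sequence described above (erase the blocks covering the target range, confirm they are blank, then write, checking every return code), so the control flow can be exercised off-target. The `sim_*` functions are hypothetical RAM-backed stand-ins for the SSP flash driver calls, the 64-byte erase block size is an assumption, and this is not the poster's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes; FLASH_ERR_CMD_LOCKED marks where the real
 * driver would report SSP_ERR_CMD_LOCKED. */
typedef enum {
    FLASH_OK = 0,
    FLASH_ERR_CMD_LOCKED,
    FLASH_ERR_ARG,
    FLASH_ERR_NOT_BLANK
} flash_status_t;

#define SIM_BLOCK_SIZE 64u   /* assumed data flash erase block size */
#define SIM_NUM_BLOCKS 16u
#define SIM_ERASED     0xFFu /* erased flash reads back as 0xFF here */

/* RAM buffer standing in for the data flash. */
static uint8_t sim_flash[SIM_BLOCK_SIZE * SIM_NUM_BLOCKS];

static flash_status_t sim_erase(uint32_t offset, uint32_t num_blocks)
{
    if (offset % SIM_BLOCK_SIZE != 0u ||
        offset / SIM_BLOCK_SIZE + num_blocks > SIM_NUM_BLOCKS) {
        return FLASH_ERR_ARG;
    }
    memset(&sim_flash[offset], SIM_ERASED, num_blocks * SIM_BLOCK_SIZE);
    return FLASH_OK;
}

static bool sim_is_blank(uint32_t offset, uint32_t len)
{
    for (uint32_t i = 0u; i < len; i++) {
        if (sim_flash[offset + i] != SIM_ERASED) {
            return false;
        }
    }
    return true;
}

static flash_status_t sim_write(const uint8_t *src, uint32_t offset, uint32_t len)
{
    if (offset + len > sizeof sim_flash) {
        return FLASH_ERR_ARG;
    }
    memcpy(&sim_flash[offset], src, len);
    return FLASH_OK;
}

/* Erase the blocks covering [offset, offset + len), confirm they are blank,
 * then write. The first non-OK code is returned unchanged so the caller can
 * log exactly which step failed. */
flash_status_t erase_then_write(const uint8_t *src, uint32_t offset, uint32_t len)
{
    uint32_t num_blocks = (len + SIM_BLOCK_SIZE - 1u) / SIM_BLOCK_SIZE;

    flash_status_t err = sim_erase(offset, num_blocks);
    if (err != FLASH_OK) {
        return err;
    }
    if (!sim_is_blank(offset, len)) {
        return FLASH_ERR_NOT_BLANK;
    }
    return sim_write(src, offset, len);
}
```

Stopping at the first non-OK code, rather than continuing, is what makes an observation like "this is always the first error" meaningful when the real driver calls are substituted in.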

  • What is the value of the FSTATR register when the error occurs? You can read it with:

    uint32_t fstatr = R_FACI->FSTATR;

    Also, does the error occur in a particular address range? You say you are using the data flash for a filesystem; does the error occur in the flash area that holds the FAT (assuming you are using a FAT filesystem), or in an area that is used more heavily?

    Do you have any idea how many erase cycles the data flash has been through (or can you estimate it)?
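To interpret the FSTATR value asked about above, a small decoder for the flags discussed in this thread. The bit positions are assumptions based on the FACI high-performance flash controller layout and must be checked against the S5D9 hardware manual; `fstatr_flags_t` and `fstatr_decode()` are hypothetical helpers, not SSP functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed FSTATR bit positions (FACI high-performance flash); verify them
 * against the S5D9 hardware manual before relying on this decoder. */
#define FSTATR_PRGERR   (1UL << 12)  /* programming error            */
#define FSTATR_ERSERR   (1UL << 13)  /* erasure error                */
#define FSTATR_ILGLERR  (1UL << 14)  /* illegal command or access    */
#define FSTATR_FRDY     (1UL << 15)  /* flash sequencer ready        */

typedef struct {
    bool frdy;
    bool ilglerr;
    bool erserr;
    bool prgerr;
} fstatr_flags_t;

/* On target, capture the raw value right after the failing API call, e.g.
 *     uint32_t fstatr = R_FACI->FSTATR;
 * log it in hex, and/or decode it here. */
fstatr_flags_t fstatr_decode(uint32_t fstatr)
{
    fstatr_flags_t f = {
        .frdy    = (fstatr & FSTATR_FRDY)    != 0u,
        .ilglerr = (fstatr & FSTATR_ILGLERR) != 0u,
        .erserr  = (fstatr & FSTATR_ERSERR)  != 0u,
        .prgerr  = (fstatr & FSTATR_PRGERR)  != 0u,
    };
    return f;
}
```

With these assumed positions, a value with only ILGLERR and FRDY set, and no other bits, would read 0x0000C000.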
  • In reply to Jeremy:

    ILGLERR and FRDY flags are set.

    We use our own filesystem, not FAT. I haven't noticed any particular address range; it seems random. However, I have seen cases where the same area was successfully erased and written shortly before, and then the next erase/write failed.

    I estimate tens to low hundreds of erase cycles.

    It does seem to be somewhat related to timing. We can't reproduce it with a debug build; with a release build it is reproducible but not easily, and with a release build plus our traces, which slow things down slightly (we use RTT), it is very easy to reproduce. Still under investigation.
  • In reply to Michal:

    FSTATR when the error occurs:

  • In reply to Michal:

    It really is related to timing. I added 10 ms delays before the write and erase API calls, and now it is easily reproducible in a release build too (without traces). It is still not reproducible in a debug build, even with the same delays. Crazy; I'd expect the delays to have the opposite effect...
  • In reply to Jeremy:

    Another observation: if I set the compiler optimization to debug (-Og) for just the flash HW driver file synergy\ssp\src\driver\r_flash_hp\hw\target\hw_flash_hp.c, the problem can no longer be reproduced. I don't see this as a solution, but it suggests there may be an optimization-related bug in this code.
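To narrow the -Og observation down from a whole file to a single function, GCC (assumed here to be the toolchain in use) supports `#pragma GCC push_options`, `#pragma GCC optimize` and `#pragma GCC pop_options`, which change the optimization level only for the functions between them. The GCC manual describes function-specific optimization as intended for debugging rather than production code, which fits a bisection like this. The function below is a hypothetical stand-in for a status-polling routine, not code from hw_flash_hp.c; its status pointer is `volatile`-qualified so every loop iteration performs a real read.

```c
#include <stdbool.h>
#include <stdint.h>

/* GCC-specific: only the functions between push_options and pop_options
 * are compiled at -Og; the rest of the file keeps the build's level.
 * Other compilers ignore or warn about these pragmas. */
#pragma GCC push_options
#pragma GCC optimize ("Og")

/* Hypothetical polling helper: returns true as soon as any bit in mask is
 * set in *status_reg, or false after max_polls reads. */
bool wait_status_bits(volatile const uint32_t *status_reg,
                      uint32_t mask, uint32_t max_polls)
{
    for (uint32_t i = 0u; i < max_polls; i++) {
        if ((*status_reg & mask) != 0u) {
            return true;
        }
    }
    return false;
}

#pragma GCC pop_options
```

Moving the push/pop pair around one driver function at a time, with the rest of the file at the release level, could show which function's code generation the failure depends on.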
  • In reply to Jeremy:

    I tested it on an S5D5 development board, since the S5D5 is our final target, and was able to reproduce the error there too. It is much harder: I only reproduced it using the easiest case for the S5D9, i.e. a release build with traces and the added delays. Still, the problem exists there, and even if it is very rare, it is a weird one that has to be solved, as it could "kill" devices in the field.

    For now I'd say it is a bug in the HW driver or in the hardware itself, somehow related to timing.
  • In reply to Michal:

    Do you have a project that exhibits the issue that you are able to share, so we can investigate the issue?
  • In reply to Jeremy:

    That's a problem. It is part of our project, which is already rather big and contains a lot of company IP. Most of it is unrelated to this problem, but I'd have to create a new project and extract only the relevant parts into it, which could take a few days that I don't have right now...

    However, I'm willing to examine anything you need and/or run experiments. Just tell me what to try.

    What does the ILGLERR flag mean, and how can it happen? I don't see this flag documented in the HW manual...

    BTW, we successfully reproduced it with a brand-new S5D9 board, so it isn't a flash wear issue.

  • In reply to Michal:

    The ILGLERR flag indicates that the flash sequencer has detected either an illegal command (or command sequence) or an illegal flash memory access (i.e. an attempt to program or erase outside a valid flash area).
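Since ILGLERR also covers out-of-range accesses, a sketch of the kind of area-validity check the thread mentions. The data flash base address (0x40100000), size (64 KB), 64-byte erase block and 4-byte write unit are assumed values for the S5D9 and should be confirmed against the hardware manual; `df_erase_range_ok()` and `df_write_range_ok()` are hypothetical helpers. Given that the posters report valid ranges, a check like this rules out the out-of-range cause rather than explaining the error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed S5D9 data flash layout; confirm against the hardware manual. */
#define DF_START       0x40100000UL
#define DF_SIZE        0x10000UL    /* 64 KB */
#define DF_BLOCK_SIZE  64UL         /* erase unit   */
#define DF_WRITE_UNIT  4UL          /* program unit */

/* True if num_blocks blocks starting at addr are block-aligned and lie
 * entirely inside data flash. */
bool df_erase_range_ok(uint32_t addr, uint32_t num_blocks)
{
    if (addr < DF_START) {
        return false;
    }
    uint32_t offset = addr - DF_START;
    if (offset >= DF_SIZE || offset % DF_BLOCK_SIZE != 0u) {
        return false;
    }
    uint32_t blocks_left = (DF_SIZE - offset) / DF_BLOCK_SIZE;
    return num_blocks > 0u && num_blocks <= blocks_left;
}

/* True if [addr, addr + num_bytes) is write-unit aligned, a whole number of
 * write units long, and entirely inside data flash. */
bool df_write_range_ok(uint32_t addr, uint32_t num_bytes)
{
    if (addr < DF_START) {
        return false;
    }
    uint32_t offset = addr - DF_START;
    if (offset >= DF_SIZE || offset % DF_WRITE_UNIT != 0u) {
        return false;
    }
    if (num_bytes == 0u || num_bytes % DF_WRITE_UNIT != 0u) {
        return false;
    }
    return num_bytes <= DF_SIZE - offset;
}
```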
  • In reply to Jeremy:

    Also, do you call the Flash API from multiple threads?
  • In reply to Jeremy:

    No, we can reproduce it from the unit test, which runs during initialization while all other threads wait for it. We also use a mutex as the data flash access lock.

    The flash area used is valid; this was the first thing I checked. In fact, our functions always check area validity before calling the flash API. In addition, we call the blankCheck API for the range before writing, and it passes. The code looks like this:

  • In reply to Michal:

    BTW, why am I never notified about your answers even though I have notifications enabled? I receive a lot of spam from Renesas, but not the important notifications...
  • In reply to Jeremy:

    Another observation: I tried recovery. After a failed write, I erase the whole area and write again; after a failed erase, I just repeat it. This works: so far the retry has always succeeded on the first attempt, using the same parameters as the original call, so I don't see how our code could be wrong. Instead, it looks like a weird timing problem in the HW driver code or in the hardware itself.
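A sketch of the recovery policy just described: on a failed write, erase the whole area and write it again; on a failed erase, repeat it. The number of attempts is bounded so that a persistent fault is still reported. `area_ops_t`, the `op_status_t` codes and the fault-injecting mock are hypothetical; on target, the hooks would wrap the SSP erase and write calls for the area.

```c
/* Hypothetical result codes standing in for ssp_err_t values. */
typedef enum { OP_OK = 0, OP_CMD_LOCKED, OP_FAIL } op_status_t;

/* Hypothetical hooks; on target these would wrap the flash API calls. */
typedef struct {
    op_status_t (*erase_area)(void *ctx);
    op_status_t (*write_area)(void *ctx);
    void *ctx;
} area_ops_t;

/* Erase-then-write with recovery. Returns OP_OK on success, otherwise the
 * last error seen after max_attempts tries. */
op_status_t write_area_with_recovery(const area_ops_t *ops, unsigned max_attempts)
{
    op_status_t err = OP_FAIL;  /* returned as-is if max_attempts == 0 */
    for (unsigned attempt = 0u; attempt < max_attempts; attempt++) {
        err = ops->erase_area(ops->ctx);
        if (err != OP_OK) {
            continue;           /* failed erase: just repeat it */
        }
        err = ops->write_area(ops->ctx);
        if (err == OP_OK) {
            return OP_OK;
        }
        /* failed write: the next iteration erases the whole area again */
    }
    return err;
}

/* Test double: fails the first fail_erases erases and fail_writes writes
 * with OP_CMD_LOCKED, and counts every call. */
typedef struct {
    unsigned fail_erases, fail_writes;
    unsigned erases, writes;
} mock_area_t;

static op_status_t mock_erase_area(void *ctx)
{
    mock_area_t *m = ctx;
    m->erases++;
    if (m->fail_erases > 0u) {
        m->fail_erases--;
        return OP_CMD_LOCKED;
    }
    return OP_OK;
}

static op_status_t mock_write_area(void *ctx)
{
    mock_area_t *m = ctx;
    m->writes++;
    if (m->fail_writes > 0u) {
        m->fail_writes--;
        return OP_CMD_LOCKED;
    }
    return OP_OK;
}
```

With one injected erase failure and one injected write failure, the wrapper needs three erases and two writes to succeed; a fault that never clears is returned to the caller after `max_attempts`.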

    The number of errors isn't small: usually 3-5 during a 10-second erase/write test. Errors in one test (the number in <> is the time in seconds since boot, not since the start of the test):

  • In reply to Michal:

    Did you find the cause of this error? I also started seeing it again just recently on an S5D5 after updating from SSP 1.5.0 to 1.7.0 (no idea if this matters). I had this problem some time ago and tried to solve it by repeatedly calling erase, and possibly reset, when this error occurs. However, that no longer seems to be enough. I also see the same FSTATR value you posted above.

  • In reply to ChrisS:

    In my case the error seems to occur in block 387. I've tried this on two devices, so I suspect flash wear is not the issue. The second test device, which I have with me right now, has had approximately 5000 writes.

    Right now I also believe this is some sort of timing error. I had been calling erase() with num_blocks=34, and the error occurred at block 2 or 3. Now I loop over the blocks, call erase once per block, and retry up to ten times on failure (calling reset() first to clear the status). This appears to solve the issue.
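A sketch of the per-block workaround described above: erase one block per call and, on failure, call reset() and retry up to ten times. The `erase_ops_t` hooks, status codes and mock are hypothetical stand-ins for the SSP erase() and reset() calls, and the 64-byte block size is assumed. The workaround avoided the symptom in the poster's tests; it does not identify the underlying cause.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for ssp_err_t values. */
typedef enum { FL_OK = 0, FL_CMD_LOCKED, FL_FAIL } fl_status_t;

/* Hypothetical hooks: erase_one would wrap erase(p_ctrl, address, 1) and
 * reset would wrap reset(p_ctrl) on target. */
typedef struct {
    fl_status_t (*erase_one)(void *ctx, uint32_t address);
    fl_status_t (*reset)(void *ctx);
    void *ctx;
} erase_ops_t;

#define BLOCK_SIZE   64u  /* assumed data flash erase block size */
#define MAX_RETRIES  10u  /* retry count used in the post above  */

/* Erase num_blocks blocks one at a time instead of in a single call. A
 * failing block is retried up to MAX_RETRIES times, calling reset() first
 * to clear the error state. Returns the last error if a block never erases. */
fl_status_t erase_blocks_individually(const erase_ops_t *ops,
                                      uint32_t address, uint32_t num_blocks)
{
    for (uint32_t b = 0u; b < num_blocks; b++) {
        uint32_t block_addr = address + b * BLOCK_SIZE;
        fl_status_t err = ops->erase_one(ops->ctx, block_addr);
        for (unsigned retry = 0u; err != FL_OK && retry < MAX_RETRIES; retry++) {
            (void)ops->reset(ops->ctx);
            err = ops->erase_one(ops->ctx, block_addr);
        }
        if (err != FL_OK) {
            return err;
        }
    }
    return FL_OK;
}

/* Test double: erasing bad_addr fails failures_left times. */
typedef struct {
    uint32_t bad_addr;
    unsigned failures_left, erases, resets;
} erase_mock_t;

static fl_status_t mock_erase_one(void *ctx, uint32_t address)
{
    erase_mock_t *m = ctx;
    m->erases++;
    if (address == m->bad_addr && m->failures_left > 0u) {
        m->failures_left--;
        return FL_CMD_LOCKED;
    }
    return FL_OK;
}

static fl_status_t mock_reset(void *ctx)
{
    erase_mock_t *m = ctx;
    m->resets++;
    return FL_OK;
}
```

A block that still fails after ten retries is reported to the caller instead of being skipped, so a genuinely failing block is not silently lost.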