Issues when using FileX USB mass storage to write to certain types of USB drives - SSP 1.4, S7G2

I am using SSP 1.4 on an S7G2.

I have been running into an issue with FileX on USB Mass Storage which appears to depend on the type of USB drive that is connected. I set up a test that repeatedly creates, opens, writes and closes files to test the timing and durability of writing to the USB drive. Each file created and written is roughly 120kB to 180kB of logged data. When I run the test on a 64GB USB 3.1 drive (Samsung BAR Plus) I do not see any issues. However, when I run the same test with an inexpensive (no brand name) 128MB USB drive, the thread eventually hangs; typically it fails to return from an fx_file_write call. I have tried multiple drives of the same type to rule out a single bad USB device. Most times I run the test I am able to write 3 to 10 files before the process fails to complete; occasionally it has failed on the first file, and sometimes it has completed 20 files without issue.

I understand that a FileX operation may occasionally fail, but since it never returns, I am unable to continue using the USB device and the thread. The thread is left in a “Semaphore Suspended” state. Is there any recommended way to keep the thread alive, retry or abort?

The following is an overview of the process I have implemented

Opening the File -

  1. status = fx_media_space_available(g_fx_media0_ptr, &ulSpaceAvailable);
    • if ulSpaceAvailable < 512kB, I will not attempt to create the file
  2. status = fx_file_create(g_fx_media0_ptr, (char *)FileName);
  3. status = fx_file_open(g_fx_media0_ptr, ptrFile, (char *)FileName, FX_OPEN_FOR_WRITE);
  4. status = fx_file_write_notify_set(ptrFile, write_complete);
    • I have tried with and without setting the callback with no noticeable difference. The write notify set callback sets a bool flag “bWriteComplete = true”.
  5. status = fx_file_seek(ptrFile, 0);

If any of the previous commands returns a status other than FX_SUCCESS, I stop the process and go to closing the file.

Writing to File – This process will loop through up to 99 times to print out all of the log information.  The file buffer and write length can be up to 4096 bytes

  1. status = fx_file_allocate(ptrFile, ulWriteLength);
    • I have tried with and without file allocate with no noticeable difference
  2. bWriteComplete = false;
    • If the write notify callback is not set then I will skip this step
  3. status = fx_file_write(ptrFile, FileBuffer, ulWriteLength);
  4. while(bWriteComplete == false);
    • If the write notify callback is not set then I will skip this step

If any of the previous commands returns a status other than FX_SUCCESS, I stop the process and go to closing the file.
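
One caveat on the wait in step 4: as far as the FileX documentation describes it, the write-notify callback is invoked only after a successful write, so if fx_file_write fails the flag may never be set and the loop on bWriteComplete would spin forever. A minimal sketch of a bounded wait is shown here. The names wait_for_flag, max_polls and idle are hypothetical and not part of FileX; on the target, idle might be a function that sleeps for one tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: poll a completion flag a bounded number of times
 * instead of spinning forever. The flag is volatile because it is set
 * from a callback. Returns true if the flag was seen set. */
bool wait_for_flag(volatile bool *flag, uint32_t max_polls, void (*idle)(void))
{
    assert(flag != NULL);

    for (uint32_t i = 0; i < max_polls; i++) {
        if (*flag) {
            return true;
        }
        if (idle != NULL) {
            idle();   /* give other threads a chance to run */
        }
    }
    return *flag;     /* one last look after the final idle period */
}
```

If the helper returns false, the write can be treated as failed and the process can go straight to closing the file.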

Closing File

  • status = fx_file_close(ptrFile);
  • fx_media_flush(g_fx_media0_ptr);
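
The overall flow described above (open, write chunks until the first failure, then always close) can be sketched as follows. This is only an illustration under stated assumptions: the FileX calls are hidden behind function pointers so the sketch builds on a host, and log_ops, write_log and the fake_* test doubles are hypothetical names. STATUS_OK stands in for FX_SUCCESS (0x00).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_OK 0u              /* stands in for FX_SUCCESS (0x00) */

typedef uint32_t (*step_fn)(void *ctx);

typedef struct {
    step_fn open;    /* e.g. space check + create + open + seek */
    step_fn write;   /* e.g. one allocate + write of a single chunk */
    step_fn close;   /* e.g. close + media flush */
} log_ops;

/* Write up to `chunks` chunks, stop at the first failure, always close.
 * Returns the first non-OK status seen, or the close status otherwise. */
uint32_t write_log(const log_ops *ops, void *ctx, uint32_t chunks,
                   uint32_t *chunks_written)
{
    assert(ops != NULL);

    uint32_t done = 0u;
    uint32_t status = ops->open(ctx);

    while (status == STATUS_OK && done < chunks) {
        status = ops->write(ctx);
        if (status == STATUS_OK) {
            done++;
        }
    }

    uint32_t close_status = ops->close(ctx);   /* attempted even on error */

    if (chunks_written != NULL) {
        *chunks_written = done;
    }
    return (status != STATUS_OK) ? status : close_status;
}

/* Test doubles (hypothetical): a "drive" whose write fails once
 * fail_at chunks have been written. UINT32_MAX means it never fails. */
typedef struct {
    uint32_t writes;
    uint32_t fail_at;
    uint32_t closes;
} fake_drive;

uint32_t fake_open(void *ctx)
{
    (void)ctx;
    return STATUS_OK;
}

uint32_t fake_write(void *ctx)
{
    fake_drive *d = (fake_drive *)ctx;
    if (d->writes == d->fail_at) {
        return 0x7Fu;             /* arbitrary non-zero error value */
    }
    d->writes++;
    return STATUS_OK;
}

uint32_t fake_close(void *ctx)
{
    fake_drive *d = (fake_drive *)ctx;
    d->closes++;
    return STATUS_OK;
}
```

Keeping the status of the first failing call, rather than the status of the close, makes it easier to tell afterwards which step went wrong.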

I am using the USBHS Controller, the USB High Speed Interrupt Priority is set to 4, USBX Pool Memory Size is set to 65536. 

I have tried to put a delay of up to 60 seconds between each time the process is called without success.

In the past I had issues where fx_file_write appeared to be interfering with my SPI Interrupts causing missed readings from the SPI device.  I changed some priorities and now I am not missing readings but it appears the fx_file_write is not completing.  I am not certain if it is related at this point.

Are there any known issues with certain types of USB devices?  Are there any recommendations on how to keep the thread alive?

  • A few additional notes:
    I have tried increasing the stack size of the thread calling fx_file_write to 8192 (much larger than it should be). I also confirmed the stack has not overrun and has the known pattern 0xEF.
    I have tried increasing the stack size for the Mass Storage Class internal thread, the USBX threads, and the Enumeration thread from 1024 to 2048 each.
    I have also tried changing the thread priorities from 20 to 6.
    I am out of ideas where to go from here.
  • Hi Kurt

    I can almost guarantee that this is not a FileX issue but rather something wrong with the flash drive or USB. I think the problem in general is “low end USB flash drives.” These are likely not within spec, and just barely work. A percentage of the super cheap 4GB ThreadX flash drives that we (Express Logic) used to give away at trade shows and such don’t work, or don't work for long.

    I will look into how to abort the thread suspended on the fx_file_write call.

    Are you constrained to use FileX with low end USB drives? There are other ways to download data to USB.

    Janet
  • In reply to KurtK:

    Try to get in contact with Express Logic directly. We found some FileX bugs and they sent us an up to date source package with a new FileX. The FileX with the SSP distro may not be the most recent.
  • In reply to JanetC:

    Hi again

    You can use tx_thread_wait_abort to abort the wait condition on the thread. That will keep the thread alive at least. I noticed your USBx memory was set to 65k. Can you try doubling that? I don't think this is the problem, but with FileX/mass storage involved, it seems USBx requires a much larger memory pool. Of course, if that were the cause it would also show up with the higher quality USB drives, but it is not a difficult thing to try.
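
    One way to decide when to call tx_thread_wait_abort is to have a separate supervisor thread track how long the current write has been in flight. The sketch below shows only the deadline bookkeeping, so it builds without ThreadX; write_watch and the watch_* functions are hypothetical names. On the target, the `now` argument would presumably come from tx_time_get(), and the supervisor would call tx_thread_wait_abort on the writer thread when watch_expired returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping shared by the writer and a supervisor thread. */
typedef struct {
    volatile bool     active;      /* true while a write is in flight */
    volatile uint32_t start_tick;  /* tick count when the write began */
} write_watch;

/* Writer thread: call just before fx_file_write. */
void watch_begin(write_watch *w, uint32_t now)
{
    assert(w != NULL);
    w->start_tick = now;
    w->active = true;
}

/* Writer thread: call as soon as fx_file_write returns. */
void watch_end(write_watch *w)
{
    assert(w != NULL);
    w->active = false;
}

/* Supervisor thread: true once a write has been in flight for at least
 * limit_ticks. Unsigned subtraction keeps the comparison correct when
 * the tick counter wraps around. */
bool watch_expired(const write_watch *w, uint32_t now, uint32_t limit_ticks)
{
    assert(w != NULL);
    return w->active && (uint32_t)(now - w->start_tick) >= limit_ticks;
}
```

    Whether FileX and USBX are left in a usable state after the wait is aborted is not established in this thread, so it seems safest to treat the drive as unusable after an abort until it has been re-inserted.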

    Do you have access to FileX source code? Our in house FileX expert will be back in the office next week and I can ask him what he suggests.
  • In reply to JanetC:

    There was a FileX bug in fx_file_write that was fixed in 5.5sp1. SSP 1.4.0 has 5.5. But that bug did not cause fx_file_write to hang up. It occurred if fault tolerance was enabled, and caused incorrect data writes. Don't bother getting in contact with Express Logic directly because 1) they will redirect you to Renesas support as they are told to do, and 2) I am Express Logic support.
  • In reply to JanetC:

    JanetC,

    Thanks for all of the information! I am not constrained to using low end USB drives; I was just doing some testing with what I had sitting around and found the issue. It was concerning that it appeared to hang the thread, and I didn't want a user to run into the issue in the field.

    After some more testing last night and this morning I actually discovered that it was not hanging but taking a very long time to time out and return.  I believe the timeout time involves a combination of the following settings.  

    • "Timeout in mili second for a BOT transfer request" = 100000
    • "Timeout in mili second for the status from a command in the Control/Bulk/Interrupt transport" = 30000

    After a period of time the fx_file_write does return with 0x04 (FX_NOT_FOUND) which isn't a return value listed in the FileX documentation.
    After fx_file_write returns I then try fx_file_close followed by fx_media_flush; both also take a long time to time out, and both return an error code that is not listed in the documentation either.

    So it appears the thread does return after a very long time, but I don't believe the USB drive is usable again until a power cycle or until the user removes the drive and plugs it back in. I have to find a way to catch the error condition and prevent the write attempt which causes the thread to hang.

  • In reply to JanetC:

    I have tried to increase the USBX memory to roughly 100k with no difference.  

    Are there any guidelines or specifications as to which USB drives may have issues? The product I am working on would not be accessible by the user, and rebooting every time a write error occurs would not be acceptable.

    I do not currently have the FileX source code included; I do have the USBX Host Class Mass Storage Source and USBX Source included.

    I thought that slowing down the transfer speed might help, so I put 50ms delays after each fx_file_write and reduced the "Maximum transfer size in bytes in one BOT data-transport phase" from 1024 to 512 bytes, but it did not appear to help.
    I would really like the system to work with any drive the user may connect so I am open to any suggestions you may have.

    Thanks,
    Kurt

  • In reply to KurtK:

    Hi Kurt

    It is hard to keep our documentation up to date. Part of the problem is an API might call internal functions which in turn call more internal functions, so it is hard to track down all the possible error statuses that can be returned.

    However, in this case FX_NOT_FOUND is returned directly by fx_file_write when it tries to find the previous cluster of a copy head cluster. But that should not take 100 seconds to return.

    Either way, I agree your application needs to find a way to prevent these long hangs. There are definitely a lot of cheap USB devices out there that you cannot 'fix' on the FileX/USBx side, so you will need to abort the file write in a timely manner. That can only be done with USBx. By the way, did you try tx_thread_wait_abort on the suspended thread?

    Our in-house USBx expert comes back tomorrow. I will ask him what he thinks is the best way to handle this.

    Janet
  • In reply to JanetC:

    Hi again

    I forgot to ask, what do you mean by "I have to find a way to catch the error condition and prevent the write attempt which causes the thread to hang"? Are you asking whether there is a way for FileX or USBx to tell ahead of time that fx_file_write will suspend, or, having suspended once, to block further FileX calls on the device until it is power cycled?
  • In reply to JanetC:

    I would like to find a reliable way to determine if the USB write has failed so I can stop using the flash drive and alert the user. It might be the case that I simply catch a FX_NOT_FOUND and set a flag that the USB is disabled. I would be curious to know if there is another way you would recommend to determine the state of the USB drive before I try to access it since the commands take a long time to timeout and return.
    Thanks again for your help,
    Kurt
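
    The "catch the error and set a flag" idea above could look roughly like the sketch below: the first failed FileX call latches a fault, further writes are refused, and the latch is cleared only when the application sees the drive re-inserted. drive_guard and the guard_* functions are hypothetical names, and how re-insertion is detected is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical latch that remembers the first failure on a drive. */
typedef struct {
    bool     faulted;      /* set by the first non-zero status */
    uint32_t last_status;  /* the status code that tripped the latch */
} drive_guard;

/* Feed every FileX return status through this. FX_SUCCESS is 0x00,
 * so any non-zero value latches the fault; later errors do not
 * overwrite the first one. */
void guard_record(drive_guard *g, uint32_t status)
{
    assert(g != NULL);
    if (status != 0u && !g->faulted) {
        g->faulted = true;
        g->last_status = status;
    }
}

/* Check this before starting a new file or write. */
bool guard_may_write(const drive_guard *g)
{
    assert(g != NULL);
    return !g->faulted;
}

/* Call when the drive has been removed and plugged back in. */
void guard_reset(drive_guard *g)
{
    assert(g != NULL);
    g->faulted = false;
    g->last_status = 0u;
}
```

    This does not avoid the first long timeout, but it should keep the thread from blocking on every later call and gives the application a single place to raise a user alert.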
  • In reply to KurtK:

    Kurt

    Ok, that makes sense, thanks.

    If you can send us a capture of the data between your Host and the devices that are suspending, that should enable us to see where the delay/hangup is happening. We use the Beagle TotalPhase analyzer for USBx tracing, but any packet trace is fine as long as there is a packet viewer available for it. If you don't have this, I am reasonably sure that our USBx guy can reproduce what you are seeing using some of our own cheapie USBx drives.

    Janet
  • In reply to JanetC:

    Hi Kurt

    We think the problem affects a USB transfer which is timing out. If this is the case, reducing the timeout value UX_HOST_CLASS_STORAGE_TRANSFER_TIMEOUT in ux_host_class_storage.h would shorten the lengthy timeout. It’s currently set to 10,000 ThreadX ticks (100 seconds).

    The ux_host_class_storage.h file is found in [Project Directory]\synergy\ssp\inc\framework\el. Build your project at least once, then modify this option in the file to something like 500 or 1000 ticks. I don't think you need to make the file read-only or turn off the Synergy Builder option in the project properties.
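
    For reference, the arithmetic behind those tick values can be sketched as below. It assumes 100 ThreadX ticks per second, which is what the "10,000 ticks (100 seconds)" figure implies; if the project uses a different tick rate, TICKS_PER_SECOND has to change. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 100 ticks per second (10 ms per tick), consistent with
 * 10,000 ticks being 100 seconds. */
#define TICKS_PER_SECOND 100u

/* Convert a tick count to milliseconds. */
uint32_t ticks_to_ms(uint32_t ticks)
{
    uint64_t ms = ((uint64_t)ticks * 1000u) / TICKS_PER_SECOND;
    assert(ms <= UINT32_MAX);     /* result must fit in 32 bits */
    return (uint32_t)ms;
}

/* Convert milliseconds to ticks, rounding up so that a timeout is
 * never shorter than requested. */
uint32_t ms_to_ticks(uint32_t ms)
{
    uint64_t ticks = ((uint64_t)ms * TICKS_PER_SECOND + 999u) / 1000u;
    return (uint32_t)ticks;
}
```

    Under that assumption, 500 ticks is 5 seconds and 1000 ticks is 10 seconds, compared with 100 seconds for the current value.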

    See if reducing that value improves the super long wait to return from fx_file_write() and the other calls fx_media_flush, fx_file_close that you reported also taking a long time to return.

    We would need a packet trace from you to know exactly where it was hanging up (I mentioned this yesterday), to find out if there might be a way to check the storage state of the device. Otherwise, USBX doesn’t have a generic method for doing so.

    Janet
  • In reply to JanetC:

    Thanks again Janet,
    I will try to adjust the timeout value and do some more testing.
    I don't currently have the analyzer hardware since this is my first project using USB so I will have to work on getting the hardware to get you a trace.
    In the meantime I have ordered and am testing a few different USB drives from SanDisk, Kingston and PNY to verify they don't have any issues. Then I can specify to the user which drives are tested and supported.
  • In reply to KurtK:

    The Beagle analyzer ain't cheap but well worth it when you consider it is your best debugging tool, and developer/development time is more expensive in the long run. Anyway, let me know what you find out.

    Janet