How do I use both internal SRAM and external SDRAM for malloc on RZA1?

We are using:

  • 3.12.x kernel
  • XIP for kernel base libraries
  • SPI flash for our application
  • external SDRAM for Linux
  • (parts of) internal SRAM for frame buffer

We would like to make the unused portion of the on-chip RAM available to malloc, basically to complement the external SDRAM, preferably in a manner that is completely transparent to any user space code that uses new or malloc.

So if I'm using 32MB external SDRAM, and I'm using 2MB out of 10MB for framebuffer, I'd like to end up with 32+(10-2)=40MB of RAM that is available to and managed by the Linux kernel and thus available for malloc etc.

Is this at all possible in Linux for RZ?  If so, is it possible in the 3.12 kernel and what exactly would need to go into the device tree and other files?

If this cannot be made 100% transparent to Linux user space, I guess the next best thing would be to replace malloc with a custom implementation that is aware of the SRAM (and has mapped it into the user space address space via mmap). I think I understand how to override the default malloc so that the application is not even aware that a custom one is used, and I understand conceptually what this custom malloc would have to do, but I am struggling to find a library that provides a heap allocation implementation as good as malloc in terms of its ability to deal with fragmentation. So any pointers to a good heap allocator library would also be appreciated.
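
As a rough illustration of the bookkeeping such a custom allocator needs, here is a minimal first-fit free-list model in Python. It only tracks offsets inside a fixed-size region (standing in for the mmap'd SRAM window). The class name `RegionAllocator`, its methods, and the 8-byte alignment are invented for this sketch; a real allocator would need much better fragmentation handling than this.

```python
class RegionAllocator:
    """First-fit allocator over a fixed-size region; tracks offsets only."""

    def __init__(self, size, align=8):
        self.align = align
        self.free = [(0, size)]   # sorted list of (offset, length) holes
        self.used = {}            # offset -> length of each live block

    def alloc(self, nbytes):
        # Round the request up to the alignment (at least one unit).
        need = max(1, -(-nbytes // self.align)) * self.align
        for i, (off, length) in enumerate(self.free):
            if length >= need:
                if length == need:
                    del self.free[i]
                else:
                    self.free[i] = (off + need, length - need)
                self.used[off] = need
                return off
        return None               # region exhausted; caller falls back to the normal heap

    def release(self, off):
        length = self.used.pop(off)
        self.free.append((off, length))
        self.free.sort()
        # Coalesce neighbouring holes so the region does not stay fragmented.
        merged = []
        for o, l in self.free:
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((o, l))
        self.free = merged
```

Returning `None` when the region is full is a deliberate choice here, so that a wrapper could fall back to the regular heap in SDRAM.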

 

  • Yes, it would be transparent. You just need to add the additional memory block to the Device Tree.

    Have a look at this post:
    renesasrulz.com/.../20323

    I think I remember that when you add multiple memory blocks, the kernel wants them in order from lowest address to highest. So your internal memory (at 0x20000000) would go after your SDRAM memory (at either 0x08000000 or 0x0C000000).
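
The ordering rule above can be sketched with a small helper that sorts the blocks by base address and prints them as `reg` cells. The function name `memory_reg` is made up for this example, and the internal RAM entry (8MB starting 2MB into 0x20000000) is illustrative only; which internal RAM ranges are actually safe to hand to the kernel depends on where the frame buffer sits in the chip's bank layout.

```python
MIB = 1 << 20

def memory_reg(banks):
    """Format (base, size) pairs as a Device Tree reg property, lowest base first."""
    cells = [f"0x{base:08x} 0x{size:08x}" for base, size in sorted(banks)]
    return "reg = <" + " ".join(cells) + ">;"

# Hypothetical layout: 32MB SDRAM plus 8MB of internal RAM, listed out of order on purpose.
banks = [(0x20200000, 8 * MIB), (0x0C000000, 32 * MIB)]
print(memory_reg(banks))
```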
  • In reply to Chris:

    Hi,
    Thanks for the reply. We have some more questions:
    1)
    Do we need to enable CONFIG_SRAM in the kernel as well?

    2)
    After adding this SRAM memory block to the device tree, we want to validate the internal memory block once it appears as "free memory".
    Can you suggest any utilities in the kernel or user space that we can use to validate this internal SRAM block?
    Should we enable CONFIG_MEMTEST in the kernel for this?
  • In reply to Praveen:

    1) no
    2) yes, try CONFIG_MEMTEST
  • In reply to Chris:

    I decided to try "memtester" to test the RAM in user space, instead of rebuilding the kernel.

    It looks like this is basically working, i.e. I can see and access 32+8=40MB of RAM.   We leave the first 2 MB for frame buffer use.  After booting the kernel, 35MB are available to user space apps and memtester reports no issues when telling it to do a test on 35MB.

    We have an application with a custom screen driver that accesses the framebuffer memory as well as the corresponding config registers directly (we mmap() the physical addresses such as 0x60000000 etc. into our application's address space). Our screen driver does double buffering. The first frame buffer at 0x60000000 is the one that's always visible, and screen manipulation takes place in the 2nd buffer, with a memcpy() from the 2nd buffer into the visible buffer whenever screen updates are done. When I run memtester while our application is running, the following happens:

    1) memtester reports an error pretty early on
    2) an artifact appears on the screen always in the same place
    3) if I use our application (e.g. switch to a different screen in the UI), more artifacts show up on screen and then the system usually crashes

    Questions:

    a) do you see anything suspicious in the output below, including /proc/iomem ?
    b) exactly what's the difference between 0x20000000 and 0x60000000 - just an alias?
    c) is the additional system RAM which we now use at 0x20200000 cached or uncached?

     

     

    boot message:


    Memory: 40072K/40960K available (3145K kernel code, 161K rwdata, 680K rodata, 146K init, 153K bss, 888K reserved)
    Virtual kernel memory layout:
        vector  : 0xffff0000 - 0xffff1000   (   4 kB)
        fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
        vmalloc : 0xd5000000 - 0xff000000   ( 672 MB)
        lowmem  : 0xc0000000 - 0xd4a00000   ( 330 MB)
          .text : 0xbf000000 - 0xbf3bc7cc   (3826 kB)
          .init : 0xc000a000 - 0xc0013000   (  36 kB)
          .data : 0xc0008000 - 0xc0039540   ( 198 kB)
           .bss : 0xc0039540 - 0xc005fa00   ( 154 kB)
       

    $ cat /proc/iomem
    0c000000-0dffffff : System RAM
      0c008000-0c05f9ff : Kernel data
    18000000-1bffffff : physmap-flash.0
      18000000-1bffffff : physmap-flash.0
    20200000-209fffff : System RAM
    3fefb000-3fefb0ff : spibsc.1
    60000000-60176fff : vdc5fb.0: fb
    60800000-608fffff : jcu:iram  (this overlaps w/20200000-209fffff - is that OK? we don't use the jcu) 
    etc...
    etc...
    etc...

     

       
    $ echo 3 > /proc/sys/vm/drop_caches
    $ free
                 total         used         free       shared      buffers
    Mem:         40108         4300        35808            0           20
    -/+ buffers:               4280        35828
    Swap:            0            0            0
    $

    $ ./memtester 35 4
    memtester version 4.3.0 (32-bit)
    Copyright (C) 2001-2012 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    pagesize is 4096
    pagesizemask is 0xfffff000
    want 35MB (36700160 bytes)
    got  34MB (35721216 bytes), trying mlock ...locked.
    Loop 1/4:
      Stuck Address       : ok
      Random Value        : ok
      Compare XOR         : ok
      Compare SUB         : ok
      Compare MUL         : ok
      Compare DIV         : ok
      Compare OR          : ok
      Compare AND         : ok
      Sequential Increment: ok
      Solid Bits          : ok
      Block Sequential    : ok
      Checkerboard        : ok
      Bit Spread          : ok
      Bit Flip            : ok
      Walking Ones        : ok
      Walking Zeroes      : ok
      8-bit Writes        : ok
      16-bit Writes       : ok
     

    At this point I'm starting our application.


    $ free
                 total         used         free       shared      buffers
    Mem:         40108        12180        27928            0           20
    -/+ buffers:              12160        27948
    Swap:            0            0            0

    $ ./memtester 27 1
    memtester version 4.3.0 (32-bit)
    Copyright (C) 2001-2012 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    pagesize is 4096
    pagesizemask is 0xfffff000
    want 27MB (28311552 bytes)
    got  27MB (28311552 bytes), trying mlock ...locked.
    Loop 1/1:
      Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x009bd270.
    Skipping to next test...
      Random Value        : FAILURE: 0xf7bef7be != 0x7fbe9e76 at offset 0x009bd270.
    FAILURE: 0xf7bef7be != 0x7bfc1d77 at offset 0x009bd274.
    FAILURE: 0xf7bef7be != 0xc3afe114 at offset 0x009bd278.
    FAILURE: 0xf7bef7be != 0x9dff9204 at offset 0x009bd27c.
    FAILURE: 0xf7bef7be != 0x368dfcb8 at offset 0x009bd280.
    FAILURE: 0xf7bef7be != 0xbfb74e5a at offset 0x009bd284.

     

     

     

    $ ./memtester 27 1
    memtester version 4.3.0 (32-bit)
    Copyright (C) 2001-2012 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    pagesize is 4096
    pagesizemask is 0xfffff000
    want 27MB (28311552 bytes)
    got  27MB (28311552 bytes), trying mlock ...locked.
    Loop 1/1:
      Stuck Address       : testing   1FAILURE: possible bad address line at offset 0x00e287e8.
    Skipping to next test...
      Random Value        : \BUG: Bad page map in process memtester  pte:f7bef7be pmd:20533831
    addr:b6200000 vm_flags:00102073 anon_vma:c12b4da0 mapping:  (null) index:b6200
    CPU: 0 PID: 8972 Comm: memtester Not tainted 3.14.28-ltsi #100
    [<bf009b00>] (unwind_backtrace) from [<bf0081a8>] (show_stack+0x10/0x14)
    [<bf0081a8>] (show_stack) from [<bf06fc2c>] (print_bad_pte+0x158/0x18c)
    [<bf06fc2c>] (print_bad_pte) from [<bf0728b8>] (handle_mm_fault+0x1a8/0x6ac)
    [<bf0728b8>] (handle_mm_fault) from [<bf00d720>] (do_page_fault+0x10c/0x368)
    [<bf00d720>] (do_page_fault) from [<bf000398>] (do_DataAbort+0x34/0x98)
    [<bf000398>] (do_DataAbort) from [<bf008d34>] (__dabt_usr+0x34/0x40)
    Exception stack(0xc0e8bfb0 to 0xc0e8bff8)
    bfa0:                                     efe60df1 00000001 b5480800 000001af
    bfc0: 00033600 b54807fc b6200000 0035fe01 b6f96e14 d1b71759 000009c4 4f640df1
    bfe0: b6fcd070 bee47b00 b6200004 b6fbb6b8 000f0030 ffffffff
    Disabling lock debugging due to kernel taint
    BUG: Bad page map in process memtester  pte:4a695acb pmd:20533831
    addr:b622a000 vm_flags:00100073 anon_vma:c12b4da0 mapping:  (null) index:b622a
    CPU: 0 PID: 8972 Comm: memtester Tainted: G    B        3.14.28-ltsi #100

  • In reply to Chris Netter:

    Some relevant parts of /proc/x/maps of our ui application:

    b69ec000-b6b63000 rw-s 60000000 00:05 21         /dev/mem  (this corresponds to the 800x480x2 x2 of fb memory that we have mapped)
    b6b63000-b6b64000 rw-s fcff7000 00:05 21         /dev/mem
    b6b64000-b6c20000 rw-s 00000000 00:05 110        /dev/fb0
    b6f7d000-b6f7e000 rw-s f0000000 00:05 21         /dev/mem
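
A quick arithmetic check of the first mapping, written as a small Python sketch. It reads the "800x480x2 x2" annotation as 800 x 480 pixels, 2 bytes per pixel, two buffers (that reading is an assumption), and the helper name `span` is invented here.

```python
MIB = 1 << 20

def span(line):
    """Return the byte length of one /proc/<pid>/maps entry."""
    start, end = line.split()[0].split("-")
    return int(end, 16) - int(start, 16)

fb_map = "b69ec000-b6b63000 rw-s 60000000 00:05 21 /dev/mem"
fb_bytes = 800 * 480 * 2 * 2           # width * height * bytes per pixel * buffers
pages_touched = -(-span(fb_map) // MIB)  # number of 1MB pages covered, rounded up

print(hex(span(fb_map)), hex(fb_bytes), pages_touched)
```

Both sizes come out as 0x177000 bytes, which matches the 60000000-60176fff range listed in /proc/iomem, and the mapping spans two 1MB pages of the 0x60000000 window.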

  • In reply to Chris Netter:

    > Question:

    > a) do you see anything suspicious in the output below, including /proc/iomem ?

    Part of system RAM (20200000-209fffff) and part of the frame buffer (60000000-60176fff) overlap.
    Regarding the jcu:iram area, there is no problem if you really don't use JCU.

    > b) exactly what's the difference between 0x20000000 and 0x60000000 - just an alias?

    No. The pages/banks of the internal RAM are sorted (rearranged).
    Refer to the RZ/A1H User's Manual (Hardware).

    > c) is the additional system RAM which we now use at 0x20200000 cached or uncached?

    Strictly speaking, it depends on how you use it.
    As long as you use it without being aware of the physical address, it is cached.

    Best Regards,

  • In reply to Pecteilis:

    Thanks. Could you please be more detailed about "Part of system RAM (20200000-209fffff) and part of frame buffer (60000000-60176fff) are overlapped". I don't see the overlap. The fb starts at offset 0 of the internal ram and is about 1.5MB. System RAM starts at offset 2MB. What am I missing?
  • In reply to Chris Netter:

    Accesses to 0x20000000 are cached. Accesses to 0x60000000 are non-cached.

     

    Have a look at section 53 in the hardware manual. Specifically Table 53.1.

    0x20000000 is not an exact mirror of 0x60000000.

    The actual physical memory banks in the chip are mapped in different orders for 0x20000000 and 0x60000000.

    For example, from the table:

    0x6030_0000 to 0x603F_FFFF Page 1 of bank 1 (mirrored) (1024 KB)

    0x2010_0000 to 0x201F_FFFF Page 1 of bank 1 (1024 KB)

     

    This means if you write to address 0x60300000, the data will also be changed in 0x20100000 because it's the same physical RAM location.
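
The aliasing described above can be sketched as a table lookup. Only the single row quoted from Table 53.1 in this post is filled in; every other row would have to be copied from the hardware manual. The names `ALIAS_ROWS` and `cached_alias` are invented for this example.

```python
MIB = 1 << 20

# (uncached mirror base, cached base, size).
# Only the row quoted in the post is known here; the rest must come from Table 53.1.
ALIAS_ROWS = [
    (0x60300000, 0x20100000, MIB),   # Page 1 of bank 1
]

def cached_alias(addr):
    """Translate an address in the 0x60000000 window to its 0x20000000 alias."""
    for mirror, cached, size in ALIAS_ROWS:
        if mirror <= addr < mirror + size:
            return cached + (addr - mirror)
    raise LookupError(f"no table row covers 0x{addr:08x}")

print(hex(cached_alias(0x60300000)))
```

Raising an error for uncovered addresses, rather than guessing a fixed offset, reflects the point that the two windows are not a simple mirror of each other.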

    I believe the banks/pages are re-ordered in order to get the best bus optimization when you are displaying multiple display layers at once. If each layer is on a different memory bus, then you can have simultaneous bus accesses with no wait. Same idea would apply to application data. The LCD constantly reading memory from a RAM bank/bus will have no effect on an application using memory in a different memory bank/bus.

     

  • In reply to Chris:

    Got it. 0x20500000 onwards is Page 0 of bank 1, which conflicts with my fb. I think I know what to do now: set up the device tree so that the 8MB of system RAM that comes from SRAM does not use any address belonging to Page 0 / bank 0 or Page 0 / bank 1, because that's where my fb is located. This seems to translate into 2 chunks of 4MB each.
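
The "2 chunks of 4MB each" result can be reproduced with a small interval-subtraction sketch. The 0x20500000 range is taken from the post; placing Page 0 of bank 0 at 0x20000000 is an assumption made here, chosen only because it reproduces the stated result. Both ranges should be checked against Table 53.1. The function name `usable_chunks` is made up.

```python
MIB = 1 << 20

def usable_chunks(start, end, reserved):
    """Return the (base, size) pieces of [start, end) not covered by reserved ranges."""
    chunks = []
    cursor = start
    for r_start, r_end in sorted(reserved):
        if r_start > cursor and cursor < end:
            chunks.append((cursor, min(r_start, end) - cursor))
        cursor = max(cursor, r_end)
    if cursor < end:
        chunks.append((cursor, end - cursor))
    return chunks

# Assumed 1MB pages used by the frame buffer, expressed in the 0x20000000 window.
reserved = [(0x20000000, 0x20100000),   # Page 0 of bank 0 (assumed location)
            (0x20500000, 0x20600000)]   # Page 0 of bank 1 (per the post)

for base, size in usable_chunks(0x20000000, 0x20A00000, reserved):
    print(f"0x{base:08x} 0x{size:08x}")
```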
  • In reply to Chris Netter:

    And I do admit, when you first asked, I completely forgot to mention the quirk that the pages are in a different order between 0x20000000 and 0x60000000.

    You should be able to specify the individual chunks in the memory node of the device tree. You might end up with 4 or 5 memory definitions, but the kernel will be fine with that, and after the MMU re-maps everything, no one will really care.
  • In reply to Chris:

    It was mentioned that accesses to 0x20000000 are cached and accesses to 0x60000000 are non-cached.
    Is this ensured by hardware?
  • In reply to Praveen:

    Yes, accesses to 0x60000000 are never cached. That is why it is recommended to use those addresses when setting up the LCD controller.
  • In reply to Chris:

    Discussing this a bit more:
    a) If we map the 0x20000000 memory block into user space using mmap and the /dev/mem interface, will accesses to 0x20000000 be non-cached?
    b) If we map the 0x20000000 memory block in kernel space using ioremap(), will accesses to 0x20000000 be non-cached?

  • In reply to Praveen:

    At the chip HW level, if the address on the internal bus is 0x20000000, then caching is possible (and is enabled in u-boot by default). If the address on the internal bus is 0x60000000, then it will always be non-cached.

    As for disabling L1 caching for a specific page when mapping with ioremap, I believe the MMU can be set up to do that, so if you use the correct ioremap_xx function, you might be able to get non-cached access to the internal memory.
  • In reply to Chris:

    Dear Chris,

    I am working on a Renesas RZ/A1H board with an XIP kernel. On that board I have worked with the 10MB internal RAM as well as the 32MB external SDRAM. Now I want to enable both the 32MB SDRAM and 8MB of internal RAM, with 2MB kept for the frame buffer. I followed the steps above and changed the dts file like this:

    memory@0C000000 {
        device_type = "memory";
        reg = <0x0C000000 0x02000000   /* 32MB @ 0x0C000000 */
               0x20000000 0x0A000000>; /* 10Mbyte of Internal RAM only */
    };

    I rebuilt it and flashed it to the board. At the u-boot prompt I use the command run xsa_boot to boot the board. After booting completes, checking the memory still shows only the 32MB of SDRAM:
    $ free
                 total         used         free       shared      buffers       cached
    Mem:         31956         3876        28080           32            0          824
    -/+ buffers/cache:         3052        28904
    Swap:            0            0            0

    Is there anything that I need to change in the u-boot file?

    /* Default addresses */
    #define DTB_ADDR_FLASH "C0000" /* Location of Device Tree in QSPI Flash (SPI flash offset) */
    #define DTB_ADDR_RAM "20500000" /* Internal RAM location to copy Device Tree */
    #define DTB_ADDR_SDRAM "0D800000" /* External SDRAM location to copy Device Tree */
    /* #define DTB_ADDR_SDRAM "0D800000" */ /* External SDRAM location to copy Device Tree */

    #define MEM_ADDR_RAM "0x20000000 0x00A00000" /* System Memory for when using on-chip RAM (10MB) */
    /* #define MEM_ADDR_SDRAM "0x08000000 0x02000000"*/ /* System Memory for when using external SDRAM RAM (32MB) */
    #define MEM_ADDR_SDRAM "0x0C000000 0x02000000" /* System Memory for when using external SDRAM RAM (32MB) */

    #define KERNEL_ADDR_FLASH "0x18200000" /* Flash location of xipImage or uImage binary */
    #define UIMAGE_ADDR_SDRAM "09000000" /* Address to copy uImage to in external SDRAM */
    #define UIMAGE_ADDR_SIZE "0x400000" /* Size of the uImage binary in Flash (4MB) */


    /* Default kernel command line options */
    setenv("cmdline_common", "console=ttySC3,115200 ignore_loglevel rw root=/dev/mmcblk0p1 earlyprintk earlycon=scif,0xE8008800");
    /*setenv("cmdline_common", "console=ttySC3,115200 console=tty0 ignore_loglevel rw root=/dev/null rootflags=physaddr=0x18800000 earlyprintk earlycon=scif,0xE8008800"); */

    /* Root file system choices */
    setenv("fs_axfs", "rootfstype=axfs rootflags=physaddr=0x18800000");
    setenv("fs_mtd", "root=/dev/mtdblock0");
    setenv("fs_ext3", "root=/dev/mmcblk0p1");

    /* Read DTB from Flash into either internal on-chip RAM or external SDRAM */
    setenv("dtb_read_ram", "sf probe 0; sf read "DTB_ADDR_RAM" "DTB_ADDR_FLASH" 8000; fdt addr "DTB_ADDR_RAM" ; setenv addr_dtb "DTB_ADDR_RAM"");
    setenv("dtb_read_sdram", "sf probe 0; sf read "DTB_ADDR_SDRAM" "DTB_ADDR_FLASH" 8000; fdt addr "DTB_ADDR_SDRAM" ; setenv addr_dtb "DTB_ADDR_SDRAM"");

    /* Set the system memory address and size. This overrides the setting in Device Tree */
    setenv("dtb_mem_ram", "fdt memory "MEM_ADDR_RAM""); /* Use internal RAM for system memory */
    setenv("dtb_mem_sdram", "fdt memory "MEM_ADDR_SDRAM""); /* Use external SDRAM for system memory */

    /* Kernel booting operations */
    setenv("xImg", "qspi single; setenv cmd bootx "KERNEL_ADDR_FLASH" ${addr_dtb}; run cmd"); /* Boot a XIP Kernel */
    setenv("uImg", "qspi dual; cp.b "KERNEL_ADDR_FLASH" "UIMAGE_ADDR_SDRAM" "UIMAGE_ADDR_SIZE"; bootm start "UIMAGE_ADDR_SDRAM" - "DTB_ADDR_SDRAM"; bootm loados ; bootm go"); /* Boot a uImage kernel */

    /* => run xa_boot */
    /* Boot XIP using internal RAM only, file system is AXFS, LCD dynamically allocated */
    setenv("xa_boot", "run dtb_read_ram; run dtb_mem_ram; setenv bootargs ${cmdline_common} ${fs_axfs}; fdt chosen; run xImg");
    setenv("xa1_boot", "run dtb_read_ram; run dtb_mem_ram; setenv bootargs ${cmdline_common} ${fs_ext3}; fdt chosen; run xImg");
    /* => run xsa_boot */
    /* Boot XIP using external 32MB SDRAM, file system is AXFS, LCD FB fixed to internal RAM */
    setenv("xsa_boot", "run dtb_read_sdram; run dtb_mem_sdram; setenv bootargs ${cmdline_common} ${fs_axfs}; fdt chosen; run xImg");

    setenv("bootcmd", "run xa_boot");

    Please suggest what changes I have to make to get 32MB of external RAM + 8MB of internal RAM.


    Thanks and regards
    Mahesh R
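
One thing worth checking in the `reg` cells quoted in the post above is whether each size cell matches its comment. The sketch below decodes the (base, size) pairs into MiB; the helper name `describe` is made up. Separately, the quoted u-boot environment notes that `fdt memory` overrides the setting in the Device Tree, and `xsa_boot` runs `dtb_mem_sdram`, which may be why only the SDRAM shows up in `free`.

```python
MIB = 1 << 20

def describe(base, size):
    """Render one (base, size) pair from a memory node as 'base: N MiB'."""
    return f"0x{base:08x}: {size / MIB:g} MiB"

print(describe(0x0C000000, 0x02000000))  # SDRAM entry as posted
print(describe(0x20000000, 0x0A000000))  # internal RAM entry as posted
print(describe(0x20000000, 0x00A00000))  # size cell that would equal 10 MiB
```

As posted, the second size cell decodes to 160 MiB rather than the 10 MiB stated in its comment.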