We are using:
We would like to make the unused portion of the on-chip RAM available to malloc, basically to complement the external SDRAM - preferably in a manner that is completely transparent to any user space code that uses new or malloc.
So if I'm using 32MB external SDRAM, and I'm using 2MB out of 10MB for framebuffer, I'd like to end up with 32+(10-2)=40MB of RAM that is available to and managed by the Linux kernel and thus available for malloc etc.
Is this at all possible in Linux for RZ? If so, is it possible in the 3.12 kernel and what exactly would need to go into the device tree and other files?
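For what it's worth, the standard way to hand extra RAM to the kernel on a device-tree system is to list it in the memory node. The following is an unverified sketch, not taken from the RZ/A1 BSP: it uses the two System RAM ranges that show up later in this thread's /proc/iomem (32 MB of SDRAM at 0x0c000000, and the 8 MB of on-chip RAM past the 2 MB frame buffer at 0x20200000). Whether the 3.12 RZ kernel honors a second reg tuple here would need to be checked against the BSP.

```dts
/* Hypothetical sketch - addresses taken from the /proc/iomem output
 * further down in this thread, not from a verified RZ/A1 device tree. */
memory {
	device_type = "memory";
	reg = <0x0c000000 0x02000000	/* 32 MB external SDRAM */
	       0x20200000 0x00800000>;	/* 8 MB of on-chip RAM (first 2 MB left for the frame buffer) */
};
```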
Now in the absence of making this 100% transparent to Linux user space, I guess the next best thing would be to replace malloc w/ a custom implementation that is aware of the SRAM (and has mapped it into the user space address space via mmap). I think I understand how to override the default malloc so that the application is not even aware that a custom one is used, and I understand conceptually what this custom malloc would have to do, but I'm struggling to find a library that provides a heap allocation implementation that is as good as malloc in terms of its ability to deal with fragmentation. So any pointers to a good heap allocator library would also be appreciated.
In reply to Praveen:
I decided to try "memtester" to test the RAM in user space, instead of rebuilding the kernel.
It looks like this is basically working, i.e. I can see and access 32+8=40MB of RAM. We leave the first 2 MB for frame buffer use. After booting the kernel, 35MB are available to user space apps and memtester reports no issues when telling it to do a test on 35MB.
We have an application w/ a custom screen driver that accesses the framebuffer memory as well as the corresponding config registers directly (we mmap() the physical addresses such as 0x60000000 etc. into our application's address space). Our screen driver does double buffering: the first frame buffer at 0x60000000 is the one that's always visible, screen manipulation takes place in the 2nd buffer, and the 2nd buffer is memcpy()'d into the visible buffer whenever screen updates are done. When I run memtester while our application is running, the following happens:
1) memtester reports an error pretty early on
2) an artifact appears on the screen, always in the same place
3) if I use our application (e.g. switch to a different screen in the UI), more artifacts show up on screen and then the system usually crashes
a) do you see anything suspicious in the output below, including /proc/iomem?
b) exactly what's the difference between 0x20000000 and 0x60000000 - just an alias?
c) is the additional system RAM which we now use at 0x20200000 cached or uncached?
Memory: 40072K/40960K available (3145K kernel code, 161K rwdata, 680K rodata, 146K init, 153K bss, 888K reserved)
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xd5000000 - 0xff000000   ( 672 MB)
    lowmem  : 0xc0000000 - 0xd4a00000   ( 330 MB)
    .text   : 0xbf000000 - 0xbf3bc7cc   (3826 kB)
    .init   : 0xc000a000 - 0xc0013000   (  36 kB)
    .data   : 0xc0008000 - 0xc0039540   ( 198 kB)
    .bss    : 0xc0039540 - 0xc005fa00   ( 154 kB)

$ cat /proc/iomem
0c000000-0dffffff : System RAM
  0c008000-0c05f9ff : Kernel data
18000000-1bffffff : physmap-flash.0
  18000000-1bffffff : physmap-flash.0
20200000-209fffff : System RAM
3fefb000-3fefb0ff : spibsc.1
60000000-60176fff : vdc5fb.0: fb
60800000-608fffff : jcu:iram    (this overlaps w/ 20200000-209fffff - is that OK? we don't use the jcu)
etc...

$ echo 3 > /proc/sys/vm/drop_caches
$ free
              total       used       free     shared    buffers
Mem:          40108       4300      35808          0         20
-/+ buffers:               4280      35828
Swap:             0          0          0
$
$ ./memtester 35 4
memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 35MB (36700160 bytes)
got 34MB (35721216 bytes), trying mlock ...locked.
Loop 1/4:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok
At this point I'm starting our application.
$ free
              total       used       free     shared    buffers
Mem:          40108      12180      27928          0         20
-/+ buffers:              12160      27948
Swap:             0          0          0

$ ./memtester 27 1
memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 27MB (28311552 bytes)
got 27MB (28311552 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : testing   0
FAILURE: possible bad address line at offset 0x009bd270.
Skipping to next test...
  Random Value        : FAILURE: 0xf7bef7be != 0x7fbe9e76 at offset 0x009bd270.
FAILURE: 0xf7bef7be != 0x7bfc1d77 at offset 0x009bd274.
FAILURE: 0xf7bef7be != 0xc3afe114 at offset 0x009bd278.
FAILURE: 0xf7bef7be != 0x9dff9204 at offset 0x009bd27c.
FAILURE: 0xf7bef7be != 0x368dfcb8 at offset 0x009bd280.
FAILURE: 0xf7bef7be != 0xbfb74e5a at offset 0x009bd284.

pagesize is 4096
pagesizemask is 0xfffff000
want 27MB (28311552 bytes)
got 27MB (28311552 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : testing   1
FAILURE: possible bad address line at offset 0x00e287e8.
Skipping to next test...
  Random Value        : \
BUG: Bad page map in process memtester  pte:f7bef7be pmd:20533831
addr:b6200000 vm_flags:00102073 anon_vma:c12b4da0 mapping: (null) index:b6200
CPU: 0 PID: 8972 Comm: memtester Not tainted 3.14.28-ltsi #100
[<bf009b00>] (unwind_backtrace) from [<bf0081a8>] (show_stack+0x10/0x14)
[<bf0081a8>] (show_stack) from [<bf06fc2c>] (print_bad_pte+0x158/0x18c)
[<bf06fc2c>] (print_bad_pte) from [<bf0728b8>] (handle_mm_fault+0x1a8/0x6ac)
[<bf0728b8>] (handle_mm_fault) from [<bf00d720>] (do_page_fault+0x10c/0x368)
[<bf00d720>] (do_page_fault) from [<bf000398>] (do_DataAbort+0x34/0x98)
[<bf000398>] (do_DataAbort) from [<bf008d34>] (__dabt_usr+0x34/0x40)
Exception stack(0xc0e8bfb0 to 0xc0e8bff8)
bfa0: efe60df1 00000001 b5480800 000001af
bfc0: 00033600 b54807fc b6200000 0035fe01 b6f96e14 d1b71759 000009c4 4f640df1
bfe0: b6fcd070 bee47b00 b6200004 b6fbb6b8 000f0030 ffffffff
Disabling lock debugging due to kernel taint
BUG: Bad page map in process memtester  pte:4a695acb pmd:20533831
addr:b622a000 vm_flags:00100073 anon_vma:c12b4da0 mapping: (null) index:b622a
CPU: 0 PID: 8972 Comm: memtester Tainted: G    B    3.14.28-ltsi #100
In reply to Chris Netter:
Some relevant parts of /proc/<pid>/maps of our ui application:
b69ec000-b6b63000 rw-s 60000000 00:05 21    /dev/mem    (this corresponds to the 800x480x2 x2 of fb memory that we have mapped)
b6b63000-b6b64000 rw-s fcff7000 00:05 21    /dev/mem
b6b64000-b6c20000 rw-s 00000000 00:05 110   /dev/fb0
b6f7d000-b6f7e000 rw-s f0000000 00:05 21    /dev/mem
> Question:
> a) do you see anything suspicious in the output below, including /proc/iomem?

Part of system RAM (20200000-209fffff) and part of the frame buffer (60000000-60176fff) overlap. Regarding the jcu:iram area, there is no problem if you really don't use the JCU.

> b) exactly what's the difference between 0x20000000 and 0x60000000 - just an alias?

No. The pages/banks of the internal RAM are sorted (rearranged). Refer to the RZ/A1H User's Manual (Hardware).

> c) is the additional system RAM which we now use at 0x20200000 cached or uncached?

Strictly speaking, it depends on how you use it. As long as you use it without being aware of the physical address, it is cached.
In reply to Pecteilis:
Accesses to 0x20000000 are cached. Accesses to 0x60000000 are non-cache.
Have a look at section 53 in the hardware manual. Specifically Table 53.1.
0x20000000 is not an exact mirror of 0x60000000.
The actual physical memory banks in the chip are mapped in different orders for 0x20000000 and 0x60000000.
For example, from the table:
0x6030_0000 to 0x603F_FFFF Page 1 of bank 1 (mirrored) (1024 KB)
0x2010_0000 to 0x201F_FFFF Page 1 of bank 1 (1024 KB)
This means if you write to address 0x60300000, the data will also be changed in 0x20100000 because it's the same physical RAM location.
I believe the banks/pages are re-ordered in order to get the best bus optimization when you are displaying multiple display layers at once. If each layer is on a different memory bus, then you can have simultaneous bus accesses with no wait. Same idea would apply to application data. The LCD constantly reading memory from a RAM bank/bus will have no effect on an application using memory in a different memory bank/bus.
Discussing it a bit more:
a) if we map the 0x20000000 memory block using mmap and the /dev/mem interface in user space, will accesses to 0x20000000 be non-cached?
b) if we map the 0x20000000 memory block using ioremap() in kernel space, will accesses to 0x20000000 be non-cached?