linux kernel out of memory conditions and tuning of disk cache usage

We are using RZ/A1 processor based custom target board . linux kernel version is 4.9.123 ( downloaded from renesas site).

user manual r01uh0403ej0200_rz_a1h pdf is referred .

1)the platform is low memory platform having memory 64MB

2) we are doing around 45MB TCP data transfer from PC to target using netcat utility .

On Target , a process receives data over socket and writes the data to flash disk .

3) At the start of data transfer , we explicitly clear linux kernel cached memory by  calling echo 3 > /proc/sys/vm/drop_caches .

4) during TCP data transfer , we could see free -m showing "free" getting dropped to almost 1MB and most of the memory appearing as "cached" 

# free -m
                 total         used   free     shared   buffers   cached
Mem:          57            56       1       0            2           42
-/+ buffers/cache:        12      45
Swap:          0              0        0

5) sometimes , we observed kernel memory getting exhausted as page allocation failure happens in kernel  and the target reboots after the backtrace is printed below :

# [  775.947949] nc.traditional: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)

[  775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G           O    4.9.123-pic6-g31a13de-dirty #19

[  775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)

[  775.972501] [<c0109829>] (unwind_backtrace) from [<c010796f>] (show_stack+0xb/0xc)

[  775.980118] [<c010796f>] (show_stack) from [<c0151de3>] (warn_alloc+0x89/0xba)

[  775.987361] [<c0151de3>] (warn_alloc) from [<c0152043>] (__alloc_pages_nodemask+0x1eb/0x634)

[  775.995790] [<c0152043>] (__alloc_pages_nodemask) from [<c0152523>] (__alloc_page_frag+0x39/0xde)

[  776.004685] [<c0152523>] (__alloc_page_frag) from [<c03190f1>] (__netdev_alloc_skb+0x51/0xb0)

[  776.013217] [<c03190f1>] (__netdev_alloc_skb) from [<c02c1b6f>] (sh_eth_poll+0xbf/0x3c0)

[  776.021342] [<c02c1b6f>] (sh_eth_poll) from [<c031fd8f>] (net_rx_action+0x77/0x170)

[  776.029051] [<c031fd8f>] (net_rx_action) from [<c011238f>] (__do_softirq+0x107/0x160)

[  776.036896] [<c011238f>] (__do_softirq) from [<c0112589>] (irq_exit+0x5d/0x80)

[  776.044165] [<c0112589>] (irq_exit) from [<c012f4db>] (__handle_domain_irq+0x57/0x8c)

[  776.052007] [<c012f4db>] (__handle_domain_irq) from [<c01012e1>] (gic_handle_irq+0x31/0x48)

[  776.060362] [<c01012e1>] (gic_handle_irq) from [<c0108025>] (__irq_svc+0x65/0xac)

[  776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)

[  776.072876] fd60:                                     0002751c c1dec6a0 0000000c 521c3be5

[  776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643 0000056c c1caff10 ffffe000

[  776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033 ffffffff

[  776.095816] [<c0108025>] (__irq_svc) from [<c0234ace>] (__copy_to_user_std+0x7e/0x430)

[  776.103796] [<c0234ace>] (__copy_to_user_std) from [<c0241715>] (copy_page_to_iter+0x105/0x250)

[  776.112503] [<c0241715>] (copy_page_to_iter) from [<c0319aeb>] (skb_copy_datagram_iter+0xa3/0x108)

[  776.121469] [<c0319aeb>] (skb_copy_datagram_iter) from [<c03443a7>] (tcp_recvmsg+0x3ab/0x5f4)

[  776.130045] [<c03443a7>] (tcp_recvmsg) from [<c035e249>] (inet_recvmsg+0x21/0x2c)

[  776.137576] [<c035e249>] (inet_recvmsg) from [<c031009f>] (sock_read_iter+0x51/0x6e)

[  776.145384] [<c031009f>] (sock_read_iter) from [<c017795d>] (__vfs_read+0x97/0xb0)

[  776.152967] [<c017795d>] (__vfs_read) from [<c01781d9>] (vfs_read+0x51/0xb0)

[  776.159983] [<c01781d9>] (vfs_read) from [<c0178aab>] (SyS_read+0x27/0x52)

[  776.166837] [<c0178aab>] (SyS_read) from [<c0105261>] (ret_fast_syscall+0x1/0x54)

 

6) please help investigate the issue , we have certain questions as below :

a) how the kernel memory got exhausted ? at the time of low memory conditions in kernel , are the kernel page flusher threads , which should have written dirty pages from page cache to flash disk , not executing at right time ? we observed that the page reclaim procedure in kernel triggers at a slow rate . 

it looks to us that somehow the procedure which should have written pages from kernel disk cache to mmc disk does not execute at right time , so in low memory

condition , dirty pages present in kernel disk caches do not get sync to disk at right time and hence kernel goes out of memory when it tried to allocate 

page on behalf of nc process . are there any parameters available within the linux memory subsystem with which the reclaim procedure can be fined tuned ?

 

b) can we reserve some amount of free memory so that linux kernel does not caches it and kernel can use it for its other required page allocation as needed on behalf of nc process above ?

can some tuning be done in linux memory subsystem eg by using /proc/sys/vm/min_free_kbytes  to achieve this objective .

We want to ensure that all available free memory should not end up be used by disk cache mechanism in linux kernel but some amount of memory should be available for page allocations required as well. 

 

c) can we be provided with further clues on how to debug this issue further for out of memory condition in kernel  ?

what other reasons can be possible for this low memory condition in kernel  ?

 

Regards

Amit

  • 4) during TCP data transfer , we could see free -m showing "free" getting dropped to almost 1MB and most of the memory appearing as "cached" 

    # free -m
               total       used    free     shared   buffers   cached
    Mem:          57         56       1       0         2       42
    -/+ buffers/cache:       12      45
    Swap:          0          0       0

     

    Since cache memory is always instantly freed whenever something asks for memory, the numbers on the buffers/cache line are the real numbers you look at when determining how much memory is being used.

     

    Since "sh_eth_poll" is being called (according to your crack log), then you are doing so much traffic that the networking stack has switched to polling instead of pure interrupt. So, you are receiving a lot of data.

    Going by memory (it's been a while) but SKBs (ie, the network buffers that get passed around) need to be allocated from "DMA capable kernel memory" (as in, memory that is not fragmented). There's only so much of that to be had. That's why you still have lots of "system" memory left, but you get this 'out of memory' message.

    Try this and see if it's an easy fix.

    Have the kernel pre-allocate a bigger DMA capable memory pool at boot up since you'll be doing a lot of network traffic.

    I think the default pool size is 256KB. So, try a pool size of 1MB.

    Add this to your kernel command line:   coherent_pool=1M

     

    Side note, this trick also helps when you get a similar message when playing audio (someone last month had this issue with RZ/G board)

  • In reply to Chris:

    Hi Chris

    Thanks for the reply . i will try as you suggested .

    >>Since cache memory is always instantly freed whenever something asks for memory
    that's what my main concern is . why the page reclaim procedure not triggering . we try the same test
    on another TI AM3352 platform and could see the page reclaim happening in free -m output

    >>Going by memory (it's been a while) but SKBs (ie, the network buffers that get passed around) need to be allocated >>from "DMA capable kernel memory"

    i am wondering then whether we have only SKBs allocations happening continuously from
    lowmem dma capable region and these allocations are not freed once data is copied to kernel page cache .
    then it sounds like memory leak right ? or the procedure which frees these allocations is not executing at right time .
    shall i monitor /proc/zoneinfo to verify the same ?

    is their any evidence in meminfo printed below at time of problem :

    [ 776.174308] Mem-Info:
    [ 776.176650] active_anon:2037 inactive_anon:23 isolated_anon:0
    [ 776.176650] active_file:2636 inactive_file:7391 isolated_file:32
    [ 776.176650] unevictable:0 dirty:1366 writeback:1281 unstable:0
    [ 776.176650] slab_reclaimable:719 slab_unreclaimable:724
    [ 776.176650] mapped:1990 shmem:26 pagetables:159 bounce:0
    [ 776.176650] free:373 free_pcp:6 free_cma:0
    [ 776.209062] Node 0 active_anon:8148kB inactive_anon:92kB
    active_file:10544kB inactive_file:29564kB unevictable:0kB
    isolated(anon):0kB isolated(file):128kB mapped:7960kB dirty:5464kB
    writeback:5124kB shmem:104kB writeback_tmp:0kB unstable:0kB
    pages_scanned:0 all_unreclaimable? no [ 776.233602] Normal
    free:1492kB min:964kB low:1204kB high:1444kB active_anon:8148kB
    inactive_anon:92kB active_file:10544kB inactive_file:29564kB
    unevictable:0kB writepending:10588kB present:65536kB
    managed:59304kB mlocked:0kB slab_reclaimable:2876kB
    slab_unreclaimable:2896kB kernel_stack:1152kB pagetables:636kB
    bounce:0kB
    free_pcp:24kB local_pcp:24kB free_cma:0kB [ 776.265406]
    lowmem_reserve[]: 0 0 [ 776.268761] Normal: 7*4kB (H) 5*8kB (H)
    7*16kB (H) 5*32kB (H) 6*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB
    0*1024kB 0*2048kB 0*4096kB = 1492kB
    10071 total pagecache pages
    [ 776.284124] 0 pages in swap cache
    [ 776.287446] Swap cache stats: add 0, delete 0, find 0/0 [
    776.292645] Free swap = 0kB [ 776.295532] Total swap = 0kB
    [ 776.298421] 16384 pages RAM [ 776.301224] 0 pages
    HighMem/MovableOnly [ 776.305052] 1558 pages reserved

    >>I think the default pool size is 256KB. So, try a pool size of 1MB.
    thanks for the fix . i will give it a try and come back with results .

    Regards
    Amit
  • In reply to AmitNagal:

    1.I think the default pool size is 256KB. So, try a pool size of 1MB.
    We still observe the issue .


    2.As per kernel documentation www.kernel.org/.../vm.txt :
    min_free_kbytes:

    This is used to force the Linux VM to keep a minimum number
    of kilobytes free. The VM uses this number to compute a
    watermark[WMARK_MIN] value for each lowmem zone in the system.
    Each lowmem zone gets a number of reserved free pages based
    proportionally on its size.

    Some minimal amount of memory is needed to satisfy PF_MEMALLOC
    allocations; if you set this to lower than 1024KB, your system will
    become subtly broken, and prone to deadlock under high loads.

    I observe on RZ board :
    # cat /proc/sys/vm/min_free_kbytes
    965

    why it is set less than recommendations ?
    can we be helped with the correct value to be set on RZ platform with 64 MB memory .

    3.Logs :

    # cat /proc/cmdline
    ignore_loglevel earlyprintk earlycon=scif,0xE8008800 coherent_pool=1M rootfstype=ext4 ro rootwait root=/dev/mmcblk0p10 actual_slot=ROOTFS_B


    Latest logs enclosed :
    [ 134.798562] nc.traditional: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
    [ 134.806979] CPU: 0 PID: 1442 Comm: nc.traditional Tainted: G O 4.9.123-pic6-g31a13de-dirty #20
    [ 134.816701] Hardware name: Generic R7S72100 (Flattened Device Tree)
    [ 134.823122] [<c0109829>] (unwind_backtrace) from [<c010796f>] (show_stack+0xb/0xc)
    [ 134.830731] [<c010796f>] (show_stack) from [<c0151de3>] (warn_alloc+0x89/0xba)
    [ 134.837969] [<c0151de3>] (warn_alloc) from [<c0152043>] (__alloc_pages_nodemask+0x1eb/0x634)
    [ 134.846402] [<c0152043>] (__alloc_pages_nodemask) from [<c0152523>] (__alloc_page_frag+0x39/0xde)
    [ 134.855300] [<c0152523>] (__alloc_page_frag) from [<c03190f1>] (__netdev_alloc_skb+0x51/0xb0)
    [ 134.863835] [<c03190f1>] (__netdev_alloc_skb) from [<c02c1b6f>] (sh_eth_poll+0xbf/0x3c0)
    [ 134.871959] [<c02c1b6f>] (sh_eth_poll) from [<c031fd8f>] (net_rx_action+0x77/0x170)
    [ 134.879671] [<c031fd8f>] (net_rx_action) from [<c011238f>] (__do_softirq+0x107/0x160)
    [ 134.887513] [<c011238f>] (__do_softirq) from [<c0112589>] (irq_exit+0x5d/0x80)
    [ 134.894784] [<c0112589>] (irq_exit) from [<c012f4db>] (__handle_domain_irq+0x57/0x8c)
    [ 134.902622] [<c012f4db>] (__handle_domain_irq) from [<c01012e1>] (gic_handle_irq+0x31/0x48)
    [ 134.910978] [<c01012e1>] (gic_handle_irq) from [<c0108025>] (__irq_svc+0x65/0xac)
    [ 134.918453] Exception stack(0xc10b1c18 to 0xc10b1c60)
    [ 134.923489] 1c00: c07289e0 00000000
    [ 134.931657] 1c20: 00004000 00000000 c07289e0 00000000 00000000 034200ca 034200ca c10b1d18
    [ 134.939822] 1c40: c10b0000 00000001 c10b1d30 c10b1c68 c01516d5 c0154560 000e0033 ffffffff
    [ 134.948017] [<c0108025>] (__irq_svc) from [<c0154560>] (node_dirty_ok+0x6/0x9a)
    [ 134.955338] [<c0154560>] (node_dirty_ok) from [<c01516d5>] (get_page_from_freelist+0x5d/0x500)
    [ 134.963938] [<c01516d5>] (get_page_from_freelist) from [<c0151f1f>] (__alloc_pages_nodemask+0xc7/0x634)
    [ 134.973355] [<c0151f1f>] (__alloc_pages_nodemask) from [<c014d407>] (pagecache_get_page+0x143/0x1e0)
    [ 134.982481] [<c014d407>] (pagecache_get_page) from [<c014d4b9>] (grab_cache_page_write_begin+0x15/0x20)
    [ 134.991907] [<c014d4b9>] (grab_cache_page_write_begin) from [<c01c4ed7>] (ext4_da_write_begin+0x7f/0x240)
    [ 135.001464] [<c01c4ed7>] (ext4_da_write_begin) from [<c014d085>] (generic_perform_write+0xbd/0x11e)
    [ 135.010502] [<c014d085>] (generic_perform_write) from [<c014e0c7>] (__generic_file_write_iter+0x101/0x122)
    [ 135.020141] [<c014e0c7>] (__generic_file_write_iter) from [<c01be22b>] (ext4_file_write_iter+0x193/0x2c0)
    [ 135.029738] [<c01be22b>] (ext4_file_write_iter) from [<c0177a0d>] (__vfs_write+0x97/0xb4)
    [ 135.037921] [<c0177a0d>] (__vfs_write) from [<c01782a7>] (vfs_write+0x6f/0xe8)
    [ 135.045148] [<c01782a7>] (vfs_write) from [<c0178afd>] (SyS_write+0x27/0x52)
    [ 135.052181] [<c0178afd>] (SyS_write) from [<c0105261>] (ret_fast_syscall+0x1/0x54)
    [ 135.059744] Mem-Info:
    [ 135.062086] active_anon:3744 inactive_anon:25 isolated_anon:0
    [ 135.062086] active_file:2582 inactive_file:2626 isolated_file:32
    [ 135.062086] unevictable:0 dirty:636 writeback:1792 unstable:0
    [ 135.062086] slab_reclaimable:363 slab_unreclaimable:2329
    [ 135.062086] mapped:2508 shmem:25 pagetables:253 bounce:0
    [ 135.062086] free:339 free_pcp:2 free_cma:0
    [ 135.094503] Node 0 active_anon:14976kB inactive_anon:100kB active_file:10328kB inactive_file:10504kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:10032kB dirty:2544kB writeback:7168kB shmem:100kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
    [ 135.119299] Normal free:1356kB min:964kB low:1204kB high:1444kB active_anon:14976kB inactive_anon:100kB active_file:10328kB inactive_file:10504kB unevictable:0kB writepending:9712kB present:65536kB managed:59304kB mlocked:0kB slab_reclaimable:1452kB slab_unreclaimable:9316kB kernel_stack:1424kB pagetables:1012kB bounce:0kB free_pcp:8kB local_pcp:8kB free_cma:0kB
    [ 135.151104] lowmem_reserve[]: 0 0
    [ 135.154461] Normal: 3*4kB (H) 2*8kB (H) 5*16kB (H) 15*32kB (H) 12*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1356kB
    5268 total pagecache pages
    [ 135.169202] 0 pages in swap cache
    [ 135.172525] Swap cache stats: add 0, delete 0, find 0/0
    [ 135.177726] Free swap = 0kB
    [ 135.180614] Total swap = 0kB
    [ 135.183502] 16384 pages RAM
    [ 135.186303] 0 pages HighMem/MovableOnly
    [ 135.190132] 1558 pages reserved
  • This thread will be archived due to inactivity.

    Mike Clements
    ReneasRulz Moderator