global interrupts disabled

Hi

We are using a custom target board based on the RZ/A1 processor. The Linux kernel version is 4.9.123 (downloaded from the Renesas site).

We are referring to the user manual r01uh0403ej0200_rz_a1h (PDF).

We have logged interrupt events in the Linux kernel.

It seems that global interrupts are sometimes disabled for about 4.5 ms, i.e. no interrupts are reported on the ARM core for 4.5 ms.

This delays the delivery of interrupts that are crucial to our system.

Are there any known issues where interrupts are disabled for this long?

(I also observed messages like this on our target: "hrtimer: interrupt took 98200 ns".)

Could you also suggest ways to debug this problem further?

thanks

amit

  • > Are there any known issues where interrupts are disabled for this long?

    Since Linux is not a real-time OS, it is generally understood that there will be some delays.

    There are the PREEMPT_RT patches for the kernel that basically do not allow interrupts to be disabled anymore. So for systems that need deterministic response times, people use the PREEMPT_RT patches.

    Interrupts might be turned off by either some driver, or by an interrupt subroutine.
    If you want to see if an interrupt sub-routine is causing your issue, you could monitor the function "asm_do_IRQ" in file arch/arm/kernel/irq.c. This is the function that is called when a HW interrupt occurs.

    For past Linux projects (SH4A, not ARM), I have added code to toggle a GPIO or read a HW timer register at the beginning and end of the handler function to determine what was the longest time any driver spent in an interrupt sub-routine.

    However, if all the interrupt routines are short, then maybe the issue is in some driver calling local_irq_save() (not in an interrupt routine) and keeping interrupts off for a long time. In that case, you would just modify the local_irq_save and local_irq_restore functions to keep track of time the same way you would for the interrupt sub-routines (GPIO pin or HW timer register).
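The before/after measurement described above can be sketched in a few lines of C. This is only a minimal sketch, assuming a 32-bit free-running up-counter (such as a HW timer count register) whose value the caller has just read. The struct and function names (irq_latency_stats, latency_begin, latency_end) are hypothetical and not part of any kernel API.

```c
#include <stdint.h>

/* Hypothetical bookkeeping structure, not a kernel type. */
struct irq_latency_stats {
    uint32_t start_ticks; /* counter value sampled at entry */
    uint32_t max_ticks;   /* longest duration seen so far, in timer ticks */
};

/* Call at handler entry, or right after interrupts are turned off.
 * 'now' is the counter value the caller read from the HW timer. */
static inline void latency_begin(struct irq_latency_stats *s, uint32_t now)
{
    s->start_ticks = now;
}

/* Call at handler exit, or right before interrupts are turned back on.
 * Returns the elapsed ticks and updates the running maximum. */
static inline uint32_t latency_end(struct irq_latency_stats *s, uint32_t now)
{
    /* Unsigned subtraction stays correct across one counter wraparound. */
    uint32_t elapsed = now - s->start_ticks;

    if (elapsed > s->max_ticks)
        s->max_ticks = elapsed;
    return elapsed;
}
```

In a real kernel the two calls would wrap the handler body (or the local_irq_save()/local_irq_restore() pair), and max_ticks could be printed or exported afterwards. The sketch only handles durations shorter than one full counter period.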
  • In reply to Chris:

    Thanks for the reply. I am going to proceed as you suggested.

    >>to determine what was the longest time any driver spent in an interrupt sub-routine.
    __handle_domain_irq seems to me to be the generic IRQ-layer function in the Linux kernel where timing can be profiled for every interrupt.

    Can I use the ktime APIs inside __handle_domain_irq() for profiling, e.g. ktime_to_ns(ktime_get())?
    Can I use the ktime API even when the interrupt being handled is the timer ISR?

    If not, which hardware timer should I use: MTU2 or OSTM?
    Can you suggest any existing Linux driver APIs that I can use to read the current time from these timers?

    Regards
    Amit
  • In reply to AmitNagal:

    > Can I use the ktime APIs inside __handle_domain_irq() for profiling, e.g. ktime_to_ns(ktime_get())?
    > Can I use the ktime API even when the interrupt being handled is the timer ISR?

    Sorry, I do not know. I usually use a HW timer (read the register before and after).

    > If not, which hardware timer should I use: MTU2 or OSTM?

    OSTM is easy.

    OSTM ch0 is used for the system timer, so it is always running in free-run mode (overflow). So you can just read the register before and after your interrupts.

    > Can you suggest any existing Linux driver APIs that I can use to read the current time from these timers?

    You have to use the ioremap function to convert a physical address to a virtual (MMU) address.

    Here are some examples from an old BSP.

    Search for ioremap inside the board-rskrza1.c file

    https://github.com/renesas-rz/rza_linux-3.14/blob/master/arch/arm/mach-shmobile/board-rskrza1.c#L2113

    https://github.com/renesas-rz/rza_linux-3.14/blob/master/arch/arm/mach-shmobile/board-rskrza1.c#L2273
  • In reply to Chris:

    Hi Chris

    Thanks for all the input.

    As suggested, I am able to read the OSTM ch0 counter register (OSTMnCNT) using ioremap_nocache.

    Referring to github.com/.../ostm.txt:
    this counter increments every 30 ns (33 MHz), right?

    ostm.txt also refers to the kernel configs CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ_IDLE.
    In our kernel config, CONFIG_HIGH_RES_TIMERS is enabled but CONFIG_NO_HZ_IDLE is disabled.
    Is CONFIG_NO_HZ_IDLE mandatory or optional?

    Regards
    Amit
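The tick-to-time conversion asked about above can be written so that it does not depend on a rounded 30 ns period. This is a minimal sketch, assuming the caller knows the clock rate that actually feeds the counter; the function name ticks_to_ns and the 33333333 Hz figure in the comment are illustrative assumptions, not values taken from the BSP.

```c
#include <stdint.h>

/* Convert a tick count from a free-running counter into nanoseconds.
 * clk_hz is the counter clock in Hz and must be non-zero.
 * The multiplication is done in 64 bits, so any 32-bit tick count fits.
 * Example: 143839 ticks at an assumed 33333333 Hz is about 4.3 ms. */
static inline uint64_t ticks_to_ns(uint32_t ticks, uint32_t clk_hz)
{
    return ((uint64_t)ticks * 1000000000ull) / clk_hz;
}
```

Passing the clock rate as a parameter keeps the helper usable if the board's peripheral clock differs from the assumed value.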
  • In reply to AmitNagal:

    Hi Chris

    I used the OSTM ch0 system timer and captured interrupt timing traces with it.

    Case 1: an ISR takes a long time to execute.
    I found that the Timer (interrupt ID 135) and MMC (interrupt ID 301) interrupts sometimes take a long time to execute.

    A) Timer:
    Time consumed: 143839 OSTM ch0 timer ticks = 143839 * 30 = 4315170 ns
    Function: ostm_timer_interrupt

    So here we see that the actual timer ISR consumes close to 4.3 ms.

    B) MMC:
    Time consumed: 115437 OSTM ch0 timer ticks = 115437 * 30 = 3463110 ns
    Function: add_interrupt_randomness()

    Here the actual MMC ISR does not take much time; rather, close to 3.5 ms is consumed by a generic IRQ-layer function, add_interrupt_randomness(), which is called by handle_irq_event_percpu().

    Are there any patches available that already address the latency in the above two scenarios?
    If not, can you suggest ways to investigate the issues further?

    Case 2: the kernel has disabled interrupts for a long time while in a critical section (local_irq_save / local_irq_disable).
    I have not checked this yet. I will check and report back.

    Regards
    Amit

  • In reply to AmitNagal:

    > Are there any patches available that already address the latency in the above two scenarios?

    I'm sorry, in general I don't have any patches that will fix this issue. Actually, this is the first time there has been a complaint/request.

    The timer ISR case is very interesting because that is OSTM-1. OSTM-0 is used as the system time (tick timer).
    OSTM-1 is only really used when an application sets an 'event' timer using something like nanosleep(), which provides very accurate sleep times. A total time of 4.3 ms would make you think that the application process is getting to run in that timer ISR... but that could not be correct.
    I might go have a look at the kernel code to see if it is doing more than just signalling to wake up a user process.
  • In reply to Chris:

    Hi Chris

    Thanks for the reply.
    >>A total time of 4.3 ms would make you think that the application process is getting to run in that timer ISR... but that could not be correct.

    Do you mean that the application process would run inside the timer ISR after the sleep time expires?

    >>I might go have a look at the kernel code to see if it is doing more than just signalling to wake up a user process.
    Thanks. Please let us know about the findings.

    I will also check the vanilla kernel on kernel.org for any improvements in add_interrupt_randomness() and report back.

    Regards
    Amit
  • In reply to AmitNagal:

    Amit,

    Have you found the source of your interrupt issues?

    Mike Clements
    RenesasRulz Moderator
  • In reply to Mike Clements:

    Hi Mike

    The investigation into why the timer ISR ostm_timer_interrupt sometimes takes close to 4.3 ms to execute is still pending.

    regards
    amit