19
\$\begingroup\$

Why should interrupts be short?

I'm currently using STM32 and it has a feature of prioritizing interrupts.

Because of that, I don't see any difference between these two options:

  • the main loop and interrupts
  • one big interrupt with a very low priority and the other interrupts

Why should we keep them short? Is it because of caching? Small stack? Or something else? Why can't we just create the whole system on interrupts and make the main loop sleep?

Any ideas?

\$\endgroup\$
3
  • 3
\$\begingroup\$ critical IRQs are used to sync state-machine activities, not to do async stuff. Keep track of stack height. If the work exceeds the available uC processing power, it gets out of sync from overflow. \$\endgroup\$ Commented Nov 26, 2019 at 14:42
  • 7
\$\begingroup\$ An interrupt can not be short; it is a signal at one point in time. Don't you mean "interrupt handling time" (time), "interrupt service time" (time), "interrupt service routines (ISR)" (lines of code, presumably executing fast), "ISRs" (lines of code, presumably executing fast), or similar? Can you fix it (by editing your question)? \$\endgroup\$ Commented Nov 27, 2019 at 0:37
  • 4
\$\begingroup\$ @PeterMortensen: "interrupt" is fairly valid shorthand for interrupt handling, at least among people who understand what it all means. Your comment is a good summary. Or you can think about the interrupt in terms of the interruption to the main code that will eventually be returned to after its ISR. (Unless we decide to context-switch and only return to that point in user-space much later; then ISR time vs. delay to that task aren't close.) The process of delivering an interrupt takes some cycles, too; you could just call that unavoidable overhead for a handler. \$\endgroup\$ Commented Nov 28, 2019 at 3:40

11 Answers

34
\$\begingroup\$

There is no reason you can't do that; an event-driven system is not uncommon. BUT: on many architectures, while you are in one interrupt handler you cannot take another, so even a low-priority interrupt can, on some systems, prevent a higher-priority interrupt from being serviced. Hence the general rule: get in and out fast. A common design approach, seen more often with operating systems, is for the interrupt handler to do some handling but leave a task for the kernel or foreground task to finish.

I believe the Cortex-M can nest interrupts, so if one comes in on top of another it gets handled. Most of the chip is the vendor's rather than ARM's, so there is nothing stopping the chip vendor from putting an interrupt handler and/or a priority handler in front of the Cortex-M, in addition to whatever is in the ARM end of things.

Nevertheless, an interrupt-driven design needs to be designed. As with any other design, you need to do your system engineering. You need to know every possible interrupt (in normal operation; undefined instruction and similar faults are fatal) and how often each occurs. Many will fire at regular periods.

You also need to know how long each interrupt takes, and schedule things such that every interrupt is handled within the time your specification allows. For example, say a timer-based interrupt normally takes 50 ms to handle, but there is one non-periodic, random interrupt that takes 20 ms. Is 70 ms tolerable for either of these? Does the chip/core have a priority solution? If not, have you laid out your interrupt priorities such that you can meet timing on everything?

At the end of the day, there is little difference between an all-interrupt design and a mix of some interrupts plus some foreground code with respect to the design and timing of the interrupts; you have the same problem either way.

So the general rule is to make interrupts lean, mean, and fast, but the reality is: do your system engineering and know how much time you have for each interrupt, or for groups of them combined. Each must take less than its budgeted amount of time.

\$\endgroup\$
1
  • 6
    \$\begingroup\$ Linux calls that first-paragraph design "top half / bottom half" interrupt handling. The "top half" is the true ISR which might not even talk to the device hardware, but just queue work for a bottom half that gets scheduled like any other task. \$\endgroup\$ Commented Nov 28, 2019 at 3:43
18
\$\begingroup\$

Re-entrancy becomes a major headache as your ISRs grow more complex. The Wikipedia entry has a pretty succinct description of a re-entrant ISR:

A reentrant interrupt handler is an interrupt handler that re-enables interrupts early in the interrupt handler. This may reduce interrupt latency.[6] In general, while programming interrupt service routines, it is recommended to re-enable interrupts as soon as possible in the interrupt handler. This practice helps to avoid losing interrupts.[7]

Assuming you're not writing a general-purpose OS, where re-entrancy might very well be unavoidable, keep in mind that the added complexity of supporting re-entrancy in your bespoke controller code might just not be worth the perceived ease of writing lazy, long-running ISRs.

Here's an example:

  • A long-running, low-priority ISR starts doing something and calls a version of malloc that you have implemented.
  • Mid-malloc, it is interrupted by a higher-priority ISR, which calls malloc too.
  • Scenario a: the high-priority ISR exits malloc before the low-priority one does.
  • Scenario b: the high-priority ISR exits malloc after the low-priority one does *

OK, you'll say, I'll put a spin lock on malloc. Then all you need to do is repeat the above exercise for the spinlock_acquire primitive you've created: is it re-entrant?

I should point out here that just slapping locks on things and calling it a day is a recipe for priority inversion based deadlocks. It's not that simple.

The poor man's solution to all these problems is to keep your ISRs short. For instance, in the NT kernel, a long time ago (I haven't kept up to date), code running at any IRQL above the bottom two wasn't allowed to even look at paged memory. This was because paging was handled by an interrupt...

So the choice becomes: implement a basic queuing mechanism and have your ISRs dispatch work units, or let your ISRs behave freely but ensure you have an extremely robust environment that will not choke on weird issues (e.g. priority inversion).

If your task at hand is super simple, like turning on a light in your Arduino-controlled drone, then by all means go ahead. But if you want to avoid engineering headaches later on as your code gets more complex, you really should resist the perceived benefit of giving yourself no constraints at all in an ISR.


* clarification: scenario b cannot occur at face value since the higher priority ISR will always execute and complete before the lower priority one. However, the code-path taken by either one can be swapped. So in the malloc routine itself, if there is any access to a global data structure, that code can be interrupted in any possible combination of ways.

Further to this point, it should be stated that the re-entrancy requirement for any function is for its entire call tree. This becomes very difficult to ensure if you end up using third party libraries.

Also note: to address a comment from @user253751, re-entrancy isn't solved by merely wrapping things in a lock. Rather, re-entrancy has a set of requirements that are well understood. Here are some relatively trivial code examples that illustrate the matter.

Looking at examples like these, it can be seen that writing a re-entrant malloc or resource-acquisition function becomes very difficult.

\$\endgroup\$
10
  • \$\begingroup\$ But reentrant ISRs are not common - you normally have to explicitly enable interrupts in ISRs. \$\endgroup\$ Commented Nov 27, 2019 at 0:51
  • 4
    \$\begingroup\$ @PeterMortensen The OP's "do everything in an interrupt and sleep in the main loop" model requires that you do though \$\endgroup\$
    – DKNguyen
    Commented Nov 27, 2019 at 1:29
  • \$\begingroup\$ A lock won't help because the original thread can't unlock the lock until the high priority interrupt is done, which it won't be until it gets the lock. \$\endgroup\$ Commented Nov 27, 2019 at 10:02
  • \$\begingroup\$ @user253751 That's why I said it's a recipe for priority inversion based deadlocks. Re-entrant locks do exist, by the way. (They're useful for cooperative multi-tasking via fibers or asynchronous code like futures etc) \$\endgroup\$
    – MB.
    Commented Nov 27, 2019 at 14:11
  • 1
\$\begingroup\$ On a single-core machine, you'd normally just disable interrupts instead of taking an actual lock, to make a small critical section atomic. Because it's not just priority inversion; it's more like a parent function that can't run (and unlock) until you're done. So preventing other ISRs from running at inopportune times works better, as long as you don't leave interrupts disabled for too long. \$\endgroup\$ Commented Nov 28, 2019 at 3:48
10
\$\begingroup\$

For a general purpose computer, keeping the interrupt handler short permits normal processing to be reasonably deterministic which may or may not be an issue depending on application.

In a hard real time embedded process (where determinism is of critical importance) this makes a lot of sense.

I implemented a very precise two-axis tilt sensor in which the sensor drive was controlled by timers, and the main loop stayed in sleep mode until the counter fired the interrupt to update which axis was being driven. The external ADC I was using was also controlled by a (different) timer, to minimise acquisition jitter in the averaged samples (jitter can cause havoc in a sampled-data system).

To maintain the ability to do some calculations in a very narrow time window, the interrupt handlers needed to be fast (take the interrupt, do the minimum required and exit).

In other applications, all the interesting stuff can happen in the interrupt handlers (as was the case when I was involved with a video on demand system with up to 2000 simultaneous streams available).

There is no "one size fits all" answer for interrupt handlers; the right approach is application dependent. As noted, though, nested interrupts can get messy, particularly if two handlers need to share a single resource (which can lead to data corruption if not dealt with properly, and in a multitasking system can also lead to priority inversion).

What approach one takes to interrupt handlers is application dependent, but a first approach of 'keep them simple and fast' is always a good starting point.

\$\endgroup\$
7
\$\begingroup\$

I've dealt with this sort of thing a lot on a previous project, and here are a few things I've dealt with (often learned the hard way). Some of the details can vary based on your chip architecture and how close you're running to bare metal, but the general ideas still apply.

  • In the time period before you clear/re-enable the interrupt, some other, unrelated interrupt could fire. You could be doing the wrong thing servicing a low-priority interrupt when a critical-priority interrupt is pending. Some interrupt handlers can actually have lower priority than non-interrupt threads, which is a relationship you can't implement if you do all your work inside the handler.
    • This becomes even more of a problem if you share an interrupt between multiple sources. A second interrupt could be completely masked if you haven't finished processing the first one yet. Some hardware architectures are worse about this than others.
  • When interrupts come from external devices, those devices may have expectations about how long it takes you to respond to their interrupt. I've had complex ISRs that took too long to run and the external device assumed I was dead and stopped talking to me.
  • On some architectures, ISRs are run on top of whatever stack the currently-executing program is using (to avoid allocating memory for a separate stack, which could block, fail, or trigger interrupts). You don't know how much stack space you're going to have available, so you really can't afford to do anything beyond clearing the interrupt state and signalling something else to do the work.
  • Some architectures run ISRs at a higher privilege level than normal threads. They could have access to low-level hardware resources that are normally protected. This implies that there can be security implications to doing a lot of work in an ISR.
  • ISRs have the highest priority in your system. Calling a function that blocks or relies internally on interrupts can lead to deadlocks.
  • Some architectures use interrupts internally, for things like triggering cache flushes between cores, or notifying the CPU that a math co-processor has finished computing something. The servicing of these interrupts can be blocked while you're still inside your ISR, so spending too long here can have non-obvious performance impacts.
    • Another internal use for interrupts could be communicating with debug engines (JTAG, etc). I've worked with chips where debug probes weren't very effective while in an ISR but worked great otherwise. Debugging ISR code was an order of magnitude harder than debugging normal code.
  • Some multi-core CPUs will always service interrupts on a specific core. Doing all of your work inside the ISR can force your code to run effectively single-threaded.

Thankfully, all of these problems have the same simple solution. Your interrupt handler should store the necessary information about the interrupt, clear and re-enable the interrupt, then signal some other piece of code to do the actual work. That's easy enough to do on most platforms that there's really not much benefit in trying to cram all your work into the ISR itself.

\$\endgroup\$
8
  • \$\begingroup\$ "Some interrupt handlers can actually have lower priority than non-interrupt threads, which is a relationship you can't implement if you do all your work inside the handler." Maybe I am misunderstanding this, but then how does this interrupt ever trigger? Maybe I just don't understand what a thread is. \$\endgroup\$
    – DKNguyen
    Commented Nov 26, 2019 at 23:55
  • \$\begingroup\$ @DKNguyen- Interrupts themselves always have priority over normal threads, so the thread is put on hold and the ISR runs. The work you want to do in response to such an interrupt isn't as important, though, so you want to schedule it to run later and quickly get back to the important task you were doing. \$\endgroup\$
    – bta
    Commented Nov 26, 2019 at 23:57
\$\begingroup\$ I think you misunderstood my poorly worded question. I guess in my mind I am thinking of a thread as the main loop, which it probably isn't, because if something had lower priority than the main loop it would just never run. \$\endgroup\$
    – DKNguyen
    Commented Nov 26, 2019 at 23:59
  • 2
    \$\begingroup\$ There are two notions of priority here: technical priority (which thread the CPU chooses to run), and real-world priority (blinking an LED isn't a big deal, but your nuclear reactor control loop is). When you do your work in the ISR, you can get into cases where these two notions of priority get out of sync and your reactor control thread is blocked waiting on your ISR to finish blinking an LED. \$\endgroup\$
    – bta
    Commented Nov 27, 2019 at 0:14
  • 2
    \$\begingroup\$ oh, I see what you mean: "Some interrupt handlers can actually have lower importance/urgency than non-interrupt threads, which is a relationship you can't implement if you do all your work inside the handler." \$\endgroup\$
    – DKNguyen
    Commented Nov 27, 2019 at 1:26
6
\$\begingroup\$

Because interrupts prevent other interrupts from running. "Keep them short" basically just means: do no more in the interrupt than you need to.

one big interrupt with a very low priority and the other interrupts

This requires nested interrupts (to allow interrupts to interrupt each other) which can get messy. Not all processes can be safely interrupted.

\$\endgroup\$
4
\$\begingroup\$ To expand on DKNguyen's point: nested interrupts get messy because you need so-called re-entrancy. For synchronization primitives and also resource-allocation primitives, this can be a big addition in complexity. For instance, Windows NT 4 (at the time) was re-entrant, while I believe Linux at the time was not. Searching for the words, you'll see the amount of literature dedicated to it. It's not a trivial matter. In any case, when you are writing a controller from scratch, that additional complexity might not be desirable. \$\endgroup\$
    – MB.
    Commented Nov 26, 2019 at 23:58
\$\begingroup\$ @user247243 It makes my head spin just thinking about trying to write a re-entrant function, let alone an interrupt handler. \$\endgroup\$
    – DKNguyen
    Commented Nov 27, 2019 at 0:01
\$\begingroup\$ Yeah, it is a real pita. I should point out for clarity's sake that when I say "NT was fully re-entrant", I mean this as "your code had to be re-entrant, and so much as smelling slightly non-re-entrant would trigger a bugcheck (BSOD)". It's just a generally higher burden. \$\endgroup\$
    – MB.
    Commented Nov 27, 2019 at 0:21
  • 1
    \$\begingroup\$ This. Sometimes you need to deal with the hardware quickly otherwise data will be lost, which can occur if your handler can't run because of another long-running handler. \$\endgroup\$
    – Artelius
    Commented Nov 27, 2019 at 7:34
5
\$\begingroup\$

It can be a conscious design decision to move your workload into interrupts alone.
But a lot of care and safeguards are needed to ensure that deadlines are met and the stack is not exceeded.

An NVIC is basically a very crude scheduler: tasks run at the priorities you set, when triggered by a timer, software, or another source.
On an ARM Cortex-M3 you have fine control over priorities and nesting; on an ATmega or 8051 you do not.

If you look at what you are doing thread-wise with this model, you'll have:

  • Main thread via the reset handler. (idle)
    • Interrupt TIMER_ISR_A with a regular thread.
      • Interrupt TIMER_ISR_B with a irregular thread.
        • Interrupt UART_ISR with hard realtime buffer thread.

You're already three levels deep; that means at least three stack frames!
With priority grouping you can limit nesting to a few levels, but that might cost you deadlines.

With a preemptive RTOS you'll have finer control, fewer nested interrupts, and less risk of blowing the stack. And performance metrics are often included.

And obviously there are all the side effects of asynchronous shared data and re-entrancy. In an ISR you cannot wait for a resource to become available: you'll block all equal- and lower-priority threads from running at all, unless only higher-priority threads use that resource.
Lots of implicit constraints!

If you can fit your software design within the constraints of this model, often combined with the superloop model for tasks that do not carry a hard deadline, then yes, it is a valid design, provided it is properly documented for your team to understand as well.

\$\endgroup\$
5
\$\begingroup\$

For real-time applications, predictable timing is often more important than performance. External interrupts will have unpredictable timing, so the longer they get, the more impact they will have on the overall timing of your system. Note that I'm talking about high-priority interrupts here: a long low-priority interrupt is not a problem, as long as it doesn't affect the time-critical stuff (but mind the re-entrancy problem mentioned in another answer).

I have seen several projects that struggled to get consistent I/O timing despite the CPU load being reasonably low. Replacing longer ISRs with polling, or reducing them to simple "copy byte / set flag" handlers and doing the rest in periodic tasks, was the only way to get that timing under control, at the expense of some extra CPU load.

\$\endgroup\$
2
\$\begingroup\$

The concept of interrupts essentially allows you to do two things: the normal stuff that you would ordinarily do, and immediate stuff that needs to be handled when some event happens.

You don't need to do things that way, but if you don't, you're not taking real advantage of the interrupt structure. Your code will probably be harder to understand, maintain, and debug.

Something about your suggested architectures reminds me of RTOS's, though.

\$\endgroup\$
3
  • 3
\$\begingroup\$ About RTOSes: it is legitimate to use an IRQ to wake a thread, but the thread is not running as an interrupt; the interrupt routine is very short, as it simply flags the OS that the thread needs to be rescheduled because its event is ready. \$\endgroup\$
    – crasic
    Commented Nov 26, 2019 at 15:52
  • 3
    \$\begingroup\$ @crasic, there is a model for a preemptive task switching OS where threads do actually run in the interrupt service routines. It can be found at embedded.com/build-a-super-simple-tasker . It requires nesting interrupts of course, but can be quite helpful for platforms like the 8051 with limited stack space. \$\endgroup\$
    – semaj
    Commented Nov 26, 2019 at 16:11
  • \$\begingroup\$ @semaj Thank you for that reference. Certainly there are other software architectures and application requirements to consider. \$\endgroup\$
    – crasic
    Commented Nov 27, 2019 at 1:33
2
\$\begingroup\$

Can't comment yet, so I'll post it here:

It was mentioned by some that one could carefully design an interrupt-based scheduling system on hardware that supports interrupting interrupts.

I'd like to add: on the Cortex-M that would seem silly, though, as it has dedicated facilities to support context switching for non-interrupt-based "threads". In other words, instead of going to the headache of designing a working all-interrupt system, you might as well use something like FreeRTOS?

Context switching on ARM Cortex-M

\$\endgroup\$
2
\$\begingroup\$

The big difference between a main loop and interrupts is scheduling.

A main loop works like a FIFO system: step by step, it processes what is asked of it. It is linear, so memory consumption is rather deterministic, and if you ask the main loop to do more than it can handle, each task still gets done, just more slowly. It is also rather easy to debug.

Interrupts, when issued, interrupt the main loop. Thus the main loop is paused, which can cause timing issues if the main loop is paused for too long.

If a second interrupt is issued while the first is still running, there are two ways a processor can handle that. Which way is determined by the processor, but some processors let the programmer choose. Either the processor can reject other interrupts while one is being handled, or it can interrupt the last interrupt, put it on the stack, execute the new interrupt and then return to the last interrupt.

The first option is rather simple, but interrupts can get lost: the interrupt condition may no longer be valid by the time the first interrupt finishes its execution. Also, this way lower-priority interrupts can hinder the execution of higher-priority interrupts.

The other option allows higher-priority interrupts to interrupt lower-priority ones, and interrupts cannot get lost. But that comes at the cost of introducing concurrency-related issues even on a single-threaded system. It also makes memory consumption non-deterministic, since many interrupt executions pushed onto the stack increase memory consumption by quite a lot.

I had this issue once when programming an application that handled IR signals on an ATmega328P. My interrupt routine was too long and slow, and if two IR senders sent a signal to the ATmega at the same time (which should result in just an unrecognisable signal), the interrupt handling of the first signal flank was not finished before the next signal came in, which caused that interrupt execution to be pushed onto the stack. This repeated until both IR senders stopped sending. By then the stack had hundreds of interrupt executions stacked onto it, which caused a stack overflow. The ATmega328P has no stack-overflow protection, so it just continued spilling into the heap area and completely corrupted the whole memory.

\$\endgroup\$
1
\$\begingroup\$

Ideally you want to do as little as possible in interrupts. It is a lot easier to debug a loop than to have interrupts jumping in seemingly at random. The fewer, the better.

\$\endgroup\$
2
  • 2
    \$\begingroup\$ At the expense of polling? \$\endgroup\$ Commented Nov 27, 2019 at 0:39
  • 3
    \$\begingroup\$ @PeterMortensen Polling is a loop. As much as possible should be done under polling. \$\endgroup\$ Commented Nov 27, 2019 at 11:27
