Is there a way to create a finite wait loop where the execution of one loop takes only two cycles (or even one)?
Yes, kind of. The loop itself still takes three cycles per iteration, but the total delay can be made exact to the cycle, assuming the delay is known at assembly time. The solution is a macro that computes the number of loop iterations (which may be zero) and pads the remainder with NOPs.
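Here is the code for the GNU tools; feel free to port it to your favourite assembler. The #define is handled by the C preprocessor, so the source presumably has to be assembled through the compiler driver (for example as a .S file via avr-gcc):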
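;; LDI only accepts the upper registers R16..R31, so the counter must be one of them.
#define WAIT_REG R16

.macro wait cycles
    ;; LDI (1 cycle) plus n rounds of DEC/BRNE (3 cycles each, 2 in the
    ;; last round) take exactly 3*n cycles.
    .if \cycles / 3 > 0
    ldi   WAIT_REG, \cycles / 3
.Loop.\@:
    dec   WAIT_REG
    brne  .Loop.\@
    .endif
    ;; Pad the remaining 0, 1 or 2 cycles with NOPs.
    .if \cycles % 3 > 0
    nop
    .endif
    .if \cycles % 3 > 1
    nop
    .endif
.endm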
Sample usage:
.text
.global main
main:
    wait 0
    wait 1
    wait 2
    wait 3
    wait 4
    wait 10
    wait 100
    wait 500
    ret
The generated assembly disassembles as (comments added by hand):
;; Delay = 0
;; Delay = 1
0: 00 00 nop
;; Delay = 2
2: 00 00 nop
4: 00 00 nop
;; Delay = 3
6: 01 e0 ldi r16, 0x01
00000008 <.Loop.3>:
8: 0a 95 dec r16
a: f1 f7 brne .-4 ; 0x8 <.Loop.3>
;; Delay = 4
c: 01 e0 ldi r16, 0x01
0000000e <.Loop.4>:
e: 0a 95 dec r16
10: f1 f7 brne .-4 ; 0xe <.Loop.4>
12: 00 00 nop
;; Delay = 10
14: 03 e0 ldi r16, 0x03 ; 3
00000016 <.Loop.5>:
16: 0a 95 dec r16
18: f1 f7 brne .-4 ; 0x16 <.Loop.5>
1a: 00 00 nop
;; Delay = 100
1c: 01 e2 ldi r16, 0x21 ; 33
0000001e <.Loop.6>:
1e: 0a 95 dec r16
20: f1 f7 brne .-4 ; 0x1e <.Loop.6>
22: 00 00 nop
;; Delay = 500
24: 06 ea ldi r16, 0xA6 ; 166
00000026 <.Loop.7>:
26: 0a 95 dec r16
28: f1 f7 brne .-4 ; 0x26 <.Loop.7>
2a: 00 00 nop
2c: 00 00 nop
;; Epilogue
2e: 08 95 ret
The macro as written tops out at 3·255 + 2 = 767 cycles, because the loop counter is a single 8-bit register. It can easily be generalized to longer delays by adding more .ifs and nested loop(s); the required arithmetic is all linear.
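As one possible way to get past the 8-bit limit, here is a minimal sketch that swaps the nested loops for a single 16-bit counter decremented with SBIW. It is not part of the code above: the name wait16 and the use of R24/R25 are assumptions made for illustration, and it assumes a core that implements SBIW with the usual timings (SBIW 2 cycles, BRNE 2 cycles when taken and 1 when not).

```asm
;; Hypothetical companion to the wait macro: delays of 5 .. 262144 cycles.
;; Assumption: R24/R25 may be clobbered, and the core has SBIW.
.macro wait16 cycles
    .if \cycles < 5
    .error "wait16: delay too short, use wait instead"
    .endif
    .if \cycles > 262144
    .error "wait16: delay too long for a 16-bit counter"
    .endif
    ;; n = (cycles - 1) / 4 loop rounds
    ldi   r24, lo8((\cycles - 1) / 4)   ; 1 cycle
    ldi   r25, hi8((\cycles - 1) / 4)   ; 1 cycle
.Loop16.\@:
    sbiw  r24, 1                        ; 2 cycles
    brne  .Loop16.\@                    ; 2 cycles if taken, 1 on exit
    ;; The two LDIs plus the loop cost 4*n + 1 cycles.
    ;; Pad the remaining 0 .. 3 cycles with NOPs.
    .if (\cycles - 1) % 4 > 0
    nop
    .endif
    .if (\cycles - 1) % 4 > 1
    nop
    .endif
    .if (\cycles - 1) % 4 > 2
    nop
    .endif
.endm
```

With this sketch, wait16 1000 would load 249 into the counter (2 + 4·249 − 1 = 997 cycles) and append three NOPs. The arithmetic stays linear, as in the 8-bit version; only the constants change from 3 to 4 and from 0 to 1.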
When the delay is not known at assembly time, things get more complicated, and there will always be specific delays that cannot be realized, in particular short ones, because fetching the count and setting up the loop already costs a few cycles.
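To illustrate the run-time case, here is a minimal sketch of a subroutine that takes the iteration count in a register. It is not part of the code above: the name delay3n and the choice of R24 are assumptions, and the cycle totals assume a core with a 16-bit program counter on which RCALL takes 3 cycles and RET takes 4; other AVR variants differ.

```asm
;; Hypothetical helper: R24 = n on entry (1..255; 0 counts as 256).
;; Clobbers R24.
;; Assuming RCALL = 3 cycles and RET = 4 cycles, the total is
;; RCALL + loop + RET = 3 + (3*n - 1) + 4 = 3*n + 6 cycles.
delay3n:
    dec   r24          ; 1 cycle
    brne  delay3n      ; 2 cycles if taken, 1 on exit
    ret

;; Caller side, with the count already in R24 at run time:
    rcall delay3n
```

Under those assumptions only 9, 12, 15, ... cycles are reachable: nothing shorter than 9, and nothing between the multiples of 3 unless extra code dispatches on the remainder, which in turn raises the minimum delay.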