In the 8085 microprocessor instruction set there is a machine-control operation "NOP" (no operation). My question is: why do we need a no-operation? If we have to end the program, we use HLT or RST 3; if we want to move on, we simply give the next instruction. So why a no-operation - what is it needed for?

  • NOP is frequently used in debugging and updating your program. If at a later date you wish to add some lines to your program, you can simply overwrite the NOPs. Otherwise you would have to insert lines, and inserting means shifting the entire program. By the same argument, faulty (incorrect) instructions can simply be overwritten with NOPs. Commented May 31, 2015 at 12:21
  • Okay, but using NOP also increases the space used, whereas our primary goal is to make the program take up as little space as possible.
    – Demietra95
    Commented May 31, 2015 at 12:24
  • * I mean our goal is to make the program smaller. So doesn't this become a problem as well?
    – Demietra95
    Commented May 31, 2015 at 12:25
  • That's why it should be used wisely - otherwise your entire program would be just a bunch of NOPs. Commented May 31, 2015 at 12:25
  • NOP is also useful for flushing pipelines and hazard handling in modern CPUs.
    – Mitu Raj
    Commented Jun 25, 2022 at 23:19

5 Answers


One use of the NOP (or NOOP, no-operation) instruction in CPUs and MCUs is to insert a small, predictable delay in your code. Although a NOP performs no operation, it still takes time to process: the CPU has to fetch and decode its opcode. As little as one CPU cycle is "wasted" executing a NOP (the exact number can usually be inferred from the CPU/MCU datasheet), therefore putting N NOPs in sequence is an easy way to insert a predictable delay:

\$ t_{delay} = N \cdot T_{clock} \cdot K\$

where K is the number of cycles (most often 1) needed for the processing of a NOP instruction, and \$T_{clock}\$ is the clock period.

Why would you do that? It may be useful to force the CPU to wait a little for external (possibly slower) devices to complete their work and report data back to the CPU, i.e. NOP is useful for synchronization purposes.

See also the related Wikipedia page on NOP.

Another use is to align code at certain addresses in memory and other "assembly tricks", as explained also in this thread on Programmers.SE and in this other thread on StackOverflow.

This Google Books page refers specifically to the 8085 CPU. Excerpt:

Each NOP instruction uses four clocks for fetching, decoding and executing.
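
Putting the formula and the excerpt together, a quick sketch (assumptions: the 3 MHz clock is a made-up but typical 8085 figure; K = 4 T-states per NOP is taken from the excerpt above):

```python
# Estimate the delay produced by N consecutive 8085 NOPs.
CLOCK_HZ = 3_000_000       # assumed CPU clock frequency (hypothetical value)
T_CLOCK = 1.0 / CLOCK_HZ   # clock period in seconds
K = 4                      # T-states per NOP on the 8085 (see excerpt above)

def nop_delay(n):
    """t_delay = N * T_clock * K, in seconds."""
    return n * T_CLOCK * K

# 750 NOPs at 3 MHz give a 1 ms delay: 750 * (1/3e6) * 4 = 1e-3 s
print(nop_delay(750))
```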

EDIT (to address a concern expressed in a comment)

If you are worried about speed, keep in mind that (time) efficiency is only one parameter to consider. It all depends on the application: if you want to compute the 10-billionth digit of \$\pi\$, then perhaps speed is your only concern. On the other hand, if you want to log data from temperature sensors connected to an MCU through an ADC, speed is usually not so important, but waiting the right amount of time to allow the ADC to complete each reading correctly is essential. In this case, if the MCU doesn't wait long enough, it risks getting completely unreliable data (I concede it would get that data faster, though :o).

  • A lot of things (especially outputs driving ICs external to the uC) are subject to timing constraints like 'the minimum time between D being stable and the clock edge is 100 us' or 'the IR LED must blink at 1 MHz'. Hence (accurate) delays are often required. Commented May 31, 2015 at 13:00
  • NOPs can be useful for getting timing right when bit-banging a serial protocol. They can also be used to fill unused code space, followed by a jump to the cold-start vector, for rare situations where the Program Counter gets corrupted (e.g. a PSU glitch, a rare gamma-ray event, etc.) and starts executing code in an otherwise empty part of code space. Commented May 31, 2015 at 13:10
  • On the Atari 2600 Video Computer System (the second video game console to run programs stored on cartridges), the processor ran exactly 76 cycles each scan line, and many operations needed to be performed some exact number of cycles after the start of a scan line. On that processor the documented NOP instruction takes two cycles, but it's also common for code to use an otherwise-useless three-cycle instruction to pad a delay out to a precise number of cycles. Running the code faster would have yielded a totally garbled display.
    – supercat
    Commented May 31, 2015 at 19:52
  • Using NOPs for delays can make sense even on non-real-time systems in cases where an I/O device imposes a minimum time between consecutive operations, but not a maximum. For example, on many controllers, shifting a byte out of an SPI port takes eight CPU cycles. Code which does nothing but fetch bytes from memory and output them to the SPI port could run slightly too fast, but adding logic to test whether the SPI port was ready for each byte would make it needlessly slow. Adding a NOP or two may allow the code to achieve the maximum available speed...
    – supercat
    Commented Sep 8, 2015 at 16:23
  • ...in the case where there are no interrupts. If an interrupt hits, the NOPs needlessly waste time, but the time wasted by one or two NOPs would be less than the time required to determine whether an interrupt had made them unnecessary.
    – supercat
    Commented Sep 8, 2015 at 16:24

The other answers only consider a NOP that actually executes at some point. That's quite a common use, but it's not the only one.

The non-executing NOP is also pretty useful when writing code that can be patched - basically, you pad the function with a few NOPs after the RET (or a similar instruction). When you have to patch the executable, you can easily add code to the function, starting from the original RET, using as many of those NOPs as you need (e.g. for long jumps or even inline code) and finishing with another RET.

In this use case, noöne ever expects the NOP to execute. The only point is to allow patching the executable. In a theoretical non-padded executable you'd have to actually change the code of the function itself (sometimes the new code might fit within the original boundaries, but quite often you'll need a jump anyway), which is a lot more complicated, especially considering manually written assembly or an optimizing compiler: you have to respect jumps and similar constructs that might point at some important piece of code. All in all, pretty tricky.
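
A minimal sketch of the padding idea (hypothetical byte values: 0xC3 and 0x90 are the real x86 RET and NOP opcodes, but the function-body bytes are made up):

```python
# A function image padded with NOPs after its RET; a patch later
# overwrites the RET and as much of the padding as it needs.
NOP, RET = 0x90, 0xC3

# hypothetical function: three bytes of code, a RET, then NOP padding
image = bytearray([0x01, 0x02, 0x03, RET, NOP, NOP, NOP, NOP])

def patch(image, new_tail):
    """Overwrite the original RET and the following NOP padding."""
    ret_at = image.index(RET)
    assert len(new_tail) <= len(image) - ret_at, "patch doesn't fit in the padding"
    image[ret_at:ret_at + len(new_tail)] = new_tail

# append two more bytes of "code" and finish with a fresh RET
patch(image, bytes([0xAA, 0xBB, RET]))
print(image.hex())   # -> '010203aabbc39090'
```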

Of course, this was used much more heavily in the olden days, when it was useful to keep patches like these small and apply them online. Today you'll just distribute a recompiled binary and be done with it. There are still some who use patching NOPs (executing or not, and not always literal NOPs - for example, Windows uses MOV EDI, EDI for online patching - the kind where you can update a system library while the system is actually running, without needing restarts).

So the last question is, why have a dedicated instruction for something that doesn't really do anything?

  • It is an actual instruction - important when debugging or handcoding assembly. Instructions like MOV AX, AX will do exactly the same, but do not signal the intent quite so clearly.
  • Padding - "code" that's there just to improve overall performance of code that depends on alignment. It's never meant to execute. Some debuggers simply hide padding NOPs in their disassembly.
  • It gives more space for optimizing compilers - the still used pattern is that you've got two steps of compilation, the first one being rather simple and producing lots of unnecessary assembly code, while the second one cleans up, rewires the address references and removes extraneous instructions. This is often seen in JIT-compiled languages as well - both .NET's IL and JVM's byte-code use NOPs quite a lot; the actual compiled assembly code doesn't have those anymore. It should be noted that those are not x86-NOPs, though.
  • It makes online debugging easier, both for reading (pre-zeroed memory will be all NOPs on architectures where NOP is encoded as 0x00, making the disassembly a lot easier to read) and for hot-patching (though I by far prefer Edit and Continue in Visual Studio :P).

For executing NOPs, there's of course a few more points:

  • Performance, of course - this is not why it was in the 8085, but even the 80486 already had pipelined instruction execution, which makes "doing nothing" a bit trickier.
  • As seen with MOV EDI, EDI, there are effective NOPs other than the literal NOP. MOV EDI, EDI has the best performance as a 2-byte NOP on x86; if you used two literal NOPs instead, that would be two instructions to execute.

EDIT:

Actually, the discussion with @DmitryGrigoryev forced me to think about this a bit more, and I think it's a valuable addition to this question / answer, so let me add some extra bits:

First point, obviously: why would there be an instruction that does something like mov ax, ax? Let's look at the case of 8086 machine code (older even than 386 machine code):

  • There's a dedicated NOP instruction with opcode 0x90. This is still the time when many people wrote in assembly, mind you. So even if there wasn't a dedicated NOP instruction, the NOP keyword (alias/mnemonic) would still be useful and would map to that.
  • Instructions like MOV actually map to many different opcodes, because that saves on time and space - for example, mov al, 42 is "move immediate byte to the al register", which translates to 0xB02A (0xB0 being the opcode, 0x2A being the "immediate" argument). So that takes two bytes.
  • There's no shortcut opcode for mov al, al (since that's a pointless thing to do, basically), so you have to use the generic mov rb, rmb form (rmb being "register or memory byte"), which takes two bytes for mov al, al - the argument byte specifies both the source and the target register (now you know why the 8086 only had 8 registers :D). Compare that to NOP, which is a single-byte instruction! This saves memory and time, since reading memory on the 8086 was still quite expensive - not to mention loading the program from a tape or floppy in the first place, of course.
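
The two-byte immediate form mentioned above can be sketched as a tiny assembler (register numbering per the 8086 manual; only the `mov r8, imm8` form is modelled here):

```python
# 8086 "move immediate byte to 8-bit register": opcode 0xB0 + reg, then the byte.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def mov_r8_imm8(reg, value):
    """Encode `mov reg, value` - two bytes total."""
    return bytes([0xB0 + REG8[reg], value & 0xFF])

print(mov_r8_imm8("al", 42).hex())   # -> 'b02a', i.e. 0xB0 0x2A as in the text
```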

So where does xchg ax, ax come from? Just look at the opcodes of the other xchg instructions. You'll see 0x86, 0x87 and finally 0x91-0x97. So nop, with its 0x90, looks like a pretty good fit for xchg ax, ax (which, again, isn't an xchg "overload" - you'd have to use xchg rb, rmb, at two bytes). And in fact, I'm pretty sure this was a nice side effect of the micro-architecture of the time - if I recall correctly, it was easy to map the whole range 0x90-0x97 to "xchg over ax and one of ax-di" (the operation being symmetric, this gave you the full range, including the nop xchg ax, ax; note that the register order is ax, cx, dx, bx, sp, bp, si, di - bx comes after dx, not after ax; remember, the register names are mnemonics, not ordered names: accumulator, counter, data, base, stack pointer, base pointer, source index, destination index). The same approach was also used for other operands, for example the mov someRegister, immediate set. In a way, you could think of the opcode as not being a full byte - the last few bits are "an argument" to the "real" opcode.
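
A sketch of that 0x90-0x97 decoding, with the register order as listed above:

```python
# x86: opcodes 0x90..0x97 are `xchg ax, reg`; the low 3 bits pick the register.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_xchg_ax(opcode):
    assert 0x90 <= opcode <= 0x97
    return f"xchg ax, {REG16[opcode & 0x07]}"

print(decode_xchg_ax(0x90))   # -> 'xchg ax, ax' - the canonical NOP
print(decode_xchg_ax(0x93))   # -> 'xchg ax, bx'
```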

All this said, on x86, nop might be considered a real instruction, or not. The original micro-architecture did treat it as a variant of xchg if I recall correctly, but it was actually named nop in the specification. And since xchg ax, ax doesn't really make sense as an instruction, you can see how the designers of the 8086 saved on transistors and pathways in instruction decoding by exploiting the fact that 0x90 maps naturally to something that's entirely "noppy".

On the other hand, the i8051 has an entirely designed-in opcode for nop - 0x00. Kinda practical. The instruction design basically uses the high nibble for the operation and the low nibble for selecting the operand - for example, add a, ... is 0x2Y, and 0xX8 means "register 0 direct", so 0x28 is add a, r0. Saves a lot of silicon :)
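
The nibble split can be sketched like this (only the two opcodes discussed above are modelled; everything else is out of scope for the sketch):

```python
# 8051: high nibble selects the operation, low nibble the operand;
# 0x00 is NOP by design.
def decode_8051(opcode):
    hi, lo = opcode >> 4, opcode & 0x0F
    if opcode == 0x00:
        return "nop"
    if hi == 0x2 and lo >= 0x8:    # 0x28..0x2F are ADD A, R0..R7
        return f"add a, r{lo - 8}"
    return "?"                     # other opcodes not modelled in this sketch

print(decode_8051(0x00))   # -> 'nop'
print(decode_8051(0x28))   # -> 'add a, r0'
```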

I could still go on, since CPU design (not to mention compiler design and language design) is quite a broad topic, but I think I've shown many different viewpoints that went into the design quite nicely as is.

  • Actually, NOP is usually an alias for MOV ax, ax, ADD ax, 0 or similar instructions. Why would you design a dedicated instruction which does nothing when there are plenty out there? Commented Jun 1, 2015 at 11:10
  • @DmitryGrigoryev That really goes into the design of the CPU language (well, micro-architecture) itself. Most CPUs (and compilers) will tend to optimize MOV ax, ax away; NOP will always take a fixed number of cycles to execute. But I don't see how that's relevant to what I've written in my answer anyway.
    – Luaan
    Commented Jun 1, 2015 at 11:16
  • CPUs can't really optimize MOV ax, ax away, because by the time they know it's a MOV, the instruction is already in the pipeline. Commented Jun 1, 2015 at 11:35
  • @DmitryGrigoryev That really depends on the kind of CPU you're talking about. Modern desktop CPUs do a lot of stuff beyond instruction pipelining. For example, the CPU knows it doesn't have to invalidate cache lines etc.; it knows it doesn't have to actually do anything (very important for HyperThreading, and even for the multiple pipes involved in general). I wouldn't be surprised if it also influenced branch prediction (though that would probably be the same for NOP and MOV ax, ax). Modern CPUs are way more complex than the old-school C compilers :))
    – Luaan
    Commented Jun 1, 2015 at 11:43
  • +1 for optimising no-one to noöne! I think we should all coöperate and go back to this style of spelling! The Z80 (and in turn the 8080) has 7 LD r, r instructions where r is any single register, similar to your MOV ax, ax. The reason it's 7 and not 8 is that one of the encodings is overloaded to become HALT. So the 8080 and Z80 have at least 7 other instructions that do the same as NOP! Interestingly, even though these instructions are not logically related to NOP by bit pattern, they all take 4 T-states to execute, so there is no reason to use the LD forms!
    – CJ Dennis
    Commented Jun 1, 2015 at 15:17

Back in the late 70s, we (I was a young research student then) had a little dev system (an 8080, if memory serves) that ran in 1024 bytes of code (i.e. a single UV-EPROM). It only had four commands: load (L), save (S), print (P), and something else I can't remember. It was driven by a real teletype and punched tape. It was tightly coded!

One example of NOOP use was in an interrupt service routine (ISR); the ISRs were spaced at 8-byte intervals. This routine ended up being 9 bytes long, ending with a (long) jump to an address slightly further up the address space. Given the little-endian byte order, this meant that the high address byte was 00h and slotted into the first byte of the next ISR - so that next ISR started with a NOOP, just so 'we' could fit the code into the limited space!

So the NOOP is useful. Plus, I suspect it was simply easiest for Intel to encode it that way - they probably had a list of instructions they wanted to implement, and it started at '1', like all lists did back then (this was the days of FORTRAN), so the zero opcode fell out as NOOP. (I've never seen an article arguing that NOOP is an essential part of computing-science theory - the same question as: do mathematicians have a null op, as distinct from the zero of group theory?)

  • Not all CPUs have NOP encoded as 0x00 (although the 8085, the 8080 and the CPU I'm most familiar with, the Z80, all do). However, if I were designing a processor, that's where I'd put it! Something else that's handy: memory is usually initialised to all 0x00, so executing it as code will do nothing until the CPU reaches non-zeroed memory.
    – CJ Dennis
    Commented Jun 1, 2015 at 12:08
  • @CJDennis I've explained why the x86 CPUs don't use 0x00 for nop in my answer. In short, it saves on instruction decoding - xchg ax, ax flows naturally from the way the instruction decoding works, and it does something "noppy", so why not use that and call it nop, right? :) This used to save quite a bit of silicon in the instruction decoder...
    – Luaan
    Commented Jun 1, 2015 at 14:18

On some architectures, NOP is used to occupy unused delay slots. For example, if the branch instruction doesn't clear the pipeline, several instructions after it get executed anyway:

 JMP     .label
 MOV     R2, 1    ; these instructions start execution before the jump
 MOV     R2, 2    ; takes place so they still get executed

But what if you don't have any useful instructions to fit after the JMP? In that case you'll have to use NOPs.

Delay slots are not limited to jumps. On some architectures, data hazards in the CPU pipeline are not resolved automatically. This means that after each instruction which modifies a register, there is a slot in which the new value of the register is not yet accessible. If the next instruction needs that value, the slot should be occupied by a NOP:

 ADD     R1, R1
 NOP               ; The new value of R1 is not ready yet
 ADD     R1, R3

Also, some conditional-execution instructions (If-True-False and similar) use a slot for each condition, and when a particular condition has no action associated with it, its slot should be occupied by a NOP:

CMP     R0, R1       ; Compare R0 and R1, setting flags
ITF     GT           ; If-True-False on GT flag 
MOV     R3, R2       ; If 'greater than', move R2 to R3
NOP                  ; Else, do nothing
  • +1. Of course, those only tend to show up on architectures that don't care about backwards compatibility - if x86 had tried something like that when introducing instruction pipelining, almost everyone would simply have called it wrong (after all, they just upgraded their CPU and their applications stopped working!). So x86 has to make sure applications don't notice when improvements like this are added to the CPU - until we got to multi-core CPUs, anyway... :D
    – Luaan
    Commented Jun 1, 2015 at 14:23

Another example of a use for a two-byte NOP: Link

The MOV EDI, EDI instruction is a two-byte NOP, which is just enough space to patch in a jump instruction so that the function can be updated on the fly. The intention is that the MOV EDI, EDI instruction will be replaced with a two-byte JMP $-5 instruction to redirect control to five bytes of patch space that comes immediately before the start of the function. Five bytes is enough for a full jump instruction, which can send control to the replacement function installed somewhere else in the address space.
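
A sketch of the byte-level arithmetic behind that quote (assumptions: 0xEB followed by a signed 8-bit displacement, counted from the end of the 2-byte jump, is the standard x86 short-jump encoding, and 8B FF is the usual encoding of MOV EDI, EDI; the entry address is hypothetical):

```python
# Hot-patch arithmetic: replace the 2-byte `mov edi, edi` at a function's
# entry with `jmp $-5`, landing in the 5 bytes of patch space before it.
MOV_EDI_EDI = bytes([0x8B, 0xFF])   # the 2-byte NOP at the function entry

def jmp_short(from_addr, to_addr):
    """Encode a 2-byte short jump; the displacement is relative to the
    address just *after* the 2-byte instruction."""
    rel = to_addr - (from_addr + 2)
    assert -128 <= rel <= 127, "target out of short-jump range"
    return bytes([0xEB, rel & 0xFF])

entry = 0x1000                        # hypothetical function entry address
patch = jmp_short(entry, entry - 5)   # jump back into the patch space
print(patch.hex())   # -> 'ebf9' (displacement -7)
```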

