20
\$\begingroup\$

Describing a specific operation as being done "in hardware" versus "in software" in a given computer system is common. For example, simple computer systems (I am assuming) might not have hardware for division. What do we really mean by this distinction?

Is the distinction effectively based on whether the computer's architecture layer (the layer which defines the hardware-software contract, i.e. the instruction set) includes an instruction for said operation? Going to the division example, would we say that division is done "in hardware" on a given computer if there is an assembly instruction for division, and "in software" (presumably with the compiler translating a division in the source into a sequence of instructions, supported by the given instruction set, that implement a division algorithm) if not?

\$\endgroup\$
7
  • \$\begingroup\$ If I write software for a mixed language toolchain, capable of sufficient analysis to generate not one but several runtime CPU architectures and control software, even including self-modifying controlling microstores, to target some limited hardware box of FPGAs (for example, but not so limited), and it generates several kinds of division instructions (say it decides I need a 48-bit/24-bit divisor, plus a 12-bit/6-bit divisor, for part of the application run) then what is the meaning of your question? \$\endgroup\$ Commented Feb 25 at 3:51
  • 1
    \$\begingroup\$ You appear to be seeking arbitrary manmade definitions. Something static perhaps, too. Rather, I think, seek the natural boundaries where they lay at the time. But be prepared to adapt. \$\endgroup\$ Commented Feb 25 at 3:51
  • 1
    \$\begingroup\$ I think your assessment is exactly correct (and well taken) after reading the various answers here, @periblepsis. \$\endgroup\$
    – EE18
    Commented Feb 25 at 4:00
  • 3
    \$\begingroup\$ IMO, the instruction set says nothing about what is done in (dedicated purpose-built) hardware and what will be done on generic units. \$\endgroup\$
    – tobalt
    Commented Feb 25 at 6:23
  • 7
    \$\begingroup\$ I feel like this is one of those questions where the answer appears obvious (“I know it when I see it”) but is actually very hard to define. In the end even software is just a hardware configuration. \$\endgroup\$
    – Michael
    Commented Feb 25 at 17:54

10 Answers

22
\$\begingroup\$

I think we can agree that if the compiler is synthesizing a lot of instructions unrelated to division in order to divide two numbers, it is being done in software, but your question here is a bit more difficult:

would we say that division is done "in hardware" on a given computer if there is an assembly instruction for division

Ordinarily yes, and in most cases this is accurate. There are however a few situations where it gets tricky. I'm giving examples from CPUs I've worked with, so these are not hypothetical.

Trapped/emulated instruction

The CPU may have the division as a single instruction, but executing it will cause some sort of trap or exception that has to be handled. This is the case for 64-bit division on my favourite CPU, the MC68060. An older generation of the family did implement the instruction in hardware, but it was very slow and took a lot of transistors just to do something that was rarely done, so on the 68060 they decided to keep the instruction in the instruction set but have it done in software whenever it shows up in the code.

I would call this "software". If the operating system has not set up the correct exception handlers, the program will just crash if this instruction is executed.

It's faster for the compiler to just generate the code to divide manually than it is to rely on this emulated instruction.
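To make "generate the code to divide manually" concrete, here is a minimal sketch in C (not taken from any particular compiler's runtime library) of the kind of shift-and-subtract helper a toolchain might emit or call when there is no usable divide instruction:

```c
#include <stdint.h>

/* Minimal restoring shift-subtract division, the kind of routine a
 * compiler runtime might fall back on when the CPU has no divide
 * instruction. Caller must ensure den != 0. */
uint32_t soft_udiv32(uint32_t num, uint32_t den, uint32_t *rem_out)
{
    uint32_t quot = 0, rem = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((num >> i) & 1u);  /* bring down next bit */
        if (rem >= den) {                      /* trial subtraction */
            rem -= den;
            quot |= 1u << i;
        }
    }
    if (rem_out)
        *rem_out = rem;
    return quot;
}
```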

Helper instructions

The DSP56000 series has a div instruction, but it does not actually perform a full division as such. You need to run it a number of times in sequence, once for each bit you want in the output. Is this software or hardware?
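As an illustration of the idea only (this is not the actual DSP56000 DIV semantics), a C model of a "divide step" primitive that software has to wrap in a loop, one call per quotient bit, might look like this:

```c
#include <stdint.h>

/* Model of a "divide step": each call produces one more quotient bit.
 * The ISA supplies the step; software supplies the loop. */
typedef struct { uint32_t rem; uint32_t quot; } divstate_t;

static void div_step(divstate_t *s, uint16_t den, uint32_t num_bit)
{
    s->rem  = (s->rem << 1) | (num_bit & 1u);  /* shift in next numerator bit */
    s->quot <<= 1;
    if (s->rem >= den) {                       /* conditional subtract */
        s->rem -= den;
        s->quot |= 1u;
    }
}

/* One call per result bit; den must be non-zero. */
static uint16_t divide_16bit(uint16_t num, uint16_t den)
{
    divstate_t s = {0, 0};
    for (int i = 15; i >= 0; i--)
        div_step(&s, den, (num >> i) & 1u);
    return (uint16_t)s.quot;
}
```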

Microcode

I guess most CPUs run microcode inside; modern Intel CPUs are probably the most prominent example, where they have kept the old instruction set but everything is internally translated beyond recognition to match a different architecture. This can be more obvious in certain cases.

The ARM Cortex-M0 comes with a MULS instruction for multiplying two 32-bit integers, but it's up to the company that manufactures the actual chip whether they want the "fast" or the "slow" version. The slow version takes 32 cycles because it does the multiplication in a loop inside the CPU. It's still an actual instruction though, and I would call this "hardware".
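Roughly speaking, the "slow" option behaves like the following C model, one conditional add-and-shift per bit; the "fast" option computes the same product in one cycle with an array of adders. This is an illustration of the idea, not Arm's actual implementation:

```c
#include <stdint.h>

/* Rough model of an iterative (~1 cycle per bit) multiplier:
 * one conditional add and shift per bit of the second operand. */
static uint32_t iterative_mul32(uint32_t a, uint32_t b)
{
    uint32_t acc = 0;
    for (int i = 0; i < 32; i++) {   /* 32 iterations ~ 32 cycles */
        if (b & 1u)
            acc += a;
        a <<= 1;
        b >>= 1;
    }
    return acc;                      /* low 32 bits of the product */
}
```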

\$\endgroup\$
8
  • 1
    \$\begingroup\$ Another border case: RP2040 has a hardware divider which is not accessed by an instruction but rather by writing the operands into a memory-mapped IO address. \$\endgroup\$
    – jpa
    Commented Feb 26 at 7:18
  • 1
\$\begingroup\$ Another example of trapped/emulated instructions: the Intel 8086 (and the similar 8088, plus the later 80286, 80386 and 80486SX) had an optional coprocessor, the 8087, with a set of floating-point operations. Indeed, some compilers (I remember Turbo Pascal) gave you 3 options: 1. software-only 2. hardware-only 3. hardware, with emulation in case the hardware is not present at runtime. \$\endgroup\$
    – Jonathan
    Commented Feb 26 at 9:26
  • 2
\$\begingroup\$ Quibble: @pipe says "I guess most CPUs run microcode inside", but many if not nearly all RISC processors do NOT run microcode. In fact, not having microcode was originally one of the criteria for something being a RISC. \$\endgroup\$
    – Krazy Glew
    Commented Feb 27 at 3:29
  • 1
    \$\begingroup\$ @KrazyGlew: At least a few CPUs that are generally considered "RISC" definitely do/did use microcode. For one example, the ARM1 definitely did. Later generations of RISC CPUs often use micro-ops, and there's still debate about whether that counts as microcode or not (I would say yes, but realize it's enough different from traditional microcode that I can understand how some disagree). \$\endgroup\$ Commented Feb 27 at 19:03
  • 1
\$\begingroup\$ @KrazyGlew the criterion for RISC is being a load/store architecture and has nothing to do with having microcode or not. Modern ARM CPUs have micro-ops, and complex ISAs such as PowerPC almost certainly must use microcode. Or RISC-V, the latest in town, also uses micro-ops in some implementations, and more powerful implementations can even use macro-op fusion; you can see that in the RISC-V spec \$\endgroup\$
    – phuclv
    Commented Mar 7 at 13:27
12
\$\begingroup\$

One good and intuitive example is PWM on microcontrollers. This is possible in software by setting a pin high or low, counting up to a threshold in a simple loop, then flipping the pin, counting again, and so on.

Importantly, the CPU will be occupied 100% of the time, because both the pin setting and the counting are done using generic software. One could interleave this with other code, but still the CPU would run many, many instructions per PWM cycle.
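A minimal sketch of such a bit-banged PWM loop in C (the GPIO register address and pin bit are hypothetical placeholders, not any particular microcontroller's):

```c
#include <stdint.h>

/* Hypothetical memory-mapped GPIO output register and pin bit -
 * stand-ins for whatever the real target actually uses. */
#define GPIO_OUT   (*(volatile uint32_t *)0x40000000u)
#define PIN_MASK   (1u << 5)
#define PWM_PERIOD 1000u

/* Bit-banged PWM: the CPU does nothing but flip the pin and count. */
void pwm_bitbang_forever(uint32_t duty)   /* duty in 0..PWM_PERIOD */
{
    for (;;) {
        GPIO_OUT |= PIN_MASK;                               /* pin high */
        for (volatile uint32_t i = 0; i < duty; i++) { }    /* busy-wait */
        GPIO_OUT &= ~PIN_MASK;                              /* pin low */
        for (volatile uint32_t i = 0; i < PWM_PERIOD - duty; i++) { }
    }
}
```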

Alternatively, hardware timers can be set up to run independently of the main software. They use their own registers for counting and thresholds and have their own counter and compare units. The CPU has to execute zero instructions to keep this going, apart from occasional updates to the registers, which frees up a lot of time for other software. So in this case, the PWM is done in hardware.
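For comparison, here is a hardware-PWM setup on one concrete part (an AVR ATmega328P, chosen purely as an example; other microcontrollers use different registers). After this function returns, the timer generates the waveform on the OC1A pin with no further CPU involvement:

```c
#include <avr/io.h>
#include <stdint.h>

/* Timer1 Fast PWM (mode 14, TOP = ICR1), non-inverting output on OC1A. */
void pwm_hw_init(uint16_t top, uint16_t duty)
{
    DDRB  |= (1 << PB1);                                  /* OC1A pin as output */
    ICR1   = top;                                         /* PWM period */
    OCR1A  = duty;                                        /* duty-cycle threshold */
    TCCR1A = (1 << COM1A1) | (1 << WGM11);                /* non-inverting, mode 14 */
    TCCR1B = (1 << WGM13) | (1 << WGM12) | (1 << CS10);   /* no prescaler, start */
}
```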

\$\endgroup\$
1
  • 1
    \$\begingroup\$ +1. Similarly, you may use "bit-banging" to implement a communication protocol in software if the platform doesn't have dedicated hardware for it. \$\endgroup\$
    – towe
    Commented Feb 26 at 13:53
12
\$\begingroup\$

As a computer architect who has worked on processors including non-microcoded RISCs and microcoded CISCs in industry, I make more distinctions than just hardware/software:

  • Inside the instruction set, or NOT

  • Inside the processor, as opposed to being implemented by some completely separate device.

  • How implemented? A spectrum:

    • dedicated combinatoric hardware circuits implementing full operation
    • hardware state machines controlling circuits implementing partial operation
    • "horizontal microcode" - although I prefer to say microcode that actually controls timing
    • "generic microcode" - sometimes called "vertical microcode",
    • PALcode (like DEC Alpha)
    • trap and emulate
    • "pure software" using ordinary instructions

In this implementation spectrum, software is everything below the line drawn between microcode and PALcode, although from my point of view the processor generic microcode is almost the same as software.

But from my point of view, there is a very big distinction between the "real hardware" combinatoric circuits and hardware state machines, and generic microcode. Horizontal or explicit timing controlled microcode is fuzzy.

Inside the instruction set or not is almost completely orthogonal, except for that last item of "pure" software.

As others have pointed out, inside or outside the processor is almost equally orthogonal.

---+ Inside the instruction set

As you (OP) say, an operation like multiply or divide may be provided by the instruction set or not.

If not provided by the instruction set, e.g. one of the early RISC processors that provided no multiply instruction, then multiply is definitely "implemented in software by ordinary instructions"

If the processor has a multiply step or divide step instruction, I would probably say "lacks full hardware multiply or divide", or is "implemented in software with partial hardware support".

If the full operation is provided by the instruction set: ...

... If implemented by trap and emulate, I would classify that as a software implementation - but T&E is important enough that I might say "implemented by trapping and emulating in software".

---+ "implemented in hardware"

On to "implemented in hardware":

---++ Combinatoric Logic and/or Hardware State Machines

Some operations, like 32-bit multiply, might be implemented as a single combinatoric logic circuit, e.g. a 32x32=32 or 32x32=64 array multiplier, possibly pipelined over several cycles; or by a smaller circuit, e.g. a 32x8=32 or 40-bit slice, taking several passes or iterations.

If a single combinatoric circuit, I would say "implemented in hardware".

If several passes over a partial multiplier array slice, or similarly for a divider, I tend to make distinctions based on how the iterations or loops are controlled.

If the looping is controlled by a dedicated hardware state machine, then I would say "implemented in [a] hardware [state machine]."

---+ Microcode

If you have significant combinatoric logic hardware, but if the looping or control or sequencing of that hardware is controlled by microcode, and in particular by generic microcode, eg the vertical or VLIW microcode of many machines that could actually be used for general purpose programming if it were exposed to the user, then I will usually make the distinction "partially or substantially implemented in hardware, with microcode control", or "microcode with hardware acceleration".

But some microcode implementations don't have big chunks of combinatoric logic to do the work. They might implement the operation just using ordinary microcode loads and stores and arithmetic instructions, and so on - almost exactly as it might have been implemented in a pure software implementation.

From my point of view as a computer architect, pure or generic microcode is almost the same as software.

Note that modern x86 processors have both generic microcode and dedicated hardware state machines: dedicated hardware state machines for things like integer divide, floating-point divide, and floating-point (inverse) square root; generic microcode often for transcendentals like sine and logarithm, although usually with some dedicated hardware such as lookup tables; and dedicated non-iterative hardware for things like multiply.

---+ What is "generic" microcode?

What makes microcode "generic" or not? Well, some processors have microcode that could almost be exposed to the user as an ordinary instruction set.

E.g. it might be RISC-like micro-ops, uops, or rops, although frequently failing the classic definition of RISC by having micro-instruction sizes like 140 bits, or more than the classic RISC 2 or 3 input / 1 output registers. Often VLIW.

Microcode may also differ from software, i.e. micro-instructions may differ from the macro instruction set, by having access to registers that neither the user nor the operating system can access, or ordinary registers may be extended, e.g. the user-visible 32 or 64-bit architectural registers may actually be 40 or 72 bits wide, e.g. so that condition codes are computed by the same hardware circuits as the main register bits, to be separated later.

From my point of view, what makes microcode "generic" or not is whether it has very tight control of timing: generic micro-instructions are not really aware of timing, and flow through the machine under the control of an out-of-order or in-order instruction scheduler. Some textbooks make the distinction between vertical microcode and horizontal microcode, and some of the original VLIW instruction sets can be imagined as exposing the parallel operations of horizontal microcode. I find it more useful to distinguish microcode that is timing aware from microcode that is not.

I started using this generic or vertical microcode approach when I started designing out-of-order processors: the uops (micro-ops) were the things that flowed through the micro-dataflow engine. But you can use the same approach for in-order pipelines as well.

"Generic micro" may be mostly compatible between processor generations and families, whereas timing-dependent microcode will obviously have to change if the pipeline depth or interlocks change.

---+ PALcode

Why have I spent so much time talking about the difference between hardware and generic microcode? In part because there is another possibility, intermediate between microcode and higher-level software implementations, whether trap and emulate or ordinary software. RISC processors sometimes implement the operations that a so-called CISC machine like the Intel x86 would implement in microcode as what the DEC Alpha called PALcode: essentially the same sort of RISC instructions as are visible to ordinary user code and operating system code, but running in a special mode "underneath" the operating system. Again, PALcode may have access to resources like special registers that ordinary user or operating system code does not have. The main distinction between PALcode and classic microcode is that the classic micro-instruction set is usually very much different from the ordinary user and operating system instruction sets, whereas for PALcode they are much more similar.

One almost always calls PALcode "software".

Actually, I am told that the Intel i960 RISC processor's microcode was very much like Alpha PALcode - though they called it microcode.

---+ Where Implemented: Inside or Outside

@Arvo, in his answer, mentions another common definition of "in hardware" «In computer context "in hardware" [usually] means offloading some operations off from CPU to some other device».

I completely agree, except for "usually" - but that is undoubtedly a function of what community of people you are working with.

He goes on to say «which itself can use internal software to accomplish its task.» This is important: the external device may itself have hardware combinatoric circuits or state machines, or the operations may be completely implemented in software on the other device.

If implemented in software on the external device, then, if the host processor and the external device processor are comparable, the speed of the operation is probably comparable. Whereas in the common case where the external device processor is significantly slower than the host processor, the overall operation may take longer.

I.e. sometimes such "offload engines" result in operations taking longer than they would if executed on the host processor. But even if it takes longer, overall performance might improve because the host processor and the external device processor are running in parallel. But perhaps the operation could have been run in host-processor software on an idle processor in a multiprocessor.

One of my friends has long been an advocate of creating accelerator devices like this. But at MIPS, for example, I frequently had to remind people that "We [MIPS] are the 'hardware' accelerator." By that time MIPS had pretty much failed in the general-purpose computer market, and was mostly being used to control such external devices. Arguably many RISC-V companies are in that space.

Note that such external devices used to implement an operation might be accessed from pure software, from trap-and-emulate software, from PALcode or microcode, possibly even from hardware, e.g. state machines. Yes: you might have hardware in the CPU sending an operation to an external device that implements it in software on that external device's processor.

Or... external "co-processors" might be used to implement instructions defined in the instruction set. In my experience a "co-processor" usually has a connection that is more intimate with the main processor than normal I/O devices have. A co-processor instruction might send something out across a special co-processor bus, whereas a non-co-processor accelerator might use ordinary memory-mapped I/O, just like a disk drive controlled by the operating system.

Note: implementation as an ordinary I/O device in my experience nearly always means that ordinary users cannot directly access the device. They typically have to do a system call to access it, and/or put a request on a queue that the hardware will pick up later, or both. In most modern systems ordinary user software cannot receive device interrupts, and it is often challenging for it to take page faults. What this overall means is that there is a certain minimal cost that affects the granularity of operations you can send to the external device. It doesn't have to be this way: you can define an I/O device architecture that can be used from user mode. I have. It's just not very common.

---+ Why use instructions rather than RISC primitives?

Mitch Alsup, a prominent computer architect, often said that there were three reasons to put operations inside the instruction set or in hardware, rather than in software built from RISC instructions:

  • performance

  • security,

  • and atomicity.

I sometimes add a fourth criterion: executable machine code compatibility, whether across generations, or between low-end and high-end implementations at the same time, or simply for consistency between different vendors. E.g. I was told that one of the big motivations for Motorola adding floating-point instructions to the 68000 family was that the software implementations were so widely different that it was giving Motorola a bad name.

Sometimes the RISC-like primitives that one might use to implement an operation can only be used in certain specific instruction sequences. If the primitives were available to ordinary user programs, or even to operating system software, then there might be security problems. Yes: even exposing something to the operating system may cause security problems, e.g. if a guest operating system is running in a virtual machine environment. Or programs might suffer atomicity issues, whether multiprocessor/parallel-programming atomicity or atomicity with respect to interrupts. Security or atomicity issues can often be addressed by either microcode or non-microcode hardware implementations. Performance issues are often mostly addressed by hardware state machines "underneath" generic microcode.

---+ Instruction set or not

So, it can be seen that the question of whether an operation is implemented as part of the instruction set is almost orthogonal to whether the operation is implemented inside the processor or outside the processor, or whether the operation is implemented by dedicated hardware combinatoric logic, hardware state machines, microcode, trap and emulate, or pure software. Or even the more abstract intermediate languages that are compiled to the actual machine code for a particular processor (most commonly a GPU).

Why then put something inside the instruction set?

Well, if it's inside the instruction set you have all of the above implementation possibilities. Any given executable binary containing machine code can be run on implementations with various levels of hardware cost and performance, if compatibility matters to you.

But even the most trivial trap and emulate software implementation of an operation costs something for the processor manufacturer, if only testing.

And if your processor is a black box, and if trap-and-emulate performance is definitely not acceptable, then a company may not really have the option of putting anything into the instruction set. Or a company that is a customer of such a processor may be willing to build external hardware that performs faster than an equivalent operation inside the processor, especially if the internal implementation inside the host processor is not very high-performance.

Note: so-called intermediate languages, which are not directly executed but which are translated at program load time to the actual physical instruction set, have a little bit more flexibility here. They can provide compatibility, with a little bit less implementation cost inside the processor. Although somebody has to validate the intermediate language implementation. Which is often harder to do than validating hardware or microcode, because the attack surface in security terminology is larger.

---+ Examples

As noted elsewhere, multiplication operations were often implemented in pure software, then in software with multiply-step instructions, then with hardware state machines around array multiplier slices, and now for the most part in pure hardware.

Similarly, divide operations have gone through a similar evolution from pure software through divide steps to hardware state machines. I am not aware of anybody building a 100% combinatoric logic implementation of a divide, certainly not for 32 or 64 bits - and how many people need to do divides of 8-bit values (that are not already using logarithmic representations, where a logarithmic divide corresponds to an integer subtract)?

Similarly, IIRC some papers mentioned software implementations of floating point as a motivation for RISC processors. But nowadays most high-end processors do floating-point add, subtract, and multiply in combinatoric (albeit pipelined) logic, and floating-point divide with hardware state machines controlling sequencing of the hardware multiplier array or something like divide-step hardware. Similarly square root, or inverse square root, which can often use almost the same hardware as divide.

Some machines, however, expose a divide-init or 1/x (reciprocal) instruction to software, which can then convert a sequence of divides by the same denominator (x1/y, x2/y, ...) into a sequence of multiplications by 1/y, especially if exact IEEE values are not required.
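For example, a compiler (or programmer) given a cheap reciprocal and permission to deviate slightly from exact IEEE rounding can rewrite repeated division as multiplication by a precomputed reciprocal:

```c
/* Replacing n divides by the same denominator with one divide (or a
 * divide-init / 1/x operation) plus n multiplies. Results can differ in
 * the last bit from true IEEE division, which is why this is only done
 * when exact rounding is not required. */
void scale_by(float *x, int n, float y)
{
    const float recip = 1.0f / y;   /* one divide or reciprocal op */
    for (int i = 0; i < n; i++)
        x[i] *= recip;              /* n multiplies instead of n divides */
}
```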

Floating-point NaN or denorm handling is another example where the division between combinatoric hardware and sequencing via state machines or microcode or software trap-and-emulate can be fuzzy. Classic CPUs such as Intel x86, or classic RISCs like PowerPC or RISC-V, often let NaNs or denorms be handled by software or microcode or state machines, since denorms are not that common... but GPUs, if they support NaNs and do not simply flush denorms to 0, are actually more likely than CPUs to implement these things in (pipelined) hardware, since GPUs historically are not very good at handling traps or exceptions. Rather than a single thread of execution trapping as on a CPU, in a GPU 16 or 32 or 64 SIMT threads of execution might be stopped for a trap and emulation - which is a big motivation for avoiding the trap and implementing it in hardware where necessary.

IMHO one of the classic examples of these trade-offs is block memory operations, such as memcpy or bcopy. They span the whole spectrum: pure software, combinatoric logic vs. sequencing in hardware state machines or microcode or trap and emulate, and external hardware:

Optimizing memcpy in pure software is a rite of passage for any so-called performance programmer - taking advantage of loop unrolling, cache-bypass instructions, etc. etc. In particular, taking advantage of knowledge of the use case: e.g. if you know that your code is only going to be used to memcpy 64 MB memory buffers, completely aligned, you can make a lot of optimizations.

Optimizations like these may not be available to an implementation of memcpy in the instruction set, such as x86 REP MOVSx.
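For instance, a use-case-specialized copy loop can skip all of the alignment and size checks that a general-purpose memcpy (or a microcoded REP MOVS) has to perform up front. A hedged sketch, assuming 8-byte-aligned buffers whose length is a multiple of 32 bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Specialized copy: assumes src/dst are 8-byte aligned and len is a
 * multiple of 32, so no prologue/epilogue or alignment checks needed. */
void copy_aligned_blocks(void *dst, const void *src, size_t len)
{
    uint64_t *d = (uint64_t *)dst;
    const uint64_t *s = (const uint64_t *)src;

    for (size_t i = 0; i < len / 8; i += 4) {   /* unrolled 4 x 8 bytes */
        d[i + 0] = s[i + 0];
        d[i + 1] = s[i + 1];
        d[i + 2] = s[i + 2];
        d[i + 3] = s[i + 3];
    }
}
```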

I consider that one of the biggest mistakes I made as a computer architect was implementing Intel P6 REP MOVSx in fairly generic microcode. Oh, sure, we took advantage of 64-bit loads and stores at a time when the actual integer instruction set only had 32-bit loads and stores, and of slightly special memory access uops that reduced unnecessary memory traffic by 33% for large aligned block copies, plus a few gimmicky micro-instructions to start off.

But IMHO it would've been much better to actually use a hardware state machine to do that REP MOVSx memcpy. There was way too much microcode overhead looking at the alignment of the source and destination memory buffers, and their sizes, before actually getting into the loops that did the memory copy. One thing that hardware combinatoric logic can do much better than software is irregular IF conditions or multiway branching. Software or microcode typically can only do binary two-way branches or 2^N-way computed jumps, whereas a combinatoric logic circuit can evaluate minterms with an extremely variable number of bits in one pass.

Also... software or microcode branching logic "rots", usually faster than hardware: it may have been designed to evaluate conditions N cycles ahead of where they are needed, and it may be doing M-way loop unrolling, but the timing, the values of N and M, may change dramatically on different implementations. The same thing applies to hardware control, except that the hardware control can often take just one or a few cycles, rather than 5 or 15 cycles to go through a chain of if statements or multiway branches. In software, you nearly always have to optimize the branching logic at the start of memcpy to favor big buffers or small buffers or ... whereas in hardware you can branch directly to any of the above, as long as all of the bits you need to look at fit. At the very least, if you are doing such a software or microcode implementation of memcpy, you should create an auto-tuning framework that regenerates the code on every different processor microarchitecture.

How about external hardware for memcpy? Sure: as mentioned above, an external DMA engine operating on physical addresses can be hard to access from ordinary user code. So you either pay operating system overhead, or work around it some other way. Or you might have it use virtual addresses ... but then it has to be integrated with your operating system's virtual memory code. If the external DMA engine is far enough out, it may have no choice except to evict the data from the processor cache. But cache bypass is often not the best thing to do.

Yada yada yada

---+ CONCLUSION

My point is that there is a spectrum, from pure software (user code or OS code), through trap and emulate, PALcode, generic microcode, and non-generic microcode that is very tightly bound to the microarchitecture, to hardware state machines and hardware combinatoric circuits. Different people may draw the line separating software and hardware at different places in this spectrum.

\$\endgroup\$
7
  • 2
    \$\begingroup\$ This was a magnificent answer, thank you so much for taking the time to write it. Much of it went above my head as I am more familiar with the levels of the stack well below the architecture and microarchitecture levels, but it was very helpful nevertheless. It also gives me a reference point for how much there remains to learn! \$\endgroup\$
    – EE18
    Commented Feb 26 at 15:52
  • \$\begingroup\$ @EE18: You work below the architecture and microarchitecture levels? I.e. in RTL, logic, transistors, physical devices, materials? \$\endgroup\$
    – Krazy Glew
    Commented Feb 27 at 5:23
  • \$\begingroup\$ Correct, although I should say will soon be working there. \$\endgroup\$
    – EE18
    Commented Feb 27 at 16:39
  • \$\begingroup\$ Just for what it's worth, I wrote a pure-combinatoric divider as a VHDL generic years ago. Never actually used it in a CPU, but did test the 32-bit version in an FPGA. That took up more than half of the FPGA I had handy though. I wasn't willing to buy an FPGA large enough for a 64-bit instantiation. They use a lot of gates... \$\endgroup\$ Commented Feb 27 at 19:13
  • \$\begingroup\$ @JerryCoffin: out of curiosity, what divide algorithm did you use for your pure-combinatoric divider? A cascade of SRT steps, and if so what radix? 4-bit, 8-bit... Fully unrolled Newton-Raphson? If so, I assume that you did not replicate a full 64x64 multiplier for each stage, but instead optimized with smaller multiplier at first, increasing size at each stage? A digit selection divider that uses a small multiplier or slice as part of the estimation? Something else? Pipestages? Logic depth / pipestage? E.g. FO4, but that's old-fashioned - LUT depth for your FPGA? \$\endgroup\$
    – Krazy Glew
    Commented Mar 24 at 18:05
9
\$\begingroup\$

To abstract things a little further, and possibly also muddy the waters (sorry), think about RAID in larger computer systems.

  • "Software RAID" has an apparent and visible impact on the main processor(s), potentially executing in the driver or kernel, with close to zero opportunity to offload any of the associated compute effort.
  • "Hardware RAID" is still generally software/firmware, but it's running on a fully independent processor (possibly an ASIC with acceleration for specific operations) and is thus separated from the main system's processing.

I think that in many ways, this is a matter of perspective and how something's implementation impacts the rest of the system.

I'd suggest that "in hardware" implies that there is a clear boundary and dedicated logic associated with the function - this still holds for the above RAID example, even if the "dedicated logic" is another entire processor with firmware. This is also the case for other things, including many wireless chipsets (e.g: CYW43439 as found on the Pi Pico W, which actually has a Cortex-M3 and a Cortex-M4 inside).
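To make the "software RAID" bullet concrete: the core compute is parity generation, which in software RAID burns main-CPU cycles and in hardware RAID runs on the controller's own processor or ASIC. A minimal RAID-5-style parity sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* XOR parity across ndisks data blocks of blklen bytes each.
 * In "software RAID" this runs on the host CPU; in "hardware RAID"
 * the same computation happens on the controller's processor/ASIC. */
void raid_parity(uint8_t *parity, const uint8_t *const data[],
                 int ndisks, size_t blklen)
{
    for (size_t i = 0; i < blklen; i++) {
        uint8_t p = 0;
        for (int d = 0; d < ndisks; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}
```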

\$\endgroup\$
8
\$\begingroup\$

Here is an example, using multiplication instead of division.

Many years ago I worked on a system based on the 8086 microprocessor (I hope I have the parts right here). The 8086 came with a built-in multiply instruction, MUL. But it was very slow, tens of microseconds to execute IIRC. This was too long for our application, which was closing a video feedback loop to stabilize and point a 3-axis gimbal assembly.

So we attached a TRW 16-bit multiplier (an MPY-16, I believe) to the 8086's data bus. This would do a 16-bit signed multiply in several hundred nanoseconds, more than an order of magnitude faster than the 8086's native multiply instruction. The MPY-16 contained around 3,500 bipolar gates and used combinatorial logic to perform the multiply operation. It also consumed around 5 W of power.

In operation the software running on the 8086 would write the two 16-bit values to the MPY-16, do a couple of no-ops, then read the 32-bit result back in.
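Expressed in C (the port addresses below are hypothetical placeholders rather than the original board's actual mapping, and the original code would have been 8086 assembly), the interaction looked roughly like this:

```c
#include <stdint.h>

/* Hypothetical memory-mapped ports for the external multiplier. */
#define MPY_OPA   (*(volatile int16_t  *)0x0300u)  /* operand A */
#define MPY_OPB   (*(volatile int16_t  *)0x0302u)  /* operand B */
#define MPY_RESLO (*(volatile uint16_t *)0x0304u)  /* result, low word */
#define MPY_RESHI (*(volatile uint16_t *)0x0306u)  /* result, high word */

int32_t hw_mul16(int16_t a, int16_t b)
{
    MPY_OPA = a;                      /* write the two operands to the device */
    MPY_OPB = b;
    __asm__ volatile ("nop; nop");    /* GCC-style inline asm standing in for
                                         the original "couple of no-ops" */
    return (int32_t)(((uint32_t)MPY_RESHI << 16) | MPY_RESLO);
}
```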

So, using the multiply operation on the 8086 (software) vs using the external TRW hardware multiplier chip (hardware).

\$\endgroup\$
3
  • \$\begingroup\$ The Nintendo DS, released in November 2004, also has this type of system. I don't know why they didn't instead choose a processor core with a good division unit. \$\endgroup\$ Commented Feb 25 at 9:15
  • \$\begingroup\$ @user253751 It may be that the processors available with this feature were more expensive, enough so that the added cost of the external processor was less. \$\endgroup\$
    – EvilSnack
    Commented Feb 25 at 21:26
  • 1
    \$\begingroup\$ Just a note. The TRW MPY-16 was not a processor, or co-processor (like the Intel 8087). It had one job and one job only - to multiply 2 16-bit signed integers. That's it. \$\endgroup\$
    – SteveSh
    Commented Feb 25 at 23:34
3
\$\begingroup\$

Hardware is when a function is done with materially assembled logic gates. Hardware functions can be modified only by physically modifying the assembly of logic gates or the way they are connected together. For example, an IC can be programmed by connecting some of its pins to ground or to the supply through a resistor. It's hardware-programmed because you made a physical modification to the circuit.

Software is any time someone writes code, compiles it and loads it into the device. To be more specific, any time someone sends a string of bits having the effect of modifying the way the circuit works, until a new string of bits is sent to modify it again. In this case, the functions are modified without having to modify the physical construction of the circuit.

\$\endgroup\$
13
  • \$\begingroup\$ This puts FPGA/ISP in the software realm. \$\endgroup\$
    – greybeard
    Commented Feb 26 at 11:44
\$\begingroup\$ @greybeard The FPGA itself is a piece of hardware, but it uses software. \$\endgroup\$
    – Fredled
    Commented Feb 26 at 11:59
  • \$\begingroup\$ Makes sense to me. If user writing software can change how this division operation behaves, then it's (at least partly) software. If I cannot, then it is hardware. If FPGA can be reprogrammed by me, then what it does is software for me. Software which takes a long time to update, but still software. If however FPGA is set in such way that I cannot reprogram it, then it is hardware. Litmus test: imagine your thingy (which you're not sure if it is software or hardware) runs on Voyager 1 space probe. Can you change what it does remotely from Earth? If so, it's software. If not, it's hardware. \$\endgroup\$ Commented Feb 26 at 15:35
  • \$\begingroup\$ @MatijaNalis On Voyager 1, they managed to update the software after 45 years, billion miles apart. ;) FPGA's which can be programmed only once are using "software" nonetheless because a code has been written and loaded to define the operations at set up. Even if it's only once. It would be hardware if there were no way to upload a software to set up the chain of operations that you want. \$\endgroup\$
    – Fredled
    Commented Feb 26 at 20:56
  • 2
\$\begingroup\$ @Fredled: Although they're (mostly?) obsolete now, at one time you could get one-time programmable fuse-based PROMs. They used a small trace of nichrome. You programmed it with a higher voltage that burned the nichrome trace, which was visible under a microscope (though not always easily - it often just formed a narrow crack in the nichrome that was only barely visible). But for most purposes, those have been obsolete for decades. There were also antifuse-based PROMs, but most of them stacked the antifuse vertically, so although there was physical alteration, it was hard (impossible?) to see. \$\endgroup\$ Commented Feb 29 at 2:13
2
\$\begingroup\$

Going to the division example, would we say that division is done "in hardware" on a given computer if there is an assembly instruction for division, while division is done "in software" (presumably by a compiler interpreting a division instruction and producing the relevant assembly to execute a division algorithm via some number of instructions which are supported) if not?

Done "in hardware" doesn't necessarily imply a specific CPU instruction is required. Another answer mentioned using an external 16-bit multiplier connected to the data bus of an Intel 8086, i.e. hardware external to the CPU. This may be thought of as co-processor hardware extending the capabilities without the need to add new CPU instructions.

Another example is the Texas Instruments MSP430 mixed-signal microcontrollers, an active product range, which are similar in that while some devices support a hardware multiplier, the multiplier is implemented as a peripheral rather than integrated into the CPU. The original F1xx/2xx/4xx family 16-bit hardware multiplier, used in devices with the 16-bit-addressing MSP430 CPU, is described as:

The hardware multiplier is realized as a 16-bit peripheral module like any other, and is not integrated into the CPU. The CPU is unchanged through all configurations, and the instruction set is not modified. It takes no extra cycles for multiplication. Both operands are loaded into the multiplier's registers and the result can be accessed immediately after loading the second operand.

The Hardware Multiplier Module expands the capabilities of the MSP430 family without changing the basic architecture. Multiplication is possible for:

  • 16 x 16 bit
  • 16 x 8 bit
  • 8 x 16 bit
  • 8 x 8 bit

As an example, the MSP430F14x, MSP430F14x1, MSP430F13x Mixed-Signal Microcontrollers datasheet, which covers the MSP430F149, MSP430F148, MSP430F147, MSP430F1491, MSP430F1481, MSP430F1471, MSP430F135 and MSP430F133 devices, shows the hardware multiplier is only present in the MSP430F14x and MSP430F14x1 devices. I.e. customers can choose a device from the range with or without a built-in hardware multiplier. This fits in with the MSP430 family having devices with different amounts of internal flash, RAM and peripherals, allowing customers to select the combination of features vs. unit cost per device.

The MSP430 Optimizing C/C++ Compiler v21.6.0.LTS User’s Guide shows the --use_hw_mpy option which can be used to select if the compiler uses the hardware multiplier or not:

The optional argument indicates which version of the hardware multiply is being used and must be one of the following:

  • 16 uses the F1xx/2xx/4xx family 16-bit hardware multiplier (default)
  • 32 uses the F4xx 32-bit hardware multiplier
  • F5 uses the F5xx/6xx family 32-bit hardware multiplier
  • none = does not use a hardware multiplier

For more information regarding the hardware multiplier, see the Family User’s Guide for the MSP430x1xx, MSP430x3xx, MSP430x4xx, and MSP430x5xx.

I.e. even though the hardware multiplier is a peripheral, rather than integrated into the CPU, the C compiler integrates support for the hardware multiplier.
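For completeness, using the peripheral directly from C looks roughly like this (register names MPY, OP2, RESLO and RESHI as declared in TI's msp430.h headers for parts that include the multiplier; consult the family user's guide for the exact device before relying on this):

```c
#include <msp430.h>
#include <stdint.h>

/* 16x16 unsigned multiply using the hardware multiplier peripheral.
 * Writing MPY selects unsigned-multiply mode and loads operand 1;
 * writing OP2 loads operand 2 and triggers the multiplication;
 * the 32-bit result is then available in RESHI:RESLO. */
uint32_t hw_mpy16(uint16_t a, uint16_t b)
{
    MPY = a;
    OP2 = b;
    return ((uint32_t)RESHI << 16) | RESLO;
}
```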

\$\endgroup\$
3
  • \$\begingroup\$ Nice piece of anecdotal evidence. Does it answer What do we really mean Describing a specific operation as being done "in hardware" versus "in software"? \$\endgroup\$
    – greybeard
    Commented Feb 25 at 12:01
  • \$\begingroup\$ @greybeard I have tried to improve the answer in response to your comment. Does that address your comment? \$\endgroup\$ Commented Feb 25 at 12:22
  • \$\begingroup\$ In my eyes the edit does improve the motivation to give this example in response to the question. \$\endgroup\$
    – greybeard
    Commented Feb 25 at 13:24
1
\$\begingroup\$

In a computer context, "in hardware" usually means offloading some operations from the CPU to some other device, which itself can use internal software to accomplish its task. Think about network adapters, graphics cards, audio controllers and so on - all those operations could be done on the main CPU ("in software"), but it is much more efficient to use specialised hardware for them.

\$\endgroup\$
0
\$\begingroup\$

At some point the distinction becomes a bit shady. It is possible to construct quite a complex state machine with nothing more than a PROM, a latch (from a subset of the outputs back to the address inputs), and a clock. This is a programmable device; you can radically alter its behaviour by altering the PROM contents. Is this hardware or software? I'd say a bit of both, but what if you replace the PROM with something more old-fashioned, like a diode matrix and some gates? And does it really matter?
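A software model makes the ambiguity plain: the PROM is just a lookup table indexed by {current state, inputs}, and the latch feeds part of each looked-up word back as the next state. The bit widths below are arbitrary choices for illustration:

```c
#include <stdint.h>

#define STATE_BITS 4
#define INPUT_BITS 2

/* The "burned" PROM contents: low STATE_BITS of each word are the next
 * state (fed back through the latch), the high bits drive the outputs. */
static uint8_t prom[1u << (STATE_BITS + INPUT_BITS)];

typedef struct { uint8_t state; uint8_t outputs; } machine_t;

/* One clock edge: address the PROM with {state, inputs}, latch the word. */
void clock_tick(machine_t *m, uint8_t inputs)
{
    uint16_t addr = ((uint16_t)m->state << INPUT_BITS) |
                    (inputs & ((1u << INPUT_BITS) - 1));
    uint8_t word = prom[addr];                       /* PROM lookup */
    m->state   = word & ((1u << STATE_BITS) - 1);    /* next state via latch */
    m->outputs = word >> STATE_BITS;                 /* output pins */
}
```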

\$\endgroup\$
0
\$\begingroup\$

As a former hardware, firmware and software engineer, the difference between hardware and software is perhaps down to an individual’s perspective.

A hardware engineer might view hardware as what they have to design and connect. The act of programmatically configuring said hardware via one or more addressable registers or instructions is the realm of software. That's the distinction.

A firmware engineer might look at the issue a little differently. If firmware is reading a value from an IO port, then that IO port is hardware. If the engineer is aware that the IO port is provided by an MCU and code on the MCU is generating that IO value, then the IO port is software configurable hardware.

A high-level software developer would view their Java bytecode running on a JVM as software. But the layers below that, which they may have no real awareness of, are often deemed to be hardware. A developer tends to see the platform that their code is running on as "the hardware", especially when it's containerised.

A few suggestions emerge from this line of thinking:

  1. Whether something is hardware or software is often relative to the perspective of the viewer.

  2. A black box that actually contains both hardware and software, but which behaves in a predictable manner, tends to be viewed as purely hardware by someone who has no awareness of what is happening inside that box.

  3. Hardware and Software are not always mutually interchangeable terms, as they have slightly different attributes. So trying to define if something is either one or the other isn't always possible or appropriate.

Perhaps hardware is something that normally behaves in a predictable manner, from the perspective of the viewer, while software can programmatically change that behavior? This seems to align more to the “hard” and “soft” epithets.

\$\endgroup\$
