Many small CPUs available and used today for embedded designs do not have an onboard floating point unit - most of the AVR and PIC series, MCS51, some ARM ...
8 bit single-chip microprocessors were aimed at least as much, if not more, at the same market that microcontrollers and embedded CPUs target today. In that market, cost and power efficiency are paramount, and a lot of such applications are about controlling equipment rather than doing heavy computing.
The same might even apply to early 16/32 bit CPUs - the customers who could afford the early examples were, more often than not, not building general purpose desktop computers with them, but fitting out aircraft, industrial equipment and laboratory instruments.
The first CPUs that seem to be tailored ALMOST ONLY to desktop/server/workstation computing are probably those of the 486/early Pentium era: Complex electrical and cooling requirements, multimedia optimizations, consumer-focused marketing...
Mind that floating point can ALWAYS be done in software on an integer-only CPU, just with significantly lower performance.
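To give a rough feel for what "in software" means here, below is a toy sketch of my own (not any particular library's routine) of a single-precision float multiply done purely with integer operations. It ignores zeros, infinities, NaNs, subnormals and rounding; real soft-float helpers (for example libgcc's `__mulsf3`) have to handle all of that and are correspondingly slower still.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Toy single-precision multiply on raw IEEE-754 bit patterns, integer ops only.
   No special cases, truncating instead of rounding - illustration, not a library. */
static uint32_t toy_float_mul(uint32_t a, uint32_t b)
{
    uint32_t sign  = (a ^ b) & 0x80000000u;
    int32_t  exp   = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127;
    uint32_t man_a = (a & 0x007FFFFFu) | 0x00800000u;   /* restore hidden 1 */
    uint32_t man_b = (b & 0x007FFFFFu) | 0x00800000u;

    uint64_t prod = (uint64_t)man_a * man_b;            /* 48-bit product   */
    if (prod & (1ull << 47)) { prod >>= 24; exp += 1; } /* renormalize      */
    else                     { prod >>= 23; }

    return sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x007FFFFFu);
}

int main(void)
{
    float x = 3.0f, y = 2.5f, z;
    uint32_t xi, yi, zi;
    memcpy(&xi, &x, 4);
    memcpy(&yi, &y, 4);
    zi = toy_float_mul(xi, yi);
    memcpy(&z, &zi, 4);
    printf("%f\n", z);                                  /* 7.500000 */
    return 0;
}
```

Even this corner-cutting version is a pile of shifts, masks and a widening multiply where a hardware FPU needs one instruction - which is where the performance goes.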
Also, in an embedded application you would usually have all your data inputs (eg from ADCs driven by sensors, or video data from a digitizer) arrive as integers anyway, and try to scale all your calculations so that floating point math is not NEEDED. Once you can break the problem down to fixed point math, you can simply scale your values by powers of 10 or powers of 2 (the latter being very efficient, since scaling turns into shifts, but needing a bit more computing to make the end results presentable to a decimal-system user) and work with integers throughout.
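A minimal sketch of that scaling approach, with made-up example values (a 10-bit ADC with a 5.00 V reference and a calibration gain of 1.25 are assumptions for illustration only) - the point is that nothing ever leaves the integer registers:

```c
#include <stdio.h>
#include <stdint.h>

/* Made-up example: 10-bit ADC (0..1023), 5.00 V reference.
   Work in millivolts so everything stays integer. */
static int32_t adc_to_millivolts(uint16_t counts)
{
    return ((int32_t)counts * 5000) / 1023;
}

/* Q16.16 fixed point: the scale factor is a power of two, so applying or
   removing it is just a shift. */
typedef int32_t q16_16;
#define Q_ONE (1 << 16)

static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);    /* widen, multiply, shift back */
}

int main(void)
{
    int32_t mv = adc_to_millivolts(512);        /* ~2502 mV, no floats anywhere */

    q16_16 gain = 81920;                        /* 1.25 in Q16.16 = 1.25 * 2^16 */
    q16_16 corrected = q_mul(mv * Q_ONE, gain);

    /* Presenting the result to a decimal-system user is the extra work:
       split off integer and fractional parts for printing. */
    printf("%ld mV raw, %ld.%02ld mV corrected\n",
           (long)mv, (long)(corrected >> 16),
           (long)(((corrected & 0xFFFF) * 100) >> 16));
    return 0;
}
```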
Using floating point numbers for controlling equipment is sometimes a bad idea anyway, since the rules about which exact values a floating point type can actually represent are more complex and bug-prone - which can give you interesting surprises when you make decisions by comparing numbers: two seemingly different inputs (or a too-large number vs the same number after a too-small increment) might end up represented identically and always compare as equal, which is just perfect if nonequality is your loop exit condition. Also, some floating point formats know one or more "undefined" values (NaNs), which might be treated in ways that break expectations like "a == a is always true" or "exactly one of a<b, a==b, a>b holds". Add some of these features being subtly implementation dependent, the fact that you are using someone else's implementation when you use hardware floating point, and the wish to make code reliably portable... and there is your perfect storm.
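A short demonstration of both surprises, assuming IEEE-754 single precision (the exact trigger values depend on the format in use, which is part of the problem):

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* A large value plus a too-small increment is the same value again:
       2^24 + 1 has no exact single-precision representation. */
    float big = 16777216.0f;                            /* 2^24 */
    printf("big + 1.0f == big -> %d\n", big + 1.0f == big);     /* prints 1 */

    /* So a loop like "while (x != limit) x += step;" can stop making
       progress long before the limit and never hit its exit condition. */

    /* NaN breaks trichotomy: none of <, >, == holds, and a != a is true. */
    float a = NAN;
    printf("a < 1: %d  a > 1: %d  a == a: %d\n",
           a < 1.0f, a > 1.0f, a == a);                 /* prints 0 0 0 */
    return 0;
}
```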
The same was true of early PC applications - games could work with fixed point math since they did not really have to handle arbitrary number inputs or high precision, database handling software did not need floating point for anything, spreadsheets were usually not used to handle large AMOUNTS of floating point numbers - and currency is well expressed in fixed point formats (-> integer calculations behind the scenes). The same applied to early multimedia formats.
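For the currency point, the classic illustration (toy values) is that common decimal fractions are not exact in binary floating point, while an integer count of the smallest unit is exact and needs no FPU at all:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Binary floating point cannot represent 0.10, 0.20 or 0.30 exactly. */
    double f = 0.10 + 0.20;
    printf("0.10 + 0.20 == 0.30 ? %d\n", f == 0.30);    /* prints 0 on IEEE-754 */

    /* Counting cents as integers is exact and pure integer math. */
    int64_t cents = 10 + 20;
    printf("%lld.%02lld\n",
           (long long)(cents / 100), (long long)(cents % 100));  /* 0.30 */
    return 0;
}
```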
Users that DID need the capability - eg for precision CAD, scientific software, large spreadsheets - bought math coprocessors.
Also, there was likely a chicken-and-egg effect for a while: Since coprocessors weren't that widespread, software authors (of software that did not NEED the performance) often did not bother including support for them even if there was some performance advantage - so no one bought a coprocessor since most of their software made no use of it.
Also, increasing the die size of an IC can cause you big problems when your processes are not yet really high yield. Say you put 10 combined chips on one wafer, and 10 CPUs plus 10 coprocessors (20 half-size chips) on another. Now you discharge a shotgun at both wafers, with a load that blasts 10 random holes through each (and silicon defects seem to work just like that). You will not be selling many combined chips, while most of the smaller dies on the other wafer survive and can still be paired up.
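A quick Monte Carlo toy model of that shotgun picture, taking the numbers above literally (10 big chips vs 20 half-size chips on the same wafer area, 10 random defects per wafer, and a usable system from the split wafer needing one good CPU plus one good coprocessor) - the exact figures are only as good as the toy assumptions:

```c
#include <stdio.h>
#include <stdlib.h>

#define TRIALS 100000

int main(void)
{
    long combined_good = 0, pairs_good = 0;
    srand(1);

    for (int t = 0; t < TRIALS; t++) {
        int hit_a[10] = {0};             /* wafer A: 10 big combined chips     */
        int hit_b[20] = {0};             /* wafer B: 10 CPUs + 10 coprocessors */

        for (int d = 0; d < 10; d++) {   /* 10 random "shotgun" defects each   */
            hit_a[rand() % 10] = 1;
            hit_b[rand() % 20] = 1;
        }

        int good_a = 0, good_cpu = 0, good_fpu = 0;
        for (int c = 0; c < 10; c++)  good_a   += !hit_a[c];
        for (int c = 0; c < 10; c++)  good_cpu += !hit_b[c];
        for (int c = 10; c < 20; c++) good_fpu += !hit_b[c];

        combined_good += good_a;
        /* each usable system needs one good CPU and one good coprocessor */
        pairs_good    += (good_cpu < good_fpu) ? good_cpu : good_fpu;
    }

    /* Typically prints roughly 3.5 sellable combined chips per wafer vs
       well over 5 usable CPU+coprocessor pairs from the split wafer. */
    printf("combined: %.2f/wafer, CPU+FPU pairs: %.2f/wafer\n",
           combined_good / (double)TRIALS, pairs_good / (double)TRIALS);
    return 0;
}
```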