TL;DR
As a pure answer to your question, I would say yes, the medium does introduce harmonics, but they are of negligible energy compared to the distortion induced by other pieces of the chain.
You've already got a good answer, but I'd like to add a few details here.
Digital domain
First of all, let's clarify that your software and, up to a certain point, your hardware do not generate a sinewave. They generate, store and transmit the digital (discrete in time and amplitude) representation of a sinewave using some specific coding scheme (there are quite a few of them). So, until the "sinewave" reaches the Digital-to-Analogue converter it is not actually a sinewave, and to this extent its characteristics ([central] frequency, bandwidth, envelope/temporal evolution) are described solely by the sample values in the digital domain.
Just to provide some insight, the bandwidth is constrained by the duration of the envelope (a shorter burst has a wider spectrum), and the central frequency by the sampling rate/clock, among other things (please note that these are not the only factors affecting the aforementioned parameters).
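To make the "discrete time and amplitude" point concrete, here is a minimal sketch of what the software actually holds. The function name `sine_samples`, the 1 kHz tone, the 48 kHz rate and the 16-bit word length are all my own illustrative choices, not something taken from your setup.

```python
import math

def sine_samples(freq_hz, sample_rate_hz, n_samples, bits=16):
    """Return a sine as a list of signed integer codes (hypothetical helper)."""
    # Largest positive code for the chosen word length, e.g. 32767 for 16 bits.
    full_scale = 2 ** (bits - 1) - 1
    return [
        # Discrete time (index n) and discrete amplitude (rounding to an integer).
        round(full_scale * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]

# One period of a 1 kHz tone at 48 kHz is just 48 integers.
samples = sine_samples(1000.0, 48000.0, 48)
print(samples[:6])
```

Nothing in that list is a sinewave yet; it is only a recipe that the converter later turns into one.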
Analogue domain
Once the digital signal reaches the Digital-to-Analogue converter it will "become" an analogue signal (most probably voltage) to be transmitted to the amplifier that will "feed" the loudspeaker the current necessary to move the cone (please note again the simplification of the whole process).
The digital signal does possess all the information needed for the analogue signal to be an exact representation of the "intended" signal. Alas, in order to create an exact analogue representation of the signal described in the digital domain you'd need to use a brick-wall reconstruction filter (see the Wikipedia page for more information on reconstruction filters). This filter has an impulse response that extends infinitely in both the positive and negative direction in time (non-causal), making it non-realisable. Thus, we have to resort to other solutions, with two common ones being a step reconstruction converter (zeroth-order reconstruction) or Pulse Density Modulation (PDM) techniques. Although their imperfect frequency response is somewhat compensated before this stage is reached (their transfer function is known during the design process and is inverted), they do exhibit non-linearities both in the software/firmware and hardware components.
The results here are distortion due to non-linearities (added spectral components, not necessarily harmonics), noise due to the finite word precision of the digital representation (hopefully "nicely" distributed), and the possibility of aliased or image frequencies leaking into the resulting spectrum (this depends highly on the spectral content of the digital signal and, if present at all, is most often insignificant).
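As a rough illustration of the "imperfect frequency response" of a step (zeroth-order hold) converter, the sketch below evaluates the textbook hold magnitude response $|\sin(\pi f/f_s)/(\pi f/f_s)|$. The function name `zoh_gain_db` and the 48 kHz rate are assumptions for the example, and a real converter adds oversampling and analogue filtering on top of this.

```python
import math

def zoh_gain_db(freq_hz, sample_rate_hz):
    """Magnitude response of an ideal zeroth-order hold, in dB (hypothetical helper)."""
    x = math.pi * freq_hz / sample_rate_hz
    if x == 0.0:
        return 0.0  # limit of sin(x)/x at DC is 1, i.e. 0 dB
    return 20 * math.log10(abs(math.sin(x) / x))

# Droop across the audio band for an assumed 48 kHz converter.
for f in (1000.0, 10000.0, 20000.0, 24000.0):
    print(f, round(zoh_gain_db(f, 48000.0), 2))
```

The droop grows towards half the sampling rate, which is the part that gets pre-compensated (inverted) in a sensible design.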
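For the finite word precision part, the usual rule of thumb for a full-scale sine is an SNR of about $6.02N + 1.76$ dB for $N$ bits. The sketch below checks that numerically, assuming plain rounding with no dither; the 997 Hz tone, the 48 kHz rate and the function names are my own choices for the example.

```python
import math

def theoretical_snr_db(bits):
    """Rule-of-thumb SNR of an ideal N-bit quantiser driven by a full-scale sine."""
    return 6.02 * bits + 1.76

def measured_snr_db(bits, freq_hz=997.0, sample_rate_hz=48000.0, n_samples=48000):
    """Quantise a full-scale sine by rounding and measure signal-to-error power."""
    full_scale = 2 ** (bits - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = full_scale * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        error = round(x) - x  # quantisation error, within half a step
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print(bits, round(measured_snr_db(bits), 1), round(theoretical_snr_db(bits), 1))
```

The measured and rule-of-thumb figures should agree to within a fraction of a dB, which gives a feel for how far down this noise sits in a 16-bit system.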
On the road to sound
The next step is amplification, so that the signal can be handed to the loudspeaker cone for "further transmission". To my (limited) knowledge, there's no completely linear amplifier today. There are very well designed amplifiers (class A, AB, H, and even class D is getting better nowadays), but still none of them is completely linear. Thus, in this stage the signal will get distorted, and the "level" (I do not refer to the audio level) of distortion depends highly on the hardware (design, topology, etc.) and the signal (if it clips, you're out of luck).
Next step is the notorious loudspeaker! This is most probably the most non-linear piece of the signal chain. There's a whole bunch of non-linearities here. There are non-linearities in the magnetic induction (not the entire length of the voice coil is in the magnetic field all the time), and there's highly non-linear behaviour in the mechanical, moving parts of the assembly, very prominent in the high-level/excursion regimes. Lately, the use of "exotic" materials (or meta-materials) has complicated things a bit more (sometimes their use improves some things, while in other cases it does not). As if this were not enough, the behaviour of the materials drifts due to excessive heat (the temperature of the voice coil can reach $200\,^{\circ}\mathrm{C}$ to $300\,^{\circ}\mathrm{C}$ and its impedance can exhibit significant changes), which causes even more complex, non-linear behaviour that changes with time.
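To see what clipping does to a pure tone, here is a small sketch that hard-clips one period of a sine and measures the harmonics with a single-bin DFT. The clip level of 0.8, the helper names and the choice of nine harmonics are assumptions for illustration only; a real amplifier's non-linearity is far less crude than a hard clipper.

```python
import math

def harmonic_amplitude(signal, k):
    """Amplitude of the k-th harmonic of a signal holding exactly one period."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def hard_clip(x, limit):
    """Symmetric hard clipper (a crude stand-in for an overdriven amplifier)."""
    return max(-limit, min(limit, x))

def thd(signal, n_harmonics=9):
    """Ratio of harmonics 2..n_harmonics (root-sum-square) to the fundamental."""
    fundamental = harmonic_amplitude(signal, 1)
    harmonics = math.sqrt(
        sum(harmonic_amplitude(signal, k) ** 2 for k in range(2, n_harmonics + 1))
    )
    return harmonics / fundamental

n = 1024
sine = [math.sin(2 * math.pi * i / n) for i in range(n)]  # one clean period
clipped = [hard_clip(s, 0.8) for s in sine]               # clipped at 80 % of peak

print("THD clean  :", thd(sine))
print("THD clipped:", thd(clipped))
```

Because the clipper is symmetric, only odd harmonics show up; an asymmetric non-linearity would add even ones as well.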
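To put a number on the heat-related drift, the sketch below estimates how much the DC resistance of a copper voice coil rises with temperature, using the usual linear approximation $R(T) = R_0\,[1 + \alpha (T - T_0)]$. I am assuming a copper coil with $\alpha \approx 0.00393$ per $^{\circ}\mathrm{C}$ referenced to $20\,^{\circ}\mathrm{C}$; the function name is made up for the example, and aluminium coils or real drivers will differ.

```python
def resistance_ratio(temp_c, ref_temp_c=20.0, alpha_per_c=0.00393):
    """Ratio R(T) / R(T_ref) for a linear temperature coefficient (copper assumed)."""
    return 1.0 + alpha_per_c * (temp_c - ref_temp_c)

# Assumed voice-coil temperatures, from cold up to the range quoted above.
for temp_c in (20.0, 100.0, 200.0, 250.0, 300.0):
    print(temp_c, round(resistance_ratio(temp_c), 3))
```

Under these assumptions the resistance nearly doubles around $250\,^{\circ}\mathrm{C}$, so the driver's sensitivity and damping shift noticeably while it plays.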
Sound at last
The final step is for the signal to be transformed into movement of air particles and pressure variations. Now, air is generally treated as a linear medium, which means the spectral content of the signals travelling in the medium cannot change. Of course, no perfectly linear material/medium exists. For example, consider the particle velocity representation of a travelling monochromatic plane wave. The particles in the positive half-period will have higher speed than the rest, so that part of the wave catches up with what is ahead of it and the pressure accumulates there, steepening the positive part of the wave and eventually leading to a discontinuity in the medium (please note that this is oversimplified here just to make my point). This phenomenon, although real, has a negligible effect on the resulting wave at ordinary sound levels, as it is countered by dissipative mechanisms.
All in all, the air does not have a significant effect on the spectral content of a signal, except maybe for the frequency-dependent attenuation, which becomes apparent over rather large distances (on the order of $100\,\mathrm{m}$).
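To get a feel for how weak this steepening is, the sketch below evaluates the classical lossless plane-wave shock formation distance, $\bar{x} = \rho_0 c_0^3 / (\beta\,\omega\,p_0)$, where $p_0$ is the peak pressure and $\beta = (\gamma + 1)/2 \approx 1.2$ for air. The air density, speed of sound, the function name and the example levels are my own assumptions, and the formula ignores absorption and spreading, so treat the numbers as order-of-magnitude only.

```python
import math

def shock_formation_distance_m(freq_hz, spl_db, rho0=1.21, c0=343.0, beta=1.2):
    """Lossless plane-wave shock formation distance in metres (hypothetical helper)."""
    p_rms = 20e-6 * 10 ** (spl_db / 20)  # SPL is referenced to 20 micropascals
    p_peak = math.sqrt(2) * p_rms        # peak pressure of a sinusoid
    omega = 2 * math.pi * freq_hz
    return rho0 * c0 ** 3 / (beta * omega * p_peak)

# A 1 kHz tone at a few assumed sound pressure levels.
for spl_db in (74.0, 94.0, 114.0):
    print(spl_db, round(shock_formation_distance_m(1000.0, spl_db)), "m")
```

At 94 dB SPL the distance comes out at several kilometres, so dissipation and spreading win long before any shock could form.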
Summary
Of all the pieces in the signal chain, air is not the one that has the most detrimental effect. The Digital-to-Analogue conversion, the amplification and the loudspeaker are the parts that affect the signal the most. If I had to pick one of those, I would pick, with my eyes closed, the electro-mechano-acoustical transduction of the loudspeaker (note that there are two transductions taking place, traversing three different domains, electrical, mechanical and acoustical, and each one distorts the signal).
As a pure answer to your question, I would say yes, the medium does introduce harmonics, but they are of negligible energy compared to the distortion induced by other pieces of the chain.