
I am one UART short on my MCU. I need four, but the most suitable MCU I have found so far, the STM32F103, has only three. So I will have to implement the fourth one in software.

Each of my UARTs does either RX or TX only. So I have the option of implementing the software UART for RX only or for TX only.

Which of the two should be implemented in software: a UART that only does RX, or a UART that only does TX?

  • 24
    Why not just reuse the TX or RX side of a hardware UART that isn't used by the other ports?
    – Vlad
    Commented Oct 14, 2019 at 9:24
  • 2
    Wowowooo that's a lovely thing. Good, Thumbs UP. If you can make it an answer that would be even better.
    – alt-rose
    Commented Oct 14, 2019 at 9:32
  • 1
    It will work if the speeds on the TX and RX channels are equal, and it's already a part of @MichelKeijers's answer.
    – Vlad
    Commented Oct 14, 2019 at 9:37
  • 1
    There are many STM32 models with 4 (or more) USARTs (a USART has an additional optional clock pin; if it is disabled, the USART is equivalent to a UART). For example, the STM32F072RB has 4 USARTs.
    – Erlkoenig
    Commented Oct 16, 2019 at 7:35

4 Answers


I'm not an electronics engineer, but I would go for using the TX operation as a software UART.

For an RX operation, buffering is needed, and interrupts are needed so that no information is missed. This is typically handled by a hardware UART.

For a TX operation, you only need to send information, and that happens when you choose (when receiving, you don't know beforehand when data will arrive). See hooskworks's comment: the right way to put it is that if the call is allowed to block, TX is easy to do with a software UART.

If the UART speeds are equal, you can use one UART for both RX and TX. Even if the speeds are not equal, as long as you know you won't receive anything while you send, you could probably switch speeds in between.
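To illustrate why a blocking TX is the easy case, here is a minimal sketch in C. It assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit). The `pin_write` and `bit_delay` hooks are hypothetical placeholders for whatever GPIO write and one-bit-period delay the target provides; they are not part of any STM32 library.

```c
#include <stdint.h>

#define SOFT_UART_FRAME_BITS 10  /* 1 start + 8 data + 1 stop (8N1, assumed) */

/* Fill levels[] with the line level for each bit period of one 8N1 frame. */
void soft_uart_frame(uint8_t byte, uint8_t levels[SOFT_UART_FRAME_BITS])
{
    levels[0] = 0;                              /* start bit is low */
    for (int i = 0; i < 8; i++)
        levels[1 + i] = (uint8_t)((byte >> i) & 1u);  /* data, LSB first */
    levels[9] = 1;                              /* stop bit is high (idle level) */
}

/* Blocking transmit of one byte. pin_write and bit_delay are hypothetical
   hooks: drive the TX pin, and busy-wait for one bit period. */
void soft_uart_tx_byte(uint8_t byte,
                       void (*pin_write)(int level),
                       void (*bit_delay)(void))
{
    uint8_t levels[SOFT_UART_FRAME_BITS];
    soft_uart_frame(byte, levels);
    for (int i = 0; i < SOFT_UART_FRAME_BITS; i++) {
        pin_write(levels[i]);
        bit_delay();
    }
}
```

The call simply occupies the CPU for ten bit periods and then returns, which is exactly the property that is hard to get on the receive side.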

  • 3
    That would be my analysis of the problem too, so I think TX'ing on demand, especially if you can TX in a blocking way when you need to, is easier from a software point of view.
    – hooskworks
    Commented Oct 14, 2019 at 9:26
  • 3
    Luckily the speeds on the RX and TX UARTs are the same. So it will be good to use an unused TX pin as my 4th UART port.
    – alt-rose
    Commented Oct 14, 2019 at 9:52
  • 1
    Yes, you can even just continue receiving on the shared UART while you are sending.
    Commented Oct 14, 2019 at 9:58

It's far simpler to implement UART transmission in software because you just bit-bang the output port until the bytes are sent. To implement a receiver, you have to do multiple checks on the bits as they arrive (such as waiting for the start bit and checking parity) and, usually, you have to run at a much higher processing rate to ensure you can cope with clock speed variations between the remote transmission source and your local receiver.

This latter part is to avoid misreading data; a typical receive system will run its clock at approximately 16x the known baud rate and sample the data stream mid-symbol to ensure maximum data integrity. The transmit and receive clocks need to be reasonably similar in case there are few data transitions. Data transitions help re-sync the mid-symbol counter. Below are examples of good similarity in RX-TX clock frequencies followed by a scenario where the receive clock runs much too slow:

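[Figure: timing diagrams of mid-symbol sampling with well-matched RX and TX clocks, followed by a receive clock that runs much too slow]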

In the latter example you should be able to see that symbol sampling has drifted off to the point where the 4th bit is sampled instead of the 3rd bit. Of course, if there are plenty of bit transitions, the receive clock can be re-synchronized on the fly and this problem is reduced but, with a UART transmission, you cannot rule out all the bits (apart from start) being high or, worse still, all the bits being low.
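The drift can be put into numbers with a small sketch. It assumes an 8N1 frame, a receiver that synchronizes only on the start edge and then samples at what it believes is the middle of each bit, and no mid-frame re-sync. The function names and the parts-per-thousand error unit are mine, chosen for illustration.

```c
#define FRAME_SLOTS 10  /* start bit + 8 data bits + stop bit (8N1, assumed) */

/* The receiver means to sample slot n at its midpoint, i.e. (n + 0.5) of its
   own bit periods after the start edge. If its bit period is longer than the
   transmitter's by err_permille parts per thousand (negative = shorter), this
   returns the transmitter slot the sample actually lands in.
   Integer form of floor((n + 0.5) * (1 + err)); valid for err_permille > -1000. */
int sampled_slot(int n, int err_permille)
{
    return ((2 * n + 1) * (1000 + err_permille)) / 2000;
}

/* First slot that is sampled in the wrong place, or -1 if the whole frame
   is sampled correctly. */
int first_missampled_slot(int err_permille)
{
    for (int n = 0; n < FRAME_SLOTS; n++)
        if (sampled_slot(n, err_permille) != n)
            return n;
    return -1;
}
```

Under these assumptions a 5% mismatch still samples every slot correctly, a 6% mismatch first goes wrong at slot 8 (the last data bit), and a receive clock that is 20% slow already goes wrong at slot 2. The break-even point for the stop bit is 1/19, roughly 5.3%; in practice the usable margin is smaller because the start edge is only located to within one oversampling tick and both ends contribute error.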

  • It seems to me that resyncing part way through a frame is going to be impossible for bytes where there are no bit transitions in the early bits, meaning you'll still corrupt such bytes, so it wouldn't be worth it. Do any hardware UARTs actually do it? If so I'd be eager to learn more about how it works if you can link to data sheets or textual rundowns.
    – tom r.
    Commented Jul 10, 2021 at 14:25
  • @tomr. what do you mean by frame? Note also that each byte in a UART transmission contains a start bit and at least one stop bit, hence right at the beginning of a byte there is a transition.
    – Andy aka
    Commented Jul 10, 2021 at 14:30
  • By frame I mean the entire sequence of a start bit, data bits and any parity and stop bits. You can't synchronise to the transmit rate on a single transition such as from idle high to start bit low; you'd need another transition to be able to measure the time between. If the next transition doesn't happen until late in the frame, the sync may already be so far out that it's out by a whole bit. It seems that if the sync is out by enough that it can skip or repeat a bit during receive, all bets are off and resyncing part way through the frame would only work for some byte values.
    – tom r.
    Commented Jul 10, 2021 at 14:38
  • I'm not sure what point you are trying to make then. UARTs need to know the baud rate, else all bets are off. That's the point of my answer and I'm unsure what you are driving at @tomr.
    – Andy aka
    Commented Jul 10, 2021 at 14:42
  • My point is, I'm trying to understand how a UART that is receiving could correct for sync mismatches of the magnitude shown in the bottom diagram based on measuring the timing between bit transitions mid-frame. To me it seems impossible because it wouldn't always have enough information. If the clocks are out by the magic ~5% value required for a sample to jump a half bit either way, and you detect a bit transition occurring at the "wrong" time late in the frame, it wouldn't be possible to know for sure which direction to correct it. Do you know of UARTs that do this?
    – tom r.
    Commented Jul 10, 2021 at 15:01

In my experience there is little difference between implementing UART RX or TX via bit-banging. UART is the hardest protocol to implement via bit-banging because it is very time sensitive. SPI and I2C carry a clock line, so a bit-banged master can slow down or pause the clock, but UART has no clock line, so every bit has to be sent or read at a precise time to avoid corruption. Interrupts, pipeline stalls and cache misses may throw off your timing and cause corruption.

The most reliable way to implement UART via bit-banging is to disable interrupts and sit in a tight loop. For TX, the loop polls a high-precision timer, such as DWT_CYCCNT, and toggles the GPIO at the required times. For RX, the loop does the same, except that it reads the GPIO at the required times. TX has one advantage here: it can re-enable interrupts between bytes, whereas RX has to stay in the reading loop forever. But keeping interrupts disabled for that long is a very bad idea: your system will not be able to do anything other than read or write the UART.
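A rough sketch of the TX side of such a loop, assuming a free-running 32-bit cycle counter (such as DWT_CYCCNT) and 8N1 framing. The `read_cycles` and `pin_write` hooks are hypothetical stand-ins for reading the counter and driving the pin, and the caller is assumed to have disabled interrupts around the call.

```c
#include <stdint.h>

/* Wrap-safe test: has a free-running 32-bit counter reached the deadline?
   Works as long as the two values are less than 2^31 cycles apart. */
static inline int cycles_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Counter cycles per UART bit, rounded to nearest. */
static inline uint32_t cycles_per_bit(uint32_t cpu_hz, uint32_t baud)
{
    return (cpu_hz + baud / 2u) / baud;
}

/* Send one 8N1 frame by polling the cycle counter.
   read_cycles and pin_write are hypothetical hooks supplied by the caller. */
void soft_uart_tx_polled(uint8_t byte, uint32_t bit_cycles,
                         uint32_t (*read_cycles)(void),
                         void (*pin_write)(int level))
{
    /* Bit 0 = start (0), bits 1..8 = data LSB first, bit 9 = stop (1). */
    uint16_t frame = (uint16_t)((1u << 9) | ((uint16_t)byte << 1));
    uint32_t deadline = read_cycles();

    for (int i = 0; i < 10; i++) {
        while (!cycles_reached(read_cycles(), deadline)) { /* spin */ }
        pin_write((frame >> i) & 1u);
        deadline += bit_cycles;   /* absolute deadlines, so error does not add up */
    }
    while (!cycles_reached(read_cycles(), deadline)) { /* hold the stop bit */ }
}
```

With a 72 MHz core clock, `cycles_per_bit(72000000, 115200)` comes out at 625 cycles per bit. Scheduling each edge against an absolute deadline rather than delaying by a fixed amount after the previous edge keeps small loop overheads from accumulating across the frame.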

You may try to run the RX/TX loop with interrupts enabled, but then, once in a while, an interrupt will happen and result in a corrupted byte. When sending, you can detect such cases by measuring the time it took to send the entire byte. If that time exceeds the expected time, you know that you were interrupted and can resend the entire packet. You will need to design a protocol that detects corruption.
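The timing check on the sending side could look something like this. It is only a sketch: it assumes a 10-bit frame, timestamps taken from a free-running 32-bit cycle counter before and after the byte, and a slack value you pick yourself. `send_packet` is a hypothetical callback that sends the whole packet and reports whether every byte passed the check.

```c
#include <stdint.h>

#define FRAME_BITS 10u  /* start + 8 data + stop (8N1, assumed) */

/* Returns 1 if sending one frame took no longer than nominal plus slack.
   Unsigned subtraction keeps this correct across a counter wrap. */
static inline int frame_timing_ok(uint32_t start_cycles, uint32_t end_cycles,
                                  uint32_t bit_cycles, uint32_t slack_cycles)
{
    uint32_t elapsed = end_cycles - start_cycles;
    return elapsed <= FRAME_BITS * bit_cycles + slack_cycles;
}

/* Resend the whole packet until one attempt goes out without being
   interrupted. send_packet is a hypothetical hook returning 1 on success. */
int send_with_retry(int (*send_packet)(void), int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++)
        if (send_packet())
            return 1;
    return 0;
}
```

The receiver still has to be able to throw away the damaged first attempt, which is where the corruption-detecting protocol comes in.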

On the Web I see samples that implement UART via interrupts. For TX they use a timer interrupt and write bits from the ISR. For RX they enable an edge-detect interrupt and record the time in the ISR. But that approach can fail in so many ways: other interrupts happening, code executing with interrupts disabled, etc.

But none of the above is needed since you are using an STM32. On STM32 you can use timers together with DMA to emulate a UART. For example: "Implementing an emulated UART on STM32F4 microcontrollers" https://www.st.com/content/ccc/resource/technical/document/application_note/1d/61/52/64/ea/ee/42/4e/DM00110292.pdf/files/DM00110292.pdf/jcr:content/translations/en.DM00110292.pdf Here they use a timer to drive DMA so that bits are sent and received at precise moments in time.
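One way to picture the transmit side of a timer-plus-DMA scheme: prepare one 32-bit word per bit period and let the DMA copy one word into the GPIO BSRR register on every timer tick. The sketch below only builds that buffer; it is my own illustration under that assumption, not code from the application note, and the timer and DMA configuration is left out. It relies on the BSRR layout where writing bit n sets pin n and writing bit n+16 resets it.

```c
#include <stdint.h>

#define FRAME_BITS 10u  /* start + 8 data + stop (8N1, assumed) */

/* Build the per-bit BSRR words for one frame on the given pin (0..15).
   Writing bit `pin` of BSRR drives the pin high; writing bit `pin + 16`
   drives it low. */
void build_bsrr_frame(uint8_t byte, unsigned pin, uint32_t words[FRAME_BITS])
{
    const uint32_t set_word   = (uint32_t)1u << pin;
    const uint32_t reset_word = (uint32_t)1u << (pin + 16u);

    words[0] = reset_word;                       /* start bit: low */
    for (unsigned i = 0; i < 8u; i++)            /* data bits, LSB first */
        words[1u + i] = ((byte >> i) & 1u) ? set_word : reset_word;
    words[9] = set_word;                         /* stop bit: high */
}
```

Because the timer paces the transfers in hardware, bit timing no longer depends on what the CPU or other interrupts are doing while the frame goes out.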

There is another way: configure timer capture/compare channels and attach DMA. For RX, the timer captures the exact moments at which the input signal toggles, and DMA saves these time values to a buffer. For TX, DMA reads time values from a buffer and the timer toggles the output at those moments. I have not tried this last approach myself yet, but I have done something similar, so it should work as well.
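For the RX half of that idea, the captured edge times still have to be turned back into a byte. A sketch of how that decoding might go, assuming a 16-bit timer, 8N1 framing, a buffer that starts at the falling edge of the start bit, and gaps between edges shorter than one timer wrap (the function name and return convention are mine):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one 8N1 frame from captured edge timestamps.
   edges[0] must be the falling edge of the start bit; each later entry is the
   next toggle of the line. Returns 0 and stores the byte on success, or -1 on
   a glitch or framing error. */
int decode_edges(const uint16_t *edges, size_t n_edges,
                 uint16_t bit_ticks, uint8_t *out)
{
    if (n_edges == 0 || bit_ticks == 0)
        return -1;

    uint16_t frame = 0;   /* bit i holds the line level during slot i */
    unsigned slot = 0;
    int level = 0;        /* line is low right after the first edge */

    for (size_t i = 0; i + 1 < n_edges && slot < 10; i++) {
        uint16_t dt = (uint16_t)(edges[i + 1] - edges[i]);  /* wrap-safe */
        unsigned run = (dt + bit_ticks / 2u) / bit_ticks;   /* nearest bit count */
        if (run == 0)
            return -1;    /* pulse shorter than half a bit: treat as a glitch */
        for (unsigned k = 0; k < run && slot < 10; k++, slot++)
            if (level)
                frame |= (uint16_t)(1u << slot);
        level = !level;
    }
    /* After the last captured edge the line stays at `level`. */
    for (; slot < 10; slot++)
        if (level)
            frame |= (uint16_t)(1u << slot);

    if ((frame & 1u) != 0 || ((frame >> 9) & 1u) != 1u)
        return -1;        /* start bit must be 0 and stop bit must be 1 */
    *out = (uint8_t)(frame >> 1);
    return 0;
}
```

Rounding each gap to the nearest whole number of bit periods gives the same tolerance to clock mismatch and jitter as mid-bit sampling, and the decoding can run at leisure once DMA has filled the buffer.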


Soft UART For STM32:

In this example we emulate 6 full-duplex UARTs at 9600 baud. All the UARTs work in parallel!


