8
\$\begingroup\$

In my UART communication I need to know the start byte and the stop byte of the message sent. The start byte is easy, but the stop byte not so much. I have implemented two stop bytes at the end of my message, \n and \r (10 and 13 decimal). A UART can carry any byte value from 0 to 255, so how fail-safe is this? I can imagine that, though with low probability, my message might contain the values "10 and 13" one after the other when they are not the stop bytes.

Is there a better way to implement this?

\$\endgroup\$
16
  • 8
    \$\begingroup\$ To send arbitrary data you either have to use packets or byte stuffing. In your case the probability of the pattern appearing at a given location is 1/65536, so the chance of it appearing somewhere approaches 1 if you have a long enough random data stream. \$\endgroup\$
    – Oldfart
    Commented Apr 18, 2019 at 10:36
  • 4
    \$\begingroup\$ Can you provide context please. Stop bits are part of UART communication but stop bytes? This sounds like a pure software issue and depends what has been agreed by the sender and receiver. \$\endgroup\$ Commented Apr 18, 2019 at 10:39
  • 2
    \$\begingroup\$ @MariusGulbrandsen if your data is truly arbitrary and not strictly text (think ASCII) then null termination will not work; you will have to implement a packet. \$\endgroup\$ Commented Apr 18, 2019 at 10:47
  • 5
    \$\begingroup\$ BTW: The common practice is to put the carriage return before the line feed: "\x0D\x0A". \$\endgroup\$ Commented Apr 18, 2019 at 21:16
  • 4
    \$\begingroup\$ @AdrianMcCarthy I think the point of reversing it is to minimize the odds of it being a valid sequence. That said, two Windows line-endings in a row would give you \r\n\r\n which contains the \n\r sequence in the middle... \$\endgroup\$
    – Mike Caron
    Commented Apr 19, 2019 at 16:05

5 Answers

15
\$\begingroup\$

There are different ways to prevent this:

  • Make sure you never send a 10/13 combination in your regular messages (so only as stop bytes). E.g. to send 20 21 22 23 24 25:

20 21 22 23 24 25 10 13

  • Escape 10 and 13 (or all non-ASCII characters) with an escape character, e.g. 1b. So to send 20 21 10 13 25 26, send (see the comment by, and credit to, Dan W):

20 21 1b 10 1b 13 25 26

  • Define a packet when sending messages. E.g. if you want to send the message 20 21 22 23 24 25, then prepend the number of bytes to be sent, so the packet is:

< nr_of_data_bytes > < data >

If your messages are at most 255 bytes long (the largest value a single length byte can hold), send:

06 20 21 22 23 24 25

So you know after receiving 6 data bytes that is the end; you don't have to send a 10 13 afterwards. And you can send 10 13 inside a message. If your messages can be longer, you can use 2 bytes for the data size.
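The length-prefixed packet above can be sketched roughly as follows. This is a minimal illustration, assuming a single length byte and a receiver that already holds the bytes in a buffer; the function names `encode_packet` and `decode_packets` are made up for this example.

```python
def encode_packet(payload: bytes) -> bytes:
    """Prefix the payload with one length byte (so at most 255 data bytes)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([len(payload)]) + payload


def decode_packets(stream: bytes) -> list:
    """Split a byte stream into payloads; an incomplete trailing packet is left out."""
    packets = []
    i = 0
    while i < len(stream):
        length = stream[i]
        end = i + 1 + length
        if end > len(stream):
            break  # the rest of this packet has not arrived yet
        packets.append(stream[i + 1:end])
        i = end
    return packets


# Two packets back to back; the second payload is exactly 10 13.
wire = encode_packet(bytes.fromhex("202122232425")) + encode_packet(b"\n\r")
print(wire.hex())
print(decode_packets(wire))
```

Note that 10 and 13 pass through as ordinary data here, because the receiver counts bytes instead of searching for a terminator.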

Update 1: Another way of defining packets

Another alternative is to send commands, each with its own length rule. Many variants are possible, e.g.

10 20 30 (Command 10 which always has 2 data bytes)

11 30 40 50 (Command 11 which always has 3 data bytes)

12 06 10 11 12 13 14 15 (Command 12 + 1 byte for the number of data bytes that follow)

13 01 02 01 02 03 ... (Command 13 + 2 bytes for the length: 01 02 means 256 + 2 = 258 data bytes that follow)

14 80 90 10 13 (Command 14 that is followed by an ASCII string ending with 10 13)
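A receiver for such commands could look something like the sketch below. It assumes the values in the examples are hexadecimal and that multi-byte lengths are sent high byte first; the `COMMANDS` table and `parse_commands` are hypothetical names, and the terminator-based command 14 is left out to keep the example short.

```python
# Hypothetical command table: command byte -> how the data length is found.
# ("fixed", n): always n data bytes.
# ("prefixed", k): a k-byte big-endian length field follows the command byte.
COMMANDS = {
    0x10: ("fixed", 2),
    0x11: ("fixed", 3),
    0x12: ("prefixed", 1),
    0x13: ("prefixed", 2),
}


def parse_commands(stream: bytes) -> list:
    """Return (command, data) tuples for every complete command in the stream."""
    out = []
    i = 0
    while i < len(stream):
        cmd = stream[i]
        kind, size = COMMANDS[cmd]  # raises KeyError on an unknown command byte
        pos = i + 1
        if kind == "fixed":
            length = size
        else:
            if pos + size > len(stream):
                break  # length field not fully received yet
            length = int.from_bytes(stream[pos:pos + size], "big")
            pos += size
        if pos + length > len(stream):
            break  # data not fully received yet
        out.append((cmd, stream[pos:pos + length]))
        i = pos + length
    return out


stream = bytes.fromhex("10 20 30 11 30 40 50 12 06 10 11 12 13 14 15")
for cmd, data in parse_commands(stream):
    print(hex(cmd), data.hex())
```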

Update 2: Bad connection/byte losses

All of the above only work when the UART line is sending bytes correctly. If you want to use more reliable ways of sending, there are also many possibilities. Below are a few:

  1. Send a checksum within the packet (search for CRC: Cyclic Redundancy Check). If the CRC is OK, the receiver knows (with high probability) that the message was received intact.
  2. If a message needs to be resent, an acknowledgement (ACK/reply) mechanism is required: e.g. the sender sends something, the receiver gets corrupt data and sends a NACK (not acknowledged), and the sender can then send again.
  3. Timeout: if the sender does not get an ACK or NACK in time, the message needs to be resent.

Note that all of the above mechanisms can be as simple or as complicated as you want (or need) them to be. When messages can be resent, a mechanism for identifying messages is also needed (e.g. adding a sequence number to the packet).
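As a rough sketch of a checksum combined with a sequence number: the packet layout below (sequence byte, length byte, payload, two CRC bytes) and the CRC-16 parameters (polynomial 0x1021, initial value 0xFFFF) are assumptions chosen for illustration, not something the answer prescribes.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_packet(seq: int, payload: bytes) -> bytes:
    """Assumed layout: sequence number, length, payload, CRC-16 (big-endian)."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    body = bytes([seq & 0xFF, len(payload)]) + payload
    return body + crc16_ccitt(body).to_bytes(2, "big")


def check_packet(packet: bytes):
    """Return (seq, payload) if the packet is consistent, else None.

    On None the receiver would answer with a NACK (or stay silent and let
    the sender time out)."""
    if len(packet) < 4:
        return None
    body, received = packet[:-2], int.from_bytes(packet[-2:], "big")
    if crc16_ccitt(body) != received or body[1] != len(body) - 2:
        return None
    return body[0], body[2:]


packet = build_packet(7, b"\n\rdata")
damaged = bytes([packet[0], packet[1], packet[2] ^ 0x01]) + packet[3:]
print(check_packet(packet))
print(check_packet(damaged))
```

The sequence number lets the receiver recognise a retransmitted packet it has already accepted, for example when its ACK was lost.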

\$\endgroup\$
6
  • 1
    \$\begingroup\$ "Make sure you never send a 10/13 combination in your regular messages (so only as stop bytes)." – you've not said how to send data which does include a 10/13 combination – you need to escape it. So "20 10 13 23 10 13" might be sent as "20 1b 10 1b 13 23" with 1b as your escape character. \$\endgroup\$
    – Dan W
    Commented Apr 18, 2019 at 18:33
  • 1
    \$\begingroup\$ Note that using a length field as proposed, you’ll get in trouble when your serial link is bad and loses a single byte. Everything will go out of sync. \$\endgroup\$ Commented Apr 18, 2019 at 19:55
  • \$\begingroup\$ @DanW If you use the first one or two bytes as the number of data bytes, it does not matter if 10 or 13 are part of that data... So 20 10 13 23 10 13 can be sent as 06 20 10 13 23 10 13, where 06 is the number of data bytes that follow. \$\endgroup\$ Commented Apr 18, 2019 at 22:08
  • \$\begingroup\$ @MichelKeijzers - yes, but that’s the second solution you mention. Your first solution is missing an explanation of escape sequences to prevent the stop bytes being transmitted. \$\endgroup\$
    – Dan W
    Commented Apr 18, 2019 at 22:25
  • \$\begingroup\$ Both approaches work, and are commonly used, but they have different advantages and disadvantages, which you could add if wanted, though it’s beyond what the OP asked for. \$\endgroup\$
    – Dan W
    Commented Apr 18, 2019 at 22:27
14
\$\begingroup\$

How fail-safe is \n\r as stop bytes?

If you send arbitrary data -> probably not fail-safe enough.
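To put a rough number on that, the sketch below computes the chance that a payload of n bytes contains 10 immediately followed by 13 somewhere. It assumes every byte is uniformly random and independent, which real data usually is not, so treat the result as an order-of-magnitude estimate; `false_stop_probability` is a name made up for this example.

```python
def false_stop_probability(n: int) -> float:
    """Chance that n uniformly random bytes contain 10 followed by 13."""
    clean_other = 1.0     # no match so far, last byte is not 10
    clean_after_10 = 0.0  # no match so far, last byte is 10
    for _ in range(n):
        # Any "clean" state moves to "after 10" when the next byte is 10.
        new_after_10 = (clean_other + clean_after_10) / 256
        # From "after 10", a 13 is a match and a 10 stays in "after 10",
        # so only 254 of the 256 values lead back to "other".
        new_other = clean_other * 255 / 256 + clean_after_10 * 254 / 256
        clean_other, clean_after_10 = new_other, new_after_10
    return 1.0 - (clean_other + clean_after_10)


for n in (2, 100, 10_000, 1_000_000):
    print(n, false_stop_probability(n))
```

For a two-byte payload this gives 1/65536; for long payloads the probability climbs towards 1.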

A common solution is to use escaping:

Let's define that the characters 0x02 (STX - frame start) and 0x03 (ETX - frame end) need to be unique within the transmitted data stream. This way the start and the end of a message can be safely detected.

If one of these characters needs to be sent within the message frame, it is replaced by an escape character (ESC = 0x1b) followed by the original character plus 0x20.

Original character replaced by

0x02 -> 0x1b 0x22  
0x03 -> 0x1b 0x23  
0x1b -> 0x1b 0x3b  

The receiver reverses this process: whenever it receives an escape character, it drops that character and subtracts 0x20 from the next one.

This only adds some processing overhead but is 100% reliable (assuming no transmission errors occur, which you could/should verify by additionally implementing a checksum mechanism).
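The scheme described above can be sketched as follows. This is only an illustration under the stated definitions (STX = 0x02, ETX = 0x03, ESC = 0x1b, offset 0x20); the function names are made up, and a real receiver would typically process bytes one at a time as they arrive rather than from a complete buffer.

```python
STX, ETX, ESC = 0x02, 0x03, 0x1B
RESERVED = {STX, ETX, ESC}


def encode_frame(payload: bytes) -> bytes:
    """Wrap the payload in STX ... ETX, escaping reserved bytes inside."""
    out = bytearray([STX])
    for b in payload:
        if b in RESERVED:
            out += bytes([ESC, b + 0x20])  # e.g. 0x02 -> 0x1b 0x22
        else:
            out.append(b)
    out.append(ETX)
    return bytes(out)


def decode_frame(frame: bytes) -> bytes:
    """Undo encode_frame; raises ValueError on a malformed frame."""
    if len(frame) < 2 or frame[0] != STX or frame[-1] != ETX:
        raise ValueError("not a complete STX ... ETX frame")
    out = bytearray()
    escaped = False
    for b in frame[1:-1]:
        if escaped:
            out.append((b - 0x20) & 0xFF)
            escaped = False
        elif b == ESC:
            escaped = True  # drop the escape, fix up the next byte
        else:
            out.append(b)
    if escaped:
        raise ValueError("frame ends in the middle of an escape sequence")
    return bytes(out)


frame = encode_frame(bytes([0x02, 0x03, 0x1B, 0x41]))
print(frame.hex())
print(decode_frame(frame).hex())
```

Because the escaped forms 0x22, 0x23 and 0x3b are not reserved values themselves, STX and ETX never appear inside an encoded frame, so the receiver can find frame boundaries before it does any un-escaping.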

\$\endgroup\$
5
  • 1
    \$\begingroup\$ Nice answer. The common escape character used for ASCII protocols was '\x10' DLE (Data Link Escape). Some of the Wikipedia pages suggest that DLE was often used in the opposite way: to say that the next byte was a control character rather than a data byte. In my experience, that's generally the opposite meaning for an escape. \$\endgroup\$ Commented Apr 18, 2019 at 21:23
  • 2
    \$\begingroup\$ One thing to watch out for here is that your worst case buffer size doubles. If memory is really tight that might not be the best solution. \$\endgroup\$
    – TechnoSam
    Commented Apr 18, 2019 at 21:23
  • 1
    \$\begingroup\$ @Rev What's the rationale for adding 0x20 to the original character? Wouldn't the escaping scheme work without that just as well? \$\endgroup\$ Commented Apr 19, 2019 at 21:02
  • 1
    \$\begingroup\$ @NickAlexeev: It is easier/faster to identify the actual frame boundaries if you remove any other occurrence of the reserved chars from the stream. That way, you can separate frame reception and frame parsing (including the un-escaping). This may be especially relevant if you have a very slow controller without FIFO and/or high data rates. So you can just copy the incoming bytes (between STX/ETX) into the frame buffer as they arrive, mark the frame as complete and do the processing with lower priority. \$\endgroup\$
    – Rev
    Commented Apr 21, 2019 at 18:08
  • \$\begingroup\$ @TechnoSam: Good point. \$\endgroup\$
    – Rev
    Commented Apr 21, 2019 at 18:08
6
\$\begingroup\$

You know, ASCII already has bytes for these functions.

  • 0x01 : start of heading -- start byte
  • 0x02 : start of text -- end headers, begin payload
  • 0x03 : end of text -- end payload
  • 0x04 : end of transmission -- stop byte
  • 0x17 : end of transmission block -- message continues in next block

It also has codes for various uses inside the payload.

  • 0x1b : escape (escape the next character -- use in payload to indicate next character is not one of the structure describing codes used in your protocol)
  • 0x1c, 0x1d, 0x1e, 0x1f : file, group, record, and unit separator, respectively -- used as simultaneous stop and start byte for parts of hierarchical data

Your protocol should specify the finest granularity of ACK (0x06) and NAK (0x15), so that negative acknowledged data can be retransmitted. Down to this finest granularity, it is wise to have a length field immediately after any (unescaped) start indicator and (as explained in other answer(s)) it is wise to follow any (unescaped) stop indicator with a CRC.

\$\endgroup\$
2
  • \$\begingroup\$ I will be sending arbitrary data, I guess it might have been confusing to use "\n\r" in my question when I'm not sending ASCII data. Even though, I like this answer, it's very informative on sending ASCII over UART \$\endgroup\$
    – C. K.
    Commented Apr 19, 2019 at 21:50
  • \$\begingroup\$ @MariusGulbrandsen : As long as your protocol establishes where payload is and which codes must be escaped in each payload section, you can send anything, not just text-ish data. \$\endgroup\$ Commented Apr 19, 2019 at 22:57
4
\$\begingroup\$

UART is not fail-safe by its very nature - we are talking about 1960s technology here.

The root of the problem is that UART only syncs once per 10 bits, allowing a lot of gibberish to pass between those sync points. Compare that with, for example, CAN, which samples every individual bit multiple times.

Any double bit error occurring inside the data will corrupt a UART frame and pass undetected. Bit errors in start/stop bits may or may not get detected in the form of framing errors.

Therefore, no matter if you use raw data or packets, there is always a probability that bit flips caused by EMI result in unexpected data.

There exist numerous ways of "traditional UART quackery" to improve the situation ever so slightly. You can add sync bytes, sync bits, parity, double stop bits. You could add checksums that count the sum of all bytes (and then invert it - because why not) or you could count the number of binary ones as a checksum. All of this is widely used, wildly unscientific and has a high probability of missing errors. But this is what people did from the 1960s to the 1990s, and lots of weird things like these live on today.
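As a small illustration of why a plain byte sum misses errors, the sketch below uses a hypothetical `sum_checksum` that takes the two's complement of the sum (one common reading of "invert", assumed here). Any corruption that leaves the sum unchanged goes unnoticed.

```python
def sum_checksum(data: bytes) -> int:
    """Two's complement of the byte sum: data plus checksum adds up to 0 mod 256."""
    return (-sum(data)) & 0xFF


original = bytes([0x20, 0x21, 0x22, 0x23])
swapped = bytes([0x21, 0x20, 0x22, 0x23])    # first two bytes exchanged
cancelled = bytes([0x21, 0x21, 0x21, 0x23])  # one byte +1, another byte -1

for data in (original, swapped, cancelled):
    print(data.hex(), hex(sum_checksum(data)))
```

All three messages produce the same checksum, so the receiver would accept the two corrupted ones.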

The most professional way to deal with safe transmission over UART is to have a 16 bit CRC checksum at the end of the packet. Everything else isn't very safe and has a high probability of missing errors.

Then on the hardware level you can use differential RS-422/RS-485 to drastically improve ruggedness of the transmission. This is a must for safe transmission over longer distances. TTL level UART should only be used for on-board communication. RS-232 should not be used for any other purpose but backwards compatibility with old stuff.

Overall, the closer to the hardware your error detection mechanism is, the more effective it is. In terms of effectiveness, differential signals add the most, followed by checking for framing/overrun etc errors. CRC16 adds some, and then "traditional UART quackery" adds a little bit.

\$\endgroup\$
11
  • 7
    \$\begingroup\$ This advice is fairly tangential - you haven't actually addressed the question asked. In particular, your proposed solutions may solve other problems, but they do not solve the basic problem of the question on this page, which is confusion between framing bytes and payload bytes. At most, your proposal would reject valid data embedding a framing byte due to CRC or similar failure, with no way to communicate such. \$\endgroup\$ Commented Apr 18, 2019 at 14:28
  • 3
    \$\begingroup\$ In fact, this answer makes it worse. The original had just data bytes and stop bytes. This adds a third category, CRC bytes. And as presented here, those can take on any value, including {10,13}. \$\endgroup\$
    – MSalters
    Commented Apr 18, 2019 at 15:53
  • 1
    \$\begingroup\$ @MSalters: The CRC can be ASCII encoded hex to prevent this issue. Another trick that I've seen on RS485 is to set bit 7 on the start / address byte. \$\endgroup\$
    – Transistor
    Commented Apr 18, 2019 at 16:03
  • \$\begingroup\$ Re "CAN which samples every individual bit multiple times.": The actual sampling of the bit value is only once per bit. What are you referring to here? Some kind of error checking, like by the sender? Clock synchronisation? \$\endgroup\$ Commented Apr 19, 2019 at 12:33
  • \$\begingroup\$ The inverting of the checksum was done so that summing the entire block of data would result in a zero, which is a bit easier to code and a bit faster to execute. Also, CRC is much better than you make it out to be, look it up in the Wikipedia. \$\endgroup\$
    – toolforger
    Commented Apr 19, 2019 at 22:04
0
\$\begingroup\$

... I can imagine, though low probability, that my message might contain the values "10 and 13" after each other when they are not the stop bytes.

The case where a portion of the data equals the terminating sequence should be considered when designing the format of a serial data packet. Another thing to consider is that any character can get corrupted or lost during transmission. A start character, a stop character, a data payload byte, a checksum or CRC byte, and a forward error correction byte are all susceptible to corruption. The framing mechanism has to be able to detect when a packet has corrupt data.

There are several ways to approach all this.

I'm making the working assumption that packets are framed only with the serial bytes. Handshake lines aren't used for framing. Time delays aren't used for framing.

Send packet length

Send the length of the packet in the beginning, instead of [or in addition to] the terminating character at the end.

pros: Payload is sent in an efficient binary format.

cons: Need to know the packet length at the start of the transmission.

Escape the special characters

Escape the special characters when sending the payload data. This is already explained in an earlier answer.

pros: Sender doesn't need to know the length of the packet at the beginning of the transmission.

cons: Slightly less efficient, depending on how many payload bytes need to be escaped.

Payload data encoded such that it can't contain start and stop characters

The payload of the packet is encoded such that it can't contain the start or stop characters. Usually, this is done by sending numbers as their ASCII or Hex-ASCII representation.

pros: Human-readable with common terminal programs. No need for code to handle escaping. No need to know the length of the packet at the start of the transmission.

cons: Lower efficiency. For one byte of payload data, several bytes are sent.
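A Hex-ASCII framing along these lines might look like the sketch below. The ":" start character is an assumption made for this example; the stop bytes are the \n\r pair from the question. Since the encoded payload only contains the characters 0-9 and A-F, it can never collide with the start or stop bytes.

```python
START = b":"     # hypothetical start character, not taken from the question
STOP = b"\n\r"   # stop bytes as used in the question (10, 13)


def encode_hex_frame(payload: bytes) -> bytes:
    """Send every payload byte as two ASCII hex digits between START and STOP."""
    return START + payload.hex().upper().encode("ascii") + STOP


def decode_hex_frame(frame: bytes) -> bytes:
    """Undo encode_hex_frame; raises ValueError on a malformed frame."""
    if not (frame.startswith(START) and frame.endswith(STOP)):
        raise ValueError("missing start character or stop bytes")
    digits = frame[len(START):-len(STOP)]
    return bytes.fromhex(digits.decode("ascii"))


frame = encode_hex_frame(b"\n\r\x20")
print(frame)
print(decode_hex_frame(frame))
```

Three payload bytes become six characters on the wire (plus framing), which is the efficiency cost mentioned above.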

\$\endgroup\$
