Since you mentioned the encoding is us-ascii, we can assume each character is one byte. In regex, the '.' matches any character, except newlines, and you want each individual part of a CR/LF newline to be matched separately, since they are two bytes.
I'm also going to make the assumption that you are processing actual text data, and not a binary file that can contain bytes outside of the us-ascii character mapping.
If all of the above is true, you can use the following regex:
\x0C[^\xFF]{318}
The reason the '.' didn't work in your attempt is that the '.' does not match newlines. You also can't use \x0C[.\r\n]{318}, because the '.' wildcard is not available within a character class (square bracket group); inside the brackets it is just a literal dot. The hex value FF does not map to any valid codepoint in the us-ascii character set, so when you look for "any character that is not the FF character", you end up matching every byte the file can contain, newlines included.
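If you want to verify both points yourself, here is a minimal sketch. It assumes Python's re module, which is my choice for illustration only; your regex flavor may differ in small details.

```python
import re

# Outside a character class, '.' matches any character except a newline.
assert re.fullmatch(r".", "a") is not None
assert re.fullmatch(r".", "\n") is None

# Inside a character class, '.' is only a literal dot, not a wildcard.
dot_class = re.compile(r"[.\r\n]")
assert dot_class.fullmatch(".") is not None   # literal dot
assert dot_class.fullmatch("\r") is not None  # carriage return is listed
assert dot_class.fullmatch("a") is None       # ordinary characters are not
```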
Keep in mind that this method counts Windows newlines (CR/LF) as two characters/bytes (per your request). Unix (LF only) and classic Mac (CR only) newlines count as one.
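A short sketch of the expression in use, again assuming Python's re module. The helper name first_record and the sample string are made up for the example.

```python
import re

# Form feed, followed by exactly 318 characters of any kind (newlines included).
PATTERN = re.compile(r"\x0C[^\xFF]{318}")

def first_record(text):
    """Return the form feed plus the 318 characters after it, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

# Hypothetical sample: a form feed, then 159 CR/LF pairs (318 characters),
# then some trailing text that must not be part of the match.
sample = "header\x0c" + "\r\n" * 159 + "trailing"
record = first_record(sample)

print(len(record))         # form feed + 318 characters
print(record.count("\r"))  # every CR is counted separately from its LF
```

If you read the data from a file in Python, opening it with newline="" keeps the CR/LF pairs intact; the default text mode would translate them to a single LF and throw the count off.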
Hope this is what you were looking for...
EDIT - Regex explained
Full expression
\x0C[^\xFF]{318}
Let's break this down.
\x0C
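This is a hexadecimal escape: \x followed by two hex digits matches the single character with that code, so \x0C matches the form feed character (hex 0C, decimal 12). Don't confuse it with the uppercase \X, which in some regex flavors matches a single Unicode grapheme. Note that \x.. escapes can also match line-break characters, for example \x0A (this is important, more on this later).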
But, since you also used this, I'm guessing you're already partly familiar with this.
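For reference, hex 0C is the form feed character in us-ascii. A tiny check, assuming Python's re module:

```python
import re

# "\f" is Python's escape for form feed, which is code point 12 (hex 0C).
form_feed = "\f"
assert ord(form_feed) == 0x0C

# The regex escape \x0C matches exactly that one character...
assert re.fullmatch(r"\x0C", form_feed) is not None
# ...and nothing else, such as a line feed (hex 0A).
assert re.fullmatch(r"\x0C", "\n") is None
```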
[^\xFF]
Everything between [] is called a Character Set (not to be confused with the same concept in Character encoding). You can read more about it on Regexp Tutorial, but in summary, it serves as an "OR" statement. [ab] simply means, "a or b". When ^ is used inside a character set, it serves as a negation. So [^a] means "not a". In our use-case, we look for any character that is not the HEX value FF.
{318}
And we look for this kind of character, 318 times. The {} syntax always applies to the Regex element just in front of it, so in this case the [^\xFF] Character set.
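To see the three pieces working together on something small enough to read, here is the same expression with {3} instead of {318}. This is only a sketch in Python's re module, and the input strings are invented.

```python
import re

# Same structure as the full expression, but with a count of 3.
short = re.compile(r"\x0C[^\xFF]{3}")

# The {3} applies to [^\xFF] only: one form feed, then three characters.
hit = short.search("ab\x0cx\ny-rest")
matched = hit.group(0)

# Fewer than three characters after the form feed means no match at all.
too_short = short.search("\x0cab")
```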
Why \xFF?
In hexadecimal notation, the us-ascii character set goes from 00 up to 7F. Any higher value cannot be mapped to a us-ascii codepoint. This means that any file encoded (correctly) in us-ascii can only contain hex values between 00 and 7F. As a result, it can't contain FF.
So, we can make use of this to search for any character including newline characters, since a negated character set such as [^\xFF] also matches newline characters like \x0A (LF) and \x0D (CR). When we search for any character that is not FF, we end up finding every character.
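You can confirm this by brute force over the whole us-ascii range. A minimal sketch, assuming Python's re module:

```python
import re

any_ascii = re.compile(r"[^\xFF]")

# Try every us-ascii code point, 00 through 7F.
all_match = all(any_ascii.fullmatch(chr(code)) for code in range(0x80))

# The line-break characters are covered too.
matches_lf = any_ascii.fullmatch("\x0a") is not None
matches_cr = any_ascii.fullmatch("\x0d") is not None

# The only character the class rejects is FF itself.
rejects_ff = any_ascii.fullmatch("\xff") is None
```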
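Keep in mind that this solution depends on your file being encoded in us-ascii, and not UTF-8. In UTF-8, a character outside us-ascii takes more than one byte, so counting characters and counting bytes no longer give the same result.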
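To illustrate what goes wrong outside us-ascii, here is a sketch that runs a shortened version of the expression once over characters and once over UTF-8 bytes. It assumes Python's re module; the sample text is invented.

```python
import re

# Hypothetical sample containing one non-ascii character (e with acute accent).
text = "\x0ca\u00e9\nrest"

# Matching on characters: the accented letter counts once.
as_chars = re.search(r"\x0C[^\xFF]{3}", text).group(0)

# Matching on UTF-8 bytes: the accented letter takes two bytes, so the
# three-item window ends before the newline is reached.
as_bytes = re.search(rb"\x0C[^\xFF]{3}", text.encode("utf-8")).group(0)

# On decoded text, U+00FF is a real character and the class rejects it.
rejects_y_umlaut = re.fullmatch(r"[^\xFF]", "\u00ff") is None
```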
so I updated and now everything works great... I don't really know what to do with my question now though.