4

How do I remove all lines containing any non-ASCII keyboard characters?

I have tried many regular expressions, but none of them works as it should. I even tried [^\x00-\x7F]+, but it didn't select all the characters.

My next idea was to use [^a-z0-9``~!@#$%^&*()-_=+[]{}\|;:'"<>,./?], but that still doesn't work, because some of these characters are not excluded from the match, such as \ / | { } [ ] $ # ^ ( )

  1. If a line contains any character not in the list below, I want to remove it or bookmark it:

    0123456789`~!@#$%^&*()-_=+[]{}\/|;:'"<>,.?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
  2. Simple example (more characters like these are listed at https://en.wikipedia.org/wiki/List_of_Unicode_characters):

    0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    ¤©ª«¬¯°±²³´µ¶·¸¹º»¼½¾¿÷ÆIJŒœƔƕƋƕ
    ƜƝƢƸƾDžNJNjǽǾǼɁɀȾɎʒəɼʰʲʱʴʳʵʶʷʸˁˀˇˆ˟ˠ
    ˩˧Ͱͱͳʹͼͻͺ͵ͿΏΔΘΞΛΣΠΦΧΨΩΪΫάέήίΰαβδε
    θηκλμξπςρφχψωϊϋϏώϑϐϓϒϔϕϖϠϟϞϝϜϡϢ
    ϤϣϧϫϬϮϯϰϱ₠₡₢₣₤₥₦₧₨₩₪₫€₭₮₯₰₱₲
    ₳₴₵₶₷₸₹₺₻₼₽₾₿⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜
    ⅝⅞⅟℠℡™℣ℤ℥Ω℧ℨ℩KÅℬℭ℮ℯ⇀⇁ↀↁↂↃↄ
    ⇔⇕⇖⇗⇘⇙⇚⇛⇜⇝⇞⇟⇠⇡⇢⇣⇤⇥⇦⇧⇨⅀⅁⅂⅃⅄ⅅ
    ⅆⅇⅈⅉ⅊⅋⅌⅍ⅎ⅏ⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽ
    
  3. Expected result:

    0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
    abcdefghijklmnopqrstuvwxyz
    ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
4
  • [^\x00-\x7F]+ works fine for me in Notepad++, it gives the expected result. What is your version of Npp (here, I have 7.5.1)? Did you check Regular expression?
    – Toto
    Commented Sep 28, 2017 at 11:38
  • Characters that are part of regular expression syntax (like [,],(,),#,^) need to be escaped. In Notepad++ you usually do this by prefixing them with a backslash. So [^a-z0-9``~!@#$%^&*()-_=+[]{}\|;:'"<>,./?] would become [\^a-z0-9``~!@\#\$%^&*\(\)-_=+\[\]{}\|;:'"<>,./?] (at least).
    – Seth
    Commented Sep 30, 2017 at 8:24
  • @Seth: A caret ^ in the first position of a character class means negation; if you escape it, it means a literal caret. Also, parentheses, the pipe, and other characters don't need to be escaped, but the dash - must be escaped because it denotes a range of characters.
    – Toto
    Commented Sep 30, 2017 at 10:25
  • @Toto Good point about the leading caret, but you need to escape the others if you want to match them literally. This might be specific to Notepad++, but with the above "simple example" it doesn't work if you don't escape them.
    – Seth
    Commented Sep 30, 2017 at 19:52

3 Answers

4

[^\x00-\x7F] works fine, but if you want to use a long character class like [^a-z0-9``~!@#$%^&*()-_=+[]{}\|;:'"<>,./?], you have to escape the characters that have a special meaning inside a class (i.e. - [ ] \) and add the line breaks \r and \n.

Your regex becomes:

 [^a-z0-9``~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]
 #                    ^    ^ ^   ^            ^^^^

  • Ctrl+H
  • Find what: [^a-z0-9``~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]+$ (but, again, [^\x00-\x7F] works fine and is more readable)
  • Replace with: LEAVE EMPTY
  • check Wrap around
  • check Regular expression
  • Replace all

Result for given example:

0123456789`~!@#$%^&*()-_=+[]{}\|;:'"<>,./?
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
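
For anyone who would rather script this than use the Replace dialog, here is a minimal Python sketch built on the escaped character class above. It is an illustration under a few assumptions, not what Notepad++ does internally: the function name `keep_allowed_lines` is made up, A-Z is written out because Python's `re` is case-sensitive by default, and the sketch drops whole lines (as the question asks) rather than deleting only the matched characters.

```python
import re

# Assumption: A-Z is listed explicitly because Python's re is case-sensitive
# by default. '-', '[', ']' and '\' are escaped, as in the answer's regex.
DISALLOWED = re.compile(r"""[^A-Za-z0-9`~!@#$%^&*()\-_=+\[\]{}\\|;:'"<>,./?\r\n]""")


def keep_allowed_lines(text: str) -> str:
    """Return text without the lines that contain a disallowed character."""
    kept = [line for line in text.splitlines() if not DISALLOWED.search(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "abc123\n¤©ª«\nABC{}[]\\|\nαβδε"
    print(keep_allowed_lines(sample))
```

Note that the class does not list the space character, so in this sketch a line containing a space is dropped as well; add a space to the class if that is not wanted.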
1
  • Toto, thanks so much. You always give good, helpful answers that match what the question asks. By the way, I know [^\x00-\x7F] works fine, but not with every single special character. The first regex you gave helped me keep only what I want. Thanks, very helpful.
    – user677589
    Commented Oct 1, 2017 at 16:13
0

If you are not tied to Notepad++, you could install Bash for Windows 10, as I showed here https://superuser.com/a/1252271/715210 (sorry, I always come back to your questions with Linux workarounds ;) )

I have a solution, but with it you will unfortunately also lose the apostrophe '

  1. Open Bash for Windows from the Start menu
  2. Go to the folder where your file is located with cd /mnt/c/path/folder (the drive C: is mounted at /mnt/c)
  3. If your file is named foo.txt you could generate a file bar.txt with this command:

    cat foo.txt | tr -cd '[:alnum:]\n\r~!@#$%^&*()-_=+{}\|;:<>,./?"`' | sed '/^$/d' > bar.txt

Explanation of the parts:

cat foo.txt outputs the text file, and the pipe | passes that output to the command tr -cd, which removes every character that is not in the list between '...'. Another pipe then passes the result to sed, which removes the empty lines. Last but not least, > bar.txt redirects the output to the file bar.txt
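
If Bash is not available, the same two steps can be sketched in Python. This is only an approximation under stated assumptions: the name `strip_disallowed` is hypothetical, and `ALLOWED` is spelled out by hand with the dash taken literally, so it is not an exact translation of the tr set (tr itself would read `)-_` as a range).

```python
import string

# Hand-written approximation of the tr set: ASCII letters, digits, line
# breaks, and the punctuation listed in the command (no apostrophe).
ALLOWED = set(
    string.ascii_letters + string.digits + "\n\r" + "~!@#$%^&*()-_=+{}\\|;:<>,./?\"`"
)


def strip_disallowed(text: str) -> str:
    # Like `tr -cd`: delete every character that is not in ALLOWED.
    cleaned = "".join(ch for ch in text if ch in ALLOWED)
    # Like `sed '/^$/d'`: drop the lines that ended up empty.
    return "\n".join(line for line in cleaned.splitlines() if line)


if __name__ == "__main__":
    print(strip_disallowed("abc\n€€€\nx'y\n"))
```

As with the shell pipeline, this deletes individual characters and only drops a line when nothing is left of it.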

Thanks to:

3
  • I only use Windows 7 and have no option to use another Windows version, apart from Windows 2012, which I can use as well.
    – user677589
    Commented Sep 28, 2017 at 19:06
  • OK, then unfortunately I can't help you. If you have physical access to your computer and can change the boot order, you could create a USB stick with a live Ubuntu and run those commands from it; see this tutorial tutorials.ubuntu.com/tutorial/…
    – chloesoe
    Commented Sep 28, 2017 at 20:20
  • Related: stackoverflow.com/questions/11577720/… Commented Dec 7, 2020 at 9:25
0

In Notepad++ this is easy:

  1. menu Search > Mark...

  2. Find what: [^\x00-\x7F]
    ☑ Mark line
    (•) Regular expression

  3. Press Find All

  4. menu Search > Bookmark > Remove bookmarked lines
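
The effect of these steps (mark every line that matches [^\x00-\x7F], then remove the marked lines) can be sketched in Python like this. The function name `remove_non_ascii_lines` is made up for the example, and Python's `re` stands in for the Notepad++ regex engine.

```python
import re

# Same pattern as in "Find what": any character outside the ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def remove_non_ascii_lines(text: str) -> str:
    """Drop every line that contains at least one non-ASCII character."""
    lines = text.splitlines(keepends=True)  # keep the original line endings
    return "".join(line for line in lines if not NON_ASCII.search(line))


if __name__ == "__main__":
    print(remove_non_ascii_lines("abc\nµ¶\nXYZ\n"), end="")
```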

2
  • Thanks for trying to help, but I said in my question that [^\x00-\x7F] doesn't remove everything I need, because there are unknown special characters that this regex doesn't catch. Anyway, Toto helped me out. Thanks for trying.
    – user677589
    Commented Oct 1, 2017 at 16:15
  • @DeathRival – no problem. For me, all the above steps worked 100%, turning sample #2 into #3. Of course, you can use what you did in the accepted answer, but this one is faster and more effective. (I bet you did not try the steps above :)
    – miroxlav
    Commented Oct 1, 2017 at 20:01
