
I have these HTML tags:

<p class="BEBE">着名的文学评论家Love有一些重要的东西来说,关于总是分享胜利的人才,转向他们的起源:</p>

<p class="BEBE">着名的文学评论家 有一些重要的东西来说,关于总是分享胜利的人才,kiss 转向他们的起源:</p>

I need to find every tag that contains at least one English word while the rest of its content is written in another language (Chinese, for example).

But I don't want to find this one, because it has no English words:

<p class="BEBE">某些,真正的经济学,真正预测的是神圣的本质</p>

My regex doesn't work; it seems to find all the tags:

FIND: <p class="BEBE">.*[^\x00-\x7F]+.*</p>

And this regex finds only the HTML tags that contain Chinese words alone, with no English:

FIND: <p class="BEBE">+(?!\w+[\x00-\x7F]).*</p>

But I need only the tags that contain at least one English word.

3 Answers

2

The solution, thanks to @Toto:

FIND: <p class="BEBE">+(\w+[\x00-\x7F]).*</p>
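
As a cross-check, the pattern can be tried outside Notepad++. The sketch below uses Python's `re` module, which is an assumption of this example; the thread is about Notepad++, whose engine may treat `\w` differently. In Python, `[\x00-\x7F]` accepts any ASCII character, including a comma, so a Chinese-only paragraph that uses ASCII punctuation (as the samples above appear to) can also match. The name `stricter` is hypothetical: it asks for at least one ASCII letter and at least one non-ASCII character inside the tag.

```python
import re

samples = [
    '<p class="BEBE">着名的文学评论家Love有一些重要的东西来说,关于总是分享胜利的人才,转向他们的起源:</p>',
    '<p class="BEBE">着名的文学评论家 有一些重要的东西来说,关于总是分享胜利的人才,kiss 转向他们的起源:</p>',
    '<p class="BEBE">某些,真正的经济学,真正预测的是神圣的本质</p>',  # no English word
]

# The pattern as posted in this answer.
posted = re.compile(r'<p class="BEBE">+(\w+[\x00-\x7F]).*</p>')

# Hypothetical stricter variant (not from the thread):
#   (?=[^<]*[^\x00-\x7F])  the content has at least one non-ASCII character
#   [^<]*[A-Za-z][^<]*     the content has at least one ASCII letter
stricter = re.compile(
    r'<p class="BEBE">(?=[^<]*[^\x00-\x7F])[^<]*[A-Za-z][^<]*</p>'
)

posted_hits = [bool(posted.search(s)) for s in samples]
stricter_hits = [bool(stricter.search(s)) for s in samples]

print(posted_hits)    # [True, True, True]: the ASCII comma satisfies [\x00-\x7F]
print(stricter_hits)  # [True, True, False]
```

The stricter variant uses `[^<]*` instead of `.*`, so it also stays inside a single tag when several tags share one line.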

Also, if you want to skip the tags that contain <em> or </em>:

FIND: <p class="BEBE">+(?!\w+</em>)+\w+(\w+[\x00-\x7F]).*</p>

or

FIND: <p class="BEBE">+(?!\w+<em>).*(\w+[\x00-\x7F]).*</p>
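
One possible way to express the "skip tags with <em>" idea is a negative lookahead placed right after the opening tag. This is a minimal sketch in Python's `re` module, not the patterns above; it assumes one `<p>` tag per line, and the names `skip_em` and `with_em` are hypothetical.

```python
import re

# Hypothetical pattern: reject the tag if <em> or </em> appears anywhere
# after the opening <p ...>, then require at least one ASCII letter.
skip_em = re.compile(r'<p class="BEBE">(?!.*</?em>).*[A-Za-z].*</p>')

plain = '<p class="BEBE">着名的文学评论家Love有一些重要的东西来说</p>'
with_em = '<p class="BEBE">着名的文学评论家<em>Love</em>有一些重要的东西来说</p>'
chinese_only = '<p class="BEBE">某些,真正的经济学,真正预测的是神圣的本质</p>'

em_results = [bool(skip_em.search(s)) for s in (plain, with_em, chinese_only)]
print(em_results)  # [True, False, False]
```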

1

You have extra spaces in your regex:

<p class="BEBE">.* [^\x00-\x7F]+ .*</p>
#         here ___^    and   ___^

remove them:

<p class="BEBE">.*[^\x00-\x7F]+.*</p>
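
To see why the spaces matter, here is a small sketch in Python's `re` module (an assumption; the thread uses Notepad++). Outside of a free-spacing mode, a space in a pattern is a literal character, so the spaced version only matches a non-ASCII run that has a space on both sides. The variable names are illustrative only.

```python
import re

samples = [
    '<p class="BEBE">着名的文学评论家Love有一些重要的东西来说,关于总是分享胜利的人才,转向他们的起源:</p>',
    '<p class="BEBE">着名的文学评论家 有一些重要的东西来说,关于总是分享胜利的人才,kiss 转向他们的起源:</p>',
    '<p class="BEBE">某些,真正的经济学,真正预测的是神圣的本质</p>',
]

# With the stray spaces: each space must match a real space in the text.
with_spaces = re.compile(r'<p class="BEBE">.* [^\x00-\x7F]+ .*</p>')
# Without them: any tag holding at least one non-ASCII character matches.
without_spaces = re.compile(r'<p class="BEBE">.*[^\x00-\x7F]+.*</p>')

spaced_hits = [bool(with_spaces.search(s)) for s in samples]
unspaced_hits = [bool(without_spaces.search(s)) for s in samples]

print(spaced_hits)    # [False, False, False]
print(unspaced_hits)  # [True, True, True]
```

The corrected pattern matches all three samples, including the Chinese-only one, because it only tests for the presence of a non-ASCII character.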


5
  • Yes @Toto, I updated the post. But the problem is that this regex seems to find all HTML tags that contain Chinese words, whereas I need to find only the tags that contain at least one English word.
    – user706401
    Commented Jun 4, 2021 at 10:54
  • I managed to write a regex that finds only the tags with Chinese words alone: <p class="BEBE">+(?!\w+[\x00-\x7F]).*</p> But I need the opposite: to find only the tags that contain at least one English word.
    – user706401
    Commented Jun 4, 2021 at 11:14
  • @RobRob: Works for me. Have you UNchecked . matches newline?
    – Toto
    Commented Jun 4, 2021 at 11:14
  • Yes, I don't use ". matches newline". As for my last regex, if I can find the ASCII/UTF-8 regex on the internet, I believe the problem will be fixed.
    – user706401
    Commented Jun 4, 2021 at 11:19
  • I found the solution and posted it. Thanks ;)
    – user706401
    Commented Jun 4, 2021 at 11:22
0

Some other solutions can be found HERE:

FIND: <p class="BEBE"><em>.*[\x{4E00}-\x{9FFF}\x{FF00}-\x{FFEF}]</em></p>

OR

FIND: (?<=<p class="BEBE"><em>)[\x00-\x7F]+?(?=</em>)
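
For readers who want to try these two patterns outside Notepad++, here is a hedged sketch in Python's `re` module. Python does not accept the `\x{4E00}` notation, so the code points are written as `\u4E00` instead; that translation, the sample strings, and the variable names are assumptions of this example.

```python
import re

# First pattern: an <em> whose content ends with a CJK ideograph
# (U+4E00-U+9FFF) or a fullwidth/halfwidth form (U+FF00-U+FFEF).
ends_with_cjk = re.compile(
    r'<p class="BEBE"><em>.*[\u4E00-\u9FFF\uFF00-\uFFEF]</em></p>'
)

# Second pattern: ASCII-only text sitting directly between <em> and </em>.
# The lookbehind is fixed-width, which Python requires.
ascii_in_em = re.compile(r'(?<=<p class="BEBE"><em>)[\x00-\x7F]+?(?=</em>)')

chinese_em = '<p class="BEBE"><em>某些真正的经济学</em></p>'
english_em = '<p class="BEBE"><em>Love</em></p>'

cjk_hits = [bool(ends_with_cjk.search(s)) for s in (chinese_em, english_em)]
ascii_hits = [ascii_in_em.findall(s) for s in (chinese_em, english_em)]

print(cjk_hits)    # [True, False]
print(ascii_hits)  # [[], ['Love']]
```

Note that the second pattern only matches when the whole `<em>` content is ASCII, so it finds English-only emphasis rather than mixed text.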
