
I have this Word document.

You could produce it manually, or download it from http://ge.tt/8zgZScd2 with Firefox (Chrome blocks that site for no good reason). Or you can download it from here.

enter image description here

You see four highlighted sets of characters.

The first one is Hebrew then English, in left-to-right (LTR) mode.

The second, third, and fourth are English then Hebrew, in right-to-left (RTL) mode.

I'm using Ctrl with Left Shift to switch to LTR mode, and Ctrl with Right Shift to switch to RTL mode.

The Hebrew letters are e.g. Unicode \u05D0 (א), but any will do.

Here is the find and replace

enter image description here

In the 'Find' section, I have set it to find highlighted text. (That option is available by clicking Format > Highlight at the bottom left of that dialog box.)

In the 'Replace' section I wrote XXX^&XXX. In MS Word, ^& is the equivalent of what most would call \0, i.e. the text that was found. So my find and replace should find the highlighted text and keep it, but put an XXX before and after it.
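
To make the ^& behaviour concrete, here is a minimal Python sketch of the same "keep the match, wrap it" replacement. This is not Word itself: the function name wrap_matches is hypothetical, and a regular expression stands in for "highlighted text", which a regex cannot detect.

```python
import re


def wrap_matches(text, pattern, prefix="XXX", suffix="XXX"):
    """Wrap every match of `pattern` in `text` with prefix and suffix.

    m.group(0) is the whole match, the counterpart of Word's ^& (or \\0).
    """
    return re.sub(pattern, lambda m: prefix + m.group(0) + suffix, text)


# The stored (logical) order is always prefix, match, suffix.
wrapped = wrap_matches("see abcd here", "abcd")
```

Note that the replacement only fixes the order of characters as stored. How a mixed Hebrew/English result is displayed is decided later by the bidirectional layout.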

Here is the result of that find/replace

enter image description here

As you can see, the first one worked. That's the one with Hebrew then English in LTR mode.

The second, third, and fourth failed. Those are the ones with English then Hebrew in RTL mode.

I would like to get the find/replace to work for the last three, i.e. the second, third, and fourth, which have English then Hebrew in RTL mode.

added

At first it looked a bit like it was wrapping the XXX around the Hebrew, and it wasn't clear why. That isn't what is actually happening, though.

Scott suggests using a replacement string of FOO^&BAR for troubleshooting, and it does clarify what's happening. If you have אאאabc in LTR mode and you replace it with FOO^&BAR, MS Word shows FOOאאאabcBAR, which is all fine. If you put that in RTL mode, it doesn't make FOOabcאאאBAR or BARabcאאאFOO. It makes abcBARאאאFOO (i.e. it mixed the FOO and BAR into it). What it is doing, and I guess one can't fault MS Word for this, is treating abcBAR as one chunk.

Similarly (and more clearly), if you have abcאאא in RTL and you replace it with FOO^&BAR, then you get abcBARאאאFOO, because the end is where the abc is, and the end of the abc is after the 'c', so it sticks the BAR there.

I will consider what to do about this, but that's what's happening. Maybe there's some kind of null Hebrew character or right-to-left character that I can put after the 'c' that will 'fix' this so I can wrap FOO..BAR around it.

  • You may be able to debug this better if you try a more intelligent replacement string, like FOO^&BAR.  I guess that you will get “RABאאאFOOabcd”, but I don’t know; it might be “BARאאאFOOabcd”.  Either way, it’s sort-of doing what you asked: putting “FOO” before the “abcd” and putting “BAR” after the “אאא”.  Seeing which one it does might be educational.  P.S. What are you expecting to get?  Something that looks like “FOOאאאabcdBAR”?  What character sequence are you hoping to get that will look like that?
    Commented Aug 22, 2016 at 6:08
  • you say (among your possibilities for a replacement string of FOO^&BAR ) that it might be "BARאאאFOOabcd" It is. But I don't see how that helps in troubleshooting - if anything, that complicates it a little bit. I don't mind what string it puts on the left and what string it puts on the right. And as you see, It doesn't change the order of letters, so FOO remains FOO, אבג would remain אבג and BAR remains BAR, and I wouldn't want it to change the order of the letters.
    – barlop
    Commented Aug 22, 2016 at 8:55
  • @Scott you make an interesting point, re troubleshooting.. it has clarified what's happening, i've added an edit re that.
    – barlop
    Commented Aug 22, 2016 at 10:29
  • may be related- stackoverflow.com/questions/9613613/…
    – barlop
    Commented Aug 22, 2016 at 14:52
  • could save as xml and figure something out re that, including understanding the structure and using an xml parser
    – barlop
    Commented Aug 23, 2016 at 23:21

1 Answer

The behavior described above is correct by design and consistent with the implementation of bi-directional text support.

First, a solution to your requirement as I understand it.

If you want to wrap those RTL examples with some prefix and suffix and have them visually appear to the right and to the left of your original highlighted text, you can place an RLM control character after your prefix and it will behave the way you want.

You can add an RLM by clicking the "Special" button in the find/replace dialog and choosing RTL Mark, or you can just type ^r manually. I'm going to use the texts PRE and POST (instead of the XXX in your examples):

enter image description here

If you use this "Replace with" text with one of your RTL examples:

enter image description here

Then you get this result, which I think is the result you were seeking:

enter image description here

So what's happening?

In your RTL examples you have a text that consists of two parts, or two "directional runs". The first is an LTR run (the "abcd" part) and the second is an RTL run (the Hebrew "אאא" part), all within a paragraph that has an RTL base direction.

When you add an LTR prefix (the first "XXX" in your replace example) to the LTR run, you're just making that run a bit longer, like adding a few more letters to the first word. Since LTR runs are drawn from left to right, those new characters appear where they should. If instead of "abcd" you used the word "stand" and then added the prefix "UNDER" the resulting word would have been "UNDERstand" (not "standUNDER").
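
The idea of directional runs can be sketched in a few lines of Python using the standard unicodedata module. This is only an illustration of how characters group by direction, not how Word is implemented, and the function name directional_runs is made up for this example.

```python
import unicodedata
from itertools import groupby

ALEF = "\u05d0"  # HEBREW LETTER ALEF, bidi class "R"


def directional_runs(text):
    """Group consecutive characters by Unicode bidi class, in stored order."""
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]


original = "abcd" + ALEF * 3
before = directional_runs(original)                 # an L run, then an R run
after = directional_runs("XXX" + original + "XXX")  # the prefix joins the L run
```

After the replacement there are still only three runs: the leading "XXX" has merged into the "abcd" run, while the trailing "XXX" forms a separate LTR run after the Hebrew.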

To help deal with situations like this and manually solve some ambiguities, Unicode provides "control characters" which are invisible markers, each with its own role or effect.

The Right-to-Left Mark (RLM, U+200F) behaves like an RTL character (imagine a zero-width letter "א"). If we place that character right after our prefix, we effectively break the LTR run I described above with a Hebrew character. Now the text renders visually as you require, with the prefix appearing first, at the very right edge, then our invisible Hebrew character, and then the original "abcd".
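
The effect of the RLM can be checked with a deliberately simplified model of the reordering step in an RTL paragraph. This is a sketch under stated assumptions, not the full Unicode Bidirectional Algorithm and not Word's code: it only accepts strong L and R characters (no digits, spaces, or punctuation), and the name visual_order_rtl is hypothetical.

```python
import unicodedata
from itertools import groupby

ALEF = "\u05d0"  # HEBREW LETTER ALEF
RLM = "\u200f"   # RIGHT-TO-LEFT MARK, invisible, bidi class "R"


def visual_order_rtl(logical):
    """Return the characters in left-to-right screen order for an RTL paragraph.

    Simplified model: runs are laid out right to left, LTR runs keep their
    internal order, and RTL runs are mirrored.
    """
    runs = []
    for bidi_class, chars in groupby(logical, key=unicodedata.bidirectional):
        if bidi_class not in ("L", "R"):
            raise ValueError("only strong L and R characters are supported")
        run = "".join(chars)
        runs.append(run if bidi_class == "L" else run[::-1])
    return "".join(reversed(runs))


found = "abcd" + ALEF * 3
without_rlm = visual_order_rtl("FOO" + found + "BAR")
with_rlm = visual_order_rtl("PRE" + RLM + found + "POST")
```

Without the RLM, the screen order from the left is BAR, the Hebrew, then FOOabcd, which is the mixed result reported in the question. With the RLM after the prefix, it is POST, the Hebrew, abcd, the invisible mark, then PRE, so the prefix sits at the far right and the suffix at the far left. The returned strings are meant to be inspected code point by code point; printing them in a bidi-aware terminal would reorder them again.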

  • I notice a few things. A) When copy/pasting 200E or 200F from charmap, MS Word inserts a new line, so it's easier to experiment with some things in Notepad than in Word. B) My experiments to figure this out are simplified by the knowledge that when the base direction is LTR, then when writing an LTR language, no marker can make a difference, and when writing an RTL language in LTR mode, only an LTR marker can make a difference. So the LTR marker is the only marker that can make a difference in LTR mode, and it only applies to RTL languages.
    – barlop
    Commented Dec 1, 2016 at 22:37
  • Similarly, when in RTL mode, the only marker that makes a difference is the LTR marker, and only when writing an RTL language. Those rules (the ones following "B") make experiments much simpler. I also found that to figure this out it helps (in Notepad) to type the find string, and then in the replace section to type the replace string without even a ^& or ^r or ^s, so manually typing it and copy/pasting the RTL or LTR marker. And you can do Ctrl-Right Shift or Ctrl-Left Shift in Notepad's find-replace. With that figured out, I'm able to understand what is going on in Word, or at least to do it in Word.
    – barlop
    Commented Dec 1, 2016 at 22:39
  • Would you agree with what I wrote, particularly about when the RTL and LTR markers apply (have an effect) and don't apply (don't have an effect)?
    – barlop
    Commented Dec 1, 2016 at 22:40
