I am trying to hard-code subtitles into a video, by re-encoding the video with the subtitles rendered as graphics (and NOT as an additional stream).

The subtitles are in Hebrew / Arabic.

(All texts are UTF-8 encoded. This is NOT a text-encoding / character-set problem.)

I'm open to other command-line tools, or programmatic solutions.

The problem:

I want to be able to set the sentence direction for RTL subtitles. ffmpeg doesn't seem to take directionality into account, so the resulting text is broken. It is written in visual order, not in logical order:

  1. punctuation is placed on the wrong side of the sentence,
  2. sentence parts are reordered: e.g. <hebrew-text-A> <LATIN-TEXT-B> <heb-text-C> is written as: <heb-text-C> <LATIN-TEXT-B> <hebrew-text-A>

The ffmpeg command:

ffmpeg -i "nosubs.avi" -vf "subtitles='subs.srt'" withsubs.mp4

What I tried:

  • I have tried the following subtitle formats: srt, ass, ssa, vtt, ttml
  • I looked for RTL-related attributes or tags in the subtitle formats; so far I have found none. The TTML manuals do mention complex directionality-related features, but I couldn't find examples, or where/how to implement them
  • I have tried to set the language in the subtitles, where the format allows it (ttml)

Partial success:

  • I manually added a Unicode control character, U+200F (RIGHT-TO-LEFT MARK), at the end of sentences/clauses. This fixes the end-of-sentence issue (#1), but not the broken sentence (#2).

Another problem is that I have to find a programmatic way to insert these control characters, which might take me days to do.
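A minimal sketch of how that insertion could be automated, assuming Python is acceptable. The helper names `has_rtl` and `append_rlm` are made up for illustration. It only appends U+200F to lines that contain Hebrew/Arabic letters, so at best it addresses issue #1, not #2:

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def has_rtl(text):
    """Return True if any character has a strong right-to-left bidi class."""
    # "R" covers Hebrew letters, "AL" covers Arabic letters.
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def append_rlm(line):
    """Append RLM to a line that contains RTL letters (hypothetical helper)."""
    if has_rtl(line) and not line.endswith(RLM):
        return line + RLM
    return line
```

Lines without any Hebrew/Arabic letters (cue numbers, timing lines, pure Latin text) pass through unchanged.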

Possible solution directions

  • another Unicode control character that controls whole sentences (???)
  • a hidden ffmpeg switch to control language / directionality (an RTL setting)
  • a specific subtitle-format setting that is honored by ffmpeg.

Examples:

  1. this video shows 1) broken sentence flow due to mixed Hebrew/Latin words, 2) some sentences that begin with punctuation (which is wrong), 3) other sentences that seem correct because I added control characters at the end of the sentence.
  2. sample subtitles file used to create the above video

Irrelevant SO questions:

  • set-a-subtitle-language-using-ffmpeg - refers to subtitles as stream
  • how-to-fix-ffmpeg-mencoder-pushing-persian-rtl-subtitles-reversed - relevant but rejected from SO, no answers given
  • how-to-properly-convert-rtl-subtitles-in-ffmpeg - talks about charset, not directionality

1 Answer

Adding RTL / LTR control characters around the RTL sentences solves the problem.

Details and a Python script that processes a ".SRT" file are here.
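The linked script is not reproduced here. As a rough sketch of the idea, assuming the control characters are U+202B (RIGHT-TO-LEFT EMBEDDING) before and U+202C (POP DIRECTIONAL FORMATTING) after each subtitle text line. The function name `wrap_rtl` is hypothetical, and whether the renderer honors these characters is an assumption to verify on your own files:

```python
import re

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING (assumed choice of control character)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# An SRT timing line looks like: 00:00:01,000 --> 00:00:02,000
TIMING = re.compile(r"^\d+:\d{2}:\d{2}[,.]\d{3}\s*-->")


def wrap_rtl(srt_text):
    """Wrap every subtitle text line in RLE ... PDF; leave other lines as-is."""
    out = []
    in_text = False  # True while we are between a timing line and a blank line
    for line in srt_text.splitlines():
        if not line.strip():
            in_text = False          # a blank line ends the current cue
            out.append(line)
        elif TIMING.match(line):
            in_text = True           # text lines follow the timing line
            out.append(line)
        elif in_text:
            out.append(RLE + line + PDF)
        else:
            out.append(line)         # cue index line
    return "\n".join(out) + "\n"
```

To use it, read the .srt file as UTF-8, pass its contents through `wrap_rtl`, write the result to a new file, and give that file to the `subtitles` filter in the ffmpeg command from the question.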
