Neil

Sʨɠɠanography

Many Unicode composed characters have two forms, one being a precomposed character, while the other is an ASCII character with a combining diacritic. For instance, é has Unicode code point U+00E9, but é, which looks identical, is actually the ASCII character 0x65 followed by the combining diacritic U+0301.
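To make the two forms concrete, here is a small Python illustration using the standard `unicodedata` module. The variable names are only for this example.

```python
import unicodedata

precomposed = "\u00e9"   # é as the single code point U+00E9
decomposed = "e\u0301"   # ASCII e (0x65) followed by COMBINING ACUTE ACCENT

# The two strings render the same but are different code point sequences.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 1 2

# Normalisation converts one form into the other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```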

Your task is to write two programs or functions. The first will take a string of Unicode containing some composed characters and a string of printable ASCII. It will then output an identical-looking Unicode string which the second program can then decode to recover the printable ASCII.

By identical-looking, it must be the case that performing your choice of NFC or NFD normalisation on the input and output Unicode strings returns an identical string. If necessary, you may require that the input Unicode string be already NFC or NFD normalised (please specify).
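As a rough sketch, the identical-looking condition could be checked like this in Python. The helper name `looks_identical` is hypothetical and not part of the challenge.

```python
import unicodedata

def looks_identical(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after applying the chosen normalisation form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Same text, once with a precomposed é and once with e + U+0301.
print(looks_identical("caf\u00e9", "cafe\u0301"))          # True
print(looks_identical("caf\u00e9", "cafe\u0301", "NFD"))   # True
print(looks_identical("caf\u00e9", "cafe"))                # False
```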

If the input string does not contain sufficient composed characters for you to use, you should repeat it until its length is sufficient. You may also join the repeats with newlines should you so wish.

Your score is the sum of the byte lengths of your two programs divided by the number of composed characters you support in the input Unicode string. For instance, you may only wish to support those composed characters that expand to two characters under NFD. In this case your code's behaviour for other composed characters must be consistent (i.e. one of always unchanged, always NFC, always NFD). (Note that the Unicode string may also contain characters that do not change under NFC or NFD. You obviously can't use these to encode the ASCII string but they must still appear in the output.)
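For illustration only, here is an ungolfed Python sketch of one possible pair of programs. It is not a reference solution, and every name in it (`is_carrier`, `encode`, `decode`) is made up for this example. It rests on several assumptions: the input is already NFC normalised; only composed characters whose NFD form is exactly two code points and which recompose to themselves under NFC are supported; all other characters are always left unchanged; each supported character carries one bit (precomposed for 0, decomposed for 1); the message is written as 7 bits per ASCII character; and the input is repeated, joined with newlines, when it is too short. It has not been checked against every corner of Unicode.

```python
import unicodedata

def is_carrier(ch: str) -> bool:
    """True for a precomposed character this sketch can use to hold one bit."""
    nfd = unicodedata.normalize("NFD", ch)
    return len(nfd) == 2 and unicodedata.normalize("NFC", nfd) == ch

def encode(cover: str, message: str) -> str:
    """Hide printable ASCII `message` in NFC-normalised `cover`."""
    bits = [int(b) for ch in message for b in format(ord(ch), "07b")]
    capacity = sum(map(is_carrier, cover))
    if bits and capacity == 0:
        raise ValueError("cover text has no supported composed characters")
    # Repeat the cover text until it has enough carriers (ceiling division).
    copies = max(1, -(-len(bits) // capacity)) if capacity else 1
    text = "\n".join([cover] * copies)
    out, remaining = [], iter(bits)
    for ch in text:
        # A bit is consumed only for carriers; unused carriers stay as 0.
        if is_carrier(ch) and next(remaining, 0):
            out.append(unicodedata.normalize("NFD", ch))  # bit 1: decomposed
        else:
            out.append(ch)                                # bit 0 or non-carrier
    return "".join(out)

def decode(stego: str) -> str:
    """Recover the message hidden by `encode`."""
    bits, i = [], 0
    while i < len(stego):
        pair = stego[i:i + 2]
        composed = unicodedata.normalize("NFC", pair)
        if (len(composed) == 1 and is_carrier(composed)
                and unicodedata.normalize("NFD", composed) == pair):
            bits.append(1)          # a decomposed carrier
            i += 2
        else:
            if is_carrier(stego[i]):
                bits.append(0)      # a precomposed carrier
            i += 1
    chars = []
    for j in range(0, len(bits) - 6, 7):
        code = int("".join(map(str, bits[j:j + 7])), 2)
        if code == 0:               # unused carriers decode as zero: stop
            break
        chars.append(chr(code))
    return "".join(chars)

cover = "Caf\u00e9 \u00e0 la cr\u00e8me"   # three supported composed characters
stego = encode(cover, "Hi")
print(decode(stego))
```

Because printable ASCII never encodes to seven zero bits, the decoder in this sketch can treat an all-zero group as the end of the message, so no explicit length field is needed.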

If your language supports arbitrary-precision integers, you may also elect to provide a demonstration of your algorithm on integers rather than printable ASCII strings.
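One possible way to relate the two views, shown here as a hedged Python sketch, is to treat a printable ASCII string as a base-95 number. The function names and the leading sentinel digit are assumptions of this example, not requirements of the challenge. The sentinel keeps leading spaces from being lost.

```python
def ascii_to_int(message: str) -> int:
    """Map printable ASCII (0x20..0x7E) to an integer, base 95 with sentinel 1."""
    n = 1
    for ch in message:
        n = n * 95 + (ord(ch) - 32)
    return n

def int_to_ascii(n: int) -> str:
    """Inverse of ascii_to_int."""
    chars = []
    while n > 1:
        n, digit = divmod(n, 95)
        chars.append(chr(digit + 32))
    return "".join(reversed(chars))

print(ascii_to_int("A"))                    # 128
print(int_to_ascii(ascii_to_int(" Hi ")))   # round-trips, spaces included
```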
