2
$\begingroup$

The documentation includes an example of importing a VCF address book file, which works fine:

Import[ "ExampleData/wolfram.vcf" ]

{{FormattedName->Wolfram Research, Inc.,Organization->Wolfram Research, Inc.,Email->[email protected],Phone->217-398-0700,Fax->217-398-0747,Address1->100 Trade Center Drive,City->Champaign,State->IL,ZIPCode->61820,Country->USA}}

But in my case:

Import["F:\\mathematica\\send_contact.vcf"]

{{NameLast->=E6=B5=8B=E8=AF=95,FormattedName->=E6=B5=8B=E8=AF=95,Phone->12345 678 9}}

What is the =E6=B5=8B=E8=AF=95 in the output? Is this a bug, or am I using Import incorrectly?

You can get my .vcf file from this link.


The .vcf file is a test file exported from my cellphone. If you import it into your cell phone you will get an entry like the one in this picture, whereas Import gives the following:

{{NameLast -> =试, FormattedName -> ="I don't know this item", Phone -> 12345 678 9}}

Since @bill s mentioned that it could be a missing font issue, I made another test vcf file with only characters from the English alphabet. The output is normal this time.

{{"NameLast" -> "test name", "FormattedName" -> "test name","Phone" -> "12345 678 9"}}

So is the problem caused by the VCF file not being compatible with Chinese characters? How can we interpret =E6=B5=8B=E8=AF=95 to obtain the original Chinese characters?

$\endgroup$
5
  • $\begingroup$ It may be a text encoding issue. VCF files are plain text: do you know how yours is encoded? I'm on a tablet, so I can't check your file myself. Unfortunately, I couldn't find any reference to how one could specify an encoding when importing either. $\endgroup$
    – MarcoB
    Commented Apr 1, 2016 at 12:51
  • $\begingroup$ @george2079 But if you search for "VCF address book" (OP's description) then it's unambiguous. $\endgroup$
    – C. E.
    Commented Apr 1, 2016 at 13:00
  • $\begingroup$ @george2079 It was exported by my cellphone. $\endgroup$
    – yode
    Commented Apr 1, 2016 at 13:42
  • $\begingroup$ @MarcoB Sorry, I actually don't know. $\endgroup$
    – yode
    Commented Apr 1, 2016 at 13:48
  • $\begingroup$ @MarcoB Thanks for your edit. $\endgroup$
    – yode
    Commented Apr 1, 2016 at 17:04

3 Answers

3
$\begingroup$

Following @george2079's suggestion, I am posting a solution from a friend as an answer, though I'm sure there are better ways to do this. I am accepting my own answer only for the convenience of readers. If anyone posts a better solution, I'll change the acceptance.

$Version

"10.3.1 for Microsoft Windows (64-bit) (December 21, 2015)"

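(* Take the first contact in the file; Import returns one list of rules per contact *)
string = First@Import["file address"];
(* Turn quoted-printable "=XX" into percent-encoded "%XX", then decode it as UTF-8 *)
Rule @@@ Transpose@{Keys[string], 
   URLDecode[StringReplace[Values[string], "=" -> "%"], 
    CharacterEncoding -> "UTF-8"]}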

{NameLast->测试,FormattedName->测试,Phone->12345 678 9}
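The same result can probably be reached without the detour through percent-encoding, by turning each =XX into its byte value and decoding the bytes as UTF-8. The following is only a sketch: qpDecode is a hypothetical helper name, the keys are assumed to be strings, and it assumes the value contains no quoted-printable soft line breaks and that it is applied only to fields the file marks as QUOTED-PRINTABLE.

```mathematica
(* Hypothetical helper: decode "=XX" escapes to bytes, keep other characters, read all bytes as UTF-8 *)
qpDecode[s_String] := FromCharacterCode[
  Flatten@StringCases[s, {
     (* an escaped byte such as "=E6" becomes the integer 230 *)
     "=" ~~ (h : Repeated[HexadecimalCharacter, {2}]) :> FromDigits[h, 16],
     (* any other character is kept, as its own UTF-8 bytes *)
     c_ :> ToCharacterCode[c, "UTF-8"]}],
  "UTF-8"]

(* Sample data copied from the question; string keys are an assumption *)
contact = {"NameLast" -> "=E6=B5=8B=E8=AF=95",
   "FormattedName" -> "=E6=B5=8B=E8=AF=95", "Phone" -> "12345 678 9"};

(* Decode every string value and leave the keys alone *)
contact /. (k_ -> v_String) :> (k -> qpDecode[v])
(* {"NameLast" -> "测试", "FormattedName" -> "测试", "Phone" -> "12345 678 9"} *)
```

Unlike the URLDecode version, this does not rely on the CharacterEncoding option, which one commenter reported as raising a warning.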

$\endgroup$
1
  • $\begingroup$ Very nice! Thanks for posting this as an answer. (+1) $\endgroup$
    – MarcoB
    Commented Apr 1, 2016 at 16:55
1
$\begingroup$

Here is the plain text of the VCF file from your link:

BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95;;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95
TEL;HOME:12345 678 9
END:VCARD

Given this, Mathematica's answer is not surprising. Perhaps the odd characters are representatives of a font that is not installed on your computer?
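The raw text also states how the values are encoded. As a rough sketch, the declared parameters can be pulled out with StringCases. The vcard string below is the text above typed in by hand, with line breaks assumed to fall between properties; for a real file one would read it as plain text first, for example with Import[file, "Text"].

```mathematica
(* The vCard text from the linked file; the line breaks are an assumption *)
vcard = "BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95;;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95
TEL;HOME:12345 678 9
END:VCARD";

(* Collect the declared transfer encoding and character set of each property *)
StringCases[vcard, "ENCODING=" ~~ (enc : (WordCharacter | "-") ..) :> enc]
(* {"QUOTED-PRINTABLE", "QUOTED-PRINTABLE"} *)

StringCases[vcard, "CHARSET=" ~~ (cs : (WordCharacter | "-") ..) :> cs]
(* {"UTF-8", "UTF-8"} *)
```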

$\endgroup$
6
  • $\begingroup$ I don't think the problem is rooted in the "odd characters". The characters I entered are very common. $\endgroup$
    – yode
    Commented Apr 1, 2016 at 13:50
  • $\begingroup$ What I intended to suggest is that a coding like "=E6=B5=8B=E8=AF=95" might be a representation from a font that is not being displayed properly. $\endgroup$
    – bill s
    Commented Apr 1, 2016 at 13:53
  • $\begingroup$ Bill, @yode, this is Quoted-Printable encoding, as indicated by the ENCODING tag in the VCF. It is called "PrintableASCII" in Mathematica. It is a way of encoding 8-bit characters for transmission over a 7-bit channel (e.g. the Internet). Yode, you will need some post-processing of your Chinese characters. See e.g. mathematica.stackexchange.com/q/25867/27951. $\endgroup$
    – MarcoB
    Commented Apr 1, 2016 at 14:21
  • 1
    $\begingroup$ URLDecode[StringReplace["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B","="->"%"],CharacterEncoding->"UTF-8"] works well. Thanks, all of you. @george2079 @MarcoB @bill s $\endgroup$
    – yode
    Commented Apr 1, 2016 at 15:57
  • 1
    $\begingroup$ You should make that an answer. (The CharacterEncoding option throws a warning for me, by the way, but it works. Possibly a version issue.) $\endgroup$
    – george2079
    Commented Apr 1, 2016 at 16:01
0
$\begingroup$

Out of curiosity I worked out the encoding, at least partly. Each character is a triplet of bytes: it takes the last 4 bits of the first byte and the last 6 bits of each of the remaining two, so we can decode directly like this:

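(* Decode a string made only of quoted-printable 3-byte groups such as "=E6=B5=8B" *)
cdecode[s_String] :=
 FromCharacterCode@FromDigits[
     (* keep the low 4 bits of the first byte and the low 6 bits of the other two *)
     Join @@ MapThread[IntegerDigits[FromDigits[#1, 16], 2][[#2 ;;]] &,
       {StringTake[#, Array[{3 # - 1, 3 #} &, 3]], {-4, -6, -6}}],
      2] & /@
  (* split the input into 9-character chunks, one per encoded character *)
  StringTake[s, Array[{9 # - 8, 9 #} &, Floor[StringLength@s/9]]] // StringJoin
cdecode["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B"]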

测试一下 (the same string that URLDecode returns)

No doubt URLDecode is the more robust way to go. Note that 8 bits of each triplet have been ignored here. Presumably the 'E' signifies the start of a 3-byte code; that is ignored here and should be checked.
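The layout described here matches UTF-8, where a lead byte of the form 1110xxxx (hex E0 to EF) announces a 3-byte sequence and each continuation byte has the form 10xxxxxx. A minimal sketch of the missing check might look like the following; utf8SequenceLength is a hypothetical helper name, not a built-in function.

```mathematica
(* Hypothetical helper: length of the UTF-8 sequence announced by lead byte b *)
utf8SequenceLength[b_Integer] := Which[
  BitAnd[b, 16^^80] == 0, 1,         (* 0xxxxxxx: plain ASCII *)
  BitAnd[b, 16^^E0] == 16^^C0, 2,    (* 110xxxxx *)
  BitAnd[b, 16^^F0] == 16^^E0, 3,    (* 1110xxxx: the 'E' seen in =E6, =E8, =E4 *)
  BitAnd[b, 16^^F8] == 16^^F0, 4,    (* 11110xxx *)
  True, Missing["NotALeadByte"]      (* 10xxxxxx continuation byte, or invalid *)
  ]

utf8SequenceLength /@ {16^^E6, 16^^41, 16^^B5}
(* {3, 1, Missing["NotALeadByte"]} *)

(* The built-in decoder handles all sequence lengths once the bytes are known *)
FromCharacterCode[{16^^E6, 16^^B5, 16^^8B}, "UTF-8"]
(* "测" *)
```

A decoder that reads the lead byte first, rather than assuming 9-character chunks, would also cope with ASCII or 2-byte characters mixed into the same field.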

$\endgroup$
