Continuing from "What's the best way to view changes that emerged due to upgrades of TeX Live?", let's assume we have two files, old.pdf and new.pdf, obtained from the same LaTeX sources (a two-column paper with small fonts, some maths, and a few figures) but with slightly different installations of TeX Live 2020. Here's some data on the old file:

$ pdfinfo old.pdf
Title:           …
Subject:         
Keywords:        
Author:          …
Creator:         LaTeX with hyperref
Producer:        pdfTeX-1.40.21
CreationDate:    Sat Jan  1 03:21:23 2022 CET
ModDate:         Sat Jan  1 03:21:23 2022 CET
Custom Metadata: yes
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           41
Encrypted:       no
Page size:       595.276 x 841.89 pts (A4)
Page rot:        0
File size:       873455 bytes
Optimized:       no
PDF version:     1.5

Judging by the file date, the much earlier pdfTeX version, and some insider knowledge, old.pdf was generated on January 1, 2022 by a stock Debian or Ubuntu TeX Live distribution (which always lags behind the version from TUG). We no longer have these distributions and cannot easily regenerate old.pdf. (One option would be to install older Debian and/or Ubuntu versions somewhere, which would bring in a completely new set of issues, and even then we wouldn't know whether we'd get the same TeX Live installation and output as on 2022-01-01. Is there anywhere to grab the latest Debian/Ubuntu live image that either already has stock TeX Live packages with pdfTeX-1.40.21 or allows installing them?)

The file new.pdf has just been produced from the same old sources using TeX Live 2020 final (which, following David's comment, is the last TeX Live version holding pdfTeX-1.40.21 -- please correct us if we're wrong here):

$ pdfinfo new.pdf
Title:           …
Subject:         
Keywords:        
Author:          …
Creator:         LaTeX with hyperref
Producer:        pdfTeX-1.40.21
CreationDate:    Sun Dec 10 19:54:05 2023 CET
ModDate:         Sun Dec 10 19:54:05 2023 CET
Custom Metadata: yes
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           41
Encrypted:       no
Page size:       595.276 x 841.89 pts (A4)
Page rot:        0
File size:       1104321 bytes
Optimized:       no
PDF version:     1.5
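
To compare the two listings mechanically rather than by eye, one could filter out the fields that are expected to differ and diff the rest. A minimal sketch; the helper name pdfinfo_stable is hypothetical, and the list of ignored fields is an assumption based on the two listings shown here:

```shell
# Drop the pdfinfo fields that legitimately differ between two builds
# (timestamps and file size); every other line is passed through unchanged.
# Reads pdfinfo output on stdin.
pdfinfo_stable() {
  grep -Ev '^(CreationDate|ModDate|File size):'
}

# Possible usage (bash, needs pdfinfo):
#   diff <(pdfinfo old.pdf | pdfinfo_stable) <(pdfinfo new.pdf | pdfinfo_stable)
```

An empty diff would then mean that the remaining metadata fields agree.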

There are no differences between these two pieces of metadata except the dates and the file sizes. There are, however, differences in the fonts:

$ diff <(pdffonts old.pdf) <(pdffonts new.pdf)
3,60c3,59
< [none]                               Type 3            Custom           yes no  no      22  0
< [none]                               Type 3            Custom           yes no  no      23  0
< [none]                               Type 3            Custom           yes no  no      24  0
< [none]                               Type 3            Custom           yes no  no      25  0
< [none]                               Type 3            Custom           yes no  no      26  0
< [none]                               Type 3            Custom           yes no  no      27  0
< [none]                               Type 3            Custom           yes no  no      30  0
< [none]                               Type 3            Custom           yes no  no      31  0
< [none]                               Type 3            Custom           yes no  no      32  0
< [none]                               Type 3            Custom           yes no  no      33  0
< [none]                               Type 3            Custom           yes no  no      36  0
< HHFZSL+CMR10                         Type 1            Builtin          yes yes no      37  0
< AUZFDF+CMMI10                        Type 1            Builtin          yes yes no      70  0
< CHXOKT+CMSY10                        Type 1            Builtin          yes yes no      71  0
< EJRJQW+CMR7                          Type 1            Builtin          yes yes no      72  0
< [none]                               Type 3            Custom           yes no  no      74  0
< [none]                               Type 3            Custom           yes no  no      75  0
< [none]                               Type 3            Custom           yes no  no      76  0
< [none]                               Type 3            Custom           yes no  no      78  0
< [none]                               Type 3            Custom           yes no  no     113  0
< RQSHZS+CMTT10                        Type 1            Builtin          yes yes no     230  0
< UFOCKA+BBOLD10                       Type 1            Builtin          yes yes no     233  0
< WRFUCI+CMTI10                        Type 1            Builtin          yes yes no     235  0
< SCGGLO+CMMI7                         Type 1            Builtin          yes yes no     236  0
< [none]                               Type 3            Custom           yes no  no     247  0
< CYIPZU+MSAM10                        Type 1            Builtin          yes yes no     266  0
< NQPTHU+CMEX10                        Type 1            Builtin          yes yes no     268  0
< BLWZBZ+CMSY7                         Type 1            Builtin          yes yes no     269  0
< OYQULM+EUFM10                        Type 1            Builtin          yes yes no     270  0
< HCIFCV+CMR5                          Type 1            Builtin          yes yes no     271  0
< NLGAUZ+CMSS10                        Type 1            Builtin          yes yes no     272  0
< LUAFXG+rsfs10                        Type 1            Builtin          yes yes no     273  0
< HQDOIX+CMMI12                        Type 1            Builtin          yes yes no     283  0
< ZLOCQH+CMR8                          Type 1            Builtin          yes yes no     284  0
< LSMQVG+MSBM10                        Type 1            Builtin          yes yes no     305  0
< SFQIJW+CMSS9                         Type 1            Builtin          yes yes no     307  0
< ECGVGR+CMR9                          Type 1            Builtin          yes yes no     308  0
< BHDBNY+CMMI9                         Type 1            Builtin          yes yes no     309  0
< JIXOQS+CMSY9                         Type 1            Builtin          yes yes no     312  0
< [none]                               Type 3            Custom           yes no  no     332  0
< AUZEEC+TeX-mathb10                   Type 1            Builtin          yes yes no     377  0
< GZMRBQ+CMSS8                         Type 1            Builtin          yes yes no     410  0
< BNEURD+stmary10                      Type 1            Builtin          yes yes no     416  0
< LFVJTS+CMMI8                         Type 1            Builtin          yes yes no     425  0
< KKQPKP+CMMI5                         Type 1            Builtin          yes yes no     448  0
< MBBTWW+TeX-mathb7                    Type 1            Builtin          yes yes no     449  0
< GKIQMS+CMSY5                         Type 1            Builtin          yes yes no     450  0
< BHHCMF+CMSY8                         Type 1            Builtin          yes yes no     455  0
< ORHOJI+CMR6                          Type 1            Builtin          yes yes no     456  0
< [none]                               Type 3            Custom           yes no  no     617  0
< IPQGLW+BBOLD7                        Type 1            Builtin          yes yes no     658  0
< WWCTIW+CMEX7                         Type 1            Builtin          yes yes no     678  0
< [none]                               Type 3            Custom           yes no  no     745  0
< [none]                               Type 3            Custom           yes no  no    1013  0
< RRPOFI+MSBM7                         Type 1            Builtin          yes yes no    1078  0
< RKHRIA+CMEX8                         Type 1            Builtin          yes yes no    1086  0
< [none]                               Type 3            Custom           yes no  no    1135  0
< [none]                               Type 3            Custom           yes no  no    1136  0
---
> KYESCI+SFRM1728                      Type 1            Custom           yes yes no      22  0
> YHWNWE+SFCC1200                      Type 1            Custom           yes yes no      23  0
> UFXXMG+SFCC0800                      Type 1            Custom           yes yes no      24  0
> TQJZEA+SFTI0700                      Type 1            Custom           yes yes no      25  0
> OPFKQH+SFTI0900                      Type 1            Custom           yes yes no      26  0
> WXDDIU+SFBX0900                      Type 1            Custom           yes yes no      27  0
> KAKEVM+SFBX1000                      Type 1            Custom           yes yes no      30  0
> NVPASK+SFCC1000                      Type 1            Custom           yes yes no      31  0
> GXRPYC+SFRM1000                      Type 1            Custom           yes yes no      32  0
> URWOJO+SFTI1000                      Type 1            Custom           yes yes no      35  0
> HHFZSL+CMR10                         Type 1            Builtin          yes yes no      36  0
> AUZFDF+CMMI10                        Type 1            Builtin          yes yes no      69  0
> CHXOKT+CMSY10                        Type 1            Builtin          yes yes no      70  0
> EJRJQW+CMR7                          Type 1            Builtin          yes yes no      71  0
> CHUDXO+SFRM0700                      Type 1            Custom           yes yes no      73  0
> GXRPYC+SFRM1000                      Type 1            Custom           yes yes no      74  0
> ADNPKL+SFRM0600                      Type 1            Custom           yes yes no      75  0
> LMFIFZ+SFRM0800                      Type 1            Custom           yes yes no      77  0
> ZYIGPF+SFTT1000                      Type 1            Custom           yes yes no     112  0
> RQSHZS+CMTT10                        Type 1            Builtin          yes yes no     229  0
> UFOCKA+BBOLD10                       Type 1            Builtin          yes yes no     232  0
> WRFUCI+CMTI10                        Type 1            Builtin          yes yes no     234  0
> SCGGLO+CMMI7                         Type 1            Builtin          yes yes no     235  0
> RIDLKK+SFRM0900                      Type 1            Custom           yes yes no     246  0
> CYIPZU+MSAM10                        Type 1            Builtin          yes yes no     265  0
> NQPTHU+CMEX10                        Type 1            Builtin          yes yes no     267  0
> BLWZBZ+CMSY7                         Type 1            Builtin          yes yes no     268  0
> OYQULM+EUFM10                        Type 1            Builtin          yes yes no     269  0
> HCIFCV+CMR5                          Type 1            Builtin          yes yes no     270  0
> NLGAUZ+CMSS10                        Type 1            Builtin          yes yes no     271  0
> LUAFXG+rsfs10                        Type 1            Builtin          yes yes no     272  0
> HQDOIX+CMMI12                        Type 1            Builtin          yes yes no     282  0
> ZLOCQH+CMR8                          Type 1            Builtin          yes yes no     283  0
> LSMQVG+MSBM10                        Type 1            Builtin          yes yes no     304  0
> SFQIJW+CMSS9                         Type 1            Builtin          yes yes no     306  0
> ECGVGR+CMR9                          Type 1            Builtin          yes yes no     307  0
> BHDBNY+CMMI9                         Type 1            Builtin          yes yes no     308  0
> JIXOQS+CMSY9                         Type 1            Builtin          yes yes no     311  0
> ECVLCM+SFIT0900                      Type 1            Custom           yes yes no     331  0
> AUZEEC+TeX-mathb10                   Type 1            Builtin          yes yes no     376  0
> GZMRBQ+CMSS8                         Type 1            Builtin          yes yes no     409  0
> BNEURD+stmary10                      Type 1            Builtin          yes yes no     415  0
> LFVJTS+CMMI8                         Type 1            Builtin          yes yes no     424  0
> KKQPKP+CMMI5                         Type 1            Builtin          yes yes no     447  0
> MBBTWW+TeX-mathb7                    Type 1            Builtin          yes yes no     448  0
> GKIQMS+CMSY5                         Type 1            Builtin          yes yes no     449  0
> BHHCMF+CMSY8                         Type 1            Builtin          yes yes no     454  0
> ORHOJI+CMR6                          Type 1            Builtin          yes yes no     455  0
> QRAQFO+SFTI0800                      Type 1            Custom           yes yes no     616  0
> IPQGLW+BBOLD7                        Type 1            Builtin          yes yes no     657  0
> WWCTIW+CMEX7                         Type 1            Builtin          yes yes no     677  0
> JPVWTM+SFIT1000                      Type 1            Custom           yes yes no     744  0
> WEZJQK+SFTT0900                      Type 1            Custom           yes yes no    1012  0
> RRPOFI+MSBM7                         Type 1            Builtin          yes yes no    1077  0
> RKHRIA+CMEX8                         Type 1            Builtin          yes yes no    1085  0
> FYOJQA+SFTT0800                      Type 1            Custom           yes yes no    1134  0
> TBNRHQ+SFSS0800                      Type 1            Custom           yes yes no    1135  0

Let's make the diff a bit smaller:

$ diff <(pdffonts old.pdf | sed '1,2d' | cut -c-79 | cut -d '+' -f 2- | sort) <(pdffonts new.pdf | sed '1,2d'| cut -c-79 | cut -d '+' -f 2- | sort)
32,54d31
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
< [none]                               Type 3            Custom           yes no 
55a33,54
> SFBX0900                      Type 1            Custom           yes yes
> SFBX1000                      Type 1            Custom           yes yes
> SFCC0800                      Type 1            Custom           yes yes
> SFCC1000                      Type 1            Custom           yes yes
> SFCC1200                      Type 1            Custom           yes yes
> SFIT0900                      Type 1            Custom           yes yes
> SFIT1000                      Type 1            Custom           yes yes
> SFRM0600                      Type 1            Custom           yes yes
> SFRM0700                      Type 1            Custom           yes yes
> SFRM0800                      Type 1            Custom           yes yes
> SFRM0900                      Type 1            Custom           yes yes
> SFRM1000                      Type 1            Custom           yes yes
> SFRM1000                      Type 1            Custom           yes yes
> SFRM1728                      Type 1            Custom           yes yes
> SFSS0800                      Type 1            Custom           yes yes
> SFTI0700                      Type 1            Custom           yes yes
> SFTI0800                      Type 1            Custom           yes yes
> SFTI0900                      Type 1            Custom           yes yes
> SFTI1000                      Type 1            Custom           yes yes
> SFTT0800                      Type 1            Custom           yes yes
> SFTT0900                      Type 1            Custom           yes yes
> SFTT1000                      Type 1            Custom           yes yes

The old file uses a mixture of Type 1 and Type 3 fonts, whereas the new file uses Type 1 fonts only.
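
The same observation can be condensed into a per-type count. This is only a sketch: font_type_counts is a hypothetical helper, and it assumes the fixed-width pdffonts layout seen in the listings (two header lines, type field in columns 38 to 54), which may shift for very long font names:

```shell
# Count fonts per type from pdffonts output read on stdin.
# Assumes two header lines and the type field in columns 38-54.
font_type_counts() {
  sed '1,2d' |              # drop the two header lines
    cut -c38-54 |           # keep only the "type" column
    sed 's/ *$//' |         # strip the column padding
    awk 'NF { n[$0]++ } END { for (t in n) print t ": " n[t] }' |
    sort
}

# Possible usage (needs pdffonts):
#   pdffonts old.pdf | font_type_counts
#   pdffonts new.pdf | font_type_counts
```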

Comparing the textual contents is a nightmare. Here's an excerpt:

$ diff <(pdftotext old.pdf -) <(pdftotext new.pdf -) | head -12298 | tail -39

yields

10571c5813
< abstract interpretation. FPCA, pp. 170181. ACM Press,
---
> abstract interpretation. FPCA, pp. 170–181. ACM Press,
10573c5815
< [70] Jones, N. D. and Muchnik, S. S. (1981) Complexity of ow
---
> [70] Jones, N. D. and Muchnik, S. S. (1981) Complexity of flow
10583c5825
< [72] Perry, D. E., Jeey, R., and Notkin, D. (eds.) (1995)
---
> [72] Perry, D. E., Jeffrey, R., and Notkin, D. (eds.) (1995)
10585c5827
< Seattle, Washington, USA, April 2330, 1995, Proceedings. ACM.
---
> Seattle, Washington, USA, April 23–30, 1995, Proceedings. ACM.
10587c5829
< Softwaretechnik-Trends, .
---
> Softwaretechnik-Trends, 21.
10593c5835
< Université d'Aix-Marseille  CNRS, UMR 7279.
---
> Université d’Aix-Marseille — CNRS, UMR 7279.
10597c5839
< Structures in Computer Science, , 329366.
---
> Structures in Computer Science, 14, 329–366.
10601c5843
< Distributed Computing, , 383409.
---
> Distributed Computing, 25, 383–409.
10609,10611c5851,5852
< Proceedings, Lecture Notes in Computer Science,
< ,
< pp. 391407. Springer.
---
> Proceedings, Lecture Notes in Computer Science, 5674,
> pp. 391–407. Springer.

As you can see, the text layer of old.pdf apparently has no en dashes and no ligatures (ﬀ/ff, ﬂ/fl), as opposed to new.pdf. As a result, the full output is far too long to examine manually:

$ diff <(pdftotext old.pdf -) <(pdftotext new.pdf -) | wc -l
15278

15278 lines is just way too many. The tool diffpdf is not better; here's the second page of the two files side by side, and wherever diffpdf senses a difference, it colo(u)rs the background in red:

  • visual comparison marks everything as different
    [screenshot: visual comparison]

  • comparison by characters marks most of the text as different
    [screenshot: character comparison]

  • comparison by words marks much of the text as different
    [screenshot: word comparison]

Above, we blurred the images for privacy. When we actually tried to find differences in the contents (we considered the first paragraph on the page), we found nothing except that the fonts in new.pdf are smoother than in old.pdf. Still, we are unsure about the rest of the document. We clearly don't want to re-read all pages of each document (here 41 pages, considering every symbol, line, and space) just to check whether any of the more important contents (actual letters and digits, references, citations, hyperlinks, tables, graphics, math symbols, self-drawn symbols, …) changed when TeX Live was upgraded. (We may re-read them for other purposes in the distant future, but not for the purpose of comparison.)

Any help in better automating the comparison task? Can we, perhaps, somehow equalize the fonts in one or both PDF files before comparison? (Btw., these PDF files were produced from LaTeX via pdflatex, and we do NOT have PostScript or DVI versions of the old file.) Or can we massage the output of pdftotext before running diff? Or can we pass any nondefault options to the tools to make our task easier? Or, for our purposes, is the paid diffpdf now any better than its free version? Or are there any online tools that are good at this?

  • You could try \pdfglyphtounicode=0 (only for the tests, not for the final document!). Then probably the ff ligatures no longer copy in the new pdf either and the diff gets manageable.
    Commented Dec 10, 2023 at 8:49
  • @cfr I'm interested in the most important differences that emerge due to the evolution of LaTeX and packages. Most important is, of course, the contents. The correctness of the original is checked by simply reading everything, but, alas, I cannot do that for every file each time the system is upgraded.
    – AlMa1r
    Commented Dec 10, 2023 at 11:35
  • @UlrikeFischer Are you sure? I get “ ! Missing { inserted. <to be read again> = l.1 \pdfglyphtounicode= 0 ? ” on the console.
    – AlMa1r
    Commented Dec 10, 2023 at 11:39
  • Sorry, \pdfgentounicode=0.
    Commented Dec 10, 2023 at 11:51
  • @UlrikeFischer After doing this, the output of diffpdf is as bad as before. Also, diff <(pdftotext old.pdf -) <(pdftotext new.pdf -)| wc -l still yields 12387 lines.
    – AlMa1r
    Commented Dec 10, 2023 at 12:12

1 Answer

You won't believe it, but the following worked. First, we list the SF* fonts that occur only in new.pdf:

$ diff <(pdffonts old.pdf | sed '1,2d' | cut -c-79 | cut -d '+' -f 2- | sort) <(pdffonts new.pdf | sed '1,2d'| cut -c-79 | cut -d '+' -f 2- | sort) | grep  "> SF" | cut -c3-10 | tr '\n' '|'
SFBX0900|SFBX1000|SFCC0800|SFCC1000|SFCC1200|SFIT0900|SFIT1000|SFRM0600|SFRM0700|SFRM0800|SFRM0900|SFRM1000|SFRM1000|SFRM1728|SFSS0800|SFTI0700|SFTI0800|SFTI0900|SFTI1000|SFTT0800|SFTT0900|SFTT1000|

We take this output (without the trailing |) as the pattern for egrep -iv and create a local file mymap.map in the directory of the LaTeX document via

$ egrep -vi "SFBX0900|SFBX1000|SFCC0800|SFCC1000|SFCC1200|SFIT0900|SFIT1000|SFRM0600|SFRM0700|SFRM0800|SFRM0900|SFRM1000|SFRM1000|SFRM1728|SFSS0800|SFTI0700|SFTI0800|SFTI0900|SFTI1000|SFTT0800|SFTT0900|SFTT1000" path_to_pdftex.map_from_TeX_Live_2020 > mymap.map
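
The two manual steps above (listing the fonts, then pasting the list into the pattern) could presumably be combined. A minimal sketch, in which sf_font_pattern is a hypothetical helper; it assumes that every SF* font in new.pdf should be removed from the map, which holds here because old.pdf contains none. The SF* names are the Type 1 (cm-super) versions of the EC fonts; without their map entries pdfTeX presumably falls back to bitmap fonts, which pdffonts reports as Type 3, as in old.pdf.

```shell
# Build an alternation pattern "SFxxx|SFyyy|..." from pdffonts output on stdin.
# Duplicates are removed and no trailing "|" is produced.
sf_font_pattern() {
  sed '1,2d' |              # drop the two pdffonts header lines
    cut -d ' ' -f 1 |       # first column: font name, e.g. KYESCI+SFRM1728
    cut -d '+' -f 2- |      # strip the subset prefix before "+"
    grep '^SF' |            # keep only the SF* fonts
    sort -u |
    paste -s -d '|' -
}

# Possible usage (needs pdffonts; the map path is the placeholder used above):
#   pattern=$(pdffonts new.pdf | sf_font_pattern)
#   [ -n "$pattern" ] && grep -Evi "$pattern" path_to_pdftex.map_from_TeX_Live_2020 > mymap.map
```

The guard on an empty pattern matters: with an empty pattern, grep -v would discard every line of the map.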

After that we add, as Ulrike suggested above,

\pdfgentounicode=0
\pdfmapfile{mymap.map}

to the preamble and recompile the document using TeX Live 2020. Finally, we run diffpdf and observe NO CHANGES in any of the modes (visual, character, and words). What a relief!
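
As a complementary check on the text layer, one could also compare the pdftotext output page by page, so that any remaining difference is at least localized to a page. A minimal sketch; changed_pages is a hypothetical helper, and it assumes both files have the same number of pages (41 here):

```shell
# Print the numbers of all pages whose extracted text differs between two PDFs.
# Assumes both files have the same page count; uses pdfinfo and pdftotext.
# Note: the variables are global, since plain POSIX sh has no "local".
changed_pages() {
  old=$1
  new=$2
  pages=$(pdfinfo "$old" | awk '/^Pages:/ { print $2 }')
  p=1
  while [ "$p" -le "$pages" ]; do
    a=$(pdftotext -f "$p" -l "$p" "$old" -)   # text of page p in the old file
    b=$(pdftotext -f "$p" -l "$p" "$new" -)   # text of page p in the new file
    [ "$a" = "$b" ] || echo "$p"
    p=$((p + 1))
  done
}

# Possible usage:
#   changed_pages old.pdf new.pdf
```

No output would mean that the text layers agree on every page.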

Thanks to @cfr and @Ulrike Fischer for support!
