I'm facing a problem exporting bookmarks from PDF files. I tried to use pdftk for that:

pdftk.exe input.pdf dump_data output results.txt

However, pdftk does not export the exact page numbers where the bookmarks were set, but rather the page numbers of the view. If, when setting a bookmark in Foxit Reader, I not only have the page in view where I set the bookmark but also see the lower part of the previous page, then the page number of the previous page is exported. The bookmark works correctly interactively, and both pages are displayed in the view with the exact scroll position. However, I need the page number of the page where the bookmark was actually set. This information is certainly stored in the pdf, so there should be a way to export it. How can I achieve this? Probably pdftk is not the right tool for this and there might be better approaches using scripts or other programs (preferably free and command line for usage in scripts).

Update: I have now tried JPdfBookmarks:

jpdfbookmarks_cli -d inputFile.pdf >results.txt

Results.txt contains one line per bookmark, like the following 2:

TESTA_pg2inViewAndMarkedOnPg2/2,Black,notBold,notItalic,open,FitWidth,813
TESTB_pg2inViewAndMargedOnPg3/2,Black,notBold,notItalic,open,FitWidth,813

The page number follows the first /. If the last value is 1, then only that page was in view when the bookmark was set. If the value is not 1, it seems to denote a scroll offset from the beginning of that page. That takes me one step further, but it isn't the final solution. The two bookmarks shown here were both set in the same view, without scrolling in between. For the first bookmark, something on page 2 was marked, and for the second, something on page 3. Nevertheless, the exported page number is identical, as is the offset. So with these values I can't decide on which page a bookmark was set.

1 Answer

Bookmarks are not set from a page; they belong to the file as a whole.

They therefore behave like a table of contents (effectively shown in one cascading sidebar), but the objects that define them may be physically scattered throughout the file, in whatever order they were written.

These "bookmarks" are stored as the entries of the "/Outlines" tree, each pointing to a destination.

You can see three of them in my plain-text demo.pdf (use the download button at the top right, then open the file in Notepad): https://github.com/GitHubRulesOK/MyNotes/blob/master/demo.pdf

% Note: objects do not need to be in order. Here 9 comes before 7 and 8; it could just as easily be 1 0 obj or come last, no problem.
%09) Outline / Bookmark entry 3 
9 0 obj
<</A<</D [6 0 R /FitR 90 90 300 300]/S/GoTo>>/C[0 0 1]/F 3/Parent 4 0 R/Prev 7 0 R/Title(Hello World\041)>>
endobj
%07) Outline / Bookmark entry 1 
7 0 obj
<</A<</D [6 0 R /XYZ 0 740 2]/S/GoTo>>
/C [1 0 0]/F 1/Parent 4 0 R/First 8 0 R/Last 8 0 R/Next 9 0 R/Title(All text is "Body" text.)>>
endobj
%08) Outline / Bookmark entry 2 
8 0 obj
<</A<</D [6 0 R /XYZ 100 500.0 2.0]/S/GoTo>>/C[0 1 0]/F 2/Parent 7 0 R/Title(Jane)>>
endobj

So

  • /A is the action
  • /D is the destination: a page object reference (6 0 R, which in the demo is page index 0, the first page of a one-page file) followed by the view type and its parameters
  • /XYZ is a view type whose parameters are the X offset, the Y offset and the Zoom
  • /S/GoTo is the action type (go to a destination)
  • /C is the colour
  • /F is the text style flag (bold, italic, etc.)
  • /Parent is the parent entry, which defines the nesting
  • /Title is the display text, with oddities as to which characters are acceptable; here the ! in Hello World! is written as the octal escape \041
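As a rough illustration of the keys listed above, the destination array can be pulled out of an outline object with a regular expression. This is only a sketch: it assumes the object is stored uncompressed as plain text with a direct /D array, as in the demo, and it does not handle named destinations or object streams. The function name parse_destination is my own.

```python
import re

# Matches "/D [<obj> <gen> R /<View> <params...>]" in a plain-text outline object.
DEST_RE = re.compile(r"/D\s*\[\s*(\d+)\s+(\d+)\s+R\s*/(\w+)([^\]]*)\]")

def parse_destination(obj_text):
    """Return the page reference, view type and parameters of a /D array, or None."""
    m = DEST_RE.search(obj_text)
    if m is None:
        return None
    obj_num, gen_num, view, tail = m.groups()
    # "null" parameters (meaning "leave unchanged") are kept as None.
    params = [None if t == "null" else float(t) for t in tail.split()]
    return {"page_ref": (int(obj_num), int(gen_num)), "view": view, "params": params}

jane = "<</A<</D [6 0 R /XYZ 100 500.0 2.0]/S/GoTo>>/C[0 1 0]/F 2/Parent 7 0 R/Title(Jane)>>"
print(parse_destination(jane))
```

Note that the page reference (6 0 R) is an object number, not a page number; mapping it to a page number needs the page tree of the file.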

From JPdfBookmarks -dump we get

All text is "Body" text./1,Red,notBold,italic,open,TopLeftZoom,66,0,2.0
    Jane/1,Lime,bold,notItalic,open,TopLeftZoom,369,168,2.0
Hello World!/1,Blue,bold,italic,open,FitRect,621,151,886,505
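For use in scripts, dump lines like the ones above can be split into fields. The following is a minimal sketch that assumes the field order seen in these samples (title/page, colour, bold, italic, open state, view type, then numeric values); the DumpEntry class and parse_dump_line function are hypothetical names, not part of JPdfBookmarks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DumpEntry:
    title: str
    page: int
    color: str
    bold: bool
    italic: bool
    is_open: bool
    view: str
    values: List[float]

def parse_dump_line(line: str) -> DumpEntry:
    # Split on the LAST "/" so that titles containing "/" survive
    # (assumes the field list after the page number never contains "/").
    title, _, rest = line.strip().rpartition("/")
    fields = rest.split(",")
    page, color, bold, italic, state, view = fields[:6]
    return DumpEntry(
        title=title,
        page=int(page),
        color=color,
        bold=(bold == "bold"),
        italic=(italic == "italic"),
        is_open=(state == "open"),
        view=view,
        values=[float(v) for v in fields[6:]],
    )

print(parse_dump_line("    Jane/1,Lime,bold,notItalic,open,TopLeftZoom,369,168,2.0"))
```

The leading indentation that marks nesting is stripped here; a script that needs the hierarchy would have to count it before stripping.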

What you should note as differences are

  • All titles go to page /1, a plain page number rather than a page object reference (not really a problem, it just uses simple numbers)
  • Lime is pure green (the name seems odd, but it is full bright green, 0 1 0)
  • The values are transformed, and it is not obvious why (the order appears to be Y then X, and the scale looks like some kind of percentage?)
    1st entry is now 66,0 (was 0 740)
    2nd entry is now 369,168 (was 100 500)
    3rd entry is 621,151,886,505 (was 90 90 300 300)
  • The GUI viewer does not show all of the file's contents (it is unclear why)
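One possible reading of the transformed values, offered as a guess rather than as documented JPdfBookmarks behaviour: they look like thousandths of the page size, measured from the top-left corner and listed as top, left. The sketch below assumes a page of 595 x 792 points; that size is inferred from the numbers themselves and was not read from demo.pdf.

```python
def to_per_mille(x, y, page_width=595.0, page_height=792.0):
    """Convert PDF points (origin bottom-left) to (top, left) in thousandths
    of the page size (origin top-left). Page size is an assumption."""
    top = round((page_height - y) / page_height * 1000)
    left = round(x / page_width * 1000)
    return top, left

print(to_per_mille(0, 740))    # /XYZ 0 740
print(to_per_mille(100, 500))  # /XYZ 100 500
print(to_per_mille(90, 300))   # top-left corner of /FitR 90 90 300 300
print(to_per_mille(300, 90))   # bottom-right corner of /FitR 90 90 300 300
```

Under this assumption the first two entries reproduce 66,0 and 369,168. The rectangle comes out as 621,151,886,504, one unit away from the dumped 505, so the real rounding or page width may differ slightly.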

For extracting the outline in several formats (and also for adding and deleting entries), I would recommend Coherent cpdf. However, it currently does not support colour or text style, so JPdfBookmarks may still be better for adding a customised outline.

cpdf -list-bookmarks -utf8  demo.pdf

For non-commercial use only
To purchase a license visit http://www.coherentpdf.com/

0 "All text is \"Body\" text." 1 "[1/XYZ 0 740 2]"
1 "Jane" 1 "[1/XYZ 100 500 2]"
0 "Hello World!" 1 "[1/FitR 90 90 300 300]"
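Since the goal is scripting, here is a hedged sketch for reading lines like the three above. It assumes the layout seen in this output (level, quoted title, page number, quoted destination) and additionally tolerates an optional open token after the page number, which is my assumption and does not appear in this sample. parse_cpdf_line is a hypothetical helper, not part of cpdf.

```python
import re

# level "title" page [open] "[page/View params...]"
LINE_RE = re.compile(
    r'^(\d+) "((?:[^"\\]|\\.)*)" (\d+)( open)?'
    r'(?: "\[(\d+)\s*/(\w+)([^\]]*)\]")?$'
)

def parse_cpdf_line(line):
    """Parse one bookmark line; return None for banner or unrelated lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    level, title, page, is_open, _dest_page, view, tail = m.groups()
    return {
        "level": int(level),
        "title": title.replace('\\"', '"'),  # undo the \" escaping
        "page": int(page),
        "open": is_open is not None,
        "view": view,
        # "null" parameters are skipped rather than converted
        "params": [float(t) for t in (tail or "").split() if t != "null"],
    }

print(parse_cpdf_line(r'0 "All text is \"Body\" text." 1 "[1/XYZ 0 740 2]"'))
```

Lines that do not match, such as the licence banner, return None and can simply be skipped.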

It can also list other go-to targets such as annotation links, using cpdf -list-annotations in.pdf. See the comment below, which shows the JSON variant of the bookmark listing.

NOTE: the "single" page that a bookmark goes to in Foxit (as opposed to some other editors, or to Page View mode) is chosen as follows:

In continuous scroll mode, the bookmark is set based on which page is mostly showing at the moment. In your example, page 1 is taking up more of the screen than page 2, so you are on page 1.

Summary

In principle, each page in a PDF is standalone (Foxit can show page 2 above and page 1 below), and a bookmark is a simple pointer to one point (or one area) on a single page, even if two or more pages were visible at the time it was added.

  • Just to add to this: modern versions of cpdf also have cpdf -list-bookmarks-json as an alternative format, in UTF8 by default. Commented Jun 27, 2023 at 9:28
  • A very educative/informative answer. However, it doesn't solve my problem. In Foxit Reader, I scroll from page 1 just a little bit down to see top of page 2. I mark something there and save a bookmark. cpdf displays page 1 as the bookmark page, along with the scroll offset, as do all other tools I tried: "TEST" 1 "[1/FitH 211.487]". I would need the info that this bookmark was set on page 2.
    – jamacoe
    Commented Jun 27, 2023 at 12:53
  • In continuous scroll mode, the bookmark is set based on which page is mostly showing at the moment. In your example, page 1 is taking up more of the screen than page 2, so you are on page 1. Commented Jun 27, 2023 at 15:16
