
I have a large PDF file containing a map. The PDF file was probably generated with AutoCAD.

The image consists of a coloured raster map with a vector layer of lines (streets etc.) drawn on top of it.

I need to work with the raster and the vector separately. When I import the file into Photoshop, it sees only one layer, and the Layers tab in Adobe PDF Reader also shows only one layer. But I am sure there are multiple layers, because when the file is rendered, the map is drawn in the background first, and only afterwards is the vector drawn on top. If I am fast enough, I can actually use "print screen" to save the background raster. I need a more reliable method to extract that image, and also the vector.

Can I use an open-source tool like Ghostscript to split the PDF into its essential parts (text, raster, and vector data) and put them all in a folder?



I've found one manual solution using Inkscape, and am looking for ways to automate it.

  1. Open the PDF in Inkscape (I too had a map like yours). Go with the default import settings.
  2. Menu > Object > Objects... (not Layers).
  3. This opens an Objects panel, which works much like a layers panel: click the left columns to toggle visibility, lock items, etc.
  4. There's one item there, but it has an arrow indicating there might be more. I click that, and it expands to show several sub-items.
  5. As I click on each one, the corresponding object gets selected in the image. When I toggle its visibility (close the eye), that object disappears from the image.
  6. After hiding everything I didn't want, I go to File > Export PNG Image. I had to increase the size and DPI to get a good resolution; the default settings produce a small thumbnail.
  7. I now have the map I needed.

Automation

I found a command line way of doing this.

inkscape -z -i g2846 -j -D -d 300 test3.pdf -e 3.png

Reference doc: https://inkscape.org/sk/doc/inkscape-man.html

Explaining the parameters:

  • -z : no GUI; run Inkscape from the command line only.
  • -i g2846 : the ID of the group/layer to export. I found this ID/label with the manual steps above in the Inkscape GUI.
  • -j : export only that object, hiding everything else.
  • -D : make the exported image the same size as the whole drawing/document, keeping the extracted object at its original position. (This matters if the original object is rotated/warped and you want the output to line up with the original, or if you're extracting multiple layers and need them to keep their positions on the canvas.)
  • -d 300 : 300 DPI. The default gave a PNG with too low a resolution; this setting worked well for me.
  • test3.pdf : my input PDF.
  • -e 3.png : export as PNG, and filename given.
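Why -d matters so much: the pixel size of the export is the exported area's size in inches times the DPI, and PDF page sizes are given in points (1/72 inch). A minimal bash sketch of that arithmetic, assuming whole-point sizes such as those on the "Page size:" line printed by poppler's pdfinfo; the function name px_size is just for illustration:

```bash
#!/usr/bin/env bash
# Estimate the PNG size in pixels for a given export DPI.
# PDF sizes are in points, and 1 point = 1/72 inch, so
# pixels = points * dpi / 72 (integer arithmetic, rounded down).
px_size() {
  local width_pt=$1 height_pt=$2 dpi=$3
  echo "$(( width_pt * dpi / 72 ))x$(( height_pt * dpi / 72 ))"
}
```

For a US Letter page (612 x 792 pt), px_size 612 792 300 gives 2550x3300, while at 96 DPI the same page is only 816x1056.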

Unfortunately, we can only extract one object/layer at a time for now. A bug has been filed against Inkscape requesting support for multiple layers: "Allow several -i (--export-id=ID) options".
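One workaround is simply to call Inkscape once per object. A minimal bash sketch, assuming the Inkscape 0.92-style flags used above (Inkscape 1.x renamed several options, e.g. -e/--export-png was replaced by -o/--export-filename, so check inkscape --help for your version). The helper name export_ids and the second ID g1234 in the usage line are hypothetical; take the real IDs from the Objects panel:

```bash
#!/usr/bin/env bash
# Export several objects from one PDF, one Inkscape run per object ID,
# writing 1.png, 2.png, ... in the order the IDs are given.
# Set DRY_RUN=1 to print the commands instead of running them.
export_ids() {
  local pdf=$1; shift
  local n=1 id
  local -a cmd
  for id in "$@"; do
    # Same flags as the single-object command above.
    cmd=(inkscape -z -i "$id" -j -D -d 300 "$pdf" -e "$n.png")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
    n=$((n + 1))
  done
}
```

Running DRY_RUN=1 export_ids test3.pdf g2846 g1234 prints the two inkscape commands (writing 1.png and 2.png); leave out DRY_RUN=1 to actually run them.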

[EDIT] Another workaround if you want multiple (but not all) layers: use the Inkscape command above to export the individual layers as 1.png, 2.png, 3.png, then run the following ImageMagick command:

$ convert -page +0+0 1.png \
-page +0+0 2.png \
-page +0+0 3.png \
-layers merge +repage merged.png

That should merge the layers into merged.png.
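The same merge generalises to any number of files. A minimal bash sketch (merge_pngs is a hypothetical helper name) that builds one -page +0+0 pair per input and then runs the convert command above. Order matters: the first file becomes the bottom layer and each later file is composited over the ones before it. On ImageMagick 7 the command is magick rather than convert.

```bash
#!/usr/bin/env bash
# Merge same-size PNGs into one image; the first input is the bottom layer.
# Usage: merge_pngs OUTPUT.png IN1.png IN2.png ...
# Set DRY_RUN=1 to print the command instead of running it.
merge_pngs() {
  local out=$1; shift
  local -a args=()
  local f
  for f in "$@"; do
    args+=(-page +0+0 "$f")   # place every layer at the same origin
  done
  local -a cmd=(convert "${args[@]}" -layers merge +repage "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

DRY_RUN=1 merge_pngs merged.png 1.png 2.png 3.png prints exactly the convert command shown above.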

  • Thanks! Very helpful. Here's the merge command for GraphicsMagick: gm convert xc:transparent -compose Over *.png -mosaic merged.png Commented Dec 16, 2019 at 7:03
  • Thanks for improving the answer! <3
    – Nikhil VJ
    Commented Apr 7, 2020 at 2:18

Posting another possible solution, using the ogr2ogr tool. Here's a repo where it worked like a charm for extracting several layers from a "Geo PDF": https://github.com/draftmpd41/layers_draft_delhi_master_plan_2041

Command line ogr2ogr

Please see the .bat file there; here's a sample command from it:

ogr2ogr -f "GEOJSON"  Boundaries_DDA_ZONE_Boundary_polyline.geojson draftplan.pdf Layers_Boundries_DDA_ZONE_Boundary_polyline -s_srs EPSG:32643 -t_srs EPSG:4326 --config OGR_PDF_READ_NON_STRUCTURED YES

Explained:

  • -f "GEOJSON" : output in this format
  • Boundaries_DDA_ZONE_Boundary_polyline.geojson : the output file name
  • draftplan.pdf : input pdf file name
  • Layers_Boundries_DDA_ZONE_Boundary_polyline : the layer name inside the PDF. You can find it by opening the PDF in a viewer and inspecting its Layers panel.
  • -s_srs EPSG:32643 : the source CRS (also called datum or SRS; the terminology varies). If you don't know it yet, start with EPSG:4326.
  • -t_srs EPSG:4326 : the destination CRS. Use EPSG:4326 if you want latitude-longitude.
  • --config OGR_PDF_READ_NON_STRUCTURED YES : the conversion only worked with this option set; I haven't looked into the details.
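The .bat file in that repo runs one such command per layer. A minimal bash equivalent, assuming you already know the layer names and the source CRS. pdf_layers_to_geojson is a hypothetical helper, and naming each output after its layer minus the "Layers_" prefix is an arbitrary choice (the sample command above names its file by hand). To list the layer names without a viewer, ogrinfo (also part of GDAL) can be run on the PDF, e.g. ogrinfo draftplan.pdf --config OGR_PDF_READ_NON_STRUCTURED YES.

```bash
#!/usr/bin/env bash
# Convert several layers of one Geo PDF to GeoJSON, one ogr2ogr run per layer,
# with the same options as the sample command above.
# Usage: pdf_layers_to_geojson PDF SOURCE_SRS LAYER...
# Set DRY_RUN=1 to print the commands instead of running them.
pdf_layers_to_geojson() {
  local pdf=$1 src_srs=$2; shift 2
  local layer
  local -a cmd
  for layer in "$@"; do
    # Output file: layer name without a leading "Layers_", plus .geojson
    cmd=(ogr2ogr -f GEOJSON "${layer#Layers_}.geojson" "$pdf" "$layer"
         -s_srs "$src_srs" -t_srs EPSG:4326
         --config OGR_PDF_READ_NON_STRUCTURED YES)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, DRY_RUN=1 pdf_layers_to_geojson draftplan.pdf EPSG:32643 Layers_Boundries_DDA_ZONE_Boundary_polyline prints the sample command, with the output file named Boundries_DDA_ZONE_Boundary_polyline.geojson.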

Finding CRS

Initially we didn't know what the source CRS was, so we found it with this method:

  • Convert it assuming it's EPSG:4326, and you get a file whose lat-longs go into very large numbers.
  • Load it into QGIS. Find and press the "Zoom to layer" button in the toolbar. We can see the shape, but it's out of whack relative to, say, an OpenStreetMap background XYZ layer. That's expected at this stage.
  • Locate one point in the shape whose real-world location you know.
  • Note the X,Y co-ordinates of this point (the large numbers) (note: X is longitude and Y is latitude, so you'll see long-lat, not lat-long)
  • Now, open http://projfinder.com/ and move the map to that real-world location
  • Paste in the X and Y values you noted, and press the Find button.
  • This site now figures out all the potential CRS systems of your layer. Pick the most appropriate one, note down the EPSG:____ code and plug it into your command line at -s_srs EPSG:____
  • Now re-run your command and load your output geojson in QGIS or other tool like https://geojson.io and hopefully it should be proper
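The "very large numbers" in the first step are the tell-tale sign: longitude must lie in [-180, 180] and latitude in [-90, 90], so coordinates outside that range are still in projected units and -s_srs is wrong. A minimal bash/awk check (looks_like_lonlat is a hypothetical helper; the projected coordinates in the usage line are made-up illustrative values):

```bash
#!/usr/bin/env bash
# Succeed (exit 0) if X/Y fall inside the valid longitude/latitude ranges,
# fail (exit 1) otherwise. X is longitude, Y is latitude.
looks_like_lonlat() {
  awk -v x="$1" -v y="$2" \
    'BEGIN { exit !(x >= -180 && x <= 180 && y >= -90 && y <= 90) }'
}
```

looks_like_lonlat 77.2 28.6 succeeds, while looks_like_lonlat 712345.6 3165432.1 fails. This only catches the obvious case: a point can be in range and still be in the wrong place, which is what the QGIS/projfinder comparison is for.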

Disclaimer

This is specific to "Geo" PDFs, where your layer is something on a map. I am not sure how well this method works for general vector graphics, but you can still load the output into QGIS, press "Zoom to layer" to see the shape, and then export it as an image or screenshot and get on with business.

Where ogr2ogr comes from

ogr2ogr is one of the command-line utilities that ship with GDAL.



I just came across this article posted in Sep 2019: https://north-road.com/2019/09/03/qgis-3-10-loves-geopdf/

Apparently if it's a "Geo PDF" we can import it into QGIS and get all the vector layers etc. And then of course in QGIS one can do whatever with the layers. Might be worth a quick try.


You should work with the AutoCAD version, or export it from there as separate layers. Once it's exported to a PDF, it becomes a bitmap image that you can't edit as separate layers.

  • I don't have the AutoCAD version :D If I did, of course I would use that. Commented Dec 15, 2015 at 18:48
  • It doesn't become a bitmap image, because when I open it with Adobe Reader, if I am fast enough, I can capture the first layer rendered without the second with a print-screen :) So it can't be a bitmap... bitmaps don't do that. Commented Dec 15, 2015 at 18:49

