How can I split a PDF file into layers?

Question

I have a large PDF file containing a map. The PDF file was probably generated with AutoCAD.

The image consists of a coloured raster map, and a vector with lines on top of the map. (Street lines etc.)

I need to work with the raster and the vector separately. When I import the file into Photoshop, it only sees one layer. When I select the layers tab in Adobe Reader, it also shows only one layer. But I am sure there are multiple layers, because when the file renders, it first draws the map in the background, and only afterwards starts drawing the vector on top. If I am fast enough, I can actually use "print screen" to save the background raster. I need a more reliable method to extract that image, and also the vector.

Can I use some open-source tool like Ghostscript to split up the pdf into its essential parts like text, raster, and vector data? And then put them all in a folder?

Community · Accepted Answer · 2020-06-12 13:48:39Z

I've found a manual solution using Inkscape, and am looking around for ways to automate it.

Open the PDF in Inkscape (I too had a map like yours). Go with the default import settings.
Go to Menu > Object > Objects... (not Layers).
This opens an Objects panel, which works just like layers: click in the left columns to toggle visibility, lock an object, etc.
There's one item there, but it has an arrow indicating there might be more. Click it, and it expands to show several sub-items.
Clicking each sub-item selects the corresponding object in the image. Toggling visibility (closing the eye) makes that object disappear from the image.
After hiding everything I didn't want, I went to File > Export PNG Image. I had to increase the size and DPI to get a good resolution; the default settings produce a small thumbnail.
I now have the map I needed.

Automation

I found a command line way of doing this.

inkscape -z -i g2846 -j -D -d 300 test3.pdf -e 3.png

Reference doc: https://inkscape.org/sk/doc/inkscape-man.html

Explaining the parameters:

-z : no GUI; run Inkscape on the command line only.
-i g2846 : the id of the specific group/layer to export. I found this id/label through the manual steps in the Inkscape GUI described above.
-j : hide all other layers and objects in the export.
-D : keep the exported image's dimensions the same as the whole drawing/document, and maintain the extracted object's position. (This is important if the original object is rotated/warped and you want the output to match the original, or if you're extracting multiple layers and need to maintain their positions on the canvas.)
-d 300 : 300 DPI. The default made the output PNG too low in quality; this setting kept it good at my end.
test3.pdf : my input PDF.
-e 3.png : export as PNG, with the given filename.
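
The id passed to -i does not have to come from the GUI. Inkscape also has a --query-all (-S) option that prints one line per object in the form id,x,y,width,height. Below is a minimal sketch for picking candidate ids out of that output. It assumes that output format, and that groups created by the PDF importer have ids like g2846 (a "g" followed by digits). The helper names and the sample text are made up for illustration.

```python
import subprocess
from typing import List


def group_ids(query_output: str) -> List[str]:
    """Return ids that look like groups from `inkscape --query-all` output.

    Assumption: each line is "id,x,y,width,height" and group ids are
    "g" followed by digits (as in g2846).
    """
    ids = []
    for line in query_output.splitlines():
        obj_id = line.split(",", 1)[0].strip()
        if obj_id.startswith("g") and obj_id[1:].isdigit():
            ids.append(obj_id)
    return ids


def query_all(pdf_path: str) -> str:
    """Run Inkscape itself; requires the inkscape binary on PATH."""
    result = subprocess.run(
        ["inkscape", "--query-all", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample output, for illustration only.
sample = (
    "svg2,0,0,800,600\n"
    "g2846,0,0,800,600\n"
    "image2848,0,0,800,600\n"
    "path2850,10,10,5,5\n"
)
print(group_ids(sample))
```

With a real file, group_ids(query_all("test3.pdf")) would give the list of ids to try with -i. Which id is the raster and which is the vector still has to be checked by eye.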

Unfortunately, we can only extract one object/layer at a time for now. There is a bug filed against Inkscape requesting support for multiple layers: "Allow several -i (--export-id=ID) options".
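
Until then, one object per run can still be scripted with a loop. The sketch below rebuilds the exact command shown earlier for each id and writes one PNG per id. The function names are hypothetical, and naming the output after the id is my own choice. The flags are the ones from this answer (Inkscape 0.9x); newer Inkscape releases changed several export options, so check inkscape --help for your version.

```python
import subprocess
from typing import Callable, List


def export_command(pdf_path: str, object_id: str, dpi: int = 300) -> List[str]:
    """Build the Inkscape command from this answer for one object id."""
    return [
        "inkscape", "-z",
        "-i", object_id,           # export only this object id
        "-j",                      # hide everything else
        "-D",                      # keep the full drawing area
        "-d", str(dpi),            # export resolution
        pdf_path,
        "-e", f"{object_id}.png",  # output name is an assumption
    ]


def export_all(pdf_path: str, object_ids: List[str],
               run: Callable = subprocess.run) -> List[str]:
    """Export one PNG per id and return the output file names."""
    outputs = []
    for object_id in object_ids:
        cmd = export_command(pdf_path, object_id)
        run(cmd, check=True)  # raises if Inkscape exits with an error
        outputs.append(cmd[-1])
    return outputs


# Print the command for the id used in this answer, without running it.
print(" ".join(export_command("test3.pdf", "g2846")))
```

Calling export_all("test3.pdf", ["g2846"]) runs Inkscape for real. The run parameter exists only so the loop can be tried without Inkscape installed.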

[EDIT] Another workaround if you want multiple (but not all) layers: use the Inkscape command above to export the individual layers as 1.png, 2.png, 3.png. Then run the following ImageMagick command:

$ convert -page +0+0 1.png \
-page +0+0 2.png \
-page +0+0 3.png \
-layers merge +repage merged.png

That should merge the layers together to merged.png.
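
For more than a handful of layers, the same command can be generated rather than typed. This is a small sketch with a hypothetical helper name, merge_command. It only builds the argument list in the same shape as the command above, and assumes the ImageMagick binary is called convert (newer ImageMagick versions also ship it as magick).

```python
import subprocess
from typing import List


def merge_command(layer_pngs: List[str], output: str = "merged.png") -> List[str]:
    """Build the ImageMagick merge command for any number of layer PNGs."""
    cmd = ["convert"]
    for png in layer_pngs:
        # Every layer is placed at offset +0+0, as in the command above.
        cmd += ["-page", "+0+0", png]
    cmd += ["-layers", "merge", "+repage", output]
    return cmd


cmd = merge_command(["1.png", "2.png", "3.png"])
print(" ".join(cmd))
# To actually run it (requires ImageMagick):
# subprocess.run(cmd, check=True)
```

The order of the list is the stacking order, so the background raster should come first.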

Thanks! Very helpful. Here's the merge command for GraphicsMagick: gm convert xc:transparent -compose Over *.png -mosaic merged.png
– nathancahill
Commented Dec 16, 2019 at 7:03

Nikhil VJ · Accepted Answer · 2023-04-23 05:21:33Z

Posting another possible solution, using the ogr2ogr tool. Here's a repo where it worked like a charm to rip several layers out of a "Geo PDF": https://github.com/draftmpd41/layers_draft_delhi_master_plan_2041

Command line ogr2ogr

Please see the .bat file, here's a sample command from there:

ogr2ogr -f "GEOJSON"  Boundaries_DDA_ZONE_Boundary_polyline.geojson draftplan.pdf Layers_Boundries_DDA_ZONE_Boundary_polyline -s_srs EPSG:32643 -t_srs EPSG:4326 --config OGR_PDF_READ_NON_STRUCTURED YES

Explained:

-f "GEOJSON" : the output format.
Boundaries_DDA_ZONE_Boundary_polyline.geojson : the output file name.
draftplan.pdf : the input PDF file name.
Layers_Boundries_DDA_ZONE_Boundary_polyline : the layer name inside the PDF. You can get this by opening the PDF in a viewer and inspecting the Layers panel.
-s_srs EPSG:32643 : the source CRS (also called datum or SRS; there are lots of terms and I didn't make them up!). If you don't know it initially, just put EPSG:4326.
-t_srs EPSG:4326 : the destination CRS (or datum or SRS). You want EPSG:4326 if you want latitude-longitude.
--config OGR_PDF_READ_NON_STRUCTURED YES : it works when this is included. I don't know the details; as far as I can tell from the GDAL PDF driver documentation, it lets the driver read vector content from PDFs that have no structured (logical) content.
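
To pull out several layers, the command can be generated once per layer name. This is a rough sketch, not the repo's .bat file. The function name is hypothetical, and naming each output file after its layer is my own simplification (the sample command above uses a different output name). The layer names themselves would come from the PDF's Layers panel, or from GDAL's ogrinfo run on the PDF.

```python
import subprocess
from typing import List


def ogr2ogr_command(pdf_path: str, layer: str, source_srs: str,
                    target_srs: str = "EPSG:4326") -> List[str]:
    """Build an ogr2ogr call that extracts one PDF layer to GeoJSON."""
    return [
        "ogr2ogr",
        "-f", "GeoJSON",
        f"{layer}.geojson",  # output name derived from the layer (assumption)
        pdf_path,
        layer,
        "-s_srs", source_srs,
        "-t_srs", target_srs,
        "--config", "OGR_PDF_READ_NON_STRUCTURED", "YES",
    ]


# The one layer name that appears in this answer.
layers = ["Layers_Boundries_DDA_ZONE_Boundary_polyline"]
for layer in layers:
    cmd = ogr2ogr_command("draftplan.pdf", layer, "EPSG:32643")
    print(" ".join(cmd))
    # To actually run it (requires GDAL):
    # subprocess.run(cmd, check=True)
```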

Finding CRS

Initially we didn't know what the source CRS was, so we found out by this method:

Convert the layer assuming the source is EPSG:4326. You get a file whose "lat-longs" are very large numbers.
Load it into QGIS. Find and press the "Zoom to layer" button in the toolbar. You can see the shape, but it's way off relative to, say, an OpenStreetMap XYZ background layer. That's fine for now.
Locate one point in the shape whose real-world location you know.
Note the X,Y coordinates of this point (the large numbers). (Note: X is longitude and Y is latitude, so you'll see long-lat, not lat-long.)
Now open http://projfinder.com/ and move the map to that real-world location.
Paste in the X and Y values you noted, and press the Find.. button.
The site now lists the potential CRSs of your layer. Pick the most appropriate one, note down the EPSG:____ code, and plug it into your command line at -s_srs EPSG:____
Re-run your command and load the output GeoJSON in QGIS or another tool like https://geojson.io, and hopefully it will be in the right place.
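
As a quick first guess before doing the lookup, it can help to check whether the coordinates are degrees at all, and to compute the WGS 84 / UTM code for the area. This sketch is only a heuristic: it assumes the map uses a WGS 84 UTM zone (codes 326NN north, 327NN south), which many maps do but by no means all, and it ignores the special zones around Norway and Svalbard. The function names are made up.

```python
import math


def looks_like_degrees(x: float, y: float) -> bool:
    """True if (x, y) could be longitude/latitude in degrees."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Standard 6-degree zones only; assumes the source really is UTM.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # clamp lon = 180 into zone 60
    base = 32600 if lat >= 0 else 32700
    return base + zone


# Delhi is at roughly lon 77.2, lat 28.6.
print(utm_epsg(77.2, 28.6))
```

For Delhi this gives 32643, the code used in the sample command above. If the guessed code does not line the layer up with the background map, fall back to the lookup steps.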

Disclaimer

This is specific to "Geo" PDFs, where your layer is something on a map. I am not sure how this method pans out for plain vector graphics, but you can still load the output into QGIS, press "Zoom to layer", and you should be able to see the shape, which you can then export as an image or screenshot.

Where ogr2ogr comes from

https://gdal.org/programs/ogr2ogr.html - it's part of GDAL
Download it here: https://gdal.org/download.html
If you're not able to install / make it work, use the docker one: see https://github.com/OSGeo/gdal/tree/master/docker#example

Further references

https://gdal.org/drivers/vector/pdf.html says there's direct support for geospatial PDFs too, but there's no sample command :(

Nikhil VJ · Accepted Answer · 2020-04-07 02:47:35Z

I just came across this article posted in Sep 2019: https://north-road.com/2019/09/03/qgis-3-10-loves-geopdf/

Apparently if it's a "Geo PDF" we can import it into QGIS and get all the vector layers etc. And then of course in QGIS one can do whatever with the layers. Might be worth a quick try.


LPChip · Accepted Answer · 2015-12-15 18:25:53Z

You should work with the AutoCAD version, or export it from there as separate layers. Once it's exported to a PDF, it becomes a bitmap image that you can't edit as separate layers.

I don't have the AutoCAD version :D If I did, of course I would use that.
– Benjamin Tamasi
Commented Dec 15, 2015 at 18:48
It doesn't become a bitmap image, because when I open it with Adobe Reader, if I am fast enough, I can capture the first layer rendered without the second with a print-screen :) So it can't be a bitmap... bitmaps don't do that.
– Benjamin Tamasi
Commented Dec 15, 2015 at 18:49
