
I have a scatterplot, generated with R, that shows many thousands of overlapping points. I need to annotate the resulting PDF further in Inkscape and keep it in PDF format. However, it is simply not practicable to work with this file because there are too many points: Inkscape crashes or becomes too slow to work with, points are very hard to select, etc.

I want to "flatten" the PDF, i.e., remove all information that is not visible anyway (points hidden beneath heaps of other points, etc.).

I still want to retain the vector information, I do not want to rasterize the figure.

This has to be done with freely available tools, and I do not have Acrobat X.

I searched for PDF flattening in a bash/Linux context, but the tools I found are concerned with processing PDF forms, which is an entirely different topic.

1 Answer


This is a perfect example of a great problem to solve, but the wrong question to ask. You're already working with the input data in R, so why not process it there? PDF is, for practical purposes, a binary format, so you're largely out of luck doing anything with it as-is.

Your best bet is to pre-process the data in R before creating the PDF (this is what R was created for, after all). The best way to solve this would be to loop through your input data and, for each point you keep, delete all other points whose coordinates lie within a certain threshold of it. I would wrap this up in a function so you can experiment with different thresholds, but I'm sure you get the idea.
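A minimal sketch of that thresholding loop, written in Python purely for illustration (the same logic ports directly to an R function). The name `thin_points` and its `threshold` parameter are hypothetical, and the threshold is assumed to be in the same units as the plot coordinates. Instead of comparing every pair of points, it files each kept point into a grid of `threshold`-sized cells, so a new point only has to be checked against the 3x3 block of neighbouring cells.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def thin_points(points: List[Point], threshold: float) -> List[Point]:
    """Greedily keep a point only if no already-kept point lies closer than
    `threshold` (Euclidean distance). Hypothetical helper, for illustration."""
    if threshold <= 0:
        return list(points)  # nothing to merge
    # Kept points, bucketed by the grid cell they fall into.
    grid: Dict[Tuple[int, int], List[Point]] = {}
    kept: List[Point] = []
    for x, y in points:
        cx, cy = math.floor(x / threshold), math.floor(y / threshold)
        # Any kept point within `threshold` must sit in this cell or one of
        # its eight neighbours, so only those cells are searched.
        too_close = any(
            math.hypot(x - kx, y - ky) < threshold
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for kx, ky in grid.get((cx + dx, cy + dy), ())
        )
        if not too_close:
            kept.append((x, y))
            grid.setdefault((cx, cy), []).append((x, y))
    return kept
```

Points are kept in the order given. Since later points are drawn on top of earlier ones in a scatterplot, passing them in reverse plotting order tends to keep the ones that were actually visible.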

Don't over-complicate things by introducing unnecessary layers of abstraction and additional file formats. You already have the data; work with the data.


I believe the following Stack Overflow questions may be of assistance:

how to remove partial duplicates from a data frame?

Identify duplicate data with a threshold
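A cruder but simpler variant, in the spirit of rounding coordinates to find near-duplicates, snaps each point to a grid and keeps only the first point per cell. This is an assumed adaptation of that idea, not code taken from the linked questions; `dedupe_by_rounding` and `cell` are hypothetical names, and the sketch is again in Python for illustration only.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def dedupe_by_rounding(points: List[Point], cell: float) -> List[Point]:
    """Keep the first point that falls into each `cell`-sized grid square.
    Hypothetical helper, for illustration."""
    if cell <= 0:
        raise ValueError("cell must be positive")
    seen: Set[Tuple[int, int]] = set()
    kept: List[Point] = []
    for x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))  # grid square id
        if key not in seen:
            seen.add(key)
            kept.append((x, y))
    return kept
```

The trade-off is that two points just either side of a cell edge both survive even when they are very close, which a true distance check would catch.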


Lastly, you may want to consider using a heat map, if applicable: it can show the same information (colour representing the density of points in each area) without having to render every single data point individually.
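The heat-map idea boils down to binning: count how many points fall into each grid cell, then draw one coloured tile per non-empty cell instead of one marker per point. A minimal Python sketch with hypothetical names (`bin_counts`, `cell`); if you stay in R, ggplot2's `geom_bin2d()` does this binning and drawing for you.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bin_counts(points: List[Point], cell: float) -> Dict[Tuple[int, int], int]:
    """Count points per `cell`-sized grid square; each entry would become a
    single heat-map tile. Hypothetical helper, for illustration."""
    return dict(Counter((math.floor(x / cell), math.floor(y / cell)) for x, y in points))
```

However many points there are, the output has at most one tile per cell, so the number of drawn objects is bounded by the grid size rather than the data size.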

  • There are no wrong questions to ask and your "answer" is just patronizing and not constructive.
    – user50105
    Commented Apr 17, 2013 at 2:29
  • @gojira As someone working with a dataset in R, don't you think the easiest way would be to simply generate a scatterplot with fewer points? That would literally solve every issue you outlined in your question. I believe my answer provides an optimal solution to what you want. If you're unwilling to at least consider such a solution, then I'd argue that this question is not constructive as it sits. Commented Apr 17, 2013 at 3:06
  • This wouldn't work for GWAS Manhattan plots: every point has its own unique x and y position, so there are no duplicates.
    – zx8754
    Commented Mar 28, 2018 at 21:26
