All Questions

0 votes
1 answer
41 views

File Size increases on basic pdf -> raster image -> pdf roundtrip with `pdfimages` + `magick`

Why can't I make this simple pdf → raster image → pdf round-trip file-size-stable? $ # Get original file (8KB, just extracted from my scanner). $ curl "https://nextcloud.mbb.cnrs....
iago-lito
  • 366
0 votes
2 answers
123 views

PDF Composed of Images of Text - How to Convert to text file?

I have a PDF composed of many scanned pages. An example snippet of the text is shown below (do not worry about privacy, as this is a publicly available document). As you can see, it is very difficult ...
Benyamin
  • 101
1 vote
0 answers
28 views

Remove watermark from pdf of images [duplicate]

I have a pdf consisting of page scans, and they have an annoying watermark that appears as a grey image in the middle of pages, making reading difficult. Since the pdf is made of image files there ...
jsb
  • 119
0 votes
0 answers
58 views

How to pre-process the background of this image in order to OCR the table properly?

I have to OCR a book of tables consisting of 500+ pages, but the maker stamped every page. Can anybody help me with how I should process it so that it can be recognized in, say, FineReader? ...
off-signer
3 votes
1 answer
946 views

When a PDF file only contains a scanned image, is it just a JPG image inside a PDF container?

Many scanners can scan a page into a PDF file. When this is done, is the PDF file really just a container that contains a single image? Is that image typically a JPG image, a PDF image, or a ...
End Antisemitic Hate
1 vote
1 answer
341 views

How to extract image rotation from a PDF

I'm extracting all the images from a bunch of scanned PDFs using pdfimages, in order to process and repackage them. The problem is that some images are rotated 90° (either CW or CCW) and others are ...
Tobia
  • 378
1 vote
1 answer
2k views

reducing the size of PDF file of scanned images

I downloaded this PDF file from a website; it is 350 KB in size with 20 pages. All pages are scanned images. I extracted the images using Adobe Acrobat Pro which are 1.32 MB in size collectively (...
living being
  • 1,086