
On Ubuntu 16.04, `pdfimages -all` produces image files whose combined size is greater than that of the PDF they were extracted from.

Is there any explanation for this? How can I extract image files that are no larger than the space they occupy in the PDF, without compromising picture quality?
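
One way to see where the extra size comes from, assuming `poppler-utils` is installed (it provides `pdfimages`), is to list the embedded images and their encodings before extracting; `input.pdf` is a placeholder:

```bash
# List every embedded image with its encoding, dimensions and stored size
# (the "enc" column shows jpeg, jp2, ccitt, image, ...).
pdfimages -list input.pdf

# Extract images in their native format where possible (JPEG stays JPEG,
# JPEG2000 stays JP2) instead of converting everything to PNM.
pdfimages -all input.pdf img
```

If `-list` reports images with the plain `image` encoding, `-all` writes them out as lossless PNG files, which can be larger than the compressed streams inside the PDF.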

Note: I've tried an approach that uses the `pdftohtml` command (Extracting embedded images from a PDF), but the files don't seem to allow it because of some kind of permission restriction on extracting text (I get the error: `Permission Error: Copying of text from this document is not allowed.`).
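
To check whether this is a restriction set in the document itself rather than a `pdftohtml` problem, the permission flags can be inspected with `pdfinfo` (also part of `poppler-utils`); `input.pdf` is a placeholder:

```bash
# A restricted PDF reports something like
# "Encrypted: yes (print:yes copy:no change:no ...)".
pdfinfo input.pdf | grep -i '^Encrypted'
```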

    It may be because the images are represented within the PDF as *vector graphics*, whereas `pdfimages` outputs *raster images* such as JPEG. It is possible to extract vector graphics directly (to SVG or EPS, for example) using programs such as `inkscape`; however, it may not be easy to automate – steeldriver May 23 '16 at 02:06

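Following up on that comment: a minimal sketch of how per-page vector extraction could be automated with `pdftocairo` (also from `poppler-utils`) instead of `inkscape`; `input.pdf` and the output names are placeholders:

```bash
#!/bin/sh
# Export each page of the PDF as SVG so vector graphics stay vector
# instead of being rasterised.
pdf=input.pdf
pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
i=1
while [ "$i" -le "$pages" ]; do
    pdftocairo -svg -f "$i" -l "$i" "$pdf" "page-$i.svg"
    i=$((i + 1))
done
```

The resulting SVG files contain the whole page (text included), so the wanted graphics may still need to be cropped out afterwards.
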
0 Answers