
On Ubuntu 16.04, `pdfimages -all` produces image files whose combined size is greater than that of the PDF they were extracted from.

Is there any explanation for this? How can I extract image files that are no larger than the space they occupy in the PDF, without compromising picture quality?
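
One way to see where the extra size comes from, assuming `poppler-utils` is installed (it provides `pdfimages`), is to list the embedded images and their encodings before extracting; `input.pdf` is a placeholder:

```bash
# List every embedded image with its encoding, dimensions and stored size
# (the "enc" column shows jpeg, jp2, ccitt, image, ...).
pdfimages -list input.pdf

# Extract images in their native format where possible (JPEG stays JPEG,
# JPEG2000 stays JP2) instead of converting everything to PNM.
pdfimages -all input.pdf img
```

If `-list` reports images with the plain `image` encoding, `-all` writes them out as lossless PNG files, which can be larger than the compressed streams inside the PDF.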

Note: I've tried an approach that uses the `pdftohtml` command (Extracting embedded images from a PDF), but the files don't seem to allow it because of some kind of permission restriction on extracting text (I get the error: `Permission Error: Copying of text from this document is not allowed.`).
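
To check whether this is a restriction set in the document itself rather than a `pdftohtml` problem, the permission flags can be inspected with `pdfinfo` (also part of `poppler-utils`); `input.pdf` is a placeholder:

```bash
# A restricted PDF reports something like
# "Encrypted: yes (print:yes copy:no change:no ...)".
pdfinfo input.pdf | grep -i '^Encrypted'
```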

    It may be because the images are represented within the PDF as *vector graphics*, whereas `pdfimages` outputs *raster images* such as JPEG. It is possible to extract vector graphics directly (to SVG or EPS, for example) using programs such as `inkscape`; however, it may not be easy to automate – steeldriver May 23 '16 at 02:06

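Following up on that comment: a minimal sketch of how per-page vector extraction could be automated with `pdftocairo` (also from `poppler-utils`) instead of `inkscape`; `input.pdf` and the output names are placeholders:

```bash
#!/bin/sh
# Export each page of the PDF as SVG so vector graphics stay vector
# instead of being rasterised.
pdf=input.pdf
pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
i=1
while [ "$i" -le "$pages" ]; do
    pdftocairo -svg -f "$i" -l "$i" "$pdf" "page-$i.svg"
    i=$((i + 1))
done
```

The resulting SVG files contain the whole page (text included), so the wanted graphics may still need to be cropped out afterwards.
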
0 Answers