
I'm running GPL Ghostscript 9.27. I've compressed some PDF files on Linux with the gs command, apparently successfully (based on what I've seen).

But after checking some particular PDFs, I've noticed color changes on some pages in at least two PDFs. For example, some files now have pages where the words of entire paragraphs have turned red, while most pages are unchanged from the original, with the text in black. I'm talking about text stored as vector glyphs, not as images.

Moreover, another PDF with some black-and-white (grayscale) images still has its vector text in the original black after compression, but all of its images have turned red and black. So I suppose the problem here is different, because it affects all the raster images. The file with the raster problem doesn't have the vector problem.

[Image: top, part of the file with affected images, after (left) and before (right) compression; bottom left, another file after compression with its vector text in red; bottom right, another page whose vector text is black, as in the original.]

The command I used keeps the output PDF version the same as the original (set with -dCompatibilityLevel). It looks like this:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=original_version -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH -sOutputFile=file_out file_in
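In the command above, original_version stands for the input's PDF version. Here is a minimal sketch for filling it in automatically. It assumes the version can be read from the `%PDF-x.y` header at the start of the file. A /Version entry in the document catalog can override the header, and this sketch ignores that case. `pdf_version` and `compress_keep_version` are hypothetical helper names.

```shell
# Hypothetical helper: print the version from the "%PDF-x.y" header,
# e.g. "1.5". Assumes the header is at the very start of the file.
pdf_version() {
    head -c 16 "$1" | LC_ALL=C sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}

# Hypothetical wrapper: the command from the question, with the
# placeholder replaced by the detected version.
compress_keep_version() {
    ver=$(pdf_version "$1")
    if [ -z "$ver" ]; then
        echo "no %PDF- header found in $1" >&2
        return 1
    fi
    gs -sDEVICE=pdfwrite -dCompatibilityLevel="$ver" -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dQUIET -dSAFER -dBATCH -sOutputFile="$2" "$1"
}
```

Usage would be `compress_keep_version file_in file_out`. Note that this reproduces the question's command unchanged, so it still has the color problem described here.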

The program doesn't report any errors. I tried removing the -dQUIET or the -dSAFER option, but nothing changed. I had also assumed that -dSAFER would prevent gs from altering the content of the PDF (in fact it only restricts file operations for security, such as deleting or renaming files, and does not control how pdfwrite rewrites the document). The command seemed to work fine for me until now.

So, how can I avoid these unexpected changes with Ghostscript? What is causing this problem?


I ran pdfimages -list pdf_file on the PDF with the raster problem and found that all of its images use the "Separation" image color space (color column = sep). The same command on various other PDFs, with only the text problem or with no problem at all, shows different image color spaces (rgb, cmyk, gray, etc.). I don't know whether this difference is related to the raster problem in gs.
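To run this check across several files, a small filter over the `pdfimages -list` output (poppler-utils) can tally the image color spaces. This sketch assumes the usual listing layout: two header lines, then one row per image with the color space in the sixth column. `image_colorspaces` is a hypothetical helper name.

```shell
# Hypothetical filter: count images per color space in `pdfimages -list`
# output read from stdin. Skips the two header lines; column 6 = color.
image_colorspaces() {
    awk 'NR > 2 { count[$6]++ } END { for (c in count) print c, count[c] }' | sort
}
```

For example, `pdfimages -list pdf_file | image_colorspaces` prints one line per color space, such as `sep 12`, which makes files with Separation images easy to spot.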


Thanks to Yorik for asking for explanations and clarifications.

bonzo
  • This may be a problem with some of the images being nonstandard, having palette definitions, etc. that gs is unsure of, or using an old fax-based image format, etc. You might try looking carefully at the image properties in the original document (if gs can emit "preflight info", try that). It may also be that the images are broken in a way that the original PDF does not highlight. As for the text, it is unclear whether the text is part of an image (raster) or text data displayed as vector glyphs. If raster, then it may be the same image or suffer from the same problem as the pictures in the document. – Yorik Jul 19 '22 at 17:09
  • @Yorik Regarding the text, it is not part of an image; it is vector text. The problem affects just some paragraphs in one PDF. In another PDF with two pages, it affects all the vectors that were black, both text and lines etc. Where this problem occurs, the images aren't affected. The image problem affects another entire PDF (just one), but not the vectors in it (text etc.). I searched the gs online manual but didn't find "preflight info". – bonzo Jul 19 '22 at 20:25
  • @Yorik Running `pdfimages -list pdf_file` on the PDF with the raster problem, I found that all images have "sep" in the color column, i.e. their image color space is of type "Separation". This is different from the other PDFs. Maybe it is relevant. I've edited the question to include this information. – bonzo Jul 19 '22 at 20:25
  • I am not well-versed in PDF construction, but I think that "separation" color space implies a spot color (i.e. not one of the inks of the CMYK model), possibly 1-bit per pixel. Not sure what this means for a solution, but I am assuming `gs` chokes on this or there is a bug in how it handles this. Guessing that it is trying to make a 3- or 4-color image using null data for several channels. – Yorik Jul 19 '22 at 21:09
  • clues/informational discussion(s): https://stackoverflow.com/questions/31704578/ https://stackoverflow.com/questions/51021506/ – Yorik Jul 19 '22 at 21:11
  • @Yorik I've found a solution for both problems. Since the problem concerns color, I searched the Ghostscript manual for color options for PDF output and found _ColorConversionStrategy_. After adding the option `-sColorConversionStrategy=LeaveColorUnchanged`, the image colors in the output file stay the same as in the original. I tried this option on the PDF with the vector problem and it fixed that too; I didn't expect it to affect vector text. I'm happy, thanks for the interest. I'm going to add an answer. – bonzo Jul 20 '22 at 18:32

1 Answer


I finally found one solution to both problems (raster and vector). I searched the Ghostscript online manual for color options for PDF output and found the switch ColorConversionStrategy, which (I believe) sets the color space the output is converted to. The choices are: LeaveColorUnchanged, Gray, RGB, CMYK or UseDeviceIndependentColor. Then I tried this command on the PDF whose images had changed:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=original_version -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH -sColorConversionStrategy=LeaveColorUnchanged -sOutputFile=file_out file_in

and the output now has the correct original colors. The other values gave these results (the original images were grayscale): with Gray the images stayed like the original; with RGB they turned red and black; with CMYK they turned white and cyan; with UseDeviceIndependentColor they stayed the same (but the manual says it is less compatible with ps).
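To repeat this comparison on another file, you could write one output file per ColorConversionStrategy value and open them side by side. Here is a minimal sketch. `try_strategies` and the `<name>_<strategy>.pdf` output naming are hypothetical. -dCompatibilityLevel is left out for brevity. Setting DRY_RUN=1 only prints the gs commands without running them.

```shell
# Hypothetical helper: one output per ColorConversionStrategy value.
# DRY_RUN=1 prints the gs commands instead of running them.
try_strategies() {
    in=$1
    base=${in%.pdf}   # "scan.pdf" -> "scan"
    for s in LeaveColorUnchanged Gray RGB CMYK UseDeviceIndependentColor; do
        # Reuse the positional parameters to hold the full argument list.
        set -- gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET \
            -dSAFER -dBATCH -sColorConversionStrategy="$s" \
            -sOutputFile="${base}_$s.pdf" "$in"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$@"
        else
            "$@" || echo "gs failed for strategy $s" >&2
        fi
    done
}
```

For example, `try_strategies scan.pdf` would produce scan_LeaveColorUnchanged.pdf, scan_Gray.pdf and so on, one per strategy.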

I tried the same command on all the PDFs affected by the vector text color change, and it fixed the problem every time. Switching back to RGB brought the problem back. I'm no expert, but this was unexpected, because I thought color spaces only applied to raster images.

So I think the best option in these cases is -sColorConversionStrategy=LeaveColorUnchanged.
I think it is a good idea to include this option as a rule whenever you use the /screen or /ebook setting (at least in version 9.27; see the P.S. below).

The issue was that the /ebook setting automatically sets ColorConversionStrategy to RGB, which in my case causes the problem. Setting the switch explicitly to LeaveColorUnchanged overrides this and keeps the original colors.
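To apply the fix consistently, you can wrap the working command in a function, since /ebook otherwise switches the strategy to RGB. This sketch reuses the flag set from the command above, without -dCompatibilityLevel. `compress_pdf` and its optional preset argument are hypothetical. DRY_RUN=1 only prints the command.

```shell
# Hypothetical wrapper around the working command from this answer.
# Usage: compress_pdf INPUT OUTPUT [PRESET]   (PRESET defaults to /ebook)
# DRY_RUN=1 prints the gs command instead of running it.
compress_pdf() {
    in=$1 out=$2 preset=${3:-/ebook}
    # LeaveColorUnchanged overrides the RGB conversion that /ebook
    # would otherwise select, keeping the original color spaces.
    set -- gs -sDEVICE=pdfwrite -dPDFSETTINGS="$preset" -dNOPAUSE -dQUIET \
        -dSAFER -dBATCH -sColorConversionStrategy=LeaveColorUnchanged \
        -sOutputFile="$out" "$in"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

A batch run might then look like `for f in *.pdf; do compress_pdf "$f" "small_$f"; done`.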


I don't know the specific cause of this problem, but my first suspect is how images with a Separation color space are handled during compression.
I noticed with pdfimages -list that the "Separation" color space of the input PDF is still present in the output (even though /ebook should have converted it to RGB).
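One way to repeat that check is to list the image color spaces of the input and the output together. This sketch assumes poppler's `pdfimages -list` layout, with two header lines and the color space in column 6. The function names are hypothetical.

```shell
# Hypothetical helpers: count image color spaces via `pdfimages -list`
# (poppler-utils) and show them for an input and an output file.
colorspaces() {
    pdfimages -list "$1" | awk 'NR > 2 { print $6 }' | sort | uniq -c
}

compare_colorspaces() {
    echo "input ($1):"
    colorspaces "$1"
    echo "output ($2):"
    colorspaces "$2"
}
```

For example, `compare_colorspaces file_in file_out` shows whether `sep` entries survive the conversion.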

P.S.: I just discovered that, according to this answer (to a similar problem), there is probably a bug in Ghostscript version 9.27.
My workaround works for me, but the problem shouldn't have happened in the first place. Version 9.50 probably works correctly. The current release is 9.56.1, but I'm running a Debian-based system, so my version is not very up to date.
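To see whether a system is still on an affected release, you can compare the installed version (`gs --version`) against 9.50. This sketch relies on GNU `sort -V` for version ordering. Treating 9.50 as the first good version is only the guess from the P.S. `version_ge` and `check_gs` are hypothetical names.

```shell
# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check: warn when the installed gs predates 9.50.
check_gs() {
    v=$(gs --version) || return 1
    if version_ge "$v" 9.50; then
        echo "Ghostscript $v"
    else
        echo "Ghostscript $v is older than 9.50; consider adding -sColorConversionStrategy=LeaveColorUnchanged" >&2
    fi
}
```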

bonzo