How to remove watermark from pdf using pdftk?

Question

I need to remove an annoying email-address watermark that appears on every page of a public-domain book. I have read the pdftk man page and some examples, but I still cannot figure out how to remove the watermark. I would appreciate any hints.

score 74 · Answer 1 · edited Feb 07 '14 at 20:24

Just a little add-on to Dingo's answer as it did not work for me:

I had to first uncompress the PDF document in order to be able to find the watermark and replace it with sed. The first step involves uncompressing the PDF document using pdftk:

pdftk original.pdf output uncompressed.pdf uncompress

Now uncompressed.pdf can be processed as in Dingo's answer:

sed -e "s/watermarktextstring/ /g" uncompressed.pdf > unwatermarked.pdf

I then repaired and recompressed the document:

pdftk unwatermarked.pdf output fixed.pdf compress
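
For convenience, the three steps can be wrapped in one shell function. This is only a minimal sketch: `remove_text_watermark` is a made-up helper name (not part of pdftk), it assumes pdftk is on your PATH, and it assumes the watermark string contains nothing that sed would treat specially in a pattern (for example `/`, `*`, `[` or `\`).

```shell
#!/bin/sh
# Sketch only: remove_text_watermark is a hypothetical helper, not a pdftk feature.
# Usage: remove_text_watermark INPUT.pdf OUTPUT.pdf 'watermark string'
# Assumption: the watermark string has no characters special to sed patterns.
remove_text_watermark() {
    in_pdf=$1; out_pdf=$2; mark=$3
    tmp_dir=$(mktemp -d) || return 1
    status=1

    # 1. uncompress so the watermark text is visible to sed
    # 2. replace every occurrence of the string with a single space
    # 3. let pdftk repair and recompress the edited file
    if pdftk "$in_pdf" output "$tmp_dir/uncompressed.pdf" uncompress &&
       sed -e "s/$mark/ /g" "$tmp_dir/uncompressed.pdf" > "$tmp_dir/unwatermarked.pdf" &&
       pdftk "$tmp_dir/unwatermarked.pdf" output "$out_pdf" compress
    then
        status=0
    fi

    rm -rf "$tmp_dir"   # clean up intermediate files whether or not it worked
    return "$status"
}
```

It would be called as `remove_text_watermark original.pdf fixed.pdf 'watermarktextstring'`. The intermediate files live in a temporary directory, so the original PDF is never modified.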

edited Feb 07 '14 at 20:24

pabouk - Ukraine stay strong

answered Jan 15 '13 at 17:19

Philippe

You are a life-saver! Thank you!!! :) – johndodo Nov 07 '13 at 11:11
This is really awesome! – qed Jan 29 '14 at 14:59
I took this process, made it slightly fancier, and wrapped it up in a Python script. It is on github [here](https://github.com/agarden/remove-pdf-watermark/tree/master). – Alexander Garden Apr 11 '14 at 04:00
@Alexander Garden It doesn't work, `TypeError: str() takes at most 1 argument (2 given)` when used following the usage advice given – 8bitjunkie Feb 28 '16 at 19:40
@8bitjunkie Can you open a github issue with a full stack trace? – Alexander Garden Feb 29 '16 at 20:33
I was having issues with this approach due to pdftk not being able to open the unwatermarked.pdf file. What did the trick was to replace the watermarktextstring via sed using a replacement string which was just N number of space characters where N is the length of the original watermark. In other words, make sure your uncompressed.pdf and unwatermarked.pdf have the same length – gdecaso Apr 11 '17 at 17:39
+1 I used the sed command `/watermarktextstring/d` instead because my water mark string was interlaced with formatting instructions or typographic hints or something like that. – David Foerster Oct 12 '17 at 16:12
@Philippe The second command gives an error: "sed: RE error: illegal byte sequence", what should I do? – Karlo Mar 31 '18 at 17:46
Since qpdf is the default tool on many distros, [here](https://unix.stackexchange.com/a/17713/29426) is how to uncompress using qpdf. – akhan Nov 15 '18 at 17:39
@Philippe any idea on how to batch remove watermark? – Clain Dsilva Nov 20 '18 at 12:43
Didn't work to remove watermark added by Master PDF Editor. – fccoelho Dec 27 '18 at 12:10
Genius :) Thank you. – Simd Aug 11 '20 at 16:36
@fccoelho - Master PDF Editor inserts its watermark as a stream object, not a line of text. See my add-on to Philippe's answer. Just replace his sed command with my editing steps, taking care to use the correct input filename when you do the repair step. – JohnGH Dec 29 '22 at 16:07
sed command did not work for me, I was getting `sed: RE error: illegal byte sequence`. So I replaced this step with opening the pdf file in vim and running the following command: `:s/watermarktextstring//gc`. – wintermute Apr 07 '23 at 22:54
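
Two of the comments above can be combined into a small variation of the sed step. This is a hedged sketch, not a tested fix for every PDF: `strip_same_length` is a made-up helper name, it assumes an ASCII watermark string with no characters special to sed, and it assumes the "illegal byte sequence" error comes from sed trying to decode binary PDF data as text in the current locale. Setting `LC_ALL=C` makes sed treat the input as plain bytes, and padding the replacement to the watermark's length keeps the file size unchanged, as gdecaso suggests.

```shell
#!/bin/sh
# Sketch only: strip_same_length is a hypothetical helper name.
# Reads a PDF on stdin, writes the edited PDF to stdout.
# Assumption: the watermark is plain ASCII with no sed-special characters.
strip_same_length() {
    mark=$1
    # Build a run of spaces exactly as long as the watermark string,
    # so byte offsets inside the PDF stay where they were.
    pad=$(printf "%${#mark}s" '')
    # LC_ALL=C: process the input byte by byte instead of as locale text.
    LC_ALL=C sed -e "s/$mark/$pad/g"
}
```

Usage would look like `strip_same_length 'watermarktextstring' < uncompressed.pdf > unwatermarked.pdf`, followed by the same pdftk repair step as in the answer.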

score 44 · Accepted Answer · answered Jul 12 '12 at 13:56

This is a very simple task.

Use sed:

sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf

Afterwards, be sure to repair the resulting PDF:

pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

All in one command:

sed -e "s/watermarktextstring/ /g" <input.pdf >unwatermarked.pdf && pdftk unwatermarked.pdf output fixed.pdf && mv fixed.pdf unwatermarked.pdf

A text watermark is nothing more than a piece of text between two tags inside the PDF's compressed code.

answered Jul 12 '12 at 13:56

Dingo

Fantastic! Worked like a charm. Please just change the email address to a fictitious one. I don't want the guy who spoiled the book to be targeted by spammers, especially as he is probably the one who made the PDF. Many thanks. – hnns Jul 12 '12 at 14:17
done! Changed specific string with a generic string – Jul 12 '12 at 14:28
Does anyone know how to modify this solution to get rid of a link watermark? I got rid of the text, but there's still a small square left where the text used to be. – 425nesp Oct 20 '13 at 07:43
pdftk crashed when I ran this. – Cerin Sep 03 '18 at 11:28
@Dingo how do I batch process it? I mean multiple files – Clain Dsilva Nov 20 '18 at 13:11
Multiple files having same text string to replace or different strings for each file? – Dingo Nov 20 '18 at 15:15
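
On the batch question: assuming every file carries the same watermark string, the sed and pdftk steps from the answer can be run in a loop. The following is a rough sketch rather than a finished tool. `batch_unwatermark` and the directory arguments are made-up names, pdftk is assumed to be on the PATH, and the string is assumed to contain no characters special to sed.

```shell
#!/bin/sh
# Sketch only: batch_unwatermark is a hypothetical helper name.
# Usage: batch_unwatermark SOURCE_DIR OUTPUT_DIR 'watermark string'
# Assumption: all PDFs in SOURCE_DIR contain the same watermark string.
batch_unwatermark() {
    src_dir=$1; dst_dir=$2; mark=$3
    mkdir -p "$dst_dir" || return 1

    for pdf in "$src_dir"/*.pdf; do
        [ -e "$pdf" ] || continue      # glob matched nothing, skip
        name=$(basename "$pdf")
        tmp=$(mktemp) || return 1

        # Same substitution as in the answer, then the pdftk repair pass.
        if sed -e "s/$mark/ /g" "$pdf" > "$tmp" &&
           pdftk "$tmp" output "$dst_dir/$name"
        then
            echo "ok: $name"
        else
            echo "failed: $name" >&2
        fi
        rm -f "$tmp"
    done
}
```

For example, `batch_unwatermark ./books ./books-clean 'watermarktextstring'` writes repaired copies into `./books-clean` and leaves the originals alone. If each file has a different string, the loop would need a per-file lookup instead of a single argument.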

score 2 · Answer 3 · answered Dec 29 '22 at 15:19

Another add-on to Philippe's add-on to Dingo's answer...

The watermark I needed to remove was a stream object (which is a multi-line block of code), not a single line, so a single line sed command wasn't going to work for me.

I needed to use a text editor to find and remove it.

I first used Philippe's solution to uncompress the PDF.

Then opening the uncompressed.pdf in my favourite text editor, I found a block of text more than 50 lines long which I could see was obviously the code for the watermark.

The watermark was included in the document as a PDF stream object. ** (see below)

The lines defining the stream object that I needed to remove started with a line containing only:

<num> 0 obj

where <num> was a number at the start of the line identifying the specific object.

I needed to delete this line and everything from it down to and including the first instance of

endstream
endobj

that followed that obj line. i.e. the whole stream object definition.

The endobj line was followed by the next <num> 0 obj 2 more lines down.

It was easy for me to see which stream object was the watermark code because it kindly included the word "Watermark" :-)

Yours may well not have such helpful text, but if you are patient:

1. Back up your original uncompressed PDF.
2. Make a temporary copy of the uncompressed PDF.
3. Find and remove a stream object from the temporary copy.
4. Save your changes to the temporary copy.
5. Open the temporary copy in a PDF viewer.
6. Check whether the stream object you just removed was the watermark.

If it wasn't, go back to step 2 and repeat, removing a different object each time, until you've removed the watermark object.
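
If you would rather not delete the block by hand, the same cut can be sketched with awk once you know the object number. Treat this as an illustration under assumptions, not a robust PDF editor: `drop_pdf_object` is a made-up helper name, the object is assumed to start on a line containing exactly `<num> 0 obj` and end on a line containing exactly `endobj` (as in the uncompressed file described above), and your awk is assumed to pass the remaining binary data through unchanged.

```shell
#!/bin/sh
# Sketch only: drop_pdf_object is a hypothetical helper name.
# Reads an uncompressed PDF on stdin, writes it to stdout without object NUM.
# Assumption: the object starts with the exact line "NUM 0 obj" and
# ends with the exact line "endobj".
drop_pdf_object() {
    obj_num=$1
    LC_ALL=C awk -v start="$obj_num 0 obj" '
        $0 == start                { skipping = 1 }  # first line of the object
        !skipping                  { print }         # keep everything else
        skipping && $0 == "endobj" { skipping = 0 }  # last line of the object
    '
}
```

For example, `drop_pdf_object 12 < uncompressed.pdf > temp-copy.pdf` would drop object 12, after which the temporary copy can be checked in a viewer and repaired with pdftk as in Philippe's answer.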

** I learned about stream objects, including seeing examples by searching for "PDF object stream" on the web. https://blog.didierstevens.com/2008/05/19/pdf-stream-objects/ has a great summary, and "Chapter 1. PDF Syntax" of "Developing with PDF" by Leonard Rosenthol which is available to view on O'Reilly's website goes into more detail.

score -3 · Answer 4 · edited Nov 13 '20 at 21:49

To remove a watermark from a PDF:

1. Open the PDF in Notepad++ or TextPad.
2. Search for the watermark text and use the "Find and Replace" option to replace it with nothing (blank).
3. Save the file.
4. Open it in Adobe Reader. It will show an error like "file damaged, repair needed".
5. Exit. You will be prompted to save the file.
6. Save it.

edited Nov 13 '20 at 21:49

Madhubala


answered Jan 24 '16 at 11:54

user549273

Is this a general solution? What is www.it-ebooks.info? – Karlo Mar 31 '18 at 17:23
