145

I need a command line tool for editing metadata of pdf-files.

I'm using an Aiptek MyNote Premium tablet to write my notes and minutes. I import them later and convert them to PDF automatically with a simple script that uses Inkscape and Ghostscript.

Is there a command line tool that can add categories to a PDF's metadata, so I can find the PDF later by category (e.g. with gnome-do)?

Update: I tried the solution with pdftk and it works, but it seems that gnome-do ignores PDF metadata. Is there a way to get gnome-do to use it?

bdr529

8 Answers

175

Give exiftool a try; it is available from the package libimage-exiftool-perl in the repositories.

For example, if you have a PDF file called drawing.pdf and you want to update its metadata, run exiftool like this:

exiftool -Title="This is the Title" -Author="Happy Man" -Subject="PDF Metadata" drawing.pdf

For some reason, the Subject entered ends up in the Keywords field of the PDF's metadata. In some cases this is not a problem, and may even be desirable, but it can cause confusion: Evince and the Nautilus metadata previewer do not show this, but Adobe Acrobat viewer and PDF-XChange viewer do.

The program creates a backup of the original file unless you use the -overwrite_original switch. This means a duplicate will exist in the folder that holds the updated PDF. In the example above, a file named drawing.pdf_original will be created.

Use the overwrite switch at your own risk. My suggestion is not to use it, and instead to script something that moves the backup file to a better location, just in case.
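
The suggestion above (keep the backup, but move it out of the way) could be scripted roughly like this. It is only a sketch: the function name `tag_pdf` and the default backup directory are made up for illustration, and it assumes exiftool leaves its usual `<file>_original` copy next to the edited file. It writes the Keywords field, since that is the field the question's "categories" would most naturally go into.

```shell
# Hypothetical helper: set the Keywords field of one PDF with exiftool and
# move the automatic backup into a separate directory.
# Usage: tag_pdf "notes, minutes" drawing.pdf [backup-dir]
tag_pdf() {
    local keywords=$1
    local pdf=$2
    local backup_dir=${3:-$HOME/pdf-backups}   # assumed default location

    # Without -overwrite_original, exiftool keeps "<file>_original".
    exiftool -Keywords="$keywords" "$pdf" || return 1

    # Move the backup out of the working folder if exiftool created one.
    if [ -f "${pdf}_original" ]; then
        mkdir -p "$backup_dir" &&
            mv "${pdf}_original" "$backup_dir/"
    fi
}
```

Running `exiftool drawing.pdf` afterwards lists the metadata, so you can check which field the keywords actually ended up in.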

waldyrious
Sabacon
  • This works fine, thank you. It's curious how many tools come along with Ubuntu / Linux. I wish to know more about all this stuff :-). Meanwhile I used pdfmod after importing my files. This is a nice little application. – bdr529 May 04 '11 at 09:05
  • 38
    Note that: *"[All metadata edits are reversible](http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/PDF.html). While this would normally be considered an advantage, it is a **potential security problem** because old information is **never actually deleted** from the file."* – nutty about natty Aug 12 '14 at 07:11
  • 10
    @nuttyaboutnatty if you want to purge all remnant and unused metadata entries, you can linearize the PDF file right after processing it with exiftool. This is described in more detail in [this Github gist](https://gist.github.com/hubgit/6078384). – Glutanimate Aug 13 '14 at 23:41
  • @Glutanimate Nice link, but not 100 % convincing/authoritative IMHO. Eg, why does it cite `exiftool` without mentioning the above limitation? Because it's just a "gist" and doesn't claim to be bullet-proof / hold water? Could be that linearization does indeed go in the direction of anon., but does it go *all the way*? – nutty about natty Aug 14 '14 at 06:56
  • 13
    @nuttyaboutnatty Well, of course it's not an authoritative source but that's only because nobody ever took the time to write one. However, I can assure that the method described by the author works. Try it out yourself: 1.) Take a PDF that has some tags and "delete" all metadata with `exiftool -overwrite_original -all:all="" file.pdf`; 2.) Use `exiftool -PDF-update:all= file.pdf` to confirm that there is still old metadata present; 3.) linearize the file with `qpdf --linearize file.pdf`; 4.) Check again, like you did in 2.); all metadata should be gone; – Glutanimate Aug 14 '14 at 07:54
  • 5
    5.) confirm that the file has been purged of all metadata by looking at the PDF dictionary (`pdfinfo -meta file.pdf`) – Glutanimate Aug 14 '14 at 07:55
  • Thank you. It also worked like a charm on my Mac with macOS 10.12 and "homebrew". – Cesco Feb 09 '18 at 11:13
  • 2
    Works perfectly. I regularly want to copy the metadata from one PDF to another, in which case `exiftool -overwrite_original -tagsFromFile ` is what I need (the option `-overwrite_original` overwrites the original ``). – AstroFloyd Apr 22 '18 at 15:57
  • 1
    Do note that pretty important bit from the `exiftool` manual : *"3) Changes to PDF files by ExifTool are reversible (by deleting the update with "-PDF-update:all=") because the original information is never actually deleted from the file. So ExifTool alone may not be used to securely edit metadata in PDF files."* – Alex Aug 14 '19 at 08:12
  • A shell-specific observation, but when installed from apt, the Bash _file name completion_ does not work if options are written on the command line, e.g. `exiftool -Title="Smthg" filenam[tab]`. Filename completion only works with: `exiftool filenam[tab]`. There is no completion script for exiftool in `/usr/share/bash-completion/`, so this annoying problem could (should) be fixed there. – PlasmaBinturong Jan 29 '21 at 17:03
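
The purge procedure described in the comments above (delete the metadata with exiftool, then linearize with qpdf) could be wrapped up roughly as follows. This is a sketch under assumptions: `scrub_pdf` is a made-up name, the work is done on a temporary copy so the input stays untouched, and qpdf is given an explicit output file. Whether linearizing really removes every trace is the claim made in the comments, not something this sketch verifies.

```shell
# Hypothetical helper: strip metadata from a copy of a PDF, then rewrite it
# with qpdf so the superseded objects are not carried over.
# Usage: scrub_pdf input.pdf output.pdf
scrub_pdf() {
    local src=$1 dst=$2 workdir work status
    workdir=$(mktemp -d) || return 1
    work=$workdir/work.pdf
    cp "$src" "$work" || { rm -rf "$workdir"; return 1; }

    # 1) Delete all metadata. exiftool does this as an update that can
    #    still be reverted, so the old values remain inside the file.
    exiftool -overwrite_original -all:all= "$work" || { rm -rf "$workdir"; return 1; }

    # 2) Rewrite the file; qpdf needs a separate output file name.
    qpdf --linearize "$work" "$dst"
    status=$?

    rm -rf "$workdir"
    return "$status"
}
```

Afterwards, `pdfinfo -meta output.pdf` (as suggested in the comments) shows whether any metadata is left.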
23

You can edit PDF metadata using pdftk. Check out the update_info parameter (or update_info_utf8 if you need accented characters). Below is an example data file:

InfoKey: Title
InfoValue: Mt-Djing: multitouch DJ table
InfoKey: Subject
InfoValue: Dissertation for Master degree
InfoKey: Keywords
InfoValue: DJing, NUI, multitouch, user-centered design
InfoKey: Author
InfoValue: Pedro Lopes

(Source)
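
To set a single field without editing a data file by hand, a small wrapper can generate the data file and pass it to update_info. The sketch below is hedged in several ways: `info_record` and `set_pdf_info` are made-up names, the `InfoBegin` line follows what newer pdftk versions print in dump_data (older versions may not expect it), and it assumes pdftk leaves keys that are not listed in the data file untouched, which is worth checking with dump_data afterwards.

```shell
# Print one record in the style of pdftk's dump_data output.
# The InfoBegin line is an assumption based on newer pdftk versions.
info_record() {
    printf 'InfoBegin\nInfoKey: %s\nInfoValue: %s\n' "$1" "$2"
}

# Hypothetical wrapper: set one Info field and write a new PDF.
# Usage: set_pdf_info Keywords "notes, minutes" in.pdf out.pdf
set_pdf_info() {
    local data status
    data=$(mktemp) || return 1
    info_record "$1" "$2" > "$data"

    # update_info reads the data file; the result goes to a new PDF.
    pdftk "$3" update_info "$data" output "$4"
    status=$?

    rm -f "$data"
    return "$status"
}
```

For values with accented characters, update_info_utf8 would be the operation to use instead, as mentioned above.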

waldyrious
Olli
  • 1
    OK, this means I have to export the metadata to a text file, edit it, and reimport the text file. Is there a way to set a single metadata field directly from the command line? – bdr529 Feb 22 '11 at 06:48
  • There may be, but I couldn't find it. – Olli Feb 22 '11 at 07:26
  • 1
    `pdftk` seems to Unicode characters in the metadata. – Mechanical snail Apr 21 '13 at 21:06
  • 1
    I had some problem using `pdftk` on new pdfs (newer versions are encrypted via AESV2). Seems like it's discontinued. `exiftool` was working better. – s1lv3r Aug 26 '13 at 14:58
  • @s1lv3r but exiftool doesn't allow custom tags, does it? I have a problem where pdftk "hangs" on dump_data for a PDF, but after using exiftool on it and adding an 'author' tag, it works. I should get a PDF test program – JorgeeFG Apr 14 '14 at 20:12
  • 8
    to use pdftk, what you need to do is: 1) `pdftk book.pdf dump_data output report.txt` 2) edit report.txt 3) `pdftk book.pdf update_info report.txt output bookcopy.pdf` – craq Oct 24 '17 at 03:02
  • You can add step (4) to @craq's above: so as not to mess with the file creation info: `touch -r book.pdf bookcopy.pdf`. This will give the new file the old file's creation/modification dates. However, I note that after @craq's 3 steps, my resulting file is 20% larger than the original one. – CPBL May 17 '18 at 16:56
  • The link specified for 'Source', is no longer valid; it is 404'd. – Tano Fotang Oct 19 '18 at 19:49
  • Unfortunately, this functionality is [broken as of pdftk 2.02](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788660). – Pont Nov 21 '18 at 20:06
  • @craq your comment should be an answer as it has the clearest instructions. I would upvote it. – frederickjh Feb 01 '21 at 13:27
  • 1
    @Pont seems to be working again in `pdftk` version 3.0.9. – frederickjh Feb 01 '21 at 13:35
11

Using Ghostscript

Install ghostscript with:

$ sudo apt install ghostscript

Create a file named pdfmarks with similar content:

[ /Title (Document title)
  /Author (Author name)
  /Subject (Subject description)
  /Keywords (comma, separated, keywords)
  /ModDate (D:20061204092842)
  /CreationDate (D:20061204092842)
  /Creator (application name or creator note)
  /Producer (PDF producer name or note)
  /DOCINFO pdfmark

then combine this pdfmarks file with a PDF, PS or EPS input file:

gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf original.pdf pdfmarks

Source: http://milan.kupcevic.net/ghostscript-ps-pdf/
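
Since the question is about automating this from a script, the pdfmarks file can also be generated rather than written by hand. The sketch below uses two made-up helper names, `ps_escape` and `write_pdfmarks`, and only handles plain ASCII values: it escapes backslashes and parentheses so they cannot break the PostScript string syntax, and nothing more.

```shell
# Escape backslashes and parentheses for use inside a PostScript (...) string.
ps_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/(/\\(/g' -e 's/)/\\)/g'
}

# Hypothetical generator: print a pdfmarks file with title, author and keywords.
# Usage: write_pdfmarks "Title" "Author" "comma, separated, keywords" > pdfmarks
write_pdfmarks() {
    printf '[ /Title (%s)\n'    "$(ps_escape "$1")"
    printf '  /Author (%s)\n'   "$(ps_escape "$2")"
    printf '  /Keywords (%s)\n' "$(ps_escape "$3")"
    printf '  /DOCINFO pdfmark\n'
}
```

The generated file is then passed to the same gs command shown above, after the input PDF.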

muru
Serge Stroobandt
9

To elaborate on the pdftk method, which is nice because it shows you everything that is being set while letting you change anything you like, here is a function (for your .bashrc or other aliases file) that does it with one command. It dumps the metadata to a file, opens your favourite editor on that file, writes a new version of the PDF with your changes applied, and sets the modification time of the new PDF to match the original. To use it, after re-sourcing your .bashrc file, just type

editPDFmetadata myfile.pdf

Here's the function:

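editPDFmetadata() {
    # The edited copy and the metadata dump are written next to the original PDF
    local output="${1}-new.pdf"
    local metadata="${1}-report.txt"
    pdftk "${1}" dump_data output "$metadata"
    # Fall back to vi if $EDITOR is not set
    ${EDITOR:-vi} "$metadata"
    pdftk "${1}" update_info "$metadata" output "$output"
    # Give the new file the original's access and modification times
    touch -r "${1}" "$output"
}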

Simply place the definition above into the .bashrc file in your home folder, then open a new terminal and it will be ready to use.

CPBL
  • 1
    This is excellent, but I recommend quoting your variables when using them (e.g.: `pdftk "${1}" dump_data ...`) in case of PDF files with spaces or other special characters in their filename. – Niayesh Isky Mar 16 '20 at 02:55
  • @NiayeshIsky Thanks! Done. Hopefully the filename does not have quotes in it? – CPBL Mar 17 '20 at 03:03
  • 1
    Thanks! About quotes: Not in my case, at least :) (Sorry - I just noticed there is one `$METADATA` that is still unquoted, on the second `pdftk` line. I don't know if AU will allow such a small edit though.) – Niayesh Isky Mar 18 '20 at 05:36
6

I needed to blank out the Author field in a PDF exported from LibreOffice. None of the solutions listed above worked for me, so I used hexedit and overwrote the Author field. Blunt instrument but effective!

In detail:

  1. Run:

    $ hexedit file.pdf
    
  2. Tab to switch to ASCII.

  3. Ctrl+S to search for "Author".

  4. Skip the <FEFF at the start of the field.

  5. Write 0 over all characters (except I preserved three 0x03 characters... YMMV) up to the closing >.

  6. Ctrl+X to save and exit.
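
Before opening the hex editor, it can help to know where the field sits in the file. The following is a minimal sketch, assuming GNU grep and assuming the Author key is stored uncompressed in the PDF (if it is inside a compressed stream, nothing will be found); `find_author_offsets` is a made-up name.

```shell
# Print the byte offset of every "/Author" key in a PDF, one "offset:match"
# per line. -a treats the binary file as text, -b prints byte offsets and
# -o prints only the matching part (GNU grep options).
find_author_offsets() {
    grep -a -b -o '/Author' "$1"
}
```

The offsets are decimal; `printf '%X\n' <offset>` converts one to hexadecimal if the editor displays positions that way.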

BeastOfCaerbannog
Jonathan
5

I have extensively tested pdftk and exiftool, the latter both at the command line and through a graphical front end. I tested them on small, medium-sized and very large PDF documents and found issues with the largest and most complex ones. In my experience, pdftk and exiftool work reliably only for small PDF documents with simple formatting. For large and complex PDF documents (e.g. more than 80 pages with multiple fonts), images and/or characters may go missing from the last pages after the metadata has been edited. The solution may be to use Ghostscript, as in the answer I saw just now. No doubt these programs will improve with time.

In the meantime, I have found a solution: running a tiny one-window freeware program, BeCyPDFMetaEdit, under the current version of Wine in Ubuntu. It also works for these large, complex PDF documents and is available from freeware libraries such as Softpedia.

Aristo T.
  • It looks like BeCyPDFMetaEdit has been discontinued, as of Nov. 2018. The last available website I see for it is archived here: https://web.archive.org/web/20180929111456/http://www.becyhome.de/becypdfmetaedit/description_eng.htm – hackerb9 Jul 12 '23 at 07:40
1

Another command is ebook-meta (available after installing Calibre).

To see tags:

ebook-meta file.pdf

To change title:

ebook-meta file.pdf -t "Conversations with Ambrosius"
Yarifuri
  • Neat! I like that it handles --date="1986 Feb 1" and converts it into the format PDFs use internally, whereas exiftool required me to use -CreateDate="1986:02:01 12:00:00Z". Unfortunately, ebook-meta's --date option changes neither the creation date nor the modification date in the PDF. Also, it has nothing like exiftool's "-preserve" option to preserve the file's modification timestamp. – hackerb9 Jul 12 '23 at 07:33
0

The act library also supports this, so you can edit PDF metadata from the command line with it as well.

$ npm install @lancejpollard/act -g
$ act update input.pdf --title foo --author bar --subject baz -k one -k two

You can also set -p publisher, -c creator, -t0 created date, and -tn updated date.

Lance