11

I have only just noticed that the PDF files exported from my Word 2010 (docx) documents, which are just a single page long and include a simple WMF vector graphic and a bit of text, are almost 1 MB in size. The Word document itself is only 50 kB, and a PDF file created with the Bullzip PDF printer is about the same size. So what is Microsoft writing into the other 950 kB?

Update: As I keep getting answers that do not apply, I'd like to save you the work. The issue went away after I switched from Windows XP to Windows 7 (which I did over a year ago). Something doesn't seem to be supported on the old system; I suspect it's font subsetting or something similar. I also cannot try your suggestions because the issue no longer exists, so I'm not able to accept answers to this.

ygoe
  • Does the output match? I am going to guess Word would match the PDF format more closely than Bullzip (personally never heard of it). – Ramhound Dec 08 '11 at 19:17
  • Could be several things -- Image size/quality, embedding fonts, etc. See: [When exporting a document as a pdf file in Word 2010, what's the difference between standard and minimum size publishing?](http://superuser.com/questions/190570/when-exporting-a-document-as-a-pdf-file-in-word-2010-whats-the-difference-betw) and [How to compress .pdfs in word 2007?](http://superuser.com/questions/120268/how-to-compress-pdfs-in-word-2007) and [compressing pdf created by office 2007](http://superuser.com/questions/244766/compressing-pdf-created-by-office-2007) for some ways to deal with it. – Ƭᴇcʜιᴇ007 Dec 08 '11 at 19:22
  • 1
    This only started happening after the latest WORD2010 updates. I have WORD2010, and Acro Reader 9.5, but one computer did not get the recent WORD updates. That one takes a heavily loaded DOCX file with images, and converts it DOWN from 4 MB to 3 MB, the other computer with recent WORD updates converts DOCX from 4 MB to 18MB. I cannot use such a large file. DO NOT UPDATE YOUR WORD programs. –  May 08 '13 at 02:53
  • 1
    It seems that Word is exporting images in very high-res bitmap format. Zoom in and compare PDFs generated by Word and Bullzip and compare the quality – tumchaaditya Mar 24 '14 at 20:03
  • 1
    Oh dear, this is old. Word 2010 started to make more reasonably-sized PDF files after switching from Windows XP to Windows 7. I assume that Windows 7 has some font subsetting API that Word uses that Windows XP has not, so that it always included the complete font, or something. – ygoe Mar 25 '14 at 07:40
  • 1
    I just had the same problem using Word 2013 on Windows 7 Pro 64 bits: I have a 14kB Word 2013 (docx) file of ONLY lorem ipsum text with default formatting, Word produces a 90kB PDF when PDFcreator generates a 22kB PDF file. And it grows quickly, the same file with some formatting (Title, headings 1 and 2), 15kB Word file (no pictures) becomes a 230kB file with Word's PDF save as (using the maximum compression), though only 30kB with PDFcreator. My problem is that PDFcreator doesn't process the links. – Thomas BDX Apr 08 '14 at 21:10
  • You can have a look at what Word/Excel/PowerPoint put into their files by renaming them to .zip and uncompressing them. Hopefully you will see something there that explains the large file size; maybe it is some font files or a larger version of the image. – konqui Jun 03 '14 at 18:01
  • @konqui: Fonts are usually not embedded in .docx files and there were no raster images in my case. The .docx file isn't that large, it was only the PDF. But as I said, only on XP. Nobody uses XP anymore. (At least should.) – ygoe Jun 04 '14 at 06:39

6 Answers

3

This is still a problem with Word 2016. Perhaps not the same as the OP had, but it's still there: start with a 1 page 20 KB document, save as PDF, get a 300 KB PDF.

I can't say why Word does this, but there is an easy way to minify these PDF files: install GhostScript, then run the following command:

```
gswin64c.exe -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=%2" "%1"
```

where %1 is the input PDF and %2 is the output PDF. Turns that 300 KB PDF into a 40 KB PDF. Still not as small as CutePDF (that one managed about 30 KB for the same document) but a vast improvement.

Or just skip this step and print to CutePDF directly.
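
If you want to run the GhostScript step on more than one file, the command above can be wrapped in a small script. This is only a sketch: the function names `build_gs_command` and `shrink_pdf` and the `keep_image_resolution` parameter are made up for illustration, and it assumes `gswin64c.exe` is on your PATH. The optional `-dDownsampleColorImages=false` switch is the one reported in the comments below as preserving image quality.

```python
import subprocess
from typing import List


def build_gs_command(src: str, dst: str,
                     keep_image_resolution: bool = False,
                     gs_exe: str = "gswin64c.exe") -> List[str]:
    """Return the GhostScript argument list used in this answer.

    src is the input PDF (%1 in the command), dst the output PDF (%2).
    """
    cmd = [
        gs_exe,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/screen",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
    ]
    if keep_image_resolution:
        # Placed after -dPDFSETTINGS so that it overrides the preset.
        cmd.append("-dDownsampleColorImages=false")
    cmd.append(f"-sOutputFile={dst}")
    cmd.append(src)
    return cmd


def shrink_pdf(src: str, dst: str, **options) -> None:
    """Run GhostScript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(src, dst, **options), check=True)
```

Calling `shrink_pdf("report.pdf", "report-small.pdf")` would then correspond to the one-line command, with the file names being placeholders.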

RomanSt
  • 1
    See Arjan's comment on Jakke's answer. – fixer1234 Oct 02 '16 at 21:44
  • Backing up the answer. Instead of 670k, cutePDF created a 170k file. – szako May 05 '17 at 04:34
  • Yes, calling GhostScript gives huge savings - especially if used together with "Microsoft print to PDF" instead of "Save as PDF". However GhostScript also decreases quality of embedded graphics. Is it possible to keep the quality of embedded graphics and just optimize the rest? – Pygmalion Apr 04 '20 at 13:21
  • @Pygmalion see [this](https://superuser.com/questions/360216/use-ghostscript-but-tell-it-to-not-reprocess-images) - I haven't tried those but there are some options offered. – RomanSt Apr 04 '20 at 13:36
  • 1
    Thanks. If someone comes with the same question, in my case this helped: `-dDownsampleColorImages=false` (I tried them all and this one produced the desired effect.) – Pygmalion Apr 04 '20 at 14:57
1

Many reasons.

  1. XML Styling
  2. Images converted to base64, which is 33% larger than the original
  3. Other stuff like fonts etc...
  4. A lot of stuff that seemingly doesn't do anything!
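
The 33% figure in point 2 follows from how base64 works: every 3 input bytes become 4 output characters. A quick way to see it, as a sketch only (the helper name `base64_overhead` is made up here, and whether Word's PDF export really stores images this way is this answer's claim, not something the snippet verifies):

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the relative size increase caused by base64-encoding raw."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1


# 3072 bytes of sample data encode to 4096 characters (4/3 of the input).
sample = bytes(range(256)) * 12
print(len(sample), len(base64.b64encode(sample)), f"{base64_overhead(sample):.0%}")
# prints: 3072 4096 33%
```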
Nobody
1

Check your options settings in Word 2010. You may be instructing Word to embed one or several entire fonts into your document. This causes terrible document bloat, especially if you are using Unicode fonts. Uncheck that option if it is checked and Word will embed only the characters that are actually used within your document.

You should also be aware that *.docx is a compressed file format that has to be decompressed before it can be converted to a PDF file, which adds to its size.

If this does not work for you, there are several PDF optimizing tools that are available through Adobe and Nuance.

Hope this helps.

0

Thought: Word is converting the vector graphic into a bitmap or PNG and embedding it in the document with limited or no compression. Check the PDF settings and see if you can adjust that.

Analysis: One way to check that is to change the file extension of the Word file to .ZIP, and see for yourself what Word is doing!
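
If you would rather not rename the file, the same check can be scripted, since a .docx is an ordinary ZIP package. A minimal sketch using Python's standard `zipfile` module; the function name `list_package_parts` and the file name in the usage note are made up for illustration:

```python
import zipfile


def list_package_parts(docx):
    """List the parts inside a .docx, largest uncompressed size first.

    docx may be a file path or a file-like object. Each entry is
    (name, uncompressed size in bytes, compressed size in bytes).
    """
    with zipfile.ZipFile(docx) as package:
        parts = [(info.filename, info.file_size, info.compress_size)
                 for info in package.infolist()]
    return sorted(parts, key=lambda part: part[1], reverse=True)


def print_package_parts(docx):
    """Print one line per part so unusually large entries stand out."""
    for name, size, compressed in list_package_parts(docx):
        print(f"{size:>10} {compressed:>10}  {name}")
```

Running `print_package_parts("document.docx")` should show whether a converted bitmap or an embedded font is taking up space in the Word file itself. Note that this inspects the .docx, not the exported PDF.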

Joshua
  • 2
    You can't tweak Word's PDF generation at all. You can only choose from "normal" and "web" quality, but that makes only a few kB difference. I'll have to check the vector to pixel conversion, that should be visible in very high zoom factors. – ygoe Dec 09 '11 at 10:41
  • 3
    Strange, when I zoom into the PDF document, I see rastered text and graphics for a very short time. It looks like a Word window screenshot, including ClearType-smoothed text in a low resolution. After that moment, the content is replaced by high-resolution vector drawings, for graphic and text. How can I look into the PDF document to find out whether there's a hidden pixel image inside that can be removed? – ygoe Dec 09 '11 at 20:01
  • I don't believe this is the reason either. I've noticed the same bloat with docs containing no images. – HappyNomad Nov 29 '12 at 19:48
  • @LonelyPixel: It probably takes some time for your PDF reader to re-render content on zooming in.. – tumchaaditya Mar 24 '14 at 20:01
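
As a rough first answer to the "how can I look into the PDF" question in the comments above, one can count the markers that PDF uses for raster images and embedded font programs. This is a crude sketch, not a PDF parser: the function name `count_pdf_markers` is made up, and a plain byte scan will miss objects stored inside compressed object streams (possible from PDF 1.5 on), so a count of zero proves nothing.

```python
import re


def count_pdf_markers(data: bytes) -> dict:
    """Count image XObjects and embedded font files in raw PDF bytes.

    Only dictionaries stored uncompressed in the file are seen.
    """
    patterns = {
        # Raster images are XObjects with /Subtype /Image.
        "images": rb"/Subtype\s*/Image",
        # Embedded font programs are referenced via /FontFile, /FontFile2 or /FontFile3.
        "embedded_fonts": rb"/FontFile[23]?",
    }
    return {name: len(re.findall(pattern, data))
            for name, pattern in patterns.items()}


def count_pdf_markers_in_file(path: str) -> dict:
    """Convenience wrapper that reads the PDF from disk first."""
    with open(path, "rb") as handle:
        return count_pdf_markers(handle.read())
```

Comparing the counts for the Word-generated PDF and the PDF-printer version of the same document would hint at whether extra images or fonts account for the difference.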
0

This is because the formatting of the PDF document will contain styles for (probably) each character. I did something like this with an export to HTML, and what should have been a 20 KB HTML file came out as a 600 KB file.

0

Use software that is designed for a specific purpose. Word is good at creating Word documents, and because a lot of other software suites include a PDF export feature, MS can't leave it out. I don't really see why they would choose to spend a lot of time and effort optimizing something that most people don't even use or care much about. The people who do care don't use Word for PDF printing.

You should look into installing a dedicated PDF printer on your computer and use the PRINT function to create a PDF file. There are many free and commercial packages available that do a perfect job and keep your PDF file compressed to a minimum.

WHY exactly Word creates such huge PDF files is something you had better ask the MS engineers on their forums... only they can tell. Here you'll just get a lot of guesses as to why MS does things the way they do.

Jakke
  • 1
    I very much prefer built-in PDF export functionality, as that preserves clickable tables of content or embedded URLs and the like. (As such, the export in OpenOffice is great.) Some software might work well with PDF printing options *if provided by the OS*. Like [some browsers in OS X work flawlessly with the built-in PDF printing](http://superuser.com/questions/568/how-to-print-documents-to-pdf/744#744). – Arjan May 24 '14 at 11:04