90

I am writing a VPN system which encrypts (AES256) its traffic across the net (Why write my own when there are 1,000,001 others already out there? Well, mine is a special one for a specific task that none of the others fit).

Basically I want to run my thinking past you to make sure I'm doing this in the right order.

At the moment packets are just encrypted before being sent out, but I want to add some level of compression to them to optimize the transfer of data a little. Not heavy compression - I don't want to max out the CPU all the time, but I want to make sure the compression is going to be as efficient as possible.

So, my thinking is, I should compress the packets before encrypting as an unencrypted packet will compress better than an encrypted one? Or the other way around?

I will probably be using zlib for the compression.


Moshe Katz
Majenko
  • 5
    Writing as "programming"? Would be better suited for Stack Overflow then. – Suma Mar 15 '11 at 14:07
  • 5
    If I were asking about the programming of it, yes, but I'm not. This is a general compress then encrypt or encrypt then compress question which could apply to just working with plain files if you wanted. The programming side is just context for why I am asking the question. – Majenko Mar 15 '11 at 14:08
  • See also: http://stackoverflow.com/questions/4676095 http://stackoverflow.com/questions/4399812 – BlueRaja - Danny Pflughoeft Mar 15 '11 at 19:56
  • Probably a question best meant for http://security.stackexchange.com/ – Jeff Ferland Mar 16 '11 at 14:32
  • 1
    They know about compression there do they? – Majenko Mar 16 '11 at 14:59
  • @Majenko - They know about encryption, and most of them would know the answer is compress then encrypt. Of course they'd ask the question why you're using a block cipher instead of a stream cipher and point out that this will come at a price of speed (and that you should reconsider unless you already thought about it), and that maybe an elliptic curve cipher (http://eprints.usm.my/9413/1/ECSC-128_New_Stream_Cipher_Based_on_Elliptic_Curve_Discrete_Logarithm_Problem.pdf) would better suit. But I digress. – Everett Oct 09 '12 at 04:35
  • @JeffFerland, http://crypto.stackexchange.com – Pacerier May 18 '15 at 18:02
  • @Pacerier: Crypto.SE didn't exist at the time this question was asked. – Jeff Ferland May 18 '15 at 19:41

7 Answers

182

If the encryption is done properly then the result is basically random data. Most compression schemes work by finding patterns in your data that can be in some way factored out, and thanks to the encryption now there are none; the data is completely incompressible.

Compress before you encrypt.
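As a rough illustration of the point above, here is a minimal sketch that compares the two orderings with zlib. The cipher is an assumption made for the sake of a self-contained example: `toy_stream_encrypt` is a hypothetical SHA-256 counter keystream standing in for AES-256, and it is not secure. The key and the sample packet are made up as well.

```python
import hashlib
import zlib


def toy_stream_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Hypothetical stand-in for a real cipher such as AES-256.
    NOT secure; it only produces random-looking output for this demo.
    Applying it twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


key = b"hypothetical demo key"
# A repetitive, text-like payload, as much real traffic is.
packet = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20

# Level 1 is zlib's fastest setting, in the spirit of "not heavy compression".
compress_then_encrypt = toy_stream_encrypt(key, zlib.compress(packet, 1))
encrypt_then_compress = zlib.compress(toy_stream_encrypt(key, packet), 1)

print("original:             ", len(packet))
print("compress then encrypt:", len(compress_then_encrypt))
print("encrypt then compress:", len(encrypt_then_compress))
```

With a payload like this, the first ordering should come out far smaller than the original, while the second should be no smaller than the original, because zlib finds nothing to factor out of random-looking data.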

Pacerier
Mr Alpha
  • 44
    More important: compression adds entropy. Adding entropy is good for your encryption (harder to break with known-plaintext attacks). – Olli Mar 15 '11 at 10:52
  • 9
    Also, encrypting costs resources, encrypting a smaller file will take less resources. So compress before encrypt. – GAThrawn Mar 15 '11 at 16:23
  • Aren't, conceptually, encryption and compression the same thing? Or rather, if encryption is done properly, (and compression is impossible) then you've really ended up compressing the data. (I guess it depends on one's definition of 'properly') – Mitch Mar 15 '11 at 16:38
  • 1
    No. Compression reduces the file size and can be undone by anyone with the decompression program. Encryption changes the content so that it can only be read by someone with the decryption key - the file size may stay the same, or maybe grow or shrink. – Majenko Mar 15 '11 at 17:17
  • 10
    @Olli - not necessarily if the compression scheme adds known text. In the worst case imagine if it put a known 512byte header on the front of the data and you were using a block mode encryption. – Martin Beckett Mar 15 '11 at 17:25
  • @Martin: yes, that's true, it's not always good idea, I assumed "when doing it properly". – Olli Mar 15 '11 at 17:29
  • 28
    I'm not sure why @Olli's comment would get upvoted, as it is incorrect; not only is it significantly *less* important, for any half-decent encryption it should be *not important at all*. That is, the strength of the encryption should be completely unrelated to the entropy of the message. – BlueRaja - Danny Pflughoeft Mar 15 '11 at 19:51
  • 8
    If you compress at all, it can only really be done before encrypting the message, but bear in mind, this may leak information about 'compressability' of the original message, so you'll want to consider if there are any consequences to this side channel. Consider a fixed sized file that is either all 0s or a message. The all 0 file will result in a smaller payload under any reasonable compression scheme. Not likely an issue in this particular use case though. – Edward Kmett Mar 15 '11 at 20:00
  • 5
    @Olli: Compression doesn't add entropy. But it does reduce non-entropy. – user46971 Mar 16 '11 at 00:10
  • 5
    @Olli, your [orange comment](https://archive.is/VYsIH) there is going to mislead a lot of people. It's better to delete it. – Pacerier May 18 '15 at 18:09
  • @Olli, replace "entropy" with "obfuscation" and you may have something :). – galaxis Oct 17 '17 at 13:20
22

Compress before encryption. Compressed data can vary considerably for small changes in the source data, therefore making it very difficult to perform differential cryptanalysis.

Also, as Mr.Alpha points out, if you encrypt first, the result is very difficult to compress.

Juancho
  • 12
    Well, this is correct, but was posted 2 hours before you posted... [Entropy](http://en.wikipedia.org/wiki/Entropy_(information_theory)) – Konerak Mar 15 '11 at 16:43
3

Even if it depends on the specific use-case, I would advise Encrypt-then-Compress. Otherwise an attacker could leak information from the number of encrypted blocks.

Assume a user sends a message to the server, and an attacker is able to append text to that message before it is sent (via JavaScript, for example). The user wants to send some sensitive data to the server and the attacker wants to obtain it, so the attacker tries appending different guesses to the data the user sends. The user's message and the attacker's appended text are then compressed together. Assume DEFLATE (LZ77) compression, which replaces repeated data with a pointer to its first appearance. If the attacker's guess reproduces the whole plaintext, the compression function shrinks the combined input to roughly the original size plus a pointer. After encryption, the attacker can count the number of cipher blocks and so see whether the appended data matched the data the user sent to the server. Even if this case sounds a little contrived, it is a serious security issue in TLS: an attack called CRIME uses this idea to leak cookies from a TLS connection and steal sessions.

source: http://www.ekoparty.org/archive/2012/CRIME_ekoparty2012.pdf
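The length leak described above can be sketched in a few lines. This is only an illustration of the idea, not the actual CRIME attack: the secret, the guesses and the `observed_length` helper are all hypothetical, and the encryption step is left out on the assumption that a cipher preserves the compressed length (give or take padding).

```python
import zlib

# Hypothetical secret that the victim's request always contains.
SECRET = b"Cookie: session=7f3a9c"


def observed_length(attacker_text: bytes) -> int:
    """Size of the compressed request as seen on the wire.

    The secret and the attacker-controlled text are compressed together,
    so a guess that repeats the secret collapses into one back-reference.
    """
    return len(zlib.compress(SECRET + b"\r\n" + attacker_text, 9))


right = observed_length(b"Cookie: session=7f3a9c")  # guess equals the secret
wrong = observed_length(b"Cookie: session=XQ2ZKP")  # same length, wrong value

print("correct guess:", right, "bytes")
print("wrong guess:  ", wrong, "bytes")
```

The correct guess should produce the shorter output, which is exactly the signal an attacker would look for while never seeing the plaintext.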

2

My view is that when you compress a message you project it to a lower dimension and therefore there are fewer bits, which means that the compressed message (assuming lossless compression) has the same information in fewer bits (the ones you got rid of were redundant!). So you have more information per bit and consequently more entropy per bit, but the same total entropy as you had before when the message was not compressed. Now, randomness is another matter, and that is where the patterns in compression can throw a monkey wrench.
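To make "more entropy per bit" a little more concrete, here is a small sketch that estimates entropy per byte from byte frequencies, before and after zlib compression. The estimator is crude (it ignores ordering and longer patterns), and the word list, seed and `entropy_bits_per_byte` helper are assumptions chosen just for this example.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy estimated from byte frequencies only (0 to 8 bits)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical text-like message built from a small vocabulary.
words = ["packet", "encrypt", "compress", "tunnel", "cipher", "header",
         "payload", "the", "of", "and", "before", "after", "data", "key"]
rng = random.Random(0)  # fixed seed so the run is repeatable
message = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

compressed = zlib.compress(message, 6)

print("original:  ", len(message), "bytes,",
      round(entropy_bits_per_byte(message), 2), "bits/byte")
print("compressed:", len(compressed), "bytes,",
      round(entropy_bits_per_byte(compressed), 2), "bits/byte")
```

The compressed form should be shorter and show a higher per-byte figure, which matches the idea that the same information is packed into fewer bits.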

Prof
1

Compression should be done before encryption. A user doesn't want to spend time waiting for the transfer of data; they need it done immediately, without wasting any time.

sqlchild
1

Compression before encryption, as has been pointed out earlier. Compression looks for structure it can compress. Encryption scrambles the data so as to avoid structure being detected. By compressing first you're much more likely to have a smaller file and thus less payload to transfer. Encryption is going to do its job regardless of whether the data is compressed or not and, again as pointed out earlier, differential cryptanalysis is likely to be more difficult to perform on a compressed file.

  • This appears to be a repeat of the accepted and second answers. Each answer should contribute a substantively new solution to the question. – fixer1234 Jun 19 '15 at 20:24
0

Compression reduces information entropy. Maximum compression makes entropy minimum. For a perfectly encrypted data (noise) maximum and minimum entropy is the same.

AbiusX
  • 2
    Wait, don't you have that backwards? I thought entropy increased as redundancy decreased. Therefore compression should increase entropy. – Zan Lynx Mar 16 '11 at 20:06
  • Nop, less entropy = more patterns. Randomness has most entropy. – AbiusX Mar 16 '11 at 20:11
  • 1
    But it is *information* entropy so it is all about meaning. Randomness doesn't mean anything so it doesn't apply. An English sentence can have letters changed and still mean the same thing so it has low entropy. A compressed English sentence might be unreadable if a single bit changes so it has the most. Or so I think. – Zan Lynx Mar 16 '11 at 20:30
  • Entropy is not about sense and ability to read or understand, it's all about patterns. Compressed files are full of patterns. – AbiusX Mar 16 '11 at 22:45
  • 1
    @AbiusX: Right. Patterns. And the fewer patterns, the more entropy. Which means that compression which replaces all repeated patterns with a single copy increases entropy. – Zan Lynx Mar 16 '11 at 23:58
  • No, it's not about quantity. Lots of patterns is not good. Quantity increases entropy. It's all about quality. – AbiusX Mar 17 '11 at 00:41