14

Using the AWS CLI, how do I upload a folder to S3 as a tar.gz file without creating the tar.gz locally?

For example, I have a folder at /var/test and I want to upload it to s3://tests/test1.tar.gz

How do I do that without turning it into a tar.gz locally? (I want to save local space, as I don't have much space on my HDD.)

Michael Samsung

2 Answers

24

What you're really looking for is to avoid saving a local file. You can use pipes to stream the data from tar through gzip to S3 without ever writing anything to disk.

tar c /var/test | gzip | aws s3 cp - "s3://tests/test1.tar.gz"

Breaking this down (where stdin and stdout refer to the standard input/output streams that the pipeline connects):

  • tar c /var/test creates a tar archive of /var/test and writes it to stdout...
  • ...which gzip reads from stdin, writing the compressed stream (.tar.gz) to its own stdout...
  • ...which aws s3 cp - "s3://tests/test1.tar.gz" reads from stdin and sends to S3. The - tells the AWS CLI to copy from stdin.

This still performs the gzip operation locally, but does not require the creation of a temporary file, since the entire stream is sent straight over the network.
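
As a quick sanity check, the same - convention works in the download direction, so you can list the uploaded archive's contents without writing anything locally either. A minimal sketch, assuming GNU tar:

# stream the archive back from S3 and list its contents (t = list, f - = read the archive from stdin)
aws s3 cp "s3://tests/test1.tar.gz" - | tar tzf -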

Bob
  • Bob, this answer looks like it's correct for SSHing files to other servers, but doesn't seem to address the question of how to upload to S3. It's probably a reasonably simple extension for someone who understands the S3 command line tools to apply this technique. – Tim Sep 08 '17 at 00:59
  • @Tim ...somehow, I completely missed that. I'll update. – Bob Sep 08 '17 at 01:04
  • @Tim Fixed. Probably only looked at the AWS bit and assumed EC2 while half asleep last night. – Bob Sep 08 '17 at 01:11
  • A few questions about this solution: (1) will it work with directories too? (2) will the entire contents of the files be loaded in memory? (3) doesn't this give problems with large files? (4) is there any way to see progress? – murze Jun 10 '18 at 09:51
  • @murze (1) of course, that's the whole point of packaging, (2) no, (3) no, (4) no. – Liz Av Sep 12 '18 at 17:50
  • Will `tar -cz /var/test | aws s3 cp - "s3://tests/test1.tar.gz"` also work? I'm passing in `-z` to gzip during the tar command rather than piping to it. – neuquen Oct 29 '18 at 15:53
  • @Kevin I don't see why not. – Bob Oct 29 '18 at 22:30
  • You can get an estimate of progress using `pv`, though if transferring a directory you'll need to estimate the size and provide it with `-s`; otherwise you'll only see the transfer rate and total bytes transferred. – Attie Dec 17 '18 at 18:33
  • If you see a message like `[Errno 2] No such file or directory: /path/to/your/dir/-` it means your version of AWS CLI doesn't understand how to accept content from stdin, and you need to upgrade it. This happens for the stock apt version of awscli on Ubuntu 14.04. The [aws cli bundled version](https://docs.aws.amazon.com/cli/latest/userguide/install-bundle.html) works well on older systems. – Dale C. Anderson Jun 25 '19 at 18:52
  • When the stream goes over 50GB, this won't be accepted. "--expected-size" is needed. In my case, I am pushing large files. Not sure how I can supply a size when setting up that arg. `An error occurred (InvalidArgument) when calling the UploadPart operation: Part number must be an integer between 1 and 10000, inclusive` – macetw May 09 '22 at 20:32
  • @macetw You may be able to specify a guessed size somewhere north of 50 GB, if the command merely uses the "expected size" to calculate the part size and therefore number of parts. If that doesn't work, consider using `rclone` with its [`--s3-chunk-size`](https://rclone.org/s3/#s3-chunk-size) option which lets you more manually specify this, e.g. if you set a chunk size of 10 MB you'll have a file size limit of 100 GB (10 MB * 10000). I've used chunk sizes of 512 MB before to allow up to 5 TB files. Unfortunately with streaming uploads you do need to set this upper bound ahead of time. – Bob May 09 '22 at 23:51
  • Is there a reverse command that can download the .tar.gz from S3 and decompress without having an intermediate file? – Pablote Jun 05 '22 at 19:28
  • @Pablote Sure. With `s3`, just put the `-` in place of the output filename. With `rclone`, use `rclone cat`. – Bob Jun 09 '22 at 11:59
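
Pulling together the progress (`pv`) and large-stream (`--expected-size`) points from the comments above: a minimal sketch, assuming GNU `du` and `pv` are installed. The uncompressed size serves as a safe overestimate for the compressed stream:

# rough byte count of the source tree (GNU du, apparent size)
SIZE=$(du -sb /var/test | cut -f1)
# pv sits before gzip, so progress is measured against that uncompressed size;
# the same figure overestimates the compressed stream, which is acceptable for --expected-size
tar c /var/test | pv -s "$SIZE" | gzip | aws s3 cp - "s3://tests/test1.tar.gz" --expected-size "$SIZE"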
8
tar czvf - /var/test | aws s3 cp - "s3://tests/test1.tar.gz"

You don't have to pipe through gzip separately; tar compresses for you with the z option.

This works in both directions.
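
For the download direction, put the - on the source side instead. A minimal sketch, assuming GNU tar (which strips the leading / at archive creation, so this recreates var/test under the current directory):

aws s3 cp "s3://tests/test1.tar.gz" - | tar xzvf -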

Robv