file splitting and compression in pipeline

Question

So I have a massive file something like this...

1/20/2016,somerandomdata
1/20/2016,somerandomdata
1/20/2016,somerandomdata
1/20/2016,somerandomdata
1/21/2016,somerandomdata
1/21/2016,somerandomdata
1/21/2016,somerandomdata
1/21/2016,somerandomdata
1/22/2016,somerandomdata
1/22/2016,somerandomdata
1/22/2016,somerandomdata
1/22/2016,somerandomdata

And I want to split it into a bunch of smaller files based on the first column. Easy: use awk like this:

awk -F '[,/]' '{print > "filename"$1$2$3".dat"}'

Here's the catch: I want the output files to be compressed. So, I could go ahead and do this after the fact...

find . -name "filename*.dat" | xargs -l xz

The problem with that is that I want the xz to be in the pipeline instead of after the data is split. Something like this:

curl "url" | grep "blah" | xz -c > filename.dat.xz

Of course, this doesn't actually split the file.

The reason I want it in the pipeline is that I am downloading the data and want to run compression at the same time as the download instead of after it. (I'm pretty sure this would make things go faster, but if I'm wrong, correct me.)

So, my goal is something like....

curl "url" | grep "blah" | awk -F '[,/]' '{print > filename$1$2$3".dat"}' | xz -c > filename.dat.xz

But not exactly that, because it obviously won't work: awk writes every line to a file, so nothing reaches xz on stdout.

If you have a better solution to my problem or if you think I'm doing something completely stupid, I'm flexible.

And you want all the output files to be compressed back into one file? I doubt this can be done in the pipeline. – gogoud, Jan 23 '16 at 09:03
I would like each output file to be compressed individually, into its own file. – Jay, Jan 23 '16 at 09:05
I think awk can do pipes itself; see this answer for inspiration: http://superuser.com/a/485602/307834 – Xen2050, Jan 23 '16 at 09:05

Xen2050 · Accepted Answer · 2016-01-23T22:47:23.483

2

awk can do pipes "natively" itself, just like the redirections in the example. I'm not an awk quoting expert, but this matches your example & is reported to work A-OK:

awk -F '[,/]' '{print | "xz -c >" "filename"$1$2$3".dat.xz"}'
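
If the input covers many dates, every distinct date leaves one xz pipe open until awk exits, which can run into the open-file limit. A slightly more defensive variant might look like the sketch below. It rests on a few assumptions: the input is grouped by date (as in the sample), `xz` is on the PATH, and the prefix contains no spaces or shell metacharacters. The function name `split_compress` is made up for illustration.

```shell
#!/bin/sh
# split_compress PREFIX
# Hypothetical helper: reads "M/D/YYYY,data" lines on stdin and writes one
# xz-compressed file per date, named PREFIX<M><D><YYYY>.dat.xz.
# Assumes lines for the same date are adjacent in the input.
split_compress() {
    prefix=$1
    awk -F '[,/]' -v prefix="$prefix" '
    {
        # Build the shell command that compresses this date into its own file.
        cmd = "xz -c > " prefix $1 $2 $3 ".dat.xz"
        if (cmd != prev) {
            # The date changed: finish the previous xz before starting a new one.
            if (prev != "") close(prev)
            prev = cmd
        }
        print | cmd
    }
    END {
        # Flush and wait for the last xz process.
        if (prev != "") close(prev)
    }'
}
```

It would slot into the original pipeline as `curl "url" | grep "blah" | split_compress filename`. Because each pipe is closed when the date changes, a date that reappears later in the input would overwrite its earlier file, so this only suits grouped input. Note also that unpadded month and day values can collide (1/12/2016 and 11/2/2016 both give 1122016); zero-padding the fields with sprintf would avoid that.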

edited Jan 23 '16 at 22:47

answered Jan 23 '16 at 09:06

Xen2050


Nope, doesn't work... `cat exfile | grep "$dates" | awk -F '[,/]' '{print > "filename"$1$2$3".dat"}'` works and outputs multiple uncompressed files, but `cat exfile | grep "$dates" | awk -F '[,/]' '{print | xz -c > "filename"$1$2$3".dat.xz"}'` gives a `syntax error` at the `>` – Jay Jan 23 '16 at 09:21
Ah, but `awk -F '[,/]' '{print | "xz -c >" "filename"$1$2$3".dat.xz"}'` does work. – Jay Jan 23 '16 at 09:26
If you update your answer, I'll mark it as correct – Jay Jan 23 '16 at 09:27
