
I'm using Ansible to set up configuration on several nodes. As part of this setup I need to split one big file into chunks of n lines and copy each chunk to a remote file, without creating a local copy of each chunk (as the split command does). Ansible can't do this by default (or I just haven't found out how yet), so I decided to use GNU Parallel. I found out here that copying from stdin can easily be done like this:

~$ echo "Lots of data" | ssh user@example.com 'cat > big.txt'

But I want to do this on several hosts at once! So, here is an example input:

~$ cat hosts.txt
1.1.1.1
2.2.2.2
3.3.3.3

~$ cat data.txt
lots
of
...
lines

I calculate the number of lines per node by running "wc -l" on both files and dividing the line count of data.txt by the line count of hosts.txt. So the next step would be something like this:

~$ cat data.txt | parallel -S `cat hosts.txt | tr "\n" ","` -N $LINES_PER_HOST --pipe "ssh $HOST 'cat > /data/piece.txt'"
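
For reference, the LINES_PER_HOST value used above could be computed roughly as follows. This is only a sketch: the function name lines_per_host is made up for illustration, it rounds up so that no lines are left over, and it assumes both files end with a trailing newline (otherwise "wc -l" undercounts by one).

```shell
# Hypothetical helper (not part of any tool): prints the ceiling of
# (lines in the data file) / (lines in the hosts file).
# $1 = data file, $2 = hosts file, one host per line.
lines_per_host() {
    data_lines=$(wc -l < "$1")
    host_count=$(wc -l < "$2")
    # Integer ceiling division, so the last host is not handed leftover lines.
    echo $(( (data_lines + host_count - 1) / host_count ))
}

# Usage, assuming data.txt and hosts.txt exist in the current directory:
#   LINES_PER_HOST=$(lines_per_host data.txt hosts.txt)
```

With rounding up, the last host may receive fewer lines than the others.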

But how can I launch one command per host, and what should I replace $HOST with? I thought about combining two inputs (one of them being the hosts), but I still have no idea how to do it.

Would really appreciate any thoughts.

Enchantner

1 Answer


Works from version 20150922:

parallel-20150922 -a bigfile --roundrobin --pipepart --slf hosts.txt -j1 'cat > giraf'
Ole Tange
  • Great, thanks! I only see one limitation: I can only split the file by blocks, not by lines. Will it always handle line endings correctly in this case? I don't really care if one node gets a few more lines than the other, but I need the lines to be complete and readable. – Enchantner Jun 27 '16 at 18:09
  • It splits on \n, so you should be safe. – Ole Tange Jun 28 '16 at 06:34
  • I launched it on a file with 10000 lines and 2 nodes, using a block size a little larger than the 'du -b' output divided by two. One node got ~4850 lines, the other one ~4900, and the rest is lost. Is there any way to ensure all lines are copied? Or should I calculate the block size in some other way? – Enchantner Jun 29 '16 at 09:34
  • You should never lose lines. Can you post the exact command you wrote? If you run the command I give, --block is not needed. – Ole Tange Jun 29 '16 at 14:41
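
One way to check whether any lines went missing is to add up the line counts of the remote files and compare the total with the local file. The snippet below is only a sketch: sum_counts is a made-up helper name, and the usage lines assume hosts.txt contains plain host names (one per line) and that the remote file is called giraf, as in the answer's command.

```shell
# Hypothetical helper: reads one integer per line on stdin and prints the total.
sum_counts() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Usage sketch (needs working ssh access, so it is left commented out).
# "ssh -n" stops ssh from swallowing the rest of hosts.txt inside the loop.
#   while read -r host; do ssh -n "$host" 'wc -l < giraf'; done < hosts.txt | sum_counts
#   wc -l < bigfile
```

If the two numbers differ, the exact command line and the contents of hosts.txt are the first things to look at.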