
I was trying to copy the contents of a huge file (10 GB) to another file, skipping the first line (the one `head -n1` would show). I tried multiple variations of `head`, `tail`, `awk`, and `sed`, and settled on `tail -n+2 > ./xab.1`.

See link1, link2, link3. But the processing is taking a long time, longer than a plain `cp` takes.

I just want to blindly copy the content, that's all. So I think `dd` will do the job, but I'm at a loss implementing it. Any help?
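A minimal sketch of what I mean, assuming GNU `dd` (for `iflag=skip_bytes`) and placeholder paths:

```bash
# Measure the first line's length in bytes (including its newline),
# then copy everything after it in one streaming pass.
hdr=$(head -n1 /path/to/file | wc -c)   # bytes occupied by line 1
dd if=/path/to/file of=/path/to/copy bs=4M iflag=skip_bytes skip="$hdr"
```

On systems without GNU `dd`, `tail -c +$((hdr + 1)) /path/to/file > /path/to/copy` should do the same byte-offset copy.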

Well, to give more context: the file is a CSV, so I think `dd if=/dev/zero of=/path/to/file bs=1 seek=1 count=$(head -n1 /path/to/file | wc -c) conv=notrunc` should work.

But how do I make it work?
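Here is a hedged, runnable version of that idea (placeholder paths; `seek` has to be 0, i.e. left out, rather than `seek=1` as drafted above, so the write starts at the very first byte):

```bash
# Overwrite the first line with NUL bytes, leaving the rest of the file
# untouched; conv=notrunc stops dd from truncating the 10 GB file.
len=$(head -n1 /path/to/file | wc -c)
dd if=/dev/zero of=/path/to/file bs=1 count="$len" conv=notrunc
```

The caveat is that the line is blanked to NUL bytes rather than removed, so whatever consumes the CSV afterwards has to tolerate (or strip) them.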

EDIT: Here is what I've come up with so far (yes, I know I'm going to lose a few records, but that doesn't matter):

```bash
#!/bin/bash
echo "Initiating xaa." "$(date)"
# Save the header lines from the first chunk and the trailer lines
# from the last chunk.
head -n3 /stage/csv/dev/data/csv_huge/xaa > /stage/csv/dev/data/csv_huge/csv/header
tail -n3 /stage/csv/dev/data/csv_huge/xbc > /stage/csv/dev/data/csv_huge/csv/trailer
# xaa: drop its last (possibly partial) line, then append the trailer.
sed -i '$ d' /stage/csv/dev/data/csv_huge/xaa
cat /stage/csv/dev/data/csv_huge/csv/trailer >> /stage/csv/dev/data/csv_huge/xaa
mv /stage/csv/dev/data/csv_huge/xaa /stage/csv/dev/data/csv_huge/csv/xaa
echo "Completed xaa." "$(date)"
# xab: drop its first and last lines, then wrap it in header and trailer.
sed -i 1d /stage/csv/dev/data/csv_huge/xab
sed -i '$ d' /stage/csv/dev/data/csv_huge/xab
cat /stage/csv/dev/data/csv_huge/csv/header /stage/csv/dev/data/csv_huge/xab > /stage/csv/dev/data/csv_huge/csv/xab
cat /stage/csv/dev/data/csv_huge/csv/trailer >> /stage/csv/dev/data/csv_huge/csv/xab
rm -f /stage/csv/dev/data/csv_huge/xab
echo "Completed xab." "$(date)"
# xbc: drop its first line, then prepend the header (its trailer is already in place).
sed -i 1d /stage/csv/dev/data/csv_huge/xbc
cat /stage/csv/dev/data/csv_huge/csv/header /stage/csv/dev/data/csv_huge/xbc > /stage/csv/dev/data/csv_huge/csv/xbc
echo "Completed xbc." "$(date)"
```
  • This seems like a very dangerous use of `dd` (aka Destroy Disk). Defining your `if` as /dev/zero also seems way off, and your count would be the length of the first line, not the length of the file without the head line. Anyway, you should expect this to take a while, and TBH there is no compelling reason not to just cp the file and then delete the top line. I have to assume you are doing all this for non-practical reasons. – Frank Thomas Jan 29 '15 at 17:04
  • @FrankThomas I'm definitely in favor of "just cp the file and then delete the top line", but how to do that for a 10GB file is the problem for me. Any ideas come to mind? – Jimson James Jan 29 '15 at 17:26
  • You've already rejected some of the most popular ideas as too slow, but there are a couple of others here: http://superuser.com/questions/284258/remove-first-line-in-bash – Frank Thomas Jan 29 '15 at 17:51
  • http://superuser.com/questions/284258/remove-first-line-in-bash – added an answer there on how to use `dd`. – Hannu Jan 29 '15 at 18:53
  • @FrankThomas that was cool. Please see the question, edited with what I have come up with based on that comment. Let me know what you guys think. – Jimson James Jan 29 '15 at 19:32
  • @Hannu Will try the `dd` approach for sure. – Jimson James Jan 29 '15 at 19:32
  • For gigabytes of data you will be filling driver buffers quite soon; the read/write speeds of the drive(s) involved will have a limiting effect regardless of file buffering (i.e. `bs=` for `dd`). – Hannu Jan 29 '15 at 20:02
  • @Hannu I'm thinking of doing a comparison between these two approaches, though. – Jimson James Jan 29 '15 at 21:19
  • To emphasize the above, having the knowledge from doing some geeky drive speed testing: I'd say that you *may* end up with differing results depending on which drive(s) you use. Drive technology, buffers, hardware drivers, the filesystem, and system overhead add complexity to the task, making the results hard to predict. – Hannu Jan 30 '15 at 22:00

0 Answers