I have two xls files, 8 MB and 10 MB in size. I need to merge them and remove duplicate rows. (Each file contains only unique rows on its own, but some rows appear in both, so the merged file will contain duplicates.)

I have tried merging the files, but the merge fails because the files are too large.

Is there any method in Ubuntu to remove the duplicates from the files, considering my situation?

Note: a way to remove the duplicates without merging the files would also be acceptable.

My attempt after the suggestions: I converted both files to CSV, copied the contents of one file into the other, and removed the duplicates using the advanced filter. Then I saved the new (combined) CSV in xls format. But when I reopened the new (combined) xls file, it did not show all the data. It showed only 60% of it.

The new (combined) CSV file is 24 MB, and when I save it as an xls file, the xls file is 11 MB.

vidal
  • Export them to csv, use perl or awk to get unique lines (http://stackoverflow.com/a/11532197/2072269, http://unix.stackexchange.com/a/11941/70524), convert back to xls. – muru Feb 29 '16 at 14:10
  • Sorry but I would do that in Excel. Heck Libreoffice if need be. Both have a "remove duplicates" method. – Rinzwind Feb 29 '16 at 14:38
  • @muru... I converted both files to CSV. Then I copied one file into the other and removed the duplicates using the advanced filter. Then I saved the new (combined) CSV in xls format. But when I reopened the new (combined) file, it did not show all the data. It showed only 60% of it. – vidal Mar 01 '16 at 11:26
  • @vidal why would you combine after converting to CSV? And how many unique entries do you expect to find? – muru Mar 01 '16 at 11:37
  • @muru .. That's a typo. I updated it in question. I combined the files in CSV, after that I converted the combined CSV into XLS. And 90% of the queries are unique. – vidal Mar 01 '16 at 11:42

1 Answer

LibreOffice: Data → Filter → Advanced Filter → Options → Duplicate rows disabled

You can copy the contents of both files into one sheet, remove the duplicates, and split the result into 2 new files again if you want. For that you would need some kind of marker in the sheet so you can see where file 2 starts.

No special magic needed.
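
If the spreadsheet struggles with the file size, the CSV route suggested in the comments can also be scripted. The following is only a minimal sketch, assuming both sheets were exported as UTF-8 CSV with the same column order; the function name `merge_unique_rows` and the constant `XLS_MAX_ROWS` are made up for illustration. It keeps the first occurrence of every row, so an identical header line in the second file is dropped automatically.

```python
import csv

# A worksheet in the legacy .xls format holds at most 65,536 rows.
XLS_MAX_ROWS = 65536


def merge_unique_rows(input_paths, output_path):
    """Concatenate CSV files, keeping only the first occurrence of each row.

    Returns the number of rows written to output_path.
    """
    seen = set()   # rows already written, stored as tuples
    written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                for row in csv.reader(in_file):
                    key = tuple(row)
                    if key in seen:
                        continue  # exact duplicate of an earlier row
                    seen.add(key)
                    writer.writerow(row)
                    written += 1
    return written
```

With placeholder file names, it could be called as `merge_unique_rows(["file1.csv", "file2.csv"], "merged.csv")`. If the returned count is larger than `XLS_MAX_ROWS`, the result does not fit into a single .xls sheet, which may be why data goes missing after saving; saving as .xlsx or .ods avoids that limit.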

Rinzwind