22

I need to remove duplicate lines from a text file. This is simple on Linux using

cat file.txt |sort | uniq

when file.txt contains

aaa
bbb
aaa
ccc

It will output

aaa
bbb
ccc

Is there a Windows equivalent, or a Windows way to do this?

Yu Jiaao

4 Answers

41

The Sort-Object cmdlet in PowerShell supports a -Unique switch that does the same thing as uniq:

Get-Content file.txt | Sort-Object -unique

Of course, owing to the presence of aliases in PowerShell, you can also write:

type file.txt | sort -unique

Additionally, sort.exe in Windows 10 has a /unique switch that was undocumented when this answer was written, so this should work in Command Prompt:

type file.txt | sort /unique
Yu Jiaao
  • 2
    I don't think the Windows command (`sort.exe`) supports this; it looks like a feature of the PowerShell builtin. – Ben Voigt Apr 23 '18 at 04:11
  • 2
    `type unsorted.txt | sort -unique > sorted.txt` This really works under Win10 and wrote the unique values to a new file. – Lixas Apr 23 '18 at 05:52
  • 7
    @BenVoigt surprisingly, `type file.txt | sort /unique` works with the _undocumented_ switch `/unique` of the `sort.exe` utility (at least on Windows 10). On the other hand, you are right that the provided example, `Get-Content file.txt | Sort-Object -unique`, is in fact PowerShell. – JosefZ Apr 23 '18 at 05:57
  • @Lixas `type unsorted.txt | sort -unique` returns `-uniqueThe system cannot find the file specified` with errorlevel `1` if run from an open `cmd` prompt under Windows 10! – JosefZ Apr 23 '18 at 06:02
  • 3
    `sort /unique` errors with `Invalid switch.` on Windows 7 Enterprise. – Don Cruickshank Apr 23 '18 at 12:00
  • 1
    @JosefZ , the answer specifies the switch using "/" (forward-slash) and not dash; the forward-slash is Windows standard for commands in CMD, and not all commands allow substituting a dash for a slash on command switches. https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/windows-commands for a quick reference consistently shows slashes. The above was a great answer, sharing a tidbit not commonly known, though I can't imagine why the "/unique" switch is undocumented since it's so useful. – Debra Jan 07 '19 at 14:23
  • @Debra sure. Microsoft said in their [archived _Windows NT Command Shell_ article](https://docs.microsoft.com/en-us/previous-versions//cc723564(v=technet.10)): _command switches always begin with a slash `/` character… Occasionally, switches begin with a `+` or `-` character._ – JosefZ Jan 07 '19 at 18:51
  • Ummm, yes, thank you Microsoft for stating "always", well, except when not so. – Debra Jan 08 '19 at 15:36
  • Well, what if the file is, for example, over 1 GB? – Cornea Valentin Jan 23 '20 at 23:44
  • 4
    @JosefZ Well, I opened a [bug](https://github.com/MicrosoftDocs/windowsserverdocs/issues/3858) for that, and now it *is* [documented](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/sort) – Tsahi Asher Jan 09 '22 at 14:48
  • 1
    @TsahiAsher Great! Would be even better (discoverable) if `/?` would show it... – Michel de Ruiter Aug 24 '22 at 11:42
6

There are ports of uniq that work identically to the GNU coreutils versions. I personally use the variation from GOW, but Git for Windows has a significantly newer version. No Cygwin is required, though for the latter you need to look in /usr/bin.

Since these packages also contain cat, sort and uniq, your workflow should be mostly identical, and cat file.txt | sort | uniq should work the same way.
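With one of those ports on the PATH, the pipeline can usually be shortened. The following is a minimal sketch, assuming GNU `sort` and `awk` are both shipped in the package you installed; the sample lines and the temporary file are only for illustration and are not from the question.

```shell
# Illustrative input (hypothetical sample, written to a temp file).
input=$(mktemp)
printf 'ccc\naaa\nccc\nbbb\n' > "$input"

# Same result as `cat file | sort | uniq`, in a single command.
sorted_unique=$(sort -u "$input")

# Alternative: keep the first occurrence of each line in its
# original order, without sorting.
ordered_unique=$(awk '!seen[$0]++' "$input")

printf 'sorted:\n%s\n' "$sorted_unique"
printf 'original order:\n%s\n' "$ordered_unique"

rm -f "$input"
```

`sort -u` gives the same output as `sort | uniq`; the `awk` line is an option when the original line order matters.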

Journeyman Geek
2

You can easily write a "uniq" command yourself. Save this as a batch file "uniq.cmd" somewhere on your %path% (e.g. in %windir%\system32). This version is NOT case sensitive:

@echo off
setlocal DisableDelayedExpansion
set "prev="
for /f "delims=" %%F in ('sort %*') do (
    rem "set" needs to be done without delayed expansion
    set "line=%%F"
    setlocal EnableDelayedExpansion
        set "line=!line:<=<!"
        if /i "!prev!" neq "!line!" echo(!line!
    endlocal
    rem "prev" must be set after endlocal, otherwise its value is discarded
    set "prev=%%F"
)

This works with "uniq mytextfile" as well as "cat mytextfile | uniq", since all input and arguments are simply passed to the sort command.

Starting with Windows 7, you may want a truly case-sensitive version (the difference is the undocumented switch "sort /C" and no "if /i"):

@echo off
setlocal DisableDelayedExpansion
set "prev="
for /f "delims=" %%F in ('sort /C %*') do (
    rem "set" needs to be done without delayed expansion
    set "line=%%F"
    setlocal EnableDelayedExpansion
        set "line=!line:<=<!"
        if "!prev!" neq "!line!" echo(!line!
    endlocal
    rem "prev" must be set after endlocal, otherwise its value is discarded
    set "prev=%%F"
)
Tom Stein
  • Nice, but it has some flaws. It currently fails with content like `/?`, `ON`, `one ^ caret` or `bang!`. But that can be solved by using the [toggling delayed expansion technique](https://stackoverflow.com/a/31133848/463115) and `echo(`; see [Dostips: ECHO. FAILS to give text or blank line](https://www.dostips.com/forum/viewtopic.php?p=4554) – jeb Jan 14 '19 at 10:13
  • Thanks, the reason for using the toggling delayed expansion technique had been neither obvious nor marked. I edited my examples to be (almost) perfect now. – Tom Stein Jan 17 '19 at 15:48
1

In addition to Yu Jiaao's answer: you can invoke the Sort-Object PowerShell cmdlet from a command prompt like this:

type file.txt | powershell -nop "$input | sort -unique"
snipsnipsnip