177

Examining the output from

perl -e 'use Term::ANSIColor; print color "white"; print "ABC\n"; print color "reset";'

in a text editor (e.g., vi) shows the following:

^[[37mABC
^[[0m

How would one remove the ANSI color codes from the output file? I suppose the best way would be to pipe the output through a stream editor of sorts.

The following does not work

perl -e 'use Term::ANSIColor; print color "white"; print "ABC\n"; print color "reset";' | perl -pe 's/\^\[\[37m//g' | perl -pe 's/\^\[\[0m//g'
user001
  • 3,474
  • 7
  • 24
  • 32
  • Not an answer to the question, but you can also pipe the output to `more` or `less -R` which can interpret the escape codes as color instead of a text editor. – terdon Jul 03 '13 at 13:50

18 Answers

247

The characters ^[[37m and ^[[0m are part of the ANSI escape sequences (CSI codes).  See also these specifications.

Using GNU sed

sed -e 's/\x1b\[[0-9;]*m//g'
  • \x1b (or \x1B) is the escape special character
    (GNU sed does not support alternatives \e and \033)
  • \[ is the second character of the escape sequence
  • [0-9;]* is the color value(s) regex
  • m is the last character of the escape sequence
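The same pattern can be tried outside sed. Below is a minimal Python sketch, assuming the input is already a decoded string; the names `SGR_RE` and `strip_sgr` are made up for illustration and do not come from any library.

```python
import re

# ESC, "[", any run of digits/semicolons, then the final "m"
# (SGR sequences, i.e. colors and text attributes).
SGR_RE = re.compile(r'\x1b\[[0-9;]*m')

def strip_sgr(text: str) -> str:
    """Remove color (SGR) sequences only; other escape sequences are left alone."""
    return SGR_RE.sub('', text)

# Same bytes as the output of the question's perl one-liner.
sample = '\x1b[37mABC\n\x1b[0m'
print(repr(strip_sgr(sample)))
```

The pattern is compiled once because a filter like this is usually applied to many lines.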

Using the macOS default sed

Mike suggests:

sed -e $'s/\x1b\[[0-9;]*m//g'

The macOS default sed does not support special characters like \e as pointed out by slm and steamer25 in the comments.

Alternatively, install GNU sed (gsed) with Homebrew:

brew install gnu-sed

Example with OP's command line

(OP means Original Poster)

perl -e 'use Term::ANSIColor; print color "white"; print "ABC\n"; print color "reset";' | 
      sed 's/\x1b\[[0-9;]*m//g'

Improvements

Flag -e is optional for GNU sed but required for the macOS default sed:

sed 's/\x1b\[[0-9;]*m//g'           # Remove color sequences only

Tom Hale suggests to also remove all other escape sequences using [a-zA-Z] instead of just the letter m specific to the graphics mode escape sequence (color):

sed 's/\x1b\[[0-9;]*[a-zA-Z]//g'    # Remove all escape sequences

But [a-zA-Z] may be too wide and could remove too much. Michał Faleński and Miguel Mota propose to remove only some escape sequences using [mGKH] and [mGKF] respectively.

sed 's/\x1b\[[0-9;]*[mGKH]//g'      # Remove color and move sequences
sed 's/\x1b\[[0-9;]*[mGKF]//g'      # Remove color and move sequences
sed 's/\x1b\[[0-9;]*[mGKHF]//g'     # Remove all
Last escape
sequence
character   Purpose
---------   -------------------------------
m           Graphics Rendition Mode (including color)
G           Horizontal cursor move
K           Horizontal deletion
H           New cursor position
F           Move cursor to previous n lines

Britton Kerin indicates that K (in addition to m) is needed to remove the colors from gcc errors/warnings. Do not forget to redirect stderr as well: gcc 2>&1 | sed....
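To compare the effect of the different final-character sets, here is a small Python sketch. The function name `strip_csi` and the sample line are assumptions for illustration; the sample only imitates the kind of m and K sequences described above and is not captured compiler output.

```python
import re

def strip_csi(text: str, finals: str = 'm') -> str:
    """Remove ESC [ <digits/semicolons> <final> for the chosen final characters."""
    pattern = re.compile(r'\x1b\[[0-9;]*[' + re.escape(finals) + r']')
    return pattern.sub('', text)

# Hypothetical line mixing color (m) and erase-to-end-of-line (K) sequences.
line = '\x1b[01m\x1b[Kwarning:\x1b[m\x1b[K unused variable'

print(repr(strip_csi(line, 'm')))    # the ESC [ K sequences are still present
print(repr(strip_csi(line, 'mK')))   # both kinds are removed
```

Passing the final characters as a parameter makes it easy to start with `m` and widen the set only when residue is actually observed.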

Using perl

The version of sed installed on some operating systems may be limited (e.g. macOS). The command perl has the advantage of being generally easier to install/update on more operating systems. Adam Katz suggests to use \e (same as \x1b) in PCRE.

Choose your regex depending on how many escape sequences you want to filter:

perl -pe 's/\e\[[0-9;]*m//g'          # Remove colors only
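perl -pe 's/\e\[[0-9;]*[mG]//g'       # Remove colors and horizontal cursor moves
perl -pe 's/\e\[[0-9;]*[mGKH]//g'     # Remove colors and move sequences
perl -pe 's/\e\[[0-9;]*[a-zA-Z]//g'   # Remove all escape sequences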
perl -pe 's/\e\[[0-9;]*m(?:\e\[K)?//g' # Adam Katz's trick

Example with OP's command line:

perl -e 'use Term::ANSIColor; print color "white"; print "ABC\n"; print color "reset";' \
      | perl -pe 's/\e\[[0-9;]*m//g'

Usage

As pointed out by Stuart Cardall's comment, this sed command line is used by the project Ultimate Nginx Bad Bot (1000 stars) to clean up the email report ;-)

oHo
  • 3,093
  • 2
  • 17
  • 13
  • 3
    Thanks for the `sed` command and the explanation. :) – Redsandro Feb 05 '13 at 14:15
  • 4
    Some color codes (e.g. Linux terminal) contain a prefix, e.g. `1;31m` so better add `;` to your regex: `cat colored.log | sed -r 's/\x1b\[[0-9;]*m//g'` or they won't be stripped. – Redsandro Mar 03 '14 at 13:11
  • 1
    this is great used it in https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/update-ngxblocker to clean up the email report. – Stuart Cardall Jun 07 '17 at 18:59
  • 1
    In your perl example you have a command that filters out the colours. But what are the other commands? They additionally filter out the `mG` and `mGKH` and then `a-zA-Z`, can you add a comment next to each one? – CMCDragonkai Nov 05 '18 at 03:10
  • 1
    Relevant to when you're not just removing the codes but rather observing them: `grep` appends `\x1b[K` (erase to end of line) to all color codes, so I prefer the perl/PCRE regex `\e\[[0-9;]*m(?:\e\[K)?` (in perl/PCRE _but not sed_, `\e` is the same as `\x1b`) – Adam Katz Dec 29 '18 at 22:37
  • 2
    Keep in mind that the OSX version of `sed` didn't work w/ the example shown, the `gsed` version however does. – slm Mar 01 '19 at 21:50
  • 3
    More context for slm's comment about OSX sed: it doesn't support control characters like \x1b. E.g., https://stackoverflow.com/a/14881851/93345 . You can get the gsed command via `brew install gnu-sed` . – steamer25 May 07 '19 at 15:48
  • Thank you @AdamKatz for your comment. I have just edited the answer. Is it OK for you? Have fun – oHo Aug 28 '19 at 00:36
  • Sure. See also [my answer](https://superuser.com/a/1388860/300293) below for more detail and instructions to remove _every_ escape sequence (and, optionally, some other non-printing sequences) rather than just colors. Another note: I'm not sure I've seen a version of sed that accepts `\x1b` but not `\033` – Adam Katz Aug 28 '19 at 15:08
  • 3
    On mac `sed -e $'s/\x1b\[[0-9;]*m//g'` works without gsed @slm @steamer25 – Mike Mar 11 '20 at 13:25
  • 1
    (OP means Original Poster) <--- + 1 !!!! hahahah I'm a >2K and I'm still wondering wth it was. I always thougth it was "original petition" Thanks @olibre !!! – Alejandro Teixeira Muñoz Apr 14 '20 at 12:38
  • Even with all of the comments over 10 years, this isn't right. An ECMA-48 CSI control sequence is CSI P...P I..I F. The P(arameter) characters can range from `\x30` to `\x3F`. The I(ntermediate) characters can range from `\x20` to `\x2F`. The F(inal) character is in the range `\x40` to `\x7E`. – JdeBP May 29 '23 at 22:29
36

I have found a better escape sequence remover if you're using macOS:

perl -pe 's/\x1b\[[0-9;]*[mG]//g'

JohnnyLambada
  • 361
  • 1
  • 5
  • 17
user204331
  • 361
  • 3
  • 2
20

ansi2txt

https://unix.stackexchange.com/a/527259/116915

cat typescript | ansi2txt | col -b
  • ansi2txt: remove ANSI color codes
  • col -b: remove ^H or ^M


Update: how col handles tabs and spaces (mentioned by @DanielF)

col -bx replaces each '\t' with spaces; col -bh replaces runs of spaces with '\t'.

It seems col cannot keep spaces and tabs as they are, which is a pity.


0. Original string

$ echo -e '        ff\tww' | hd
00000000  20 20 20 20 20 20 20 20  66 66 09 77 77 0a        |        ff.ww.|

1. -h replaces spaces with tabs (plain col -b gives the same result)

$ echo -e '        ff\tww' | col -b | hd
00000000  09 66 66 09 77 77 0a                              |.ff.ww.|
$ echo -e '        ff\tww' | col -bh | hd
00000000  09 66 66 09 77 77 0a                              |.ff.ww.|
$ echo -e '        ff\tww' | col -bxh | hd
00000000  09 66 66 09 77 77 0a                              |.ff.ww.|

2. -x replaces tabs with spaces

$ echo -e '        ff\tww' | col -bx | hd
00000000  20 20 20 20 20 20 20 20  66 66 20 20 20 20 20 20  |        ff      |
00000010  77 77 0a                                          |ww.|
$ echo -e '        ff\tww' | col -bhx | hd
00000000  20 20 20 20 20 20 20 20  66 66 20 20 20 20 20 20  |        ff      |
00000010  77 77 0a                                          |ww.|

3. It seems col cannot keep spaces and tabs as they are.
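For reference, the tab-to-space conversion seen in the `col -bx` dump can be reproduced with Python's `str.expandtabs`. This is only a sketch of that one behaviour, assuming 8-column tab stops; it does not emulate anything else `col` does.

```python
line = '        ff\tww'

# expandtabs pads each tab up to the next multiple of the tab size (8 here):
# "ff" ends at column 10, so the tab becomes 6 spaces.
expanded = line.expandtabs(8)

# Print the bytes as hex for comparison with the hd output above.
print(expanded.encode().hex(' '))
```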

yurenchen
  • 375
  • 2
  • 9
12

What is displayed as ^[ is not ^ and [; it is the ASCII ESC character, produced by Esc or Ctrl+[ (the ^ notation means the Ctrl key).

ESC is 0x1B hexadecimal or 033 octal, so you have to use \x1B or \033 in your regexes:

perl -pe 's/\033\[37m//g; s/\033\[0m//g'

perl -pe 's/\033\[\d*(;\d*)*m//g'
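The point about ^[ being a single character can be checked quickly. The following Python sketch (variable names are illustrative) shows that the hexadecimal and octal spellings are the same character, and that a pattern written with a literal caret never matches it:

```python
import re

# Three spellings of the same single character (ASCII 27).
assert '\x1b' == '\033' == chr(27)

data = '\x1b[37mABC\n\x1b[0m'

# A caret followed by "[" is two ordinary characters, so this pattern
# finds nothing in the real output.
print(re.sub(r'\^\[\[37m', '', data) == data)    # True: nothing was removed

# Matching the actual ESC character works.
print(repr(re.sub(r'\x1b\[\d*(;\d*)*m', '', data)))
```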
user001
  • 3,474
  • 7
  • 24
  • 32
u1686_grawity
  • 426,297
  • 64
  • 894
  • 966
12

If you prefer something simple, you could use my strip-ansi-cli package (Node.js required):

$ npm install --global strip-ansi-cli

Then use it like this:

$ strip-ansi < colors.o

Or just pass in a string:

$ strip-ansi '^[[37mABC^[[0m'
Sindre Sorhus
  • 454
  • 3
  • 7
  • 17
11

A more thorough removal for ANSI escape sequences (not 100% comprehensive; see below):

perl -pe '
  s/\e\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]//g;
  s/\e[PX^_].*?\e\\//g;
  s/\e\][^\a]*(?:\a|\e\\)//g;
  s/\e[\[\]A-Z\\^_@]//g;'

(Please note that perl, like many other languages (but not sed), accepts \e as the escape character Esc, \x1b or \033 by code, shown in terminals as ^[. I'm using it here because it seems more intuitive.)

This perl command, which you can run all on one line if you prefer, has four replacements in it:

The first goes after CSI sequences (escape code sequences that begin with the "Control Sequence Introducer" of Esc[, which covers a lot more than the Select Graphic Rendition sequences that make up the color codes and other text decorations).

The second replacement removes the remaining sequences that involve trailing characters and terminate with ST (the String Terminator, Esc\). The third replacement is the same thing but also allows Operating System Command sequences to end with a BEL (\x07, \007, often \a).

The fourth replacement removes the remaining escapes.

Also consider removing other zero-width ASCII characters such as BEL and other more obscure C0 and C1 control characters. I've been using s/[\x00-\x1f\x7f-\x9f\xad]+//g, which also includes Delete and Soft Hyphen. This excludes Unicode's higher coded zero-width characters but I believe it's exhaustive for ASCII (Unicode \x00-\xff). If you do this, remove these last since they can be involved in longer sequences.

Adam Katz
  • 340
  • 2
  • 12
  • 1
    Very nice. This is actually the only one in this thread that successfully parsed a raw terminal log generated from sudossh2 without leaving any residual/partial sequences that seem to be common in `PS1` bash prompts, etc. – Kevin Jun 21 '21 at 00:00
  • 1
    It isn't authoritative, though. Read actual ECMA-35 and ECMA-48, not Wikipedia. CSI can come in as the actual C1 character, not just its escape sequence 7-bit alias. And it can potentially be UTF-8 encoded, too, in several modern terminal emulators. The same with OSC and ST. And some C0 characters either cancel the sequence, restart a new sequence, or even _take effect_ in the middle of control sequences. – JdeBP May 29 '23 at 22:44
  • Thanks @JdeBP, I've added a note that it's not 100% comprehensive. – Adam Katz Jul 27 '23 at 18:19
8

commandlinefu gives this answer which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"

For just colours, you want:

 sed "s,\x1B\[[0-9;]*m,,g"
Tom Hale
  • 2,274
  • 2
  • 24
  • 35
5

There's also a dedicated tool for the job: ansifilter. Use the default --text output format.

ref: https://stackoverflow.com/a/6534712

Juan
  • 519
  • 6
  • 7
2

"tput sgr0" left this control character ^(B^[
Here is a modified version to take care of that.

perl -pe 's/\e[\[\(][0-9;]*[mGKFB]//g' logfile.log
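A Python sketch of the same idea follows. The sample string is an assumption: on many xterm-like terminals `tput sgr0` emits ESC ( B followed by ESC [ m, but the exact bytes depend on the terminfo entry in use.

```python
import re

# ESC followed by "[" or "(", optional digits/semicolons,
# then one of the listed final characters.
PATTERN = re.compile(r'\x1b[\[(][0-9;]*[mGKFB]')

# Assumed sample: a red word followed by the bytes `tput sgr0` often emits.
sample = '\x1b[31mred\x1b(B\x1b[m plain'
print(repr(PATTERN.sub('', sample)))
```

Allowing `(` as the second character is what catches the character-set designation that the `[`-only patterns leave behind.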
  • Thanks for this... this worked for me to get rid of that `tput sgr0` that the other solutions never seem to be able to get rid of. – TxAG98 Jul 26 '19 at 22:28
2

Combining @Adam-Katz @Mike answers I get:

sed -E $'s|\x1b\\[[0-\\?]*[ -/]*[@-~]||g;
         s|\x1b[PX^_][^\x1b]*\x1b\\\\||g;
         s:\x1b\\][^\x07]*(\x07|\x1b\\\\)::g;
         s|\x1b[@-_]||g'

This should work on macos, linux, and mingw64x (Git for Windows)

Note: On very old GNU sed (before 4.2, around the CentOS 6.0 era), the -E flag needs to be replaced with -r.

Explanation of the regexes

1st: An ANSI CSI Code consists of (in order)

  1. One \x1b
  2. One [
  3. Zero or more parameter bytes 0x30-0x3f
  4. Zero or more intermediate bytes 0x20-0x2f
  5. One final byte 0x40-0x7e
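The five-part structure above maps directly onto a regular expression. Here is a Python sketch, using 0x7e as the upper bound of the final byte to match the `[@-~]` class in the sed command; the names `CSI_RE` and `strip_csi` are illustrative only.

```python
import re

CSI_RE = re.compile(
    r'\x1b\['          # ESC followed by [
    r'[\x30-\x3f]*'    # parameter bytes: 0-9 : ; < = > ?
    r'[\x20-\x2f]*'    # intermediate bytes: space ! " # $ % & ' ( ) * + , - . /
    r'[\x40-\x7e]'     # final byte: @ A-Z [ \ ] ^ _ ` a-z { | } ~
)

def strip_csi(text: str) -> str:
    """Remove 7-bit CSI sequences (ESC [ ... final byte)."""
    return CSI_RE.sub('', text)

# "?25l" / "?25h" (hide/show cursor) use the "?" parameter byte,
# which a [0-9;]* class would not match.
print(repr(strip_csi('\x1b[?25lhidden\x1b[?25h \x1b[1;31mred\x1b[0m')))
```

This covers only the ESC [ form; it does not handle a CSI sent as a single 8-bit C1 character.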

2nd and 3rd: I'm unfamiliar with these in practice, but have read about them in the linked page.

4th: Just a catch all to get all remaining escape codes, assuming there are zero extra bytes. As these codes could do anything they want, it's possible data bytes get left behind, but extremely unlikely as they aren't used much in practice.

Andy
  • 148
  • 4
2

The "answered" question didn't work for me, so I created this regex instead to remove the escape sequences produced by the perl Term::ANSIColor module.

cat colors.o | perl -pe 's/\x1b\[[^m]+m//g'

Grawity's regex should work fine, but using +'s appears to work ok too.

  • 4
    (1) What do you mean by `The "answered" question`?  Do you mean the accepted answer?  (2) This command does not work — it does not even execute — because it has an unmatched (unbalanced) quote.  (3) This a useless use of `cat` ([UUOC](http://en.wikipedia.org/wiki/cat_%28unix%29#useless_use_of_cat)) — it should be possible to do `perl -pe ` *`command `* `colors.o`.  (4) Who ever said anything about the codes being in a `.o` file? – Scott - Слава Україні Feb 11 '16 at 05:35
1

Python port of Adam Katz's excellent and comprehensive perl answer:

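    import re

    def escape_ansi(line):
        # re1: CSI sequences (ESC [ parameters, intermediates, final byte)
        re1 = re.compile(r'\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]')
        # re2: string sequences terminated by ST (ESC \)
        re2 = re.compile(r'\x1b[PX^_].*?\x1b\\')
        # re3: OSC sequences terminated by BEL or ST
        re3 = re.compile(r'\x1b\][^\a]*(?:\a|\x1b\\)')
        # re4: remaining two-character escapes
        re4 = re.compile(r'\x1b[\[\]A-Z\\^_@]')
        # re5: zero-width ASCII characters
        # see https://superuser.com/a/1388860
        re5 = re.compile(r'[\x00-\x1f\x7f-\x9f\xad]+')

        for r in [re1, re2, re3, re4, re5]:
            line = r.sub('', line)

        return line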

This includes the C0/C1 sequence removal, so remove that if you don't need it. I realize this is not optimized since it's multiple regex passes, but it did the trick for me and optimization wasn't a concern for me.
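If the multiple passes ever become a concern, the first four patterns can be combined into one alternation that is compiled once. This is only a sketch: the name `escape_ansi_single_pass` is made up, the C0/C1 control-character removal is deliberately left out, and a single pass may differ from sequential passes in rare cases where removing one sequence would expose another.

```python
import re

# Alternatives are tried in order at each position, mirroring the
# order of the separate substitutions.
ANSI_RE = re.compile(
    r'\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]'   # CSI sequences
    r'|\x1b[PX^_].*?\x1b\\'                        # string sequences ending in ST
    r'|\x1b\][^\a]*(?:\a|\x1b\\)'                  # OSC ending in BEL or ST
    r'|\x1b[\[\]A-Z\\^_@]'                         # remaining two-character escapes
)

def escape_ansi_single_pass(line: str) -> str:
    return ANSI_RE.sub('', line)

# A window-title OSC followed by colored text.
print(repr(escape_ansi_single_pass('\x1b]0;title\x07\x1b[1;32mok\x1b[0m')))
```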

Kevin
  • 111
  • 4
  • Just to clarify, the only thing actually changed here from the referenced submission is replacing the shorthand `\e`, which Python's `re` module doesn't seem to know about, with the long form `\x1b`. – Kevin Jun 21 '21 at 00:16
0

This is what worked for me (tested on Mac OS X)

perl -pe 's/\e\[[0-9;]*[mGKF]//g'
Miguel Mota
  • 111
  • 3
0

I've had to look this up too many times, so I decided to make a free online tool for it. No need to remember sed commands for this!

Hope it works well for you, too: https://maxschmitt.me/ansistrip/

Macks
  • 111
  • 4
0

This simple awk solution worked for me, try this:

str="happy $(tput setaf 1)new$(tput sgr0) year!"; #colored text
echo $str | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1'; #remove ansi colors
0

Also consider using the colorstrip function from the Term::ANSIColor module.

colorstrip(STRING[, STRING ...])
    colorstrip() removes all color escape sequences from the provided strings, returning the modified strings separately in array context or joined together in scalar context. Its arguments are not modified.

Elvin
  • 21
  • 3
0

I know there have been many answers on how to deal with the situation once you have files with those characters in them. @oHo's answer helped me a lot with those.

Problem:

cat sometext.txt > ansi_codes_in_file.txt

In case anyone else has the same root cause, where cat outputs properly to STDOUT (with colors) but writes the ANSI color codes to the file and you want to avoid that completely, this is what worked for me:

I had to review my .bashrc and .bash_profile files and found that my .bash_profile had the following line:

export GREP_OPTIONS='--color=always'

After seeing this answer: Different results in grep results when using --color=always option

Never use --color=always, unless you know the output is expected to contain ANSI escape sequences - typically, for human eyeballs on a terminal.

If you're not sure how the input is processed, use --color=auto, which - I believe - causes grep to apply coloring only if its stdout is connected to a terminal.

It was clear that I needed to change that in my .bash_profile to:

export GREP_OPTIONS='--color=auto'

After updating my .bash_profile and loading the config into my terminal (source ~/.bash_profile), the following works without ANSI codes in the output file:

cat sometext.txt > no_ansi_codes_in_file.txt

Note:

  • In my case, I didn't have any alias set to my cat command
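The `--color=auto` behaviour described above (color only when stdout is a terminal) is a pattern any script can follow. Here is a minimal Python sketch; `colorize` is a hypothetical helper, not part of grep or any library.

```python
import sys

def colorize(text: str, code: str = '31') -> str:
    """Wrap text in an SGR color only when stdout is a terminal."""
    if sys.stdout.isatty():
        return f'\x1b[{code}m{text}\x1b[0m'
    # Redirected to a file or pipe: emit plain text, no escape codes.
    return text

print(colorize('match'))
```

Run directly in a terminal, this prints a colored word; redirected to a file, the file contains only the plain text.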
0

I had a similar problem removing the characters added when collecting interactive top output via PuTTY, and this helped:

cat putty1.log | perl -pe 's/\x1b.*?[mGKH]//g'
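One caveat that may be worth checking on your own logs: the lazy `.*?` keeps scanning until it meets one of m, G, K or H, so a sequence ending in a different letter can take ordinary text with it. The Python sketch below illustrates this with a hypothetical input; `STRICT` is an assumed stricter variant added only for comparison.

```python
import re

LOOSE = re.compile(r'\x1b.*?[mGKH]')           # pattern from the command above
STRICT = re.compile(r'\x1b\[[0-9;?]*[mGKH]')   # assumed stricter variant

# Hypothetical input: ESC [ 2 J (clear screen) ends in "J",
# which is not in [mGKH].
sample = '\x1b[2Jhello mom'

print(repr(LOOSE.sub('', sample)))    # text up to the first "m" is removed too
print(repr(STRICT.sub('', sample)))   # nothing matches, input is unchanged
```

Neither variant removes the J sequence cleanly here; adding the missing final characters to the stricter class avoids both problems.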