I encountered a problem with variable substitution in the BASH shell.
Say you define a variable a. Then the command

    $> echo ${a//[0-4]/}

prints its value with every digit in the range 0 to 4 removed:

    $> a="Hello1265-3World"
    $> echo ${a//[0-4]/}
    Hello65-World

This seems to work just fine, but let's take a look at the next example:

    $> b="你1265-3好"
    $> echo ${b//[0-4]/}
    你1265-3好

Substitution did not take place: I assume that is because b contains CJK characters. This issue extends to all cases in which square brackets are involved. Surprisingly enough, variable substitution without square brackets works fine in both cases:

    $> a="Hello1265-3World"
    $> echo ${a//2/}
    Hello165-3World
    $> b="你1265-3好"
    $> echo ${b//2/}
    你165-3好

Is it a bug or am I missing something?

I use Lubuntu 12.04, terminal is lxterminal and echo $BASH_VERSION returns 4.2.24(1)-release.

EDIT: Andrew Johnson stated in his comment that with gnome-terminal and bash 4.2.37(1)-release the command works fine. I wonder whether the problem lies with lxterminal or with the specific bash version, 4.2.24(1)-release.

EDIT: I tried it with gnome-terminal on Lubuntu 12.04 but the problem is still there...

AndreasT
  • I tried all of your examples in Ubuntu 12.10 and they worked like I'd expect: `b="你1265-3好"`, `echo ${b//[0-4]/}` produces `你65-好`. `echo $BASH_VERSION` for me returns 4.2.37(1)-release. This was in gnome-terminal by the way. – Andrew Johnson Dec 14 '12 at 02:17
  • @AndrewJohnson Thank you for your reply! So I am led to think that this actually _is_ a bug, either with the version or with `lxterminal` (Lubuntu's default) itself. – AndreasT Dec 14 '12 at 09:30
  • Yeah it does seem likely that it's a bug. I'd lean towards it being a `bash` bug, but you could try installing a different terminal emulator (like `xterm`) to see if you have the same problem there. – Andrew Johnson Dec 14 '12 at 22:49
  • This is definitely not a bash bug. I tried it on several Debians (with gnome-terminal and xterm) and it works very well. I tried it on several Ubuntus (with gnome-terminal) and it always fails. – gniourf_gniourf Dec 15 '12 at 14:35
  • tried it on lubuntu 12.10 and ubuntu 10.10, no error. it's possible some locale files were missing – marinara Jan 16 '13 at 23:53
  • @marinara Thank you for trying it out! Did you try it in `lxterminal`? And what about your `$BASH_VERSION`? In case you are right, I don't know where to start looking for missing files... any suggestions? – AndreasT Jan 17 '13 at 18:47
  • i tried urxvt and lxterminal on lubuntu 12.04 both had the problem – marinara Jan 19 '13 at 08:09

1 Answer

Short answer:

Set `LC_ALL=C` to get the behaviour you expect:

    paul@permafrost:~$ b="你1265-3好"
    paul@permafrost:~$ echo ${b//[0-2]/}
    你1265-3好
    paul@permafrost:~$ export LC_ALL=C
    paul@permafrost:~$ echo ${b//[0-2]/}
    你65-3好
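If exporting `LC_ALL=C` for the whole session is too broad, the override can be confined to a subshell. The following is only a sketch: `strip_low_digits` is a hypothetical helper name, not something from bash itself, and it assumes the string is UTF-8 encoded (so no byte of a multibyte character can be mistaken for an ASCII digit).

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the digits 0-4 from its first argument.
# The C locale is forced only inside the subshell, so the calling
# shell keeps whatever locale it already had.
strip_low_digits() {
    (
        LC_ALL=C                        # byte-wise range matching, subshell only
        printf '%s\n' "${1//[0-4]/}"
    )
}

strip_low_digits "你1265-3好"           # prints 你65-好
strip_low_digits "Hello1265-3World"     # prints Hello65-World
```

Keep in mind that under `LC_ALL=C` bash treats strings as sequences of bytes, so anything that counts characters (such as `${#var}`) reports byte counts for as long as the override is in effect. Limiting it to a subshell avoids that side effect in the rest of the session.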

Long answer:

The behaviour you expect relies on collation order, which depends on the locale and on the OS implementation. The POSIX standard explicitly leaves range expressions such as `[0-4]` undefined outside the C locale. (Bash calls an external library for this and, at a guess, that library falls back to ASCII ordering when only ASCII characters are present.)

Later versions of bash have a shell option that lets you request the behaviour you expect.
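The option in question is presumably `globasciiranges`, which as far as I know was added in bash 4.3 and makes range expressions in bracket patterns compare as in the C locale. A minimal sketch, assuming that option is the one meant; `strip_low_digits_ascii` is a hypothetical helper name, and on older shells such as the 4.2.24 from the question it simply returns a non-zero status:

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the digits 0-4 using C-locale style ranges,
# without changing LC_ALL. Returns 1 if this bash lacks globasciiranges.
strip_low_digits_ascii() {
    (
        # Enable the option in a subshell so the caller's settings are untouched.
        shopt -s globasciiranges 2>/dev/null || exit 1
        printf '%s\n' "${1//[0-4]/}"
    )
}

if ! strip_low_digits_ascii "你1265-3好"; then
    echo "globasciiranges is not available in bash $BASH_VERSION" >&2
fi
```

Unlike `LC_ALL=C`, this leaves the character type settings alone, so character-based operations keep working on multibyte strings.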

See:

https://groups.google.com/forum/#!topic/gnu.bash.bug/S6cN9KI4vK4/discussion

for more background.

tallus
  • Honestly I didn't get why it works, but it does! Thank you very much, I'll check your link for more background. – AndreasT Feb 01 '13 at 15:30
  • A side problem occurred: now setting, for instance, `a=你`, the command `echo ${#a}` (i.e. length of variable `a`) returns `3` instead of `1`. Why??? – AndreasT Feb 01 '13 at 15:54