Searching Excel with no luck for text sequences that I knew were there was driving me insane. I was copy-pasting search terms from the Digi-Key website (digi-key.com) and searching for them in an Excel database.

I finally figured out what was wrong when I accidentally middle-clicked into a MINGW64 window:

When I double-clicked on the text "‎BRL2012T2R2M‎" and pasted it into the MINGW64 window, it revealed the secret: the text was actually \342\200\216‎BRL2012T2R2M‎\342\200\216 (photo linked)

What are these control codes, and why does Windows pick them up even when I dump the paste into Notepad and then re-copy it?

Reddy Lutonadio
Ben

1 Answer

The sequence \342\200\216, as shown by bash, is three bytes written as C-style octal escapes; in hexadecimal the same bytes are \xE2\x80\x8E.

The bytes E2 80 8E (hex) are the UTF-8 encoding of the Unicode code point U+200E, an invisible character called the left-to-right mark (LRM).
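To check this yourself, here is a minimal Python sketch. It assumes the pasted bytes are exactly what bash displayed in the question; the variable names are only illustrative.

```python
import unicodedata

# The bytes as bash displayed them, written with the same octal escapes.
raw = b"\342\200\216BRL2012T2R2M\342\200\216"

# The octal and hexadecimal spellings describe the same bytes.
same_bytes = raw == b"\xe2\x80\x8eBRL2012T2R2M\xe2\x80\x8e"

# Decoding as UTF-8 turns each 3-byte sequence into one code point.
text = raw.decode("utf-8")
first = text[0]

print(same_bytes)                                       # True
print(f"U+{ord(first):04X}", unicodedata.name(first))   # U+200E LEFT-TO-RIGHT MARK
print(len(text))                                        # 14 (12 visible characters + 2 marks)
```

The length of 14 is why a search for the 12 visible characters fails: the pasted string is not the same string.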

It behaves like an invisible, strongly left-to-right character, so the text next to it keeps its left-to-right order even if the surrounding text is normally right-to-left (as in some languages, such as Arabic). The website author most likely adds these marks to ensure the part's name won't get corrupted when the website's interface is switched to those languages.

See this W3C article for an introduction to inline bidirectional markup.

u1686_grawity
  • Thanks! Is there a way to make Windows not pick these up when copying? – Ben May 29 '20 at 06:34
  • That's a question for the program that you're copying from, not for Windows itself... But as you have bash available, I would just paste everything into `sed $'s/\342\200\216//g'` and get a clean version back. – u1686_grawity May 29 '20 at 06:55
  • I get those codes when retrieving date properties with the `GetDetailsOf()` method in **PowerShell**. I don't know if it would work for you, but I filtered the text with a regular expression. This code filters the clipboard text down to "white-listed" characters defined by a regular expression, which can easily be modified if you need to allow additional characters. You could save it as a one-line script and launch it from a shortcut after cutting but before pasting. `(gcb) -replace '[^\(\\: ;\w\/\)]','' | scb` – Keith Miller May 29 '20 at 07:09
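
The clean-up ideas from the comments can also be sketched in Python, for cases where neither bash nor PowerShell is handy. This is only a sketch under one assumption: that dropping every character in the Unicode "Cf" (format) category is acceptable. That category includes U+200E, but also characters such as zero-width joiners, which some scripts need. The function name `strip_format_chars` is hypothetical.

```python
import unicodedata


def strip_format_chars(text: str) -> str:
    """Return text without invisible format characters (Unicode category Cf)."""
    # Assumption: every Cf character is unwanted in the pasted search term.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# The part number from the question, wrapped in left-to-right marks.
pasted = "\u200eBRL2012T2R2M\u200e"
clean = strip_format_chars(pasted)

print(len(pasted), len(clean))   # 14 12
print(clean == "BRL2012T2R2M")   # True
```

Unlike a whitelist, this approach leaves hyphens, dots, and other visible punctuation in part numbers untouched.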