While reviewing the MIME source of an email (presumably one containing international characters), I see things like this in Notepad++:

[Screenshot of the MIME source in Notepad++]

I understand that CRLF is carriage return line feed, but what about the others? What do SOH, GS, and STX mean?

Steven
Mike B

2 Answers

Notepad++ uses these symbols to represent control characters or non-printing characters.

Control character - Wikipedia

A control character or non-printing character is a code point (a number) in a character set, that does not represent a written symbol.

C0 and C1 control codes - Wikipedia

STX - Start of Text - First character of message text, and may be used to terminate the message heading.

SOH - Start of Heading - First character of a message header.

GS - Group Separator - Can be used as a delimiter to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath them.
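
To see how an editor can arrive at labels like these, here is a minimal Python sketch that replaces each C0 control character in a string with its standard mnemonic. It is not Notepad++'s actual implementation; the function name `show_controls`, the bracketed output style, and the sample header value are all made up for illustration.

```python
# Mnemonics for the 32 C0 control codes (0x00-0x1F), in code-point order.
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def show_controls(text: str) -> str:
    """Replace each C0 control character (and DEL) with a bracketed mnemonic."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            out.append(f"[{C0_NAMES[code]}]")
        elif code == 0x7F:
            out.append("[DEL]")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical header line; the control characters are invented for the demo.
sample = 'X-Example: "\x01abc\x1d\x02"\r\n'
print(show_controls(sample))
# prints: X-Example: "[SOH]abc[GS][STX]"[CR][LF]
```

The characters are ordinary bytes in the file; only the way they are drawn differs from one editor to another.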

Steven
  • Hmm... I'm confused about why the headers of an email would contain a non-printing character. Would it be accurate to read that description as "a control character or non-printing character is a code point (a number) in a character set that does not represent a written ASCII symbol"? – Mike B Jul 17 '15 at 21:58
  • I am not familiar with the specifications for mail headers, nor do I know why they might contain non-printable characters. However, Notepad++ uses its own symbols (as you showed) to display these control characters. – Steven Jul 17 '15 at 22:40
  • The non-printable characters all appear within the X-Example header, inside quotes. That header is not part of any email standard (see http://stackoverflow.com/questions/14469110/what-do-x-headers-in-mails-stand-for). It will have been set either by the client that created the email, or perhaps one of the transports along the way. Either way, as far as processing the email is concerned, it's effectively a comment, and could be anything. It's as if someone drew a little heart on an envelope - it's not part of the postal standard, and doesn't affect delivery at all. – Randy Orrison Jul 23 '15 at 11:38
  • ... well, of course that's as long as it doesn't actually break things. An X- header with really long Unicode strings could trigger a buffer overflow bug in a transport program, and a heart written over the address could confuse the postman. But within reason, it's just ignored. – Randy Orrison Jul 23 '15 at 11:42

SOH is Start of Heading;

STX is Start of TeXt;

GS is Group Separator.
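
For reference, these three names correspond to fixed ASCII code points. The short Python sketch below prints each one along with its caret notation (the Ctrl-key form some tools show instead of the mnemonic). The names `CODES` and `caret` are only illustrative.

```python
# ASCII code points of the three control characters from the question.
CODES = {"SOH": 0x01, "STX": 0x02, "GS": 0x1D}


def caret(code: int) -> str:
    """Return caret notation for a C0 control code, e.g. 0x01 -> '^A'."""
    # XOR with 0x40 maps 0x00-0x1F onto the printable range '@' to '_'.
    return "^" + chr(code ^ 0x40)


for name, code in CODES.items():
    print(f"{name}: U+{code:04X} (decimal {code}), caret notation {caret(code)}")
# prints:
# SOH: U+0001 (decimal 1), caret notation ^A
# STX: U+0002 (decimal 2), caret notation ^B
# GS: U+001D (decimal 29), caret notation ^]
```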

Unknow0059
td512