18

I'm writing a filename I/O procedure in x86-16 assembly language. It takes eight characters (I don't need to support long filenames) from the keyboard and prints them to an on-screen text input field.

At the moment I'm allowing numbers, upper/lower-case letters, underscores, and hyphens.

I'd like to allow all legal symbols, but I can't find an official list of banned characters. Common sense tells me that slashes are illegal, but if I had to guess, I would say that the plus character is legal. (edit: It's not!)

I'm already ignoring the period character since my code automatically handles appending the period and file extension.

phuclv
My life is a bug.
  • 17
    You might also find [retrocomputing.se] useful. – Bob Sep 28 '18 at 07:27
  • Try to create a folder in Windows and put a '?' in the name. A tooltip tells you which characters are forbidden. This gives you a start :)... – Mixxiphoid Sep 28 '18 at 09:49
  • @Mixxiphoid that won't work because the set of allowed characters in Windows is much larger. For example, `+,;[]`, space and `a-z` are allowed in Windows but not DOS. Explorer gives me the error "A file name can't contain any of the following characters `\ / : * ? " < > |`", which lists just a subset of the characters banned in DOS – phuclv Sep 28 '18 at 14:44
  • 1
    @phuclv that is why I said 'This gives you a start' and also why this is a comment and not an answer. – Mixxiphoid Sep 28 '18 at 14:46
  • Why all MS-DOS symbols? Why not also consider other older OS rules? – jpmc26 Sep 30 '18 at 00:45

4 Answers

33

A concise summary can be found on Wikipedia:

Legal characters for DOS filenames include the following:

  • Upper case letters A–Z
  • Numbers 0–9
  • Space (though trailing spaces in either the base name or the extension are considered to be padding and not a part of the filename, also filenames with spaces in them must be enclosed in quotes to be used on a DOS command line, and if the DOS command is built programmatically, the filename must be enclosed in quadruple quotes when viewed as a variable within the program building the DOS command.)
  • ! # $ % & ' ( ) - @ ^ _ ` { } ~
  • Values 128–255 (though if NLS services are active in DOS, some characters interpreted as lowercase are invalid and unavailable)

This excludes the following ASCII characters:

  • " * + , / : ; < = > ? \ [ ] |
  • Windows/MS-DOS has no shell escape character
  • . (U+002E . full stop) within name and extension fields, except in . and .. entries (see below)
  • Lower case letters a–z (stored as A–Z on FAT12/FAT16)
  • Control characters 0–31
  • Value 127 (DEL)[dubious – discuss]

https://en.wikipedia.org/wiki/8.3_filename#Directory_table
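As a rough illustration of the list above, here is a minimal Python sketch of a per-character filter for the eight-character base name (Python only to keep the table readable; the same checks map onto a lookup table in assembly). The helper names `normalize_char` and `normalize_base_name` are made up for this example. It assumes you want to reject space (legal on disk, but awkward on the command line) and DEL (flagged as dubious above), and it accepts every value 128–255 without consulting NLS.

```python
# Symbols the quoted Wikipedia list allows besides A-Z, 0-9 and 128-255.
# Space is deliberately left out (assumption of this sketch, see above).
LEGAL_SYMBOLS = set("!#$%&'()-@^_`{}~")


def normalize_char(ch):
    """Return the character as it would be stored, or None if rejected.

    Hypothetical helper: the name and return convention are not from DOS.
    """
    code = ord(ch)
    if "a" <= ch <= "z":
        return ch.upper()   # lowercase is stored as A-Z
    if "A" <= ch <= "Z" or "0" <= ch <= "9":
        return ch
    if ch in LEGAL_SYMBOLS:
        return ch
    if 128 <= code <= 255:
        return ch           # code-page dependent; NLS may restrict these
    # Everything else: control characters, DEL, space, period and
    # the excluded set  " * + , / : ; < = > ? \ [ ] |
    return None


def normalize_base_name(text):
    """Return the uppercased base name, or None if it is not 1-8 legal chars."""
    if not 1 <= len(text) <= 8:
        return None
    stored = []
    for ch in text:
        out = normalize_char(ch)
        if out is None:
            return None
        stored.append(out)
    return "".join(stored)
```

In an input field you would call something like `normalize_char` on each keystroke and simply ignore the key when it returns `None`.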

And here's what the MS-DOS 6 user guide officially said:

Naming Files and Directories

Every file and directory, except for the root directory on each drive, must have a name. The following list summarizes the rules for naming files and directories. File and directory names:

  • Can be up to eight characters long. In addition, you can include an extension up to three characters long.
  • Are not case-sensitive. It does not matter whether you use uppercase or lowercase letters when you type them.
  • Can contain only the letters A through Z, the numbers 0 through 9, and the following special characters: underscore (_), caret (^), dollar sign ($), tilde (~), exclamation point (!), number sign (#), percent sign (%), ampersand (&), hyphen (-), braces ({}), at sign (@), single quotation mark (`), apostrophe ('), and parentheses (). No other special characters are acceptable.
  • Cannot contain spaces, commas, backslashes, or periods (except the period that separates the name from the extension).
  • Cannot be identical to the name of another file or subdirectory in the same directory.

This is from PC-DOS 7:

The name you assign to a file must meet the following criteria:

  • It can contain no more than eight characters.
  • It can consist of the letters A through Z, the numbers 0 through 9, and the following special characters:

    _ underscore            ^  caret
    $ dollar sign           ~  tilde
    ! exclamation point     #  number sign
    % percent sign          &  ampersand
    - hyphen                {} braces
    @ at sign               `  single quote
    ' apostrophe            () parentheses
    

Note: No other special characters are acceptable.

  • The name cannot contain spaces, commas, backslashes, or periods (except the period that separates the name from the extension).
  • The name cannot be one of the following reserved file names: CLOCK$, CON, AUX, COM1, COM2, COM3, COM4, LPT1, LPT2, LPT3, LPT4, NUL, and PRN.
  • It cannot be the same name as another file within the directory.

User's Guide - PC DOS 7
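The reserved-name rule is easy to miss because those names contain only legal characters. A minimal Python sketch of the check, assuming the list from the PC DOS 7 excerpt above and a made-up helper name `is_reserved`; it compares only the part before the first period, since DOS resolves device names on the base name whatever the extension is:

```python
# Reserved names exactly as listed in the PC DOS 7 excerpt.
RESERVED_NAMES = {
    "CLOCK$", "CON", "AUX", "COM1", "COM2", "COM3", "COM4",
    "LPT1", "LPT2", "LPT3", "LPT4", "NUL", "PRN",
}


def is_reserved(filename):
    """True if the base name (text before the first '.') is a device name.

    Hypothetical helper for illustration; not a DOS API.
    """
    base = filename.split(".", 1)[0]
    # Trailing spaces are padding, and names are not case-sensitive.
    return base.rstrip(" ").upper() in RESERVED_NAMES
```

With this, `is_reserved("nul.txt")` is true while `is_reserved("CONSOLE.TXT")` is not.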

The first byte of a name must not be 0x20 (space). Short names or extensions are padded with spaces. Special ASCII characters 0x22 ("), 0x2a (*), 0x2b (+), 0x2c (,), 0x2e (.), 0x2f (/), 0x3a (:), 0x3b (;), 0x3c (<), 0x3d (=), 0x3e (>), 0x3f (?), 0x5b ([), 0x5c (\), 0x5d (]), 0x7c (|) are not allowed.

The FAT filesystem
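To make the padding rule concrete, here is a small Python sketch that builds the 11-byte name field (8 bytes of name, 3 of extension, no period stored) from the rules quoted just above. The function name `pack_83` is hypothetical, and the sketch assumes ASCII-only input and checks only what that excerpt states: the forbidden bytes and the leading-space rule.

```python
# The sixteen bytes the excerpt lists as not allowed.
FORBIDDEN = set(b'"*+,./:;<=>?[\\]|')


def pack_83(base, ext=""):
    """Return the 11-byte space-padded name field, or raise ValueError.

    Hypothetical helper; ASCII-only input is an assumption of this sketch.
    """
    base_b = base.upper().encode("ascii")   # non-ASCII raises a ValueError subclass
    ext_b = ext.upper().encode("ascii")
    if not 1 <= len(base_b) <= 8 or len(ext_b) > 3:
        raise ValueError("name must be 1-8 characters, extension 0-3")
    if base_b[0] == 0x20:
        raise ValueError("first byte must not be a space")
    for byte in base_b + ext_b:
        if byte in FORBIDDEN:
            raise ValueError("illegal byte 0x%02x" % byte)
    # Short names and extensions are padded with spaces.
    return base_b.ljust(8, b" ") + ext_b.ljust(3, b" ")
```

Because trailing spaces are indistinguishable from padding in this field, a name cannot meaningfully end in a space.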

If you're also interested in MS-DOS 5.0 then here it is.

Peter Mortensen
phuclv
  • 11
    It might be worth noting that even though they only contain valid characters the special filenames `CON`, `PRN`, `AUX`, `NUL`, `COM1`, `COM2`, `COM3`, `COM4`, `COM5`, `COM6`, `COM7`, `COM8`, `COM9`, `LPT1`, `LPT2`, `LPT3`, `LPT4`, `LPT5`, `LPT6`, `LPT7`, `LPT8`, and `LPT9` are also not permitted (see [here](https://docs.microsoft.com/en-us/windows/desktop/fileio/naming-a-file)) – Bill Tür stands with Ukraine Sep 28 '18 at 10:42
  • 3
    @ThomasSchremser "Do not use", "Avoid" and "Not recommended" is not the same as "not permitted". – RobIII Sep 28 '18 at 14:23
  • 1
    @RobIII Yes but they linked to "Windows > Desktop" documentation, not to "MS-DOS" documentation. The [wiki for DOS](https://en.wikipedia.org/wiki/DOS#Reserved_device_names) says, "There are reserved device names in DOS that cannot be used as filenames regardless of extension as they are occupied by built-in character devices". In other words, not permitted in DOS and some versions of Windows, and not recommended in other Windows versions. – Quantic Sep 28 '18 at 18:31
  • It's interesting that the ` is referred to as a single quote. I've always heard it called a backtick, and the ' (what they call (not incorrectly) an apostrophe) as a single quote. – ale10ander Sep 28 '18 at 18:41
  • 2
    @ale10ander yeah that surprised me too. I've always hated that many people use it for the apostrophe (like in I\`m) or the opening part of a quote. For example, GNU documentation always writes `like this', which is very ugly and less readable to me – phuclv Sep 28 '18 at 19:06
  • Interestingly, even in Windows 7 (and possibly beyond?) an attempt to create a folder named `CON` will fail. The special filenames have been around a looong time. – smitelli Sep 29 '18 at 01:19
  • @smitelli it's just for backward compatibility when running commands in the console. You can create files/folders with those names on Windows easily without problem: [Creating a folder named "CON" in Windows](https://superuser.com/q/129141/241386). – phuclv Sep 29 '18 at 02:27
13

Strictly speaking, as an MS/PC/DR-DOS applications programmer you are supposed to ask the operating system for this information. INT 0x21 with AX=0x6505 returns a pointer to the so-called FCHAR NLS table for your country and code page. This table lists a range of characters and a further set of characters that terminate filenames.

In theory it varies by country and code page. But the fact that it was not formally carried over into the OS/2 Control Program API, and the fact that FreeDOS has one table across all code pages and countries, show that it is largely invariant in practice.
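For a feel of what the table holds, here is a Python sketch that decodes its bytes once you have copied them out of memory. The field offsets are an assumption taken from the description in Ralf Brown's Interrupt List, not from anything stated in this answer, so verify them against your DOS version. The function names are made up, and obtaining the pointer via the interrupt is not shown.

```python
def parse_fchar_table(table):
    """Decode the raw bytes of an FCHAR table into a dict.

    ASSUMED layout (per Ralf Brown's Interrupt List; verify before use):
      +0  word  size of the data that follows
      +3  byte  lowest permissible character
      +4  byte  highest permissible character
      +6  byte  first character of the excluded range
      +7  byte  last character of the excluded range
      +9  byte  number of terminator characters
      +10       the terminator characters themselves
    """
    count = table[9]
    return {
        "lowest": table[3],
        "highest": table[4],
        "excluded": (table[6], table[7]),
        "terminators": bytes(table[10:10 + count]),
    }


def is_filename_char(info, byte):
    """True if `byte` is inside the permitted range, outside the excluded
    range, and not one of the terminator characters."""
    if not info["lowest"] <= byte <= info["highest"]:
        return False
    first, last = info["excluded"]
    if first <= byte <= last:
        return False
    return byte not in info["terminators"]
```

The point of going through the table rather than a hard-coded set is that the same routine keeps working if a given DOS version or code page reports something different.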

Further reading

JdeBP
10

I found this in a manual for MS-DOS 3.3. I'm running 6.22, but it probably still applies. I was wrong about '+' being allowed.

[Image: excerpt from the MS-DOS 3.3 manual]

Peter Mortensen
My life is a bug.
  • 2
    A manual from back-in-the-day is more reliable than Wikipedia – Stewart Sep 28 '18 at 12:53
  • @Stewart what's important are the quotes on Wikipedia, not Wikipedia itself. If in doubt, just check the footnotes and references in the article. @Mylifeisabug I've just added the MS-DOS 6 manual – phuclv Sep 28 '18 at 15:52
3

If you just want to validate the filename, you may want to use INT 21H/AH=60H (TRUENAME - CANONICALIZE FILENAME OR PATH). First make sure that the proposed filename doesn't contain a colon or backslash, since those may be treated as drive-letter and directory separators. The function then takes your filename and tries to canonicalize it by uppercasing the letters and checking for invalid characters (it also adds a drive letter or server name and the path).

In pseudocode:

If filename contains any of {"/", "\", ".", ":"}
    Filename is not valid
Else
    Canonicalize filename (INT 21H/AH=60H)
    If CF is set
        Filename is not valid
    Else
        Filename is valid
ErikF