92

Currently, I use a YYMMDD-NAME+PAGE name for most of my files. NAME has spaces converted to underscores.

I'd like to use the YYYY-MM-DD date format, but I am not sure how to separate it from the name. A - would look strange if the name started with a number. If I use a _, then it conflicts with the underscore representing a space.

What characters are reasonably safe in file names that would work here? I am on Linux, but I might share files with other people (Windows 7, Mac OS X).

RedGrittyBrick
Martin Ueding

5 Answers

91

While RedGrittyBrick's answer is technically correct, safety isn't the only issue: usability is also important. I think a better question is "what characters are good to use in a filename".

Some potential guidelines:

  • [0-9a-zA-Z_] - Alphanumeric characters and the underscore are always fine to use.
  • \/:*?"<>| and the null byte are problematic on at least one system, and should always be avoided.
  • Spaces are used as argument separators on many systems, so filenames with spaces should be avoided when possible. Other whitespace characters (e.g. tabs) even more so.
  • Semicolons (;) are used to separate commands on many systems. Semicolons and commas (,) are used to separate command line arguments on (some versions of?) the Windows command line.
  • []()^ #%&!@:+={}'~ and [`] all have special meanings in many shells, and are annoying to work around, and so should be avoided. They also tend to look horrible in URLs.
  • Leading characters to avoid:
    • Many command line programs use the hyphen [-] to indicate special arguments.
    • *nix based systems use a full-stop [.] as a leading character for hidden files and directories.
  • Anything not in the ASCII set can cause problems on older or more basic systems (e.g. some embedded systems), and should be used with care.

That basically leaves you with:

[0-9a-zA-Z-._]

that are always safe and not annoying to use (as long as you start the filename with an alpha-numeric) :)
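As a rough illustration of the guideline above, here is a minimal Python sketch that checks a name against that character set plus the alphanumeric-start rule. The helper name `is_portable_name` is made up for this example, and the hyphen is placed last in the character class so it cannot be read as a range.

```python
import re

# Hypothetical helper, not from any library: the first character must be
# alphanumeric, the rest must come from the conservative set [0-9a-zA-Z._-].
_PORTABLE = re.compile(r"[0-9A-Za-z][0-9A-Za-z._-]*")


def is_portable_name(name: str) -> bool:
    """Return True if the whole name uses only the conservative character set."""
    return _PORTABLE.fullmatch(name) is not None
```

With this sketch, a name like `2014-05-01_report.txt` passes, while `.hidden`, `-rf` and `my file.txt` are rejected.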

naught101
  • 1
    The braces (`[]`) are part of regular expressions and have special meaning in the shell as well. But they are not that bad to work with, except in some evil corner cases. – Martin Ueding May 01 '14 at 13:20
  • 1
    Hrm... I guess the same could be said about `()`, actually. – naught101 Jul 03 '14 at 23:57
  • 6
    In zsh, characters that might be interpreted differently include `[]()^;`, so I think the right answer might actually be `[0-9a-zA-Z.,_-]` Comma could also possibly be excluded just because it's weird to see in a filename, although I can't think of an actual case where it would cause problems. – Casey Rodarmor Sep 05 '14 at 11:22
  • yeah, I removed them from the final list – naught101 Sep 05 '14 at 15:29
  • @Phil_1984_: I had some inside the braces too : http://superuser.com/revisions/748264/1 :) – naught101 Jan 27 '15 at 22:45
  • Exactly what I was looking for -- much better than the accepted answer IMO. – Franz D. Jul 23 '15 at 22:15
  • One may add that any non-printable ASCII characters (i.e., 0-31 and 127) should be avoided, too, though this is implicit in the final character set. – Franz D. Jul 23 '15 at 22:21
  • 2
    And what about `$`? – Franz D. Jul 23 '15 at 22:34
  • 2
    I don't think I'd include `.` in the list of characters that are "not annoying to use". Obviously there's potential for misinterpretation as an extension, etc. – Steve Bennett Jan 11 '16 at 22:52
  • 1
    If you're going to disallow spaces because they might be interpreted by the shell then you need to disallow semicolon ';' as well. – bronson Mar 10 '16 at 20:11
  • bronson: removed. @SteveBennet: an extension is just a convention. The "." is just one string - the stop has no special meaning except what particular software gives it (e.g. it's basically just used as a short-cut for "open with "). – naught101 Mar 11 '16 at 02:34
  • I know, but that doesn't change my view. – Steve Bennett Mar 11 '16 at 02:36
  • @SteveBennett: OK. Other than the hidden behaviour already noted in the answer, what actually makes the use of a period in a name annoying? I mean, can you give an example? – naught101 Mar 11 '16 at 02:38
  • Well, a name like `foo.tar` would be annoying because if it was gzipped, it would become `foo.tar.gz` which would be misinterpreted. – Steve Bennett Mar 12 '16 at 04:53
  • @SteveBennet: So you're saying you should avoid the use of inappropriate file extensions (assuming `foo.tar` is not a tar archive)? I completely agree. – naught101 Mar 12 '16 at 13:11
  • 1
    comma can be annoying, try `echo whereami > a,b,c` in Win10 command prompt window. – RedGrittyBrick Jul 11 '17 at 09:58
  • The comma `,` is not safe for anything that needs to use that path in a CSV. I can't think of a single CSV interpreter that can handle a path with a comma in it. – HackSlash Jan 11 '19 at 21:12
  • @RedGrittyBrick: I don't use windows. What does that do? – naught101 Jan 14 '19 at 01:48
  • @HackSlash: this isn't about CSVs. CSVs don't contain paths, they contain strings. It's up to the interpreter how to use those strings. And CSV reader/writers *should* have the ability to write strings surrounded by quote marks (most that I've used do), in which case the problem is avoided. – naught101 Jan 14 '19 at 01:48
  • @naught101: it creates a file named "a" containing the text `whereami ,b,c` – RedGrittyBrick Jan 14 '19 at 10:40
  • Wow. WTF. Does that mean that windows command line treats `,` as an argument separator? – naught101 Jan 14 '19 at 11:58
  • What if you make a filename out of those characters and eventually get a dot-beginning filename like `.git`? – Константин Ван Jul 01 '19 at 07:17
  • "Anything not in the ASCII set can cause problems on older or more basic systems" Even on some modern file-sharing systems like Google Drive or Dropbox, or on versions of Apple's HFS filesystem, the encoding of certain non-ASCII characters can get silently changed for you, causing files not to be recognized by their original name. This makes them unsafe for many uses. – LarsH Jun 25 '20 at 01:31
  • Requiring an alphanumeric to start a filename leaves you with the only option being to prefix zeroes if you want a filename to sort before others. Not the worst solution, but still a bit clunky `00-important.txt` – cjm Sep 19 '22 at 20:38
  • I don't think underscore(only) is allowed on linux => Try to give a file or a folder the name "_" – gorhal Sep 28 '22 at 10:27
70

Summary:

  • Windows: anything except ASCII's control characters and \/:*?"<>|
  • Linux, OS-X: anything except null or /

On all platforms it is best to avoid non-printable characters such as the ASCII control-characters.

Windows

In Windows, Windows Explorer does not allow control characters or \/:*?"<>| in filenames. You can use spaces. If you do, you will often have to quote the filename when it is used from the command line (but GUI apps are unaffected so far as I know). Windows filesystems such as NTFS apparently store the encoding with the filename, but UTF-16 is standard.
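To make the Windows rule concrete, here is a small Python sketch that lists the characters in a name that fall into the set described above. The names `WINDOWS_FORBIDDEN` and `windows_problem_chars` are invented for this illustration, and "control characters" is taken to mean code points below 32.

```python
# Characters Windows Explorer rejects according to the text above.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')


def windows_problem_chars(name: str) -> list[str]:
    """Return, in order, the characters in name that Windows would reject."""
    # Assumption: ASCII control characters are those with code points 0-31.
    return [ch for ch in name if ch in WINDOWS_FORBIDDEN or ord(ch) < 32]
```

An empty result only means the characters are acceptable; it says nothing about reserved names or length limits.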

Some parts of Windows are case-sensitive, other parts are case-insensitive. It is easy to create distinct filenames like "Ab" and "ab" on a Windows NTFS filesystem. These names refer to separate files which contain distinct separate content. However, although the Windows command-prompt will happily list both files using dir, you cannot easily access or manipulate one of them using commands such as type. See below.

Linux, OS-X

In Linux and OS-X only / of the printable ASCII set is prohibited I believe. Some characters (shell metacharacters like *?!) will cause problems in command lines and will require the filename to be appropriately quoted or escaped.

Linux filesystems such as ext2, ext3 are character-set agnostic (I think they just treat it more or less as a byte stream - only nulls and / are prohibited). This means you can store filenames in UTF-8 encoding. I believe it is up to the shell or other application to know what encoding to use to properly convert the filename for display or processing.

Conclusion

So you could probably safely use something like (if it weren't so hard to type)


Case-(in)sensitivity in Windows

C> dir /B
Ab
aB
аB

C> type Ab
b
b

C> type aB
b
b

C> type аB
unicode homograph

Note that we cannot type the contents of the second file; the Windows type command just returns the contents of Ab instead. The third file would be distinct from aB on Linux also.

(Windows 10 NTFS).
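If you want to catch this kind of clash before sharing files, a sketch along the following lines may help. It is written in Python, the function names are hypothetical, and `str.casefold` is only an approximation of how Windows compares names. The homograph is assumed to be a Cyrillic "а" (U+0430), as in the listing above.

```python
import unicodedata
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case (compared via casefold)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def non_ascii_chars(name):
    """List (character, Unicode name) pairs for characters outside ASCII."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if ord(ch) > 127]
```

For the three files above, `case_collisions` would report Ab and aB as one group, while `non_ascii_chars` would flag the first letter of the third name as a look-alike from another script.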

RedGrittyBrick
  • 3
    All in all a good answer, but I'd refrain from using spaces in filenames. Escaping them properly in all contexts is more trouble than it's worth. Notice that Microsoft stopped using space in system directory names. If you need to indicate word boundaries in names, CamelCase works fine. – Isaac Rabinovitch Sep 06 '12 at 05:55
  • 5
    "C:\Program files (x86)" still exists in Win8 - is that not a system directory? I agree that spaces can cause problems. – RedGrittyBrick Sep 06 '12 at 09:22
  • It is, but it can be renamed to pretty much anything. Of course, a lot of programs will freak out if you rename it to "]:\foobar", but Windows refers to it as "%programfiles(x86)%" anyway. – Marcks Thomas Sep 06 '12 at 15:27
  • @RedGrittyBrick Oops. Good point. What I should have said is that MS stopped using names with spaces for newly-invented system directories. – Isaac Rabinovitch Sep 09 '12 at 03:43
  • 3
    Something to really keep in mind here: Linux systems are able to consider uppercase and lowercase as distinct, whilst Windows considers them the same. – thecoshman Sep 10 '15 at 22:07
  • @thecoshman +1, this is very important and should be read by anyone asking this question. – Seph Reed Jul 23 '17 at 22:02
  • 1
    @IsaacRabinovitch there are also "System Volume Information", "Documents and Settings", "All Users", "Recycle Bin"... And "My Documents", "My Videos"... although they're only system-created directories. Actually, if an app doesn't support spaces in filenames in this era I would throw it away right away – phuclv Jul 24 '17 at 03:56
  • 3
    You'd be surprised how many programs suck at parsing. That's why there was no Windows 9. – Isaac Rabinovitch Aug 17 '17 at 03:31
  • Clarification for Mac OS, at least as of Catalina, you can create a file with `:` in the terminal, but it displays as a forward slash `/` in Finder and does not allow adding `:` through Finder. – ahong Mar 03 '21 at 20:11
  • TIL that emojis can be used as file and folder names My data is about to get a whole new makeover ⭐ – Prid Aug 25 '23 at 16:59
4

You could:

  1. replace current underscores with # (proofreader's symbol for space)
  2. underscore to 'section' date from filename (or a second hyphen - easier to type)

Alt-1. initial-caps can replace spaces: YYMMDD-HHMM-FileName.ext or YYMMDD-HHMM_FileName.ext

Minimal characters for clear display, which auto-sorts with padded zeroes for Jan-Sep (and the 1st-9th of each month).
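A possible way to generate names in the YYMMDD-HHMM_FileName.ext pattern is sketched below in Python. The function `build_name` and its parameters are hypothetical, and the initial-caps conversion simply capitalises the first letter of each whitespace-separated word.

```python
from datetime import datetime


def build_name(stamp: datetime, title: str, ext: str) -> str:
    """Build a YYMMDD-HHMM_FileName.ext style name from a timestamp and title."""
    # Initial caps replace the spaces: "file name" becomes "FileName".
    camel = "".join(word[:1].upper() + word[1:] for word in title.split())
    return f"{stamp:%y%m%d-%H%M}_{camel}.{ext}"
```

Because the month, day, hour and minute are zero-padded by the format codes, names produced this way sort chronologically within a century.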

Indrek
tb01
2

The characters have largely been addressed by other people, though I will point out an additional aspect to consider. First, I'm addressing the choice of YYMMDD, which has two problems.

The first problem with YYMMDD is that it won't work for historic data. It will sort 1997 long after 2035, instead of before. Whether that's an issue may depend on how widely you want to propagate the format.
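The sorting problem is easy to reproduce. The following Python sketch formats three arbitrary example dates both ways and sorts the resulting strings as plain text, which is what a file listing does.

```python
from datetime import date

# Arbitrary example dates, chosen only to straddle the century boundary.
dates = [date(2035, 1, 1), date(1997, 6, 15), date(2022, 2, 16)]

# Two-digit years: the 1997 entry sorts last, after 2022 and 2035.
short = sorted(d.strftime("%y%m%d") for d in dates)

# Four-digit years: plain text order matches chronological order.
full = sorted(d.strftime("%Y-%m-%d") for d in dates)
```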

The other issue with YYMMDD is its reliance on a particular calendar. While the Gregorian calendar is now the most widely used in the world, not everyone uses it or knows the current day in its terms. Fortunately, the Gregorian year is generally known and accepted, even by those who count years differently, but the month/day nomenclature can be meaningless. A format of YYYYDDD, where DDD is the day of the year, is more portable. However, for those of us who use the Gregorian calendar it is awkward, because we generally do not know the day of the year offhand. The MMDD format is still sortable, even if it means nothing to an individual, who might create a date like 20221442 (Gregorian year and their own month and day) or 20220047 (Gregorian February 16, the 47th day of the year), thinking they are complying with your format.
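For those who do not know the day of the year offhand, it can be computed. A minimal Python sketch of the YYYYDDD idea follows, using the standard `%j` format code (zero-padded day of the year); the variable names are only for illustration.

```python
from datetime import date, datetime

d = date(2022, 2, 16)

# Year followed by the zero-padded day of the year (YYYYDDD).
ordinal = d.strftime("%Y%j")

# The same format string parses the stamp back into a calendar date.
round_trip = datetime.strptime(ordinal, "%Y%j").date()
```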

Continuing the theme of how widely the format is to be used, consider which characters are available around the world. The short dash '-' is available everywhere (?) because it is the minus sign, used globally. Underscore is more of a problem, even for some who use the Latin alphabet. They can normally get to it one way or another, but it is not on every keyboard. In some alphabets, the underscore is a character or character modifier, which causes confusion. In many Persian languages, the underscore would be read as a kasheeda. In many alphabets, where we would use an underscore, they would use an overline: something hard to get on our keyboards. Most keyboards for technical people have the simple Latin alphabet available on them (sometimes on the side of the key), so they can type the letters. But not always the underscore.

  • 1
    Oh, don't forget that everybody does not speak English. I'd recommend you use Latin, it's an old, dead, scholarly language and is used extensively in scientific fields. It also has the added benefit of using more endings and fewer words, so spaces are less often a problem. – Lupus Ossorum May 19 '22 at 17:16
1

Also note that not only individual characters can be invalid: whole filenames (combinations of characters) can be invalid too.

On Windows, these filenames are invalid because they are legacy device names:

CON PRN AUX NUL
COM0 COM1 COM2 COM3 COM4 COM5 COM6 COM7 COM8 COM9
LPT0 LPT1 LPT2 LPT3 LPT4 LPT5 LPT6 LPT7 LPT8 LPT9

Even worse, this limitation is case-insensitive, so for example Con is an invalid filename.

Even worse, you cannot use these reserved names as the basename either, so for example Con.txt is an invalid filename.
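A check for these names could look roughly like the Python sketch below. The set is built from the list above. The function name `is_reserved_on_windows` is hypothetical, and taking everything before the first dot as the basename is an assumption made for this example.

```python
# Legacy device names from the list above: CON, PRN, AUX, NUL, COM0-9, LPT0-9.
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{device}{number}" for device in ("COM", "LPT") for number in range(10)
}


def is_reserved_on_windows(filename: str) -> bool:
    """Return True if the basename is a reserved device name, ignoring case."""
    # Assumption: the basename is the part before the first dot.
    base = filename.split(".", 1)[0]
    return base.upper() in RESERVED
```

Under these assumptions, `Con` and `Con.txt` are both flagged, while `Console.txt` is not.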

A more well-known limit on filenames is the size limit of 255 bytes (not 255 Unicode characters).
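The difference between bytes and characters matters as soon as a name contains non-ASCII text. The Python sketch below assumes the name is stored as UTF-8, which is common on Linux but not guaranteed everywhere. The names `MAX_NAME_BYTES`, `utf8_length` and `fits_byte_limit` are made up for this example.

```python
MAX_NAME_BYTES = 255  # the limit mentioned above


def utf8_length(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8 (an assumption)."""
    return len(name.encode("utf-8"))


def fits_byte_limit(name: str) -> bool:
    """Return True if the UTF-8 encoded name is within the 255-byte limit."""
    return utf8_length(name) <= MAX_NAME_BYTES
```

For example, a name made of 200 copies of "é" has only 200 characters but takes 400 bytes in UTF-8, so it would not fit.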

milahu