46

While trying to find out how to get CSVs to open correctly in Excel on a Spanish PC (belonging to a customer) I found many answers recommending using sep=, at the top of the file:

On the face of it, this approach works but none of those answers give any further information on where this metadata option comes from. I have tried to search for what it means (and hence found all of those answers) but have been unable to get any further information especially given that punctuation is notoriously hard to search for.

My primary concern is whether or not this is an Excel-specific feature. I suspect it is but have found nothing definitive to confirm it.

Some related questions are:

  • What characters can be used in this setting?
  • What other settings are available (eg. line termination character, quote character, etc).
  • Are there any other tools that officially support this feature?

I'm hoping there is a piece of documentation somewhere that someone can point me to that will answer all these questions and more; I just haven't been able to find it.

Some clarification:

The sep= is not a parameter to a parser. It is meant to be placed inside the CSV. Example:

sep=|
"LETTER"|"ANIMAL"
"a"|"aardvark"
"b"|"bear"
"c"|"cow"
Burhan Ali
  • 736
  • 2
  • 7
  • 19
  • 8
    Yes, this is **excel specific**. – Raystafarian Feb 21 '15 at 13:09
  • 2
    Were you able to get anything on this? I am stumped with the same question and I am yet to find any MS specs that describes this feature. If this feature is undocumented by MS, how the heck did it get out in the first place? – toddlermenot Dec 24 '15 at 14:56
  • @toddlermenot Sadly not. All I get is people making the assertions that it is Excel specific with nothing to back up that assertion, which is disappointing. – Burhan Ali Dec 24 '15 at 14:59

5 Answers5

19

RFC 4180 is commonly recognized as the standard for the CSV format, and does not mention any such feature.

W3C Model for Tabular Data and Metadata on the Web mentions it as a feature not within their scope:

Many "CSV" files embed metadata, for example in lines before the header row of the CSV document. This specification does not define any formats for embedding metadata within CSV files, aside from the names of columns in the header row.

So while I did not find any direct proof that this feature is Excel-specific, this apparently still is the case as there seems to be no standardization document covering that feature.

Stefan Kögl
  • 290
  • 4
  • 6
  • Good sourcing. One thing I should say is that sep=| defines the seperator for use in Excel as the pipe character ('|'). It is not widely used but is not exclusive to Excel. It is a seperator directive. International versions of Excel iirc have different characters, for instance, European language on a machine running excel uses semicolons. – var firstName Jun 30 '17 at 15:24
  • 2
    I am aware of the use (maybe that was not entirely clear from my answer). I was just pointing out that it seems that Excel introduced that feature, with it being based on any kind of (public) standard. – Stefan Kögl Jun 30 '17 at 16:33
1

The instruction sep= is used in all the countries where comma is used as decimal delimiter, as Italy. Since Excel export CSV using semicolons instead of commas in those countries, if you want to be sure that the file can be read in USA or other countries, you NEED

sep=;

on top of file. This is Excel-specific and it might be ignored by other applications.

  • I honestly do not understand why my answer has been downvoted. It is correct. I do that often. It looks like again somebody is bullying me on this site. I already signaled that to admins with no result. – Dario de Judicibus Aug 23 '22 at 14:16
  • This is not very accurate. 1. "Is used", by what program or by who, Excel? I tried exporting CSV with Excel and it did not include the `sep=;` line. 2. "is used (...) where comma ..." is incorrect: what is true for Italy->USA is true USA->Italy, that is, if you export a CSV file in the USA and want Excel in Italy to open it properly, then you also need to include the `sep=,` otherwise it puts the whole row in the first cell (as it looks for `;` as separator but there is none, only `,`... it doesn't fallback). You can test this by changing your regional format in Windows' settings, try Spanish. – ANeves Feb 02 '23 at 07:50
  • 1
    I think this could have been downvoted because it doesn't answer the question(s) asked, it only explains things that the asker already seems to know based on their question. – ANeves Feb 02 '23 at 07:51
  • @ANeves 1. "it is used" means that it is recognized by Excel when opening a CSV file, not that Excel save it in CSV files. 2. No, if you use sep=; in a file you can open it in ANY country: Excel will separate correctly the values and will convert to the local standard. – Dario de Judicibus Feb 02 '23 at 14:09
  • Most applications won't "ignore" the `sep=` line, they will parse it like any other data, which may have negative side-effects. The proper solution would have been to make a NEW file extension like `ssv` (semicolon-separated values) or `csv2`, but nooo. People had to go and change the delimiters of a file format and create all this confusion and incompatibility. – MasterHD Jul 12 '23 at 10:06
0

To contribute towards an answer to one of your questions: "Are there any other tools that officially support this feature?"

This is not supported by Apple Numbers (I've tested using Numbers version 3.6.2).

It's also not supported by csv2json (https://www.npmjs.com/package/csv2json) nor csvtojson (https://www.npmjs.com/package/csvtojson).

kintel
  • 121
-1

I tested this as well on Google Sheet and Google does not support it for the moment. But Google Sheet does not process CSV files correctly if a ";" is used as separator.

I personal think it is much easier for tool vendors to analyse the CSV file and identify the separator automatically. I did not find a tool yet which do this :(

Stephan Zehrer
  • 147
  • 1
  • 1
  • 5
  • 3
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Apr 13 '22 at 11:44
  • I tested on Google Sheets just now, and it DOES support it (user can choose a custom separator, tab, or let it auto-detect), but it also parses the `sep=` as a line of data (not ignored metadata like Excel does). – MasterHD Jul 12 '23 at 10:11
-4

Just about any tool that can import from .csv files recognises that comma is not the only possible separator, tabs have been common for a long time for example. I think this goes back to the days of punch cards even, that there needs to be some character that is recognised as end of data field - with the exception of systems that use fixed width fields or length prefixed fields just about every system has need this since the advent of computers.

The 3 essentials are a field separator, a record separator and an end of data marker, csv by default uses comma, newline and end of file for these respectively but just about any valid input can be used, you do, however, need a mechanism for escaping the special markers if they occur within your valid data.

One historic system I came across used newline for end of field, 2 blank lines for end of record and *!*!* END *!*!* for the end of the data. I have commonly come across all of ,\t|: used as a field separator along with various control characters.

One thing to be wary of is that the French and others use comma as a decimal point - this can cause all sorts of fun and games. There is a degree of standardisation for csv files in [rfc4180][1] but there are times when you need to manually edit files when exporting from one program and importing to another.

To answer the second part of your question Python, for example, includes a [csv reader/writer][2] as one of the standard libraries that offers the following options:

  • delimiter - This is the field separator, (unless escaped).
  • doublequote - If true then a " within a field will be represented as ""
  • escapechar - removes any special meaning from any following character, e.g. if it is set to \ then quote can be represented as \"
  • lineterminator - usually one of \n, \r, \r\n or \n\r
  • quotechar - character used to quote strings, usually " or '
  • quoting - flag as one of:
  • ALL - Every field is in quotes, i.e.: "1","Free Beer","ASAP"
  • MINIMAL - Only quote fields that have special characters, i.e.: 1,"Beer, (Free)", Now, Please
  • QUOTE_NONNUMERIC - Quote anything that is not a number, i.e.: 1,"Beer, (Free)", "Now", "Please"
  • NONE - Don't quote escape instead, i.e.: 1,Beer\, (Free), Now, Please
  • skipinitialspace - skip any leading space of each field so you can use , rather than just , as a field separator.

Generally the more flexible and well thought out a tool that can import csv files is the more of these options, (possibly with differing names and/or mechanisms), that it will have. [1]: https://www.rfc-editor.org/rfc/rfc4180 [2]: https://docs.python.org/2/library/csv.html

Steve Barnes
  • 274
  • 3
  • 7
  • 8
    This doesn't answer my question. It explains what CSVs are, which I am already aware of. My question is about a specific piece of syntax. Is there any background that I could add to my question to clarify things? – Burhan Ali Jun 26 '14 at 09:53
  • @BurhanAli I have expanded answer to show the options that python offers and the fact that the coverage and syntax of the options will differ from tool to tool. – Steve Barnes Jun 26 '14 at 10:09
  • 1
    Thanks but that still doesn't help. `sep=` is not a parameter to a parser. I have added some clarification to my question. – Burhan Ali Jun 26 '14 at 10:17
  • 3
    @SteveBarnes But does any tool, for example the Python library you mention, allow options as part of the content of the file? That is, I think, the main point of the question. – Randy Orrison Jun 26 '14 at 10:19
  • That PDF doesn't mention `sep=` or any similar pieces of metadata. Which section of the document were you referring to? – Burhan Ali Jun 26 '14 at 11:59
  • Because they one of the many excel oddities. – Steve Barnes Jun 26 '14 at 12:05
  • 3
    "Because they one of the many excel oddities." *Is* it though? That's part of what I am trying to work out. I don't want to know what a CSV is. I don't want to know how to parse a CSV. I want to know what `sep=` is and where it comes from. – Burhan Ali Jun 26 '14 at 13:39
  • 5
    sep= is a **Microsoft Excel Specific** hint method that is not covered in any Non-Microsoft specification that I can find. It comes from a part of Microsoft. – Steve Barnes Jun 26 '14 at 18:49