TABLE OF CONTENTS
CSV-1203
Introduction
Open standard
Other guidance
RFC4180
CSV-1203
Terminology
ABNF
<csvfile>
1.1
1.2
The character set used by data contained in the file MUST be an 8-bit superset of the 7-bit US-ASCII.
1.3
ASCII control characters, other than payload delivery markers, MUST NOT be present in a CSV file.
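A consumer could screen a file for disallowed control bytes before parsing it. The sketch below is illustrative only. The rule does not list the "payload delivery markers", so the set used here (TAB, LF, CR and SUB) is an assumption, and the name `find_forbidden_controls` is hypothetical.

```python
# Assumption: the "payload delivery markers" are TAB, LF, CR and the SUB
# end-of-file marker. The rule text itself does not enumerate them.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D, 0x1A}


def find_forbidden_controls(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each disallowed control byte below 0x20."""
    return [
        (offset, byte)
        for offset, byte in enumerate(data)
        if byte < 0x20 and byte not in ALLOWED_CONTROLS
    ]
```

An empty result means no disallowed control byte was found. DEL (0x7F) is deliberately not flagged here, because the `safechar` range in the grammar includes it.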
1.4
<csvfile>  = csvheader *datarecord [EOF]
csvheader  = csvrecord
datarecord = csvrecord

; Literal terminals
EOF = %x1A
2.1
2.2
csvrecord     = recordpayload EOR
recordpayload = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)

; Literal terminals
EOR   = CRLF / CR / LF
CRLF  = CR LF
COMMA = ","
CR    = %x0D
LF    = %x0A
3.1
Each record within the same CSV file MUST contain the same number of field columns.
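The column-count rule above lends itself to a mechanical check. The following is a minimal sketch that uses Python's standard `csv` module with its default comma dialect. The function name `column_count_errors` is hypothetical, and the header record is assumed to define the expected count.

```python
import csv
import io


def column_count_errors(text: str) -> list[tuple[int, int]]:
    """Return (record number, field count) for records that differ from the header."""
    reader = csv.reader(io.StringIO(text, newline=""))
    expected = None
    errors = []
    for number, record in enumerate(reader, start=1):
        if expected is None:
            expected = len(record)      # the header record sets the expected count
        elif len(record) != expected:
            errors.append((number, len(record)))
    return errors
```

An empty list indicates that every record has the same number of field columns as the header.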
3.2
3.3
Field columns MUST be separated from each other by a single separation character.
3.4
%xA0
recordpayload    = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)
fieldcolumn      = protectedfield / unprotectedfield
protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
fieldpayload     = 1*anychar
unprotectedfield = [EXP] rawfieldpayload
rawfieldpayload  = safechar / (safechar *char safechar)
anychar          = char / COMMA / DDQUOTE / CR / LF / TAB
char             = safechar / SPACE
safechar         = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA   = ","
EXP     = "~"            ; Excel protection marker
CR      = %x0D
DQUOTE  = %x22
DDQUOTE = %x22 %x22
LF      = %x0A
SPACE   = %x20
TAB     = %x09
The field separator MUST be a single character. Once selected, the same field separator character MUST be used throughout the entire file. The producer application SHOULD use the comma (ASCII 0x2C) as the field separator.
4.4
The consumer application MUST accept either the comma, the tab character (ASCII 0x09), the vertical bar (pipe) character (ASCII 0x7C), or semicolon (ASCII 0x3B) as the field separator. The field separator MUST NOT be taken as part of a field.
4.5
separator = "," / ";" / PIPE / TAB

; Literal terminals
PIPE = %x7C
TAB  = %x09
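The specification does not say how a consumer should decide which of the four separators a file uses. One possible heuristic, sketched below, counts each candidate in the header record while ignoring anything inside double-quotes, and prefers the comma on a tie. The function name `detect_separator` and the heuristic itself are assumptions, not part of CSV-1203.

```python
# Candidates in order of preference; the comma is the recommended separator.
CANDIDATES = (",", "\t", "|", ";")


def detect_separator(header_line: str) -> str:
    """Guess the field separator from the header record (heuristic sketch)."""
    counts = dict.fromkeys(CANDIDATES, 0)
    in_quotes = False
    for ch in header_line:
        if ch == '"':
            in_quotes = not in_quotes   # a doubled quote toggles twice, so state is kept
        elif not in_quotes and ch in counts:
            counts[ch] += 1
    best = max(CANDIDATES, key=lambda sep: counts[sep])  # first maximum wins a tie
    if counts[best] == 0:
        raise ValueError("no candidate field separator found in header")
    return best
```

Once detected, the same separator would be applied to every record in the file, in line with the rule that it must not change within a file.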
5.1
5.2
Once selected, the same end-of-record marker MUST be used throughout the file.
5.3
The EOR marker MUST NOT be taken as being part of the CSV record.
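Whether a file uses one end-of-record marker throughout can be checked by collecting the distinct markers that occur outside double-quoted fields. This is a sketch under that assumption. The function name `eor_markers` is hypothetical, and CR or LF characters inside a protected field are treated as payload rather than as markers.

```python
def eor_markers(text: str) -> set[str]:
    """Collect the distinct end-of-record markers found outside quoted fields."""
    markers = set()
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "\r\n":
            if ch == "\r" and text[i + 1 : i + 2] == "\n":
                markers.add("\r\n")     # CR LF pair counts as a single marker
                i += 1
            else:
                markers.add(ch)         # lone CR or lone LF
        i += 1
    return markers
```

A result containing more than one marker suggests that the file mixes end-of-record conventions.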
6.1
The CSV file MAY be logically terminated with an end-of-file (SUB - ASCII 0x1A) marker.
6.2
A consumer application MUST treat the logical EOF as if it were the physical end of the file.
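A consumer might honour the logical EOF by discarding the marker and everything after it before any records are parsed. The sketch below assumes the file content is available as bytes; the name `strip_logical_eof` is hypothetical.

```python
EOF_MARKER = b"\x1a"  # SUB, ASCII 0x1A


def strip_logical_eof(data: bytes) -> bytes:
    """Return the bytes that precede the first SUB marker, or all bytes if none."""
    head, _, _ = data.partition(EOF_MARKER)
    return head
```

Because the marker is removed before parsing, it cannot end up inside the last record.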
6.3
6.4
The EOF marker MUST NOT be taken as being part of the last CSV record.
<csvfile>
<csvfile> = csvheader *datarecord [EOF]

; Literal terminals
EOF = %x1A
7.1
A CSV file MUST contain exactly one header record. Header labels MUST NOT be blank. Header labels MUST NOT contain variable data.
7.5
Each header label SHOULD provide a semantic clue to the nature of the data to be found in its field column.
csvheader = csvrecord
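Of the header rules, the ban on blank labels is the one that can be tested mechanically. A minimal sketch follows, using Python's standard `csv` module with its default comma dialect. The function name `blank_header_labels` is hypothetical, and a label made up only of whitespace is assumed to count as blank.

```python
import csv
import io


def blank_header_labels(text: str) -> list[int]:
    """Return the 1-based column positions whose header label is blank."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)         # the first record is taken as the header
    if header is None:
        raise ValueError("file has no header record")
    return [pos for pos, label in enumerate(header, start=1) if not label.strip()]
```

Whether a label contains variable data, or gives a useful semantic clue, still needs human review.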
8.1
8.2
If a record has a record type identifier, it SHOULD be stored in the first field column in the record.
8.3
9.1
If a field's payload contains one or more commas, double-quotes, CR or LF characters, then it MUST be delimited by a pair of double-quotes (ASCII 0x22).
9.2
If a field's payload contains a double-quote character, it MUST be represented by two consecutive double-quotes.
9.3
The producer application SHOULD NOT apply double-quote protection where it is not required.
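On the producer side, the quoting rules above amount to a small encoding step: protect a field only when its payload contains a comma, double-quote, CR or LF, and double any embedded double-quote. The sketch below assumes the comma separator and a CRLF end-of-record marker; the names `encode_field` and `encode_record` are hypothetical.

```python
def encode_field(payload: str) -> str:
    """Apply double-quote protection only when the payload requires it."""
    if any(ch in payload for ch in ',"\r\n'):
        # Embedded double-quotes are written as two consecutive double-quotes.
        return '"' + payload.replace('"', '""') + '"'
    return payload


def encode_record(fields: list[str], eor: str = "\r\n") -> str:
    """Join encoded fields with commas and append the end-of-record marker."""
    return ",".join(encode_field(field) for field in fields) + eor
```

For example, `encode_record(["1", "x,y"])` yields `1,"x,y"` followed by CRLF. The ABNF is stricter than the rule text in some respects (for instance, an unprotected field may not begin or end with a space), and this sketch follows the rule text only.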
protectedfield = DQUOTE [EXP] fieldpayload DQUOTE
fieldpayload   = 1*anychar
anychar        = char / COMMA / DDQUOTE / CR / LF / TAB
char           = safechar / SPACE
safechar       = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA   = ","
EXP     = "~"            ; Excel protection marker
CR      = %x0D
DQUOTE  = %x22
DDQUOTE = %x22 %x22
LF      = %x0A
SPACE   = %x20
TAB     = %x09
Diagram 6 : The tilde prefix can protect long-number fields from Excel
10.1
If a field requires Excel protection, its payload MUST be prefixed with a single tilde character.
10.2
If a field's payload begins with a tilde character, then Excel protection MUST be applied to that field.
10.3
Any field in any record MAY use Excel protection without requiring any other field to use the same protection.
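One reading of rules 10.1 and 10.2 is that a producer prefixes a tilde whenever protection is wanted or the payload already begins with a tilde, so that a consumer can always strip exactly one leading tilde without losing data. The sketch below follows that reading, which is an interpretation rather than something the rules state outright. The names `apply_protection` and `remove_protection` are hypothetical.

```python
EXP = "~"  # Excel protection marker


def apply_protection(payload: str, wanted: bool = False) -> str:
    """Prefix one tilde if protection is wanted or the payload starts with a tilde."""
    if wanted or payload.startswith(EXP):
        return EXP + payload
    return payload


def remove_protection(field: str) -> str:
    """Strip a single leading tilde, if present (consumer side)."""
    return field[1:] if field.startswith(EXP) else field
```

Under this reading, `apply_protection("0001234", wanted=True)` gives `~0001234`, and a payload of `~draft` is written as `~~draft` so that it survives the round trip. In the grammar the marker sits inside the double-quotes of a protected field, so protection would be applied before any quoting.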
protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
unprotectedfield = [EXP] rawfieldpayload
fieldpayload     = 1*anychar
anychar          = char / COMMA / DDQUOTE / CR / LF / TAB
rawfieldpayload  = safechar / (safechar *char safechar)
char             = safechar / SPACE
safechar         = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA   = ","
EXP     = "~"            ; Excel protection marker
CR      = %x0D
DQUOTE  = %x22
DDQUOTE = %x22 %x22
LF      = %x0A
SPACE   = %x20
TAB     = %x09
APPENDIX A
Known issues when using Excel
Data             Excel Edit Bar
0                0
00               0
01234123456      1234123456
01234 123456     01234 123456
" 0"             0
0                0
"0 "             0
0                0
0001             1
"0001"           1
"1.0000"         1
1.0000           1
"1,200.00"       1200
"1,200.00"       1200
"$1,200.00"      $1,200.00
~001.0000        ~001.0000
="000012.030000"
Excel Display   Excel Edit Bar   Excel Save   Re-loaded Image
0/1             0/1              0/1          0/1
01-Jan          01-01-2012       01-Jan       01-Jan
12-Dec          12-12-2012       12-Dec       12-Dec
13-Dec          13-12-2012       13-Dec       13-Dec
32/12           32/12            32/12        32/12
Dec-13          01-12-2013       Dec-13       Dec-13
Dec-30          01-12-1930       Dec-30       Dec-30
~1/1            ~1/1             ~1/1         ~1/1
ID
filename
filename
APPENDIX B
US-ASCII character chart
Char (nul) (soh) (stx) (etx) (eot) (enq) (ack) (bel) (bs) (ht) (lf) (vt) (ff) (cr) (so) (si) (dle) (dc1) (dc2) (dc3) (dc4) (nak) (syn) (etb) (can) (em) (sub) (esc) (fs) (gs) (rs) (us)
Dec 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Hex 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
Char (sp) ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
Dec 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
Hex 0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2A 0x2B 0x2C 0x2D 0x2E 0x2F 0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x3A 0x3B 0x3C 0x3D 0x3E 0x3F
Char @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
Dec 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
Hex 0x40 0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48 0x49 0x4A 0x4B 0x4C 0x4D 0x4E 0x4F 0x50 0x51 0x52 0x53 0x54 0x55 0x56 0x57 0x58 0x59 0x5A 0x5B 0x5C 0x5D 0x5E 0x5F
Char ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ (del)
Dec 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
Hex 0x60 0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68 0x69 0x6A 0x6B 0x6C 0x6D 0x6E 0x6F 0x70 0x71 0x72 0x73 0x74 0x75 0x76 0x77 0x78 0x79 0x7A 0x7B 0x7C 0x7D 0x7E 0x7F
APPENDIX C
CSV-1203 expressed as an ABNF rule set
; CSV-1203 file syntax
<csvfile>        = csvheader *datarecord [EOF]
csvheader        = csvrecord
datarecord       = csvrecord
csvrecord        = recordpayload EOR
recordpayload    = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)
fieldcolumn      = protectedfield / unprotectedfield
protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
unprotectedfield = [EXP] rawfieldpayload
fieldpayload     = 1*anychar
rawfieldpayload  = safechar / (safechar *char safechar)

; Character collections
anychar  = char / COMMA / DDQUOTE / CR / LF / TAB
char     = safechar / SPACE
EOR      = CRLF / CR / LF
safechar = %x21 / %x23-2B / %x2D-FF

; Literal terminals
CRLF    = CR LF
COMMA   = ","
EXP     = "~"
CR      = %x0D
DQUOTE  = %x22
DDQUOTE = %x22 %x22
EOF     = %x1A
LF      = %x0A
SPACE   = %x20
TAB     = %x09
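To show how the field-level productions might be applied on the consumer side, the sketch below splits one record payload (with its EOR marker already removed) into decoded field values. It is an illustration, not a reference implementation. It assumes the comma separator, it is more lenient than the grammar in that it accepts empty fields, and the name `parse_record` is hypothetical.

```python
def parse_record(payload: str) -> list[str]:
    """Split one record payload (EOR already removed) into decoded field values."""
    fields = []
    i, n = 0, len(payload)
    while True:
        chars = []
        if i < n and payload[i] == '"':             # protectedfield
            i += 1
            while True:
                if i >= n:
                    raise ValueError("unterminated protected field")
                if payload[i] == '"':
                    if payload[i + 1 : i + 2] == '"':
                        chars.append('"')           # DDQUOTE decodes to one quote
                        i += 2
                        continue
                    i += 1                          # closing DQUOTE
                    break
                chars.append(payload[i])
                i += 1
            if i < n and payload[i] != ",":
                raise ValueError("unexpected text after closing quote")
        else:                                       # unprotectedfield
            while i < n and payload[i] != ",":
                chars.append(payload[i])
                i += 1
        value = "".join(chars)
        if value.startswith("~"):                   # drop one EXP marker
            value = value[1:]
        fields.append(value)
        if i >= n:
            return fields
        i += 1                                      # skip the COMMA
```

For example, `parse_record('1,"a,b","say ""hi""",~0012')` returns `['1', 'a,b', 'say "hi"', '0012']`.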
APPENDIX V
Version Summary
Notes
Initial release.
Second draft released.
Third draft released.
E1. Revision B. Minor improvement to typography used in some diagrams. Documented issue of .CSV files which contain a single data column.
E1. Revision C. Addition of new rules 6.3 and 6.4 (EOF Marker).
E1. Revision D. Documented two additional operational issues relating to the use of Excel: A.8 - @Twitter names, and A.9 - The surname of "TRUE".
GLOSSARY