
TABLE OF CONTENTS

CSV-1203

Introduction

A standard with benefits

Open standard

Other guidance
RFC4180

CSV File Format Specification

Other file formats used for B2B data exchange

CSV's association with Excel

Files that have a single data column


Terminology

ABNF

<csvfile> = csvheader *datarecord [EOF]


The CSV-1203 standard


Diagram 1 : CSV File Structure

1.1

The file type extension MUST be set to .CSV

1.2

The character set used by data contained in the file MUST be an 8-bit superset of 7-bit US-ASCII.

1.3

ASCII control characters, other than payload delivery markers, MUST NOT be present in a CSV file.


1.4

A CSV file MUST contain at least one record.

<csvfile>  = csvheader *datarecord [EOF]
csvheader  = csvrecord
datarecord = csvrecord

; Literal terminals
EOF        = %x1A
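Rule 1.3 can be checked mechanically before any parsing takes place. The following is a minimal sketch, not part of the standard. It assumes that the "payload delivery markers" are TAB, LF, CR and the EOF (SUB) marker, since those are the only control characters named in the ABNF; the function name `find_illegal_controls` is hypothetical.

```python
# Assumption: the permitted control characters are the ones that appear
# in the ABNF rule set (TAB, LF, CR and the EOF/SUB marker).
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D, 0x1A}


def find_illegal_controls(data: bytes) -> list[int]:
    """Return the byte offsets of control characters that rule 1.3 forbids."""
    return [
        offset
        for offset, byte in enumerate(data)
        if byte < 0x20 and byte not in ALLOWED_CONTROLS
    ]


if __name__ == "__main__":
    sample = b"id,name\r\n1,Al\x00ice\r\n"
    print(find_illegal_controls(sample))
```

An empty result means that no forbidden control character was found; it does not show that the file satisfies the other rules.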


Diagram 2 : CSV Record

2.1

A CSV record consists of a payload followed by an end-of-record marker.

2.2

The length of the record payload MUST be greater than zero.

csvrecord     = recordpayload EOR
recordpayload = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)

; Literal terminals
EOR           = CRLF / CR / LF
CRLF          = CR LF
COMMA         = ","
CR            = %x0D
LF            = %x0A
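As an illustration of the record structure, the sketch below reads records with Python's standard `csv` module and rejects any record whose payload is empty (rule 2.2). It is one possible consumer, not the reference implementation; the function name `read_records` and its `separator` parameter are assumptions made for this example.

```python
import csv
import io


def read_records(text: str, separator: str = ",") -> list[list[str]]:
    """Split CSV text into records, refusing zero-length payloads."""
    # newline="" stops Python from translating the end-of-record markers.
    source = io.StringIO(text, newline="")
    reader = csv.reader(source, delimiter=separator, strict=True)
    records = []
    for number, fields in enumerate(reader, start=1):
        if not fields:  # csv.reader yields [] for an empty line
            raise ValueError(f"record {number} has an empty payload (rule 2.2)")
        records.append(fields)
    return records


if __name__ == "__main__":
    print(read_records('id,name\r\n1,"Smith, John"\r\n'))
```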


Diagram 3 : Fields within a record payload

3.1

Each record within the same CSV file MUST contain the same number of field columns.

3.2

A record payload MUST contain at least two field columns.


3.3

Field columns MUST be separated from each other by a single separation character.

3.4

A field column MUST NOT have leading or trailing whitespace.

%xA0

recordpayload    = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)
fieldcolumn      = protectedfield / unprotectedfield
protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
fieldpayload     = 1*anychar
unprotectedfield = [EXP] rawfieldpayload
rawfieldpayload  = safechar / (safechar *char safechar)
anychar          = char / COMMA / DDQUOTE / CR / LF / TAB
char             = safechar / SPACE
safechar         = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA            = ","
EXP              = "~"       ; Excel protection marker
CR               = %x0D
DQUOTE           = %x22
DDQUOTE          = %x22 %x22
LF               = %x0A
SPACE            = %x20
TAB              = %x09
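Rules 3.1 and 3.2 lend themselves to a simple check once the records have been split into fields. The sketch below assumes the records are already available as lists of strings; the function name `check_field_columns` and the wording of its messages are illustrative only.

```python
def check_field_columns(records: list[list[str]]) -> list[str]:
    """Return a list of problems found against rules 3.1 and 3.2."""
    if not records:
        return ["file contains no records (rule 1.4)"]
    problems = []
    expected = len(records[0])  # the first record sets the column count
    if expected < 2:
        problems.append("fewer than two field columns (rule 3.2)")
    for number, record in enumerate(records, start=1):
        if len(record) != expected:
            problems.append(
                f"record {number}: {len(record)} field columns, "
                f"expected {expected} (rule 3.1)"
            )
    return problems


if __name__ == "__main__":
    for problem in check_field_columns([["id", "name"], ["1"]]):
        print(problem)
```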


4.1

The field separator MUST be a single character.

4.2

Once selected, the same field separator character MUST be used throughout the entire file.

4.3

The producer application SHOULD use the comma (ASCII 0x2C) as the field separator.

4.4

The consumer application MUST accept either the comma, the tab character (ASCII 0x09), the vertical bar (pipe) character (ASCII 0x7C), or semicolon (ASCII 0x3B) as the field separator.

4.5

The field separator MUST NOT be taken as part of a field.

separator = "," / ";" / PIPE / TAB

; Literal terminals
PIPE      = %x7C
TAB       = %x09
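The rules above require a consumer to accept four separators but do not say how it should discover which one a file uses. One possible heuristic, offered purely as an assumption, is to count each candidate in the header record while ignoring anything inside double-quotes, and to prefer the comma on a tie. The name `guess_separator` is hypothetical.

```python
# Candidates in order of preference; the comma comes first (rule 4.3).
CANDIDATES = [",", "\t", "|", ";"]


def guess_separator(header_line: str) -> str:
    """Guess the field separator from the header record (heuristic)."""
    counts = {candidate: 0 for candidate in CANDIDATES}
    in_quotes = False
    for ch in header_line:
        if ch == '"':
            in_quotes = not in_quotes  # a doubled quote toggles twice
        elif not in_quotes and ch in counts:
            counts[ch] += 1
    best = max(CANDIDATES, key=lambda candidate: counts[candidate])
    if counts[best] == 0:
        raise ValueError("no supported field separator found")
    return best


if __name__ == "__main__":
    print(repr(guess_separator("id|name|city")))
```

Because rule 4.2 fixes the separator for the whole file, the guess only needs to be made once, on the header record.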


5.1

The end-of-record marker MUST be one of the following: CR+LF, LF or CR.

5.2

Once selected, the same end-of-record marker MUST be used throughout the file.

5.3

The EOR marker MUST NOT be taken as being part of the CSV record.

EOR  = CRLF / CR / LF

; Literal terminals
CRLF = CR LF
CR   = %x0D
LF   = %x0A
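Rule 5.2 can be verified by collecting every distinct end-of-record marker found outside double-quote protection; a conforming file yields exactly one. The sketch below is an illustration under that reading of the rules, and the name `find_eor_markers` is hypothetical. The text must be read without newline translation (in Python, `open(path, newline="")`).

```python
def find_eor_markers(text: str) -> set[str]:
    """Collect the distinct EOR markers that occur outside protected fields."""
    markers = set()
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            in_quotes = not in_quotes  # a doubled quote toggles twice
        elif not in_quotes and ch in "\r\n":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                markers.add("\r\n")
                i += 1  # consume the LF half of CR+LF
            else:
                markers.add(ch)
        i += 1
    return markers


if __name__ == "__main__":
    found = find_eor_markers("id,name\r\n1,Alice\n")
    print("consistent" if len(found) == 1 else "mixed markers")
```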


6.1

The CSV file MAY be logically terminated with an end-of-file (SUB - ASCII 0x1A) marker.

6.2

A consumer application MUST treat the logical EOF as if it were the physical end of the file.

6.3

A logical EOF marker cannot be double-quote protected.

6.4

The EOF marker MUST NOT be taken as being part of the last CSV record.

<csvfile> = csvheader *datarecord [EOF]

; Literal terminals
EOF       = %x1A
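Because the EOF marker cannot be double-quote protected (rule 6.3), a consumer can honour rule 6.2 by discarding everything from the first 0x1A byte onward before parsing. A minimal sketch, with the hypothetical name `strip_logical_eof`:

```python
EOF_MARKER = b"\x1a"  # SUB, ASCII 0x1A


def strip_logical_eof(data: bytes) -> bytes:
    """Return the file content that precedes the logical EOF marker."""
    # partition() returns the whole input as the head when no marker exists.
    head, _marker, _ignored = data.partition(EOF_MARKER)
    return head


if __name__ == "__main__":
    raw = b"id,name\r\n1,Alice\r\n\x1aleftover bytes"
    print(strip_logical_eof(raw))
```

Dropping the marker itself also satisfies rule 6.4, since it never reaches the record parser.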


7.1

The header record MUST be the first record in the file.

7.2

A CSV file MUST contain exactly one header record.

7.3

Header labels MUST NOT be blank.

7.4

Header labels MUST NOT contain variable data.

7.5

Each header label SHOULD provide a semantic clue to the nature of the data to be found in its field column.

csvheader = csvrecord
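Of the header rules, only 7.3 is easy to test automatically; whether a label contains variable data or gives a useful semantic clue needs human judgement. The sketch below reports the positions of blank labels, treating a label made up only of whitespace as blank, which is an assumption. The name `blank_header_columns` is hypothetical.

```python
def blank_header_columns(header: list[str]) -> list[int]:
    """Return the 1-based positions of blank header labels (rule 7.3)."""
    return [
        position
        for position, label in enumerate(header, start=1)
        if not label.strip()
    ]


if __name__ == "__main__":
    print(blank_header_columns(["id", "", "surname"]))
```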


8.1

A CSV file MAY contain zero, one, or more data records.

8.2

If a record has a record type identifier, it SHOULD be stored in the first field column in the record.

8.3

Trailer records MUST NOT be used.

<csvfile>  = csvheader *datarecord [EOF]
datarecord = csvrecord


9.1

If a field's payload contains one or more commas, double-quotes, CR or LF characters, then it MUST be delimited by a pair of double-quotes (ASCII 0x22).

9.2

If a field's payload contains a double-quote character, it MUST be represented by two consecutive double-quotes.

9.3

The producer application SHOULD NOT apply double-quote protection where it is not required.

protectedfield = DQUOTE [EXP] fieldpayload DQUOTE
fieldpayload   = 1*anychar
anychar        = char / COMMA / DDQUOTE / CR / LF / TAB
char           = safechar / SPACE
safechar       = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA          = ","
EXP            = "~"       ; Excel protection marker
CR             = %x0D
DQUOTE         = %x22
DDQUOTE        = %x22 %x22
LF             = %x0A
SPACE          = %x20
TAB            = %x09
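The double-quote protection rules of this section can be sketched as a pair of producer and consumer helpers. This is an illustration that assumes the comma is the field separator; the names `protect_field` and `unprotect_field` are hypothetical.

```python
# Characters that force double-quote protection (rule 9.1).
NEEDS_PROTECTION = set(',"\r\n')


def protect_field(payload: str) -> str:
    """Producer side: quote only when required (rules 9.1 to 9.3)."""
    if NEEDS_PROTECTION.intersection(payload):
        # Each embedded double-quote becomes two (rule 9.2).
        return '"' + payload.replace('"', '""') + '"'
    return payload


def unprotect_field(field: str) -> str:
    """Consumer side: remove the delimiters and undo the doubling."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field


if __name__ == "__main__":
    for text in ["plain", "Smith, John", 'say "hi"']:
        print(protect_field(text))
```

A producer that uses a different separator would need to add that character to the set that triggers protection.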


Diagram 4 : A protected field within field separators.

Diagram 5 : Two-for-one translation of double-quote pairs.


Diagram 6 : The tilde prefix can protect long-number fields from Excel

10.1

If a field requires Excel protection, its payload MUST be prefixed with a single tilde character.

10.2

If a field's payload begins with a tilde character, then Excel protection MUST be applied to that field.

10.3

Any field in any record MAY use Excel protection without requiring any other field to use the same protection.


Diagram 7 : Arrangement of payload markers providing dual protection.

protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
unprotectedfield = [EXP] rawfieldpayload
fieldpayload     = 1*anychar
anychar          = char / COMMA / DDQUOTE / CR / LF / TAB
rawfieldpayload  = safechar / (safechar *char safechar)
char             = safechar / SPACE
safechar         = %x21 / %x23-2B / %x2D-FF

; Literal terminals
COMMA            = ","
EXP              = "~"       ; Excel protection marker
CR               = %x0D
DQUOTE           = %x22
DDQUOTE          = %x22 %x22
LF               = %x0A
SPACE            = %x20
TAB              = %x09
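The grammar places the tilde immediately inside the opening double-quote when both protections are used. The sketch below shows one way a producer might apply Excel protection and a consumer might detect it. That a consumer strips the tilde to recover the original value is an assumption made for this example, as are the names `excel_protect` and `split_excel_marker`.

```python
# Characters that additionally force double-quote protection (rule 9.1).
NEEDS_QUOTES = set(',"\r\n')


def excel_protect(payload: str) -> str:
    """Producer side: prefix the payload with a tilde (rule 10.1)."""
    marked = "~" + payload
    if NEEDS_QUOTES.intersection(marked):
        # Dual protection: the tilde sits inside the double-quotes.
        return '"' + marked.replace('"', '""') + '"'
    return marked


def split_excel_marker(payload: str) -> tuple[bool, str]:
    """Consumer side: report whether the payload is Excel protected."""
    if payload.startswith("~"):
        return True, payload[1:]  # assumption: the tilde is removed
    return False, payload


if __name__ == "__main__":
    print(excel_protect("0012345"))
    print(excel_protect("1,200.00"))
```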


APPENDIX A
Known issues when using Excel

Data            Excel Display   Excel Save    Re-loaded Image
12345678        12345678        12345678      12345678
123456789       1.23E+08        123456789     123456789
1234567890      1.23E+09        1234567890    1234567890
12345678901     1.23E+10        12345678901   12345678901
123456789012    1.23E+11        1.23457E+11   123457000000
1234567890123   1.23E+12        1.23457E+12   1234570000000
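The loss of digits in the 12-digit and 13-digit rows can be reproduced with ordinary scientific-notation formatting. The sketch below only mimics the figures shown for those two rows; it is not a model of how Excel works internally, and the name `round_trip` is hypothetical.

```python
def round_trip(value: int) -> tuple[str, int]:
    """Format a number with five decimals of mantissa, then read it back."""
    saved = format(value, ".5E")   # the "Excel Save" style, e.g. 1.23457E+11
    reloaded = int(float(saved))   # what a later load would recover
    return saved, reloaded


if __name__ == "__main__":
    for number in (123456789012, 1234567890123):
        saved, reloaded = round_trip(number)
        print(number, saved, reloaded)
```

Once the value has been written in this form, the original trailing digits cannot be recovered from the file.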


Data 0 00 01234123456 01234 123456 " 0" 0 "0 " 0 0001 "0001" "1.0000" 1.0000 "1,200.00" "1,200.00" "$1,200.00" ~001.0000

Excel Display 0 0 1234123456 01234 123456 0 0 0 0 1 1 1 1 1,200.00 1,200.00 $1,200.00 ~001.0000

Excel Edit Bar 0 0 1234123456 01234 123456 0 0 0 0 1 1 1 1 1200 1200 $1,200.00 ~001.0000

Excel Save 0 0 1234123456 01234 123456 0 0 0 0 1 1 1 1 "1,200.00" "1,200.00" "$1,200.00" ~001.0000

="000012.030000"

Data      Excel Display   Excel Edit Bar   Excel Save    Re-loaded Image
=1/0      #DIV/0!         =1/0             #DIV/0!       #DIV/0!
=3/A      #NAME?          =3/A             #NAME?        #NAME?
=13/12    1.083333        =13/12           1.083333333   1.083333333
~=13/12   ~=13/12         ~=13/12          ~=13/12       ~=13/12


Data    Excel Display   Excel Edit Bar   Excel Save   Re-loaded Image
0/1     0/1             0/1              0/1          0/1
1/1     01-Jan          01-01-2012       01-Jan       01-Jan
12/12   12-Dec          12-12-2012       12-Dec       12-Dec
13/12   13-Dec          13-12-2012       13-Dec       13-Dec
32/12   32/12           32/12            32/12        32/12
12/13   Dec-13          01-12-2013       Dec-13       Dec-13
12/30   Dec-30          01-12-1930       Dec-30       Dec-30
~1/1    ~1/1            ~1/1             ~1/1         ~1/1



APPENDIX B
US-ASCII character chart

Char (nul) (soh) (stx) (etx) (eot) (enq) (ack) (bel) (bs) (ht) (lf) (vt) (ff) (cr) (so) (si) (dle) (dc1) (dc2) (dc3) (dc4) (nak) (syn) (etb) (can) (em) (sub) (esc) (fs) (gs) (rs) (us)

Dec 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Hex 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F

Char (sp) ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?

Dec 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

Hex 0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2A 0x2B 0x2C 0x2D 0x2E 0x2F 0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x3A 0x3B 0x3C 0x3D 0x3E 0x3F

Char @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _

Dec 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95

Hex 0x40 0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48 0x49 0x4A 0x4B 0x4C 0x4D 0x4E 0x4F 0x50 0x51 0x52 0x53 0x54 0x55 0x56 0x57 0x58 0x59 0x5A 0x5B 0x5C 0x5D 0x5E 0x5F

Char ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ (del)

Dec 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

Hex 0x60 0x61 0x62 0x63 0x64 0x65 0x66 0x67 0x68 0x69 0x6A 0x6B 0x6C 0x6D 0x6E 0x6F 0x70 0x71 0x72 0x73 0x74 0x75 0x76 0x77 0x78 0x79 0x7A 0x7B 0x7C 0x7D 0x7E 0x7F


APPENDIX C
CSV-1203 expressed as an ABNF rule set

; CSV-1203 file syntax
<csvfile>        = csvheader *datarecord [EOF]
csvheader        = csvrecord
datarecord       = csvrecord
csvrecord        = recordpayload EOR
recordpayload    = fieldcolumn COMMA fieldcolumn *(COMMA fieldcolumn)
fieldcolumn      = protectedfield / unprotectedfield
protectedfield   = DQUOTE [EXP] fieldpayload DQUOTE
unprotectedfield = [EXP] rawfieldpayload
fieldpayload     = 1*anychar
rawfieldpayload  = safechar / (safechar *char safechar)

; Character collections
anychar          = char / COMMA / DDQUOTE / CR / LF / TAB
char             = safechar / SPACE
EOR              = CRLF / CR / LF
safechar         = %x21 / %x23-2B / %x2D-FF

; Literal terminals
CRLF             = CR LF
COMMA            = ","
EXP              = "~"       ; Excel protection marker
CR               = %x0D
DQUOTE           = %x22
DDQUOTE          = %x22 %x22
EOF              = %x1A
LF               = %x0A
SPACE            = %x20
TAB              = %x09


APPENDIX V
Version Summary

Date         Notes
17-03-2012   Initial release.
07-04-2012   Second draft released.
10-04-2012   Third draft released.
01-09-2012   E1. Revision B. Minor improvement to typography used in some diagrams. Documented issue of .CSV files which contain a single data column.
09-11-2012   E1. Revision C. Addition of new rules 6.3 and 6.4 (EOF Marker).
02-02-2013   E1. Revision D. Documented two additional operational issues relating to the use of Excel: A.8 - @Twitter names, and A.9 - The surname of "TRUE".


GLOSSARY
