Comparison of data-serialization formats

This is a comparison of data-serialization formats: the various ways of converting complex objects into sequences of bits. It does not include markup languages used exclusively as document file formats.
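
As a minimal illustration of what converting an object to a sequence of bits means, the sketch below serializes one Python dictionary with two standard-library modules, json (a human-readable format) and pickle (a binary format), and then reads it back. The sample record and the variable names are made up for illustration; they do not come from any of the specifications listed below.

```python
import json
import pickle

# Hypothetical sample object: nested dicts, lists, strings and numbers.
record = {"name": "A to Z", "count": 685230, "ratio": 6.8523015e5, "tags": [True, None]}

# Text format: json.dumps returns a str, which is encoded to bytes
# before it can be stored or sent.
as_json = json.dumps(record).encode("utf-8")

# Binary format: pickle.dumps returns a bytes object directly.
as_pickle = pickle.dumps(record)

# Deserialization reverses each step and yields an equal object.
from_json = json.loads(as_json.decode("utf-8"))
from_pickle = pickle.loads(as_pickle)

print(as_json)
print(from_json == record, from_pickle == record)
```

Both round trips give back an object equal to the original, but only the JSON bytes can be read without a decoder.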

Overview

Name | Creator-maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references? | Schema-IDL? | Standard APIs | Supports zero-copy operations
Apache Avro | Apache Software Foundation | | | Apache Avro™ Specification | | | | | C, C#, C++, Java, PHP, Python, Ruby |
Apache Parquet | Apache Software Foundation | | | Apache Parquet | | | | | Java, Python, C++ |
Apache Thrift | Facebook (creator), Apache (maintainer) | | | Original whitepaper | | | | | C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages[1] |
ASN.1 | ISO, IEC, ITU-T | | | ISO/IEC 8824 / ITU-T X.680 (syntax) and ISO/IEC 8825 / ITU-T X.690 (encoding rules) series. X.680, X.681, and X.683 define syntax and semantics. | | | | | |
Bencode | Bram Cohen (creator), BitTorrent, Inc. (maintainer) | | | Part of BitTorrent protocol specification | | | | | |
BSON | MongoDB | JSON | | BSON Specification | | | | | |
Cap'n Proto | Kenton Varda | | | Cap'n Proto Encoding Spec | | | | | |
CBOR | Carsten Bormann, P. Hoffman | MessagePack[2] | | RFC 8949 | | | through tagging | | |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich | | | RFC 4180 (among others) | | | | | |
Common Data Representation (CDR) | Object Management Group | | | General Inter-ORB Protocol | | | | | Ada, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk |
D-Bus Message Protocol | freedesktop.org | | | D-Bus Specification | | | | (Signature strings) | |
Efficient XML Interchange (EXI) | W3C | XML, Efficient XML | | Efficient XML Interchange (EXI) Format 1.0 | | | | | |
Extensible Data Notation (edn) | Rich Hickey / Clojure community | Clojure | | Official edn spec | | | | | Clojure, Ruby, Go, C++, JavaScript, Java, CLR, ObjC, Python[3] |
FlatBuffers | Google | | | Flatbuffers GitHub | | | (internal to the buffer) | | C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript |
Fast Infoset | ISO, IEC, ITU-T | XML | | ITU-T X.891 and ISO/IEC 24824-1:2007 | | | | | |
FHIR | Health Level 7 | REST basics | | Fast Healthcare Interoperability Resources | | | | | Hapi for FHIR[4] JSON, XML, Turtle |
Ion | Amazon | JSON | | The Amazon Ion Specification | | | | | C, C#, Go, Java, JavaScript, Python, Rust |
Java serialization | Oracle Corporation | | | Java Object Serialization | | | | | |
JSON | Douglas Crockford | JavaScript syntax | | STD 90/RFC 8259 (ancillary: RFC 6901, RFC 6902), ECMA-404, ISO/IEC 21778:2017 | but see BSON, Smile, UBJSON | | | (JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, JSON-LD) | (Clarinet, JSONQuery / RQL, JSONPath), JSON-LD |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | | MessagePack format specification | | | | | |
Netstrings | Dan Bernstein | | | netstrings.txt | | | | | |
OGDL | Rolf Veen | | | Specification | | | | | |
OPC-UA Binary | OPC Foundation | | | opcfoundation.org | | | | | |
OpenDDL | Eric Lengyel | C, PHP | | OpenDDL.org | | | | | |
PHP serialization format | PHP Group | | | | | | | | |
Pickle (Python) | Guido van Rossum | Python | | PEP 3154 – Pickle protocol version 4[5] | | | | | |
Property list | NeXT (creator), Apple (maintainer) | | | Public DTD for XML format | | | | | Cocoa, CoreFoundation, OpenStep, GnuStep |
Protocol Buffers (protobuf) | Google | | | Developer Guide: Encoding, proto2 specification, and proto3 specification | | | | | C++, Java, C#, Python, Go, Ruby, Objective-C, C, Dart, Perl, PHP, R, Rust, Scala, Swift, Julia, Erlang, D, Haskell, ActionScript, Delphi, Elixir, Elm, GopherJS, Haxe, JavaScript, Kotlin, Lua, Matlab, Mercury, OCaml, Prolog, Solidity, TypeScript, Vala, Visual Basic |
S-expressions | John McCarthy (original), Ron Rivest (internet draft) | Lisp, Netstrings | | "S-Expressions" Internet Draft | canonical representation | advanced transport representation | | | |
Smile | Tatu Saloranta | JSON | | Smile Format Specification | | | | (JSON Schema Proposal, other JSON schemas/IDLs) | (via JSON APIs implemented with Smile backend, on Jackson, Python) |
SOAP | W3C | XML | | SOAP/1.1, SOAP/1.2 | (MTOM) | | | | |
Structured Data eXchange Formats (SDXF) | Max Wildgrube | | | RFC 3072 | | | | | |
eXternal Data Representation (XDR) | Sun Microsystems (creator), IETF (maintainer) | | | STD 67/RFC 4506 | | | | | |
XML | W3C | SGML | | 1.0 (Fifth Edition), 1.1 (Second Edition) | | | | | |
XML-RPC | Dave Winer[6] | XML | | XML-RPC Specification | | | | | |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[7] | | Version 1.2 | | | | (Kwalify, Rx, built-in language type-defs) | |

Syntax comparison of human-readable formats

ASN.1 (XML Encoding Rules)
    Null:
    Boolean true: <foo>true</foo>
    Boolean false: <foo>false</foo>
    Integer: <foo>685230</foo>
    Floating-point: <foo>6.8523015e+5</foo>
    String:
    Array: true -42.1e7 A to Z We said, "no".
    Associative array/Object:
        An object (the key is a field name): true 1.85 Bob Peterson
        A data mapping (the key is a data value): John 3.14 Jane 2.718

CSV
    Null: null (or an empty element in the row)
    Boolean true:
        1
        true
    Boolean false:
        0
        false
    Integer:
        685230
        -685230
    Floating-point: 6.8523015e+5
    String: "We said, ""no""."
    Array: true,,-42.1e7,"A to Z"
    Associative array/Object:
        42,1
        A to Z,1,2,3

edn
    Null: nil
    Boolean true: true
    Boolean false: false
    Integer:
        685230
        -685230
    Floating-point: 6.8523015e+5
    String: "A to Z", "A \"up to\" Z"
    Array: [true nil -42.1e7 "A to Z"]
    Associative array/Object: {:kw 1, "42" true, "A to Z" [1 2 3]}

Ion
    Null:
        null
        null.null
        null.bool
        null.int
        null.float
        null.decimal
        null.timestamp
        null.string
        null.symbol
        null.blob
        null.clob
        null.struct
        null.list
        null.sexp
    Boolean true: true
    Boolean false: false
    Integer:
        685230
        -685230
        0xA74AE
        0b111010010101110
    Floating-point: 6.8523015e5
    String:
        "A to Z"

        '''
        A
        to
        Z
        '''
    Array: [true, null, -42.1e7, "A to Z"]
    Associative array/Object:

Netstrings
    Null:
        0:,
        4:null,
    Boolean true:
        1:1,
        4:true,
    Boolean false:
        1:0,
        5:false,
    Integer: 6:685230,
    Floating-point: 9:6.8523e+5,
    String:
    Array: 29:4:true,0:,7:-42.1e7,6:A to Z,,
    Associative array/Object:

JSON
    Null: null
    Boolean true: true
    Boolean false: false
    Integer:
        685230
        -685230
    Floating-point: 6.8523015e+5
    String:
    Array: [true, null, -42.1e7, "A to Z"]
    Associative array/Object:

OGDL
    Null: null
    Boolean true: true
    Boolean false: false
    Integer: 685230
    Floating-point: 6.8523015e+5
    String:
        "A to Z"
        'A to Z'
        NoSpaces
    Array:
        true
        null
        -42.1e7
        "A to Z"

        (true, null, -42.1e7, "A to Z")
    Associative array/Object:
        42
          true
        "A to Z"
          1
          2
          3

        42
          true
        "A to Z", (1, 2, 3)

OpenDDL
    Null: ref {null}
    Boolean true: bool {true}
    Boolean false: bool {false}
    Integer:
        int32 {685230}
        int32 {0x74AE}
        int32 {0b111010010101110}
    Floating-point: float {6.8523015e+5}
    String: string {"A to Z"}
    Array:
        Homogeneous array:
        int32 {1, 2, 3, 4, 5}

        Heterogeneous array:
        array
        {
            bool {true}
            ref {null}
            float {-42.1e7}
            string {"A to Z"}
        }
    Associative array/Object:
        dict
        {
            value (key = "42") {bool {true}}
            value (key = "A to Z") {int32 {1, 2, 3}}
        }

PHP serialization format
    Null: N;
    Boolean true: b:1;
    Boolean false: b:0;
    Integer:
        i:685230;
        i:-685230;
    Floating-point:
        d:685230.15;
        d:INF;
        d:-INF;
        d:NAN;
    String: s:6:"A to Z";
    Array: a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";}
    Associative array/Object:
        Associative array:
        a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}

        Object:
        O:8:"stdClass":2:{s:4:"John";d:3.14;s:4:"Jane";d:2.718;}

Pickle (Python)
    Null: N.
    Boolean true: I01\n.
    Boolean false: I00\n.
    Integer: I685230\n.
    Floating-point: F685230.15\n.
    String: S'A to Z'\n.
    Array: (lI01\na(laF-421000000.0\naS'A to Z'\na.
    Associative array/Object: (dI42\nI01\nsS'A to Z'\n(lI1\naI2\naI3\nas.

Property list (plain text format)[8]
    Null:
    Boolean true: <*BY>
    Boolean false: <*BN>
    Integer: <*I685230>
    Floating-point: <*R6.8523015e+5>
    String: "A to Z"
    Array: (<*BY>, <*R-42.1e7>, "A to Z")
    Associative array/Object:
        {
            "42" = <*BY>;
            "A to Z" = (<*I1>, <*I2>, <*I3>);
        }

Property list (XML format)[9]
    Null:
    Boolean true: <true />
    Boolean false: <false />
    Integer: <integer>685230</integer>
    Floating-point: <real>6.8523015e+5</real>
    String:
    Array: -42.1e7 A to Z
    Associative array/Object: 42 A to Z 1 2 3

Protocol Buffers
    Null:
    Boolean true: true
    Boolean false: false
    Integer:
        685230
        -685230
    Floating-point: 20.0855369
    String:
        "A to Z"
        "sdfff2 \000\001\002\377\376\375"
        "q\tqq<>q2&\001\377"
    Array:
        field1: "value1"
        field1: "value2"
        field1: "value3"
        anotherfield {
          foo: 123
          bar: 456
        }
        anotherfield {
          foo: 222
          bar: 333
        }
    Associative array/Object:
        thing1: "blahblah"
        thing2: 18923743
        thing3: -44
        thing4
        enumeratedThing: SomeEnumeratedValue
        thing5: 123.456
        [extensionFieldFoo]
        "etc"
        [extensionFieldThatIsAnEnum]
        EnumValue

S-expressions
    Null:
        NIL
        nil
    Boolean true:
        T
        #t
        true
    Boolean false:
        NIL
        #f
        false
    Integer: 685230
    Floating-point: 6.8523015e+5
    String:
        abc
        "abc"
        #616263#
        3:abc
        {MzphYmM=}
        |YWJj|
    Array: (T NIL -42.1e7 "A to Z")
    Associative array/Object: ((42 T) ("A to Z" (1 2 3)))

YAML
    Null [10]:
        ~
        null
        Null
        NULL
    Boolean true [11]:
        y
        Y
        yes
        Yes
        YES
        on
        On
        ON
        true
        True
        TRUE
    Boolean false:
        n
        N
        no
        No
        NO
        off
        Off
        OFF
        false
        False
        FALSE
    Integer [12]:
        685230
        +685_230
        -685230
        02472256
        0x_0A_74_AE
        0b1010_0111_0100_1010_1110
        190:20:30
    Floating-point [13]:
        6.8523015e+5
        685.230_15e+03
        685_230.15
        190:20:30.15
        .inf
        -.inf
        .Inf
        .INF
        .NaN
        .nan
        .NAN
    String:
        A to Z
        "A to Z"
        'A to Z'
    Array:
        [y, ~, -42.1e7, "A to Z"]

        - y
        -
        - -42.1e7
        - A to Z
    Associative array/Object:
        {"John":3.14, "Jane":2.718}

        42: y
        A to Z: [1, 2, 3]

XML and SOAP
    Null:
    Boolean true: true
    Boolean false: false
    Integer: 685230
    Floating-point: 6.8523015e+5
    String:
    Array: true -42.1e7 A to Z
    Associative array/Object: true

XML-RPC
    Null:
    Boolean true: <value><boolean>1</boolean></value>
    Boolean false: <value><boolean>0</boolean></value>
    Integer: <value><int>685230</int></value>
    Floating-point: <value><double>6.8523015e+5</double></value>
    String: <value><string>A to Z</string></value>
    Array: 1 -42.1e7 A to Z
    Associative array/Object: 42 1 A to Z 1 2 3
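
Some of the entries above can be reproduced with a few lines of code. The sketch below is an illustration only: it builds the Netstrings array entry with a small hypothetical helper (netstring is not a library function), parses the JSON array literal with the standard json module, and produces the Pickle entries for null, true and an integer with pickle protocol 0, the text-based protocol whose opcodes appear in the Pickle entries.

```python
import json
import pickle


def netstring(data: bytes) -> bytes:
    """Wrap raw bytes as <length in ASCII decimal> ':' <data> ','."""
    return str(len(data)).encode("ascii") + b":" + data + b","


# Netstrings array: every element is a netstring, and the concatenation
# of the elements is wrapped in one outer netstring.
elements = [b"true", b"", b"-42.1e7", b"A to Z"]
netstring_array = netstring(b"".join(netstring(e) for e in elements))

# JSON array literal, parsed into native Python values.
json_array = json.loads('[true, null, -42.1e7, "A to Z"]')

# Pickle protocol 0 emits printable opcodes followed by a '.' stop marker.
pickled_none = pickle.dumps(None, protocol=0)
pickled_true = pickle.dumps(True, protocol=0)
pickled_int = pickle.dumps(685230, protocol=0)

print(netstring_array)
print(json_array)
print(pickled_none, pickled_true, pickled_int)
```

The outer length prefix of netstring_array is 29 because the four inner netstrings occupy 7, 3, 10 and 9 bytes. Note that Python 3 pickles text strings with a different opcode than the S opcode listed for Pickle strings, so only the null, boolean and integer cases are shown here.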

Comparison of binary formats

Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/object
ASN.1 (BER, PER or OER encoding) | NULL type | | | | Multiple valid types | Data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | User definable type
BSON | \x0A (1 byte) | True: \x08\x01, False: \x08\x00 (2 bytes) | int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | Double: little-endian binary64 | UTF-8-encoded, preceded by int32-encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document
Concise Binary Object Representation (CBOR) | \xf6 (1 byte) | (1 byte) | | | | |
Efficient XML Interchange (EXI) (Unpreserved lexical values format) | xsi:nil is not allowed in binary context. | 1–2 bit integer interpreted as boolean. | Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number. Unsigned skips the boolean flag. | | Length prefixed integer-encoded Unicode. Integers may represent enumerations or string table entries instead. | Length prefixed set of items. |
FlatBuffers | Encoded as absence of field in parent object | (1 byte) | Little-endian 2's complement signed and unsigned 8/16/32/64 bits | | UTF-8-encoded, preceded by 32-bit integer length of string in bytes | Vectors of any other type, preceded by 32-bit integer length of number of elements | Tables (schema defined types) or Vectors sorted by key (maps / dictionaries)
Ion[14] | \x0f | | | | | \xbx Arbitrary length and overhead. Length in octets. |
MessagePack | \xc0 | | | Typecode (1 byte) + IEEE single/double | encoding is unspecified[15] | |
Netstrings | | | | | Length-encoded as an ASCII string + ':' + data + ','. Length counts only octets between ':' and ','. | |
OGDL Binary | | | | | | |
Property list (binary format) | | | | | | |
Protocol Buffers | | | | | UTF-8-encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length |
Smile | \x21 | | | IEEE single/double, BigDecimal | Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references | Arbitrary-length heterogeneous arrays with end-marker | Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) | | | Big-endian signed 24-bit or 32-bit integer | Big-endian IEEE double | Either UTF-8 or ISO 8859-1 encoded | List of elements with identical ID and size, preceded by array header with int16 length | Chunks can contain other chunks to arbitrary depth.
Thrift | | | | | | |
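
A few of the encodings described in this table can be sketched with Python's struct module. The helper names below are hypothetical, and each function encodes a bare value only: element type bytes, field tags and enclosing documents are left out, so the results are not complete BSON, Protocol Buffers or MessagePack messages. Two details are taken from the respective format specifications rather than from the table: a BSON string carries a terminating zero byte that is counted in its length prefix, and 0xcb is the MessagePack type code for a 64-bit float.

```python
import struct


def bson_int32(value: int) -> bytes:
    """32-bit little-endian two's complement integer."""
    return struct.pack("<i", value)


def bson_string(text: str) -> bytes:
    """int32 length prefix, UTF-8 bytes, then a zero terminator.

    Per the BSON specification, the length counts the terminator.
    """
    data = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data


def varint(value: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, low group first.

    The high bit of each byte is set while more bytes follow.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def protobuf_string(text: str) -> bytes:
    """UTF-8 bytes preceded by their varint-encoded byte length."""
    data = text.encode("utf-8")
    return varint(len(data)) + data


def msgpack_float64(value: float) -> bytes:
    """One type-code byte followed by a big-endian IEEE 754 double."""
    return b"\xcb" + struct.pack(">d", value)


print(bson_int32(685230).hex())
print(bson_string("A to Z").hex())
print(varint(300).hex())
print(protobuf_string("A to Z"))
print(msgpack_float64(1.0).hex())
```

The varint helper also illustrates the 7-bit, least-significant-group-first layout that the table describes for EXI integers, although EXI adds a sign flag and schema-dependent details that this sketch does not model.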

Notes and References

  1. Apache Thrift. https://thrift.apache.org/
  2. Bormann, Carsten. CBOR relationship with msgpack. 2018-12-26. Retrieved 2023-08-14.
  3. Implementations.
  4. HAPI FHIR - The Open Source FHIR API for Java. hapifhir.io.
  5. cpython/Lib/pickle.py. https://github.com/python/cpython/blob/v3.9.0/Lib/pickle.py#L137-L144
  6. A Brief History of SOAP. www.xml.com.
  7. Ben-Kiki, Oren; Evans, Clark; döt Net, Ingy. YAML Ain't Markup Language (YAML) Version 1.2. The Official YAML Web Site. 2009-10-01. Retrieved 2012-02-10.
  8. NSPropertyListSerialization class documentation. www.gnustep.org. 2009-10-28. Archived 2011-05-19 at https://web.archive.org/web/20110519164921/http://gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
  9. Documentation Archive. developer.apple.com.
  10. Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian. Null Language-Independent Type for YAML Version 1.1. YAML.org. 2005-01-18. Retrieved 2009-09-12.
  11. Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian. Boolean Language-Independent Type for YAML Version 1.1. YAML.org. 2005-01-18. Retrieved 2009-09-12.
  12. Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian. Integer Language-Independent Type for YAML Version 1.1. YAML.org. 2005-02-11. Retrieved 2009-09-12.
  13. Ben-Kiki, Oren; Evans, Clark; Ingerson, Brian. Floating-Point Language-Independent Type for YAML Version 1.1. YAML.org. 2005-01-18. Retrieved 2009-09-12.
  14. Ion Binary Encoding. http://amzn.github.io/ion-docs/docs/binary.html
  15. MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.: msgpack/msgpack. GitHub. 2 April 2019.