| // Copyright 2015 The Vanadium Authors. All rights reserved. |
| // Use of this source code is governed by a BSD-style |
| // license that can be found in the LICENSE file. |
| |
| // Package vom implements the Vanadium Object Marshaling serialization format. |
| // |
| // Concept: https://vanadium.github.io/concepts/rpc.html#vom |
| // Specification: https://vanadium.github.io/designdocs/vom-spec.html |
| // |
| // VOM supports serialization of all types representable by v.io/v23/vdl, and is |
| // a self-describing format that retains full type information. It is the |
| // underlying serialization format used by v.io/v23/rpc. |
| // |
| // The API is almost identical to encoding/gob. To marshal objects create an |
| // Encoder and present it with a series of values. To unmarshal objects create |
| // a Decoder and retrieve values. The implementation creates a stream of |
| // messages between the Encoder and Decoder. |
| package vom |
| |
| /* |
| TODO: Describe user-defined coders (VomEncode?) |
| TODO: Describe wire format, something like this: |
| |
| Wire protocol. Version 0x80 |
| |
| The protocol consists of a stream of messages, where each message describes |
| either a type or a value. All values are typed. Here's the protocol grammar: |
| VOM: |
| (TypeMsg | ValueMsg)* |
| TypeMsg: |
| -typeID len(WireType) WireType |
| ValueMsg: |
| +typeID primitive // typeobject primitives are represented by their type id |
| | +typeID len(ValueMsg) CompositeV |
| Value: |
| primitive | CompositeV |
| CompositeV: |
| ArrayV | ListV | SetV | MapV | StructV | UnionV | OptionalV | AnyV |
| ArrayV: |
| len Value*len |
| // len is always 0 for array since we know the exact size of the array. This |
| // prefix is to ensure the decoder can distinguish NIL from the array value. |
| ListV: |
| len Value*len |
| SetV: |
| len Value*len |
| MapV: |
| len (Value Value)*len |
| StructV: |
| (index Value)* EOF // index is the 0-based field index and |
| // zero value fields can be skipped. |
| UnionV: |
| index Value // index is the 0-based field index. |
| OptionalV: |
| NIL |
| | Value |
| AnyV: |
| NIL |
| | +typeID Value |
| |
| Wire protocol. Version 0x81 |
| |
| The protocol consists of a stream of messages, where each message describes |
| either a type or a value. All values are typed. Here's the protocol grammar: |
| VOM: |
| (TypeMsg | ValueMsg)* |
| TypeMsg: |
| incompleteFlag? -typeID len(WireType) WireType |
| ValueMsg: |
| +typeID primitive // non-typeobject primitive |
| | +typeID len(RefTypes) typeID* refTypesIndex // typeobject primitive |
| | +typeID len(ValueMsg) CompositeV |
| | +typeID len(RefTypes) typeID* len(ValueMsg) CompositeV // message with typeobject but no any |
| | +typeID len(RefTypes) typeID* len(AnyMsgLens) len(anyMsg)* len(ValueMsg) CompositeV // message with any |
| Value: |
| primitive | CompositeV |
| CompositeV: |
| ArrayV | ListV | SetV | MapV | StructV | UnionV | OptionalV | AnyV |
| ArrayV: |
| len Value*len |
| // len is always 0 for array since we know the exact size of the array. This |
| // prefix is to ensure the decoder can distinguish NIL from the array value. |
| ListV: |
| len Value*len |
| SetV: |
| len Value*len |
| MapV: |
| len (Value Value)*len |
| StructV: |
| (index Value)* EOF // index is the 0-based field index and |
| // zero value fields can be skipped. |
| UnionV: |
| index Value // index is the 0-based field index. |
| OptionalV: |
| NIL |
| | Value |
| AnyV: |
| NIL |
| | refTypesIndex Value |
| |
| Wire protocol. Version 0x82 |
| |
| The protocol consists of a stream of interleaved messages, broken |
| into into atomic chunks. Each message describes either a typed |
| value or a type definition. The grammar for chunked type and value |
| messages takes the following form: |
| |
| VOM: |
| TypeMessage | ValueMessage |
| ValueMessage: |
| TypeId MessageData | |
| WireCtrlValueFirstChunk TypeId MessageData |
| (WireCtrlValueChunk MessageData)* |
| WireCtrlValueLastChunk MessageData | |
| WireCtrlValueFirstChunk TypeId ReferencedTypeIds MessageData TypeMessage* |
| (WireCtrlValueChunk ReferencedTypeIds MessageData TypeMessage*)* |
| WireCtrlValueLastChunk ReferencedTypeIds MessageData |
| TypeMessage: |
| -TypeId MessageData | |
| WireCtrlTypeFirstChunk -TypeId MessageData |
| (WireCtrlTypeChunk MessageData)* |
| WireCtrlValueLastChunk MessageData |
| ReferencedTypeIds: |
| TypeId* |
| |
| The MessageData from each TypeMessage or ValueMessage is concatenated |
| together to form the corresponding TypeMessageBody or ValueMessageBody. |
| In addition, any ReferencedTypeIds that are sent in a value message are |
| concatenated to form the ReferencedTypeLookupTable for that message. |
| Here is the grammar for the contents: |
| ValueMessageBody: |
| primitive | len CompositeV |
| TypeMessageBody: |
| WireType (handled as a Value) |
| Value: |
| primitive | CompositeV |
| CompositeV: |
| ArrayV | ListV | SetV | MapV | StructV | UnionV | OptionalV | AnyV |
| ArrayV: |
| len Value*len |
| // len is always 0 for array since we know the exact size of the array. This |
| // prefix is to ensure the decoder can distinguish NIL from the array value. |
| ListV: |
| len Value*len |
| SetV: |
| len Value*len |
| MapV: |
| len (Value Value)*len |
| StructV: |
| (index Value)* EOF // index is the 0-based field index and |
| // zero value fields can be skipped. |
| UnionV: |
| index Value // index is the 0-based field index. |
| OptionalV: |
| NIL |
| | Value |
| AnyV: |
| NIL |
| | Index into ReferencedTypeLookupTable. |
| |
| TODO(toddw): We need the message lengths for fast binary->binary transcoding. |
| |
| The basis for the encoding is a variable-length unsigned integer (var128), with |
| a max size of 128 bits (16 bytes). This is a byte-based encoding. The first |
| byte encodes values 0x00...0x7F verbatim. Otherwise it encodes the length of |
| the value, and the value is encoded in the subsequent bytes in big-endian order. |
| In addition we have space for 112 control entries. |
| |
| The var128 encoding tries to strike a balance between the coding size and |
| performance; we try to not be overtly wasteful of space, but still keep the |
| format simple to encode and decode. |
| |
| First byte of var128: |
| |7|6|5|4|3|2|1|0| |
| |---------------| |
| |0| Single value| 0x00...0x7F Single-byte value (0...127) |
| ----------------- |
| |1|0|x|x|x|x|x|x| 0x80...0xBF Control1 (64 entries) |
| |1|1|0|x|x|x|x|x| 0xC0...0xDF Control2 (32 entries) |
| |1|1|1|0|x|x|x|x| 0xE0...0xEF Control3 (16 entries) |
| |1|1|1|1| Len | 0xF0...0xFF Multi-byte length (FF=1 byte, FE=2 bytes, ...) |
| ----------------- (i.e. the length is -Len) |
| |
| The encoding of the value and control entries are all disjoint from each other; |
| each var128 can hold either a single 128 bit value, or 4 to 6 control bits. The |
| encoding favors small values; values less than 0x7F and control entries are all |
| encoded in one byte. |
| |
| The primitives are all encoded using var128: |
| o Unsigned: Verbatim. |
| o Signed : Low bit 0 for positive and 1 for negative, and indicates whether |
| to complement the other bits to recover the signed value. |
| o Float: Byte-reversed ieee754 64-bit float. |
| o Complex: Two floats, real and imaginary. |
| o String: Byte count followed by uninterpreted bytes. |
| |
| Controls are used to represent special properties and values: |
| 0xE0 // NIL - represents any(nil), a non-existent value. |
| 0xEF // EOF - end of fields, e.g. used for structs |
| ... |
| TODO(toddw): Add a flag indicating there is a local TypeID table for Any types. |
| |
| The first byte of each message takes advantage of the var128 flags and reserved |
| entries, to make common encodings smaller, but still easy to decode. The |
| assumption is that values will be encoded more frequently than types; we expect |
| values of the same type to be encoded repeatedly. Conceptually the first byte |
| needs to distinguish TypeMsg from ValueMsg, and also tell us the TypeID. |
| |
| First byte of each message: |
| |7|6|5|4|3|2|1|0| |
| |---------------| |
| |0|0|0|0|0|0|0|0| Reserved (1 entry 0x00) |
| |0|1|x|x|x|0|0|0| Reserved (8 entries 0x40, 48, 50, 58, 60, 68, 70, 78) |
| |0|0|1|x|x|0|0|0| Reserved (4 entries 0x20, 28, 30, 38) |
| |0|0|0|1|0|0|0|0| TypeMsg (0x10, TypeID encoded next, then WireType) |
| |0|0|0|0|1|0|0|0| ValueMsg bool false (0x08) |
| |0|0|0|1|1|0|0|0| ValueMsg bool true (0x18) |
| |0| StrLen|0|1|0| ValueMsg small string len (0...15) |
| |0| Uint |1|0|0| ValueMsg small uint (0...15) |
| |0| Int |1|1|0| ValueMsg small int (-8...7) |
| |0| TypeID |1| ValueMsg (6-bit built-in TypeID) |
| ----------------- |
| |1|0| TypeID | ValueMsg (6-bit user TypeID) |
| |1|1|0| Resv | Reserved (32 entries 0xC0...0xDF) |
| |1|1|1|0| Flag | Flag (16 entries 0xE0...0xEF) |
| |1|1|1|1| Len | Multi-byte length (FF=1 byte, FE=2 bytes, ..., F8=8 bytes) |
| ----------------- (i.e. the length is -Len) |
| |
| If the first byte is 0x10, this is a TypeMsg, and we encode the TypeID next, |
| followed by the WireType. The WireType is simply encoded as a regular value, |
| using the protocol described in the grammar above. |
| |
| Otherwise this is a ValueMsg. We encode small bool, uint and int values that |
| fit into 4 bits directly into the first byte, along with their TypeID. For |
| small strings with len <= 15, we encode the length into the first byte, followed |
| by the bytes of the string value; empty strings are a single byte 0x02. |
| |
| The first byte of the ValueMsg also contains TypeIDs [0...127], where the |
| built-in TypeIDs occupy [0...63], and user-defined TypeIDs start at 64. |
| User-defined TypeIDs larger than 127 are encoded as regular multi-byte var128. |
| |
| TODO(toddw): For small value encoding to be useful, we'll want to use it for all |
| values that can fit, but we'll be dropping the sizes of int and uint, and the |
| type names. Now that Union is labeled, the only issue is Any. And now that we |
| have Signature with type information, maybe we can drop the type names |
| regularly, and only send them when the Signature says "Any". This also impacts |
| where we perform value conversions - does it happen on the server or the client? |
| |
| */ |