CIS Serializer

Reviewed documentation of Compact Instruction Sequences v1

This is a description of how to create and interpret serialized data. It is not specific to any implementation.

Repository with my original Python implementation from 2019

Sequences - introduction

Each sequence represents one dictionary-like mapping object. It has an ID and a set of named parameters. A sequence is not translated into a byte string using ASCII or any other text encoding. Instead, it is built from instructions, each 8 bits long.

Instructions - introduction

There are 4 types of instructions. The type is encoded in the first 2 bits. The remaining 6 bits contain a type-specific value (a kind of parameter; being 6 bits long, it can range from 0 to 63). For convenience, some types and their most common values are assigned 3-letter tags. These tags are used throughout the rest of the documentation and help standardize different implementations.
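The bit layout can be illustrated with a minimal Python sketch. The function and constant names are hypothetical (they are not taken from any implementation), and the type codes are the ones listed in the sections that follow.

```python
from typing import Tuple

# Type codes stored in the top 2 bits; the names are illustrative only.
TYPE_STRUCTURAL = 0b00
TYPE_META = 0b01
TYPE_DIGIT = 0b10
TYPE_LETTER = 0b11


def pack(type_code: int, value: int) -> int:
    """Combine a 2-bit type and a 6-bit value into one 8-bit instruction."""
    if not 0 <= type_code <= 3:
        raise ValueError("type must fit in 2 bits")
    if not 0 <= value <= 63:
        raise ValueError("value must fit in 6 bits")
    return (type_code << 6) | value


def unpack(instruction: int) -> Tuple[int, int]:
    """Split an 8-bit instruction into (type, value)."""
    if not 0 <= instruction <= 255:
        raise ValueError("instruction must fit in 8 bits")
    return instruction >> 6, instruction & 0b111111


print(format(pack(TYPE_LETTER, 25), "08b"))  # 11011001
print(unpack(0b10111111))                    # (2, 63)
```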

Instructions - structural

The structural type includes instructions that delimit different parts of the sequence. This type also includes the number modifier instructions.

This type of instruction starts with 00

| Tag | Full name | Value | Instruction | Description |
|-----|-----------|-------|-------------|-------------|
| BGN | Begin | 0 | 00 000000 | Sequence structure: start of the sequence |
| END | End | 1 | 00 000001 | Sequence structure: end of the sequence |
| PRM | Parameter | 3 | 00 000011 | Sequence structure: start of the parameter name |
| VAL | Value | 4 | 00 000100 | Sequence structure: start of the parameter value |
| NEG | Negative | 5 | 00 000101 | Mark the following number as negative |
| DOT | Decimal | 6 | 00 000110 | Decimal sign for the numbers |

Instructions - meta

Meta instructions convey information about whether and how a sequence was received. They are to be processed on their own and ignored when building sequences.

This type of instruction starts with 01 (64)

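| Tag | Full name | Value | Instruction | Description |
|-----|-----------|-------|-------------|-------------|
| VLD | Valid | 0 | 01 000000 | Received a valid sequence |
| INV | Invalid | 2 | 01 000010 | Received an invalid sequence |
| CRT | Critical | 7 | 01 000111 | Critical error, possibly unrelated to the transfer |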
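As a rough illustration of "processed on their own and ignored when building sequences", the sketch below separates meta instructions from the rest of a byte stream. The function name `split_meta` and the "unknown" label for unlisted meta values are assumptions made for this example.

```python
from typing import List, Tuple

# Meta values from the table above, keyed by the 6-bit value.
META_TAGS = {0: "VLD", 2: "INV", 7: "CRT"}


def split_meta(data: bytes) -> Tuple[List[str], bytes]:
    """Return (meta tags found, remaining instructions in order)."""
    meta = []
    rest = bytearray()
    for instruction in data:
        if instruction >> 6 == 0b01:  # meta type prefix
            meta.append(META_TAGS.get(instruction & 0x3F, "unknown"))
        else:
            rest.append(instruction)
    return meta, bytes(rest)


# BGN, VLD, LTR a, END
tags, remaining = split_meta(bytes([0x00, 0x40, 0xC0, 0x01]))
print(tags)             # ['VLD']
print(remaining.hex())  # 00c001
```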

Instructions - letter

LTR

To encode text, one has to build it from letters (and a limited set of other characters). Each instruction of this type contains a single character. An uninterrupted string of these instructions creates a word.

This type of instruction starts with 11 (192)

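| Content | Value | Instruction |
|---------|-------|-------------|
| a | 0 | 11 000000 |
| z | 25 | 11 011001 |
| A | 26 | 11 011010 |
| Z | 51 | 11 110011 |
| '_' | 52 | 11 110100 |
| '.' | 53 | 11 110101 |
| ',' | 54 | 11 110110 |
| ':' | 55 | 11 110111 |
| '!' | 56 | 11 111000 |
| '@' | 57 | 11 111001 |
| '#' | 58 | 11 111010 |
| '$' | 59 | 11 111011 |
| '%' | 60 | 11 111100 |
| '^' | 61 | 11 111101 |
| '&' | 62 | 11 111110 |
| '*' | 63 | 11 111111 |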
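The character table can be expressed as a single 64-character alphabet whose index is the instruction value. This is a minimal sketch: the names `ALPHABET`, `encode_word` and `decode_word` are hypothetical, and it assumes the values between a and z (and between A and Z) follow alphabetical order.

```python
import string

# Index in this string == 6-bit instruction value:
# a-z (0-25), A-Z (26-51), then the 12 symbols (52-63).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + "_.,:!@#$%^&*"
LETTER_PREFIX = 0b11 << 6  # 192


def encode_word(word: str) -> bytes:
    """Encode each character as one letter instruction."""
    out = bytearray()
    for ch in word:
        value = ALPHABET.find(ch)
        if value < 0:
            raise ValueError(f"character {ch!r} is not in the alphabet")
        out.append(LETTER_PREFIX | value)
    return bytes(out)


def decode_word(data: bytes) -> str:
    """Decode a run of letter instructions back into text."""
    chars = []
    for instruction in data:
        if instruction >> 6 != 0b11:
            raise ValueError("not a letter instruction")
        chars.append(ALPHABET[instruction & 0x3F])
    return "".join(chars)


print(encode_word("az").hex())                # c0d9
print(decode_word(bytes([0xDA, 0xF3, 0xFF])))  # AZ*
```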

Instructions - digit

DGT

Numbers can be created using digits. Each instruction of this type conveys a base 64 digit (because each value has 6 bits). One instruction on its own can represent an integer from 0 to 63. The instruction value is the same as the digit value.

This type of instruction starts with 10 (128)

| Content | Value | Instruction |
|---------|-------|-------------|
| 0 | 0 | 10 000000 |
| 1 | 1 | 10 000001 |
| 63 | 63 | 10 111111 |

Building sequences - words

A word is an unbroken string of instructions of the letter type (LTR).

Building sequences - numbers

Numbers are created with (base 64) digits and structural instructions "Negative" and "Decimal". See examples below.

| Number | Sequence |
|--------|----------|
| 1 | DGT 1 |
| -40 | NEG DGT 40 |
| 66 (64+2) | DGT 1 DGT 2 |
| 158 (128+30) | DGT 2 DGT 30 |
| 0.75 (48/64) | DOT DGT 48 |
| 2.00073... (2 + 3/4096) | DGT 2 DOT DGT 0 DGT 3 |
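The examples in the table can be reproduced with a short sketch. The helper names `encode_int` and `decode_number` are hypothetical, and `fractions.Fraction` is used here only to keep the decoded values exact; an implementation might just as well return floats.

```python
from fractions import Fraction

NEG, DOT = 0x05, 0x06
DIGIT_PREFIX = 0b10 << 6  # 128


def encode_int(n: int) -> bytes:
    """Encode an integer as an optional NEG followed by base 64 digits."""
    digits = []
    magnitude = abs(n)
    while True:
        magnitude, digit = divmod(magnitude, 64)
        digits.append(DIGIT_PREFIX | digit)
        if magnitude == 0:
            break
    prefix = [NEG] if n < 0 else []
    return bytes(prefix + digits[::-1])  # most significant digit first


def decode_number(data: bytes) -> Fraction:
    """Decode NEG / DGT / DOT instructions into an exact number."""
    negative = False
    value = Fraction(0)
    scale = None  # None while reading the integer part
    for instruction in data:
        if instruction == NEG:
            negative = True
        elif instruction == DOT:
            scale = Fraction(1, 64)
        elif instruction >> 6 == 0b10:
            digit = instruction & 0x3F
            if scale is None:
                value = value * 64 + digit
            else:
                value += digit * scale
                scale /= 64
        else:
            raise ValueError("unexpected instruction in a number")
    return -value if negative else value


print(encode_int(66).hex())                              # 8182
print(decode_number(bytes([0x82, DOT, 0x80, 0x83])))     # 8195/4096
```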

Building sequences - structure

For a sequence to be valid it has to have both delimiting tags and an ID. Its ID must be a word. The example below creates a sequence with ID "Hello"

BGN LTR H LTR e LTR l LTR l LTR o END

Which would look like this if specific values were shown instead of tags.

00000000111000011100010011001011110010111100111000000001

or in hexadecimal

00E1C4CBCBCE01
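Both notations are plain renderings of the same instruction bytes. The sketch below shows one way to produce them; the helper names are hypothetical, and the sample is a deliberately tiny sequence (BGN, LTR a, END) rather than the example above.

```python
def to_bits(data: bytes) -> str:
    """Render each instruction as 8 binary digits, without separators."""
    return "".join(format(instruction, "08b") for instruction in data)


def to_hex(data: bytes) -> str:
    """Render each instruction as 2 uppercase hexadecimal digits."""
    return data.hex().upper()


sample = bytes([0b00000000, 0b11000000, 0b00000001])  # BGN, LTR a, END
print(to_bits(sample))  # 000000001100000000000001
print(to_hex(sample))   # 00C001
```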

From now on, I will merge repeated letter and digit instructions for easier viewing, like below.

BGN LTR "Hello" END

This, however, does not convey much information and is inefficient on its own. The point of this project is serialization of mappings. Map entries are added as parameters. Every parameter needs a name and a value. The name has to be a word. Structural instructions mark the start of both the name and the value. Any mapping where keys are strings and values are either strings or numbers can be encoded. Just keep in mind the limited set of characters.

The example below creates a sequence with ID "Greet" and a parameter named "who" with value "World"

BGN LTR "Greet" PRM LTR "who" VAL LTR "World" END
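A serializer for such mappings might look like the sketch below. All names are hypothetical, it assumes words must be non-empty, and it handles only string and integer values (fractional numbers are left out to keep the example short).

```python
import string
from typing import Mapping, Union

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + "_.,:!@#$%^&*"
BGN, END, PRM, VAL, NEG = 0x00, 0x01, 0x03, 0x04, 0x05
LETTER, DIGIT = 0b11 << 6, 0b10 << 6


def _word(text: str) -> bytes:
    """Encode text as a run of letter instructions."""
    if not text:
        raise ValueError("a word needs at least one character")
    out = bytearray()
    for ch in text:
        index = ALPHABET.find(ch)
        if index < 0:
            raise ValueError(f"cannot encode character {ch!r}")
        out.append(LETTER | index)
    return bytes(out)


def _integer(n: int) -> bytes:
    """Encode an integer as optional NEG plus base 64 digits."""
    digits = []
    magnitude = abs(n)
    while True:
        magnitude, digit = divmod(magnitude, 64)
        digits.append(DIGIT | digit)
        if magnitude == 0:
            break
    prefix = [NEG] if n < 0 else []
    return bytes(prefix + digits[::-1])


def encode_sequence(seq_id: str, params: Mapping[str, Union[str, int]]) -> bytes:
    """Build BGN, ID, then PRM name VAL value for each entry, then END."""
    out = bytearray([BGN]) + _word(seq_id)
    for name, value in params.items():
        out.append(PRM)
        out += _word(name)
        out.append(VAL)
        out += _word(value) if isinstance(value, str) else _integer(value)
    out.append(END)
    return bytes(out)


print(encode_sequence("Greet", {"who": "World"}).hex().upper())
# 00E0D1C4C4D303D6C7CE04F0CED1CBC301
```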

A parameter value can be either a word or a number, and any number of parameters can be added. The example below creates a sequence with ID "Shopping", a text parameter "where" with value "bakery" and a numerical parameter "buns" with value 6

BGN LTR "Shopping" PRM LTR "where" VAL LTR "bakery" PRM LTR "buns" VAL DGT 6 END
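Going the other way, a decoder can strip the delimiters and split the body on PRM and VAL. The following is only a sketch under some assumptions: the names are hypothetical, meta instructions are simply skipped, integers come back as `int`, and numbers containing a decimal sign come back as `float`.

```python
import string
from typing import Dict, List, Tuple, Union

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + "_.,:!@#$%^&*"
BGN, END, PRM, VAL, NEG, DOT = 0x00, 0x01, 0x03, 0x04, 0x05, 0x06
META, DIGIT, LETTER = 0b01, 0b10, 0b11
Value = Union[str, int, float]


def _word(chunk: List[int]) -> str:
    if not chunk or any(b >> 6 != LETTER for b in chunk):
        raise ValueError("expected a word")
    return "".join(ALPHABET[b & 0x3F] for b in chunk)


def _number(chunk: List[int]) -> Union[int, float]:
    negative = chunk[:1] == [NEG]
    body = chunk[1:] if negative else chunk
    point = body.index(DOT) if DOT in body else len(body)
    whole, frac = body[:point], body[point + 1:]
    if not whole + frac or any(b >> 6 != DIGIT for b in whole + frac):
        raise ValueError("expected a number")
    value = 0
    for b in whole:
        value = value * 64 + (b & 0x3F)
    if point < len(body):  # a decimal sign was present
        value += sum((b & 0x3F) / 64 ** (i + 1) for i, b in enumerate(frac))
    return -value if negative else value


def decode_sequence(data: bytes) -> Tuple[str, Dict[str, Value]]:
    """Return (ID, parameters) for one serialized sequence."""
    ins = [b for b in data if b >> 6 != META]  # meta is not part of the sequence
    if len(ins) < 3 or ins[0] != BGN or ins[-1] != END:
        raise ValueError("sequence must start with BGN and end with END")
    chunks: List[List[int]] = [[]]
    for b in ins[1:-1]:
        if b == PRM:
            chunks.append([])
        else:
            chunks[-1].append(b)
    params: Dict[str, Value] = {}
    for chunk in chunks[1:]:
        if chunk.count(VAL) != 1:
            raise ValueError("each parameter needs exactly one VAL")
        split = chunk.index(VAL)
        raw = chunk[split + 1:]
        is_word = bool(raw) and all(b >> 6 == LETTER for b in raw)
        params[_word(chunk[:split])] = _word(raw) if is_word else _number(raw)
    return _word(chunks[0]), params


shopping = bytes.fromhex(
    "00 ECC7CECFCFC8CDC6 03 D6C7C4D1C4 04 C1C0CAC4D1D8 03 C1D4CDD2 04 86 01"
)
print(decode_sequence(shopping))
# ('Shopping', {'where': 'bakery', 'buns': 6})
```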

Limitations

  1. Limited set of characters

    It is not possible to encode full UTF-8 (or even full ASCII) with this method.

  2. No support for arrays and nesting

    It is not possible to encode an array (or any other collection), nor to nest one sequence inside another.

  3. Fractions precision

    It might not be obvious how many digits are required to encode a fraction. Each digit after the decimal point multiplies the precision by 64: 1 digit can encode all multiples of 1/64, 2 digits all multiples of 1/4096, 3 digits all multiples of 1/262144, and so on.
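To make the precision limitation concrete, the sketch below expands a fraction into base 64 digits and reports whether the expansion terminated. The function name and the `max_digits` cut-off are assumptions for this example. A fraction terminates only when its reduced denominator is a power of 2, which is why 3/4 needs 1 digit while 1/10 never becomes exact.

```python
from fractions import Fraction
from typing import List, Tuple


def fraction_digits(x: Fraction, max_digits: int = 8) -> Tuple[List[int], bool]:
    """Return (digits after the decimal sign, exact?) for 0 <= x < 1."""
    if not 0 <= x < 1:
        raise ValueError("x must be in the range [0, 1)")
    digits = []
    for _ in range(max_digits):
        if x == 0:
            break
        x *= 64
        digit = x.numerator // x.denominator  # integer part is the next digit
        digits.append(digit)
        x -= digit
    return digits, x == 0


print(fraction_digits(Fraction(3, 4)))       # ([48], True)
print(fraction_digits(Fraction(3, 4096)))    # ([0, 3], True)
print(fraction_digits(Fraction(1, 10), 3))   # ([6, 25, 38], False)
```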