Hessian 2.0

From Resin 3.0

Latest revision as of 21:37, 23 June 2006


The current draft grammar is at Hessian 2.0 Grammar

Hessian 2.0 is in the very early stages. Feedback is welcome. Some data on efficiency vs Java serialization is at forum.caucho.com/.

Snapshots with draft implementations will be available in the Resin 3.0 snapshot at http://www.caucho.com/download.

Non-Goals for Hessian 2.0

We are not planning on any semantic additions or changes for Hessian 2.0. The current datatype and object model is intended to remain the same.

The only changes planned are extra compact encodings for better serialization and performance.

Goals for Hessian 2.0

Hessian 2.0 will be interoperable with Hessian 1.0.

A Hessian 1.0 client can talk to any Hessian 2.0 server and receive a Hessian 1.0 response.

A Hessian 2.0 client can use Hessian 1.0 encoding to a server, but indicate that it can upgrade to Hessian 2.0 encoding.

Small number compression

In Hessian 1.0, all 32-bit integers are encoded in 5 bytes: 'I' b3 b2 b1 b0.

Most integers in actual data tend to be small. "0" is the most common integer value and "1" is the next most common value.

Small integers will be encoded in the single lead-byte, e.g. 0x90 might represent integer 0.

Bytes can be encoded in two bytes, e.g. x51 b0. Shorts can be encoded in three bytes, e.g. x53 b1 b0.

Similarly, small longs will have short encodings, and integer-valued doubles also have short encodings.

Short string compression

In Hessian 1.0, strings have a 3-byte overhead, 'S' b1 b0 data.

Hessian 2.0 will encode small strings with only a 1-byte overhead, e.g.

x25 hello

Object definition and instance

Hessian 1.0 encodes objects as associative arrays, where the keys correspond to fields, e.g.

M t x00 x08 test.Car
    S x00 x05 model
    S x00 x05 Honda

    S x00 x04 make
    S x00 x05 Civic

    S x00 x05 color
    S x00 x03 red
    z

When multiple Car objects are serialized, Hessian 1.0 has the unnecessary overhead of repeating the "test.Car", "model", "make", and "color" strings for every object, even though those strings are identical for all Cars.

Hessian 2.0 will have an object definition/instance encoding, which is equivalent to the above map:

O x98 test.Car  -- code and type/class
    x93           -- number of fields encoded as an integer
    xd5 model -- short string
    xd4 make
    xd5 color
    xd5 Honda -- data for first object follows immediately
    xd5 Civic
    xd3 red

A following car would look like:

o x91      -- integer representing defined object
    xda Volkswagen
    xd6 Beetle
    xd4 blue
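
As a rough check on the savings, the sizes of the Car encodings above can be counted directly. The sketch below is only arithmetic over the layouts shown in this section; the class and method names are hypothetical, and it assumes ASCII string-valued fields shorter than 32 characters, a type name of at most 63 characters, and at most 63 fields, so that every length and count fits in a single lead byte.

```java
/** Byte counts for the Car example (hypothetical helper, not part of any Hessian library). */
final class CarSizeSketch {
    /** Hessian 1.0 map: 'M', 't' b1 b0 type, then 'S' b1 b0 key and 'S' b1 b0 value per field, then 'z'. */
    static int hessian1MapSize(String type, String[] names, String[] values) {
        int size = 1 + 3 + type.length();
        for (int i = 0; i < names.length; i++) {
            size += 3 + names[i].length();
            size += 3 + values[i].length();
        }
        return size + 1;
    }

    /** Hessian 2.0 definition: 'O', one-byte type length, type, one-byte field count, names, first instance. */
    static int hessian2DefinitionSize(String type, String[] names, String[] values) {
        int size = 1 + 1 + type.length() + 1;
        for (String name : names) {
            size += 1 + name.length();   // one lead byte per short string
        }
        for (String value : values) {
            size += 1 + value.length();
        }
        return size;
    }

    /** Hessian 2.0 instance: 'o', one-byte definition reference, then the field values. */
    static int hessian2InstanceSize(String[] values) {
        int size = 2;
        for (String value : values) {
            size += 1 + value.length();
        }
        return size;
    }
}
```

Under these assumptions the first Car takes 58 bytes as a Hessian 1.0 map and 44 bytes as a definition with its first instance, while the second Car takes 65 bytes as a map and 25 bytes as an instance.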

Encodings in Current Hessian 2.0 Grammar draft

32-bit integers

Direct integers:

0x80 - 0xcf

The codes between 0x80 and 0xcf represent integers between -16 and 63, i.e. code - 0x90. For example, integer zero is represented as

0x90

Bytes, i.e. integers between -128 and 127:

0x01 b0

Shorts, i.e. integers between -32768 and 32767

0x02 b1 b0

The Hessian 1.0 encoding for integers is always available:

 'I' b3 b2 b1 b0
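
A writer would presumably pick the shortest form that can hold the value. The following is a minimal sketch of that choice using the draft codes listed above; the class and method names are hypothetical, and it assumes b3..b0 are written most significant byte first, as the notation suggests.

```java
import java.io.ByteArrayOutputStream;

/** Sketch of the draft 32-bit integer encodings (hypothetical names). */
final class CompactIntSketch {
    static byte[] encodeInt(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (v >= -16 && v <= 63) {
            out.write(0x90 + v);      // direct integer: code = value + 0x90
        } else if (v >= -128 && v <= 127) {
            out.write(0x01);          // byte form: 0x01 b0
            out.write(v);
        } else if (v >= -32768 && v <= 32767) {
            out.write(0x02);          // short form: 0x02 b1 b0
            out.write(v >> 8);
            out.write(v);
        } else {
            out.write('I');           // Hessian 1.0 form: 'I' b3 b2 b1 b0
            out.write(v >> 24);
            out.write(v >> 16);
            out.write(v >> 8);
            out.write(v);
        }
        return out.toByteArray();     // write(int) keeps only the low 8 bits
    }
}
```

For example, `encodeInt(0)` yields the single byte 0x90 and `encodeInt(100)` yields 0x01 0x64.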

64-bit longs

Direct longs. A single byte representation for the smallest long values. The codes between 0x20 and 0x3f represent 64-bit longs between -16 and 15, e.g. long zero is represented by 0x30

 0x20 - 0x3f

Bytes, i.e. longs between -128 and 127

0x03 b0

Shorts, i.e. longs between -32768 and 32767

0x04 b1 b0

32-bit longs, i.e. longs between -2147483648 and 2147483647

0x05 b3 b2 b1 b0

The Hessian 1.0 encoding for longs is always available:

 'L' b7 b6 b5 b4 b3 b2 b1 b0
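
The long forms follow the same pattern as the integer forms. This sketch, with hypothetical names and assuming most-significant-byte-first order, selects the shortest of the five long encodings listed above.

```java
import java.io.ByteArrayOutputStream;

/** Sketch of the draft 64-bit long encodings (hypothetical names). */
final class CompactLongSketch {
    static byte[] encodeLong(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (v >= -16 && v <= 15) {
            out.write((int) (0x30 + v));          // direct long: code = value + 0x30
        } else if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            out.write(0x03);                      // 0x03 b0
            writeBytes(out, v, 1);
        } else if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            out.write(0x04);                      // 0x04 b1 b0
            writeBytes(out, v, 2);
        } else if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            out.write(0x05);                      // 0x05 b3 b2 b1 b0
            writeBytes(out, v, 4);
        } else {
            out.write('L');                       // Hessian 1.0 form: 'L' b7 .. b0
            writeBytes(out, v, 8);
        }
        return out.toByteArray();
    }

    /** Writes the low {@code count} bytes of {@code value}, most significant first. */
    private static void writeBytes(ByteArrayOutputStream out, long value, int count) {
        for (int shift = 8 * (count - 1); shift >= 0; shift -= 8) {
            out.write((int) (value >> shift));
        }
    }
}
```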

Doubles

Direct values. 0.0 and 1.0 are represented by a single code

0x06 - 0.0
0x07 - 1.0

Single byte integer doubles. Integer values between -128.0 and 127.0 are represented by

0x08 b0

Two byte integer doubles. Integer values between -32768.0 and 32767.0 are represented by

0x09 b1 b0

Four byte integer doubles

 0x0b b3 b2 b1 b0

Doubles which are equivalent to floats:

 0x0c b3 b2 b1 b0

Where b3, b2, b1, b0 are the byte encoding of a float.
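
One way a writer might choose among these double forms is sketched below. The names are hypothetical. The sketch assumes the integer payloads are signed and written most significant byte first, and it falls back to the Hessian 1.0 form ('D' followed by the eight bytes of the IEEE 754 value), which this section does not restate. Negative zero is deliberately kept out of the integer forms so that its sign is not lost.

```java
import java.io.ByteArrayOutputStream;

/** Sketch of the draft double encodings (hypothetical names). */
final class CompactDoubleSketch {
    static byte[] encodeDouble(double v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long bits = Double.doubleToLongBits(v);
        int i = (int) v;
        // Integer-valued, fits in 32 bits, and is not negative zero.
        boolean integral = i == v && bits != Double.doubleToLongBits(-0.0);
        if (bits == 0L) {
            out.write(0x06);                              // 0.0
        } else if (v == 1.0) {
            out.write(0x07);                              // 1.0
        } else if (integral && i >= -128 && i <= 127) {
            out.write(0x08);                              // 0x08 b0
            writeBytes(out, i, 1);
        } else if (integral && i >= -32768 && i <= 32767) {
            out.write(0x09);                              // 0x09 b1 b0
            writeBytes(out, i, 2);
        } else if (integral) {
            out.write(0x0b);                              // 0x0b b3 b2 b1 b0
            writeBytes(out, i, 4);
        } else if ((float) v == v) {
            out.write(0x0c);                              // double that is exactly a float
            writeBytes(out, Float.floatToIntBits((float) v), 4);
        } else {
            out.write('D');                               // Hessian 1.0 form
            writeBytes(out, bits, 8);
        }
        return out.toByteArray();
    }

    /** Writes the low {@code count} bytes of {@code value}, most significant first. */
    private static void writeBytes(ByteArrayOutputStream out, long value, int count) {
        for (int shift = 8 * (count - 1); shift >= 0; shift -= 8) {
            out.write((int) (value >> shift));
        }
    }
}
```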

Short strings

Strings between length 0 and 31 can have the <type,length> represented by a single byte:

 0xd0 - 0xef

So, "hello, world" would look like:

 0xdc hello, world
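
A minimal sketch of the short string form, with a fallback to the Hessian 1.0 'S' b1 b0 form for longer strings. The names are hypothetical, and the sketch assumes that the length counts characters and that the data is written as UTF-8; for ASCII text such as the example above the two counts coincide. Strings longer than 65535 characters are outside this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Sketch of the draft short string encoding (hypothetical names). */
final class ShortStringSketch {
    static byte[] encodeString(String s) {
        int len = s.length();                     // assumed: length in characters
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (len <= 31) {
            out.write(0xd0 + len);                // type and length in one byte
        } else if (len <= 0xffff) {
            out.write('S');                       // Hessian 1.0 form: 'S' b1 b0
            out.write(len >> 8);
            out.write(len);
        } else {
            throw new IllegalArgumentException("string too long for this sketch");
        }
        out.write(data, 0, data.length);
        return out.toByteArray();
    }
}
```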

Short binary data

Binary data between length 0 and 15 can have the <type,length> represented by a single byte:

 0xf0 - 0xff

e.g. new byte[] { 1, 2, 3, 4, 5};

 xf5 x01 x02 x03 x04 x05
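
The short binary form can be sketched the same way. The names below are hypothetical, and the sketch only covers the single-byte header described here, rejecting anything longer than 15 bytes rather than guessing at a fallback.

```java
/** Sketch of the draft short binary encoding (hypothetical names). */
final class ShortBinarySketch {
    static byte[] encodeShortBinary(byte[] data) {
        if (data.length > 15) {
            throw new IllegalArgumentException("short binary form holds at most 15 bytes");
        }
        byte[] out = new byte[data.length + 1];
        out[0] = (byte) (0xf0 + data.length);     // type and length in one byte
        System.arraycopy(data, 0, out, 1, data.length);
        return out;
    }
}
```

With the array from the example, `encodeShortBinary(new byte[] { 1, 2, 3, 4, 5 })` produces xf5 x01 x02 x03 x04 x05.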

Objects (repeated maps)

Maps which have a consistent set of fields, like objects, can be represented by an object definition/object instance pair.

The object definition defines the expected type (required), the number of fields, and the field names. The object definition also includes the data for the first object instance:

 'O'
    <int>  type-name  -- length of type encoded as an integer followed by type name
    <int>                     -- number of fields
    (<string>)*             -- strings representing the field names
    (<object>)*            -- object data for the first object instance

The object instance refers to an earlier object definition and then follows with the field data:

'o'
    <int>           -- integer referencing the object definition
    (<object>)*  -- field values
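
The definition/instance pair can be sketched as a writer that remembers which types it has already defined. This is an illustration under stated assumptions, not the draft implementation: the class and method names are hypothetical, field values are limited to ASCII strings shorter than 32 characters, all integers are assumed to fit the direct form (0 to 63), and definitions are numbered from 1 because the Car example refers to its first definition as x91.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Sketch of an 'O' / 'o' writer (hypothetical names, string fields only). */
final class ObjectWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Map<String, Integer> definitions = new HashMap<>();

    void writeObject(String type, String[] fieldNames, String[] values) {
        Integer ref = definitions.get(type);
        if (ref == null) {
            // First object of this type: definition followed by its field data.
            definitions.put(type, definitions.size() + 1);   // assumed 1-based numbering
            out.write('O');
            writeDirectInt(type.length());
            writeRaw(type);
            writeDirectInt(fieldNames.length);
            for (String name : fieldNames) {
                writeShortString(name);
            }
        } else {
            // Later object: refer back to the definition.
            out.write('o');
            writeDirectInt(ref);
        }
        for (String value : values) {
            writeShortString(value);
        }
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    private void writeDirectInt(int v) {
        if (v < 0 || v > 63) {
            throw new IllegalArgumentException("sketch only handles direct integers 0..63");
        }
        out.write(0x90 + v);
    }

    private void writeShortString(String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch only handles short strings");
        }
        out.write(0xd0 + s.length());
        writeRaw(s);
    }

    private void writeRaw(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        out.write(data, 0, data.length);
    }
}
```

Writing the two cars from the earlier example through this sketch produces an 'O' record of 44 bytes followed by an 'o' record of 25 bytes.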