Split Serializer
Split Serializer was created out of frustration with the slow performance
of JSON serializers. We were attempting to send moderately complex data
structures to an ActionScript application and noticed that it would pause
for several seconds as it parsed the JSON data.
JSON itself was created to be a lighter-weight version of SOAP and
XML-RPC -- or even just ad hoc XML data. It did this by doing away with
the verbose <tags> and replacing them with a JavaScript-like
syntax of "key":"value" pairs, bracket-delimited lists and brace-delimited
maps. In order to get simpler than that, we turned the delimiting problem on
its head. Instead of having a single set of delimiters and recursively
tracking nested levels, we decided to try varying the delimiter itself
as the tree gets deeper. The end result is that a consumer can
simply split byte arrays (strings) to parse the data.
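The split-to-parse idea can be sketched in a few lines of Java. This is only an illustration, not the project's parser: the class name `SplitParseSketch` and its `parse` method are hypothetical, the delimiter range (ordinal 30 down to 14) is the default quoted later on this page, and the sketch handles only nested lists of strings because the wire layout for maps is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the project's SplitDeserializer.
final class SplitParseSketch {
    // Assumed defaults from this page: delimiters run from ordinal 30 down to 14.
    static final int TOP = 30;
    static final int BOTTOM = 14;

    /** Returns a String for a leaf, or a List of parsed children otherwise. */
    static Object parse(String text, int delimiter) {
        // If this level's delimiter is absent, splitting yields nothing: a leaf.
        if (delimiter < BOTTOM || text.indexOf((char) delimiter) < 0) {
            return text;
        }
        List<Object> children = new ArrayList<>();
        int start = 0;
        while (true) {
            int end = text.indexOf((char) delimiter, start);
            if (end < 0) {
                // Last piece: everything after the final delimiter.
                children.add(parse(text.substring(start), delimiter - 1));
                break;
            }
            // Each piece is parsed one level deeper, with the next delimiter down.
            children.add(parse(text.substring(start, end), delimiter - 1));
            start = end + 1;
        }
        return children;
    }

    public static void main(String[] args) {
        char outer = (char) 30;
        char inner = (char) 29;
        String wire = "a" + inner + "b" + outer + "c";
        System.out.println(parse(wire, TOP)); // prints [[a, b], c]
    }
}
```

Note that no nesting counter or quote tracking is needed; each level is a plain scan for one character. A side effect of this simplified sketch is that a one-element list is indistinguishable from a bare string.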
Some of the principles that Split Serializer uses:
-
Everything is either a string, a map, or a list. We can describe
most data structures with this small set of primitives.
-
Clients know best when to do typing. It is pointless to serialize
an integer, convey that information in the precious bandwidth stream,
have the client deserialize the value to a typed variable, all only to have
it be converted back into a string for display. Type information is
important, but it often needs to be more colorful than the primitives
that JSON enforces, and it is often a waste of time to do it at all. We
let the client decide when it needs to do further typing.
-
There are no nulls. The best you can do is an empty string. Sorry.
-
Objects are maps. The Java serializer uses introspection and
treats objects like maps. The client is free to create "real" objects
out of them, because it knows when it is best to do typing (see above).
People who want very strong typing are going to choose SOAP, XML or RMI
anyway.
-
The delimiter changes, and is bounded. Split Serializer varies the
delimiter as the parse tree gets deeper. Currently it does this by
decrementing an integer value at each level and inserting the
corresponding character into the byte stream as primitives are
serialized. The parser knows it has reached a leaf when splitting
yields no results. Some of the consequences of this choice are:
-
The tree has a maximum depth, which depends on the character encoding
chosen and on the range inside that encoding that the developer
chooses to use (the default is UTF-8 with characters from ordinal 30
down to 14 -- 17 levels deep).
-
Data cannot contain the delimiters. The serializer will throw
an exception if it does. Much like XML CDATA blocks or email
attachments, binary data will need to be encoded with something
like Base64 before it is sent.
-
Currently, developers can pick the encoding and the range.
This meta information is not encoded in the serialization itself;
a parser will need to know these settings before a stream
can be deserialized. Different platforms may require particular
encodings, although web browsers that support HTTP
should all work fine.
-
Bandwidth is precious. Every character is precious, and it
would be best if we could just use one octet to delimit a pair
of entities. We considered doing compression (gzip stream), but
decided that the transport should do that instead (HTTP compression,
for example).
-
Use the stream as data. We don't have any annoying
double-quoted strings, or open/close braces, so we can actually
use the stream itself as immutable data in the client. Obviously,
this depends on the features of the language, but at least the
encoding itself will not discourage doing so. For example, the Java
implementation has a ByteArray object that wraps a byte[] and
just hands out new ByteArray objects that point to ranges inside it.
We also do not convert the string "123" to the number 123, so we can
use it directly (the client knows best when/if it will need to do the
real type conversion, as per above).
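The "use the stream as data" principle can be illustrated with a small view class. This is a sketch only: `ByteSlice` is a hypothetical name and is not the project's ByteArray class, whose actual API is not shown on this page. It assumes the default UTF-8 encoding, where delimiters below ordinal 128 are single bytes that never occur inside a multi-byte character, so splitting on raw bytes is safe.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable view over a shared byte[]; no bytes are copied on split.
final class ByteSlice {
    private final byte[] data;
    private final int offset;
    private final int length;

    ByteSlice(byte[] data) {
        this(data, 0, data.length);
    }

    private ByteSlice(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    /** Splits this view on one delimiter byte, returning views into the same array. */
    List<ByteSlice> split(byte delimiter) {
        List<ByteSlice> parts = new ArrayList<>();
        int start = offset;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            if (data[i] == delimiter) {
                parts.add(new ByteSlice(data, start, i - start));
                start = i + 1;
            }
        }
        parts.add(new ByteSlice(data, start, end - start));
        return parts;
    }

    /** Decodes to a String only when the client actually asks for one. */
    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }
}
```

A client would call `split((byte) 30)` on the whole message, then `split((byte) 29)` on each part, and so on, converting a leaf to a String or a number only at the point where it needs one.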
Java Example (from the JUnit test suite for this project):
    @Test
    public void testPublicBeanObject() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        SplitSerializer serializer = new SplitSerializer(baos);
        BeanObject bo = new BeanObject();
        byte[] serialized = serializer.serialize(bo);
        baos.close();
        SplitDeserializer deserializer = new SplitDeserializer();
        ByteArrayInputStream bais = new ByteArrayInputStream(serialized);
        Object o = deserializer.deserialize(bais);
        bais.close();
        assert(o instanceof Map);
        assert(bo.matches((Map)o));
    }
And a slightly more customized example from the same test suite, using
ASCII encoding and a 55 level limit on tree depth:
    @Test
    public void HighASCIITest() throws Exception {
        TypeObject to = new TypeObject();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        SplitSerializer serializer = new SplitSerializer(254, 200, baos, "ASCII");
        serializer.serialize(to);
        baos.close();
        byte[] serialized = baos.toByteArray();
        SplitDeserializer deserializer = new SplitDeserializer((char)254, (char)200, "ASCII");
        ByteArrayInputStream bais = new ByteArrayInputStream(serialized);
        Object o = deserializer.deserialize(bais);
        assert(o instanceof Map);
        to.matches((Map)o); // throws exception on error
        bais.close();
    }
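The depth limits quoted on this page follow directly from the delimiter range: 30 down to 14 gives 17 levels, and 254 down to 200 gives 55. The sketch below shows that arithmetic together with one way a writer might apply a bounded, decrementing delimiter. It is an assumption-laden illustration rather than the project's SplitSerializer: `SplitWriteSketch`, `maxDepth` and `write` are hypothetical names, only lists and strings are handled, and a null is written as an empty string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the project's SplitSerializer.
final class SplitWriteSketch {
    /** Number of nesting levels a delimiter range allows (both ends inclusive). */
    static int maxDepth(int top, int bottom) {
        return top - bottom + 1;
    }

    /** Writes nested lists of values as one string, starting at the top delimiter. */
    static String write(Object value, int top, int bottom) {
        return write(value, top, top, bottom);
    }

    private static String write(Object value, int delimiter, int top, int bottom) {
        if (value instanceof List) {
            if (delimiter < bottom) {
                throw new IllegalArgumentException(
                        "nesting exceeds " + maxDepth(top, bottom) + " levels");
            }
            StringBuilder out = new StringBuilder();
            String separator = "";
            for (Object child : (List<?>) value) {
                // Children are written one level down, with the next delimiter.
                out.append(separator).append(write(child, delimiter - 1, top, bottom));
                separator = String.valueOf((char) delimiter);
            }
            return out.toString();
        }
        // Leaves are plain strings; there are no nulls, only empty strings.
        String text = (value == null) ? "" : value.toString();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= bottom && c <= top) {
                throw new IllegalArgumentException(
                        "data contains delimiter character " + (int) c);
            }
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(maxDepth(30, 14));   // prints 17
        System.out.println(maxDepth(254, 200)); // prints 55
        List<Object> tree = Arrays.asList(Arrays.asList("a", "b"), "c");
        System.out.println(write(tree, 30, 14).length()); // prints 5
    }
}
```

Rejecting any leaf that contains a character from the reserved range is what lets the reading side get away with plain splitting; binary payloads would be Base64-encoded before being handed to such a writer.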
All files are available on the SourceForge project page.
Take a look at the javadocs.
Browse all versions here.