Implementation differences
Chr.Avro was created as a flexible alternative to Apache’s C# Avro implementation. This document explains the rationale for the creation of an entirely new library and outlines some of the differences between Chr.Avro and other implementations.
Schema representation
The main architectural difference between Chr.Avro and other implementations is its abstract schema representation. One of the main drawbacks of the Apache implementation is that its schema representations are inextricably bound to JSON. The Parse method on the Schema class is the only publicly exposed factory method, which means that it’s impossible to manipulate a schema without manipulating JSON. Chr.Avro keeps its abstract, binary, and JSON components entirely separate.
Development activity
The Apache implementation is minimally maintained. Small changes are contributed occasionally; the last major changes were years ago. There doesn’t appear to be any appetite for major changes. Microsoft.Hadoop.Avro (Microsoft.Avro.Core?) was the only other open source Avro implementation for .NET. It’s been abandoned since 2016.
Type mapping
The Microsoft implementation made it extremely easy to map Avro records to existing .NET classes, something that Chr.Avro has aimed to imitate. The Apache implementation does not map to existing classes. Instead, users are given a choice between two less flexible options:
- Use the GenericRecord class, essentially an untyped dictionary. This approach offers no compile-time guarantees.
- Use the avrogen tool to generate classes that implement ISpecificRecord. While the generated classes offer some compile-time safety, the process is cumbersome, and additional work usually has to be done to map the generated classes to actual model classes.
Undefined behaviors
The Avro specification leaves certain behaviors undefined, and in some cases Chr.Avro implements them differently than other libraries. None of these differences are correctness issues—all serialized payloads are correct, and all correct payloads can be deserialized.
Block sizes
Avro encodes arrays and maps as a series of blocks terminated by an empty block. For example, an array of length 20 could be encoded as 4 blocks with lengths 6, 10, 4, and 0. Chr.Avro doesn’t make any effort to break arrays and maps into chunks; instead, it always encodes all non-empty arrays and maps as two blocks (the first full-length, the second zero-length). This is consistent with most other implementations.
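The block layout is independent of any particular library, so it can be sketched in a few lines of Python. The following is a minimal illustration, not Chr.Avro code: the helper names (zigzag_varint, read_long, encode_array, decode_array) are hypothetical, and the array items are assumed to be Avro longs. It shows that the same 20-item array can be written as one full-length block or as blocks of 6, 10, and 4 items, and that a reader recovers the same values either way.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as an Avro long (zigzag + base-128 varint)."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(data: bytes, pos: int):
    """Decode one Avro long starting at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = data[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zigzag


def encode_array(items, block_sizes):
    """Write items as blocks of the given sizes, then the empty terminating block."""
    assert sum(block_sizes) == len(items)
    out = bytearray()
    pos = 0
    for size in block_sizes:
        out += zigzag_varint(size)  # block header: item count
        for item in items[pos:pos + size]:
            out += zigzag_varint(item)
        pos += size
    out += zigzag_varint(0)  # zero-length block ends the array
    return bytes(out)


def decode_array(data: bytes):
    """Read blocks until the zero-length block is reached."""
    items = []
    pos = 0
    while True:
        count, pos = read_long(data, pos)
        if count == 0:
            return items
        if count < 0:
            # A negative count is followed by the block's size in bytes.
            _, pos = read_long(data, pos)
            count = -count
        for _ in range(count):
            item, pos = read_long(data, pos)
            items.append(item)


values = list(range(20))
single = encode_array(values, [20])        # one full-length block, then the empty block
chunked = encode_array(values, [6, 10, 4])  # three blocks, then the empty block
```

Both payloads decode to the same list. The chunked form only costs extra bytes for the additional block headers, which is why writing a single block is a reasonable default.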
Invalid boolean values
Avro specifies that booleans should be encoded as a single byte: 0x00 (false) or 0x01 (true). If a value greater than 0x01 is encountered, Chr.Avro decodes the value as true.
The Apache Java implementation decodes all non-0x01 values as false. The Apache C# implementation throws an exception if a value other than 0x00 or 0x01 is encountered.
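The three policies can be compared side by side with a small Python sketch. These functions are hypothetical models of the behaviors described above, written only for illustration; they are not taken from any of the libraries.

```python
def decode_bool_nonzero_true(byte: int) -> bool:
    """Models the Chr.Avro behavior described above: any nonzero byte is true."""
    return byte != 0x00


def decode_bool_only_one_true(byte: int) -> bool:
    """Models the Apache Java behavior described above: only 0x01 is true."""
    return byte == 0x01


def decode_bool_strict(byte: int) -> bool:
    """Models the Apache C# behavior described above: reject anything but 0x00 or 0x01."""
    if byte == 0x00:
        return False
    if byte == 0x01:
        return True
    raise ValueError(f"invalid boolean byte: {byte:#04x}")


# The policies agree on valid input and diverge only on an invalid byte such as 0x02.
valid_results = [
    (decode_bool_nonzero_true(b), decode_bool_only_one_true(b), decode_bool_strict(b))
    for b in (0x00, 0x01)
]
```

Because a conforming writer only ever emits 0x00 or 0x01, the divergence is visible only when reading payloads produced by a non-conforming writer.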