Types and conversions

Chr.Avro supports mapping .NET’s built-in types, as well as commonly used types like DateTime and Uri, to Avro schemas. This document is a comprehensive explanation of how that mapping works.

The serializer builder and deserializer builder generally throw AggregateException when a type can’t be mapped to a schema. Other exceptions, usually OverflowException and FormatException, are thrown when errors occur during serialization or deserialization.

Arrays

Avro specifies an "array" type for variable-length lists of items. Chr.Avro can map a .NET type to "array" if any of the following is true:

  1. The type is a one-dimensional or jagged array. Multi-dimensional arrays are currently not supported because they can’t be deserialized reliably.

  2. The type is an ArraySegment<> type, a Collection<> type, or a generic collection type from System.Collections.Generic or System.Collections.Immutable.

  3. The type implements IEnumerable<> (for serialization) and has a constructor with a single IEnumerable<> parameter (for deserialization).

Some examples:

| .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|
| int[] | ✅ | ✅ | int[] is a one-dimensional array type. |
| int[,,] | 🚫 | 🚫 | int[,,] is a multi-dimensional array type. |
| int[][] | ✅ | ✅ | int[][] is a jagged array type. |
| IEnumerable<int> | ✅ | ✅ | IEnumerable<int> is a generic collection type. |
| ISet<int> | ✅ | ✅ | ISet<int> is a generic collection type. |
| List<int> | ✅ | ✅ | List<int> is a generic collection type. |
| List<int[]> | ✅ | ✅ | List<int[]> is a generic collection type and int[] is an array type. |
| ImmutableQueue<int> | ✅ | ✅ | ImmutableQueue<int> is a generic collection type. |
| Array | 🚫 | 🚫 | Array isn’t a generic type, so Chr.Avro can’t determine how to handle its items. |
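
The third rule can be approximated with plain reflection. The sketch below is not Chr.Avro's resolver; FindItemType and HasEnumerableConstructor are hypothetical helpers that only show what "implements IEnumerable<>" and "has a constructor with a single IEnumerable<> parameter" mean in practice. It ignores rules 1 and 2 entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical helper: the item type T if `type` is or implements IEnumerable<T>, else null.
static Type FindItemType(Type type) =>
    type.GetInterfaces()
        .Concat(type.IsInterface ? new[] { type } : Type.EmptyTypes)
        .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
        .Select(i => i.GetGenericArguments()[0])
        .FirstOrDefault();

// Hypothetical helper: true if a public constructor takes exactly one IEnumerable<T> parameter.
static bool HasEnumerableConstructor(Type type, Type itemType) =>
    type.GetConstructors().Any(constructor =>
    {
        var parameters = constructor.GetParameters();
        return parameters.Length == 1
            && parameters[0].ParameterType == typeof(IEnumerable<>).MakeGenericType(itemType);
    });

foreach (var type in new[] { typeof(Stack<int>), typeof(ReadOnlyCollection<int>), typeof(Array) })
{
    var itemType = FindItemType(type);
    var serializable = itemType != null;
    var deserializable = serializable && HasEnumerableConstructor(type, itemType);

    Console.WriteLine($"{type.Name}: serializable={serializable}, deserializable={deserializable}");
}
```

By rule 3 alone, Stack<int> satisfies both halves, ReadOnlyCollection<int> satisfies only the serialization half (its constructor takes an IList<>), and Array satisfies neither.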

Booleans

Chr.Avro maps Avro’s "boolean" primitive type to the .NET bool type. No implicit conversions exist between bool and other types (see the .NET docs), so no other mappings are supported.

Byte arrays

In addition to byte[], Chr.Avro supports mapping the following types to "bytes" and "fixed":

| .NET type | Notes |
|---|---|
| Guid | The Guid.ToByteArray method is used for serialization, and the Guid constructor is used for deserialization. ArgumentException is thrown when the length is not 16. |
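
The two .NET calls named in the table behave as follows on their own. This is a small sketch using only the base class library, with no Chr.Avro types involved; the sample Guid value is arbitrary.

```csharp
using System;

var id = Guid.Parse("00112233-4455-6677-8899-aabbccddeeff");

// Guid.ToByteArray always produces exactly 16 bytes.
byte[] bytes = id.ToByteArray();

// The Guid(byte[]) constructor reverses the conversion.
var roundTripped = new Guid(bytes);

// Any other length is rejected with ArgumentException.
bool rejected = false;
try { _ = new Guid(new byte[15]); }
catch (ArgumentException) { rejected = true; }

Console.WriteLine(bytes.Length);       // 16
Console.WriteLine(roundTripped == id); // True
Console.WriteLine(rejected);           // True
```

Because the byte length is always 16, a "fixed" schema paired with Guid would need a size of 16 for the round trip to succeed.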

Dates and times

The Avro spec defines six logical types for temporal data:

  • calendar dates with no time or time zone ("date")
  • durations composed of months, days, and milliseconds ("duration")
  • times of day with no date or time zone ("time-millis" and "time-micros")
  • instants in time ("timestamp-millis" and "timestamp-micros")

In addition to the conversions described later in this section, these logical types can be treated as their underlying primitive types:

    var schema = new MillisecondTimestampSchema();
    var serializer = new BinarySerializerBuilder().BuildSerializer<DateTime>(schema);
    var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);

    var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    deserializer.Deserialize(serializer.Serialize(epoch)); // 0L

.NET doesn’t include any types that match calendar date or time of day semantics, so no special mappings are provided for "date", "time-millis", and "time-micros".

Durations

.NET’s TimeSpan struct is commonly used to represent durations. Under the hood, a TimeSpan stores its value as a number of ticks (1 tick = 100 ns = 0.0001 ms). However, "duration" values can’t be reliably converted to a number of ticks; there isn’t a consistent number of milliseconds in a day or a consistent number of days in a month. Also, all three components are represented as unsigned integers, so negative durations cannot be expressed.

In light of those incompatibilities, Chr.Avro prefers to map TimeSpans to ISO 8601 strings, avoiding "duration" when building schemas. XmlConvert.ToString and XmlConvert.ToTimeSpan are used for conversions. FormatException is thrown when a string cannot be parsed.
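
The string form can be previewed with the two XmlConvert methods directly. This is a minimal sketch of the underlying .NET calls, not of Chr.Avro's serializer; the sample values are arbitrary.

```csharp
using System;
using System.Xml;

var span = new TimeSpan(1, 2, 30, 0); // 1 day, 2 hours, 30 minutes

// TimeSpan -> ISO 8601 duration string
string text = XmlConvert.ToString(span);

// ISO 8601 duration string -> TimeSpan
TimeSpan parsed = XmlConvert.ToTimeSpan(text);

// Strings that are not valid durations raise FormatException.
bool rejected = false;
try { XmlConvert.ToTimeSpan("ninety minutes"); }
catch (FormatException) { rejected = true; }

Console.WriteLine(text);           // P1DT2H30M
Console.WriteLine(parsed == span); // True
Console.WriteLine(rejected);       // True
```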

Serializing and deserializing "duration" values still work, though there are some limitations. .NET assumes a consistent number of milliseconds in a day, so Chr.Avro supports the day and millisecond components. (This may lead to minor arithmetic inconsistencies with other platforms.) All non-negative TimeSpans can be serialized without using the months component. OverflowException is thrown when serializing a negative TimeSpan and when deserializing a value with a non-zero months component.

Timestamps

Both DateTime and DateTimeOffset can be used to represent timestamps. Chr.Avro prefers to map those types to ISO 8601 strings, avoiding "timestamp-millis" and "timestamp-micros" when building schemas. This behavior is consistent with how durations are handled, and it also means that the DateTime kind and time zone offset are retained, because the round-trip (“O”, “o”) format specifier is used for serialization. FormatException is thrown when a string cannot be parsed.
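
To see what the round-trip specifier preserves, the sketch below formats and re-parses values with the base class library alone. Parsing with DateTime.Parse and DateTimeStyles.RoundtripKind is an assumption made for illustration; the parsing call Chr.Avro uses internally is not stated here.

```csharp
using System;
using System.Globalization;

var utc = new DateTime(2020, 1, 2, 3, 4, 5, DateTimeKind.Utc);

// The "O" specifier writes all seven fractional digits plus the kind ("Z" for UTC).
string utcText = utc.ToString("O", CultureInfo.InvariantCulture);

// RoundtripKind keeps the kind encoded in the string when parsing.
DateTime parsed = DateTime.Parse(utcText, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);

// For DateTimeOffset, the offset itself is written out.
var offset = new DateTimeOffset(2020, 1, 2, 3, 4, 5, TimeSpan.FromHours(-5));
string offsetText = offset.ToString("O", CultureInfo.InvariantCulture);

Console.WriteLine(utcText);     // 2020-01-02T03:04:05.0000000Z
Console.WriteLine(parsed.Kind); // Utc
Console.WriteLine(offsetText);  // 2020-01-02T03:04:05.0000000-05:00
```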

Serializing and deserializing "timestamp-millis" and "timestamp-micros" values are supported as well, with a few caveats:

  • All DateTimes are converted to UTC. Avoid DateTimes whose Kind is DateTimeKind.Unspecified, because the conversion to UTC is ambiguous for them.
  • .NET date types are tick-precision (1 tick = 100 ns), which is finer than either logical type, so serializing to "timestamp-millis" or "timestamp-micros" may result in a loss of precision.
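
The precision caveat comes down to integer arithmetic on ticks. The sketch below assumes truncating division for illustration; whether Chr.Avro truncates or rounds is not stated in this document.

```csharp
using System;

var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var instant = epoch.AddTicks(12_345_678); // 1.2345678 seconds after the epoch

long ticks = (instant - epoch).Ticks;

// 10,000 ticks per millisecond, 10 ticks per microsecond
long millis = ticks / TimeSpan.TicksPerMillisecond;
long micros = ticks / (TimeSpan.TicksPerMillisecond / 1000);

// Rebuilding the instant from milliseconds drops the sub-millisecond ticks.
var fromMillis = epoch.AddTicks(millis * TimeSpan.TicksPerMillisecond);

Console.WriteLine(millis);                       // 1234
Console.WriteLine(micros);                       // 1234567
Console.WriteLine((instant - fromMillis).Ticks); // 5678
```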

Enums

Chr.Avro maps .NET enumerations to Avro’s "enum" type by matching each symbol on the schema to an enumerator on the enumeration according to these rules:

  • Enumerator names don’t need to be an exact match—all non-alphanumeric characters are stripped and comparisons are case-insensitive. For example, a PRIMARY_RESIDENCE symbol will match enumerators named PrimaryResidence, primaryResidence, etc.

  • When the serializer builder and deserializer builder find multiple matching enumerators, AggregateException is thrown.

  • When the deserializer builder can’t find a matching enumerator, AggregateException is thrown.
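
The inexact comparison in the first rule can be sketched as a normalization step followed by an ordinary equality check. Normalize and FindMatches are hypothetical helpers written for this illustration, and ConsoleColor merely stands in for an application enum; none of this is Chr.Avro's own code.

```csharp
using System;
using System.Linq;

// Strip everything except letters and digits, then ignore case.
static string Normalize(string name) =>
    new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

// Enumerator names on `enumType` that match `symbol` under the inexact comparison.
// Exactly one match is the success case; zero or several would be reported as an error.
static string[] FindMatches(Type enumType, string symbol) =>
    Enum.GetNames(enumType)
        .Where(name => Normalize(name) == Normalize(symbol))
        .ToArray();

Console.WriteLine(string.Join(", ", FindMatches(typeof(ConsoleColor), "DARK_BLUE"))); // DarkBlue
Console.WriteLine(FindMatches(typeof(ConsoleColor), "TEAL").Length);                  // 0
```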

By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the enumeration. In that case, if Value is set on an enumerator, the custom value must match the symbol exactly. If it’s not set, the enumerator name will be compared inexactly as described above.

To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).

Because enum types can be converted to and from integral types (by explicit cast in C#), Chr.Avro can map any integral type to "enum" as well.

Maps

Avro’s "map" type represents a map of keys (assumed to be strings) to values. Chr.Avro can map a .NET type to "map" if any of the following is true:

  1. The type is a generic dictionary type from System.Collections.Generic or System.Collections.Immutable.

  2. The type implements IEnumerable<KeyValuePair<,>> (for serialization) and has a constructor with a single IEnumerable<KeyValuePair<,>> parameter (for deserialization).

Additionally, because Avro map keys are assumed to be strings, serializers and deserializers are built for key types by mapping to "string" implicitly.

Some examples of this behavior:

| .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|
| IDictionary<string, int> | ✅ | ✅ | IDictionary<string, int> is a generic dictionary type. |
| Dictionary<string, int> | ✅ | ✅ | Dictionary<string, int> is a generic dictionary type. |
| IDictionary<Guid, int> | ✅ | ✅ | IDictionary<Guid, int> is a generic dictionary type, and Guid can be mapped to "string". |
| IDictionary<byte[], int> | 🚫 | 🚫 | IDictionary<byte[], int> is a generic dictionary type, but byte[] cannot be mapped to "string". |
| IEnumerable<KeyValuePair<string, int>> | ✅ | ✅ | IEnumerable<KeyValuePair<string, int>> is recognized as a generic dictionary type. |
| ICollection<KeyValuePair<string, int>> | ✅ | ✅ | ICollection<KeyValuePair<string, int>> is recognized as a generic dictionary type. |
| ImmutableSortedDictionary<string, int> | ✅ | ✅ | ImmutableSortedDictionary<string, int> is a generic dictionary type. |
| IEnumerable<ValueTuple<string, int>> | 🚫 | 🚫 | IEnumerable<ValueTuple<string, int>> is not recognized as a generic dictionary type. |

Numbers

The Avro spec defines four primitive numeric types:

  • 32-bit signed integers ("int")
  • 64-bit signed integers ("long")
  • single-precision (32-bit) floating-point numbers ("float")
  • double-precision (64-bit) floating-point numbers ("double")

It also defines a logical "decimal" type that supports arbitrary-precision decimal numbers.

Integral types

When generating a schema, Chr.Avro maps integral types less than or equal to 32 bits to "int" and integral types greater than 32 bits to "long":

| .NET type | Range | Bits | Generated schema |
|---|---|---|---|
| sbyte | −128 to 127 | 8 | "int" |
| byte | 0 to 255 | 8 | "int" |
| short | −32,768 to 32,767 | 16 | "int" |
| ushort | 0 to 65,535 | 16 | "int" |
| char | 0 (U+0000) to 65,535 (U+ffff) | 16 | "int" |
| int | −2,147,483,648 to 2,147,483,647 | 32 | "int" |
| uint | 0 to 4,294,967,295 | 32 | "int" |
| long | −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 64 | "long" |
| ulong | 0 to 18,446,744,073,709,551,615 | 64 | "long" |

Whether a schema is "int" or "long" has no impact on serialization: both use the same variable-length zig-zag encoding, so a value occupies only as many bytes as its magnitude requires. For that reason, Chr.Avro imposes no constraints on which numeric types can be serialized or deserialized to "int" or "long". If a conversion exists, the binary serializer and deserializer will use it.
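
For readers unfamiliar with the encoding, here is a minimal sketch of how the Avro spec writes an "int" or "long": the value is zig-zag mapped to an unsigned number, then emitted seven bits at a time. Encode is a hypothetical helper written for this illustration, not Chr.Avro's implementation.

```csharp
using System;
using System.Collections.Generic;

static byte[] Encode(long value)
{
    // Zig-zag: 0, -1, 1, -2, ... become 0, 1, 2, 3, ... so small magnitudes stay small.
    ulong zigzag = unchecked((ulong)((value << 1) ^ (value >> 63)));

    var bytes = new List<byte>();

    // Emit the low seven bits first; the high bit marks "more bytes follow".
    while ((zigzag & ~0x7FUL) != 0)
    {
        bytes.Add((byte)((zigzag & 0x7F) | 0x80));
        zigzag >>= 7;
    }

    bytes.Add((byte)zigzag);
    return bytes.ToArray();
}

Console.WriteLine(BitConverter.ToString(Encode(-1)));  // 01
Console.WriteLine(BitConverter.ToString(Encode(64)));  // 80-01
Console.WriteLine(Encode(int.MaxValue).Length);        // 5
Console.WriteLine(Encode(long.MaxValue).Length);       // 10
```

The byte count depends only on the magnitude of the value, which is why the same bytes can be read back under either an "int" or a "long" schema.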

Non-integral types

On the non-integral side, .NET types are mapped to their respective Avro types:

| .NET type | Approximate range | Precision | Generated schema |
|---|---|---|---|
| float | ±1.5 × 10^−45 to ±3.4 × 10^38 | ~6–9 digits | "float" |
| double | ±5.0 × 10^−324 to ±1.7 × 10^308 | ~15–17 digits | "double" |
| decimal | ±1.0 × 10^−28 to ±7.9228 × 10^28 | 28–29 significant digits | { "type": "bytes", "logicalType": "decimal", "precision": 29, "scale": 14 } |

Generally speaking, it’s a good idea to fit the precision and scale of a decimal schema to a specific use case. For example, air temperature measurements in ℉ might have a precision of 4 and a scale of 1. (29 and 14, the schema builder defaults, were selected to fit any .NET decimal.) Decimal values are resized to fit the scale specified by the schema—when serializing, digits may be truncated; when deserializing, zeros may be added.
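
The effect of the schema's scale can be sketched with the encoding the Avro spec defines for "decimal": the unscaled integer, written as big-endian two's-complement bytes. EncodeDecimal is a hypothetical helper for illustration only; it assumes the scaled value still fits in a .NET decimal and is not Chr.Avro's implementation.

```csharp
using System;
using System.Numerics;

static byte[] EncodeDecimal(decimal value, int scale)
{
    // Move the decimal point right by `scale` places.
    decimal factor = 1m;
    for (int i = 0; i < scale; i++) factor *= 10m;

    // Digits beyond the schema's scale are truncated, not rounded.
    var unscaled = new BigInteger(Math.Truncate(value * factor));

    // BigInteger.ToByteArray is little-endian; Avro expects big-endian.
    byte[] bytes = unscaled.ToByteArray();
    Array.Reverse(bytes);
    return bytes;
}

// 98.65 at scale 1 keeps one fractional digit: unscaled value 986 = 0x03DA.
Console.WriteLine(BitConverter.ToString(EncodeDecimal(98.65m, 1))); // 03-DA

// -1.5 at scale 1: unscaled value -15 = 0xF1 in two's complement.
Console.WriteLine(BitConverter.ToString(EncodeDecimal(-1.5m, 1)));  // F1
```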

Caveats

Because the serializer and deserializer rely on predefined conversions, the remarks from the C# numeric conversions table are relevant. Notably:

  • Conversions may cause a loss of precision. For instance, if a "double" value is deserialized into a float, the value will be rounded to the nearest float value:

        var schema = new DoubleSchema();
        var serializer = new BinarySerializerBuilder().BuildSerializer<double>(schema);
        var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<float>(schema);

        var e = Math.E; // 2.71828182845905
        var bytes = serializer.Serialize(e);

        deserializer.Deserialize(bytes); // 2.718282

    See the .NET type conversion tables for a complete list of conversions.

  • OverflowException is thrown when a conversion fails during serialization or deserialization.

    When a value is out of the range of a numeric type:

        var schema = new IntSchema();
        var serializer = new BinarySerializerBuilder().BuildSerializer<int>(schema);
        var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<short>(schema);

        var bytes = serializer.Serialize(int.MaxValue);

        deserializer.Deserialize(bytes); // throws OverflowException

    When special floating-point values are deserialized to decimal:

        var schema = new FloatSchema();
        var serializer = new BinarySerializerBuilder().BuildSerializer<float>(schema);
        var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<decimal>(schema);

        var bytes = serializer.Serialize(float.NaN);

        deserializer.Deserialize(bytes); // throws OverflowException

    Finally, when a serialized integer is too large to deserialize:

        var schema = new LongSchema();
        var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);

        // Eleven bytes: more than any 64-bit value needs.
        var bytes = new byte[]
        {
            0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01
        };

        deserializer.Deserialize(bytes); // throws OverflowException

Records

Chr.Avro maps .NET classes and structs to Avro’s "record" type by attempting to find a matching constructor. The rules:

  • Parameter names don't need to match the schema exactly—all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match parameters named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.

  • Each record field must match exactly one constructor parameter.

  • There may be additional optional parameters.

If no matching constructor is found, Chr.Avro attempts to match each record field to a field or property on the type. The rules:

  • Type member names don’t need to match the schema exactly—all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match type members named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.

  • When the serializer builder and deserializer builder find multiple matching type members, AggregateException is thrown.

  • When the serializer builder can’t find a matching type member, AggregateException is thrown. When the deserializer can’t find a matching type member, the field is ignored.

  • The deserializer builder throws AggregateException if a type doesn’t have a parameterless public constructor.

By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the type. In that case, two additional rules apply:

  • All type members without a DataMemberAttribute are ignored.

  • If Name is set, the custom name must match the record field name exactly. If it’s not set, the type member name will be compared inexactly as described above.

To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).

Strings

In addition to string, Chr.Avro supports mapping the following types to "string":

| .NET type | Notes |
|---|---|
| DateTime, DateTimeOffset, TimeSpan | Values are expressed as strings according to ISO 8601. See the dates and times section for details. |
| Guid | The Guid.ToString method is used for serialization, and the Guid constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |
| Uri | The Uri.ToString method is used for serialization, and the Uri constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |

Unions

Chr.Avro maps Avro unions to .NET types according to these rules:

  • Unions must contain at least one schema. Avro doesn’t explicitly disallow empty unions, but they can’t be serialized or deserialized.

  • When mapping a union schema to a type for serialization, the type must be able to be mapped to one of the non-"null" schemas in the union (if there are any).

  • When mapping a union schema to a type for deserialization, the type must be able to be mapped to all of the schemas in the union.

So, for example:

| Schema | .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|---|
| [] | | 🚫 | 🚫 | Empty unions are not supported. |
| ["int"] | int | ✅ | ✅ | int could be serialized and deserialized as "int". |
| ["null"] | int | ✅ | 🚫 | int could be serialized as "null", but it couldn’t be deserialized as "null". |
| ["int","string"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "string". |
| ["null","int"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "null". |
| ["null","int"] | Nullable<int> | ✅ | ✅ | Nullable<int> could be serialized and deserialized as either "null" or "int". |