Types and conversions

Chr.Avro supports mapping .NET’s built-in types, as well as commonly used types like DateTime and Uri, to Avro schemas. This document is a comprehensive explanation of how that mapping works.

The serializer builder and deserializer builder generally throw UnsupportedTypeException when a type can’t be mapped to a schema. Other exceptions, usually OverflowException and FormatException, are thrown when errors occur during serialization or deserialization.

Arrays

Avro specifies an "array" type for variable-length lists of items. Chr.Avro can map a .NET type to "array" if any of the following is true:

  1. The type is a one-dimensional or jagged array. Multi-dimensional arrays are currently not supported because they can’t be deserialized reliably.
  2. The type is an ArraySegment<T> type, a Collection<T> type, or a generic collection type from System.Collections.Generic or System.Collections.Immutable.
  3. The type implements IEnumerable<T> (for serialization) and has a constructor with a single IEnumerable<T> parameter (for deserialization).

Some examples:

| .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|
| int[] | ✅ | ✅ | int[] is a one-dimensional array type. |
| int[,,] | 🚫 | 🚫 | int[,,] is a multi-dimensional array type. |
| int[][] | ✅ | ✅ | int[][] is a jagged array type. |
| IEnumerable<int> | ✅ | ✅ | IEnumerable<T> is a generic collection type. |
| ISet<int> | ✅ | ✅ | ISet<T> is a generic collection type. |
| List<int> | ✅ | ✅ | List<T> is a generic collection type. |
| List<int[]> | ✅ | ✅ | List<T> is a generic collection type and int[] is an array type. |
| ImmutableQueue<int> | ✅ | ✅ | ImmutableQueue<T> is a generic collection type. |
| Array | 🚫 | 🚫 | Array isn’t a generic type, so Chr.Avro can’t determine how to handle its items. |
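
The third rule above can be checked with reflection. The following is only a sketch of that idea, not Chr.Avro’s actual implementation; HasEnumerableConstructor is a hypothetical helper name introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: true when the type implements IEnumerable<T> and has a
// public constructor whose only parameter is that same IEnumerable<T>.
static bool HasEnumerableConstructor(Type type)
{
    return type.GetInterfaces()
        .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
        .Any(enumerable => type.GetConstructors().Any(constructor =>
        {
            var parameters = constructor.GetParameters();
            return parameters.Length == 1 && parameters[0].ParameterType == enumerable;
        }));
}

// HashSet<int> has a HashSet(IEnumerable<int>) constructor.
Console.WriteLine(HasEnumerableConstructor(typeof(HashSet<int>))); // True

// KeyCollection is enumerable, but its only constructor takes a Dictionary.
Console.WriteLine(HasEnumerableConstructor(typeof(Dictionary<string, int>.KeyCollection))); // False
```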

Booleans

Chr.Avro maps Avro’s "boolean" primitive type to the .NET bool type. No implicit conversions exist between bool and other types (see the .NET docs), so no other mappings are supported.

Byte arrays

In addition to byte[], Chr.Avro supports mapping the following types to "bytes" and "fixed":

| .NET type | Notes |
|---|---|
| Guid | The Guid.ToByteArray method is used for serialization, and the Guid constructor is used for deserialization. ArgumentException is thrown when the length is not 16. |
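
As a quick illustration of the two .NET calls named above (plain .NET only, no Chr.Avro types involved), the round trip and the length check behave like this:

```csharp
using System;

var id = Guid.Parse("0f8fad5b-d9cb-469f-a165-70867728950e");

byte[] bytes = id.ToByteArray();    // a Guid always yields 16 bytes
var restored = new Guid(bytes);     // the byte[] constructor reverses ToByteArray

Console.WriteLine(bytes.Length);    // 16
Console.WriteLine(restored == id);  // True

var rejected = false;
try
{
    _ = new Guid(new byte[15]);     // any length other than 16 is rejected
}
catch (ArgumentException)
{
    rejected = true;
}

Console.WriteLine(rejected);        // True
```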

Dates and times

The Avro spec defines six logical types for temporal data:

  • calendar dates with no time or time zone ("date")
  • durations composed of months, days, and milliseconds ("duration")
  • times of day with no date or time zone ("time-millis" and "time-micros")
  • instants in time ("timestamp-millis" and "timestamp-micros")

In addition to the conversions described later in this section, these logical types can be treated as their underlying primitive types:

var schema = new MillisecondTimestampSchema();

// Both builders need the schema so that they agree on the encoding.
var serializer = new BinarySerializerBuilder().BuildSerializer<DateTime>(schema);
var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);

var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
deserializer.Deserialize(serializer.Serialize(epoch)); // 0L

Calendar dates

.NET’s DateOnly struct represents a calendar date with no time or time zone. To match other temporal types, Chr.Avro prefers to map DateOnlys to ISO 8601 strings, avoiding "date" by default when building schemas. FormatException is thrown when deserializing a value that cannot be parsed.

Serializing and deserializing "date" values is supported with the caveat that attempting to deserialize a date before 0001-01-01 or after 9999-12-31 will cause an OverflowException.
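
To make the two representations concrete, here is a minimal sketch using only .NET (6 or later, for DateOnly). It assumes the Avro definition of "date" as a count of days from the Unix epoch; the exact format string Chr.Avro uses for the text form is not shown in this document, so yyyy-MM-dd is an assumption.

```csharp
using System;
using System.Globalization;

var date = new DateOnly(2000, 1, 1);

// Text form: an ISO 8601 calendar date (format string is an assumption).
string text = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
DateOnly fromText = DateOnly.ParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture);

// "date" form: number of days since 1970-01-01.
var epoch = new DateOnly(1970, 1, 1);
int days = date.DayNumber - epoch.DayNumber;
DateOnly fromDays = epoch.AddDays(days);

Console.WriteLine(text);             // 2000-01-01
Console.WriteLine(days);             // 10957
Console.WriteLine(fromText == date); // True
Console.WriteLine(fromDays == date); // True
```

A day count that lands outside DateOnly.MinValue to DateOnly.MaxValue has no DateOnly equivalent, which is the situation the overflow caveat describes.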

Durations

.NET’s TimeSpan struct is commonly used to represent durations. Under the hood, a TimeSpan stores its value as a number of ticks (1 tick = 100 ns = 0.0001 ms). However, "duration" values can’t be reliably converted to a number of ticks; there isn’t a consistent number of milliseconds in a day or a consistent number of days in a month. Also, all three components are represented as unsigned integers, so negative durations cannot be expressed.

In light of those incompatibilities, Chr.Avro prefers to map TimeSpans to ISO 8601 strings, avoiding "duration" by default when building schemas. XmlConvert.ToString and XmlConvert.ToTimeSpan are used for conversions. FormatException is thrown when a string cannot be parsed.

Serializing and deserializing "duration" values still work, though there are some limitations. .NET assumes a consistent number of milliseconds in a day, so Chr.Avro supports the day and millisecond components. (This may lead to minor arithmetic inconsistencies with other platforms.) All non-negative TimeSpans can be serialized without using the months component. OverflowException is thrown when serializing a negative TimeSpan and when deserializing a value with a non-zero months component.
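
The sketch below shows both ideas with plain .NET calls: the XmlConvert round trip named above, and one way a non-negative TimeSpan could be split into day and millisecond components. The split is an illustration under the stated assumptions (months always zero, sub-millisecond ticks dropped), not necessarily how Chr.Avro computes it.

```csharp
using System;
using System.Xml;

var span = new TimeSpan(1, 2, 3, 4, 500); // 1 day, 02:03:04.500

// ISO 8601 text form.
string text = XmlConvert.ToString(span);
TimeSpan fromText = XmlConvert.ToTimeSpan(text);
Console.WriteLine(fromText == span); // True

// Hypothetical split into the three unsigned "duration" components.
// checked(...) throws OverflowException for a negative TimeSpan.
uint months = 0;
uint days = checked((uint)span.Days);
uint milliseconds = checked((uint)(
    (span.Ticks - (span.Days * TimeSpan.TicksPerDay)) / TimeSpan.TicksPerMillisecond));

Console.WriteLine($"{months} months, {days} days, {milliseconds} ms"); // 0 months, 1 days, 7384500 ms
```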

Times

.NET’s TimeOnly struct represents a time of day with no time zone. To match other temporal types, Chr.Avro prefers to map TimeOnlys to ISO 8601 strings, avoiding "time-millis" and "time-micros" by default when building schemas. FormatException is thrown when deserializing a value that cannot be parsed.

Serializing and deserializing "time-millis" and "time-micros" values is supported. However, .NET date types are tick-precision (1 tick = 100 ns), which is finer than either logical type, so serializing to "time-millis" or "time-micros" may result in a loss of precision.
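
The precision loss is easy to see with plain integer arithmetic on ticks. This is a sketch of the unit conversion only (truncation is assumed here; the library’s rounding behavior is not specified in this document):

```csharp
using System;

var time = new TimeOnly(452_967_891_234); // 12:34:56.7891234

// 10,000 ticks per millisecond and 10 ticks per microsecond.
long millis = time.Ticks / TimeSpan.TicksPerMillisecond;
long micros = time.Ticks / 10;

var fromMillis = new TimeOnly(millis * TimeSpan.TicksPerMillisecond);
var fromMicros = new TimeOnly(micros * 10);

Console.WriteLine(time.Ticks - fromMillis.Ticks); // 1234 ticks lost
Console.WriteLine(time.Ticks - fromMicros.Ticks); // 4 ticks lost
```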

Timestamps

Both DateTime and DateTimeOffset can be used to represent timestamps. Chr.Avro prefers to map those types to ISO 8601 strings, avoiding "timestamp-millis" and "timestamp-micros" by default when building schemas. This behavior is consistent with how durations are handled, and it also means that DateTime kind and time zone offset are retained, because the round-trip (“O”, “o”) format specifier is used for serialization. FormatException is thrown when a string cannot be parsed.

Serializing and deserializing "timestamp-millis" and "timestamp-micros" values is supported as well, with a few caveats:

  • All DateTimes are converted to UTC. Don’t use DateTimes whose Kind is DateTimeKind.Unspecified.
  • .NET date types are tick-precision (1 tick = 100 ns), which is finer than either logical type, so serializing to "timestamp-millis" or "timestamp-micros" may result in a loss of precision.
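
For reference, this is what the round-trip format specifier and the epoch arithmetic look like in plain .NET. It is a sketch of the underlying conversions, not of Chr.Avro’s internals:

```csharp
using System;
using System.Globalization;

var instant = new DateTime(2024, 2, 29, 12, 0, 0, DateTimeKind.Utc);

// Text form: the "O" specifier keeps the kind (the trailing Z marks UTC).
string text = instant.ToString("O", CultureInfo.InvariantCulture);
DateTime fromText = DateTime.Parse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);

// "timestamp-millis" form: milliseconds since 1970-01-01T00:00:00Z.
long millis = (instant - DateTime.UnixEpoch).Ticks / TimeSpan.TicksPerMillisecond;

Console.WriteLine(text);          // 2024-02-29T12:00:00.0000000Z
Console.WriteLine(fromText.Kind); // Utc
Console.WriteLine(millis);        // 1709208000000
```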

Enums

Chr.Avro maps .NET enumerations to Avro’s "enum" type by matching each symbol on the schema to an enumerator on the enumeration according to these rules:

  • Enumerator names don’t need to be an exact match: all non-alphanumeric characters are stripped and comparisons are case-insensitive. For example, a PRIMARY_RESIDENCE symbol will match enumerators named PrimaryResidence, primaryResidence, etc.
  • When the serializer builder or deserializer builder finds multiple matching enumerators, UnsupportedTypeException is thrown.
  • When the deserializer builder can’t find a matching enumerator but a default is specified by the schema, the deserializer builder will attempt to map the default instead.
  • When the deserializer builder can’t find a matching enumerator and no default is specified by the schema, UnsupportedTypeException is thrown.
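
The inexact matching rule can be sketched in a few lines. Normalize and FindEnumerators are hypothetical helpers written for this example (they are not Chr.Avro APIs), and the built-in StringComparison enum stands in for a user-defined enumeration:

```csharp
using System;
using System.Linq;

// Strip everything that isn't a letter or digit, then ignore case.
static string Normalize(string name) =>
    new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

// Return every enumerator whose normalized name equals the normalized symbol.
static string[] FindEnumerators(Type enumType, string symbol) =>
    Enum.GetNames(enumType).Where(name => Normalize(name) == Normalize(symbol)).ToArray();

var matches = FindEnumerators(typeof(StringComparison), "ORDINAL_IGNORE_CASE");
var missing = FindEnumerators(typeof(StringComparison), "BINARY");

Console.WriteLine(string.Join(", ", matches)); // OrdinalIgnoreCase
Console.WriteLine(missing.Length);             // 0
```

Under the rules above, exactly one match is the success case, more than one match is an error, and zero matches is where a schema default (if any) would come into play.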

By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the enumeration. In that case, if Value is set on an enumerator, the custom value must match the symbol exactly. If it’s not set, the enumerator name will be compared inexactly as described above.

To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).

Because "enum" symbols are represented as strings, Chr.Avro also supports mapping enum schemas to string. On serialization, if the name of the enumerator is not a symbol in the schema, ArgumentException will be thrown.

Maps

Avro’s "map" type represents a map of keys (assumed to be strings) to values. Chr.Avro can map a .NET type to "map" if any of the following is true:

  1. The type is a generic dictionary type from System.Collections.Generic or System.Collections.Immutable.
  2. The type implements IEnumerable<KeyValuePair<TKey, TValue>> (for serialization) and has a constructor with a single IEnumerable<KeyValuePair<TKey, TValue>> parameter (for deserialization).

Additionally, because Avro map keys are assumed to be strings, serializers and deserializers are built for key types by mapping to "string" implicitly.

Some examples of this behavior:

| .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|
| IDictionary<string, int> | ✅ | ✅ | IDictionary<TKey, TValue> is a generic dictionary type. |
| Dictionary<string, int> | ✅ | ✅ | Dictionary<TKey, TValue> is a generic dictionary type. |
| IDictionary<Guid, int> | ✅ | ✅ | IDictionary<TKey, TValue> is a generic dictionary type, and Guid can be mapped to "string". |
| IDictionary<byte[], int> | 🚫 | 🚫 | IDictionary<TKey, TValue> is a generic dictionary type, but byte[] cannot be mapped to "string". |
| IEnumerable<KeyValuePair<string, int>> | ✅ | ✅ | IEnumerable<KeyValuePair<TKey, TValue>> is recognized as a generic dictionary type. |
| ICollection<KeyValuePair<string, int>> | ✅ | ✅ | ICollection<KeyValuePair<TKey, TValue>> is recognized as a generic dictionary type. |
| ImmutableSortedDictionary<string, int> | ✅ | ✅ | ImmutableSortedDictionary<TKey, TValue> is a generic dictionary type. |
| IEnumerable<ValueTuple<string, int>> | 🚫 | 🚫 | IEnumerable<ValueTuple<T1, T2>> is not recognized as a generic dictionary type. |
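
Conceptually, a non-string key type works because each key makes a round trip through its "string" mapping. The sketch below imitates that by hand for Guid keys, using the Guid.ToString and Guid constructor conversions described in the strings section; it illustrates the idea and is not Chr.Avro code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var scores = new Dictionary<Guid, int>
{
    [Guid.Parse("0f8fad5b-d9cb-469f-a165-70867728950e")] = 42,
};

// What an Avro map can hold: string keys only.
Dictionary<string, int> wire = scores.ToDictionary(pair => pair.Key.ToString(), pair => pair.Value);

// Reading it back parses each key string into a Guid again.
Dictionary<Guid, int> restored = wire.ToDictionary(pair => new Guid(pair.Key), pair => pair.Value);

Console.WriteLine(wire.Keys.Single());             // 0f8fad5b-d9cb-469f-a165-70867728950e
Console.WriteLine(restored[scores.Keys.Single()]); // 42
```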

Numbers

The Avro spec defines four primitive numeric types:

  • 32-bit signed integers ("int")
  • 64-bit signed integers ("long")
  • single-precision (32-bit) floating-point numbers ("float")
  • double-precision (64-bit) floating-point numbers ("double")

It also defines a logical "decimal" type that supports arbitrary-precision decimal numbers.

Integral types

When generating a schema, Chr.Avro maps integral types less than or equal to 32 bits to "int" and integral types greater than 32 bits to "long":

| .NET type | Range | Bits | Generated schema |
|---|---|---|---|
| sbyte | −128 to 127 | 8 | "int" |
| byte | 0 to 255 | 8 | "int" |
| short | −32,768 to 32,767 | 16 | "int" |
| ushort | 0 to 65,535 | 16 | "int" |
| char | 0 (U+0000) to 65,535 (U+ffff) | 16 | "int" |
| int | −2,147,483,648 to 2,147,483,647 | 32 | "int" |
| uint | 0 to 4,294,967,295 | 32 | "int" |
| long | −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 64 | "long" |
| ulong | 0 to 18,446,744,073,709,551,615 | 64 | "long" |

Whether a schema is "int" or "long" has no impact on serialization. Integers are zig-zag encoded, so they take up only as much space as they need. For that reason, Chr.Avro imposes no constraints on which numeric types can be serialized or deserialized to "int" or "long"; if a conversion exists, the binary serializer and deserializer will use it. Because enum types can be explicitly converted to and from integral types, Chr.Avro can map any enum type to "int" or "long" as well.
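
To show why the encoded size depends on the value rather than on the schema, here is a small sketch of the zig-zag, variable-length integer encoding described by the Avro specification. EncodeLong is a hypothetical helper written for this example, not a Chr.Avro API.

```csharp
using System;
using System.Collections.Generic;

// Zig-zag maps small magnitudes (positive or negative) to small unsigned
// numbers; the result is then written 7 bits at a time, least-significant
// group first, with the high bit set on every byte except the last.
static byte[] EncodeLong(long value)
{
    ulong zigzag = unchecked((ulong)((value << 1) ^ (value >> 63)));
    var bytes = new List<byte>();

    while (zigzag >= 0x80)
    {
        bytes.Add((byte)((zigzag & 0x7F) | 0x80));
        zigzag >>= 7;
    }

    bytes.Add((byte)zigzag);
    return bytes.ToArray();
}

Console.WriteLine(BitConverter.ToString(EncodeLong(1)));   // 02
Console.WriteLine(BitConverter.ToString(EncodeLong(-1)));  // 01
Console.WriteLine(BitConverter.ToString(EncodeLong(64)));  // 80-01
Console.WriteLine(EncodeLong(int.MaxValue).Length);        // 5
Console.WriteLine(EncodeLong(long.MinValue).Length);       // 10
```

Ten bytes is the most a 64-bit value can occupy in this scheme, so a longer run of continuation bytes cannot represent a valid "long".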

Non-integral types

On the non-integral side, .NET types are mapped to their respective Avro types:

| .NET type | Approximate range | Precision | Generated schema |
|---|---|---|---|
| float | ±1.5 × 10^−45 to ±3.4 × 10^38 | ~6–9 digits | "float" |
| double | ±5.0 × 10^−324 to ±1.7 × 10^308 | ~15–17 digits | "double" |
| decimal | ±1.0 × 10^−28 to ±7.9228 × 10^28 | 28–29 significant digits | { "type": "bytes", "logicalType": "decimal", "precision": 29, "scale": 14 } |

Generally speaking, it’s a good idea to fit the precision and scale of a decimal schema to a specific use case. For example, air temperature measurements in ℉ might have a precision of 4 and a scale of 1. (29 and 14, the schema builder defaults, were selected to fit any .NET decimal.) Decimal values are resized to fit the scale specified by the schema—when serializing, digits may be truncated; when deserializing, zeros may be added.
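
The effect of the schema’s scale can be sketched with BigInteger. Avro stores a decimal as the big-endian two’s-complement bytes of its unscaled integer value; ToUnscaled and FromUnscaled below are hypothetical helpers that assume truncation toward zero, as described above, and are not Chr.Avro APIs.

```csharp
using System;
using System.Numerics;

// value × 10^scale, with any remaining fractional digits dropped.
static BigInteger ToUnscaled(decimal value, int scale)
{
    decimal factor = 1m;
    for (var i = 0; i < scale; i++) factor *= 10m;
    return new BigInteger(decimal.Truncate(value * factor));
}

// unscaled ÷ 10^scale.
static decimal FromUnscaled(BigInteger unscaled, int scale)
{
    decimal factor = 1m;
    for (var i = 0; i < scale; i++) factor *= 10m;
    return (decimal)unscaled / factor;
}

// A temperature of 98.67 written with a scale of 1 keeps one fractional digit.
BigInteger unscaled = ToUnscaled(98.67m, 1);

byte[] bytes = unscaled.ToByteArray(); // little-endian in .NET
Array.Reverse(bytes);                  // Avro expects big-endian

Console.WriteLine(unscaled);                       // 986
Console.WriteLine(BitConverter.ToString(bytes));   // 03-DA
Console.WriteLine(FromUnscaled(unscaled, 1));      // 98.6
```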

Caveats

Because the serializer and deserializer rely on predefined conversions, the remarks from the C# numeric conversions table are relevant. Notably:

  • Conversions may cause a loss of precision. For instance, if a "double" value is deserialized into a float, the value will be rounded to the nearest float value:

    var schema = new DoubleSchema();
    var serializer = new BinarySerializerBuilder().BuildSerializer<double>(schema);
    var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<float>(schema);
    
    var e = Math.E; // 2.71828182845905
    var bytes = serializer.Serialize(e);
    
    deserializer.Deserialize(bytes); // 2.718282
    

    See the .NET type conversion tables for a complete list of conversions.

  • OverflowException is thrown when a conversion fails during serialization or deserialization.

    When a value is out of the range of a numeric type:

    var schema = new IntSchema();
    var serializer = new BinarySerializerBuilder().BuildSerializer<int>(schema);
    var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<short>(schema);
    
    var bytes = serializer.Serialize(int.MaxValue);
    
    deserializer.Deserialize(bytes); // throws OverflowException
    

    When special floating-point values are deserialized to decimal:

    var schema = new FloatSchema();
    var serializer = new BinarySerializerBuilder().BuildSerializer<float>(schema);
    var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<decimal>(schema);
    
    var bytes = serializer.Serialize(float.NaN);
    
    deserializer.Deserialize(bytes); // throws OverflowException
    

    Finally, when a serialized integer is too large to deserialize:

    var schema = new LongSchema();
    var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);
    
    var bytes = new byte[]
    {
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01
    };
    
    deserializer.Deserialize(bytes); // throws OverflowException
    

Records

Chr.Avro maps .NET classes and structs to Avro’s "record" type by attempting to find a matching constructor. The rules:

  • Parameter names don't need to match the schema exactly—all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match parameters named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.
  • Parameters must have exactly 1 match for each record field.
  • There may be additional optional parameters.

If no matching constructor is found, Chr.Avro attempts to match each record field to a field or property on the type instead. The rules:

  • Type member names don’t need to match the schema exactly—all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match type members named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.
  • When the serializer builder or deserializer builder finds multiple matching type members, UnsupportedTypeException is thrown.
  • When the serializer builder can’t find a matching type member but a default is specified by the schema, the default value will be serialized.
  • When the serializer builder can’t find a matching type member and no default is specified by the schema, UnsupportedTypeException is thrown.
  • When the deserializer builder can’t find a matching type member, the field is ignored.
  • The deserializer builder throws UnsupportedTypeException if a type doesn’t have a parameterless public constructor.
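
The three possible outcomes of member matching (one match, several matches, no match) can be illustrated with reflection over properties. This is a sketch under the matching rules above; Normalize and FindMembers are hypothetical helpers, anonymous types stand in for real classes, and fields and constructors are left out for brevity.

```csharp
using System;
using System.Linq;

// Strip everything that isn't a letter or digit, then ignore case.
static string Normalize(string name) =>
    new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

// Names of all public properties that match the record field name inexactly.
static string[] FindMembers(Type type, string fieldName) =>
    type.GetProperties()
        .Select(property => property.Name)
        .Where(name => Normalize(name) == Normalize(fieldName))
        .ToArray();

var unambiguous = new { AddressLine_1 = "1 Main St" };
var ambiguous = new { AddressLine1 = "1 Main St", ADDRESS_LINE_1 = "1 Main St" };

Console.WriteLine(FindMembers(unambiguous.GetType(), "addressLine1").Length); // 1: usable
Console.WriteLine(FindMembers(ambiguous.GetType(), "addressLine1").Length);   // 2: ambiguous
Console.WriteLine(FindMembers(unambiguous.GetType(), "postalCode").Length);   // 0: no match
```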

By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the type. In that case, two additional rules apply:

  • All type members without a DataMemberAttribute are ignored.
  • If Name is set, the custom name must match the record field name exactly. If it’s not set, the type member name will be compared inexactly as described above.

To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).

Strings

In addition to string, Chr.Avro supports mapping the following types to "string":

| .NET type | Notes |
|---|---|
| DateTime, DateTimeOffset, TimeSpan | Values are expressed as strings according to ISO 8601. See the dates and times section for details. |
| Guid | The Guid.ToString method is used for serialization, and the Guid constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |
| Uri | The Uri.ToString method is used for serialization, and the Uri constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |
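
The conversions named in the table are ordinary .NET calls, so their behavior can be checked without Chr.Avro. Note that the Uri constructor throws UriFormatException, which derives from FormatException:

```csharp
using System;

var uri = new Uri("https://example.com/schemas/1");
var restoredUri = new Uri(uri.ToString());
Console.WriteLine(restoredUri == uri); // True

var id = Guid.Parse("0f8fad5b-d9cb-469f-a165-70867728950e");
var restoredId = new Guid(id.ToString());
Console.WriteLine(restoredId == id);   // True

var uriRejected = false;
try
{
    _ = new Uri("not a uri");
}
catch (FormatException)
{
    uriRejected = true;
}

var guidRejected = false;
try
{
    _ = new Guid("not-a-guid");
}
catch (FormatException)
{
    guidRejected = true;
}

Console.WriteLine(uriRejected);  // True
Console.WriteLine(guidRejected); // True
```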

Unions

Chr.Avro maps Avro unions to .NET types according to these rules:

  • Unions must contain at least one schema. Avro doesn’t explicitly disallow empty unions, but they can’t be serialized or deserialized.
  • When mapping a union schema to a type for serialization, the type must be able to be mapped to one of the non-"null" schemas in the union (if there are any).
  • When mapping a union schema to a type for deserialization, the type must be able to be mapped to all of the schemas in the union.

So, for example:

| Schema | .NET type | Serializable | Deserializable | Notes |
|---|---|---|---|---|
| [] | | 🚫 | 🚫 | Empty unions are not supported. |
| ["int"] | int | ✅ | ✅ | int could be serialized and deserialized as "int". |
| ["null"] | int | ✅ | 🚫 | int could be serialized as "null", but it couldn’t be deserialized as "null". |
| ["int", "string"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "string". |
| ["null", "int"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "null". |
| ["null", "int"] | int? | ✅ | ✅ | int? could be serialized and deserialized as either "null" or "int". |
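
The union rules can be restated as a small decision function. The sketch below models a .NET type simply as the set of schema names it can be mapped to; Check is a hypothetical helper for working through the examples, not part of Chr.Avro.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// union: the schema names in the union; mappable: schema names the .NET type
// can be mapped to.
static (bool Serializable, bool Deserializable) Check(string[] union, ISet<string> mappable)
{
    // Empty unions are not supported at all.
    if (union.Length == 0)
    {
        return (false, false);
    }

    // Serialization needs one usable non-"null" schema (when there are any).
    var nonNull = union.Where(schema => schema != "null").ToArray();
    var serializable = nonNull.Length == 0 || nonNull.Any(schema => mappable.Contains(schema));

    // Deserialization needs every schema in the union to be usable.
    var deserializable = union.All(schema => mappable.Contains(schema));

    return (serializable, deserializable);
}

var plainInt = new HashSet<string> { "int" };
var nullableInt = new HashSet<string> { "null", "int" };

Console.WriteLine(Check(new[] { "null", "int" }, plainInt));    // (True, False)
Console.WriteLine(Check(new[] { "null", "int" }, nullableInt)); // (True, True)
```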