Types and conversions
Chr.Avro supports mapping .NET’s built-in types, as well as commonly used types like DateTime and Uri, to Avro schemas. This document is a comprehensive explanation of how that mapping works.
The serializer builder and deserializer builder generally throw UnsupportedTypeException when a type can’t be mapped to a schema. Other exceptions, usually OverflowException and FormatException, are thrown when errors occur during serialization or deserialization.
Arrays
Avro specifies an "array" type for variable-length lists of items. Chr.Avro can map a .NET type to "array" if any of the following is true:
- The type is a one-dimensional or jagged array. Multi-dimensional arrays are currently not supported because they can’t be deserialized reliably.
- The type is an ArraySegment<T> type, a Collection<T> type, or a generic collection type from System.Collections.Generic or System.Collections.Immutable.
- The type implements IEnumerable<T> (for serialization) and has a constructor with a single IEnumerable<T> parameter (for deserialization).
Some examples:
.NET type | Serializable | Deserializable | Notes |
---|---|---|---|
int[] | ✅ | ✅ | int[] is a one-dimensional array type. |
int[,,] | 🚫 | 🚫 | int[,,] is a multi-dimensional array type. |
int[][] | ✅ | ✅ | int[][] is a jagged array type. |
IEnumerable<int> | ✅ | ✅ | IEnumerable<T> is a generic collection type. |
ISet<int> | ✅ | ✅ | ISet<T> is a generic collection type. |
List<int> | ✅ | ✅ | List<T> is a generic collection type. |
List<int[]> | ✅ | ✅ | List<T> is a generic collection type and int[] is an array type. |
ImmutableQueue<int> | ✅ | ✅ | ImmutableQueue<T> is a generic collection type. |
Array | 🚫 | 🚫 | Array isn’t a generic type, so Chr.Avro can’t determine how to handle its items. |
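The third rule (implement IEnumerable<T> and expose a constructor with a single IEnumerable<T> parameter) can be pictured with a small custom type. This is only a sketch: TagList is a hypothetical name invented for illustration, and the code shows the shape the rule describes rather than anything taken from Chr.Avro itself.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type used only for illustration; it is not part of Chr.Avro.
public sealed class TagList : IEnumerable<string>
{
    private readonly List<string> items;

    // A constructor with a single IEnumerable<T> parameter:
    // the shape the rule asks for on the deserialization side.
    public TagList(IEnumerable<string> items)
    {
        this.items = items.ToList();
    }

    // An IEnumerable<T> implementation:
    // the shape the rule asks for on the serialization side.
    public IEnumerator<string> GetEnumerator() => items.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

A type like this exposes its items for reading and can be rebuilt from a sequence of items, which is all the rule requires.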
Booleans
Chr.Avro maps Avro’s "boolean" primitive type to the .NET bool type. No implicit conversions exist between bool and other types (see the .NET docs), so no other mappings are supported.
Byte arrays
In addition to byte[], Chr.Avro supports mapping the following types to "bytes" and "fixed":
.NET type | Notes |
---|---|
Guid | The Guid.ToByteArray method is used for serialization, and the Guid constructor is used for deserialization. ArgumentException is thrown when the length is not 16. |
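The Guid conversion named in the table can be tried with plain .NET calls. The sketch below assumes nothing about Chr.Avro beyond what the table says; GuidBytes is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper used only for illustration.
public static class GuidBytes
{
    // Guid -> 16 bytes, using the method named in the table.
    public static byte[] ToBytes(Guid value) => value.ToByteArray();

    // 16 bytes -> Guid. The Guid(byte[]) constructor throws
    // ArgumentException when the array is not exactly 16 bytes long.
    public static Guid FromBytes(byte[] bytes) => new Guid(bytes);
}
```

GuidBytes.FromBytes(GuidBytes.ToBytes(id)) returns the original value, while GuidBytes.FromBytes(new byte[15]) throws ArgumentException. One point to keep in mind when other platforms read the data: Guid.ToByteArray writes the first three groups of the GUID in little-endian order, so the bytes do not appear in the same order as the digits of the string form.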
Dates and times
The Avro spec defines six logical types for temporal data:
- calendar dates with no time or time zone ("date")
- duration comprised of months, days, and milliseconds ("duration")
- times of day with no date or time zone ("time-millis" and "time-micros")
- instants in time ("timestamp-millis" and "timestamp-micros")
In addition to the conversions described later in this section, these logical types can be treated as their underlying primitive types:
var schema = new MillisecondTimestampSchema();
var serializer = new BinarySerializerBuilder().BuildSerializer<DateTime>(schema);
var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);
var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
deserializer.Deserialize(serializer.Serialize(epoch)); // 0L
Calendar dates
.NET’s DateOnly struct represents a calendar date with no time or time zone. To match other temporal types, Chr.Avro prefers to map DateOnly values to ISO 8601 strings, avoiding "date" by default when building schemas. FormatException is thrown when deserializing a value that cannot be parsed.
Serializing and deserializing "date" values is supported with the caveat that attempting to deserialize a date before 0001-01-01 or after 9999-12-31 will cause an OverflowException.
Durations
.NET’s TimeSpan struct is commonly used to represent durations. Under the hood, a TimeSpan stores its value as a number of ticks (1 tick = 100 ns = 0.0001 ms). However, "duration" values can’t be reliably converted to a number of ticks; there isn’t a consistent number of milliseconds in a day or a consistent number of days in a month. Also, all three components are represented as unsigned integers, so negative durations cannot be expressed.
In light of those incompatibilities, Chr.Avro prefers to map TimeSpan values to ISO 8601 strings, avoiding "duration" by default when building schemas. XmlConvert.ToString and XmlConvert.ToTimeSpan are used for conversions. FormatException is thrown when a string cannot be parsed.
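The two XmlConvert methods mentioned above can be called directly to see what the string form looks like. This is a minimal sketch using only the .NET methods the text names; DurationStrings is a hypothetical wrapper introduced for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical wrapper used only for illustration.
public static class DurationStrings
{
    // TimeSpan -> ISO 8601 duration string.
    public static string Format(TimeSpan value) => XmlConvert.ToString(value);

    // ISO 8601 duration string -> TimeSpan.
    // XmlConvert.ToTimeSpan throws FormatException for text it cannot parse.
    public static TimeSpan Parse(string text) => XmlConvert.ToTimeSpan(text);
}
```

DurationStrings.Parse(DurationStrings.Format(TimeSpan.FromMinutes(90))) returns the original ninety-minute TimeSpan, and DurationStrings.Parse("ninety minutes") throws FormatException.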
Serializing and deserializing "duration" values still works, though there are some limitations. .NET assumes a consistent number of milliseconds in a day, so Chr.Avro supports the day and millisecond components. (This may lead to minor arithmetic inconsistencies with other platforms.) All non-negative TimeSpan values can be serialized without using the months component. OverflowException is thrown when serializing a negative TimeSpan and when deserializing a value with a non-zero months component.
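To make those limitations concrete, here is a sketch of how a TimeSpan could be split into the day and millisecond components. It follows the layout in the Avro spec (a 12-byte fixed holding three little-endian unsigned 32-bit integers: months, days, milliseconds) but is not Chr.Avro’s implementation; DurationSketch and its methods are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; an illustration of the rules above, not Chr.Avro's code.
public static class DurationSketch
{
    public static byte[] Encode(TimeSpan value)
    {
        // The components are unsigned, so negative durations cannot be expressed.
        if (value < TimeSpan.Zero)
        {
            throw new OverflowException("Negative durations cannot be expressed.");
        }

        var days = (uint)value.Days;
        // Remainder of the day in whole milliseconds; sub-millisecond ticks are dropped.
        var milliseconds = (uint)((value.Ticks % TimeSpan.TicksPerDay) / TimeSpan.TicksPerMillisecond);

        var bytes = new byte[12];
        BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(0, 4), 0u); // months are never used
        BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(4, 4), days);
        BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(8, 4), milliseconds);
        return bytes;
    }

    public static TimeSpan Decode(byte[] bytes)
    {
        var months = BinaryPrimitives.ReadUInt32LittleEndian(bytes.AsSpan(0, 4));
        var days = BinaryPrimitives.ReadUInt32LittleEndian(bytes.AsSpan(4, 4));
        var milliseconds = BinaryPrimitives.ReadUInt32LittleEndian(bytes.AsSpan(8, 4));

        // A month has no fixed length, so it cannot be converted to ticks.
        if (months != 0)
        {
            throw new OverflowException("Durations with a months component cannot be converted.");
        }

        // checked: very large day counts overflow the tick range of TimeSpan.
        return new TimeSpan(checked(days * TimeSpan.TicksPerDay + milliseconds * TimeSpan.TicksPerMillisecond));
    }
}
```

With this sketch, a 36-hour TimeSpan is stored as 0 months, 1 day, and 43,200,000 milliseconds, and decoding those 12 bytes yields 36 hours again.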
Times
.NET’s TimeOnly struct represents a time of day with no time zone. To match other temporal types, Chr.Avro prefers to map TimeOnly values to ISO 8601 strings, avoiding "time-millis" and "time-micros" by default when building schemas. FormatException is thrown when deserializing a value that cannot be parsed.
Serializing and deserializing "time-millis" and "time-micros" values is supported. However, .NET date types are tick-precision, so serializing to "time-millis" or deserializing from "time-micros" may result in a loss of precision.
Timestamps
Both DateTime and DateTimeOffset can be used to represent timestamps. Chr.Avro prefers to map those types to ISO 8601 strings, avoiding "timestamp-millis" and "timestamp-micros" by default when building schemas. This behavior is consistent with how durations are handled, and it also means that DateTime kind and timezone are retained: the round-trip ("O", "o") format specifier is used for serialization. FormatException is thrown when a string cannot be parsed.
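The effect of the round-trip format specifier can be seen with standard .NET formatting and parsing calls. The sketch below is an illustration only; TimestampStrings is a hypothetical helper, and the exact parsing call Chr.Avro uses is not stated in this document, so DateTime.ParseExact with DateTimeStyles.RoundtripKind is an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper used only for illustration.
public static class TimestampStrings
{
    // "O" is the round-trip format specifier; it includes the kind/offset suffix.
    public static string Format(DateTime value) =>
        value.ToString("O", CultureInfo.InvariantCulture);

    // RoundtripKind keeps the DateTimeKind encoded in the string.
    // ParseExact throws FormatException for text that does not match.
    public static DateTime Parse(string text) =>
        DateTime.ParseExact(text, "O", CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
}
```

For the Unix epoch in UTC, Format returns 1970-01-01T00:00:00.0000000Z, and parsing that string gives back a DateTime whose Kind is Utc. The same instant with kind unspecified is formatted without the trailing Z and parses back as Unspecified.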
Serializing and deserializing "timestamp-millis" and "timestamp-micros" values is supported as well, with a few caveats:
- All DateTime values are converted to UTC. Don’t use DateTime values with kind unspecified.
- .NET date types are tick-precision, so serializing to "timestamp-millis" or deserializing from "timestamp-micros" may result in a loss of precision.
Enums
Chr.Avro maps .NET enumerations to Avro’s "enum" type by matching each symbol on the schema to an enumerator on the enumeration according to these rules:
- Enumerator names don’t need to be an exact match: all non-alphanumeric characters are stripped and comparisons are case-insensitive. For example, a PRIMARY_RESIDENCE symbol will match enumerators named PrimaryResidence, primaryResidence, etc.
- When the serializer builder and deserializer builder find multiple matching enumerators, UnsupportedTypeException is thrown.
- When the deserializer builder can’t find a matching enumerator but a default is specified by the schema, the deserializer builder will attempt to map the default instead.
- When the deserializer builder can’t find a matching enumerator and no default is specified by the schema, UnsupportedTypeException is thrown.
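The inexact name comparison in the first rule can be sketched as a normalization step followed by an ordinary string comparison. This is an illustration of the rule as described, not Chr.Avro’s code: NameMatching is a hypothetical helper, and treating "alphanumeric" as char.IsLetterOrDigit is an assumption (the library may use a narrower definition).

```csharp
using System.Linq;

// Hypothetical helper; it mirrors the rule as described, not the library's code.
public static class NameMatching
{
    // Strip everything that is not a letter or digit, then lowercase.
    // Assumption: "alphanumeric" means char.IsLetterOrDigit.
    public static string Normalize(string name) =>
        new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToLowerInvariant();

    // Two names match when their normalized forms are identical.
    public static bool IsMatch(string schemaName, string memberName) =>
        Normalize(schemaName) == Normalize(memberName);
}
```

Under this sketch, NameMatching.IsMatch("PRIMARY_RESIDENCE", "PrimaryResidence") and NameMatching.IsMatch("PRIMARY_RESIDENCE", "primaryResidence") are both true. It also shows how an enumeration could end up with multiple matches: enumerators named PrimaryResidence and Primary_Residence normalize to the same string.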
By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the enumeration. In that case, if Value is set on an enumerator, the custom value must match the symbol exactly. If it’s not set, the enumerator name will be compared inexactly as described above.
To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).
Because "enum" symbols are represented as strings, Chr.Avro also supports mapping enum schemas to string. On serialization, if the name of the enumerator is not a symbol in the schema, ArgumentException will be thrown.
Maps
Avro’s "map" type represents a map of keys (assumed to be strings) to values. Chr.Avro can map a .NET type to "map" if any of the following is true:
- The type is a generic dictionary type from System.Collections.Generic or System.Collections.Immutable.
- The type implements IEnumerable<KeyValuePair<TKey, TValue>> (for serialization) and has a constructor with a single IEnumerable<KeyValuePair<TKey, TValue>> parameter (for deserialization).
Additionally, because Avro map keys are assumed to be strings, serializers and deserializers are built for key types by mapping to "string" implicitly.
Some examples of this behavior:
.NET type | Serializable | Deserializable | Notes |
---|---|---|---|
IDictionary<string, int> | ✅ | ✅ | IDictionary<TKey, TValue> is a generic dictionary type. |
Dictionary<string, int> | ✅ | ✅ | Dictionary<TKey, TValue> is a generic dictionary type. |
IDictionary<Guid, int> | ✅ | ✅ | IDictionary<TKey, TValue> is a generic dictionary type, and Guid can be mapped to "string". |
IDictionary<byte[], int> | 🚫 | 🚫 | IDictionary<TKey, TValue> is a generic dictionary type, but byte[] cannot be mapped to "string". |
IEnumerable<KeyValuePair<string, int>> | ✅ | ✅ | IEnumerable<KeyValuePair<TKey, TValue>> is recognized as a generic dictionary type. |
ICollection<KeyValuePair<string, int>> | ✅ | ✅ | ICollection<KeyValuePair<TKey, TValue>> is recognized as a generic dictionary type. |
ImmutableSortedDictionary<string, int> | ✅ | ✅ | ImmutableSortedDictionary<TKey, TValue> is a generic dictionary type. |
IEnumerable<ValueTuple<string, int>> | 🚫 | 🚫 | IEnumerable<ValueTuple<T1, T2>> is not recognized as a generic dictionary type. |
Numbers
The Avro spec defines four primitive numeric types:
- 32-bit signed integers ("int")
- 64-bit signed integers ("long")
- single-precision (32-bit) floating-point numbers ("float")
- double-precision (64-bit) floating-point numbers ("double")
It also defines a logical "decimal" type that supports arbitrary-precision decimal numbers.
Integral types
When generating a schema, Chr.Avro maps integral types less than or equal to 32 bits to "int" and integral types greater than 32 bits to "long":
.NET type | Range | Bits | Generated schema |
---|---|---|---|
sbyte | −128 to 127 | 8 | "int" |
byte | 0 to 255 | 8 | "int" |
short | −32,768 to 32,767 | 16 | "int" |
ushort | 0 to 65,535 | 16 | "int" |
char | 0 (U+0000) to 65,535 (U+ffff) | 16 | "int" |
int | −2,147,483,648 to 2,147,483,647 | 32 | "int" |
uint | 0 to 4,294,967,295 | 32 | "int" |
long | −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 64 | "long" |
ulong | 0 to 18,446,744,073,709,551,615 | 64 | "long" |
Whether a schema is "int" or "long" has no impact on serialization. Integers are zig-zag encoded, so they take up as much space as they need. For that reason, Chr.Avro imposes no constraints on which numeric types can be serialized or deserialized to "int" or "long": if a conversion exists, the binary serializer and deserializer will use it.
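The zig-zag, variable-length integer encoding defined by the Avro spec can be sketched in a few lines, which makes it easier to see why "int" and "long" values produce the same bytes. The ZigZag helper below is hypothetical and written for illustration; it is not taken from Chr.Avro.

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating Avro's integer encoding.
public static class ZigZag
{
    public static byte[] Encode(long value)
    {
        // Zig-zag step: map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
        // so that values of small magnitude become small unsigned numbers.
        var encoded = unchecked((ulong)((value << 1) ^ (value >> 63)));

        // Variable-length step: emit 7 bits per byte, least significant
        // group first; the high bit of each byte means "more bytes follow".
        var bytes = new List<byte>();
        while ((encoded & ~0x7FUL) != 0)
        {
            bytes.Add((byte)((encoded & 0x7F) | 0x80));
            encoded >>= 7;
        }

        bytes.Add((byte)encoded);
        return bytes.ToArray();
    }
}
```

ZigZag.Encode(1) is the single byte 0x02, ZigZag.Encode(-1) is 0x01, and ZigZag.Encode(64) is the two bytes 0x80 0x01. The largest 64-bit values need ten bytes, which is the most the encoding ever uses for a "long".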
Because enum types can be converted to and from integral types, Chr.Avro can map any enum type to "int" or "long" as well.
Non-integral types
On the non-integral side, .NET types are mapped to their respective Avro types:
.NET type | Approximate range | Precision | Generated schema |
---|---|---|---|
float | ±1.5 × 10^−45 to ±3.4 × 10^38 | ~6–9 digits | "float" |
double | ±5.0 × 10^−324 to ±1.7 × 10^308 | ~15–17 digits | "double" |
decimal | ±1.0 × 10^−28 to ±7.9228 × 10^28 | 28–29 significant digits | "decimal" logical type (precision 29, scale 14) |
Generally speaking, it’s a good idea to fit the precision and scale of a decimal schema to a specific use case. For example, air temperature measurements in ℉ might have a precision of 4 and a scale of 1. (29 and 14, the schema builder defaults, were selected to fit any .NET decimal.) Decimal values are resized to fit the scale specified by the schema—when serializing, digits may be truncated; when deserializing, zeros may be added.
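The resizing behavior can be pictured in terms of the unscaled integer that Avro’s "decimal" logical type stores, where the value equals the unscaled integer times 10 to the power of minus the scale. The sketch below is a simplified illustration under that definition; DecimalScaling is a hypothetical helper, not Chr.Avro’s implementation, and it does not guard against the intermediate multiplication overflowing for very large values.

```csharp
using System.Numerics;

// Hypothetical helper; a simplified illustration of fitting a value to a scale.
public static class DecimalScaling
{
    private static decimal PowerOfTen(int scale)
    {
        var factor = 1m;
        for (var i = 0; i < scale; i++)
        {
            factor *= 10m;
        }

        return factor;
    }

    // Serialization direction: digits beyond the schema's scale are truncated.
    // Note: value * factor can overflow decimal for large inputs (not handled here).
    public static BigInteger ToUnscaled(decimal value, int scale) =>
        new BigInteger(decimal.Truncate(value * PowerOfTen(scale)));

    // Deserialization direction: the unscaled integer is shifted back by the scale.
    public static decimal FromUnscaled(BigInteger unscaled, int scale) =>
        (decimal)unscaled / PowerOfTen(scale);
}
```

Using the temperature example with a scale of 1, DecimalScaling.ToUnscaled(72.46m, 1) is 724, and DecimalScaling.FromUnscaled(724, 1) is 72.4: the hundredths digit does not survive the round trip.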
Caveats
Because the serializer and deserializer rely on predefined conversions, the remarks from the C# numeric conversions table are relevant. Notably:
- Conversions may cause a loss of precision. For instance, if a "double" value is deserialized into a float, the value will be rounded to the nearest float value:

      var schema = new DoubleSchema();
      var serializer = new BinarySerializerBuilder().BuildSerializer<double>(schema);
      var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<float>(schema);
      var e = Math.E; // 2.71828182845905
      var bytes = serializer.Serialize(e);
      deserializer.Deserialize(bytes); // 2.718282

  See the .NET type conversion tables for a complete list of conversions.

- OverflowException is thrown when a conversion fails during serialization or deserialization.

  When a value is out of the range of a numeric type:

      var schema = new IntSchema();
      var serializer = new BinarySerializerBuilder().BuildSerializer<int>(schema);
      var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<short>(schema);
      var bytes = serializer.Serialize(int.MaxValue);
      deserializer.Deserialize(bytes); // throws OverflowException

  When special floating-point values are deserialized to decimal:

      var schema = new FloatSchema();
      var serializer = new BinarySerializerBuilder().BuildSerializer<float>(schema);
      var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<decimal>(schema);
      var bytes = serializer.Serialize(float.NaN);
      deserializer.Deserialize(bytes); // throws OverflowException

  Finally, when a serialized integer is too large to deserialize:

      var schema = new LongSchema();
      var deserializer = new BinaryDeserializerBuilder().BuildDeserializer<long>(schema);
      var bytes = new byte[] { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x01 };
      deserializer.Deserialize(bytes); // throws OverflowException
Records
Chr.Avro maps .NET classes and structs to Avro’s "record" type by attempting to find a matching constructor. The rules:
- Parameter names don’t need to match the schema exactly: all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match parameters named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.
- There must be exactly one matching parameter for each record field.
- There may be additional optional parameters.
If no matching constructor is found, Chr.Avro attempts to match each record field to a field or property on the type. The rules:
- Type member names don’t need to match the schema exactly: all non-alphanumeric characters are stripped and comparisons are case-insensitive. So, for example, a record field named addressLine1 will match type members named AddressLine1, AddressLine_1, ADDRESS_LINE_1, etc.
- When the serializer builder and deserializer builder find multiple matching type members, UnsupportedTypeException is thrown.
- When the serializer builder can’t find a matching type member but a default is specified by the schema, the default value will be serialized.
- When the serializer builder can’t find a matching type member and no default is specified by the schema, UnsupportedTypeException is thrown. When the deserializer can’t find a matching type member, the field is ignored.
- The deserializer builder throws UnsupportedTypeException if a type doesn’t have a parameterless public constructor.
By default, Chr.Avro also honors data contract attributes if a DataContractAttribute is present on the type. In that case, two additional rules apply:
- All type members without a DataMemberAttribute are ignored.
- If Name is set, the custom name must match the record field name exactly. If it’s not set, the type member name will be compared inexactly as described above.
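A small annotated class shows how the two data contract rules would apply to individual members. The Address type and its member names are hypothetical and exist only for illustration; the attributes themselves are the standard ones from System.Runtime.Serialization.

```csharp
using System.Runtime.Serialization;

// Hypothetical type used only for illustration.
[DataContract]
public class Address
{
    // Name is set, so the record field must be named exactly "address_line_1".
    [DataMember(Name = "address_line_1")]
    public string Line1 { get; set; }

    // Name is not set, so the member name is compared inexactly
    // (for example, a record field named POSTAL_CODE would match).
    [DataMember]
    public string PostalCode { get; set; }

    // No DataMemberAttribute, so this member is ignored when matching fields.
    public string Notes { get; set; }
}
```

Because the class declares no constructors, it also has the parameterless public constructor that the deserializer builder requires.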
To change or extend this behavior, implement ITypeResolver or extend one of the existing resolvers (ReflectionResolver and DataContractResolver).
Strings
In addition to string, Chr.Avro supports mapping the following types to "string":
.NET type | Notes |
---|---|
DateTime | Values are expressed as strings according to ISO 8601. See the dates and times section for details. |
DateTimeOffset | Values are expressed as strings according to ISO 8601. See the dates and times section for details. |
TimeSpan | Values are expressed as strings according to ISO 8601. See the dates and times section for details. |
Guid | The Guid.ToString method is used for serialization, and the Guid constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |
Uri | The Uri.ToString method is used for serialization, and the Uri constructor is used for deserialization. FormatException is thrown when a string cannot be parsed. |
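The Guid and Uri rows can be tried with the .NET members the table names. This sketch is an illustration only: StringConversions is a hypothetical helper, and the single-string constructor overloads are an assumption, since the table does not say which overloads Chr.Avro calls.

```csharp
using System;

// Hypothetical helper used only for illustration.
public static class StringConversions
{
    public static string FromGuid(Guid value) => value.ToString();

    // Guid(string) throws FormatException for text that is not a GUID.
    public static Guid ToGuid(string text) => new Guid(text);

    public static string FromUri(Uri value) => value.ToString();

    // Uri(string) throws UriFormatException, which derives from
    // FormatException, for text that is not an absolute URI.
    public static Uri ToUri(string text) => new Uri(text);
}
```

Both conversions round-trip ordinary values, and both StringConversions.ToGuid("not a guid") and StringConversions.ToUri("not a uri") fail with an exception that can be caught as FormatException.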
Unions
Chr.Avro maps Avro unions to .NET types according to these rules:
- Unions must contain at least one schema. Avro doesn’t explicitly disallow empty unions, but they can’t be serialized or deserialized.
- When mapping a union schema to a type for serialization, the type must be able to be mapped to one of the non-"null" schemas in the union (if there are any).
- When mapping a union schema to a type for deserialization, the type must be able to be mapped to all of the schemas in the union.
So, for example:
Schema | .NET type | Serializable | Deserializable | Notes |
---|---|---|---|---|
[] | | 🚫 | 🚫 | Empty unions are not supported. |
["int"] | int | ✅ | ✅ | int could be serialized and deserialized as "int". |
["null"] | int | ✅ | 🚫 | int could be serialized as "null", but it couldn’t be deserialized as "null". |
["int", "string"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "string". |
["null", "int"] | int | ✅ | 🚫 | int could be serialized as "int", but it couldn’t be deserialized as "null". |
["null", "int"] | int? | ✅ | ✅ | int? could be serialized and deserialized as either "null" or "int". |