Avro Field Order matters when evolving a schema

August 12, 2021

JSON and AVRO are both great serialization models. JSON is all text, human readable, and very verbose. AVRO is an efficient binary format. They can serialize the same data, but they handle schema evolution and field changes differently.

JSON tolerates field order changes because every field comes with its own label in every single message. Avro binary messages carry no field labels, so they do not always tolerate field order changes.

Field Order

Avro serializers and deserializers operate on fields in the order they are declared. Producers and Consumers must be on a compatible schema, including the field order. Do not change the order of AVRO fields. If you do change the field order, all Producers and Consumers must be updated at the same time.

The AVRO 1.8 documentation says

Records

A record is encoded by encoding the values of its fields in the order that they are declared. In other words, a record is encoded as just the concatenation of the encodings of its fields. Field values are encoded per their schema.

and

Enums
An enum is encoded by a int, representing the zero-based position of the symbol in the schema.
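
To make the two quoted rules concrete, here is a minimal Python sketch of the Avro binary encoding for ints, strings, enums, and records. It is a hand-written illustration, not the Avro library: the helper names (`zigzag_varint`, `encode_record`, `encode_enum`) and the `id`/`name` fields are made up for this example.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an int the way Avro encodes int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negative numbers to small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 length (as a long) followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return zigzag_varint(len(raw)) + raw


def encode_enum(symbols: list, symbol: str) -> bytes:
    """An enum is just the zero-based position of the symbol in the schema."""
    return zigzag_varint(symbols.index(symbol))


def encode_record(fields: list, values: dict) -> bytes:
    """A record is the concatenation of its field encodings, in declared order.

    No field names are written to the output.
    """
    return b"".join(encoder(values[name]) for name, encoder in fields)


# Hypothetical two-field record used only for illustration.
fields = [("id", zigzag_varint), ("name", encode_string)]
record = {"id": 42, "name": "hi"}

print(encode_record(fields, record).hex())                  # 54046869
print(encode_record(list(reversed(fields)), record).hex())  # 04686954

# Same symbol, different symbol order in the schema, different bytes.
print(encode_enum(["RED", "GREEN", "BLUE"], "GREEN").hex())  # 02
print(encode_enum(["GREEN", "RED", "BLUE"], "GREEN").hex())  # 00
```

The same data produces different bytes as soon as the declared order of the fields or the enum symbols changes, and nothing in the bytes says which order was used.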

Schema Compatibility

Schema evolution describes how the model can change over time and the type of impact those changes can have. Some of the Compatibility Types for schema evolution let the producer and consumer update in any order. Other types may force the consumer to update first. This really means that this type of schema change can break any consumer that has not been updated yet.

Confluent documentation

Confluent Schema Registry

Notice that some of the Compatibility Types allow fields to be deleted. Removing a field breaks the field order contract described above. This means that the consumer must be updated first so that it no longer expects the field. We are changing the schema contract on the consumer side first.

Pay special attention to the "Upgrade first" column. Every row where Consumers have to upgrade first is a "this will break the consumer" situation.

Compatibility Type	Changes allowed	Check against which schemas	Upgrade first
`BACKWARD`	Delete fields; add optional fields	Last version	Consumers
`BACKWARD_TRANSITIVE`	Delete fields; add optional fields	All previous versions	Consumers
`FORWARD`	Add fields; delete optional fields	Last version	Producers
`FORWARD_TRANSITIVE`	Add fields; delete optional fields	All previous versions	Producers
`FULL`	Add optional fields; delete optional fields	Last version	Any order
`FULL_TRANSITIVE`	Add optional fields; delete optional fields	All previous versions	Any order
`NONE`	All changes are accepted	Compatibility checking disabled	Depends
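
The compatibility type is set per subject through the Schema Registry REST API (`PUT /config/{subject}`). The sketch below only builds that request with the Python standard library and does not send it. The registry URL `http://localhost:8081`, the subject name `example-value`, and the helper name `compatibility_request` are assumptions made for this example.

```python
import json
import urllib.request

# The compatibility types listed in the table above.
COMPATIBILITY_TYPES = {
    "BACKWARD", "BACKWARD_TRANSITIVE",
    "FORWARD", "FORWARD_TRANSITIVE",
    "FULL", "FULL_TRANSITIVE",
    "NONE",
}


def compatibility_request(registry_url: str, subject: str, level: str) -> urllib.request.Request:
    """Build (but do not send) the request that sets a subject's compatibility type."""
    if level not in COMPATIBILITY_TYPES:
        raise ValueError(f"unknown compatibility type: {level}")
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Hypothetical registry URL and subject name.
req = compatibility_request("http://localhost:8081", "example-value", "FULL")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running registry. `FULL` is used here because it is one of the rows where producers and consumers can upgrade in any order.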

Sample schema definition.

The fields will go onto the wire in the order A, B, F, Z. The good news here is that we will likely get a type failure if we change the order, because each field has a different type. You might not detect a problem during deserialization if all the fields are of the same type.

{
    "type": "record",
    "name": "Example",
    "fields": [
        { "name": "A", "type": "boolean" },
        { "name": "B", "type": "int" },
        { "name": "F", "type": "float" },
        { "name": "Z", "type": "string" }
    ]
}
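
The same-type problem can be shown with a small hand-written encoder and decoder. This is a sketch of the Avro binary rules for four primitive types, not the Avro library, and the decoder reads with its own field list only, with no writer schema available. The `first`/`last` record and all helper names are hypothetical.

```python
import io
import struct


def write_long(n: int) -> bytes:
    """Avro int/long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: io.BytesIO) -> int:
    """Inverse of write_long."""
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


ENCODERS = {
    "boolean": lambda v: b"\x01" if v else b"\x00",
    "int": write_long,
    "float": lambda v: struct.pack("<f", v),  # 4 bytes, little-endian
    "string": lambda v: write_long(len(v.encode("utf-8"))) + v.encode("utf-8"),
}
DECODERS = {
    "boolean": lambda buf: buf.read(1) == b"\x01",
    "int": read_long,
    "float": lambda buf: struct.unpack("<f", buf.read(4))[0],
    "string": lambda buf: buf.read(read_long(buf)).decode("utf-8"),
}


def encode(fields, record) -> bytes:
    """Write field values in declared order; no names go on the wire."""
    return b"".join(ENCODERS[ftype](record[name]) for name, ftype in fields)


def decode(fields, data: bytes) -> dict:
    """Read field values in the order this side's schema declares them."""
    buf = io.BytesIO(data)
    return {name: DECODERS[ftype](buf) for name, ftype in fields}


# The Example schema: values go onto the wire in the order A, B, F, Z.
example_fields = [("A", "boolean"), ("B", "int"), ("F", "float"), ("Z", "string")]
example = {"A": True, "B": 42, "F": 1.5, "Z": "hi"}
payload = encode(example_fields, example)
print(payload.hex())  # 01540000c03f046869

# Hypothetical record where every field has the same type.
producer_fields = [("first", "string"), ("last", "string")]
consumer_fields = [("last", "string"), ("first", "string")]  # reordered
wire = encode(producer_fields, {"first": "Ada", "last": "Lovelace"})
print(decode(consumer_fields, wire))  # {'last': 'Ada', 'first': 'Lovelace'}
```

The reordered consumer decodes without any error and silently swaps the two values, which is the undetected case described above.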

Joe Freeman's Blog