Avro Field Order matters when evolving a schema

JSON and Avro are both widely used serialization formats. JSON is all text, human readable, and fairly verbose. Avro is a compact binary format. Both can serialize the same data, but they handle schema evolution, and field changes in particular, differently.


JSON tolerates field order changes because every field carries its own label in every single message. Avro messages carry no labels, so Avro does not always tolerate field order changes.
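To make that concrete, here is a minimal sketch in Python using the fastavro library (my choice for illustration, not something this post depends on). It serializes the same record both ways: the JSON bytes repeat the field names in every message, while the Avro bytes are just the field values in declared order.

import io
import json
import fastavro

schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "int"},
        {"name": "active", "type": "boolean"},
    ],
})

record = {"user_id": 42, "active": True}

# JSON: every message carries the field names.
json_bytes = json.dumps(record).encode("utf-8")
print(json_bytes)      # b'{"user_id": 42, "active": true}'

# Avro: only the encoded values, in declared order, no labels at all.
buf = io.BytesIO()
fastavro.schemaless_writer(buf, schema, record)
print(buf.getvalue())  # b'T\x01'  (zig-zag encoded 42, then boolean true)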

Field Order

Avro serializers and deserializers operate on fields in the order they are declared. Producers and consumers must agree on a compatible schema, including the field order. Do not change the order of Avro fields. If you do change the field order, all producers and consumers must be updated at the same time.
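Here is a hedged sketch of why that matters, again with Python and fastavro (the Pair record and its field names are invented for this example). The consumer declares the same two int fields in the opposite order and decodes the producer's bytes using only its own schema, so the values silently end up attached to the wrong names. Avro's schema resolution can match fields by name when the consumer also has the writer's schema, but a consumer that only knows its own declaration reads the bytes positionally.

import io
import fastavro

producer_schema = fastavro.parse_schema({
    "type": "record",
    "name": "Pair",
    "fields": [
        {"name": "first", "type": "int"},
        {"name": "second", "type": "int"},
    ],
})

# Same field names and types, opposite declaration order.
consumer_schema = fastavro.parse_schema({
    "type": "record",
    "name": "Pair",
    "fields": [
        {"name": "second", "type": "int"},
        {"name": "first", "type": "int"},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(buf, producer_schema, {"first": 1, "second": 2})
buf.seek(0)

# The consumer decodes positionally with its own schema: no error is raised,
# the values are simply assigned to the wrong field names.
print(fastavro.schemaless_reader(buf, consumer_schema))
# {'second': 1, 'first': 2}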

The Avro 1.8 documentation says:

Records 

A record is encoded by encoding the values of its fields in the order that they are declared. In other words, a record is encoded as just the concatenation of the encodings of its fields. Field values are encoded per their schema.

and

Enums

An enum is encoded by a int, representing the zero-based position of the symbol in the schema.
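A short sketch of what that means in practice, in the same Python/fastavro style (the Color enum is invented for illustration): the wire bytes contain only the symbol's position, so inserting or reordering symbols changes what previously written data means.

import io
import fastavro

schema = fastavro.parse_schema({
    "type": "record",
    "name": "Paint",
    "fields": [
        {"name": "color", "type": {
            "type": "enum",
            "name": "Color",
            "symbols": ["RED", "GREEN", "BLUE"],
        }},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(buf, schema, {"color": "GREEN"})
print(buf.getvalue())  # b'\x02' -> zig-zag for index 1; the text "GREEN" is never written

# If a new symbol were later inserted ahead of GREEN, index 1 would name a
# different symbol and every already-written record would change meaning.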

Schema Compatibility

Schema evolution describes how a schema can change over time and what impact those changes have. Some compatibility types let the producer and consumer upgrade in any order. Other types force the consumer to upgrade first, which really means that kind of schema change can break the consumers.

The table below comes from the Confluent documentation.


Notice that some of the compatibility types allow deleting fields. We discussed above how field removal breaks the field order contract. This means the consumer must be updated first so that it no longer expects the removed field. We are changing the schema contract on the consumer side first.

Pay special attention to the Upgrade first column. Every row where consumers have to upgrade first is a "this change will break the consumer" situation.

Compatibility Type    | Changes allowed                              | Check against which schemas      | Upgrade first
----------------------|----------------------------------------------|----------------------------------|--------------
BACKWARD              | Delete fields, Add optional fields           | Last version                     | Consumers
BACKWARD_TRANSITIVE   | Delete fields, Add optional fields           | All previous versions            | Consumers
FORWARD               | Add fields, Delete optional fields           | Last version                     | Producers
FORWARD_TRANSITIVE    | Add fields, Delete optional fields           | All previous versions            | Producers
FULL                  | Add optional fields, Delete optional fields  | Last version                     | Any order
FULL_TRANSITIVE       | Add optional fields, Delete optional fields  | All previous versions            | Any order
NONE                  | All changes are accepted                     | Compatibility checking disabled  | Depends
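If you run a Confluent Schema Registry, you can ask it whether a proposed change is allowed under the subject's compatibility type before anything is deployed. A rough sketch using the registry's REST compatibility check (the registry URL, subject name, and proposed schema are placeholders):

import json
import requests

REGISTRY = "http://localhost:8081"   # placeholder registry URL
SUBJECT = "example-value"            # placeholder subject name

proposed_schema = {
    "type": "record",
    "name": "Sample",
    "fields": [
        {"name": "A", "type": "boolean"},
        {"name": "B", "type": "int"},
    ],
}

# Test the proposed schema against the latest registered version for the subject.
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(proposed_schema)}),
)
print(resp.json())  # e.g. {"is_compatible": true} or {"is_compatible": false}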

Sample schema definition.

The fields will go onto the wire in the order A, B, F, Z. The good news here is that we will get a type failure if we change the order. You might not detect a problem at deserialization if all the fields are of the same type.

{
    "type": "record",
    "name": "Example",
    "fields": [
        { "name": "A", "type": "boolean" },
        { "name": "B", "type": "int" },
        { "name": "F", "type": "float" },
        { "name": "Z", "type": "string" }
    ]
}
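As a hedged sketch of that point, again in Python with fastavro: the record is written in A, B, F, Z order, and a consumer whose schema declares the fields in a different order misreads the bytes. Depending on the exact values you either get a decode error or wrong data; if every field had the same type, as in the earlier two-int example, the misread would be silent.

import io
import fastavro

writer_schema = fastavro.parse_schema({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "A", "type": "boolean"},
        {"name": "B", "type": "int"},
        {"name": "F", "type": "float"},
        {"name": "Z", "type": "string"},
    ],
})

# Same fields, declared in a different order.
reordered_schema = fastavro.parse_schema({
    "type": "record",
    "name": "Example",
    "fields": [
        {"name": "Z", "type": "string"},
        {"name": "A", "type": "boolean"},
        {"name": "B", "type": "int"},
        {"name": "F", "type": "float"},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(
    buf, writer_schema, {"A": True, "B": 7, "F": 1.5, "Z": "hello"}
)
buf.seek(0)

# A consumer that assumes the reordered layout tries to read a string where a
# boolean was written, so it either raises or returns garbage values.
try:
    print(fastavro.schemaless_reader(buf, reordered_schema))
except Exception as exc:
    print("failed to decode:", exc)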



