Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Avro
Kafka & Avro: Confluent Schema Registry
Managing Record Schema in Kafka
Confluent Schema Registry
❖ Confluent Schema Registry stores Avro schemas for Kafka clients
❖ Provides a REST interface for putting and getting Avro schemas
❖ Stores a history of schemas
  ❖ versioned
  ❖ allows you to configure the compatibility setting
  ❖ supports evolution of schemas
❖ Provides serializers, used by Kafka clients, that handle schema storage and serialization of records using Avro
Why Schema Registry?
❖ Producer creates a record/message, which is an Avro record
  ❖ Record contains the schema and data
❖ Schema Registry Avro Serializer serializes the data and the schema id (just the id, not the full schema)
  ❖ Keeps a cache mapping registered schemas from Schema Registry to their ids
❖ Consumer receives the payload and deserializes it with the Schema Registry Avro Deserializer
  ❖ Deserializer looks up the full schema from its cache or from Schema Registry, based on the id
❖ Consumer has its own schema, the one it expects the record/message to conform to
  ❖ Compatibility check is performed on the two schemas
  ❖ if they do not match but are compatible, a payload transformation happens, aka Schema Evolution
  ❖ if they are not compatible, deserialization fails
❖ Kafka records have a Key and a Value, and both can have a schema
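The "just id" framing above can be sketched in a few lines. This is a minimal illustration, assuming the Confluent wire format of one magic byte (0) followed by a 4-byte big-endian schema id and then the Avro-encoded bytes. The function names and the payload bytes are hypothetical, and no real Avro encoding is done here.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # marker byte that starts every framed message


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix already-encoded Avro bytes with the magic byte and schema id."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema id, Avro bytes)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema id")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, message[5:]


# Placeholder payload; a real serializer would produce Avro binary here.
message = frame(4, b"avro-bytes")
schema_id, payload = unframe(message)
```

The consumer side uses the recovered id to fetch the writer schema from its cache or from Schema Registry before decoding the payload.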
Schema Compatibility
❖ Backward compatibility (default)
  ❖ A new, backward-compatible schema will not break consumers
  ❖ Producers could be using an older schema that is backward compatible with the Consumer's
❖ Forward compatibility
  ❖ Records sent with a new, forward-compatible schema can be deserialized with older schemas
  ❖ Consumers can use an older schema and never be updated (maybe they never need the new fields)
❖ Full compatibility
  ❖ New version of a schema is both backward and forward compatible
❖ None
  ❖ Schema will not be validated for compatibility at all
Schema Registry Config
❖ Compatibility can be configured globally or per schema
❖ Options are:
  ❖ NONE - don’t check for schema compatibility
  ❖ FORWARD - make sure the last schema version is forward compatible with the new schema (data written with the new schema can be read with the last version)
  ❖ BACKWARD (default) - make sure the new schema is backward compatible with the latest (data written with the latest version can be read with the new schema)
  ❖ FULL - make sure the new schema is both forward and backward compatible with the latest
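The four options can be illustrated with a deliberately simplified checker. This sketch only looks at field names and defaults of record schemas (it ignores type changes, aliases and nested types), so it is an assumption-laden toy, not the registry's actual algorithm. The Employee field names are taken from the example later in the deck.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be read with `reader`.

    Simplification: a reader field is satisfied if the writer has a field
    of the same name, or if the reader field declares a default.
    """
    writer_names = {field["name"] for field in writer["fields"]}
    return all(
        field["name"] in writer_names or "default" in field
        for field in reader["fields"]
    )


def is_compatible(level: str, new: dict, latest: dict) -> bool:
    """Check a new schema against the latest registered one."""
    if level == "NONE":
        return True
    if level == "BACKWARD":  # new schema reads data written with latest
        return can_read(reader=new, writer=latest)
    if level == "FORWARD":   # latest schema reads data written with new
        return can_read(reader=latest, writer=new)
    if level == "FULL":
        return can_read(new, latest) and can_read(latest, new)
    raise ValueError(f"unknown compatibility level: {level}")


v1 = {"type": "record", "name": "Employee",
      "fields": [{"name": "name", "type": "string"}]}
v2 = {"type": "record", "name": "Employee",
      "fields": [{"name": "name", "type": "string"},
                 {"name": "age", "type": "int", "default": -1}]}
v2_no_default = {"type": "record", "name": "Employee",
                 "fields": [{"name": "name", "type": "string"},
                            {"name": "age", "type": "int"}]}
```

Under these assumptions, adding age with a default passes every level, while adding it without a default passes only FORWARD and NONE.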
Schema Registry Actions
❖ Register schemas for keys and values of Kafka records
❖ List schemas (subjects)
❖ List all versions of a subject (schema)
❖ Retrieve a schema by version or id
  ❖ get the latest version of a schema
❖ Check whether a schema is compatible with a certain version
❖ Get the compatibility level setting of the Schema Registry
  ❖ BACKWARD, FORWARD, FULL, NONE
❖ Add compatibility settings to a subject/schema
Schema Evolution
❖ If an Avro schema is changed after data has been written to a store using an older version of that schema, then Avro might do a Schema Evolution
❖ Schema evolution is the automatic transformation of data between Avro schema versions
  ❖ the transformation is between the version of the consumer's schema and the version the producer put into the Kafka log
❖ When the Consumer schema is not identical to the Producer schema used to serialize the Kafka record, a data transformation is performed on the Kafka record (key or value)
  ❖ If the schemas match, there is no need to do a transformation
❖ Schema evolution happens only during deserialization at the Consumer
  ❖ If the Consumer’s schema is different from the Producer’s schema, then the value or key is automatically modified during deserialization to conform to the consumer's reader schema
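The reader-side transformation can be sketched as follows. This is a rough model, assuming records are plain dicts and schemas are dicts in Avro's JSON form; real Avro resolution also handles type promotion, aliases and nested types, which this sketch leaves out. The function name `deserialize` is hypothetical.

```python
def deserialize(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    """Shape a record written with writer_schema to fit reader_schema."""
    if writer_schema == reader_schema:
        # Schemas match: no transformation is needed.
        return dict(record)
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]        # field known to both sides
        elif "default" in field:
            resolved[name] = field["default"]    # missing in data: use default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields only the writer knows about are dropped by omission.
    return resolved


v1 = {"type": "record", "name": "Employee",
      "fields": [{"name": "name", "type": "string"}]}
v2 = {"type": "record", "name": "Employee",
      "fields": [{"name": "name", "type": "string"},
                 {"name": "age", "type": "int", "default": -1}]}

old_reader_view = deserialize({"name": "Ann", "age": 42}, v2, v1)
new_reader_view = deserialize({"name": "Ann"}, v1, v2)
```

The transformation runs only on the consuming side; the bytes in the Kafka log are never rewritten.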
Allowed Schema Modifications
❖ Add a field with a default
❖ Remove a field that had a default value
❖ Change a field's order attribute
❖ Change a field's default value
❖ Remove or add a field alias
❖ Remove or add a type alias
❖ Change a type to a union that contains the original type
Rules of the Road for modifying Schema
❖ Provide a default value for fields in your schema
  ❖ Allows you to delete the field later
❖ Don’t change a field's data type
❖ When adding a new field to your schema, you have to provide a default value for the field
❖ Don’t rename an existing field
  ❖ You can add an alias instead
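These rules can be turned into a small pre-flight check run before a schema change is registered. The sketch below is a hypothetical helper (`check_change` is not part of Avro or Schema Registry) and only inspects top-level record fields. A rename shows up as one removed field plus one added field, which is why the rules advise an alias instead.

```python
def check_change(old: dict, new: dict) -> list:
    """Return a list of rule violations for a proposed schema change."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    problems = []
    for name, field in new_fields.items():
        if name not in old_fields:
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
            continue
        old_type, new_type = old_fields[name]["type"], field["type"]
        # Widening to a union that contains the original type is allowed.
        widened = isinstance(new_type, list) and old_type in new_type
        if new_type != old_type and not widened:
            problems.append(f"field '{name}' changed type")
    for name, field in old_fields.items():
        if name not in new_fields and "default" not in field:
            problems.append(f"removed field '{name}' had no default")
    return problems


def record(*fields):
    return {"type": "record", "name": "Employee", "fields": list(fields)}


name = {"name": "name", "type": "string"}
age = {"name": "age", "type": "int", "default": -1}
old = record(name, age)

ok_union = check_change(old, record(name, {"name": "age", "type": ["int", "null"], "default": -1}))
bad_type = check_change(old, record(name, {"name": "age", "type": "string", "default": "-1"}))
bad_rename = check_change(old, record({"name": "fullName", "type": "string"}, age))
```

The field name fullName is made up for the rename case; everything else reuses the Employee example from the deck.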
Remember our example Employee
Avro covered in Avro/Kafka Tutorial
Let’s say
❖ Employee did not have an age in version 1 of the schema
❖ Later we decided to add an age field with a default value of -1
❖ Now let’s say we have a Producer using version 2, and a Consumer using version 1
Scenario: adding a new field, age, with a default value
❖ Producer uses version 2 of the Employee schema, creates a com.cloudurable.Employee record, sets the age field to 42, then sends it to Kafka topic new-employees
❖ Consumer consumes records from new-employees using version 1 of the Employee schema
  ❖ Since the Consumer is using version 1 of the schema, the age field is removed during deserialization
❖ The same Consumer modifies the name field and then writes the record back to a NoSQL store
  ❖ When it does this, the age field is missing from the value that it writes to the store
❖ Another client using version 2 reads the record from the NoSQL store
  ❖ The age field is missing from the record (because the Consumer wrote it with version 1), so age is set to the default value of -1
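The round trip above can be replayed with plain dicts. This is only a simulation under simplifying assumptions: `read_as` stands in for Avro deserialization with a reader schema, a dict stands in for the NoSQL store, and the names "Bob", "Robert" and "employee-1" are invented for the walk-through.

```python
EMPLOYEE_V1 = {"type": "record", "name": "Employee",
               "fields": [{"name": "name", "type": "string"}]}
EMPLOYEE_V2 = {"type": "record", "name": "Employee",
               "fields": [{"name": "name", "type": "string"},
                          {"name": "age", "type": "int", "default": -1}]}


def read_as(record: dict, reader_schema: dict) -> dict:
    """Keep only the reader's fields, filling gaps from defaults."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        else:
            out[field["name"]] = field["default"]
    return out


# 1. Producer (version 2) sends a record with age set to 42.
sent = {"name": "Bob", "age": 42}

# 2. Consumer (version 1) deserializes it: age is dropped.
consumed = read_as(sent, EMPLOYEE_V1)

# 3. Consumer edits the name and writes the record to the store.
consumed["name"] = "Robert"
store = {"employee-1": dict(consumed)}

# 4. A version 2 client reads it back: age falls back to the default.
read_back = read_as(store["employee-1"], EMPLOYEE_V2)
```

The original age of 42 is lost for good once the version 1 consumer has written the record, which is the point of the scenario.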
Schema Registry Actions
❖ Register schemas for keys and values of Kafka records
❖ List schemas (subjects)
❖ List all versions of a subject (schema)
❖ Retrieve a schema by version or id
  ❖ get the latest version of a schema
❖ Check whether a schema is compatible with a certain version
❖ Get the compatibility level setting of the Schema Registry
  ❖ BACKWARD, FORWARD, FULL, NONE
❖ Add compatibility settings to a subject/schema
Register a Schema
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\": …}"}' \
  http://localhost:8081/subjects/Employee/versions
{"id":2}
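The request body is a JSON object whose "schema" value is itself a JSON document encoded as a string, which is why the inner quotes must be escaped. A small sketch of building the same request from Python follows; the helper name and the Employee fields are assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(base_url: str, subject: str, schema: dict):
    """Build (but do not send) the POST that registers a schema version."""
    # json.dumps twice: once for the schema, once for the envelope,
    # so the schema ends up as an escaped string inside the body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


# Illustrative schema; the field list is hypothetical.
employee_schema = {
    "type": "record",
    "name": "Employee",
    "namespace": "com.cloudurable.phonebook",
    "fields": [{"name": "name", "type": "string"},
               {"name": "age", "type": "int", "default": -1}],
}

request = build_register_request("http://localhost:8081", "Employee", employee_schema)
# With a registry running, urllib.request.urlopen(request) would send it.
```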
List All Schemas
curl -X GET http://localhost:8081/subjects
["Employee","Employee2","FooBar"]
Working with versions
[1,2,3,4,5]
{"subject":"Employee","version":2,"id":4,"schema":"
{"type":"record","name":"Employee",
"namespace":"com.cloudurable.phonebook", …
{"subject":"Employee","version":1,"id":3,"schema":"
{"type":"record","name":"Employee",
"namespace":"com.cloudurable.phonebook", …
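A version response like the ones above wraps the schema as a string inside a JSON envelope, so a client has to decode twice. The sketch below shows that step on a made-up response; the field list in the sample schema is hypothetical because the slide output is truncated, and `parse_version_response` is an illustrative helper name.

```python
import json


def parse_version_response(body: str) -> dict:
    """Decode the envelope, then decode the embedded schema string."""
    envelope = json.loads(body)
    return {
        "subject": envelope["subject"],
        "version": envelope["version"],
        "id": envelope["id"],
        "schema": json.loads(envelope["schema"]),
    }


# Sample response assembled for illustration only.
sample_schema = {
    "type": "record",
    "name": "Employee",
    "namespace": "com.cloudurable.phonebook",
    "fields": [{"name": "name", "type": "string"}],  # hypothetical fields
}
sample_body = json.dumps({
    "subject": "Employee",
    "version": 2,
    "id": 4,
    "schema": json.dumps(sample_schema),
})

parsed = parse_version_response(sample_body)
```

Note that the version number (per subject) and the id (global to the registry) are different things, as the 2/4 and 1/3 pairs in the output show.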
Working with Schemas
Changing Compatibility Checks
Incompatible Change
{"error_code":409,"message":"Schema being registered is incompatible with an e
Incompatible Change
{"is_compatible":false}
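The two responses above come from two different calls: registering an incompatible schema is rejected with error code 409, while a compatibility test returns is_compatible without registering anything. The sketch below builds the requests for changing a subject's compatibility level and for testing a schema against a version, assuming the registry's /config/{subject} and /compatibility/subjects/{subject}/versions/{version} endpoints. The helper names are hypothetical and nothing is sent.

```python
import json
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"
LEVELS = {"NONE", "BACKWARD", "FORWARD", "FULL"}


def build_set_compatibility_request(base_url: str, subject: str, level: str):
    """PUT a compatibility level for one subject."""
    if level not in LEVELS:
        raise ValueError(f"unknown compatibility level: {level}")
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config/{subject}", data=body,
        headers={"Content-Type": CONTENT_TYPE}, method="PUT")


def build_compatibility_check_request(base_url: str, subject: str,
                                      version, schema: dict):
    """POST a candidate schema to test it against a registered version."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/compatibility/subjects/{subject}/versions/{version}",
        data=body, headers={"Content-Type": CONTENT_TYPE}, method="POST")


def is_compatible(response_body: str) -> bool:
    """Read the verdict out of a compatibility-test response."""
    return bool(json.loads(response_body)["is_compatible"])


candidate = {"type": "record", "name": "Employee",
             "fields": [{"name": "name", "type": "string"}]}
set_req = build_set_compatibility_request("http://localhost:8081", "Employee", "NONE")
check_req = build_compatibility_check_request(
    "http://localhost:8081", "Employee", "latest", candidate)
verdict = is_compatible('{"is_compatible":false}')
```

Running the compatibility test first is a cheap way to avoid the 409 at registration time.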
Use Schema Registry
❖ Start up Schema Registry server pointing to Zookeeper cluster
❖ Import Kafka Avro Serializer and Avro Jars
❖ Configure Producer to use Schema Registry
❖ Use KafkaAvroSerializer from Producer
❖ Configure Consumer to use Schema Registry
❖ Use KafkaAvroDeserializer from Consumer
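The client-side steps above mostly come down to a handful of properties. The sketch below writes them out as Python dicts and renders them in .properties form, purely to show which keys are involved. The broker address localhost:9092 and the group id are assumptions; the registry URL matches the listener port used elsewhere in the deck.

```python
REGISTRY_URL = "http://localhost:8081"
AVRO_SERIALIZER = "io.confluent.kafka.serializers.KafkaAvroSerializer"
AVRO_DESERIALIZER = "io.confluent.kafka.serializers.KafkaAvroDeserializer"

producer_props = {
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "key.serializer": AVRO_SERIALIZER,
    "value.serializer": AVRO_SERIALIZER,
    "schema.registry.url": REGISTRY_URL,
}

consumer_props = {
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "group.id": "employee-readers",          # hypothetical consumer group
    "key.deserializer": AVRO_DESERIALIZER,
    "value.deserializer": AVRO_DESERIALIZER,
    "schema.registry.url": REGISTRY_URL,
    # Ask for generated Avro classes instead of generic records.
    "specific.avro.reader": "true",
}


def to_properties(props: dict) -> str:
    """Render a dict as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


producer_file = to_properties(producer_props)
consumer_file = to_properties(consumer_props)
```

In a Java client the same keys would go into the Properties object handed to KafkaProducer or KafkaConsumer.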
Start up Schema Registry Server
cat ~/tools/confluent-3.2.1/etc/schema-registry/schema-registry.properties
listeners=http://0.0.0.0:8081
kafkastore.connection.url=localhost:2181
kafkastore.topic=_schemas
debug=false
Import Kafka Avro Serializer & Avro Jars
Configure Producer to use Schema Registry
Use KafkaAvroSerializer from Producer
Configure Consumer to use Schema Registry
Use KafkaAvroDeserializer from Consumer
Schema Registry
❖ Confluent provides Schema Registry to manage Avro schemas for Kafka Consumers and Producers
❖ Avro provides schema migration
❖ Confluent uses schema compatibility checks to see if the Producer schema and Consumer schema are compatible, and to do schema evolution if needed
❖ Use KafkaAvroSerializer from Producer
❖ Use KafkaAvroDeserializer from Consumer
