Apache Kafka

Event Driven Architecture (EDA)

Tool: Miro

Broker (running application on OS/HW, which takes messages and stores them on HD or reads them from HD)
Zookeeper (keeps list of brokers & configuration, checks broker availability)
Cluster (multiple brokers, for scalability, fault-tolerance, high throughput)
Record (key, value, timestamp)
Topic (can be on multiple brokers, since kafka is distributed)
- delete: remove msg if topic gets too big or no new msg received
- compact: msg with same key gets replaced with existing one or added if key is new
Partition (help splitting the load across brokers)
- same topic gets split over multiple partitions where each partition is stored on different broker
- if number of partition is bigger than number of brokers, multiple partition get stored on same broker :(
- each partition will store different messages, i.e. same message will not exist in different partition
- Partitions get replicated in order to keep fault-tolerance
Producer
- creates & transmits events to kafka
- creates a record which it needs to serialize (key/value-serializer)
- Serializer: Avro, Protobuf, Thrift, JSON, XML, YAML
Consumer
- uses pull to get new messages
- multiple consumers can “act” as one if they are in the same consumer-group
Schema registry (e.g. confluent)