SquareShift Engineering Team

Setting Up MySQL Debezium Connector for Change Data Capture (CDC)

Change Data Capture (CDC) streams row-level database changes to downstream systems as they happen, which makes it a foundation for real-time data streaming and analytics. The MySQL Debezium Connector captures these changes and publishes them to platforms like Apache Kafka. This guide walks you through setting up the connector: MySQL configuration, replication user setup, key connector properties, and tips for reliable CDC.

Why Use the MySQL Debezium Connector?

The Debezium Connector reads MySQL's binary log (binlog) to capture changes at the row level, so each committed insert, update, and delete becomes a change event. It’s particularly useful for implementing event-driven architectures and replicating data in distributed systems.
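To make "row-level changes" concrete, here is a minimal sketch of how a consumer might interpret a Debezium change event. It assumes the standard event payload fields (before, after, source, op) and that the payload has already been extracted from the Kafka message; the sample event and the describe_change helper are invented for illustration.

```python
# Minimal sketch: summarize a Debezium change event payload.
# The sample event below is made up for illustration.

OP_NAMES = {
    "c": "insert",
    "u": "update",
    "d": "delete",
    "r": "snapshot read",
}


def describe_change(payload: dict) -> str:
    """Return a one-line summary of a row-level change event payload."""
    source = payload.get("source", {})
    table = f"{source.get('db', '?')}.{source.get('table', '?')}"
    op = OP_NAMES.get(payload.get("op"), "unknown")
    # Deletes carry the old row in "before"; other operations carry the row in "after".
    row = payload.get("before") if payload.get("op") == "d" else payload.get("after")
    return f"{op} on {table}: {row}"


sample = {
    "before": None,
    "after": {"id": 1, "email": "a@example.com"},
    "source": {"db": "inventory", "table": "customers"},
    "op": "c",
    "ts_ms": 1700000000000,
}
print(describe_change(sample))
```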


Prerequisites for Setting Up MySQL Debezium Connector

Before configuring Debezium, ensure the following:

  • Access to a MySQL database with appropriate privileges.

  • Apache Kafka environment for streaming the change events.

  • Enabled binary logging on your MySQL server.


Step-by-Step Setup Guide

1. Configure the MySQL Database

Enable Binary Logging

Binary logging in MySQL records changes to the database in a log file. Debezium reads this log to capture changes, so the server must have binary logging enabled and must use the row-based format.

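To configure the binary log format, run the statements below. Binary logging itself (log_bin) cannot be turned on at runtime: it is enabled by default in MySQL 8.0, and on older versions it must be set in the MySQL configuration file, followed by a restart. Values set with SET GLOBAL are lost on restart unless you also add them to the configuration file. The GTID settings are optional for Debezium; if you enable them, enforce_gtid_consistency must be ON first, and gtid_mode can only be changed one step at a time.


SET GLOBAL binlog_format = 'ROW';

SET GLOBAL binlog_row_image = 'FULL';

SET GLOBAL enforce_gtid_consistency = ON;

SET GLOBAL gtid_mode = OFF_PERMISSIVE;

SET GLOBAL gtid_mode = ON_PERMISSIVE;

SET GLOBAL gtid_mode = ON;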
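Before moving on, it can help to confirm that the server really has the expected binary log settings. The sketch below is a hypothetical pre-flight check: it assumes you have already fetched the output of SHOW GLOBAL VARIABLES into a dictionary with whatever MySQL client you use, and it checks only the binary log settings, not the GTID ones.

```python
# Hypothetical pre-flight check. `variables` is assumed to hold rows from
# SHOW GLOBAL VARIABLES as {name: value}; fetching them is left to your client.

REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def find_binlog_problems(variables: dict) -> list:
    """Return a message for each variable that does not have the required value."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = str(variables.get(name, "<unset>")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual}, expected {expected}")
    return problems


# Example input; in practice these values come from the server.
current = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
for problem in find_binlog_problems(current):
    print(problem)
```

An empty list means the three settings match what the connector needs.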


2. Set Up a Replication User

To allow Debezium to access the MySQL binary logs, you need to create a user with replication privileges. This user will be responsible for reading the binlog and capturing changes.

Create a replication user with the following commands:


CREATE USER 'debezium'@'%' IDENTIFIED BY 'password';

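GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%';


REPLICATION SLAVE and REPLICATION CLIENT let the connector read the binary log; SELECT, RELOAD, and SHOW DATABASES are needed for the initial snapshot. Use a strong password in place of 'password' and, where possible, restrict '%' to the host that runs Kafka Connect.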
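To double-check the user, you can compare the output of SHOW GRANTS FOR 'debezium'@'%' against the privileges the Debezium MySQL documentation lists for the connector user, which include SELECT, RELOAD, and SHOW DATABASES for the initial snapshot in addition to the replication privileges. The sketch below is a simplified, hypothetical helper: it only inspects global (ON *.*) grants, so a SELECT granted on individual databases would be reported as missing.

```python
import re

# Privileges the Debezium MySQL documentation lists for the connector user.
REQUIRED_PRIVILEGES = {
    "SELECT",
    "RELOAD",
    "SHOW DATABASES",
    "REPLICATION SLAVE",
    "REPLICATION CLIENT",
}


def missing_privileges(grant_statements: list) -> set:
    """Return required global privileges not found in SHOW GRANTS output lines."""
    granted = set()
    for statement in grant_statements:
        # Only global grants (ON *.*) are considered in this sketch.
        match = re.match(r"GRANT (.+?) ON \*\.\* TO ", statement, re.IGNORECASE)
        if match:
            granted.update(p.strip().upper() for p in match.group(1).split(","))
    if "ALL PRIVILEGES" in granted:
        return set()
    return REQUIRED_PRIVILEGES - granted


# Example SHOW GRANTS line for a user that has only the replication privileges.
grants = ["GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `debezium`@`%`"]
print(sorted(missing_privileges(grants)))
```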


3. Configure Debezium Connector Properties

Once you have your MySQL server and replication user set up, the next step is to configure the Debezium connector. This step connects Debezium to your MySQL database and specifies the settings needed for data capture.

Here are the key configuration properties required:


| Property | Description |
| --- | --- |
| database.hostname | Hostname of the MySQL server. |
| database.port | MySQL server port (default is 3306). |
| database.user | Username with replication privileges (e.g., debezium). |
| database.password | Password for the above user. |
| database.server.id | Unique identifier for the MySQL server. |
| database.server.name | Logical name for the database server in Kafka topics (topic.prefix in Debezium 2.0 and later). |
| database.include.list | Comma-separated list of databases to monitor. |
| table.include.list | Comma-separated list of tables to monitor, each written as database.table. |
| include.schema.changes | Whether to include schema changes (true or false). |
| snapshot.mode | Defines the behavior for the initial snapshot. |
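A configuration built from these properties might look like the following sketch, which assembles the settings in Python and prints them as the JSON document Kafka Connect expects. Every value is a placeholder (host names, the connector name, the server ID, the topic names). The property names assume Debezium 2.x, and the two schema history properties are an addition that the list above does not cover but that the MySQL connector needs to store schema history.

```python
import json

# All values below are placeholders for illustration.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.internal",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "password",
        # Must differ from the server-id of every MySQL server and replica.
        "database.server.id": "184054",
        # Debezium 2.x name; versions before 2.0 use "database.server.name".
        "topic.prefix": "inventory",
        "database.include.list": "inventory",
        "table.include.list": "inventory.customers,inventory.orders",
        "include.schema.changes": "true",
        "snapshot.mode": "initial",
        # Schema history settings (Debezium 2.x names), not listed above.
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory",
    },
}

print(json.dumps(connector, indent=2))
```

With these settings, change events for the customers table would be written to a topic named after the prefix, database, and table, here inventory.inventory.customers.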

To connect the Debezium Connector:

  1. Create a Configuration File

    1. Create a JSON or properties file containing the above configuration.

  2. Deploy the Connector

    1. Place the configuration file in the directory used by your Kafka Connect worker.

    2. Use the Kafka Connect REST API to deploy the connector.

  3. Verify the Connection

    1. Check the Kafka Connect logs to ensure that the connector is running and reading the binary logs.
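The deployment step can be sketched with Python's standard library. Kafka Connect registers a connector when it receives a POST to its /connectors endpoint with a JSON body containing name and config. The worker address below (localhost on port 8083, the default REST port) and the connector name are assumptions, and the helper only builds the request so the example can run without a live worker.

```python
import json
import urllib.request


def build_create_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /connectors request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_create_request(
    "http://localhost:8083",  # assumed Kafka Connect worker address
    "inventory-connector",    # hypothetical connector name
    {"connector.class": "io.debezium.connector.mysql.MySqlConnector"},
)
print(request.get_method(), request.full_url)
# To actually send it against a running worker: urllib.request.urlopen(request)
```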


4. Deploying the Connector

  • Start your Kafka Connect worker and deploy the connector configuration.

  • Verify the connector logs to ensure that it’s reading the binlog and streaming changes.
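Alongside the logs, the Kafka Connect REST API reports connector health at GET /connectors/<name>/status. The sketch below assumes you have already fetched and parsed that JSON response; the connector_is_healthy helper and the sample response are illustrative, not part of any library.

```python
def connector_is_healthy(status: dict) -> bool:
    """True when the connector and every one of its tasks report RUNNING."""
    if status.get("connector", {}).get("state") != "RUNNING":
        return False
    tasks = status.get("tasks", [])
    # A connector with no tasks is not streaming anything yet.
    return bool(tasks) and all(task.get("state") == "RUNNING" for task in tasks)


# Sample response shaped like the output of GET /connectors/<name>/status.
sample_status = {
    "name": "inventory-connector",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.5:8083"},
    "tasks": [{"id": 0, "state": "FAILED", "worker_id": "10.0.0.5:8083"}],
}
print(connector_is_healthy(sample_status))
```

A FAILED task usually includes a trace field in the same response, which is a quicker starting point than searching the worker logs.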


Tips for Reliable CDC

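  1. Monitor Binlog Size and Retention: Use the binlog_expire_logs_seconds setting to control how long binary logs are retained. Keep them longer than the longest outage you expect for the connector; if the binlog position it needs has already been purged, the connector cannot resume from where it left off and needs a new snapshot.

  2. Use Dedicated Server IDs: Each MySQL server in a replication topology must have a unique server-id, and the connector's database.server.id must not match any of them, because the connector joins the topology as another replication client.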
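Both tips can be turned into simple checks. The sketch below uses two hypothetical helpers: one compares a binlog_expire_logs_seconds value against the longest connector outage you want to survive, and the other looks for server IDs that are used more than once. The names and numbers in the example are made up.

```python
from collections import Counter


def retention_covers_outage(expire_seconds: int, max_outage_hours: float) -> bool:
    """True if binlogs are kept at least as long as the longest expected outage."""
    if expire_seconds == 0:
        # In MySQL 8.0, a value of 0 disables automatic purging.
        return True
    return expire_seconds >= max_outage_hours * 3600


def duplicate_server_ids(server_ids: dict) -> dict:
    """Map each server ID used more than once to the names that share it."""
    counts = Counter(server_ids.values())
    return {
        sid: sorted(name for name, value in server_ids.items() if value == sid)
        for sid, count in counts.items()
        if count > 1
    }


# Example: 30 days of retention against a 48-hour outage budget.
print(retention_covers_outage(2592000, 48))
# Example: the connector's database.server.id collides with a replica.
print(duplicate_server_ids({"mysql-primary": 1, "mysql-replica": 2, "debezium": 2}))
```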


Conclusion

Setting up the MySQL Debezium Connector comes down to three things: a correctly configured binary log, a replication user with the right privileges, and a connector configuration deployed to Kafka Connect. By following the steps outlined above, you can build reliable CDC that lets your applications respond to database changes in near real time.
