Background
TiCDC is the component TiDB uses to synchronize data to various downstream systems. Data integrity is especially important during this synchronization, but TiCDC does not yet support end-to-end data integrity verification.
Spec
Provide the following cluster-level boolean option on the TiDB side.
tidb_enable_row_level_checksum = [true|false] # the default value is false.
SET GLOBAL tidb_enable_row_level_checksum = true;
After the customer enables this option, every change to a row in a non-system database appends an invisible field that stores a checksum computed from the content of the row. This invisible field exists only for data correctness checking and is transparent to the customer.
TiCDC and end users can use this checksum value to verify data integrity.
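To make the verification idea concrete, here is a minimal sketch of computing a per-row checksum on write and re-checking it on the receiving side. It assumes CRC32 (IEEE) as the checksum. The `Column` type, the byte layout fed into the checksum, and the names `rowChecksum` and `verifyRow` are hypothetical and chosen only for illustration; they are not the actual TiDB or TiCDC implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Column is a hypothetical, simplified column value: an ID plus its encoded bytes.
type Column struct {
	ID    int64
	Value []byte
}

// rowChecksum folds every column into one CRC32 (IEEE) value.
// The layout (8-byte little-endian column ID, then the raw value) is an
// assumption for illustration. Columns must be passed in a fixed order.
func rowChecksum(cols []Column) uint32 {
	var sum uint32
	var idBuf [8]byte
	for _, c := range cols {
		binary.LittleEndian.PutUint64(idBuf[:], uint64(c.ID))
		sum = crc32.Update(sum, crc32.IEEETable, idBuf[:])
		sum = crc32.Update(sum, crc32.IEEETable, c.Value)
	}
	return sum
}

// verifyRow reports whether the received columns still match the checksum
// that was stored with the row.
func verifyRow(cols []Column, expected uint32) bool {
	return rowChecksum(cols) == expected
}

func main() {
	row := []Column{{ID: 1, Value: []byte("alice")}, {ID: 2, Value: []byte("42")}}
	sum := rowChecksum(row)
	fmt.Println("intact:", verifyRow(row, sum))

	// Simulate a corrupted value arriving downstream.
	corrupted := []Column{{ID: 1, Value: []byte("alicf")}, {ID: 2, Value: []byte("42")}}
	fmt.Println("corrupted:", verifyRow(corrupted, sum))
}
```

The point of the design is that the checksum travels with the row, so any consumer that can decode the columns can recompute the value and compare it without a second round trip to the source.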
Development tracking for the TiDB part
- tidb_row_checksum to return the checksum value of a row. (*: add tidb_row_checksum() as a builtin function #43479)
- Let tidb be aware of the origin state (none or public) of a column if its current state is not public. -- we always append two checksums if there is a column whose state is not public, thus no need to know the direction of state transform.
- tidb_enable_row_level_checksum to enable or disable the checksum calculation when inserting new rows. When it's enabled, multi-schema change will be blocked.
- add column schema change, and generate two checksum values if necessary.
- drop column schema change, and generate two checksum values if necessary.
- modify column schema change, and generate two checksum values if necessary.
- tablecodec package when EncodeRow function is used. Calculate the CRC32 result for each column when executing encodeRowCols; a checksum result is returned finally.
- chunckDecoder in tidb if necessary. (util: extend row format with checksum #42859)
- internal_handle_request and PointGetter for chunk encoding processing in tikv if necessary. (storage: add checksum logic in row slice, add cop and get test cases tikv/tikv#14611)
- Skip the checksum part processing in tiflash if necessary. -- tiflash decodes a row value by appendRowV2ToBlockImpl. It iterates columns and decodes them one by one here, that is, the checksum part shall be already discarded in the current implementation.
Development tracking for the TiKV part
Development tracking for the TiCDC part
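The TiDB tracking items above mention that two checksums are appended whenever some column is in a non-public state, for example during an add column, drop column, or modify column schema change. The following is a rough sketch of that idea, not the actual encoder. It assumes CRC32 (IEEE), and it assumes the first checksum covers public columns only while the second covers all columns; the source does not spell out which column sets the two checksums cover. The `Col` type and `rowChecksums` function are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// Col is a hypothetical, simplified column. Public is false while a schema
// change (add/drop/modify column) is still in progress for it.
type Col struct {
	Name   string
	Public bool
	Value  []byte
}

// rowChecksums returns one checksum when every column is public, and two
// when any column is not: the first covers public columns only, the second
// covers all columns. Which column sets the real checksums cover is an
// assumption of this sketch.
func rowChecksums(cols []Col) []uint32 {
	var publicOnly, all uint32
	hasNonPublic := false
	for _, c := range cols {
		all = crc32.Update(all, crc32.IEEETable, c.Value)
		if c.Public {
			publicOnly = crc32.Update(publicOnly, crc32.IEEETable, c.Value)
		} else {
			hasNonPublic = true
		}
	}
	if hasNonPublic {
		return []uint32{publicOnly, all}
	}
	return []uint32{all}
}

func main() {
	stable := []Col{{"id", true, []byte("1")}, {"name", true, []byte("alice")}}
	fmt.Println("checksums without schema change:", len(rowChecksums(stable)))

	// A column that is still being added is not yet public.
	adding := append(stable, Col{"age", false, []byte("30")})
	fmt.Println("checksums during schema change:", len(rowChecksums(adding)))
}
```

Under these assumptions, a reader that only sees the public columns can verify against the first value and a reader that sees every column can verify against the second, which is why the direction of the state change does not need to be known.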