Published on 2023-08-18 by Marat Bogatyrev.
Tweet
Sometimes, when testing backup utilities like WAL-G, Pgbackrest, and others, it’s necessary to intentionally corrupt data. This is because the features advertised by these tools don’t always correctly verify checksum corruption, for example, as in the case of WAL-G, where the tool fails to detect bad checksums when using the -v flag.
Step-by-Step Guide to Corrupting Data
- Create DB and Test Table
Let’s create a test database and a table for future data corruption tests.
sudo -u postgres psql -c "CREATE DATABASE maratos_db;" sudo -u postgres psql -d maratos_db -c "CREATE TABLE public.test_table (id INT4 NOT NULL,name VARCHAR) WITH (fillfactor = 100);"
- Add Some Data
sudo -u postgres psql -d maratos_db -c "INSERT INTO public.test_table (id, name) VALUES (1, 'MARAT'); INSERT INTO public.test_table (id, name) VALUES (2, 'OLGA'); INSERT INTO public.test_table (id, name) VALUES (3, 'MISHA');" sudo -u postgres pgbench -i -s 1 -d maratos_db
- Locate the File in the PostgreSQL Data Directory
To find the file associated with our table, run the following command:
sudo -u postgres psql -Atc "select format('%s/%s', current_setting('data_directory'), pg_relation_filepath('test_table'))" maratos_db sudo -u postgres psql -d maratos_db -Atc "select format('%s/%s', current_setting('data_directory'), pg_relation_filepath('pgbench_accounts'))"
Example output:
/data/pg_data/postgres/base/16495/16707 /data/pg_data/postgres/base/16495/16710
- Create a Checkpoint
Run the following command to create a checkpoint:
sudo -u postgres psql -c "checkpoint" maratos_db
- Check for Data in the File
Use hexdump tool to view the file’s contents:
hexdump -C /data/pg_data/postgres/base/16495/16707
- Corrupt Data in the Files. We are corrupting data in two tables using different methods.
- test_table: We replace the existing data at a specific location within the file. Specifically, we overwrite data starting at byte 8000 in the file with a new value “Hello1”, ensuring no truncation occurs.
- pgbench_accounts: We append 12 bytes of random data to the existing content of the file. By doing so, we alter the file size and misalign all subsequent data, including page headers, effectively breaking the block structure.
echo -n "Hello1" | dd conv=notrunc oflag=seek_bytes seek=8000 bs=6 count=1 of=/var/lib/postgresql/16/main/base/16495/16707 rm tempfile dd if=/var/lib/postgresql/16/main/base/16495/16710 of=tempfile bs=8190 count=1 dd if=/dev/urandom bs=12 count=1 >> tempfile dd if=/var/lib/postgresql/16/main/base/16495/16710 bs=8190 skip=1 oflag=seek_bytes >> tempfile mv tempfile /var/lib/postgresql/16/main/base/16495/16710 chown postgres:postgres /var/lib/postgresql/16/main/base/16495/16710
- Check the Data in the File Again
Check the file contents again with hexdump:
hexdump -C /data/pg_data/postgres/base/16495/16707
- Restart PostgreSQL and Access the Data
Restart PostgreSQL and attempt to access the data, then make a backup with WAL-G etc:
systemctl restart postgresql
Tweet