SensitiveInfoCheck

Available in Liquibase 5.1 and later, the SensitiveInfoCheck Policy check helps you identify and prevent sensitive personal identifiable information (PII) from being deployed through your database changelogs. This check employs the Phileas library of Identifiers to scan INSERT and UPDATE SQL statements in your changesets for common PII identifiers, such as social security numbers, credit card numbers, email addresses, and phone numbers. When the check detects potential PII, it flags the changelog and can block deployment, protecting your organization from accidentally committing sensitive data to version control or deploying it to your databases. You can customize which PII identifiers to detect and configure whether violations should trigger warnings or hard stops in your deployment pipeline. This automated detection also supports compliance efforts by providing audit-ready evidence that your team is actively monitoring and preventing PII exposure in database changes.

Scope

Database

changelog

Any SQL database

Known limitations

The SensitiveInfoCheck has the following detection limitations:

  • Phone numbers:

    The PHONE_NUMBER identifier does not detect vanity phone numbers containing letters instead of digits (example: 1-800-FLOWERS).

  • Email addresses:

    The EMAIL_ADDRESS identifier does not detect email addresses missing a top-level domain (TLD), even when you set validateTLD=false

    Example: user@domain without .com, .org, etc..

  • MAC addresses:

    The MAC_ADDRESS identifier detects standard colon-separated (00:1B:44:11:3A:B7) and hyphen-separated (00-1B-44-11-3A-B7) formats but does not recognize Cisco dot notation (001B.4411.3AB7).

  • Address suffixes: The underlying Phileas library requires a street suffix for address detection. Addresses without suffixes, such as Street, Way, Blvd, etc, will not be identified as PII, even though they are valid USA addresses.

Before you begin

  • Create a Check Settings file

  • Update any existing Check Settings file to 5.1+ by running liquibase checks show --auto-update=ON to ensure the SensitiveInfoCheck appears on the list of checks.

  • (Maven users only) Add the liquibase-checks dependency to your pom.xml file. See Add extensions with Maven for more information.

  • Ensure you have Java 17+ installed. If you used the Liquibase Installer, Java is included automatically. Otherwise, you must install Java manually.

Procedure

1

Enable the SensitiveInfoCheck

This check is disabled by default. To enable it, run the checks enable command: liquibase checks enable --check-name=SensitiveInfoCheck

2

Create a Checks settings file if it isn't created already.

WARNING: No default checks-settings file detected. Would you like to create and automatically use 'liquibase.checks-settings.conf' ? [1] Yes, create and use default check-settings file in the current working directory [2] Yes, create and use default check-settings file at a path I will specify [3] Just create default check-settings file in the CWD, do not use, and exit command [n] No, do not create default check-settings file, and exit command

Choose option one or two, depending on where you want your Checks settings file to be located.

If you already have an existing Checks settings file, this message will appear:

ALERT: A valid Liquibase Checks Settings file already exists at '[path/to/liquibase.checks-settings.conf]'. Do you want to: [O] Overwrite the existing file [C] Continue using the existing file

Choose either option to add the new check to the Checks settings file you want to use. If you get an error message that states: One or more specified checks not found: 'sensitiveInfoCheck'.

  • Update the file to 5.1+ by running the command below to ensure the SensitiveInfoCheck appears on the list of checks. liquibase checks show --auto-update=ON

  • Run the command in step 1 again.

3

Choose the PII identifier you want to search for.

In this step, we will choose a Phileas Identifier to flag which personal information the check triggers. We recommend that you name your check based on the type of identifier that you want to flag, so that you know what sensitive information is being committed. If you are flagging multiple identifiers, ensure the title reflects all of them. The table below can help you group your identifiers into bundles to facilitate the search for specific items in your database.

Examples:

  • If you only want an individual identifier to trigger, you could name it: SSN_SensitiveInfoCheck to flag that Social Security Numbers exist in your changelog.

  • If you want to flag a bundle of identifiers, you could name it: Financial_Info_SensitiveInfoCheck to flag that any financial information such as credit card numbers or routing numbers exist in your changelog.

  • If you want to flag all identifiers, you could name it: AllSensitiveInfoCheck to flag that any identifiers exist in your changelog.

The example message appears in the terminal.

Example message

The check 'SensitiveInfoCheck' can not be enabled directly because one or more fields does not have a default value. Create a copy of this check and initiate the customization workflow. Give your check a short name for easier identification (up to 64 alpha-numeric characters only) [SensitiveInfoCheck1]:

Once you add the desired name of the copied check in the CLI, Liquibase will make a copy of the SensitiveInfoCheck, and you will see the example output appear in the terminal.

Example output

New check 'SSNSensitiveInfoCheck' created from 'SensitiveInfoCheck'

Personal Information

Identifier

What It Detects

Example

SSN

US Social Security Numbers in XXX-XX-XXXX format

123-45-6789

FIRST_NAME

Common first names from dictionary

John

SURNAME

Surnames from dictionary

Berezenskyi

AGE

Age references in text

"Patient is 67 years old"

DATE

Dates in various formats (MM/DD/YYYY, Month Day Year)

02/29/2023

Contact Information

Identifier

What It Detects

Example

EMAIL_ADDRESS

Email addresses in standard format

alice.johnson@example.com

PHONE_NUMBER

Phone numbers in various formats

555-123-4567

PHONE_NUMBER_EXTENSION

Phone extensions (ext, x)

"Call ext 1234"

Location Data

Identifier

What It Detects

Example

ZIP_CODE

US ZIP codes in 5-digit or ZIP+4 format

90210

CITY

US city names from dictionary

New York

STATE

US state full names

California

STATE_ABBREVIATION

US state two-letter codes

CA

COUNTY

US county names

Los Angeles County

STREET_ADDRESS

Street addresses with numbers and suffixes

123 Main Street

Financial Information

Identifier

What It Detects

Example

CREDIT_CARD

Credit card numbers (Visa, MasterCard, Amex, Discover)

4532015112830366

BANK_ROUTING_NUMBER

9-digit US bank routing numbers

021000021

IBAN_CODE

International Bank Account Numbers

GB33BUKB20201555555555

BITCOIN_ADDRESS

Bitcoin addresses (P2PKH, P2SH, Bech32)

1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa

CURRENCY

Currency amounts with symbols ($, €, £)

$1,234.56

Government IDs

Identifier

What It Detects

Example

DRIVERS_LICENSE

Driver's license numbers

DL123456789

PASSPORT_NUMBER

Passport numbers (9 digits or alphanumeric)

123456789

VIN

Vehicle Identification Numbers (17 characters)

1HGBH41JXMN109186

Medical Information

Identifier

What It Detects

Example

HOSPITAL

Hospital names from dictionary

Mayo Clinic

HOSPITAL_ABBREVIATION

Hospital abbreviations (MGH, CHOP, etc.)

MGH

PHYSICIAN_NAME

Physician names with Dr. title

Dr. Anderson

Network Information

Identifier

What It Detects

Example

IP_ADDRESS

IPv4 and IPv6 addresses

192.168.1.100

MAC_ADDRESS

MAC addresses in colon-separated format

00:1B:44:11:3A:B7

Other Identifiers

Identifier

What It Detects

Example

TRACKING_NUMBER

Shipping tracking numbers (UPS, FedEx, USPS)

1z1234567890123456

URL

URLs with various protocols (http, https, ftp)

https://www.example.com

4

Set the Severity level.

This message will appear in the terminal: Set the Severity to return a code of 0-4 when triggered. (options: 'INFO'|0, 'MINOR'|1, 'MAJOR'|2, 'CRITICAL'|3, 'BLOCKER'|4)? [INFO]:

In this example, we will set the severity to 0.

5

Set the Identifiers you want to search the Changelog for.

This message will appear in the terminal: Set 'IDENTIFIERS' as a comma-separated list of identifier types (options: SSN, ZIP_CODE, CREDIT_CARD, EMAIL_ADDRESS, PHONE_NUMBER, PHONE_NUMBER_EXTENSION, IP_ADDRESS, MAC_ADDRESS, FIRST_NAME, SURNAME, AGE, DATE, CITY, STATE, STATE_ABBREVIATION, COUNTY, STREET_ADDRESS, BANK_ROUTING_NUMBER, BITCOIN_ADDRESS, CURRENCY, DRIVERS_LICENSE, HOSPITAL, HOSPITAL_ABBREVIATION, IBAN_CODE, PASSPORT_NUMBER, PHYSICIAN_NAME, TRACKING_NUMBER, URL, VIN, IDENTIFIER):

In this example, we will check for SSN in our changelog by typing SSN into the CLI and pressing enter. If you want to set the identifier to any of the checks that are in the Optional additional identifier configurations table below, be aware there are optional configurations you will need to set to use these identifiers successfully. Note: This policy check must include one or multiple identifiers. Use a comma-separated list to specify more than one. If you choose to flag multiple identifiers, we recommend grouping them by category, such as Financial Information or Location Data. It is easier to manage and analyze them this way instead of flagging all of them at once.

Optional additional identifier configurations

If you choose one of the identifiers below, please note that you will have configuration options to select from.

PII Type

Configuration Option

Default

Description

ZIP_CODE

ZIP_CODE_VALIDATE

false

Enables census database validation

CREDIT_CARD

CREDIT_CARD_VALIDATE

true

Enables Luhn algorithm validation

CREDIT_CARD_IGNORE_UNIX_TIMESTAMP

false

Excludes timestamp patterns

EMAIL_ADDRESS

EMAIL_VALIDATE_RFC

true

Uses strict RFC validation

EMAIL_VALIDATE_TLD

false

Validates top-level domains

DATE

DATE_VALIDATE

false

Validates calendar dates

IBAN_CODE

IBAN_ALLOW_SPACES

false

Uses spaced IBAN format (limits to 22-char IBANs)

IDENTIFIER

IDENTIFIER_PATTERN

N/A

Accepts any regex pattern

6

Set the message you want to display when the check triggers.

You will see the default message in the terminal. The default message pinpoints the location of the detected SSN, which is usually all you need. You can customize it later to show more or less detail if you prefer.

Set 'MESSAGE' [Policy violation: raw {filter_type} detected in {stmt_type} at statement #{statement_number}, line {line_number}, positions {start}-{end}.{file_line_info}]:

Press Enter to save the message formatting.

Results

The command is enabled successfully. Customization complete. Review the table below to confirm your changes. +-----------------------+-----------+--------+----------+----------+--------------------------------+--------------------------------+ | Short Name | Scope | Type | Status | Severity | Customization | Description | +-----------------------+-----------+--------+----------+----------+--------------------------------+--------------------------------+ | SensitiveInfoCheck | changelog | sql, | disabled | 0 | IDENTIFIERS = null | This check triggers when PII | | | | xml, | | | MESSAGE = Policy violation: | (Personally Identifiable | | | | yaml, | | | raw {filter_type} detected in | Information) or PHI (Protected | | | | json | | | {stmt_type} at statement | Health Information) are | | | | | | | #{statement_number}, line | detected in changelogs or in | | | | | | | {line_number}, positions | your database. Copy this check | | | | | | | {start}-{end}.{file_line_info} | to configure the subset of | | | | | | | | 'identifiers' you wish to | | | | | | | | detect. | +-----------------------+-----------+--------+----------+----------+--------------------------------+--------------------------------+ | SSNSensitiveInfoCheck | changelog | sql, | enabled | 0 | IDENTIFIERS = SSN | This check triggers when PII | | | | xml, | | | MESSAGE = Policy violation: | (Personally Identifiable | | | | yaml, | | | raw {filter_type} detected in | Information) or PHI (Protected | | | | json | | | {stmt_type} at statement | Health Information) are | | | | | | | #{statement_number}, line | detected in changelogs or in | | | | | | | {line_number}, positions | your database. Copy this check | | | | | | | {start}-{end}.{file_line_info} | to configure the subset of | | | | | | | | 'identifiers' you wish to | | | | | | | | detect. | +-----------------------+-----------+--------+----------+----------+--------------------------------+--------------------------------+ Liquibase command 'checks enable' was executed successfully.

7

Run the checks run command

After enabling and customizing the SensitiveInfoCheck, run it against your changelog to detect the identifiers you've configured. Your customized check is stored in your checks settings file.

To run checks with a custom settings file, use the --checks-settings-file parameter. If you don't specify this parameter, Liquibase uses the default liquibase.checks-settings.conf file. You can also use the --changelog-file flag to target a specific changelog.

Checks settings file example: liquibase checks run --checks-settings-file=path_to_my_file_name.yaml

Checks settings file and specific changelog file example: liquibase checks run --checks-settings-file=<path_to_my_file_name>.yaml --changelogfile --<your_changelog_here>.sql

Note: Be sure to replace path_to_my_file_name.yaml with the path to the settings file where you included the new policy check. Also be sure to replace your_changelog_here with your real changelog name.

Example output - No violation

Example output - Check violation found Notice that the customized SSNSensitiveInfoCheck flagged two violations, and their messages tell you where the violation occurred.CHANGELOG CHECKS ---------------- Checks completed validation of the changelog and found the following issues:Check Name: Sensitive Info Check| PII and PHI Check (SSNSensitiveInfoCheck) Changeset ID: raw Changeset Filepath: changelogIncludeAllIdentifiers.sql Check Severity: INFO (Return code: 0) Message: Policy violation: raw SSN detected in INSERT at statement #1, line 1, positions 63-74.Check Name: Sensitive Info Check| PII and PHI Check (SSNSensitiveInfoCheck) Changeset ID: raw Changeset Filepath: changelogIncludeAllIdentifiers.sql Check Severity: INFO (Return code: 0) Message: Policy violation: raw SSN detected in INSERT at statement #25, line 2, positions 10-19.Changesets Validated: in changelogIncludeAllIdentifiers.sql ID: raw; Author: includeAllChecks run against each changeset: Sensitive Info Check| PII and PHI Check (Short names: SSNSensitiveInfoCheck)

liquibase checks run --checkname < yourCustomizedCheckName > --changelogfile --< yourChangelogHere >.sql

liquibase checks run --checks-settings-file=my_file_name.yaml
8

Analyze the results in the terminal output.

When the check finds an item flagged by the customized SensitiveInfoCheck in the changelog, you will see an output similar to this from the terminal:

CHANGELOG CHECKS ---------------- Checks completed validation of the changelog and found the following issues: Check Name: Sensitive Info Check| PII and PHI Check (SSNSensitiveInfoCheck) Changeset ID: raw Changeset Filepath: changelogIncludeAllIdentifiers.sql Check Severity: INFO (Return code: 0) Message: Policy violation: raw SSN detected in INSERT at statement #1, line 1, positions 63-74. Check Name: Sensitive Info Check| PII and PHI Check (SSNSensitiveInfoCheck) Changeset ID: raw Changeset Filepath: changelogIncludeAllIdentifiers.sql Check Severity: INFO (Return code: 0) Message: Policy violation: raw SSN detected in INSERT at statement #25, line 2, positions 10-19. Changesets Validated: in changelogIncludeAllIdentifiers.sql ID: raw; Author: includeAll Checks run against each changeset: Sensitive Info Check| PII and PHI Check (Short names: SSNSensitiveInfoCheck)

Notice that the customized SSNSensitiveInfoCheck flagged two violations and their messages tell you where the violation occurred.

Results

Now you can see how the check runs and what the results look like when it flags an issue within the changelog.

You can then remove, mask, or handle the sensitive information appropriately before committing changes to your database.