Handling Nullable Columns in Seed Data
Learn how the seed generator handles nullable columns by inserting NULL values at a realistic frequency. Understand NOT NULL vs nullable behavior.
Detailed Explanation
Nullable Column Behavior
Real-world databases contain columns where values are optional. The seed generator simulates this by inserting NULL for nullable columns with a controlled probability.
How Nullable Detection Works
The parser checks each column definition for the NOT NULL constraint:
VARCHAR(255) NOT NULL→ always receives a valueVARCHAR(255)(no NOT NULL) → ∼10% chance of NULL per rowINTEGER DEFAULT 0(no NOT NULL) → ∼10% chance of NULL per row
Example
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
first_name VARCHAR(100) NOT NULL,
last_name VARCHAR(100) NOT NULL,
middle_name VARCHAR(100), -- nullable
phone VARCHAR(20), -- nullable
department VARCHAR(50) NOT NULL,
notes TEXT, -- nullable
hire_date DATE NOT NULL
);
In 100 rows of generated data, middle_name, phone, and notes will each have roughly 10 NULL values, while first_name, last_name, department, and hire_date will always have values.
Why 10% NULL Rate?
A 10% NULL rate is a pragmatic middle ground:
- Too few NULLs (1–2%) would rarely surface UI bugs related to missing data
- Too many NULLs (50%+) would make the dataset unrealistic
- 10% ensures NULLs appear in every significant dataset while keeping most rows complete
Columns That Are Never NULL
Regardless of the declared constraint, these columns never receive NULL:
- Primary key columns
- Auto-increment / SERIAL columns
- Columns explicitly marked
NOT NULL
Testing NULL Handling
NULL seed data helps verify:
- How your UI renders empty cells in tables
- Whether search queries handle NULL correctly (
IS NULLvs= '') - Form pre-population when editing a record with missing fields
- API serialization of null vs missing vs empty string
Use Case
Your application displays employee records in a data table. Optional fields like middle name and phone number are sometimes empty. You need seed data with realistic NULL distribution to test that the table renders gracefully, search filters work correctly, and export functions handle missing values.