Seed Data with Unique Constraint Columns
Learn how the seed generator handles UNIQUE columns like email and username. Understand collision probability and strategies for large datasets.
Detailed Explanation
Unique Constraint Considerations
Columns with UNIQUE constraints require every value to be distinct. The seed generator produces data with high variance to minimize collisions, but does not strictly enforce uniqueness.
How Unique Columns Are Generated
The generator does not have a special "unique mode." Instead, it relies on the large combinatorial space of its data pools:
| Column Type | Combinatorial Space |
|---|---|
| Email | 40 first names × 40 last names × 99 numbers × 10 domains = 1,584,000 |
| Username | 40 names × 999 numbers = 39,960 |
| UUID | 16^32 ≈ 3.4 × 10^38 |
| Phone | ~800 area codes × 900 mid × 9000 end > 6 billion |
Collision Probability
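For small seeds the collision probability is negligible, but it grows roughly with the square of the row count and depends heavily on the pool size. Assuming values are drawn uniformly and independently, the birthday approximation p ≈ 1 − e^(−n(n−1)/(2N)) for n rows and a pool of N values gives:
- 10 rows: about 0.003% for emails (N = 40 × 40 × 99 × 10 = 1,584,000) and about 0.1% for usernames (N = 39,960)
- 100 rows: about 0.3% for emails, about 12% for usernames
- 1,000 rows: about 27% for emails, effectively 100% for usernames
UUID and phone columns stay below 0.01% at all of these sizes.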
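To estimate the risk for your own row counts, the birthday approximation can be sketched in a few lines of Python. This is an illustration only: the function name `collision_probability` is hypothetical, the pool sizes are computed from the factors in the table above, and the model assumes uniform, independent draws, which may not match the generator's actual sampling.

```python
import math

def collision_probability(rows: int, pool_size: int) -> float:
    """Approximate chance that at least two of `rows` uniform, independent
    draws from `pool_size` distinct values are equal (birthday problem)."""
    if rows > pool_size:
        return 1.0  # pigeonhole: more rows than values guarantees a duplicate
    # 1 - exp(-n(n-1)/(2N)); expm1 keeps precision when the result is tiny
    return -math.expm1(-rows * (rows - 1) / (2 * pool_size))

# Pool sizes derived from the factors listed in the table (assumed uniform).
EMAIL_POOL = 40 * 40 * 99 * 10   # 1,584,000
USERNAME_POOL = 40 * 999         # 39,960

for rows in (10, 100, 500, 1000):
    email_p = collision_probability(rows, EMAIL_POOL)
    username_p = collision_probability(rows, USERNAME_POOL)
    print(f"{rows:>5} rows  email: {email_p:.3%}  username: {username_p:.3%}")
```

The smallest pool among a table's UNIQUE columns dominates the overall risk, so check that column first.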
What If Collisions Occur?
If you run the generated INSERT statements and encounter a unique constraint violation:
- Regenerate: Click Regenerate to get a new random seed, which produces entirely different data
- Reduce rows: Lower the row count to decrease collision probability
- Drop the constraint temporarily: Remove the UNIQUE constraint from the CREATE TABLE input, generate the data, then add the constraint back
UUID Columns
For UUID-type columns (or columns named uuid), the generator produces RFC 4122-format UUIDs. The space is so vast that collisions are vanishingly unlikely at any practical row count.
Practical Recommendation
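For development and testing scenarios with small seeds (up to about 100 rows), unique constraint collisions are unlikely for email, phone, and UUID columns. At 500 to 1,000 rows, expect duplicates in columns drawn from a small pool, such as usernames (39,960 possible values), and occasionally in emails. If you are generating data at that scale, or for a column with unusual uniqueness requirements (e.g., short codes, 2-character abbreviations), consider using the JSON output and post-processing it to remove duplicates.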
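One way to post-process the JSON output is sketched below. It assumes the output is a JSON array of row objects keyed by column name, which is a guess about the format; the helper `dedupe_rows` and the sample rows are hypothetical.

```python
import json

def dedupe_rows(rows, unique_columns):
    """Keep a row only if none of its unique-column values was already kept.
    None values are never treated as duplicates, mirroring how most
    databases let multiple NULLs coexist under a UNIQUE constraint."""
    seen = {col: set() for col in unique_columns}
    kept = []
    for row in rows:
        values = {col: row.get(col) for col in unique_columns}
        if any(v is not None and v in seen[col] for col, v in values.items()):
            continue  # clashes with an earlier row on at least one column
        for col, v in values.items():
            if v is not None:
                seen[col].add(v)
        kept.append(row)
    return kept

# Hypothetical sample standing in for the generator's JSON output.
sample = json.loads("""
[
  {"email": "a@example.com", "username": "ann1"},
  {"email": "b@example.com", "username": "ann1"},
  {"email": "a@example.com", "username": "bob2"},
  {"email": "c@example.com", "username": "cat3"}
]
""")

unique_rows = dedupe_rows(sample, ["email", "username"])
print(json.dumps(unique_rows, indent=2))
```

Dropping rows reduces the final row count, so generate a few more rows than you need if the target size matters.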
Use Case
Your users table has UNIQUE constraints on both email and username. Before running the generated seed data, you want to understand how likely collisions are and what to do if they occur, especially when generating 500+ rows for load testing.