Optimizing Bulk JSON to SQL INSERT Performance

Learn performance optimization techniques for bulk inserting JSON data into SQL databases. Covers batch sizing, transaction wrapping, COPY commands, index management, and parallelism.

When converting large JSON datasets (thousands to millions of records) into SQL, naive row-by-row INSERT is prohibitively slow. Several techniques can improve import speed by 10x to 100x.

Batch INSERT Sizing

Instead of one INSERT per row, batch rows into multi-value statements:

-- Slow: one INSERT per row
INSERT INTO events (id, type) VALUES (1, 'click');
INSERT INTO events (id, type) VALUES (2, 'view');
INSERT INTO events (id, type) VALUES (3, 'click');

-- Fast: batched INSERT
INSERT INTO events (id, type) VALUES
  (1, 'click'),
  (2, 'view'),
  (3, 'click');

Optimal batch size varies by database but is typically 500-5,000 rows per statement; very large batches can hit statement-size or bound-parameter limits and yield diminishing returns.
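As a sketch, the chunking can be done client-side before any SQL is emitted. The function below is illustrative (the table and column names are hypothetical, and the quoting handles only simple numeric and string values; in production, a driver's parameter binding is safer than string-building):

```python
import json

def batched_insert_statements(records, table, columns, batch_size=1000):
    """Yield multi-value INSERT statements, batch_size rows per statement."""
    def sql_literal(value):
        if value is None:
            return "NULL"
        if isinstance(value, (int, float)):
            return str(value)
        # Escape embedded single quotes by doubling them
        return "'" + str(value).replace("'", "''") + "'"

    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        rows = ",\n  ".join(
            "(" + ", ".join(sql_literal(r[c]) for c in columns) + ")"
            for r in chunk
        )
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n  {rows};"

events = json.loads('[{"id": 1, "type": "click"}, {"id": 2, "type": "view"}]')
for stmt in batched_insert_statements(events, "events", ["id", "type"]):
    print(stmt)
```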

Transaction Wrapping

Wrap all inserts in a single transaction to avoid per-statement commit overhead:

BEGIN;
INSERT INTO events (id, type) VALUES (1, 'click'), (2, 'view'), ...;
INSERT INTO events (id, type) VALUES (5001, 'click'), (5002, 'view'), ...;
COMMIT;
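The same pattern from client code can be demonstrated with Python's built-in sqlite3 module (SQLite stands in here for any engine, and the schema is illustrative): one explicit transaction wraps the whole batch, so there is a single commit instead of one per statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT)")

rows = [(i, "click" if i % 2 else "view") for i in range(1, 10001)]

# One explicit transaction: a single commit covers all 10,000 rows,
# instead of one commit per INSERT statement.
with conn:  # BEGIN ... COMMIT (ROLLBACK on error)
    conn.executemany("INSERT INTO events (id, type) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 10000
```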

COPY Command (PostgreSQL)

For maximum speed, PostgreSQL's COPY command reads CSV/TSV directly:

COPY events (id, type, timestamp) FROM STDIN WITH (FORMAT csv);
1,click,2024-06-15
2,view,2024-06-15
\.

The converter can generate COPY-compatible output from JSON arrays.
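As a rough sketch of that step, a JSON array can be rendered as COPY-ready CSV with Python's standard csv module (the function name and column list are illustrative):

```python
import csv
import io
import json

def json_to_copy_csv(json_text, columns):
    """Render a JSON array of objects as CSV rows for COPY ... FROM STDIN WITH (FORMAT csv)."""
    records = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for record in records:
        writer.writerow([record.get(c) for c in columns])
    return buf.getvalue()

payload = ('[{"id": 1, "type": "click", "timestamp": "2024-06-15"},'
           ' {"id": 2, "type": "view", "timestamp": "2024-06-15"}]')
print(json_to_copy_csv(payload, ["id", "type", "timestamp"]))
```

With psycopg2, a string like this can be streamed to the server via `copy_expert`.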

LOAD DATA INFILE (MySQL)

MySQL's equivalent for bulk loading:

LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

Index and Constraint Management

Index maintenance and constraint checking add per-row overhead. Dropping secondary indexes before a bulk insert (and recreating them afterwards), or temporarily disabling triggers and foreign-key checks, can dramatically improve speed:

-- Before bulk insert
ALTER TABLE events DISABLE TRIGGER ALL;     -- PostgreSQL
SET FOREIGN_KEY_CHECKS = 0;                 -- MySQL

-- ... bulk inserts ...

-- After bulk insert
ALTER TABLE events ENABLE TRIGGER ALL;
SET FOREIGN_KEY_CHECKS = 1;
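The drop-and-recreate pattern for secondary indexes can be sketched with sqlite3 (again, SQLite stands in for any engine; the point is that one bulk index build replaces thousands of incremental index updates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, type TEXT)")
conn.execute("CREATE INDEX idx_events_type ON events (type)")

rows = [(i, "click") for i in range(5000)]

# Drop the secondary index, bulk insert, then rebuild the index once.
conn.execute("DROP INDEX idx_events_type")
with conn:
    conn.executemany("INSERT INTO events (id, type) VALUES (?, ?)", rows)
conn.execute("CREATE INDEX idx_events_type ON events (type)")

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 5000
```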

Performance Comparison

| Method                       | ~Rows/second (PostgreSQL) |
|------------------------------|---------------------------|
| Individual INSERTs           | 100-500                   |
| Batched INSERTs (1000 rows)  | 10,000-50,000             |
| COPY command                 | 100,000-500,000           |

Use Case

When loading a 500MB JSON export from an analytics service into a PostgreSQL data warehouse for reporting, bulk insert optimization reduces import time from hours to minutes. This is critical for nightly ETL jobs and data pipeline reliability.
