Chunk CSVs by Row Count to Fit ETL Batch Size Limits
ETL pipelines with batch-load stages often enforce strict record limits per transaction: the Salesforce Bulk API caps batches at 10,000 records, and many SQL bulk-insert tools perform best under 50,000 rows. When source CSVs arrive oversized, the load stage fails or degrades. Splitting the file before the load stage is the cleanest fix.
Ad-hoc splitting with awk or Python works but adds fragile, undocumented steps to your pipeline. A reliable external tool with a consistent interface reduces pipeline complexity and eliminates the need to version-control one-off split scripts. Deliteful processes splits server-side, requires no local dependencies, and produces sequentially numbered output files that sort correctly for ordered batch loading.
Every output chunk retains the source header row, which means load-stage scripts that reference column names by header do not require modification. Row order is preserved across all chunks, so sequence-sensitive loads — event logs, audit trails, time-series records — arrive at the destination in the correct order.
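The behavior described above, repeating the header in every chunk while preserving row order, can be sketched locally in a few lines of Python. This is a minimal illustration of the splitting technique, not Deliteful's implementation; the function name and `chunk` prefix are illustrative.

```python
import csv

def split_csv(src_path, max_rows, prefix="chunk"):
    """Split src_path into files of at most max_rows data rows each.
    The header row is repeated in every chunk and row order is preserved."""
    paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # retained at the top of every output chunk
        out, writer, row_count, chunk_idx = None, None, 0, 0
        for row in reader:
            # Start a new chunk on the first row or when the limit is hit
            if writer is None or row_count == max_rows:
                if out:
                    out.close()
                chunk_idx += 1
                path = f"{prefix}_{chunk_idx}.csv"
                paths.append(path)
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                row_count = 0
            writer.writerow(row)
            row_count += 1
        if out:
            out.close()
    return paths
```

Because rows are written in the order they are read, sequence-sensitive data stays in source order across the numbered chunks.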
How it works
1. Upload the oversized source CSV
Upload the file that exceeds your ETL load stage's record or batch size limit.
2. Set the row limit to match your load target
Enter the maximum records your load stage accepts per batch: 10,000 for the Salesforce Bulk API; 50,000–100,000 is typical for most SQL loaders.
3. Download sequentially numbered chunks
Output files are numbered in order, and each includes the header row, ready to feed directly into your load stage.
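Before feeding downloaded chunks into the load stage, a short pre-load check can confirm that each file respects the batch limit and carries the expected header. This is a hypothetical helper, assuming chunk files named `chunk_1.csv`, `chunk_2.csv`, and so on; it sorts numerically so that `chunk_10.csv` is not processed before `chunk_2.csv` in shells that sort names lexicographically.

```python
import csv
import glob
import re

def chunks_in_order(pattern, max_rows, expected_header):
    """Yield chunk paths in numeric order, verifying the row limit
    and header of each file before it reaches the load stage."""
    def chunk_num(path):
        m = re.search(r"_(\d+)\.csv$", path)
        return int(m.group(1)) if m else 0

    for path in sorted(glob.glob(pattern), key=chunk_num):
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        assert rows[0] == expected_header, f"{path}: header mismatch"
        assert len(rows) - 1 <= max_rows, f"{path}: exceeds row limit"
        yield path
```

A load script can then iterate `chunks_in_order("chunk_*.csv", 10000, header)` and submit one batch per yielded path.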
Frequently asked questions
- What row limit should I use for Salesforce Bulk API loads?
- The Salesforce Bulk API v1 processes up to 10,000 records per batch. Set your max rows to 10,000 to produce correctly sized chunks for each API call.
- Are output files numbered in sequence?
- Yes. Output files are numbered sequentially (e.g., file_1.csv, file_2.csv) so they sort and load in the correct order.
- Does the tool sort or reorder rows?
- No. Row order from the source file is preserved exactly across all output chunks. This is critical for sequence-sensitive ETL loads.
- Can I use this for files produced by database dump tools?
- Yes. CSVs exported from PostgreSQL COPY, MySQL SELECT INTO OUTFILE, or similar tools are fully compatible as long as they are valid UTF-8 CSV files.
Create a free Deliteful account with Google and chunk your ETL source CSVs to the exact batch size your load stage requires.