Chunk CSVs by Row Count to Fit ETL Batch Size Limits

ETL pipelines with batch-load stages often enforce strict record limits per transaction: the Salesforce Bulk API caps batches at 10,000 records, and many SQL bulk-insert tools perform best under 50,000 rows. When source CSVs arrive oversized, the load stage fails or degrades. Splitting the file before the load stage is the cleanest fix.

Ad-hoc splitting with awk or Python works but adds fragile, undocumented steps to your pipeline. A reliable external tool with a consistent interface reduces pipeline complexity and eliminates the need to version-control one-off split scripts. Deliteful processes splits server-side, requires no local dependencies, and produces sequentially numbered output files that sort correctly for ordered batch loading.

Every output chunk retains the source header row, which means load-stage scripts that reference column names by header do not require modification. Row order is preserved across all chunks, so sequence-sensitive loads — event logs, audit trails, time-series records — arrive at the destination in the correct order.
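This header-preserving, order-preserving split is straightforward to sketch. The following is a minimal Python illustration of the same behavior (the `split_csv` helper and `file_N.csv` naming are illustrative, not Deliteful's implementation): every chunk repeats the source header, and rows stay in source order across chunks.

```python
import csv
import itertools
from pathlib import Path

def split_csv(src: str, max_rows: int, out_dir: str = ".") -> list[Path]:
    """Split src into chunks of at most max_rows data rows each.

    Every chunk repeats the source header row, and row order is
    preserved, so chunks load correctly in sequence.
    """
    paths = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is the header
        for i in itertools.count(1):
            # Take the next max_rows data rows without reordering them.
            chunk = list(itertools.islice(reader, max_rows))
            if not chunk:
                break
            path = Path(out_dir) / f"file_{i}.csv"  # sequential numbering
            with open(path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)   # every chunk keeps the header
                writer.writerows(chunk)   # source row order preserved
            paths.append(path)
    return paths
```

For example, `split_csv("events.csv", 10_000)` would produce `file_1.csv`, `file_2.csv`, and so on, each at most 10,000 data rows plus the header.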

How it works

  1. Upload the oversized source CSV

     Upload the file that exceeds your ETL load stage's record or batch-size limit.

  2. Set the row limit to match your load target

     Enter the maximum number of records your load stage accepts per batch: 10,000 for the Salesforce Bulk API; 50,000–100,000 is typical for most SQL loaders.

  3. Download sequentially numbered chunks

     Output files are numbered in order, and each includes the header row, ready to feed directly into your load stage.

Frequently asked questions

What row limit should I use for Salesforce Bulk API loads?
The Salesforce Bulk API v1 processes up to 10,000 records per batch. Set your max rows to 10,000 to produce correctly sized chunks for each API call.
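For capacity planning, the number of chunks (and therefore Bulk API batch calls) a split produces is a simple ceiling division. A quick sketch, with the 10,000-row default matching the Bulk API v1 batch limit (`batches_needed` is an illustrative helper, not part of any API):

```python
import math

def batches_needed(total_rows: int, max_rows: int = 10_000) -> int:
    """Number of chunk files, and Bulk API batches, a source file yields."""
    return math.ceil(total_rows / max_rows)

print(batches_needed(25_000))  # 25,000 rows -> 3 batches (10k + 10k + 5k)
```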
Are output files numbered in sequence?
Yes. Output files are numbered sequentially (e.g., file_1.csv, file_2.csv) so they sort and load in the correct order.
Does the tool sort or reorder rows?
No. Row order from the source file is preserved exactly across all output chunks. This is critical for sequence-sensitive ETL loads.
Can I use this for files produced by database dump tools?
Yes. CSVs exported from PostgreSQL COPY, MySQL SELECT INTO OUTFILE, or similar tools are fully compatible as long as they are valid UTF-8 CSV files.

Create a free Deliteful account with Google and chunk your ETL source CSVs to the exact batch size your load stage requires.