Pre-Ingestion File Size Reports for ETL and Data Pipeline Teams

ETL jobs that process unexpectedly large files can exhaust memory limits, exceed connector payload caps, or trigger timeout errors that are expensive to diagnose. Deliteful's File Size Report tool lets pipeline engineers check the exact byte size of every source file before ingestion — catching oversized inputs before they become runtime failures.

Source file sizes in ETL work are rarely what they appear to be. A CSV exported from a legacy ERP might be 8 MB compressed on disk but expand to 200 MB in memory when parsed. Knowing the on-disk size in advance lets engineers make informed decisions about batch splitting, streaming vs. bulk load strategies, and whether a file needs preprocessing before it hits the pipeline. This is especially important when working with cloud-native ingestion tools that have strict payload or row-count limits.
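
To see that gap concretely, you can compare a file's on-disk size with its footprint once parsed. A minimal sketch using pandas; the erp_export.csv path is a stand-in, and the actual expansion factor depends on compression and column dtypes:

    import os

    import pandas as pd

    SOURCE = "erp_export.csv"  # hypothetical upstream export

    # On-disk size: what a pre-ingestion size report measures.
    disk_bytes = os.path.getsize(SOURCE)

    # In-memory size once parsed: often several times larger, because
    # text columns become Python objects and numbers widen to 64 bits.
    df = pd.read_csv(SOURCE)
    mem_bytes = int(df.memory_usage(deep=True).sum())

    print(f"on disk:   {disk_bytes / 1024**2:.1f} MiB")
    print(f"in memory: {mem_bytes / 1024**2:.1f} MiB "
          f"({mem_bytes / disk_bytes:.1f}x expansion)")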

Deliteful supports all common ETL source formats — CSV, JSON, Excel (XLSX/XLS), and more — in batches of up to 50 files and 2 GB total. The output is a tab-separated .txt report with raw byte counts and base-1024 human-readable equivalents, making it easy to compare against your connector's documented limits or log the sizes for pipeline observability records.
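
Because the report is plain tab-separated text, a few lines of scripting are enough to flag anything over a cap. A sketch, assuming a filename, raw-byte-count, human-readable column order and a hypothetical 100 MB connector limit; check your own report's layout before relying on it:

    import csv

    CONNECTOR_LIMIT_BYTES = 100 * 1024**2  # hypothetical connector payload cap

    with open("file_size_report.txt", newline="") as report:
        for row in csv.reader(report, delimiter="\t"):
            if len(row) < 2 or not row[1].isdigit():
                continue  # skip any header or malformed line
            filename, size_bytes = row[0], int(row[1])
            if size_bytes > CONNECTOR_LIMIT_BYTES:
                print(f"SPLIT BEFORE LOAD: {filename} ({size_bytes} bytes)")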

How it works

  1. Sign in with Google
     Create a free Deliteful account in 3 clicks — no credit card or installation required.

  2. Upload your source files
     Add up to 50 CSV, JSON, Excel, or other supported files to the batch.

  3. Generate the size report
     Deliteful returns a .txt file listing each filename with its exact byte count and human-readable size.

  4. Use the output to inform pipeline design
     Compare file sizes against connector limits, memory budgets, or batch-splitting thresholds before kicking off your ETL job (a batching sketch follows this list).
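
Once the sizes are in hand, even a simple greedy pass can group files into batches that respect a memory budget. A sketch, assuming the (filename, bytes) pairs were already parsed out of the report; the 512 MiB budget and the file list are illustrative:

    # Illustrative (filename, size-in-bytes) pairs from the report.
    sizes = [("orders.csv", 310_000_000), ("returns.csv", 220_000_000),
             ("items.csv", 150_000_000), ("customers.json", 40_000_000)]

    BUDGET = 512 * 1024**2  # per-batch memory budget

    # Greedy fill, largest files first. A single file larger than BUDGET
    # still lands in its own batch; such files need splitting upstream.
    batches, current, used = [], [], 0
    for name, nbytes in sorted(sizes, key=lambda s: s[1], reverse=True):
        if current and used + nbytes > BUDGET:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        batches.append(current)

    for i, batch in enumerate(batches, 1):
        print(f"batch {i}: {batch}")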

Frequently asked questions

Does the file size report work on CSV and JSON files used as ETL sources?
Yes. CSV (up to 500 MB per file) and JSON (up to 50 MB per file) are both supported. You can include both in the same batch alongside Excel, PDF, and other formats.
Can I use the size report output to document pipeline inputs for observability logging?
Yes. The plain-text, tab-separated format is easy to parse programmatically or attach to a pipeline run log. It includes the filename, byte count, and human-readable size for each file.
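For instance, each report line can become a structured record in a run log. A sketch; the run_id value and the JSON-lines destination are assumptions about your logging setup, and the column order matches the report layout described above:

    import csv
    import json
    import time

    RUN_ID = "etl-run-2024-07-01"  # hypothetical pipeline run identifier

    with open("file_size_report.txt", newline="") as report, \
         open("run_inputs.jsonl", "a") as log:
        for row in csv.reader(report, delimiter="\t"):
            if len(row) < 2 or not row[1].isdigit():
                continue
            record = {"run_id": RUN_ID, "ts": int(time.time()),
                      "file": row[0], "bytes": int(row[1])}
            log.write(json.dumps(record) + "\n")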
What happens if a file exceeds the per-format size limit?
Files that exceed the per-format limit (e.g., 500 MB for CSV, 50 MB for JSON) cannot be uploaded. For pre-check purposes, you can read file sizes with your operating system's own tools and compare them against these limits before uploading, as in the sketch below.
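A quick local pre-check might look like the following. A sketch: the sources directory is a stand-in, and the extension-to-limit mapping covers only the two figures quoted above, not every format Deliteful accepts:

    import pathlib

    # Per-format upload caps quoted above (bytes, base-1024).
    LIMITS = {".csv": 500 * 1024**2, ".json": 50 * 1024**2}

    for path in sorted(pathlib.Path("sources").iterdir()):
        cap = LIMITS.get(path.suffix.lower())
        if cap is None:
            continue  # format not covered by this sketch
        size = path.stat().st_size
        status = "OK" if size <= cap else "TOO LARGE"
        print(f"{status}\t{path.name}\t{size} bytes")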
Are source files modified or retained after the report is generated?
No. The tool is read-only — your source files are never modified. Files are stored temporarily during processing and are not retained after your session.

Sign up free with Google and run a pre-ingestion size audit on your ETL source files before your next pipeline run.