File Metadata Reports for ETL Pipeline Validation and Pre-Ingestion Checks

ETL pipelines fail in predictable ways: wrong file format, unexpected file size, a timestamp that does not match the expected delivery window. The File Metadata Report tool surfaces all of these signals before ingestion begins, returning a structured JSON record for every file in the batch.

Pre-ingestion validation is standard practice in any mature ETL workflow, but most teams rely on ad hoc scripts or manual checks that are fragile and hard to audit. A consistent, reproducible metadata report for each incoming batch closes that gap. Knowing that a CSV arrived at 4.2MB rather than the expected 40MB — before your transformation step runs — can save hours of debugging and prevent corrupted loads from reaching downstream consumers.

Deliteful supports the file types most common in ETL source layers: CSV, Excel, JSON, ZIP, PDF, and DOCX. Batches up to 50 files or 2GB are processed in a single run. The JSON output uses a flat, consistent structure per file record — straightforward to log, diff against expectations, or feed into a validation schema check.
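To make the flat structure concrete, here is a sketch of what a single per-file record might look like. The `name`, `size_bytes`, and `mime_type` fields are documented above; the exact timestamp key names and all values shown are illustrative assumptions, not the tool's guaranteed schema.

```python
import json

# Hypothetical per-file record. name, size_bytes, and mime_type follow the
# documented fields; the timestamp key names are assumptions.
record = {
    "name": "orders_2024-05-01.csv",
    "size_bytes": 4_403_200,
    "mime_type": "text/csv",
    "created_at": "2024-05-01T06:02:11Z",
    "modified_at": "2024-05-01T06:02:11Z",
}

# A flat record round-trips cleanly through JSON, which is what makes it
# easy to log, diff, or feed into a schema validator.
print(json.dumps(record, indent=2))
```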

How it works

  1. Upload the incoming source files

    Drop the delivery batch — CSVs, Excel files, JSONs, ZIPs, or mixed formats — up to 50 files per run.

  2. Receive the JSON metadata report

    Each file gets a record with name, size_bytes, mime_type, and filesystem timestamps.

  3. Integrate into your pre-ingestion validation step

    Compare the report against expected file specs, log it alongside the pipeline run, or trigger alerts on anomalies before the transform stage begins.
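Step 3 can be sketched as a small gate function. This is a minimal example, assuming the report parses to a list of flat per-file records as described above; the expected-spec table, its field names, and the size threshold are all illustrative.

```python
# Hypothetical expected specs for one delivery window: the 40MB CSV
# scenario from the introduction, with a generous lower bound.
EXPECTED = {
    "orders.csv": {"mime_type": "text/csv", "min_bytes": 30_000_000},
}

def validate(report):
    """Return human-readable anomaly strings; empty list means the gate passes."""
    problems = []
    seen = {rec["name"]: rec for rec in report}
    for name, spec in EXPECTED.items():
        rec = seen.get(name)
        if rec is None:
            problems.append(f"missing file: {name}")
            continue
        if rec["mime_type"] != spec["mime_type"]:
            problems.append(f"{name}: unexpected type {rec['mime_type']}")
        if rec["size_bytes"] < spec["min_bytes"]:
            problems.append(f"{name}: only {rec['size_bytes']} bytes")
    return problems

# A 4.2MB delivery where ~40MB was expected gets flagged before transform.
report = [{"name": "orders.csv", "size_bytes": 4_200_000, "mime_type": "text/csv"}]
print(validate(report))  # → ['orders.csv: only 4200000 bytes']
```

Running the gate before the transform stage means an undersized or mistyped file halts the pipeline at the cheapest possible point.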

Frequently asked questions

How does MIME type detection work and is it reliable enough for pipeline gating?
MIME type is inferred from the file extension, not from binary inspection. This catches files delivered with the wrong extension for their expected slot, but it cannot detect content that does not match its extension (for example, an Excel file renamed to .csv), so it should be paired with a schema check on file contents for strict pipeline gating.
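One way to pair the two signals, sketched with Python's standard library: extension-based inference (mirroring how the report derives `mime_type`) plus a cheap content probe. The column count and sample bytes are assumptions for illustration.

```python
import csv
import io
import mimetypes

def extension_mime(name):
    # Extension-based inference, the same class of signal the report provides.
    return mimetypes.guess_type(name)[0]

def looks_like_csv(raw: bytes, expected_columns: int) -> bool:
    # Content-side check: parse the first line and count delimited fields.
    # Catches an Excel or JSON payload hiding behind a .csv extension.
    first_line = raw.splitlines()[0].decode("utf-8", errors="replace")
    row = next(csv.reader(io.StringIO(first_line)))
    return len(row) == expected_columns

data = b"order_id,sku,qty\n1,A-100,3\n"
print(extension_mime("orders.csv"), looks_like_csv(data, 3))  # → text/csv True
```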
Can I use the JSON output to build a file delivery audit log?
Yes. The consistent per-file record structure maps well to an audit log schema. Store the report alongside the pipeline run metadata to maintain a full history of what arrived in each delivery window.
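A minimal sketch of that audit-log pattern, assuming one JSON line per delivery window; the entry field names (`run_id`, `logged_at`, `files`) are illustrative, not a prescribed schema.

```python
import datetime
import io
import json

def audit_entry(run_id, report, now=None):
    # Combine pipeline-run metadata with the raw per-file records.
    # Field names here are illustrative assumptions.
    ts = (now or datetime.datetime.now(datetime.timezone.utc)).isoformat()
    return {"run_id": run_id, "logged_at": ts, "files": report}

def append_jsonl(fileobj, entry):
    # One JSON object per line keeps the log appendable and grep-friendly.
    fileobj.write(json.dumps(entry) + "\n")

log = io.StringIO()
report = [{"name": "orders.csv", "size_bytes": 4_200_000, "mime_type": "text/csv"}]
append_jsonl(log, audit_entry("run-042", report))
```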
What is the maximum batch size?
Up to 50 files or 2GB total per batch. For high-volume deliveries, split across multiple Deliteful runs and merge the JSON outputs before ingestion.
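Merging the outputs of several runs is straightforward if each run's report parses to a JSON array of flat records, which is an assumption this sketch makes:

```python
import json

def merge_reports(report_jsons):
    # Concatenate per-file records from several runs into one list,
    # assuming each run's output is a JSON array of flat records.
    merged = []
    for text in report_jsons:
        merged.extend(json.loads(text))
    return merged

run_1 = json.dumps([{"name": "part1.csv", "size_bytes": 10}])
run_2 = json.dumps([{"name": "part2.csv", "size_bytes": 20}])
print(len(merge_reports([run_1, run_2])))  # → 2
```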
Does this replace a checksum or hash verification step?
No. Metadata reports give you size and type signals. For integrity verification — confirming files were not tampered with in transit — you still need checksum comparison. These are complementary checks.
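The complementary checksum step can be implemented with the standard library; this sketch streams the payload in chunks so large deliveries are not loaded into memory, and assumes the partner supplies expected digests in a manifest.

```python
import hashlib
import io

def sha256_of(stream, chunk_size=1 << 20):
    # Stream in 1MB chunks: integrity check without loading the whole file.
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

payload = b"order_id,sku\n1,A-100\n"
digest = sha256_of(io.BytesIO(payload))
# Compare digest against the partner-supplied manifest entry; a mismatch
# means the file changed in transit even if size and type look right.
```

Size and type checks from the metadata report catch most delivery mistakes cheaply; the hash comparison is the stronger, slower check reserved for integrity.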

Create your free Deliteful account with Google and add a reliable pre-ingestion metadata check to your ETL workflow today.