File Metadata Reports for ETL Pipeline Validation and Pre-Ingestion Checks
ETL pipelines fail in predictable ways: wrong file format, unexpected file size, a timestamp that does not match the expected delivery window. The File Metadata Report tool surfaces all of these signals before ingestion begins, returning a structured JSON record for every file in the batch.
Pre-ingestion validation is standard practice in any mature ETL workflow, but most teams rely on ad hoc scripts or manual checks that are fragile and hard to audit. A consistent, reproducible metadata report for each incoming batch closes that gap. Knowing that a CSV arrived at 4.2MB rather than the expected 40MB — before your transformation step runs — can save hours of debugging and prevent corrupted loads from reaching downstream consumers.
Deliteful supports the file types most common in ETL source layers: CSV, Excel, JSON, ZIP, PDF, and DOCX. Batches up to 50 files or 2GB are processed in a single run. The JSON output uses a flat, consistent structure per file record — straightforward to log, diff against expectations, or feed into a validation schema check.
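For a sense of what the output looks like, here is a sketch of one per-file record. The exact field names beyond those mentioned in this article (name, size_bytes, mime_type, timestamps) are assumptions, not the tool's documented schema; check a real report before coding against it.

```python
import json

# Hypothetical per-file record matching the flat structure described above.
# Field names are illustrative assumptions based on this article.
record = {
    "name": "orders_2024-06-01.csv",
    "size_bytes": 4_398_046,
    "mime_type": "text/csv",
    "created_at": "2024-06-01T03:14:07Z",
    "modified_at": "2024-06-01T03:14:09Z",
}

# A flat record is easy to log verbatim or diff against an expected spec.
print(json.dumps(record, indent=2))
```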
How it works
1. Upload the incoming source files. Drop the delivery batch — CSVs, Excel files, JSONs, ZIPs, or mixed formats — up to 50 files per run.
2. Receive the JSON metadata report. Each file gets a record with name, size_bytes, mime_type, and filesystem timestamps.
3. Integrate into your pre-ingestion validation step. Compare the report against expected file specs, log it alongside the pipeline run, or trigger alerts on anomalies before the transform stage begins.
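The validation step above can be sketched as a simple pre-ingestion gate. The expected-spec table, file names, and record fields here are illustrative assumptions, not the tool's documented API:

```python
# Minimal pre-ingestion gate: compare a metadata report against expected specs.
# File names, spec thresholds, and field names are illustrative assumptions.
EXPECTED = {
    "orders.csv":   {"mime_type": "text/csv",         "min_bytes": 30_000_000},
    "refunds.json": {"mime_type": "application/json", "min_bytes": 1_000},
}

def validate_batch(report: list[dict]) -> list[str]:
    """Return human-readable anomalies; an empty list means the batch passes."""
    problems = []
    seen = {rec["name"]: rec for rec in report}
    for name, spec in EXPECTED.items():
        rec = seen.get(name)
        if rec is None:
            problems.append(f"{name}: missing from delivery")
            continue
        if rec["mime_type"] != spec["mime_type"]:
            problems.append(f"{name}: unexpected type {rec['mime_type']}")
        if rec["size_bytes"] < spec["min_bytes"]:
            problems.append(
                f"{name}: only {rec['size_bytes']} bytes, "
                f"expected >= {spec['min_bytes']}"
            )
    return problems

# A 4.2MB file where 40MB was expected is flagged before the transform runs.
report = [
    {"name": "orders.csv",   "size_bytes": 4_200_000, "mime_type": "text/csv"},
    {"name": "refunds.json", "size_bytes": 52_104,    "mime_type": "application/json"},
]
for issue in validate_batch(report):
    print("ANOMALY:", issue)
```

Anomalies can feed an alerting hook or simply abort the run with a non-zero exit code, whichever your orchestrator expects.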
Frequently asked questions
- How does MIME type detection work and is it reliable enough for pipeline gating?
- MIME type is inferred from the file extension, not binary inspection. This catches files delivered with an unexpected or missing extension, but it cannot detect a file whose contents do not match its extension, so pair it with a schema check on file contents for strict pipeline gating.
- Can I use the JSON output to build a file delivery audit log?
- Yes. The consistent per-file record structure maps well to an audit log schema. Store the report alongside the pipeline run metadata to maintain a full history of what arrived in each delivery window.
- What is the maximum batch size?
- Up to 50 files or 2GB total per batch. For high-volume deliveries, split across multiple Deliteful runs and merge the JSON outputs before ingestion.
- Does this replace a checksum or hash verification step?
- No. Metadata reports give you size and type signals. For integrity verification — confirming files were not tampered with in transit — you still need checksum comparison. These are complementary checks.
Create your free Deliteful account with Google and add a reliable pre-ingestion metadata check to your ETL workflow today.