Validate Source File Integrity Before ETL Ingestion
ETL pipelines that ingest file-based sources are vulnerable to silent upstream changes — a vendor quietly reformats their export, a shared drive file gets overwritten, or a transfer corrupts bytes mid-stream. Deliteful's File Hash Checker generates MD5, SHA-1, SHA-256, and SHA-512 checksums so you can confirm a source file's integrity before it enters your pipeline.
File-based ETL ingestion — whether pulling CSVs from SFTP, processing vendor Excel drops, or loading JSON exports from third-party APIs — relies on the assumption that the source file is what it claims to be. A hash check at intake is the simplest way to validate that assumption. Compare the incoming file's checksum against an expected value from the sender, or against the previous delivery's hash, to catch schema drift, truncation, or corruption before it propagates downstream.
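The intake check described above can be sketched in a few lines of Python with the standard library's hashlib. The function names here are illustrative, not part of any Deliteful API:

```python
import hashlib
import hmac

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream in 1 MB chunks so large CSV/Excel exports never load fully into memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_delivery(path: str, expected_hex: str) -> bool:
    # Compare against the sender's published checksum (hex, case-insensitive).
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())
```

The same pattern works for a previous-delivery baseline: substitute the hash you logged at the last intake for the sender's published value.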
Deliteful supports the formats most common in ETL source layers: CSV (up to 500MB), Excel XLSX/XLS (up to 200MB), JSON (up to 50MB), and more. All algorithms run in a single streaming pass, keeping processing time low even for larger files. The plain-text output report is easy to parse and log as part of an ingestion audit trail. Batches of up to 50 files can be hashed in a single run.
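For reference, the single-streaming-pass approach described above can be approximated in plain Python: each chunk read from disk updates every digest, so the file is traversed only once regardless of how many algorithms run. This is a sketch of the technique, not the tool's implementation:

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512")

def hash_all(path: str, chunk_size: int = 1 << 20) -> dict:
    # One streaming pass: every chunk feeds all four digests.
    digests = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for d in digests.values():
                d.update(chunk)
    return {name: d.hexdigest() for name, d in digests.items()}
```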
How it works
1. Sign in with Google. Create your free Deliteful account with Google OAuth — no credit card required.
2. Upload the incoming source file. Drop in the CSV, Excel, JSON, or other file you're about to ingest.
3. Select your algorithm. Use SHA-256 for standard pipeline integrity checks, or leave blank to run all four algorithms by default (MD5, SHA-1, SHA-256, SHA-512).
4. Compare against expected hash. Match the output against the sender's published checksum or your previous-delivery baseline.
5. Log and proceed or reject. If hashes match, proceed with ingestion. If they differ, flag the file for investigation before it enters your pipeline.
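The final accept-or-reject step can be wired into a pipeline as a simple gate. The JSON Lines log format and file names below are hypothetical examples, not Deliteful output:

```python
import hashlib
import json
import os

def sha256_of(path: str) -> str:
    # Streaming SHA-256 so file size doesn't matter.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def gate_ingestion(path: str, expected: str, log_path: str = "intake_audit.jsonl") -> bool:
    # Log every decision to an append-only audit trail, then return the verdict.
    actual = sha256_of(path)
    ok = actual == expected
    with open(log_path, "a") as log:
        log.write(json.dumps({"file": os.path.basename(path),
                              "sha256": actual,
                              "match": ok}) + "\n")
    return ok  # caller proceeds with ingestion only on True
```

Because each decision is appended to the log whether or not the hashes match, the audit trail records rejected deliveries too, which is usually what an investigation needs first.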
Frequently asked questions
- How do I use file hashing to validate ETL source files?
- Generate a hash of the incoming file before ingestion and compare it against either the sender's published checksum or the hash of the previous delivery. A mismatch indicates the file changed — either intentionally (new data) or unexpectedly (corruption, truncation, schema change).
- What is the file size limit for CSV and Excel files?
- CSV files up to 500MB and Excel files up to 200MB are supported. The per-batch limit is 50 files or 2GB total, whichever comes first.
- Can I use this to detect if a recurring vendor file delivery changed unexpectedly?
- Yes. Hash each delivery on arrival and store the checksum in your pipeline log. If a delivery's hash differs from the previous one, you know the file changed — even if the row count appears the same.
- Is MD5 or SHA-256 better for ETL file validation?
- SHA-256 is preferred for new implementations. MD5 is still widely used in legacy systems and is acceptable for non-security-critical integrity checks where the sender provides an MD5 checksum for comparison. Avoid MD5 for any security-sensitive validation.
- Can I hash multiple source files from the same batch delivery at once?
- Yes. Upload the entire batch (up to 50 files or 2GB) and receive a single report with checksums for all files, making it easy to validate a full delivery in one step.
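The recurring-delivery check from the FAQ above can be sketched as a baseline comparison: store each run's checksums, then flag any file whose hash differs from the previous delivery. The baseline file name and JSON layout are illustrative assumptions, not a Deliteful format:

```python
import hashlib
import json
import pathlib

def sha256_file(path) -> str:
    # Streaming hash of one delivery file.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_batch(paths, baseline_path="delivery_baseline.json"):
    # Compare each file to the previous delivery's hash; then update the baseline.
    baseline_file = pathlib.Path(baseline_path)
    baseline = json.loads(baseline_file.read_text()) if baseline_file.exists() else {}
    changed, current = [], {}
    for p in paths:
        name = pathlib.Path(p).name
        current[name] = sha256_file(p)
        if name in baseline and baseline[name] != current[name]:
            changed.append(name)  # same file name, different content
    baseline_file.write_text(json.dumps(current, indent=2))
    return changed
```

Run it after each vendor drop: an empty return list means nothing changed since the last delivery, while any names it returns deserve a look before ingestion.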
Create your free Deliteful account with Google and add checksum validation to your ETL intake process today.