Batch MIME Type Validation for Data Engineers

A single mislabeled file injected into an ETL pipeline can cascade into schema errors, failed ingestions, or silent data corruption downstream. Deliteful's File MIME Type Detector runs content-based MIME inspection across entire file batches and delivers a tab-separated report you can use as a pre-ingestion validation gate.

Data engineers routinely receive file drops from external vendors, partners, or automated exports where the declared extension cannot be trusted. A CSV that is actually a TSV, a ZIP renamed to .xlsx, or a zero-byte placeholder file all behave differently at ingestion time. Content-based MIME detection surfaces these discrepancies before they reach your pipeline, letting you route, reject, or flag files programmatically based on the report output.
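As a rough illustration of routing on the report output, the sketch below maps one report row to an accept, reject, or flag decision. The `route` function and the `EXPECTED_BY_EXTENSION` policy table are hypothetical names introduced here, and the MIME strings in the table are assumptions about what a detector would report for each extension. They are not part of Deliteful's tool.

```python
from pathlib import PurePath

# Hypothetical policy: MIME types this pipeline accepts for each extension.
# These mappings are assumptions; adjust them to what the report actually shows.
EXPECTED_BY_EXTENSION = {
    ".csv": {"text/csv", "text/plain"},
    ".json": {"application/json"},
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
    ".zip": {"application/zip"},
}


def route(filename: str, detected_mime: str) -> str:
    """Return 'accept', 'reject', or 'flag' for one report row."""
    if detected_mime == "application/x-empty":
        return "reject"  # zero-byte placeholder file
    if detected_mime == "application/octet-stream":
        return "flag"  # unrecognized content, worth a manual look
    suffix = PurePath(filename).suffix.lower()
    allowed = EXPECTED_BY_EXTENSION.get(suffix)
    if allowed is None:
        return "flag"  # extension not covered by the policy table
    return "accept" if detected_mime in allowed else "reject"
```

With this policy, a file named `sheet.xlsx` that is detected as `application/zip` is rejected, while an extension the table does not know about is flagged rather than silently accepted.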

The report format is deliberately minimal: a tab-separated .txt file with two columns — filename and mime_type — designed to be diffed against an expected manifest, loaded into a DataFrame, or checked in a shell script. Edge cases are handled explicitly: empty files report as application/x-empty and unrecognized files as application/octet-stream, so there are no silent unknowns.
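One way to diff the report against an expected manifest, sketched with only the Python standard library. `parse_report` and `diff_manifest` are hypothetical helper names, and the check for a `filename`/`mime_type` header row is an assumption, since the page does not say whether the report includes one.

```python
import csv
import io


def parse_report(text: str) -> dict[str, str]:
    """Parse the tab-separated report text into {filename: mime_type}."""
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    report = {}
    for row in rows:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, mime = row
        if (name, mime) == ("filename", "mime_type"):
            continue  # header row, if the report has one (assumption)
        report[name] = mime
    return report


def diff_manifest(report: dict[str, str], expected: dict[str, str]) -> dict:
    """Return {filename: (expected_mime, detected_mime)} for each disagreement."""
    problems = {}
    for name, want in expected.items():
        got = report.get(name)  # None means the file is missing from the report
        if got != want:
            problems[name] = (want, got)
    for name in report.keys() - expected.keys():
        problems[name] = (None, report[name])  # file not listed in the manifest
    return problems
```

An empty result from `diff_manifest` means every expected file was present with the expected type and nothing unexpected arrived.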

How it works

  1. Create a free Deliteful account

    Sign in with Google. It takes about 3 clicks, and no credit card is needed.

  2. Upload the file batch

    Upload up to 50 files per batch (2GB total max). ZIP, CSV, XLSX, JSON, TAR, and more are all supported.

  3. Download and integrate the report

    Get a tab-separated .txt report mapping each filename to its detected MIME type for use in your validation step.
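Once the report is downloaded, the validation step can be as small as a script that returns a non-zero exit code when any file is empty or unrecognized. The following is a minimal sketch: `find_blockers` and `main` are hypothetical names, the report is assumed to be UTF-8 text, and the script name `gate.py` is only an example.

```python
import sys
from pathlib import Path

# The two placeholder types the report uses for empty and unrecognized files.
BLOCKING_TYPES = {"application/x-empty", "application/octet-stream"}


def find_blockers(report_path):
    """Return (filename, mime_type) pairs that should stop ingestion."""
    blockers = []
    text = Path(report_path).read_text(encoding="utf-8")  # encoding is an assumption
    for line in text.splitlines():
        name, sep, mime = line.partition("\t")
        if sep and mime.strip() in BLOCKING_TYPES:
            blockers.append((name, mime.strip()))
    return blockers


def main(argv):
    """Return 0 if the batch is clean, 1 if blocked, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: gate.py REPORT.txt", file=sys.stderr)
        return 2
    blockers = find_blockers(argv[1])
    for name, mime in blockers:
        print(f"BLOCKED {name}: {mime}", file=sys.stderr)
    return 1 if blockers else 0
```

Calling `sys.exit(main(sys.argv))` at the bottom of the script lets a shell pipeline or scheduler stop the ingestion job when the exit code is non-zero.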

Frequently asked questions

How does content-based MIME detection differ from extension-based detection?
Content-based detection reads the file's magic bytes and internal structure to determine what the file actually is, regardless of its name or extension. Extension-based detection only checks the filename suffix. Content-based detection is more reliable for validating untrusted file drops in data pipelines.
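To make the idea of magic bytes concrete, here is a toy sniffer that checks the first bytes of a file against a few well-known signatures. It is an illustration only, not Deliteful's implementation. The `sniff` function and the `SIGNATURES` table are hypothetical, and real detectors recognize far more formats.

```python
# A tiny signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff(data: bytes) -> str:
    """Guess a MIME type from leading bytes, ignoring any filename."""
    if not data:
        return "application/x-empty"
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"  # nothing in the table matched
```

An XLSX file is itself a ZIP container, so it also begins with the ZIP signature. Telling the two apart requires looking at the internal structure, which is why detection cannot stop at the first few bytes.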
What does the output report look like?
The output is a tab-separated .txt file with two columns: filename and mime_type. One row per file. Empty files show application/x-empty; unidentifiable files show application/octet-stream. It's designed to be machine-readable and easy to parse.
Can I process a large batch of files at once?
Yes. A batch can hold up to 50 files or 2GB in total, whichever limit is reached first. Individual files can be up to 50MB each for types other than PDF, CSV, and Excel.
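For file drops larger than one batch, a small planner can split the files into groups that respect both batch limits. This is a sketch under stated assumptions: `plan_batches` is a hypothetical helper, "2GB" is taken to mean 2 GiB, and the per-file limits are not enforced here.

```python
MAX_FILES = 50
MAX_BYTES = 2 * 1024**3  # assumes "2GB" means 2 GiB; the page does not specify


def plan_batches(files):
    """Greedily group (name, size_in_bytes) pairs into batches within both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        if size > MAX_BYTES:
            raise ValueError(f"{name} alone exceeds the batch size limit")
        # Start a new batch when adding this file would break either limit.
        if current and (len(current) == MAX_FILES or current_bytes + size > MAX_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

The planner keeps files in their original order, so the batches are not guaranteed to be the fewest possible, but each one stays within 50 files and the assumed byte limit.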
Does this tool modify any of the uploaded files?
No. The tool only reads files to detect their MIME type. All original files are left unmodified and the only output is the detection report.

Sign up free with Google and run MIME validation on your next file batch before it hits your pipeline.