File Metadata Reports for Academic Researchers Documenting Dataset Provenance
Research data management plans increasingly require documentation of dataset provenance: which files make up a dataset, their formats and sizes, and when they were created or last modified. Deliteful's File Metadata Report tool generates a structured JSON inventory of these attributes for a batch of up to 50 research files, producing the kind of machine-readable provenance record that data repositories and journal submission systems expect.
Funding bodies including the NSF and NIH now mandate data management plans that specify how datasets will be documented, stored, and made accessible. A core component of that documentation is a file-level inventory: format, size, and provenance timestamps for every file in the dataset. Researchers who assemble this inventory manually, from file explorers, terminal commands, or spreadsheet notes, tend to produce records that are inconsistent, hard to update, and rarely in the structured format that repositories like Zenodo, Figshare, or institutional data archives prefer.
Deliteful processes up to 50 files per batch across the formats most common in research datasets: CSV and Excel for tabular data, PDF for publications and codebooks, images for figures and scans, DOCX for documentation, and ZIP for archived packages. The JSON output uses a flat, consistent per-file record structure whose fields line up with common Dublin Core terms (for example, file name with title, MIME type with format, size with extent, and the timestamps with created and modified), so it can be adapted to the metadata schemas used by academic repositories.
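As a rough illustration of that mapping, the sketch below renames the keys of one per-file record to Dublin Core terms. The JSON key names (`name`, `size_bytes`, `mime_type`, `created`, `modified`) and the sample record are assumptions made for this example, not the tool's documented schema, so adjust them to match the report you actually receive.

```python
# Assumed key names for a per-file record; the real report's keys may differ.
DC_FIELD_MAP = {
    "name": "dcterms:title",
    "mime_type": "dcterms:format",
    "size_bytes": "dcterms:extent",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def to_dublin_core(record: dict) -> dict:
    """Rename the keys of one per-file record to Dublin Core terms."""
    dc = {}
    for key, term in DC_FIELD_MAP.items():
        if key not in record:
            continue  # skip fields the record does not carry
        value = record[key]
        # Express the size with an explicit unit so the extent is unambiguous.
        dc[term] = f"{value} bytes" if key == "size_bytes" else value
    return dc


# Illustrative record only, not real tool output.
example = {
    "name": "survey_responses.csv",
    "size_bytes": 20480,
    "mime_type": "text/csv",
    "created": "2024-01-15T09:30:00Z",
    "modified": "2024-02-01T14:05:00Z",
}
print(to_dublin_core(example))
```

Keeping the mapping in a single dictionary means a repository-specific schema can be targeted by swapping the term names without touching the conversion logic.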
How it works
1. Upload the dataset files
Add the files that constitute the dataset (raw data, processed outputs, codebooks, and documentation), up to 50 files per batch.
2. Generate the provenance report
Deliteful returns a JSON record for each file with name, size in bytes, MIME type, and creation and modification timestamps.
3. Include the report in your data management documentation
Attach the JSON output to your DMP, repository submission, or supplementary materials as a machine-readable dataset inventory.
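To make the shape of the per-file record in the steps above concrete, here is a minimal local sketch that collects the same attributes with the Python standard library. It is not Deliteful's implementation. The key names, the helper names `file_record` and `build_report`, and the choice of ISO 8601 UTC timestamps are all assumptions for illustration.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def iso(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def file_record(path: Path) -> dict:
    """Build one record: name, size in bytes, MIME type, timestamps."""
    st = path.stat()
    # MIME type is guessed from the file extension, not the file contents.
    mime, _ = mimetypes.guess_type(path.name)
    # A true creation time is only exposed on some platforms (st_birthtime).
    # On Linux, st_ctime is the inode change time, so it is not used here.
    birth = getattr(st, "st_birthtime", None)
    return {
        "name": path.name,
        "size_bytes": st.st_size,
        "mime_type": mime or "application/octet-stream",
        "created": iso(birth) if birth is not None else None,
        "modified": iso(st.st_mtime),
    }


def build_report(paths, limit: int = 50) -> list:
    """Return one record per file, enforcing a per-batch file limit."""
    paths = [Path(p) for p in paths]
    if len(paths) > limit:
        raise ValueError(f"batch has {len(paths)} files; limit is {limit}")
    return [file_record(p) for p in paths]
```

Calling `json.dumps(build_report(paths), indent=2)` on a list of file paths would then produce a JSON inventory with one flat record per file. Note that timestamps read this way reflect the local filesystem, so copying or downloading a file can change them.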
Frequently asked questions
- Does this output satisfy data management plan requirements for file-level documentation?
- It produces the core attributes most DMPs require at the file level — format, size, and timestamps — in a structured, machine-readable format. Whether it fully satisfies a specific funder's DMP requirements depends on that funder's schema; check your institution's DMP guidance for any additional required fields.
- Can this help when submitting a dataset to a repository like Zenodo or Figshare?
- Yes. The JSON metadata report gives you a structured inventory of your dataset files that you can reference when filling in repository submission forms and attach as a supplementary provenance document alongside the dataset.
- What file formats from research workflows are supported?
- CSV, Excel (xlsx/xls), PDF, DOCX, TXT, JSON, PNG, JPG, JPEG, WEBP, and ZIP are all supported — covering raw data, analysis outputs, figures, codebooks, and archived packages.
- Are embedded document metadata fields like author or title included in the report?
- No. The tool extracts filesystem-level metadata only — name, size, MIME type, and filesystem timestamps. Embedded document properties such as author, title, or subject are not extracted. For those fields, a dedicated metadata extraction tool is needed.
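To illustrate the difference between filesystem metadata and embedded document properties, the sketch below reads the title, creator, and subject stored inside a DOCX file. A DOCX file is a ZIP archive whose core properties normally live in `docProps/core.xml`. The function name `docx_core_properties` is hypothetical, and this is only a minimal example of what a dedicated extraction step might look like, separate from the Deliteful report.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside docProps/core.xml.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def docx_core_properties(path) -> dict:
    """Return the embedded title, creator, and subject of a DOCX, if present."""
    with zipfile.ZipFile(path) as zf:
        try:
            xml_bytes = zf.read("docProps/core.xml")
        except KeyError:
            return {}  # the archive carries no core properties part
    root = ET.fromstring(xml_bytes)
    props = {}
    for field in ("title", "creator", "subject"):
        node = root.find(f"{DC_NS}{field}")
        if node is not None and node.text:
            props[field] = node.text
    return props
```

These values come from inside the document itself, so they survive copying and renaming, whereas the name, size, and timestamps in the report describe the file as it sits on disk.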
Create your free Deliteful account with Google and generate structured dataset provenance records for your next data management plan or repository submission.