Rename Raw Data Files with Pipeline-Ready Naming Before Ingestion
Data engineers pulling raw exports from source systems frequently receive files with names generated by the exporting application — report_20240101_143022.csv, extract_v2_USE_THIS.json — that break downstream pipeline assumptions about filename structure. Deliteful's Batch Rename Files tool applies a deterministic prefix, suffix, and sequential counter to an entire batch of data files, producing names your pipeline can parse without a custom preprocessing step.
Many ETL pipelines use filename patterns as part of their routing or partitioning logic. A file landing in an S3 bucket or watched directory with an unexpected name either fails silently or requires an exception handler that someone has to write and maintain. Standardizing filenames at the point of receipt — before files enter the pipeline — is cheaper than handling naming variance inside the pipeline. For teams ingesting files from external vendors or legacy systems with no naming contract, a consistent manual rename step is often the pragmatic solution.
Deliteful accepts CSV, JSON, XLSX, and other common data formats up to their respective size limits (CSV up to 500MB, Excel up to 200MB) in batches of up to 50 files or 2GB total. A typical convention for pipeline ingestion might use a prefix encoding the source system and data type — SalesForce_Opportunities_ or ERP_Inventory_ — with a date suffix like _2025_03 and a sequential counter for daily or weekly file sets. Renamed copies download immediately; originals are never modified.
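To make the convention concrete, here is a minimal Python sketch of how a downstream pipeline might parse filenames that follow the SourceSystem_DataType_YYYY_MM_NNN pattern described above. The regex and the `parse_ingest_name` helper are illustrative assumptions for this article, not part of Deliteful or any pipeline framework.

```python
import re
from typing import Optional

# Assumed convention from the example above:
# SourceSystem_DataType_YYYY_MM_NNN.ext, e.g. SalesForce_Opportunities_2025_03_001.csv
PATTERN = re.compile(
    r"^(?P<source>[A-Za-z]+)_(?P<dataset>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<seq>\d{3})\.(?P<ext>csv|json|xlsx)$"
)

def parse_ingest_name(filename: str) -> Optional[dict]:
    """Return the routing fields encoded in a pipeline-ready filename, or None."""
    m = PATTERN.match(filename)
    return m.groupdict() if m else None

# A renamed file parses cleanly; a raw vendor export does not.
print(parse_ingest_name("SalesForce_Opportunities_2025_03_001.csv"))
print(parse_ingest_name("extract_v2_USE_THIS.json"))  # → None
```

Because every field is captured by a named group, the same pattern can drive partitioning (year/month), routing (source/dataset), and ordering (seq) without separate parsers.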
How it works
1. Collect the raw export files for one source and period. Group files by source system and time period before uploading; sequential numbers are only meaningful within a consistent grouping.
2. Define a prefix encoding source and data type. Use a prefix like SourceSystem_DataType_ that your pipeline can use as a routing key or partition identifier.
3. Add a date or period suffix. A suffix like _2025_Q1 or _2025_03 makes the ingestion period explicit in the filename, which aids both pipeline logic and manual debugging.
4. Download and drop into your landing zone. Renamed files are ready for S3, SFTP drop, or watched-directory ingestion; originals remain available as the raw source.
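The routing-key idea from step 2 can be sketched in a few lines of Python. The routing table, bucket paths, and quarantine fallback below are hypothetical examples, assuming the prefix convention used in this article:

```python
# Hypothetical routing table keyed on the SourceSystem_DataType_ prefix.
# The S3 paths are placeholders, not real buckets.
ROUTES = {
    "SalesForce_Opportunities_": "s3://lake/raw/salesforce/opportunities/",
    "ERP_Inventory_": "s3://lake/raw/erp/inventory/",
}

def route(filename: str) -> str:
    """Map a renamed file to its target location; unknown prefixes go to quarantine."""
    for prefix, target in ROUTES.items():
        if filename.startswith(prefix):
            return target
    return "s3://lake/quarantine/"

print(route("ERP_Inventory_2025_03_007.csv"))  # → s3://lake/raw/erp/inventory/
print(route("extract_v2_USE_THIS.json"))       # → s3://lake/quarantine/
```

Because renamed files always carry a known prefix, the quarantine branch only fires for files that bypassed the rename step, which makes naming violations visible instead of silent.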
Frequently asked questions
- Can I rename CSV, JSON, and XLSX files in the same batch?
- Yes. Deliteful handles mixed file types in one batch. Each file retains its original extension after renaming, which is important for pipelines that route by file extension.
- Is there a file size limit for CSV or JSON data files?
- Yes. CSV files up to 500MB and Excel files up to 200MB are supported; JSON and other file types are limited to 50MB each. The batch total is capped at 50 files or 2GB, whichever comes first.
- Does renaming affect any data inside the file — headers, encoding, or content?
- No. Deliteful creates renamed copies that are byte-identical to the originals; no content, encoding, or structure is modified. The only thing that changes is the name.
- Can I use the starting counter to match an existing file sequence in my landing zone?
- Yes. Set the starting counter to continue an existing sequence — for example, if files 1–30 were already ingested, start the next batch at 31 to maintain continuity.
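Deliteful itself takes the starting counter as an input in its UI; to pick the right value, you can read the highest existing sequence number out of your landing zone. The `next_counter` helper below is a hypothetical sketch, assuming filenames end in a three-digit counter as in this article's convention:

```python
import re

def next_counter(existing: list[str]) -> int:
    """Return one past the highest trailing _NNN counter among existing files."""
    seqs = [int(m.group(1)) for name in existing
            if (m := re.search(r"_(\d{3})\.[A-Za-z]+$", name))]
    return max(seqs, default=0) + 1

files = ["ERP_Inventory_2025_03_029.csv", "ERP_Inventory_2025_03_030.csv"]
print(next_counter(files))  # highest existing counter is 30, so start the next batch at 31
```

An empty landing zone yields 1, so the same helper works for the first batch and every continuation batch.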
Create your free Deliteful account with Google and stop writing preprocessing logic to handle inconsistent source filenames.