Maintenance (vacuum / optimize)¶
Prerequisites · Existing Delta or Iceberg destinations.
End state · Periodic OPTIMIZE / VACUUM runs safely in parallel without racing fan-in topologies.
Invoke¶
result = driver.run_maintenance(
connection=["bronze", "silver"], # optional – filter by connection name
do_compact=True, # run OPTIMIZE (default: True)
do_cleanup=True, # run VACUUM (default: True)
)
Deduplication¶
When multiple dataflows write to the same physical destination (fan-in), DataCoolie deduplicates before dispatching maintenance. Only the winning dataflow emits a maintenance log row; covered dataflows are implicitly covered.
This prevents concurrent OPTIMIZE calls from racing on the same table — a
common source of commit failures in Delta.
Retention¶
DataCoolieRunConfig.retention_hours controls VACUUM retention (default:
DEFAULT_RETENTION_HOURS = 168 hours / 7 days). Pass it when constructing
the driver:
from datacoolie import DataCoolieDriver, DataCoolieRunConfig
driver = DataCoolieDriver(config=DataCoolieRunConfig(retention_hours=72))