Concepts
Explanation-tier documentation. Each page answers "why does DataCoolie work this way?" rather than "how do I do X?". For task recipes see How-to; for precise field-level contracts see Reference.
How these concepts connect
DataCoolie's architecture is layered. At the top sits the metadata model: connections, dataflows, and transforms defined as JSON, YAML, or Excel. The DataCoolieDriver reads that metadata and coordinates execution. It delegates data processing to an engine (Polars or Spark), which uses a platform (local, AWS, Fabric, Databricks) for file I/O and secrets. Between read and write, a transformer pipeline applies schema hints, deduplication, computed columns, and filters. The write step uses a load strategy: append, overwrite, merge, or SCD2. Finally, watermarks track progress so the next run picks up only new data.
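The flow described above can be sketched in a few lines of plain Python. This is a toy illustration only: `Dataflow`, `dedupe_by`, and `run` are hypothetical names invented for this sketch, not DataCoolie's API, and in-memory lists stand in for the engine and platform layers. It shows one run reading rows past the stored watermark, applying a transformer, writing with a load strategy, and advancing the watermark. SCD2 is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]
Transform = Callable[[list[Row]], list[Row]]


@dataclass
class Dataflow:
    """Hypothetical stand-in for one dataflow entry in the metadata model."""
    name: str
    watermark_column: str
    key: str
    load_strategy: str  # "append", "overwrite", or "merge" in this toy
    transforms: list[Transform] = field(default_factory=list)


def dedupe_by(key: str) -> Transform:
    """Toy transformer: keep the last row seen for each key value."""
    def apply(rows: list[Row]) -> list[Row]:
        latest: dict[Any, Row] = {}
        for row in rows:
            latest[row[key]] = row
        return list(latest.values())
    return apply


def run(flow: Dataflow, source: list[Row], target: list[Row],
        watermarks: dict[str, Any]) -> int:
    """Toy driver: read, transform, write, then advance the watermark."""
    # Read only rows newer than the stored watermark (all rows on first run).
    last = watermarks.get(flow.name)
    new_rows = [r for r in source
                if last is None or r[flow.watermark_column] > last]

    # Apply the transformer pipeline in order.
    rows = new_rows
    for transform in flow.transforms:
        rows = transform(rows)

    # Write using the configured load strategy.
    if flow.load_strategy == "append":
        target.extend(rows)
    elif flow.load_strategy == "overwrite":
        target[:] = rows
    elif flow.load_strategy == "merge":
        merged = {r[flow.key]: r for r in target}
        merged.update({r[flow.key]: r for r in rows})
        target[:] = list(merged.values())
    else:
        raise ValueError(f"unsupported load strategy: {flow.load_strategy}")

    # Advance the watermark from the rows that were read, not the rows kept.
    if new_rows:
        watermarks[flow.name] = max(r[flow.watermark_column] for r in new_rows)
    return len(rows)


source = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "b", "updated_at": 2},
]
target: list[Row] = []
watermarks: dict[str, Any] = {}
flow = Dataflow("orders", watermark_column="updated_at", key="id",
                load_strategy="merge", transforms=[dedupe_by("id")])

first = run(flow, source, target, watermarks)   # full load
source.append({"id": 1, "value": "a2", "updated_at": 3})
second = run(flow, source, target, watermarks)  # incremental load
```

The second call processes only the row added after the first run, which is the behavior the watermark layer exists to provide. The watermark is taken from the rows read rather than the rows written so that a filter or deduplication step cannot cause already-seen rows to be re-read.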
This is the best starting section if you want to understand the workflow, terminology, and operating model without running code.
Reading the pages in the order below gives you the full picture, but each page also stands on its own.
Start with Architecture if you're new to the framework. Otherwise jump to the concept you need:
- Architecture — component diagram, dependency directions, runtime flow.
- Engines — BaseEngine[DF], the fmt parameter contract, format dispatch.
- Platforms — file I/O and secret-provider responsibilities, per-cloud specifics.
- Metadata model — connections, dataflows, transforms.
- Metadata providers — file vs database vs API, picking the right one.
- Sources & destinations — plugin registries, format → reader mapping.
- Transformers & pipeline — ordering slots, tracking labels.
- Load strategies — append / overwrite / merge / SCD2.
- Watermarks — raw-JSON contract, __datetime__ sentinel.
- Orchestration — driver, job distributor, parallel executor, retry handler.
- Logging — ETL logger vs system logger, LogPurpose, partitioning.
- Secrets — provider vs resolver, secrets_ref schema.
Related sections
- Need task recipes instead of explanations? → How-to guides
- Need exact field names and API signatures? → Reference
- Want to add a new engine, source, or transformer? → Extending
- Looking for production guidance? → Operations