Destinations

base

Abstract base classes for destination writers and load strategies.

BaseDestinationWriter[DF] uses the Template Method pattern for writes and the Strategy pattern for load-type dispatch.

BaseLoadStrategy defines the contract for individual load strategies (overwrite, append, merge_upsert, merge_overwrite, scd2).

BaseDestinationWriter

BaseDestinationWriter(engine: BaseEngine[DF])

Bases: ABC, Generic[DF]

Abstract destination writer with Template Method lifecycle.

Subclasses implement _write_internal. The base class manages timing, error wrapping, runtime info, and maintenance.

get_runtime_info

get_runtime_info() -> DestinationRuntimeInfo

Return runtime information from the most recent write.

run_maintenance

run_maintenance(dataflow: DataFlow, *, do_compact: bool = True, do_cleanup: bool = True, retention_hours: Optional[int] = None) -> DestinationRuntimeInfo

Run maintenance operations (compact + cleanup) as a single result.

Sub-operation outcomes are stored in operation_details as a list of dicts, each with an "operation" key ("compact" / "cleanup" / "history"). File/byte metrics are summed across the compact and cleanup operations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataflow | DataFlow | Pipeline configuration. | required |
| do_compact | bool | Whether to run compaction. | True |
| do_cleanup | bool | Whether to run cleanup. | True |
| retention_hours | Optional[int] | Cleanup retention override (default: 168 h). | None |

Returns:

| Type | Description |
| --- | --- |
| DestinationRuntimeInfo | A single DestinationRuntimeInfo covering the full run. |
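As an illustration of how a caller might consume the sub-operation outcomes described above, here is a minimal sketch. It assumes only what the description states (a list of dicts, each with an "operation" key); the "status" and "files_removed" keys and the `by_operation` helper are hypothetical and not part of the documented API.

```python
from typing import Any, Dict, List

# Hypothetical payload shaped like the description above: a list of dicts,
# each carrying an "operation" key. The "status" and "files_removed" keys
# are invented for illustration only.
operation_details: List[Dict[str, Any]] = [
    {"operation": "compact", "status": "succeeded", "files_removed": 12},
    {"operation": "cleanup", "status": "succeeded", "files_removed": 30},
]


def by_operation(details: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index sub-operation outcomes by their "operation" key."""
    return {entry["operation"]: entry for entry in details}


outcomes = by_operation(operation_details)

# Sum a (hypothetical) per-operation metric across all sub-operations.
total_files_removed = sum(e.get("files_removed", 0) for e in operation_details)
```

Indexing by the "operation" key keeps the caller independent of the order in which sub-operations were recorded.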

write

write(df: DF, dataflow: DataFlow) -> DestinationRuntimeInfo

Write data to the destination (Template Method).

  1. Validate destination path.
  2. Delegate to _write_internal.
  3. Populate and return runtime info.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DF | DataFrame to write. | required |
| dataflow | DataFlow | Full pipeline configuration. | required |

Returns:

| Type | Description |
| --- | --- |
| DestinationRuntimeInfo | Runtime metrics for the write operation. |

Raises:

| Type | Description |
| --- | --- |
| DestinationError | On any write failure. |
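The write lifecycle above (validate, delegate, populate runtime info, wrap failures) can be sketched as a small Template Method. This is not the library's implementation: `SketchWriter`, `RuntimeInfo`, and `ListWriter` are hypothetical stand-ins, `DestinationError` is redefined locally, and the sketch takes a plain path where the real method takes a DataFlow.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class DestinationError(Exception):
    """Local stand-in for the library's DestinationError."""


@dataclass
class RuntimeInfo:
    """Hypothetical, reduced stand-in for DestinationRuntimeInfo."""
    status: str = "pending"
    duration_seconds: float = 0.0
    error: Optional[str] = None


class SketchWriter(ABC):
    """Template Method: write() fixes the lifecycle, subclasses fill in one step."""

    def __init__(self) -> None:
        self._info = RuntimeInfo()

    def write(self, df: Any, path: str) -> RuntimeInfo:
        started = time.perf_counter()
        try:
            # Step 1: validate the destination path.
            if not path:
                raise ValueError("destination path is empty")
            # Step 2: delegate to the subclass hook.
            self._write_internal(df, path)
        except Exception as exc:
            # Error wrapping: callers only ever see DestinationError.
            self._info = RuntimeInfo("failed", time.perf_counter() - started, str(exc))
            raise DestinationError(f"write to {path!r} failed: {exc}") from exc
        # Step 3: populate and return runtime info.
        self._info = RuntimeInfo("succeeded", time.perf_counter() - started)
        return self._info

    def get_runtime_info(self) -> RuntimeInfo:
        """Return runtime information from the most recent write."""
        return self._info

    @abstractmethod
    def _write_internal(self, df: Any, path: str) -> None:
        """Subclass hook that performs the actual write."""


class ListWriter(SketchWriter):
    """Toy subclass that 'writes' rows into an in-memory dict keyed by path."""

    def __init__(self) -> None:
        super().__init__()
        self.storage: Dict[str, List[Any]] = {}

    def _write_internal(self, df: Any, path: str) -> None:
        self.storage[path] = list(df)


writer = ListWriter()
info = writer.write([1, 2, 3], "/tmp/example")
```

Because timing and error wrapping live in `write`, a subclass such as `ListWriter` only has to express how rows reach storage.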

BaseLoadStrategy

Bases: ABC

Abstract load strategy (Strategy pattern).

Each strategy knows how to write a DataFrame to a destination path using a specific load type (overwrite, append, merge, etc.).

load_type abstractmethod property

load_type: str

The load type this strategy handles (e.g. "overwrite").

execute abstractmethod

execute(df: DF, table_name: str, dataflow: DataFlow, engine: BaseEngine[DF], path: Optional[str] = None) -> None

Execute the load strategy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DF | DataFrame to write. | required |
| table_name | str | Fully qualified table name (always available). | required |
| dataflow | DataFlow | Pipeline configuration. | required |
| engine | BaseEngine[DF] | DataFrame engine. | required |
| path | Optional[str] | Optional destination path for path-based fallback. | None |
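To make the contract above concrete, the following sketch implements a strategy against a stand-in base class with the same shape (a `load_type` property plus `execute`). `SketchLoadStrategy`, `InMemoryEngine`, and its `append_rows` method are hypothetical; the real BaseEngine API is not shown in this page, and this sketch simply ignores `path`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SketchLoadStrategy(ABC):
    """Stand-in with the same shape as BaseLoadStrategy (not the real class)."""

    @property
    @abstractmethod
    def load_type(self) -> str:
        """The load type this strategy handles."""

    @abstractmethod
    def execute(self, df: Any, table_name: str, dataflow: Any,
                engine: Any, path: Optional[str] = None) -> None:
        """Write df to the destination."""


class InMemoryEngine:
    """Hypothetical engine that stores rows in a dict keyed by table name."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Any]] = {}

    def append_rows(self, table_name: str, rows: List[Any]) -> None:
        self.tables.setdefault(table_name, []).extend(rows)


class SketchAppendStrategy(SketchLoadStrategy):
    """Append: add new rows to the table."""

    @property
    def load_type(self) -> str:
        return "append"

    def execute(self, df: Any, table_name: str, dataflow: Any,
                engine: Any, path: Optional[str] = None) -> None:
        # This sketch always targets the table name and ignores `path`.
        engine.append_rows(table_name, list(df))


engine = InMemoryEngine()
strategy = SketchAppendStrategy()
strategy.execute([{"id": 1}], "db.orders", dataflow=None, engine=engine)
strategy.execute([{"id": 2}], "db.orders", dataflow=None, engine=engine)
```

Keeping `load_type` as a property lets a registry key strategies by the string they report rather than by class name.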

file_writer

File-format destination writer (Parquet, CSV, JSON, JSONL, Avro).

Writes DataFrames to storage paths, supporting date-folder partitioning via connection.date_folder_partitions resolved with the current UTC timestamp. Only append, overwrite, and full_load load types are supported. Maintenance (compact / cleanup) is not applicable.

FileWriter

FileWriter(engine: BaseEngine[DF])

Bases: BaseDestinationWriter[DF]

Destination writer for flat-file formats (Parquet, CSV, JSON, …).

Delegates to get_load_strategy for the actual write logic. Date-folder partitioning is resolved at write time using utc_now().

Maintenance operations (compact / cleanup) are not supported for flat files and will raise DestinationError if invoked.
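The date-folder partitioning mentioned above can be pictured with a small sketch. The syntax of connection.date_folder_partitions is not documented here, so this example assumes a strftime-style pattern; `resolve_date_folders` is a hypothetical helper, and the optional `now` argument exists only to make the example deterministic.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_date_folders(base_path: str, pattern: str,
                         now: Optional[datetime] = None) -> str:
    """Join base_path with a date folder rendered from a strftime pattern.

    Assumption: the pattern uses strftime directives; the real
    date_folder_partitions syntax may differ.
    """
    # Default to the current UTC timestamp, mirroring "resolved at write time".
    moment = now if now is not None else datetime.now(timezone.utc)
    return f"{base_path.rstrip('/')}/{moment.strftime(pattern)}"


fixed = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
resolved = resolve_date_folders("/lake/raw/orders/", "%Y/%m/%d", now=fixed)
# resolved == "/lake/raw/orders/2024/03/05"
```

Resolving against UTC rather than local time keeps folder names stable across machines in different time zones.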

delta_writer

Delta Lake destination writer.

Writes DataFrames to Delta Lake tables, dispatching to the appropriate load strategy based on dataflow.load_type.

DeltaWriter

DeltaWriter(engine: BaseEngine[DF])

Bases: BaseDestinationWriter[DF]

Destination writer for Delta Lake tables.

Delegates to get_load_strategy for the actual write logic based on the dataflow's load type.

Also supports maintenance operations (compact, cleanup) via the base class.

iceberg_writer

Iceberg destination writer.

Writes DataFrames to Iceberg tables, preferring table-name-based operations (DataFrameWriterV2 / SQL MERGE) via catalog. Falls back to path-based operations when no catalog is configured.

IcebergWriter

IcebergWriter(engine: BaseEngine[DF])

Bases: BaseDestinationWriter[DF]

Destination writer for Apache Iceberg tables.

Prefers table-name-based operations (catalog-aware) for Iceberg. Falls back to path-based operations when no catalog is configured.
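The preference order described above can be sketched as a small decision function. `choose_iceberg_target` and its `catalog` argument are hypothetical; how the real writer detects a configured catalog is not shown in this page.

```python
from typing import Optional, Tuple


def choose_iceberg_target(table_name: str, path: Optional[str],
                          catalog: Optional[str]) -> Tuple[str, str]:
    """Return (mode, target) for a write.

    Assumption: a non-empty `catalog` means a catalog is configured.
    """
    if catalog:
        # Preferred: catalog-aware, table-name-based operation.
        return ("table", table_name)
    if path:
        # Fallback: path-based operation when no catalog is configured.
        return ("path", path)
    raise ValueError("no catalog configured and no destination path given")


with_catalog = choose_iceberg_target("lake.sales.orders", "/lake/orders", "lake")
without_catalog = choose_iceberg_target("lake.sales.orders", "/lake/orders", None)
```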

load_strategies

Load strategies for destination writers.

Each strategy implements a specific write mode (overwrite, append, merge_upsert, merge_overwrite, scd2) via the BaseLoadStrategy interface.

The LOAD_STRATEGIES registry and the get_load_strategy helper provide lookup by load-type string.

AppendStrategy

Bases: BaseLoadStrategy

Append — add new rows to the table.

MergeOverwriteStrategy

Bases: BaseLoadStrategy

Merge overwrite — delete + re-insert matching rows.

MergeUpsertStrategy

Bases: BaseLoadStrategy

Merge upsert — insert new rows, update existing by merge keys.

OverwriteStrategy

Bases: BaseLoadStrategy

Full overwrite — replace the entire table.
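The four strategies above (append, merge overwrite, merge upsert, overwrite) can be compared on plain in-memory rows. This is one reading of the one-line descriptions, not the library's engine code; all function names are hypothetical and rows are simple dicts.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def _key(row: Row, merge_keys: List[str]) -> Tuple[object, ...]:
    return tuple(row[k] for k in merge_keys)


def append(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Add new rows to the table."""
    return target + incoming


def overwrite(target: List[Row], incoming: List[Row]) -> List[Row]:
    """Replace the entire table."""
    return list(incoming)


def merge_overwrite(target: List[Row], incoming: List[Row],
                    merge_keys: List[str]) -> List[Row]:
    """Delete target rows whose key appears in incoming, then re-insert incoming."""
    incoming_keys = {_key(r, merge_keys) for r in incoming}
    kept = [r for r in target if _key(r, merge_keys) not in incoming_keys]
    return kept + list(incoming)


def merge_upsert(target: List[Row], incoming: List[Row],
                 merge_keys: List[str]) -> List[Row]:
    """Update existing rows by merge keys, insert rows with unseen keys."""
    updates = {_key(r, merge_keys): r for r in incoming}
    matched = set()
    result: List[Row] = []
    for row in target:
        key = _key(row, merge_keys)
        if key in updates:
            # Matched: overlay incoming columns on the existing row.
            result.append({**row, **updates[key]})
            matched.add(key)
        else:
            result.append(row)
    # Unmatched incoming rows are inserted.
    result.extend(r for k, r in updates.items() if k not in matched)
    return result


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

appended = append(target, incoming)
replaced = overwrite(target, incoming)
merged_overwrite = merge_overwrite(target, incoming, ["id"])
merged_upsert = merge_upsert(target, incoming, ["id"])
```

On this input both merge variants produce the same three rows. In this sketch they diverge when the target holds several rows per key or columns that the incoming rows lack, since merge overwrite discards the old rows entirely while merge upsert updates them in place.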

SCD2Strategy

Bases: BaseLoadStrategy

Slowly Changing Dimension Type 2.

Closes existing current records and inserts new versions using the staged-updates MERGE pattern. Requires merge_keys and scd2_effective_column in destination.configure.
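The close-then-insert behaviour described above can be sketched on in-memory rows. This is a simplified illustration rather than the staged-updates MERGE itself: `scd2_apply` is hypothetical, the bookkeeping columns "is_current" and "effective_to" are invented names, and every incoming row is treated as a changed version (no change detection).

```python
from typing import Dict, List

Row = Dict[str, object]


def scd2_apply(target: List[Row], incoming: List[Row],
               merge_keys: List[str], effective_column: str) -> List[Row]:
    """Close current rows whose key appears in incoming, then insert new versions.

    Assumption: "is_current" and "effective_to" are hypothetical bookkeeping
    columns; the real column names are not documented in this page.
    """
    incoming_by_key = {tuple(r[k] for k in merge_keys): r for r in incoming}
    result: List[Row] = []
    for row in target:
        new = incoming_by_key.get(tuple(row[k] for k in merge_keys))
        if row["is_current"] and new is not None:
            # Close the existing current record at the new version's effective value.
            result.append({**row, "is_current": False,
                           "effective_to": new[effective_column]})
        else:
            result.append(row)
    # Insert each incoming row as the new current version.
    for new in incoming:
        result.append({**new, "is_current": True, "effective_to": None})
    return result


target = [{"id": 1, "city": "Oslo", "valid_from": "2024-01-01",
           "is_current": True, "effective_to": None}]
incoming = [{"id": 1, "city": "Bergen", "valid_from": "2024-06-01"}]

history = scd2_apply(target, incoming, merge_keys=["id"],
                     effective_column="valid_from")
```

After the call, `history` holds the closed Oslo record followed by the current Bergen record, so earlier versions stay queryable.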

get_load_strategy

get_load_strategy(load_type: str) -> BaseLoadStrategy

Look up a load strategy by type string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| load_type | str | Load type (e.g. "overwrite", "merge_upsert"). | required |

Returns:

| Type | Description |
| --- | --- |
| BaseLoadStrategy | The matching BaseLoadStrategy instance. |

Raises:

| Type | Description |
| --- | --- |
| DestinationError | If the load type is not registered. |
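A registry lookup of this kind can be sketched as a dict keyed by load-type string. The names below (`SKETCH_LOAD_STRATEGIES`, `get_sketch_strategy`, the two toy strategy classes) are hypothetical, `DestinationError` is redefined locally, and whether the real registry stores classes or instances is an assumption.

```python
from typing import Dict, Type


class DestinationError(Exception):
    """Local stand-in for the library's DestinationError."""


class SketchOverwrite:
    load_type = "overwrite"


class SketchAppend:
    load_type = "append"


# Assumption: the registry maps load-type strings to strategy classes.
SKETCH_LOAD_STRATEGIES: Dict[str, Type] = {
    cls.load_type: cls for cls in (SketchOverwrite, SketchAppend)
}


def get_sketch_strategy(load_type: str):
    """Look up a strategy by type string, raising DestinationError if unknown."""
    try:
        return SKETCH_LOAD_STRATEGIES[load_type]()
    except KeyError:
        known = ", ".join(sorted(SKETCH_LOAD_STRATEGIES))
        raise DestinationError(
            f"Unknown load type {load_type!r}; registered: {known}"
        ) from None


strategy = get_sketch_strategy("append")
```

Listing the registered types in the error message makes a misspelled load type easy to spot in pipeline logs.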