Load strategies¶
TL;DR destination.load_type selects one of five strategies: append,
overwrite (aka full_load), merge_upsert, merge_overwrite, scd2. Each
maps onto a specific BaseEngine method.
Strategy matrix¶
load_type |
Engine method | Touches existing rows? | Keys required | Typical use |
|---|---|---|---|---|
append |
write_to_*(mode="append") |
no | — | Event streams, fact tables where duplicates are acceptable. |
overwrite / full_load |
write_to_*(mode="overwrite") |
replaces everything | — | Daily snapshot dims, aggregates. |
merge_upsert |
merge_to_* |
updates matched, inserts unmatched | merge_keys |
Incremental CDC-style loads. |
merge_overwrite |
merge_overwrite_to_* |
deletes matched rows then reinserts from source | merge_keys |
"Rolling overwrite" when source always holds full current-state for a window. |
scd2 |
scd2_* |
closes out matched, inserts new version | merge_keys |
Slowly-changing dimensions with history. |
full_load vs overwrite
Same semantics. full_load is kept for metadata authored before the enum
was consolidated.
merge_upsert (SCD1 / CDC)¶
Standard upsert: rows with matching merge_keys are updated, rows without
a match are inserted. Nothing is deleted.
merge_overwrite (rolling overwrite)¶
Used when the source always holds the current state for a set of keys and you
want to replace the destination rows for those keys. Existing rows matching
merge_keys are deleted and re-inserted from the source.
Example: nightly snapshot of "last 7 days" — everything older is already final; everything inside the window is delivered fresh.
scd2 (Type 2 with history)¶
SCD2ColumnAdder (order 60) reads the metadata-declared
destination.configure.scd2_effective_column (a source business-time column) and adds
three framework columns before write:
__valid_from— copied from the effective column__valid_to— initially NULL__is_current— initiallytrue
The engine's scd2_to_path / scd2_to_table then runs a two-step MERGE:
- Close step — for every source row whose
merge_keysmatch a target row where__is_current = trueand the source__valid_fromis later than the target__valid_from(late-arrival guard), set__valid_to = source.__valid_fromand__is_current = false. - Append step — insert all source rows as new versions.
No hash column is stored; versioning is driven entirely by the effective-date column you nominate. See How-to · Merge & SCD2 for a worked example.
Choosing a strategy¶
flowchart TD
A[Is history required?] -->|yes| B[scd2]
A -->|no| C[Can source deliver full state?]
C -->|full table| D[overwrite]
C -->|delta rows only| E[Do duplicates matter?]
E -->|no| F[append]
E -->|yes, need latest-wins| G[merge_upsert]
E -->|yes, need full refresh of matched keys| H[merge_overwrite]