Execution Modes
weevr separates how data is read from how data is written. The load mode controls source reading. The write mode controls target writing. This page covers both.
Write modes
The write block on a thread controls how shaped data reaches the target
Delta table.
Overwrite
Replaces the entire target table with the new data.
Use overwrite for full refreshes, snapshots, and targets where you always want the complete current state. This is the default write mode.
Overwrite is naturally idempotent -- rerunning the same config with the same source data produces the same target state.
Append
Inserts new rows into the target without modifying existing rows.
Use append for event logs, audit trails, and any target that accumulates rows over time.
Append is not idempotent
Rerunning an append produces duplicate rows. Pair append with an incremental load mode (watermark or parameter) to prevent reprocessing the same source data. See Idempotency for details.
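The pairing described above might be sketched as follows. The mode names are the ones used on this page; the watermark mode's own settings are left out, so treat this as an illustrative fragment rather than a complete thread.

```yaml
# Thread fragment: accumulate rows without reprocessing old source data.
load:
  mode: incremental_watermark   # bounds each read, so a rerun skips rows already loaded
write:
  mode: append                  # inserts only; existing target rows are not modified
```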
Merge
Performs an upsert: matching rows are updated, unmatched source rows are inserted, and unmatched target rows are optionally deleted.
```yaml
write:
  mode: merge
  match_keys: [customer_id, source_system]
  on_match: update
  on_no_match_target: insert
  on_no_match_source: ignore
```
Merge requires match_keys -- the columns used to match source rows to
target rows. The behavior for each match outcome is independently
configurable:
| Parameter | Options | Default |
|---|---|---|
| on_match | update, ignore | update |
| on_no_match_target | insert, ignore | insert |
| on_no_match_source | delete, soft_delete, ignore | ignore |
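Because each outcome is configured independently, the same mode can express quite different behaviors. As one illustrative sketch built only from the options in the table, an insert-only merge that never touches existing target rows might look like this:

```yaml
# Thread fragment: add new customers, leave existing rows as they are.
write:
  mode: merge
  match_keys: [customer_id]
  on_match: ignore             # matched rows are not updated
  on_no_match_target: insert   # source rows missing from the target are added
  on_no_match_source: ignore   # target rows missing from the source are kept
```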
For soft deletes, specify the marker column and value:
```yaml
write:
  mode: merge
  match_keys: [customer_id]
  on_no_match_source: soft_delete
  soft_delete_column: is_deleted
  soft_delete_value: "true"
```
Merge is idempotent by match key -- rerunning with the same data produces the same target state.
SCD patterns
SCD Type 1 and SCD Type 2 are not separate write modes. They are delivered as reusable stitches that compose on top of the core merge mode. See the YAML Schema Reference for stitch usage.
Load modes
The load block on a thread controls how source data is bounded on each
execution.
Full
Reads all source data on every run.
This is the default. Pair with write.mode: overwrite for a complete
refresh, or with write.mode: merge for a full comparison merge.
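A full comparison merge could be sketched as below, using only keys and values that appear on this page. The choice of delete for unmatched target rows is an illustration: it assumes that a full load reads every source row, so a target row with no match is genuinely absent from the source.

```yaml
# Thread fragment: make the target mirror the complete source on each run.
load:
  mode: full                   # read all source data (the default)
write:
  mode: merge
  match_keys: [customer_id]
  on_match: update             # refresh rows that exist on both sides
  on_no_match_target: insert   # add rows that are new in the source
  on_no_match_source: delete   # drop rows that no longer exist in the source
```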
Incremental watermark
Reads only source rows that have changed since the last successful run. weevr persists a high-water mark and filters subsequent reads automatically.
On the first run, all rows are read because no prior watermark exists. On
subsequent runs, only rows whose watermark column (for example,
modified_date) exceeds the stored watermark are read.
The watermark is persisted in a configurable state store -- either as a Delta table property on the target or in a dedicated metadata table.
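A watermark load might be declared as in the sketch below. The mode name and the modified_date column come from this page; watermark_column is an assumed key name used for illustration, and the state store settings are left out. Check the YAML Schema Reference for the actual keys.

```yaml
# Thread fragment: read only rows changed since the last successful run.
load:
  mode: incremental_watermark
  watermark_column: modified_date   # assumed key name; the column compared to the stored high-water mark
```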
Incremental parameter
Incremental boundaries are passed as runtime parameters. weevr does not manage state -- the caller is responsible for providing the correct range.
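```yaml
load:
  mode: incremental_parameter

# Runtime parameters supplied by the caller. They are shown at the thread
# level here; confirm the nesting against the YAML Schema Reference.
params:
  start_date:
    type: date
    required: true
  end_date:
    type: date
    required: true
```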
Use this mode when the orchestration layer (Fabric pipeline, Airflow) controls the processing window.
CDC
In CDC mode, the thread consumes change data capture records. Source rows carry operation flags (insert, update, delete) that weevr applies as merge operations on the target.
CDC mode supports two configuration styles:
- Preset -- Use delta_cdf to auto-configure for Delta Change Data Feed conventions.
- Explicit -- Declare the operation column and flag values directly.
```yaml
load:
  mode: cdc
  cdc:
    operation_column: change_type
    insert_value: "I"
    update_value: "U"
    delete_value: "D"
```
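For comparison with the explicit style above, the preset style might look like the following sketch. delta_cdf is the preset named on this page; the preset key under cdc is an assumed name for illustration, so confirm it against the YAML Schema Reference.

```yaml
# Thread fragment: rely on Delta Change Data Feed conventions instead of
# declaring the operation column and flag values by hand.
load:
  mode: cdc
  cdc:
    preset: delta_cdf   # assumed key name for selecting the preset
```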
Choosing the right combination
| Scenario | Load mode | Write mode |
|---|---|---|
| Full snapshot refresh | full | overwrite |
| Accumulating event log | incremental_watermark | append |
| Dimension table with updates | full or incremental_watermark | merge |
| CDC from upstream system | cdc | merge |
| Externally bounded batch | incremental_parameter | append or merge |
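As a worked instance of one row, the "CDC from upstream system" scenario could combine the two blocks as sketched here. Every key and value is taken from the examples on this page; the choice of customer_id as the match key is illustrative.

```yaml
# Thread fragment: apply upstream change records to the target as a merge.
load:
  mode: cdc
  cdc:
    operation_column: change_type   # column holding the operation flag
    insert_value: "I"
    update_value: "U"
    delete_value: "D"
write:
  mode: merge
  match_keys: [customer_id]         # columns used to match change rows to target rows
```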
Next steps
- Idempotency -- Understand rerun safety per mode
- Artifacts Model -- How targets map to Delta tables
- YAML Schema: Thread -- Full write and load configuration reference