
Execution Modes

weevr separates how data is read from how data is written. The load mode controls source reading. The write mode controls target writing. This page covers both.

Write modes

The write block on a thread controls how shaped data reaches the target Delta table.

Overwrite

Replaces the entire target table with the new data.

write:
  mode: overwrite

Use overwrite for full refreshes, snapshots, and targets where you always want the complete current state. This is the default write mode.

Overwrite is naturally idempotent -- rerunning the same config with the same source data produces the same target state.

Append

Inserts new rows into the target without modifying existing rows.

write:
  mode: append

Use append for event logs, audit trails, and any target that accumulates rows over time.

Append is not idempotent

Rerunning an append produces duplicate rows. Pair append with an incremental load mode (watermark or parameter) to prevent reprocessing the same source data. See Idempotency for details.
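The difference between the two modes on a rerun can be sketched in a few lines of Python. This is only an illustrative model: plain lists stand in for Delta tables, and the `overwrite` and `append` helpers are hypothetical names, not part of weevr.

```python
# Illustrative model only: plain Python lists stand in for Delta tables.
# `overwrite` and `append` are hypothetical helpers, not weevr functions.

def overwrite(target, batch):
    """Replace the whole target with the batch."""
    return list(batch)


def append(target, batch):
    """Add the batch after the existing rows, leaving them untouched."""
    return target + list(batch)


batch = [{"event_id": 1}, {"event_id": 2}]

# Overwrite: running twice with the same batch still leaves two rows.
overwritten = overwrite(overwrite([], batch), batch)

# Append: running twice with the same batch leaves four rows (duplicates).
appended = append(append([], batch), batch)

print(len(overwritten), len(appended))
```

Bounding each read with an incremental load mode keeps the second batch from containing rows that were already written, which is what prevents the duplication.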

Merge

Performs an upsert: matching rows are updated, unmatched source rows are inserted, and unmatched target rows are optionally deleted.

write:
  mode: merge
  match_keys: [customer_id, source_system]
  on_match: update
  on_no_match_target: insert
  on_no_match_source: ignore

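Merge requires match_keys -- the columns used to match source rows to target rows. The behavior for each match outcome is independently configurable: on_match applies to rows found in both source and target, on_no_match_target to source rows with no matching target row, and on_no_match_source to target rows with no matching source row.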

| Parameter | Options | Default |
|---|---|---|
| on_match | update, ignore | update |
| on_no_match_target | insert, ignore | insert |
| on_no_match_source | delete, soft_delete, ignore | ignore |

For soft deletes, specify the marker column and value:

write:
  mode: merge
  match_keys: [customer_id]
  on_no_match_source: soft_delete
  soft_delete_column: is_deleted
  soft_delete_value: "true"

Merge is idempotent by match key -- rerunning with the same data produces the same target state.
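To make the three match outcomes concrete, here is a minimal pure-Python sketch of the merge semantics described above. It is not weevr's implementation: the `merge` function is hypothetical, rows are plain dicts, and it assumes match keys are unique within the source (a later duplicate would silently win).

```python
# Hypothetical model of the merge options; not weevr's implementation.

def merge(target, source, match_keys,
          on_match="update",
          on_no_match_target="insert",
          on_no_match_source="ignore",
          soft_delete_column=None, soft_delete_value=None):
    """Return a new list of rows after merging source into target."""
    def key(row):
        return tuple(row[k] for k in match_keys)

    source_by_key = {key(r): r for r in source}
    result, seen = [], set()

    for row in target:
        k = key(row)
        seen.add(k)
        if k in source_by_key:
            # Row exists on both sides: update it or keep it as-is.
            result.append({**row, **source_by_key[k]} if on_match == "update" else row)
        elif on_no_match_source == "delete":
            continue  # target row absent from source: drop it
        elif on_no_match_source == "soft_delete":
            result.append({**row, soft_delete_column: soft_delete_value})
        else:
            result.append(row)  # ignore: keep the target row untouched

    if on_no_match_target == "insert":
        # Source rows with no matching target row are added.
        result.extend(r for k, r in source_by_key.items() if k not in seen)
    return result


target = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Bob"}]
source = [{"customer_id": 1, "name": "Ada L."}, {"customer_id": 3, "name": "Cy"}]
options = dict(on_no_match_source="soft_delete",
               soft_delete_column="is_deleted", soft_delete_value="true")

once = merge(target, source, ["customer_id"], **options)
twice = merge(once, source, ["customer_id"], **options)  # rerun with the same source
```

In this model, customer 1 is updated, customer 2 is marked with the soft-delete value, and customer 3 is inserted. Running the merge a second time against its own output changes nothing, which is the idempotency property stated above.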

SCD patterns

SCD Type 1 and SCD Type 2 are not separate write modes. They are delivered as reusable stitches that compose on top of the core merge mode. See the YAML Schema Reference for stitch usage.

Load modes

[Figure: incremental watermark across three executions. Run 1 (first load) reads all source rows with no watermark and leaves the target holding the full dataset with watermark = 2024-04-05. Run 2 (incremental) reads new and changed rows with watermark > 2024-04-05 and leaves the merged result with watermark = 2024-05-10. Run 3 (incremental) reads new and changed rows with watermark > 2024-05-10 and leaves the merged result with watermark = 2024-06-01.]

The load block on a thread controls how source data is bounded on each execution.

Full

Reads all source data on every run.

load:
  mode: full

Full is the default load mode. Pair it with write.mode: overwrite for a complete refresh, or with write.mode: merge for a full comparison merge.

Incremental watermark

Reads only source rows that have changed since the last successful run. weevr persists a high-water mark and filters subsequent reads automatically.

load:
  mode: incremental_watermark
  watermark_column: modified_date
  watermark_type: timestamp

On the first run, all rows are read (no prior watermark exists). On subsequent runs, only rows where modified_date exceeds the stored watermark are read.

The watermark is persisted in a configurable state store -- either as a Delta table property on the target or in a dedicated metadata table.
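The filtering behavior can be sketched as follows. This is a simplified model under stated assumptions: a plain dict stands in for the state store, `read_incremental` is a hypothetical helper rather than a weevr function, and the watermark is advanced at read time, whereas a real run would only commit it after the write succeeds.

```python
from datetime import datetime

# Hypothetical model of watermark filtering; `state` stands in for the
# persisted state store described above.

def read_incremental(source_rows, state, watermark_column):
    """Return rows newer than the stored watermark and advance the watermark."""
    last = state.get("watermark")  # None on the first run
    rows = [r for r in source_rows
            if last is None or r[watermark_column] > last]
    if rows:
        state["watermark"] = max(r[watermark_column] for r in rows)
    return rows


state = {}
source = [
    {"id": 1, "modified_date": datetime(2024, 4, 1)},
    {"id": 2, "modified_date": datetime(2024, 4, 5)},
]

first = read_incremental(source, state, "modified_date")   # first run: all rows
source.append({"id": 3, "modified_date": datetime(2024, 5, 10)})
second = read_incremental(source, state, "modified_date")  # only the new row
third = read_incremental(source, state, "modified_date")   # nothing new
```

Because the comparison is strictly greater-than, a row that arrives later with a timestamp equal to the stored watermark would be skipped in this model.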

Incremental parameter

Incremental boundaries are passed as runtime parameters. weevr does not manage state -- the caller is responsible for providing the correct range.

load:
  mode: incremental_parameter

params:
  start_date:
    type: date
    required: true
  end_date:
    type: date
    required: true

Use this mode when the orchestration layer (Fabric pipeline, Airflow) controls the processing window.
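As a rough illustration of a caller-controlled window, the sketch below filters rows by a date range supplied from outside. Everything here is an assumption made for the example: the `read_window` helper, the `event_date` column, and the half-open interval (start inclusive, end exclusive) are not taken from weevr, which may bound the range differently.

```python
from datetime import date

# Hypothetical model: the caller supplies the window, nothing is remembered
# between runs. The half-open [start_date, end_date) range is an assumption.

def read_window(source_rows, date_column, start_date, end_date):
    """Return rows whose date falls in [start_date, end_date)."""
    return [r for r in source_rows if start_date <= r[date_column] < end_date]


source = [
    {"id": 1, "event_date": date(2024, 4, 30)},
    {"id": 2, "event_date": date(2024, 5, 1)},
    {"id": 3, "event_date": date(2024, 5, 31)},
    {"id": 4, "event_date": date(2024, 6, 1)},
]

may = read_window(source, "event_date", date(2024, 5, 1), date(2024, 6, 1))
june = read_window(source, "event_date", date(2024, 6, 1), date(2024, 7, 1))
```

With a half-open range, adjacent windows passed by the orchestrator never overlap, so no row is processed twice when the target is written with append.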

CDC

Reads a change data capture (CDC) feed. Source rows carry operation flags (insert, update, delete) that weevr applies as merge operations on the target.

load:
  mode: cdc
  cdc:
    preset: delta_cdf

CDC mode supports two configuration styles:

  • Preset -- Use delta_cdf to auto-configure for Delta Change Data Feed conventions.
  • Explicit -- Declare the operation column and flag values directly.

load:
  mode: cdc
  cdc:
    operation_column: change_type
    insert_value: "I"
    update_value: "U"
    delete_value: "D"
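How operation flags translate into changes on the target can be sketched in plain Python. This is a hypothetical model, not weevr's implementation: `apply_cdc` is an invented helper, its defaults mirror the explicit config above, changes are assumed to arrive in order, and an insert for an existing key is treated like an update.

```python
# Hypothetical model of applying flagged change rows to a target.

def apply_cdc(target, changes, match_keys, operation_column="change_type",
              insert_value="I", update_value="U", delete_value="D"):
    """Apply flagged change rows to target in order; return the new rows."""
    def key(row):
        return tuple(row[k] for k in match_keys)

    by_key = {key(r): r for r in target}
    for change in changes:
        op = change[operation_column]
        # The operation flag steers the merge; it is not stored on the target.
        data = {c: v for c, v in change.items() if c != operation_column}
        if op == delete_value:
            by_key.pop(key(change), None)
        elif op in (insert_value, update_value):
            by_key[key(change)] = {**by_key.get(key(change), {}), **data}
        else:
            raise ValueError(f"unknown operation flag: {op!r}")
    return list(by_key.values())


target = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
changes = [
    {"id": 3, "name": "Cy", "change_type": "I"},
    {"id": 1, "name": "Ada L.", "change_type": "U"},
    {"id": 2, "name": "Bob", "change_type": "D"},
]

result = apply_cdc(target, changes, ["id"])
```

After the three changes, the model holds the updated row for id 1 and the inserted row for id 3, while id 2 has been removed.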

Choosing the right combination

| Scenario | Load mode | Write mode |
|---|---|---|
| Full snapshot refresh | full | overwrite |
| Accumulating event log | incremental_watermark | append |
| Dimension table with updates | full or incremental_watermark | merge |
| CDC from upstream system | cdc | merge |
| Externally bounded batch | incremental_parameter | append or merge |

Next steps