Release Notes — v1.15

Release date: April 2026

This release lets the generic CDC load path compose with a watermark column, so append-only change data capture history tables no longer reread the full history on every run. The dominant use case is SAP data landed by Fabric Open Database Mirror, where every change row carries an operation flag like OPFLAG and a change timestamp like AEDATTM.


CDC + watermark composition

Prior releases rejected this configuration at model validation time, under the assumption that any CDC use meant the Delta Change Data Feed preset. The generic CDC path — the one with an explicit operation_column and I/U/D value mapping — was caught in the same rejection even though it had no built-in incremental mechanism and left users rereading the entire source on every thread run.

Starting with v1.15, mode: cdc and watermark_column compose for the generic path:

load:
  mode: cdc
  cdc:
    operation_column: OPFLAG
    insert_value: "I"
    update_value: "U"
    delete_value: "D"
    on_delete: soft_delete
  watermark_column: AEDATTM
  watermark_type: timestamp
  # watermark_store omitted: defaults to table_properties

On the first run weevr reads the full source and captures max(AEDATTM) as the new high-water mark. On subsequent runs the source read is narrowed to rows past the stored HWM, the usual I/U/D routing runs over only the new window, and the HWM advances after a successful write. Steady-state cost drops from O(history) to O(delta).
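The two-phase behavior above can be sketched in plain Python. This is an illustration only, not weevr's implementation: `select_window` and the row dictionaries are hypothetical names, and in-memory lists stand in for the source DataFrame.

```python
from datetime import datetime


def select_window(rows, prior_hwm, watermark_column="AEDATTM"):
    """Return (rows to process, new high-water mark or None)."""
    if prior_hwm is None:
        # First run: no stored HWM, so the full source is read.
        window = list(rows)
    else:
        # Later runs: only rows strictly past the stored HWM are read.
        window = [r for r in rows if r[watermark_column] > prior_hwm]
    # The new HWM is the maximum watermark value seen in this window.
    new_hwm = max((r[watermark_column] for r in window), default=None)
    return window, new_hwm


source = [
    {"ID": 1, "OPFLAG": "I", "AEDATTM": datetime(2026, 4, 1, 8, 0)},
    {"ID": 1, "OPFLAG": "U", "AEDATTM": datetime(2026, 4, 1, 9, 0)},
]
first_window, hwm = select_window(source, prior_hwm=None)

# A new change row lands in the append-only history table.
source.append({"ID": 2, "OPFLAG": "I", "AEDATTM": datetime(2026, 4, 2, 7, 30)})
second_window, hwm2 = select_window(source, prior_hwm=hwm)
```

The second call touches one row instead of three, which is the O(history) to O(delta) reduction described above.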

What stays the same

  • The Delta CDF preset (cdc.preset: delta_cdf) is unchanged. It still rejects watermark_column — CDF's commit-version tracking is the incremental mechanism for that path, and combining the two would be redundant.
  • Generic CDC threads that don't set watermark_column keep full-source read behavior. No existing configuration changes meaning.
  • Failure semantics mirror incremental_watermark: the HWM is persisted only after a successful write, so a mid-run failure leaves the prior HWM in place and the next run idempotently reprocesses the same window via CDC merge match keys.
  • Both watermark_store: table_properties (default, zero-config) and watermark_store: metadata_table backends work.
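The failure semantics in the list above can be illustrated with a small simulation. Everything here is an assumption made for the sketch: `run_thread`, the `state` dictionary, and the `fail_after` switch are hypothetical, a dictionary keyed by the merge match key stands in for the target table, and the partial write is exaggerated to show why reprocessing is safe.

```python
def run_thread(source, target, state, fail_after=None):
    """Apply the window past the stored HWM, then persist the new HWM."""
    prior = state.get("hwm")
    window = sorted(
        (r for r in source if prior is None or r["ts"] > prior),
        key=lambda r: r["ts"],
    )
    for applied, row in enumerate(window):
        if fail_after is not None and applied == fail_after:
            raise RuntimeError("simulated mid-run failure")
        # Merge on the match key: the latest change for a key wins.
        target[row["key"]] = row["value"]
    if window:
        # Reached only after every row was written successfully.
        state["hwm"] = window[-1]["ts"]


source = [
    {"key": "A", "op": "I", "value": 1, "ts": 10},
    {"key": "B", "op": "I", "value": 5, "ts": 15},
    {"key": "A", "op": "U", "value": 2, "ts": 20},
]
target, state = {}, {}

try:
    run_thread(source, target, state, fail_after=1)
except RuntimeError:
    pass
hwm_after_failure = state.get("hwm")  # prior HWM (none yet) is left in place

run_thread(source, target, state)  # reprocesses the same window
run_thread(source, target, state)  # nothing past the HWM, so no changes
```

Because rows are merged by key, replaying the window after the failure converges on the same target state as a clean run.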

Delete rows advance the watermark

The HWM is captured from the filtered DataFrame before any I/U/D routing. That means delete rows (the D branch of the operation column) still participate in max(watermark_column) — their change timestamp counts toward advancing the window, even though they route to the delete path during the merge. This avoids a subtle bug where a run that only saw delete rows would fail to advance the HWM and reread them on the next run.
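The ordering matters most for a window that contains only deletes. The sketch below uses hypothetical names (`capture_then_route`, integer watermarks as with watermark_type: long) and plain lists in place of DataFrames to contrast capturing the HWM before routing with capturing it from the upsert branch alone.

```python
def capture_then_route(window, operation_column="OPFLAG", watermark_column="AEDATTM"):
    """Capture the HWM from the whole window, then split it by operation."""
    # Capture first: every row counts, including deletes.
    new_hwm = max((r[watermark_column] for r in window), default=None)
    upserts = [r for r in window if r[operation_column] in ("I", "U")]
    deletes = [r for r in window if r[operation_column] == "D"]
    return new_hwm, upserts, deletes


# A run whose window holds a single delete row.
window = [{"ID": 7, "OPFLAG": "D", "AEDATTM": 300}]
new_hwm, upserts, deletes = capture_then_route(window)

# For contrast: an HWM taken from the upsert branch alone would be empty,
# so the delete row would be reread on the next run.
hwm_from_upserts_only = max((r["AEDATTM"] for r in upserts), default=None)
```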

All three watermark types supported

watermark_type: timestamp, date, and long all work for CDC composition, via the same build_watermark_filter helper that incremental_watermark mode uses. watermark_inclusive: true behaves exactly as it does in incremental_watermark mode: the filter becomes >= prior_hwm instead of > prior_hwm, which is the safer choice when pairing with merge or overwrite writes, because a row that shares the boundary value is reprocessed rather than skipped.
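To make the inclusive and exclusive variants concrete, here is a hypothetical predicate builder. It is not build_watermark_filter, whose signature these notes do not show; the function name, its parameters, and the SQL-style string output are all assumptions for illustration.

```python
def sketch_watermark_predicate(column, watermark_type, prior_hwm, inclusive=False):
    """Build a SQL-style filter string for rows past the stored HWM."""
    op = ">=" if inclusive else ">"
    if watermark_type == "long":
        literal = str(int(prior_hwm))
    elif watermark_type in ("timestamp", "date"):
        # The stored HWM is assumed to be a string, so cast it back.
        literal = f"CAST('{prior_hwm}' AS {watermark_type.upper()})"
    else:
        raise ValueError(f"unsupported watermark_type: {watermark_type!r}")
    return f"{column} {op} {literal}"


exclusive = sketch_watermark_predicate("AEDATTM", "timestamp", "2026-04-01 09:00:00")
inclusive = sketch_watermark_predicate("SEQ", "long", "42", inclusive=True)
```

Here `exclusive` is `AEDATTM > CAST('2026-04-01 09:00:00' AS TIMESTAMP)` and `inclusive` is `SEQ >= 42`.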


Configuration summary

No new fields were added. The change is a validator relaxation plus an internal wiring change, so the public YAML schema and the state schema are both unchanged.

Cross-field rules now enforced by LoadConfig:

| Rule | Effect |
| --- | --- |
| mode=cdc + cdc.operation_column + watermark_column | Accepted; composition path |
| mode=cdc + cdc.preset=delta_cdf + watermark_column | Rejected with a preset-specific error |
| watermark_column set without watermark_type in cdc mode | Rejected |
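The three rules can be expressed as a small cross-field check. This is a sketch under stated assumptions, not the LoadConfig validator: `CdcSketch`, `LoadSketch`, `check_cross_fields`, and the error messages are hypothetical, and plain dataclasses stand in for whatever model classes weevr actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CdcSketch:
    operation_column: Optional[str] = None
    preset: Optional[str] = None


@dataclass
class LoadSketch:
    mode: str
    cdc: Optional[CdcSketch] = None
    watermark_column: Optional[str] = None
    watermark_type: Optional[str] = None


def check_cross_fields(load):
    """Apply the three cross-field rules to a load configuration."""
    if load.mode != "cdc" or load.watermark_column is None:
        return "no composition"
    if load.cdc is not None and load.cdc.preset == "delta_cdf":
        raise ValueError("cdc.preset=delta_cdf cannot be combined with watermark_column")
    if load.watermark_type is None:
        raise ValueError("watermark_column requires watermark_type in cdc mode")
    return "composition path"


accepted = check_cross_fields(
    LoadSketch(
        mode="cdc",
        cdc=CdcSketch(operation_column="OPFLAG"),
        watermark_column="AEDATTM",
        watermark_type="timestamp",
    )
)
```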

See the load configuration reference for the full field table and the thread YAML schema for labelled YAML examples of all three load patterns (incremental_watermark, CDC via delta_cdf preset, and CDC with a watermark column).

Internal changes

read_cdc_source now returns tuple[DataFrame, str | None]. The second element is the HWM captured from this run, or None when nothing was captured (CDF preset path, empty first run, or empty subsequent window). The executor unpacks the tuple, passes new_hwm through the existing save_watermark plumbing, and sets last_version only on the CDF preset branch. That branch is explicit and does not rely on incidental exception behavior in int().
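The shape of that contract can be sketched without Spark. The names `read_cdc_source_sketch` and `execute_sketch`, the `state` dictionary, and the `cdf_version` parameter are hypothetical stand-ins; lists replace the DataFrame, and watermark values are ISO-8601 strings so that string comparison orders them correctly.

```python
def read_cdc_source_sketch(rows, prior_hwm, cdf_preset, watermark_column="AEDATTM"):
    """Return (rows to process, captured HWM or None)."""
    if cdf_preset:
        # CDF preset path: no watermark is captured.
        return list(rows), None
    window = [r for r in rows if prior_hwm is None or r[watermark_column] > prior_hwm]
    new_hwm = max((r[watermark_column] for r in window), default=None)
    return window, new_hwm


def execute_sketch(rows, state, cdf_version=None):
    """Unpack the tuple and update exactly one kind of state per branch."""
    cdf_preset = cdf_version is not None
    window, new_hwm = read_cdc_source_sketch(rows, state.get("hwm"), cdf_preset)
    # (the write of `window` to the target would happen here)
    if cdf_preset:
        state["last_version"] = cdf_version
    elif new_hwm is not None:
        state["hwm"] = new_hwm
    return len(window)


rows = [{"OPFLAG": "I", "AEDATTM": "2026-04-01T09:00:00"}]

generic_state = {}
first = execute_sketch(rows, generic_state)   # full read, HWM captured
second = execute_sketch(rows, generic_state)  # empty window, HWM untouched

cdf_state = {}
execute_sketch(rows, cdf_state, cdf_version=12)
```

Returning None for "nothing captured" lets the caller skip the watermark save with a plain `is not None` check instead of inferring the branch from a conversion error.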

Backwards compatibility

  • Existing configurations keep working without changes.
  • No new YAML keys, no new state schema, no new credentials.
  • Only an internal helper signature changed; the executor is the sole non-test caller and was migrated in the same commit.