State API

The weevr.state module manages watermark persistence for incremental loads. It supports multiple storage backends including Delta table properties and dedicated metadata tables.

weevr.state

Watermark state persistence for incremental processing.

__all__ = ['WatermarkState', 'WatermarkStore', 'resolve_store'] module-attribute

WatermarkState

Bases: FrozenBase

Immutable snapshot of a watermark's high-water mark.

last_value is serialized as a string for uniform storage across all watermark types (timestamp, date, int, long).
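As a rough illustration of why a string-typed last_value is convenient, the sketch below round-trips each of the listed watermark types through a string. The helper names encode_watermark and decode_watermark, and the type labels passed to them, are assumptions made for this example; they are not part of weevr.state.

```python
from datetime import date, datetime


def encode_watermark(value):
    """Serialize a watermark value to a string (hypothetical helper)."""
    # datetime is a subclass of date, so one check covers both.
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def decode_watermark(text, watermark_type):
    """Parse a stored string back into a typed value (hypothetical helper)."""
    if watermark_type == "timestamp":
        return datetime.fromisoformat(text)
    if watermark_type == "date":
        return date.fromisoformat(text)
    if watermark_type in ("int", "long"):
        return int(text)
    raise ValueError(f"unsupported watermark type: {watermark_type}")
```

Storing one string column keeps the persisted schema identical regardless of the watermark column's type; the caller only needs to remember which type to decode to.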

WatermarkStore

Bases: ABC

Abstract interface for watermark state persistence.

read(spark, thread_name) abstractmethod

Load persisted watermark state.

Returns None if no prior state exists for the given thread.

write(spark, state) abstractmethod

Persist watermark state.

Raises StateError on failure.
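To make the read/write contract concrete, here is a minimal sketch of a dict-backed store that could stand in for a real backend in unit tests. Everything in it is a local stand-in so the snippet runs on its own: the WatermarkState fields other than last_value, the StateError class, and InMemoryStore itself are assumptions, not weevr definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class StateError(Exception):
    """Local stand-in for the StateError named in the docs."""


@dataclass(frozen=True)
class WatermarkState:
    # thread_name is an assumed field; only last_value is documented.
    thread_name: str
    last_value: str


class WatermarkStore(ABC):
    """Local stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def read(self, spark, thread_name) -> Optional[WatermarkState]: ...

    @abstractmethod
    def write(self, spark, state) -> None: ...


class InMemoryStore(WatermarkStore):
    """Hypothetical store that keeps state in a dict instead of Delta."""

    def __init__(self):
        self._states = {}

    def read(self, spark, thread_name):
        # No prior state for this thread yields None, as documented.
        return self._states.get(thread_name)

    def write(self, spark, state):
        if not isinstance(state, WatermarkState):
            raise StateError(f"cannot persist {type(state).__name__}")
        self._states[state.thread_name] = state
```

The spark argument is unused here and can be passed as None; a real backend would use the session to reach the target table.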

LoadConfig

Bases: FrozenBase

Incremental load mode and watermark parameters.

Cross-field validation:

- mode == "incremental_watermark" requires watermark_column to be set.
- mode == "cdc" requires cdc config to be set.
- mode == "cdc" must not have watermark_column set (CDF version tracking is automatic).
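The cross-field rules can be sketched with a plain frozen dataclass. This is not the actual LoadConfig: the class name LoadConfigSketch, the "full" default mode, the dict-typed cdc field, and the use of ValueError are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoadConfigSketch:
    """Hypothetical model that mirrors the documented validation rules."""

    mode: str = "full"  # assumed default; not taken from the docs
    watermark_column: Optional[str] = None
    cdc: Optional[dict] = None  # assumed shape for the cdc config

    def __post_init__(self):
        if self.mode == "incremental_watermark" and not self.watermark_column:
            raise ValueError("incremental_watermark requires watermark_column")
        if self.mode == "cdc":
            if self.cdc is None:
                raise ValueError("cdc mode requires cdc config")
            if self.watermark_column is not None:
                # CDF version tracking replaces the watermark column.
                raise ValueError("cdc mode must not set watermark_column")
```

Validating in the constructor means an invalid combination fails at configuration load time rather than partway through a run.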

resolve_store(load_config, target_path)

Resolve the appropriate watermark store from config.

If load_config.watermark_store is None or its type is table_properties, returns a TablePropertiesStore. If the type is metadata_table, returns a MetadataTableStore.

Parameters:

Name          Type         Description                                Default
load_config   LoadConfig   Thread-level load configuration.           required
target_path   str          Path to the thread's Delta target table.   required

Returns:

Type             Description
WatermarkStore   A concrete WatermarkStore implementation.
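The dispatch described above might look roughly like the following. All classes here are local stand-ins, and the constructor arguments of TablePropertiesStore and MetadataTableStore, the StoreConfig name, and the error raised for an unknown type are assumptions; the real resolve_store may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoreConfig:
    """Hypothetical shape of load_config.watermark_store."""
    type: str = "table_properties"


@dataclass(frozen=True)
class LoadConfig:
    """Stand-in carrying only the field this sketch needs."""
    watermark_store: Optional[StoreConfig] = None


@dataclass(frozen=True)
class TablePropertiesStore:
    target_path: str  # assumed constructor argument


@dataclass(frozen=True)
class MetadataTableStore:
    config: StoreConfig  # assumed constructor argument


def resolve_store_sketch(load_config, target_path):
    """Pick a store from config, defaulting to table properties."""
    store_cfg = load_config.watermark_store
    if store_cfg is None or store_cfg.type == "table_properties":
        return TablePropertiesStore(target_path)
    if store_cfg.type == "metadata_table":
        return MetadataTableStore(store_cfg)
    # Behavior for other types is not documented; failing loudly is a guess.
    raise ValueError(f"unknown watermark store type: {store_cfg.type}")
```

Defaulting to the table-properties store when no watermark_store is configured matches the documented behavior and means a thread needs no extra configuration to persist its watermark.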