Release Notes — v1.10¶
Release date: March 2026
This release adds three new pipeline steps (concat, map, format), a type_defaults mode for fill_null, two analytical target modes (dimension and fact), audit column templates with built-in presets, shared resource universality across loom/weave/thread levels, the resolve step for declarative FK resolution, and the fk_sentinel_rate assertion. It also includes codebase quality improvements spanning model validation, error handling, security hardening, and telemetry reliability.
Concat Step¶
Concatenates multiple columns into a single string column with configurable null handling, separators, and trimming.
steps:
  - concat:
      target: full_address
      columns: [street, city, state, zip]
      separator: ", "
      null_mode: skip
      trim: true
      collapse_separators: true
Key features:
- null_mode — skip (omit nulls and collapse their separator), empty (convert nulls to empty strings), or literal (replace with a custom string)
- null_literal — replacement string when null_mode is literal (default <NULL>)
- trim — strip leading/trailing whitespace from source columns before joining
- collapse_separators — when nulls are skipped, collapse adjacent separators into one (default true)
- When all source values are null in skip mode, the result is NULL rather than an empty string
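The null-handling rules above can be modeled for a single row in plain Python. This is only a sketch of the documented semantics, not weevr's implementation: the helper name concat_row is hypothetical, and it assumes collapse_separators is left at its default of true.

```python
from typing import Optional, Sequence


def concat_row(
    values: Sequence[Optional[str]],
    separator: str = ", ",
    null_mode: str = "skip",
    null_literal: str = "<NULL>",
    trim: bool = False,
) -> Optional[str]:
    """Model the documented concat semantics for one row (hypothetical helper)."""
    if trim:
        # Strip whitespace from each source value before joining.
        values = [v.strip() if v is not None else None for v in values]
    if null_mode == "skip":
        kept = [v for v in values if v is not None]
        # Dropping nulls before joining also drops their separators.
        # All-null input yields NULL (None), not an empty string.
        return separator.join(kept) if kept else None
    if null_mode == "empty":
        return separator.join("" if v is None else v for v in values)
    if null_mode == "literal":
        return separator.join(null_literal if v is None else v for v in values)
    raise ValueError(f"unknown null_mode: {null_mode!r}")
```

With the address example, a missing city is dropped together with its separator in skip mode, while empty mode keeps the separator and literal mode writes the placeholder.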
Map Step¶
Maps discrete values in a column to new values using a lookup dictionary. Handles nulls and unmapped values independently.
steps:
  - map:
      column: status_code
      target: status_label
      values:
        A: "Active"
        B: "Blocked"
        C: "Closed"
      default: "Unknown"
      on_null: "Missing"
      case_sensitive: false
Key features:
- target — optional output column; when omitted, the source column is overwritten in place
- default — fallback value for unmapped inputs (mutually exclusive with unmapped: null and unmapped: validate)
- on_null — specific replacement for null inputs
- unmapped — keep (retain original), null (set NULL), or validate (keep and add a boolean flag column __map_unmapped_{column})
- case_sensitive — set false for case-insensitive matching (default true)
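The way nulls and unmapped values are handled independently can be sketched for one value in Python. The function map_value is a hypothetical model of the options listed above, not the library's code. It treats validate like keep (the flag column is not modeled) and does not enforce the mutual exclusivity of default and unmapped.

```python
from typing import Dict, Optional


def map_value(
    value: Optional[str],
    values: Dict[str, str],
    default: Optional[str] = None,
    on_null: Optional[str] = None,
    unmapped: str = "keep",
    case_sensitive: bool = True,
) -> Optional[str]:
    """Model the documented map semantics for one value (hypothetical helper)."""
    if value is None:
        # Null inputs are handled separately from unmapped inputs.
        return on_null
    if case_sensitive:
        table, key = values, value
    else:
        table = {k.lower(): v for k, v in values.items()}
        key = value.lower()
    if key in table:
        return table[key]
    if default is not None:
        return default
    # No default: unmapped decides between NULL and the original value.
    return None if unmapped == "null" else value
```

In this model a lowercase "a" only maps to "Active" when case_sensitive is false; otherwise it counts as unmapped.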
Format Step¶
Formats columns using pattern, number, or date rules. Each column specifies exactly one format type.
steps:
  - format:
      columns:
        phone:
          source: raw_phone
          pattern: "({1:3}){4:3}-{7:4}"
        amount_display:
          source: amount
          number: "#,##0.00"
        event_date:
          date: "yyyy-MM-dd"
Three format types:
- pattern — positional extraction with {position:length} syntax (e.g., phone number formatting). on_short controls behavior when input is shorter than the pattern: null (default) or partial (best effort)
- number — DecimalFormat pattern for numeric display (e.g., #,##0.00 → 1,234,567.89). strict_types: false auto-casts non-numeric sources
- date — SimpleDateFormat pattern for date display (e.g., yyyy-MM-dd)
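The pattern format type can be illustrated with a small Python model. It assumes positions are 1-based, which is inferred from the {1:3} phone example rather than stated in these notes, and it assumes that partial simply emits whatever characters are available. The name apply_pattern is hypothetical.

```python
import re
from typing import Optional

# Matches one {position:length} slot, e.g. {4:3}.
SLOT = re.compile(r"\{(\d+):(\d+)\}")


def apply_pattern(value: Optional[str], pattern: str, on_short: str = "null") -> Optional[str]:
    """Fill each {position:length} slot from the input string (hypothetical helper)."""
    if value is None:
        return None
    slots = [(int(pos), int(length)) for pos, length in SLOT.findall(pattern)]
    # Minimum input length needed to fill every slot completely.
    needed = max((pos + length - 1 for pos, length in slots), default=0)
    if len(value) < needed and on_short == "null":
        return None

    def fill(match):
        start = int(match.group(1)) - 1  # assumed 1-based positions
        return value[start : start + int(match.group(2))]

    return SLOT.sub(fill, pattern)
```

For example, apply_pattern("5551234567", "({1:3}){4:3}-{7:4}") produces "(555)123-4567", and a five-character input produces None unless on_short is set to partial.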
Type-Aware fill_null¶
The fill_null step gains a new type_defaults mode that
assigns semantically appropriate defaults based on each
column's data type.
steps:
  - fill_null:
      mode: type_defaults
      code: unknown
      include: ["amount_*"]
      exclude: ["id"]
      overrides:
        region: "Unspecified"
Semantic codes and their defaults:
| Code | String | Integer | Boolean | Date |
|---|---|---|---|---|
| unknown | Unknown | 0 | false | 1970-01-01 |
| not_applicable | Not Applicable | 0 | false | 1970-01-01 |
| invalid | Invalid | 0 | false | 1970-01-01 |
Key features:
- include/exclude — glob patterns to restrict which columns receive defaults (e.g., include: ["addr_*"])
- overrides — per-column replacements applied on top of type-based defaults
- where — conditional predicate to fill only matching rows
- Composable — mode: type_defaults and explicit columns can coexist in the same step; type defaults apply first, then explicit columns override
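How include, exclude, and overrides might combine can be sketched with Python's fnmatch module. This is a model under assumptions: the type names ("string", "integer", and so on) and the helper plan_defaults are hypothetical, and overrides are assumed to apply even to columns that the include patterns do not select, as the region override in the example suggests.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Defaults per semantic code, taken from the table above.
TYPE_DEFAULTS = {
    code: {"string": label, "integer": 0, "boolean": False, "date": "1970-01-01"}
    for code, label in [
        ("unknown", "Unknown"),
        ("not_applicable", "Not Applicable"),
        ("invalid", "Invalid"),
    ]
}


def plan_defaults(
    schema: Dict[str, str],
    code: str = "unknown",
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
    overrides: Optional[Dict[str, object]] = None,
) -> Dict[str, object]:
    """Return the fill value chosen for each column (hypothetical helper)."""
    defaults = TYPE_DEFAULTS[code]
    plan: Dict[str, object] = {}
    for column, dtype in schema.items():
        if include and not any(fnmatchcase(column, p) for p in include):
            continue
        if exclude and any(fnmatchcase(column, p) for p in exclude):
            continue
        if dtype in defaults:
            plan[column] = defaults[dtype]
    # Assumption: overrides are applied last and win over type defaults.
    plan.update(overrides or {})
    return plan
```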
Analytical Target Modes¶
Dimension Mode¶
Declares a dimension table with surrogate key generation, business key identification, SCD Type 2 history tracking, change detection groups, and system member rows.
target:
  alias: gold.dim_customer
  dimension:
    business_key: [customer_id]
    surrogate_key:
      name: sk_customer_id
      algorithm: sha256
      columns: [customer_id]
      output: native
    track_history: true
    change_detection:
      attrs:
        columns: [customer_name, address]
        on_change: version
      static:
        columns: [region_code]
        on_change: static
    columns:
      valid_from: _valid_from
      valid_to: _valid_to
      is_current: _is_current
    system_members:
      - sk: -1
        code: UNKNOWN
        label: Unknown
      - sk: -4
        code: INVALID
        label: Invalid Data
Key features:
- surrogate_key — hash-based key generation with configurable algorithm (sha256, xxhash64, md5, murmur3, etc.) and output type (native or string)
- business_key — one or more columns forming the natural key
- track_history — enables SCD Type 2 with valid_from, valid_to, and is_current tracking columns
- change_detection — named groups with independent on_change behavior: version (new SCD row), overwrite (update in place), or static (non-versioned metadata). columns: auto captures remaining unclaimed columns
- previous_columns — capture prior values before an update (e.g., previous_name: customer_name)
- additional_keys — extra hash keys beyond the primary surrogate
- system_members — sentinel rows with negative SK values for Unknown, Invalid, etc. seed_system_members: true inserts them on first load
- history_filter — expose a filtered view of current rows only (default true)
- dates — configurable SCD boundary dates (min default 1970-01-01, max default 9999-12-31)
Fact Mode¶
Declares a fact table with foreign key columns and sentinel value conventions for data quality enforcement.
target:
  alias: gold.fact_sales
  fact:
    foreign_keys:
      - customer_sk
      - product_sk
      - region_sk
    sentinel_values:
      invalid: -4
      missing: -1
Key features:
- foreign_keys — required non-empty list of FK columns referencing dimensions
- sentinel_values — conventions for missing and invalid data (defaults: invalid: -4, missing: -1). Values must be distinct to prevent stats double-counting
- Pairs naturally with the resolve step and fk_sentinel_rate assertion for end-to-end FK resolution and quality gates
Audit Column Templates¶
Background¶
v1.6 introduced audit column injection via inline
audit_columns dicts at the thread target level. v1.10 adds
named templates — reusable sets of audit columns that can
be defined at any level and referenced by name, with two
built-in presets.
Built-in Presets¶
| Preset | Columns | Description |
|---|---|---|
| fabric | 9 | Fabric pipeline metadata — batch, pipeline, workspace, Spark app |
| minimal | 3 | Lightweight — loaded_at, run_id, thread name |
Configuration¶
Define custom templates and reference them by name:
# Define at loom, weave, or thread level
audit_templates:
  my_standard:
    columns:
      _loaded_at: "current_timestamp()"
      _run_id: "${param.run_id}"
      _environment: "'${param.env}'"

target:
  alias: gold.customers
  audit_template: minimal
Multiple templates can be referenced and merged in order:
target:
  alias: gold.orders
  audit_template:
    - minimal
    - my_standard
  audit_columns:
    _custom_col: "custom_value()"
  audit_columns_exclude:
    - "_batch_*"
Inheritance¶
Templates cascade from loom → weave → thread:
- Templates defined at the loom level are inherited by all weaves and threads
- Weave-level templates extend or override loom-level ones
- Thread-level templates extend or override weave-level ones
- Set audit_template_inherit: false on a thread to block inheritance from parent levels
- Inline audit_columns merge additively on top of resolved templates (same-named columns override)
- audit_columns_exclude applies glob patterns last to remove unwanted columns
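The resolution order described above (templates in order, then inline columns, then exclusions) can be sketched in Python. The helper resolve_audit_columns and the template contents are hypothetical, and the sketch assumes that a later template overrides an earlier one on same-named columns.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def resolve_audit_columns(
    templates: Dict[str, Dict[str, str]],
    names: List[str],
    inline: Optional[Dict[str, str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Merge templates, inline columns, and exclusions (hypothetical helper)."""
    resolved: Dict[str, str] = {}
    for name in names:
        # Assumption: later templates override earlier ones on name clashes.
        resolved.update(templates[name])
    # Inline audit_columns merge on top; same-named columns override.
    resolved.update(inline or {})
    # Exclusion globs are applied last.
    return {
        column: expr
        for column, expr in resolved.items()
        if not any(fnmatchcase(column, pattern) for pattern in exclude or [])
    }
```

Because the exclusion runs last, a pattern such as "_batch_*" removes matching columns regardless of which template or inline entry contributed them.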
Context Variables¶
Templates support runtime substitution:
- ${thread.name}, ${thread.qualified_key}, ${thread.source}, ${thread.sources}
- ${weave.name}, ${loom.name}
- ${run.timestamp}, ${run.id}
- ${param.*} (runtime parameters)
Shared Resource Universality¶
Prior to v1.10, shared resources like lookups, column sets, and variables could only be defined at the thread level. This release promotes resource definitions to the loom and weave levels, enabling centralized configuration that cascades down the hierarchy.
Resources at Every Level¶
The following resources can now be defined at loom, weave, or thread levels:
- lookups — named lookup source definitions
- column_sets — named column rename mappings
- variables — named variable specs
- pre_steps / post_steps — named hook steps
- params — parameter definitions
- execution — execution config (log level, tracing)
- naming — column naming conventions
- audit_templates — audit column template definitions
Cascading Rules¶
Resources merge from loom → weave → thread, with the most specific level winning on conflicts:
# loom.yaml — shared across all weaves
lookups:
  dim_customer:
    source:
      type: delta
      alias: staging.dim_customer

# weave.yaml — adds weave-specific lookups
lookups:
  ref_status:
    source:
      type: delta
      alias: reference.status_codes

# thread.yaml — overrides loom-level customer lookup
lookups:
  dim_customer:
    source:
      type: delta
      alias: dev.dim_customer_test
Merge semantics:
- Additive merge — lookups, column_sets, variables, pre_steps, post_steps merge by name across levels; same-named entries at a lower level override the parent
- Replacement — execution, naming, params at a lower level replace the parent entirely
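The two merge behaviors can be contrasted in a short Python sketch. The function cascade is a hypothetical model of the rules above, with configs reduced to plain dicts. The keys log_level and trace inside execution are illustrative only.

```python
from typing import Any, Dict

# Resource kinds that merge by name across levels, per the list above.
ADDITIVE = {"lookups", "column_sets", "variables", "pre_steps", "post_steps"}


def cascade(*levels: Dict[str, Any]) -> Dict[str, Any]:
    """Merge configs given in loom, weave, thread order (hypothetical helper)."""
    merged: Dict[str, Any] = {}
    for level in levels:
        for key, value in level.items():
            if key in ADDITIVE:
                # Merge by name; the more specific level wins on conflicts.
                merged[key] = {**merged.get(key, {}), **value}
            else:
                # Replacement: the lower level replaces the parent entirely.
                merged[key] = value
    return merged


loom = {
    "lookups": {"dim_customer": "staging.dim_customer"},
    "execution": {"log_level": "INFO", "trace": True},
}
weave = {"lookups": {"ref_status": "reference.status_codes"}}
thread = {
    "lookups": {"dim_customer": "dev.dim_customer_test"},
    "execution": {"log_level": "DEBUG"},
}
effective = cascade(loom, weave, thread)
```

Here the effective lookups contain both dim_customer (thread version) and ref_status, while the thread's execution block replaces the loom's, so the loom's trace setting is not carried over.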
Resolve Step¶
Foreign key resolution is a universal pattern in dimensional
modeling. Every fact table requires resolving business keys from
source systems into surrogate keys from dimension tables.
Previously, this required chaining join, derive, filter,
and coalesce steps — verbose, error-prone, and lacking
standardized sentinel handling.
Single FK Resolve¶
The resolve step encapsulates the complete FK resolution
pattern in one declarative block:
steps:
  - resolve:
      name: plant_id
      lookup: dim_plant
      match: plant_code
      pk: id
      on_invalid: -4
      on_unknown: -1
Key features:
- BK completeness check — null/blank source columns automatically receive the on_invalid sentinel
- Sentinel assignment — aligned with system member codes (on_invalid default -4, on_unknown default -1)
- Match sugar — string, list, or dict forms for column mapping
- Normalization — trim_lower, trim_upper, trim presets applied symmetrically to both sides
- Include columns — bring additional lookup columns into the fact with optional rename dict and prefix
- on_duplicate — warn (default), error, or first for multi-match scenarios
- Resolution stats — per-FK metadata (total, matched, unknown, invalid, duplicates, match_rate)
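The sentinel logic can be sketched for a single business key in Python. This is a simplified model, not the step's implementation: resolve_fk and trim_lower are hypothetical helpers, the dimension is a list of dicts, and duplicate handling (on_duplicate) is not modeled.

```python
from typing import Callable, Dict, List, Optional


def trim_lower(text: str) -> str:
    """Stand-in for the trim_lower normalization preset."""
    return text.strip().lower()


def resolve_fk(
    value: Optional[str],
    dim_rows: List[Dict[str, object]],
    match: str,
    pk: str = "id",
    on_invalid: int = -4,
    on_unknown: int = -1,
    normalize: Optional[Callable[[str], str]] = None,
) -> object:
    """Resolve one business key to a surrogate key (hypothetical helper)."""
    # BK completeness check: null or blank keys get the on_invalid sentinel.
    if value is None or not value.strip():
        return on_invalid
    norm = normalize or (lambda text: text)
    # Normalization is applied to both the lookup side and the source side.
    index = {norm(str(row[match])): row[pk] for row in dim_rows}
    # A well-formed key with no match gets the on_unknown sentinel.
    return index.get(norm(value), on_unknown)
```

Keeping the two sentinels distinct lets downstream checks tell a malformed key (-4) from a well-formed key that has no dimension row (-1).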
SCD2 Narrowing¶
The effective block supports two sub-modes for point-in-time
dimension resolution:
# Current flag (string sugar or dict form)
effective:
  current: is_current

# Date range (half-open interval [from, to))
effective:
  date_column: order_date
  from: effective_from
  to: effective_to
A general where predicate can compose with effective using
AND semantics for custom narrowing.
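The half-open interval matters at version boundaries: a fact dated exactly on a changeover day belongs to the new version, not the old one. A minimal Python sketch of that rule follows; pick_version and the row layout are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional


def pick_version(rows: List[Dict[str, object]], event_date: date) -> Optional[object]:
    """Return the id of the dimension version in effect on event_date."""
    for row in rows:
        # Half-open interval: effective_from is inclusive, effective_to is exclusive.
        if row["effective_from"] <= event_date < row["effective_to"]:
            return row["id"]
    return None


versions = [
    {"id": 1, "effective_from": date(2025, 1, 1), "effective_to": date(2026, 3, 1)},
    {"id": 2, "effective_from": date(2026, 3, 1), "effective_to": date(9999, 12, 31)},
]
```

With these two versions, an order dated 2026-03-01 resolves to id 2 only, so adjacent versions never both match the same date.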
Batch Mode¶
Resolve multiple FKs in one step with shared defaults:
- resolve:
    pk: id
    on_invalid: -4
    on_unknown: -1
    batch:
      - name: plant_id
        lookup: dim_plant
        match: plant_code
      - name: customer_id
        lookup: dim_customer
        match:
          customer_code: natural_id
        normalize: trim_lower
Item-level values override shared defaults. Source columns are dropped only after all FKs complete, preventing mid-batch failures when columns are shared across resolutions.
Pipeline Integration¶
The resolve step is dispatched via special handling in
run_pipeline(), which now accepts an optional lookups
parameter. The executor passes the effective cached lookups
(merged from loom, weave, and thread levels) through to the
pipeline.
on_failure: warn assigns the on_unknown sentinel to all
rows and logs a warning instead of aborting the thread when a
lookup cannot be found (single mode only).
fk_sentinel_rate Assertion¶
A new post-write assertion type for checking FK sentinel value rates:
assertions:
  - type: fk_sentinel_rate
    column: plant_id
    sentinel: -4
    max_rate: 0.05
    message: "plant FK invalid rate exceeded"
Supports:
- Single column or columns list — each checked independently
- Named sentinel groups — dict-of-int (shared max_rate) or dict-of-dict (per-group rates)
- System member codes — string values resolved at evaluation time
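The quantity being checked can be illustrated in Python. This sketch assumes the rate is the share of rows whose FK equals the sentinel, that the assertion passes when the rate is at or below max_rate, and that an empty column has a rate of 0.0. None of these details are confirmed by these notes, and both function names are hypothetical.

```python
from typing import Iterable


def sentinel_rate(values: Iterable[int], sentinel: int) -> float:
    """Share of rows whose FK equals the sentinel (0.0 for an empty column)."""
    rows = list(values)
    if not rows:
        return 0.0
    return sum(1 for v in rows if v == sentinel) / len(rows)


def passes_fk_sentinel_rate(values: Iterable[int], sentinel: int, max_rate: float) -> bool:
    """Assumed pass condition: the observed rate does not exceed max_rate."""
    return sentinel_rate(values, sentinel) <= max_rate
```

For a column where one row in four carries -4, the rate is 0.25, which would fail the max_rate: 0.05 example above under these assumptions.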
Config Validation¶
validate_resolve_lookups() checks at config time that all
resolve steps reference defined lookup names, catching
configuration errors before execution.
Codebase Quality Improvements¶
Model Validation¶
- Step discriminator fix — multi-word step types (case_when, fill_null, string_ops, date_ops) now round-trip correctly through the discriminated union when passed as model instances
- Empty collection guards — SelectParams, DropParams, CastParams, DeriveParams, SortParams, UnionParams now reject empty column/source lists at parse time
- Target requires alias or path — at least one must be set
- Thread requires non-empty sources — empty sources dict rejected at parse time
- ColumnSetSource type validation — delta requires alias, yaml requires path
- fk_sentinel_rate presence validation — requires at least one of column/columns and sentinel/sentinels
- on_invalid/on_unknown uniqueness — equal sentinel values rejected to prevent stats double-counting
- DimensionSurrogateKeyConfig — renamed from SurrogateKeyConfig in the dimension module to resolve namespace collision with keys.SurrogateKeyConfig
Public API¶
- STEP_TYPES — renamed from _STEP_TYPES (now public)
- New exports from weevr.model: ConcatStep, ConcatParams, MapStep, MapParams, FormatStep, FormatSpec, FormatParams, ResolveStep, ResolveParams, ResolveBatchItem, EffectiveConfig, CurrentConfig, DimensionSurrogateKeyConfig
- quote_identifier — renamed from _quote_identifier
- CONTEXT_VAR_PATTERN — renamed from _CONTEXT_VAR_PATTERN
Security and Reliability¶
- SQL injection guard — table aliases in quality gate SQL queries are now backtick-escaped via _quote_table_ref()
- Path traversal guard — resolve_ref_path rejects refs that escape the project root
- Thread name sanitization — single quotes escaped in table property keys
- Assert replacement — all assert statements in production code replaced with explicit if-raise guards that survive Python -O optimization
- Error type normalization — ValueError replaced with ExecutionError in resolve and formatting handlers for consistent error taxonomy
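Two of the items above, the path traversal guard and the move from assert to explicit if-raise guards, follow a common pattern that can be sketched with pathlib. This is not the code of resolve_ref_path. The function resolve_within_root is hypothetical, it raises ValueError only for illustration, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_within_root(project_root: str, ref: str) -> Path:
    """Resolve ref under project_root and reject anything that escapes it."""
    root = Path(project_root).resolve()
    candidate = (root / ref).resolve()
    # An explicit if-raise guard still runs under `python -O`,
    # whereas an `assert` statement would be stripped.
    if not candidate.is_relative_to(root):
        raise ValueError(f"ref escapes project root: {ref!r}")
    return candidate
```

Resolving both paths before comparing them means that ".." segments and symlinks are normalized first, so a ref such as "../secrets.yaml" is rejected rather than silently followed.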
Telemetry and Observability¶
- Span finalization — weave and loom telemetry spans are now finalized in finally blocks, producing complete traces even on failure paths
- Weave status fix — a weave where all threads succeeded or were conditionally skipped now correctly returns "success" (was "partial")
- CDC count consolidation — three separate .count() calls in execute_cdc_merge consolidated into a single groupBy aggregation
Minor Improvements¶
- __dedup_rn__ temp column name prevents collision with user columns named _rn
- FormatSpec rejects empty pattern strings
- ConcatParams rejects empty null_literal with literal mode
- Export treats empty-string path/alias same as None
- fact.py logs exceptions at DEBUG level instead of silently swallowing them in the FK sentinel advisory check
- WeaveTelemetry.column_set_results typed as list[ColumnSetResult] (was list[Any])
- Internal planning IDs stripped from all source and test files