Resolve Step — FK Resolution¶
The resolve step encapsulates the complete foreign key
resolution pattern: validate business key completeness, join
to a dimension lookup, assign sentinel values for invalid and
unknown BKs, handle SCD2 point-in-time narrowing, and
optionally include extra columns from the lookup.
Why resolve?¶
Without resolve, FK resolution requires chaining join +
derive + filter + coalesce steps — verbose, error-prone,
and lacking standardized sentinel handling. The resolve step
replaces that entire chain with one declarative block.
Value over manual steps:
- Sentinel assignment replaces join + coalesce + case_when
- BK completeness check (null/blank detection) is automatic
- Single output column — no cleanup of extra join columns
- Batch mode — 8 FKs in one block vs 8 join chains
- Lookup integration — references lookups by name
fk_sentinel_rateassertion — purpose-built FK quality gate
Single FK resolve¶
The simplest form maps a source column to a lookup business key and produces a surrogate key column:
match supports three sugar forms:
# String — single column, same name both sides
match: natural_id
# Equivalent to: { natural_id: natural_id }
# List — multiple columns, same names
match: [mandt, plant]
# Equivalent to: { mandt: mandt, plant: plant }
# Dict — explicit mapping (different column names)
match:
plant_code: natural_id
region: region_code
Sentinel values¶
Rows with incomplete or unmatched BKs receive sentinel values instead of nulls:
| Scenario | Default | Config field |
|---|---|---|
| Null/blank source column | -4 |
on_invalid |
| No match in lookup | -1 |
on_unknown |
These align with system member codes from dimension seeding.
SCD2 resolution — effective block¶
For slowly changing dimensions, the effective block narrows
the lookup to the correct version of each record.
Current flag¶
Filter to the active record using a boolean or coded column:
- resolve:
name: product_id
lookup: dim_product
match: product_code
pk: id
effective:
current: is_current
Custom value (e.g. "Y" instead of true):
Date range¶
Half-open interval [from, to) checked against a fact date
column. Null to matches as the current record:
- resolve:
name: product_id
lookup: dim_product
match: product_code
pk: id
effective:
date_column: order_date
from: effective_from
to: effective_to
Where predicate¶
General SQL filter applied to the lookup before joining:
effective and where compose with AND semantics — both
filters apply when present together.
Include columns¶
Bring additional columns from the lookup into the fact:
- resolve:
name: plant_id
lookup: dim_plant
match: plant_code
pk: id
include:
- description
- category
include_prefix: "plant_"
include supports list (keep names), dict (rename), or
string sugar (single column). include_prefix prepends a
string to all included column names.
Normalization¶
Normalize both source and lookup BK columns before joining:
Available presets: trim_lower, trim_upper, trim, none.
Dropping source columns¶
After resolution, the original BK source columns can be dropped:
Defaults to false. In batch mode, source columns are dropped
only after all FKs complete — preventing mid-batch failures
when the same source column is used by multiple resolutions.
Duplicate handling¶
When multiple lookup rows match a single fact row:
on_duplicate: warn # default — log warning, take first
on_duplicate: error # raise an exception
on_duplicate: first # take first silently
Batch mode¶
Resolve multiple FKs in one step with shared defaults:
- resolve:
pk: id
on_invalid: -4
on_unknown: -1
batch:
- name: plant_id
lookup: dim_plant
match: plant_code
- name: customer_id
lookup: dim_customer
match:
customer_code: natural_id
normalize: trim_lower
- name: product_id
lookup: dim_product
match: product_code
effective:
current: is_current
Shared defaults (pk, on_invalid, etc.) apply to all items.
Item-level values override the shared default. When
drop_source_columns: true, source columns are dropped only
after all FKs complete.
on_failure handling¶
Control behavior when a lookup cannot be found (single mode):
on_failure: abort # default — fail the thread
on_failure: warn # assign on_unknown to all rows, continue
In batch mode, a missing lookup always aborts regardless of
on_failure.
fk_sentinel_rate assertion¶
Post-write assertion that checks FK sentinel rates:
assertions:
# Single column, single sentinel
- type: fk_sentinel_rate
column: plant_id
sentinel: -4
max_rate: 0.05
message: "plant FK invalid rate exceeded"
# Named sentinel groups with per-group rates
- type: fk_sentinel_rate
column: plant_id
sentinels:
invalid:
value: -4
max_rate: 0.02
unknown:
value: -1
max_rate: 0.10
message: "plant FK quality check"
# Multiple columns, shared config
- type: fk_sentinel_rate
columns: [plant_id, customer_id, product_id]
sentinels:
invalid: -4
unknown: -1
max_rate: 0.10
Resolution statistics¶
Each resolve step records per-FK statistics in the step result metadata:
total— total fact rows processedmatched— rows with a successful FK matchunknown— rows assignedon_unknownsentinelinvalid— rows assignedon_invalidsentinelduplicates— rows with multiple matches (before dedup)match_rate— percentage of matched rows