
Config API

The weevr.config module handles YAML parsing, schema validation, configuration inheritance, and parameter resolution across the loom/weave/thread hierarchy.

weevr.config

Configuration loading and validation.

ConfigLocation

Bases: ABC

Abstract reference to a config file or directory.

Implementations encapsulate a single addressing scheme (local filesystem or remote Hadoop URI) and expose the minimal surface area the config pipeline needs: joining, existence checks, text reads, name and parent derivation, and a containment check used for path-traversal protection.

name abstractmethod property

The final path segment, including any extension.

stem abstractmethod property

The final path segment without its extension.

suffix abstractmethod property

The file extension including the leading dot, or an empty string.

parent abstractmethod property

The location one level up.

join(rel) abstractmethod

Resolve rel against this location and return a new location.

rel must be a relative path. Implementations normalize .. and . segments and reject inputs that look absolute.
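The normalization step can be sketched with the standard posixpath module. join_relative is a hypothetical helper, not part of weevr.config, and it assumes POSIX-style relative paths. Normalization on its own does not stop rel from climbing above the base, which is why containment is checked separately by is_relative_to.

```python
import posixpath


def join_relative(base: str, rel: str) -> str:
    """Hypothetical helper: append rel to base and normalize the result."""
    # Reject inputs that look absolute: a leading slash or a URI scheme.
    if posixpath.isabs(rel) or "://" in rel:
        raise ValueError(f"expected a relative path, got {rel!r}")
    # normpath drops "." segments and folds ".." into the preceding segment.
    return posixpath.normpath(posixpath.join(base, rel))


print(join_relative("project/threads", "./staging/../dim_customer.thread"))
# project/threads/dim_customer.thread
```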

exists() abstractmethod

Return whether the underlying file or directory exists.

read_text() abstractmethod

Return the file contents decoded as UTF-8.

Raises:

Type Description
FileNotFoundError

If the file does not exist.

OSError

For any other I/O failure.

is_relative_to(other) abstractmethod

Return whether this location is contained within other.

__str__() abstractmethod

Return the canonical path or URI for diagnostics and logging.

__fspath__()

Return the string form for os.fspath consumers.

Remote implementations return their URI; using the result with the local filesystem will fail loudly, which is the desired behavior.

LocalConfigLocation

Bases: ConfigLocation

A ConfigLocation backed by a local filesystem path.

path property

The underlying pathlib.Path.

name property

The final path segment.

stem property

The final path segment without its extension.

suffix property

The file extension including the leading dot.

parent property

The parent directory as a LocalConfigLocation.

__init__(path)

Wrap an absolute or relative pathlib.Path.

Parameters:

Name Type Description Default
path Path

The path to wrap. Stored as-is; resolution and existence checks are performed lazily by individual methods.

required

join(rel)

Join rel to the underlying path.

Parameters:

Name Type Description Default
rel str

Relative path to append.

required

Returns:

Type Description
ConfigLocation

A new LocalConfigLocation.

Raises:

Type Description
ValueError

If rel is an absolute path.

exists()

Return whether the path exists on the local filesystem.

read_text()

Read the file as UTF-8 text.

is_relative_to(other)

Return whether this path is contained within other.

Both sides are resolved to absolute paths before comparison so relative segments such as .. are honored. Comparing across location types always returns False.
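A containment check of this kind can be sketched with pathlib alone. is_contained is a hypothetical helper that covers only the local case; it assumes Python 3.9+ for Path.is_relative_to and omits the cross-type rule described above.

```python
from pathlib import Path


def is_contained(child: Path, root: Path) -> bool:
    """Hypothetical helper: True when child resolves to a path inside root."""
    # resolve() makes both sides absolute and collapses ".." segments
    # (following symlinks where they exist), so a path that climbs out
    # of root through ".." is rejected.
    return child.resolve().is_relative_to(root.resolve())


root = Path("project")
print(is_contained(root / "threads" / "dim_customer.thread", root))  # True
print(is_contained(root / ".." / "secrets.yaml", root))  # False
```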

__str__()

Return the underlying path as a string.

__repr__()

Developer-friendly representation.

__eq__(other)

Two locations are equal when their underlying paths are equal.

__hash__()

Hash by underlying path.

ConfigError

Bases: WeevError

Base exception for configuration-related errors.

__init__(message, cause=None, file_path=None, config_key=None)

Initialize ConfigError.

Parameters:

Name Type Description Default
message str

Human-readable error message

required
cause Exception | None

Optional underlying exception

None
file_path str | None

Path to the config file where the error occurred

None
config_key str | None

Specific config key that caused the error

None

__str__()

Return string representation with context.
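The exact message format is not documented here. The sketch below shows one plausible way to append the optional context fields; ConfigErrorSketch and its " | " layout are hypothetical, and it subclasses Exception rather than WeevError.

```python
class ConfigErrorSketch(Exception):
    """Hypothetical stand-in for ConfigError; the real format may differ."""

    def __init__(self, message, cause=None, file_path=None, config_key=None):
        super().__init__(message)
        self.message = message
        self.cause = cause
        self.file_path = file_path
        self.config_key = config_key

    def __str__(self):
        # Append whichever context fields were supplied, in a fixed order.
        parts = [self.message]
        if self.file_path:
            parts.append(f"file: {self.file_path}")
        if self.config_key:
            parts.append(f"key: {self.config_key}")
        if self.cause is not None:
            parts.append(f"cause: {self.cause}")
        return " | ".join(parts)


err = ConfigErrorSketch("bad value", file_path="dim_customer.thread", config_key="target")
print(err)  # bad value | file: dim_customer.thread | key: target
```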

ModelValidationError

Bases: ConfigError

Raised when a fully resolved config fails to hydrate into a typed model.

This occurs after variable resolution and inheritance, when the concrete values are validated through the Pydantic domain model (Thread, Weave, or Loom). Semantic constraints that span multiple fields are checked here.

Loom

Bases: FrozenBase

A deployment unit containing weave references with optional shared defaults.

Thread

Bases: FrozenBase

Complete domain model for a thread configuration.

A thread is the smallest unit of work: one or more sources, a sequence of transformation steps, and a single target.

Weave

Bases: FrozenBase

A collection of thread references with optional shared defaults.

apply_inheritance(loom_defaults, weave_defaults, thread_config, *, loom_audit_templates=None, weave_audit_templates=None, loom_connections=None, weave_connections=None)

Apply multi-level inheritance cascade.

Cascade order (lowest to highest priority):

  1. loom_defaults (lowest)
  2. weave_defaults
  3. thread_config (highest)

Parameters:

Name Type Description Default
loom_defaults dict[str, Any] | None

Defaults from loom level

required
weave_defaults dict[str, Any] | None

Defaults from weave level

required
thread_config dict[str, Any]

Thread-specific config

required
loom_audit_templates dict[str, Any] | None

User-defined audit template definitions from loom

None
weave_audit_templates dict[str, Any] | None

User-defined audit template definitions from weave

None
loom_connections dict[str, Any] | None

Named connection definitions from loom top-level

None
weave_connections dict[str, Any] | None

Named connection definitions from weave top-level

None

Returns:

Type Description
dict[str, Any]

Fully merged config with thread values taking precedence
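The three-level cascade can be sketched as an ordered merge. cascade and the keys write and tags are hypothetical; the docs do not say whether nested mappings are merged recursively, so the recursive merge (with lists replaced outright) is an assumption. Audit templates and connections are not modeled.

```python
from __future__ import annotations

from typing import Any


def cascade(*layers: dict[str, Any] | None) -> dict[str, Any]:
    """Merge layers given lowest priority first; later layers win on conflicts."""
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in (layer or {}).items():
            # Assumption: nested mappings merge key by key; any other value
            # (including lists) replaces the lower-priority value outright.
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = cascade(merged[key], value)
            else:
                merged[key] = value
    return merged


loom_defaults = {"write": {"mode": "overwrite", "format": "delta"}, "tags": ["nightly"]}
weave_defaults = {"write": {"mode": "merge"}}
thread_config = {"tags": ["pii"]}
print(cascade(loom_defaults, weave_defaults, thread_config))
# {'write': {'mode': 'merge', 'format': 'delta'}, 'tags': ['pii']}
```

The merge builds new dictionaries at every level, so the input layers are left unmodified.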

make_location(path_or_uri, spark=None)

Construct a ConfigLocation from a path, URI, or existing location.

A string containing :// is treated as a remote URI and requires a spark session. Anything else is treated as a local filesystem path. Existing ConfigLocation inputs are returned unchanged.

Parameters:

Name Type Description Default
path_or_uri str | Path | ConfigLocation

A local path, a URI string, or an existing ConfigLocation.

required
spark SparkSession | None

Active SparkSession. Required when path_or_uri is a remote URI.

None

Returns:

Type Description
ConfigLocation

A ConfigLocation instance.

Raises:

Type Description
ValueError

If a remote URI is supplied without a spark session.
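The dispatch rule can be sketched without Spark. classify_location is hypothetical and returns a label instead of a location object; spark is typed as a plain object so the sketch runs without PySpark, and the hdfs:// URI is illustrative.

```python
from __future__ import annotations

from pathlib import Path


def classify_location(path_or_uri: str | Path, spark: object | None = None) -> str:
    """Hypothetical helper mirroring make_location's dispatch rule."""
    # Only strings can be URIs; a Path is always local.
    if isinstance(path_or_uri, str) and "://" in path_or_uri:
        if spark is None:
            raise ValueError(f"remote URI {path_or_uri!r} requires a spark session")
        return "remote"
    return "local"


print(classify_location("threads/dim_customer.thread"))  # local
print(classify_location("hdfs://namenode/configs/main.loom", spark=object()))  # remote
```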

expand_foreach(steps)

Expand foreach macro blocks into repeated step sequences.

Each foreach block in the steps list is replaced by its template steps repeated once per value, with {var} placeholders substituted.

Non-foreach entries pass through unchanged.

Parameters:

Name Type Description Default
steps list[dict[str, Any]]

Raw step list (dicts), possibly containing foreach blocks.

required

Returns:

Type Description
list[dict[str, Any]]

Expanded step list with all foreach blocks replaced.

Raises:

Type Description
ConfigError

If a foreach block is missing required fields.
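The expansion can be sketched as follows. The shape of a foreach block is not given here, so the field names foreach, values, as, and steps are hypothetical, as are the step kinds filter and cast. ValueError stands in for ConfigError.

```python
from __future__ import annotations

from typing import Any


def _substitute(node: Any, var: str, value: str) -> Any:
    # Replace "{var}" in every string value, recursing into dicts and lists.
    if isinstance(node, str):
        return node.replace("{" + var + "}", value)
    if isinstance(node, dict):
        return {k: _substitute(v, var, value) for k, v in node.items()}
    if isinstance(node, list):
        return [_substitute(v, var, value) for v in node]
    return node


def expand_foreach_sketch(steps: list[dict[str, Any]]) -> list[dict[str, Any]]:
    expanded: list[dict[str, Any]] = []
    for step in steps:
        block = step.get("foreach")
        if block is None:
            expanded.append(step)  # non-foreach entries pass through unchanged
            continue
        missing = {"values", "as", "steps"} - set(block)
        if missing:
            raise ValueError(f"foreach block missing fields: {sorted(missing)}")
        # Repeat the whole template once per value, in order.
        for value in block["values"]:
            for template in block["steps"]:
                expanded.append(_substitute(template, block["as"], str(value)))
    return expanded


steps = [
    {"filter": "amount > 0"},
    {"foreach": {"values": ["email", "phone"], "as": "col",
                 "steps": [{"cast": {"column": "{col}", "to": "string"}}]}},
]
for step in expand_foreach_sketch(steps):
    print(step)
```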

detect_config_type(raw)

Detect the type of config from its structure.

Parameters:

Name Type Description Default
raw dict[str, Any]

Parsed config dictionary

required

Returns:

Type Description
str

Config type: 'thread', 'weave', 'loom', or 'params'

Raises:

Type Description
ConfigParseError

If config type cannot be determined

detect_config_type_from_extension(path)

Detect config type from file extension.

Parameters:

Name Type Description Default
path str | Path | ConfigLocation

Path or location of the config file.

required

Returns:

Type Description
str | None

Config type string if the extension is a typed extension (.thread, .weave, .loom), or None for .yaml/.yml files.

Raises:

Type Description
ConfigError

If the extension is .yaml or .yml and a typed extension was expected (caller context).
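The extension mapping can be sketched as a dictionary lookup. config_type_from_suffix is hypothetical. In this sketch every unlisted suffix returns None; the caller-context error described above is not modeled.

```python
from __future__ import annotations

from pathlib import PurePath

_TYPED_EXTENSIONS = {".thread": "thread", ".weave": "weave", ".loom": "loom"}


def config_type_from_suffix(path: str | PurePath) -> str | None:
    # .yaml/.yml (and any other suffix here) fall back to None, leaving the
    # caller to detect the type from the file's content.
    return _TYPED_EXTENSIONS.get(PurePath(path).suffix)


print(config_type_from_suffix("threads/dim_customer.thread"))  # thread
print(config_type_from_suffix("params.yaml"))  # None
```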

extract_config_version(raw)

Extract and parse config_version from a config dict.

Parameters:

Name Type Description Default
raw dict[str, Any]

Parsed config dictionary

required

Returns:

Type Description
tuple[int, int]

Tuple of (major, minor) version numbers

Raises:

Type Description
ConfigParseError

If config_version is missing or has an invalid format.
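The parsing can be sketched with a regular expression. parse_config_version is hypothetical and assumes a "major.minor" value; ValueError stands in for ConfigParseError.

```python
from __future__ import annotations

import re
from typing import Any

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)$")


def parse_config_version(raw: dict[str, Any]) -> tuple[int, int]:
    value = raw.get("config_version")
    if value is None:
        raise ValueError("config_version is missing")
    # str() also accepts an unquoted YAML value such as 1.0, which YAML
    # loads as a float; note that an unquoted 1.10 would collapse to 1.1.
    match = _VERSION_RE.match(str(value).strip())
    if match is None:
        raise ValueError(f"invalid config_version: {value!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_config_version({"config_version": "1.0"}))  # (1, 0)
```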

parse_yaml(path)

Parse a YAML file and return its contents.

Parameters:

Name Type Description Default
path str | Path | ConfigLocation

A local path, a URI string, or a ConfigLocation. Local path inputs are wrapped in a LocalConfigLocation automatically. Remote URIs require the caller to construct a RemoteConfigLocation themselves so they can supply the active SparkSession.

required

Returns:

Type Description
dict[str, Any]

Parsed YAML content as a dictionary.

Raises:

Type Description
ConfigParseError

If the file is not found, unreadable, or contains invalid YAML syntax.

validate_config_version(version, config_type)

Validate that the config version is supported.

Parameters:

Name Type Description Default
version tuple[int, int]

Tuple of (major, minor) version

required
config_type str

Type of config (thread, weave, loom, params)

required

Raises:

Type Description
ConfigVersionError

If the major version doesn't match the supported version.
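A major-only compatibility check can be sketched as below. check_version is hypothetical, and the supported_major default of 1 is an illustrative value, not weevr's actual supported version. Minor versions are not checked because the description above mentions only the major version. ValueError stands in for ConfigVersionError.

```python
def check_version(version: tuple[int, int], config_type: str, supported_major: int = 1) -> None:
    major, minor = version
    # Only the major number decides compatibility in this sketch.
    if major != supported_major:
        raise ValueError(
            f"{config_type} config_version {major}.{minor} is unsupported; "
            f"expected major version {supported_major}"
        )


check_version((1, 4), "thread")  # accepted: same major version
```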

build_param_context(runtime_params=None, config_defaults=None, fabric_context=None, entry_params=None)

Build parameter context with proper priority layering.

Priority order (highest to lowest):

  1. runtime_params
  2. entry_params (nested under param key for ${param.x} access)
  3. config_defaults
  4. fabric_context

Parameters:

Name Type Description Default
runtime_params dict[str, Any] | None

Runtime parameter overrides.

None
config_defaults dict[str, Any] | None

Default parameters from config.

None
fabric_context dict[str, Any] | None

Fabric environment values keyed as fabric.<field> (e.g. fabric.workspace_id). None values are omitted. Lowest priority — overridden by config_defaults and runtime_params.

None
entry_params dict[str, Any] | None

ThreadEntry-level parameters injected under the param namespace for ${param.x} dotted-key resolution.

None

Returns:

Type Description
dict[str, Any]

Merged parameter context dictionary with dotted key access support.
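The layering can be sketched by applying updates from lowest to highest priority. build_context_sketch is hypothetical, and it assumes a flat dictionary keyed by dotted names (fabric.workspace_id, param.region) rather than nested mappings. fabric.lakehouse_id and the other example keys are illustrative.

```python
from __future__ import annotations

from typing import Any


def build_context_sketch(
    runtime_params: dict[str, Any] | None = None,
    config_defaults: dict[str, Any] | None = None,
    fabric_context: dict[str, Any] | None = None,
    entry_params: dict[str, Any] | None = None,
) -> dict[str, Any]:
    context: dict[str, Any] = {}
    # Lowest priority first, so each later update overrides earlier keys.
    context.update({k: v for k, v in (fabric_context or {}).items() if v is not None})
    context.update(config_defaults or {})
    context.update({f"param.{k}": v for k, v in (entry_params or {}).items()})
    context.update(runtime_params or {})
    return context


ctx = build_context_sketch(
    runtime_params={"env": "prod"},
    config_defaults={"env": "dev", "lookback_days": 7},
    fabric_context={"fabric.workspace_id": "ws-123", "fabric.lakehouse_id": None},
    entry_params={"region": "emea"},
)
print(ctx)
```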

resolve_declared_params(param_specs, runtime_params, *, file_path=None)

Resolve loom/weave-level declared params (the params: block) to a flat {name: value} dict.

Precedence per declared param:

  1. Value from runtime_params if supplied
  2. ParamSpec.default if set on the spec
  3. ConfigSchemaError if the param is required
  4. Omitted from the result if optional with no default

The returned dict is intended to bind under the param.* namespace via the entry_params argument of build_param_context, so that ${param.x} expressions resolve to the layered value.

Runtime keys not declared in param_specs are ignored — declared scope is the only contract honored at this layer.

Parameters:

Name Type Description Default
param_specs dict[str, Any] | None

Mapping of declared params (ParamSpec instances or their model_dump dicts). None or empty returns {}.

required
runtime_params dict[str, Any] | None

Caller-supplied values keyed by param name.

required
file_path str | None

Optional originating config file path for error messages.

None

Returns:

Type Description
dict[str, Any]

Resolved {name: value} dict ready for entry_params.

Raises:

Type Description
ConfigSchemaError

A required declared param has no runtime value and no spec default. The message includes the file path and param name.
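The four-step precedence can be sketched over plain spec dicts. resolve_declared is hypothetical. It assumes specs shaped like model_dump output with optional default and required keys, and it treats required as false when absent. ValueError stands in for ConfigSchemaError.

```python
from __future__ import annotations

from typing import Any


def resolve_declared(
    param_specs: dict[str, dict[str, Any]] | None,
    runtime_params: dict[str, Any] | None,
) -> dict[str, Any]:
    runtime = runtime_params or {}
    resolved: dict[str, Any] = {}
    # Only declared names are visited, so undeclared runtime keys are ignored.
    for name, spec in (param_specs or {}).items():
        if name in runtime:
            resolved[name] = runtime[name]  # 1. runtime value
        elif spec.get("default") is not None:
            resolved[name] = spec["default"]  # 2. spec default
        elif spec.get("required", False):
            # 3. required with nothing to bind
            raise ValueError(f"required param {name!r} has no runtime value and no default")
        # 4. optional with no default: left out of the result
    return resolved


specs = {"env": {"required": True}, "lookback_days": {"default": 7}, "region": {}}
print(resolve_declared(specs, {"env": "prod", "unused": 1}))
# {'env': 'prod', 'lookback_days': 7}
```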

resolve_references(config, config_type, project_root, runtime_params=None, visited=None)

Resolve references to other config files.

Handles both external references (ref key) and inline definitions (name + body keys). Recursively loads referenced configs with circular reference detection.

Parameters:

Name Type Description Default
config dict[str, Any]

Config dict to resolve references in.

required
config_type str

Type of this config ('weave' or 'loom').

required
project_root ConfigLocation | Path

The .weevr project directory. A bare pathlib.Path is wrapped in a LocalConfigLocation for backward compatibility.

required
runtime_params dict[str, Any] | None

Runtime parameters to pass to child configs.

None
visited set[str] | None

Set of already-visited ref strings (for cycle detection).

None

Returns:

Type Description
dict[str, Any]

Config dict with resolved child configs attached under '_resolved_threads' or '_resolved_weaves' keys.

Raises:

Type Description
ReferenceResolutionError

If a referenced file is not found or a circular reference is detected.

ConfigError

If an inline definition is missing a name field.
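Circular-reference detection during recursive loading can be sketched over an in-memory map. resolve_tree and the files mapping are hypothetical stand-ins for real config files. ValueError stands in for ReferenceResolutionError, and FileNotFoundError marks a missing ref.

```python
from __future__ import annotations


def resolve_tree(ref: str, files: dict[str, list[str]], chain: tuple[str, ...] = ()) -> dict[str, dict]:
    """Hypothetical: files maps each ref to the refs it points at."""
    if ref in chain:
        raise ValueError("circular reference: " + " -> ".join((*chain, ref)))
    if ref not in files:
        raise FileNotFoundError(ref)
    # Only the current chain counts as visited, so a child shared by two
    # parents (a diamond) is loaded twice instead of being reported as a cycle.
    return {child: resolve_tree(child, files, (*chain, ref)) for child in files[ref]}


files = {"main.loom": ["sales.weave"], "sales.weave": ["orders.thread"], "orders.thread": []}
print(resolve_tree("main.loom", files))
# {'sales.weave': {'orders.thread': {}}}
```

The docs do not say whether weevr's visited set tracks only the current chain or every ref ever loaded. A global set would also flag diamonds.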

resolve_variables(config, context, consumed_keys=None)

Recursively resolve variable references in config.

Supports:

  - ${var}: simple variable reference (error if not found)
  - ${var:-default}: variable with fallback default

Parameters:

Name Type Description Default
config dict[str, Any] | list[Any] | str | Any

Config structure to resolve (dict, list, str, or primitive)

required
context dict[str, Any]

Parameter context for variable lookup

required
consumed_keys set[str] | None

Optional set to track which context keys were consumed during resolution. When provided, each resolved dotted key is added to the set for post-resolution unused-param analysis.

None

Returns:

Type Description
Any

Config with all variables resolved

Raises:

Type Description
VariableResolutionError

If a variable is not found and no default is provided.
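Both placeholder forms can be handled with one regular expression applied recursively. resolve_vars is hypothetical, and it always substitutes strings; the real function may preserve types when a value is a lone placeholder. KeyError stands in for VariableResolutionError.

```python
from __future__ import annotations

import re
from typing import Any

# ${name} or ${name:-default}; the name may contain dots, as in param.region.
_VAR_RE = re.compile(r"\$\{([^}:]+)(?::-([^}]*))?\}")


def resolve_vars(config: Any, context: dict[str, Any], consumed: set[str] | None = None) -> Any:
    if isinstance(config, dict):
        return {k: resolve_vars(v, context, consumed) for k, v in config.items()}
    if isinstance(config, list):
        return [resolve_vars(v, context, consumed) for v in config]
    if not isinstance(config, str):
        return config  # numbers, booleans, and None pass through

    def replace(match: re.Match[str]) -> str:
        name, default = match.group(1), match.group(2)
        if name in context:
            if consumed is not None:
                consumed.add(name)  # record for unused-param analysis
            return str(context[name])
        if default is not None:
            return default
        raise KeyError(f"unresolved variable ${{{name}}}")

    return _VAR_RE.sub(replace, config)


ctx = {"env": "prod", "param.region": "emea"}
used: set[str] = set()
print(resolve_vars({"table": "sales_${env}", "paths": ["/data/${param.region}", "${tier:-bronze}"], "retries": 3}, ctx, used))
# {'table': 'sales_prod', 'paths': ['/data/emea', 'bronze'], 'retries': 3}
```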

validate_params(param_specs, context)

Validate parameters against their type specifications.

Parameters:

Name Type Description Default
param_specs dict[str, Any] | None

Parameter specifications from config

required
context dict[str, Any]

Actual parameter values

required

Raises:

Type Description
ConfigSchemaError

If required params are missing or a value does not match its declared type.
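A type check against declared specs can be sketched as follows. check_params is hypothetical, and the type names string, int, float, and bool are illustrative, since the docs do not list the supported spec types. This sketch collects every problem before raising; ValueError stands in for ConfigSchemaError.

```python
from __future__ import annotations

from typing import Any

# Hypothetical spec vocabulary mapped to Python types.
_PY_TYPES: dict[str, Any] = {"string": str, "int": int, "float": (int, float), "bool": bool}


def check_params(param_specs: dict[str, Any] | None, context: dict[str, Any]) -> None:
    errors: list[str] = []
    for name, spec in (param_specs or {}).items():
        if name not in context:
            if spec.get("required", False):
                errors.append(f"missing required param {name!r}")
            continue
        type_name = spec.get("type")
        expected = _PY_TYPES.get(type_name)
        value = context[name]
        # bool subclasses int, so True would otherwise pass as an int.
        is_bool_mismatch = isinstance(value, bool) and expected is not bool
        if expected is not None and (not isinstance(value, expected) or is_bool_mismatch):
            errors.append(f"param {name!r} expected {type_name}, got {type(value).__name__}")
    if errors:
        raise ValueError("; ".join(errors))


specs = {"env": {"type": "string", "required": True}, "days": {"type": "int"}}
check_params(specs, {"env": "prod", "days": 7})  # passes silently
```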

validate_schema(raw, config_type)

Validate a raw config dict against the appropriate pre-resolution schema.

Parameters:

Name Type Description Default
raw dict[str, Any]

Raw config dictionary (variables not yet resolved)

required
config_type str

Type of config (thread, weave, loom, params)

required

Returns:

Type Description
BaseModel

Validated Pydantic model instance

Raises:

Type Description
ConfigSchemaError

If validation fails

_derive_config_name(path)

Derive the component name from a config location.

Returns the filename stem — the filename without the typed extension. For example, dim_customer.thread returns 'dim_customer'.
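Stem derivation and the name check from the load pipeline can be sketched with PurePosixPath. derive_name and check_name are hypothetical helpers. PurePosixPath.stem drops only the final suffix, so a dotted file such as sales.v2.weave yields sales.v2. ValueError stands in for ConfigError.

```python
from pathlib import PurePosixPath


def derive_name(path: str) -> str:
    # "threads/dim_customer.thread" -> "dim_customer"
    return PurePosixPath(path).stem


def check_name(declared: str, path: str) -> None:
    # Hypothetical version of the "validate name against filename stem" step.
    expected = derive_name(path)
    if declared != expected:
        raise ValueError(f"name {declared!r} does not match filename stem {expected!r}")


print(derive_name("threads/dim_customer.thread"))  # dim_customer
```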

load_config(path, runtime_params=None, project_root=None)

Load and validate a weevr configuration file.

This function orchestrates the full config loading pipeline:

  1. Parse YAML file
  2. Extract and validate config_version
  3. Detect config type (extension-based for components, content-based for params)
  4. Validate schema with Pydantic
  5. Build parameter context (runtime > defaults)
  6. Resolve variable references (${var} and ${var:-default})
  7. Resolve references to child configs (threads, weaves)
  8. Apply inheritance cascade (loom -> weave -> thread)
  9. Validate name against filename stem
  10. Hydrate into typed domain model (thread, weave, loom only)

Parameters:

Name Type Description Default
path str | Path | ConfigLocation

Path or ConfigLocation for the config file (thread, weave, or loom). Local paths are wrapped automatically; remote URIs must be supplied as a RemoteConfigLocation so the caller can attach a SparkSession.

required
runtime_params dict[str, Any] | None

Optional runtime parameter overrides.

None
project_root Path | ConfigLocation | None

The .weevr project directory. Required for configs that reference other files. Accepts a ConfigLocation or a bare pathlib.Path.

None

Returns:

Type Description
Thread | Weave | Loom | dict[str, Any]

A frozen, typed domain model instance (Thread, Weave, or Loom) for thread/weave/loom config types. Returns a plain dict for params configs.

Raises:

Type Description
ConfigParseError

YAML syntax errors, file not found

ConfigVersionError

Unsupported config version

ConfigSchemaError

Schema validation failures

ConfigError

Extension or name validation failures

VariableResolutionError

Unresolved variables without defaults

ReferenceResolutionError

Missing referenced files, circular dependencies

ModelValidationError

Semantic validation failures during model hydration