weevr

Configuration-driven execution framework for Spark in Microsoft Fabric.

weevr lets you declare data shaping intent in YAML and execute it as optimized, repeatable PySpark transformations. No code generation, no manual notebook orchestration — just deterministic, metadata-driven data pipelines.

Key features

  • Declarative YAML — Define sources, transforms, and targets in configuration
  • Spark-native — Executes via PySpark DataFrame APIs in Microsoft Fabric
  • Deterministic — Same config + inputs = same outputs, every time
  • 23 transform types — Filter, derive, join, aggregate, window, pivot, and more
  • DAG orchestration — Threads form weaves, weaves form looms, with automatic dependency resolution
  • Incremental processing — Watermark and CDC modes for efficient loads
  • Structured telemetry — Spans, events, and row counts for full observability
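
The features above start from a declarative config. The fragment below is a sketch of what a thread definition might look like; the key names (`thread`, `source`, `steps`, `target`) are illustrative assumptions, not the documented weevr schema — see the Reference section for the real configuration keys.

```yaml
# Hypothetical thread: filter and derive over a Delta table.
# All key names here are illustrative, not the official weevr schema.
thread: orders_clean
source:
  table: lakehouse.raw_orders
steps:
  - filter: "order_status != 'cancelled'"
  - derive:
      net_amount: "gross_amount - discount"
target:
  table: lakehouse.orders_clean
  mode: overwrite
```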

Quick start

```shell
pip install weevr
```

```python
from weevr import Context

ctx = Context(spark, "my-project.weevr")
result = ctx.run("nightly.loom")
```

See the Your First Loom tutorial for a complete walkthrough.

How it works

The weevr engine interprets YAML configuration, executes it through Spark and Delta Lake, and reads and writes data in Microsoft Fabric:

  • YAML Configuration — Thread (sources → steps → target), Weave (thread DAG + lookups), Loom (ordered weaves + defaults)
  • weevr Engine — Config Resolution (parse → validate → resolve), Planner (DAG → groups → cache), Executor (read → transform → write), Telemetry (spans + metrics)
  • Spark + Delta Lake — DataFrame APIs and Delta transactions
  • Microsoft Fabric — OneLake / Lakehouse storage, driven from Notebooks / Pipelines
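The Planner turns a thread DAG into ordered execution groups. The pure-Python sketch below shows one common way such dependency resolution can work (level-by-level topological sort, i.e. Kahn's algorithm); it is an illustration of the idea, not weevr's actual implementation, and the example node names are made up.

```python
from collections import defaultdict

def plan_groups(deps):
    """Group DAG nodes into levels: each level depends only on earlier
    levels, so nodes within a level are candidates for parallel execution.

    deps maps node -> list of upstream nodes it depends on.
    """
    indegree = {node: len(ups) for node, ups in deps.items()}
    downstream = defaultdict(list)
    for node, ups in deps.items():
        for up in ups:
            downstream[up].append(node)

    ready = [n for n, d in indegree.items() if d == 0]
    groups = []
    while ready:
        groups.append(sorted(ready))
        next_ready = []
        for node in ready:
            for dependent in downstream[node]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    next_ready.append(dependent)
        ready = next_ready

    # If any node was never scheduled, the DAG contains a cycle.
    if sum(len(g) for g in groups) != len(deps):
        raise ValueError("cycle detected in thread DAG")
    return groups

# Hypothetical thread DAG: 'enrich' joins two sources, 'report' aggregates it.
dag = {
    "orders": [],
    "customers": [],
    "enrich": ["orders", "customers"],
    "report": ["enrich"],
}
print(plan_groups(dag))  # [['customers', 'orders'], ['enrich'], ['report']]
```

Grouping by levels (rather than emitting a flat order) is what lets an executor run independent threads concurrently within each group.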

CLI

weevr-cli is a standalone command-line companion for validating configs, inspecting schemas, and running dry-run operations outside of a notebook.

```shell
pip install weevr-cli
```

See the CLI documentation for usage and command reference.

Learn more

  • Tutorials — step-by-step guides to get started
  • How-to Guides — task-oriented recipes for common scenarios
  • Reference — YAML schema, API docs, and configuration keys
  • Concepts — architecture, design principles, and mental models