# QuickETL
Fast & Flexible Python ETL Framework
Build data pipelines in YAML or Python with support for 20+ compute backends.
## Why QuickETL?
QuickETL provides a unified API for data transformation across multiple compute engines. Write your pipeline once and run it anywhere: on local DuckDB, on distributed Spark, or on cloud warehouses such as Snowflake and BigQuery.
- **Configuration-Driven**: Define pipelines in YAML with variable substitution. No code required for common ETL patterns.
- **Multi-Backend**: Run the same pipeline on DuckDB, Polars, Spark, Snowflake, BigQuery, and more via Ibis.
- **Quality Checks**: Built-in data quality validation with `not_null`, `unique`, `row_count`, and custom expressions.
- **CLI & API**: Use the `quicketl` CLI for quick runs or the Python API for programmatic control.
## Quick Example
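Define the pipeline declaratively in YAML: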
```yaml
name: sales_summary
engine: duckdb

source:
  type: file
  path: data/sales.csv
  format: csv

transforms:
  - op: filter
    predicate: amount > 0
  - op: aggregate
    group_by: [region]
    aggs:
      total_sales: sum(amount)
      order_count: count(*)

checks:
  - type: not_null
    columns: [region, total_sales]

sink:
  type: file
  path: output/sales_by_region.parquet
  format: parquet
```
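With that config saved to a file, a quick run from the command line might look like the following. This is only a sketch: the `quicketl run` subcommand and its arguments are assumptions, so check the CLI's own help output for the actual usage.

```bash
# Hypothetical invocation; the actual subcommand and flags may differ
quicketl run sales_summary.yml
```

The same pipeline can also be built programmatically with the Python API: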
```python
from quicketl import Pipeline
from quicketl.config.models import FileSource, FileSink
from quicketl.config.transforms import FilterTransform, AggregateTransform

pipeline = (
    Pipeline("sales_summary", engine="duckdb")
    .source(FileSource(path="data/sales.csv", format="csv"))
    .transform(FilterTransform(predicate="amount > 0"))
    .transform(AggregateTransform(
        group_by=["region"],
        aggs={"total_sales": "sum(amount)", "order_count": "count(*)"}
    ))
    .sink(FileSink(path="output/sales_by_region.parquet"))
)

result = pipeline.run()
print(f"Processed {result.rows_processed} rows")
```
## Installation
```bash
# Basic installation (includes DuckDB + Polars)
pip install quicketl

# With cloud storage support
pip install quicketl[aws]    # S3
pip install quicketl[gcp]    # GCS + BigQuery
pip install quicketl[azure]  # Azure ADLS

# With additional backends
pip install quicketl[spark]
pip install quicketl[snowflake]
```
## Features

### 12 Transform Operations
| Transform | Description |
|---|---|
| `select` | Choose and reorder columns |
| `rename` | Rename columns |
| `filter` | Filter rows with SQL predicates |
| `derive_column` | Create computed columns |
| `cast` | Convert column types |
| `fill_null` | Replace null values |
| `dedup` | Remove duplicates |
| `sort` | Order rows |
| `join` | Join datasets |
| `aggregate` | Group and aggregate |
| `union` | Combine datasets vertically |
| `limit` | Limit row count |
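As a rough sketch of how several of these operations chain together in a config, the snippet below combines a few of them. Only `filter` appears verbatim elsewhere on this page; the option names for `derive_column`, `cast`, `dedup`, and `sort`, as well as the `discount` and `order_id` columns, are assumptions and may differ from the actual schema.

```yaml
transforms:
  - op: filter              # shown in the Quick Example above
    predicate: amount > 0
  - op: derive_column       # option names below are assumptions, not confirmed
    name: net_amount
    expr: amount - discount
  - op: cast
    columns:
      net_amount: double
  - op: dedup
    columns: [order_id]
  - op: sort
    by: [net_amount]
```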
### 5 Quality Checks
| Check | Description |
|---|---|
| `not_null` | Ensure columns have no null values |
| `unique` | Verify uniqueness constraints |
| `row_count` | Validate row count bounds |
| `accepted_values` | Check values against whitelist |
| `expression` | Custom SQL predicate validation |
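The snippet below sketches how several checks might be declared together. The `not_null` form is taken from the Quick Example; the option names for the other check types (`min`, `values`, `expr`) and the example region values are assumptions and may differ from the real schema.

```yaml
checks:
  - type: not_null                  # as in the Quick Example
    columns: [region, total_sales]
  - type: unique                    # option names below are assumptions
    columns: [region]
  - type: row_count
    min: 1
  - type: accepted_values
    column: region
    values: [north, south, east, west]   # example values
  - type: expression
    expr: total_sales >= 0
```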
### Supported Backends
| Backend | Type | Included |
|---|---|---|
| DuckDB | Local | Yes |
| Polars | Local | Yes |
| DataFusion | Local | Optional |
| Spark | Distributed | Optional |
| Snowflake | Cloud DW | Optional |
| BigQuery | Cloud DW | Optional |
| PostgreSQL | Database | Optional |
| MySQL | Database | Optional |
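Because the engine is just a parameter on the pipeline, switching backends is, in principle, a one-line change. The sketch below swaps DuckDB for Polars in the Quick Example; the `"polars"` engine name is inferred from the table above, and cloud backends such as Snowflake or BigQuery would also need connection configuration that is not shown here.

```python
from quicketl import Pipeline
from quicketl.config.models import FileSource, FileSink

# Same pipeline as the Quick Example, run on Polars instead of DuckDB.
# The "polars" engine name is an assumption based on the backend table above;
# transforms and checks are omitted for brevity.
pipeline = (
    Pipeline("sales_summary", engine="polars")
    .source(FileSource(path="data/sales.csv", format="csv"))
    .sink(FileSink(path="output/sales_by_region.parquet"))
)
result = pipeline.run()
```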
## Next Steps

- **Getting Started**: Install QuickETL and run your first pipeline in 5 minutes.
- **User Guide**: Learn about transforms, quality checks, and configuration.
- **API Reference**: Explore the Python API for programmatic pipeline building.
- **Examples**: See complete examples for common ETL patterns.