
Installation

DataCoolie is published to PyPI. The base package stays light; engines, cloud SDKs, and metadata backends are opt-in extras so you only install what you need.

For most new users, start with the smallest setup that can run a real pipeline:

pip install "datacoolie[polars,deltalake]"

If you already know Spark is your main runtime, install Spark + Delta instead:

pip install "datacoolie[spark,delta-spark]"

Use `datacoolie[all]` for contributor machines or broad local experimentation. A bare `pip install datacoolie` is mainly useful for extension work, API exploration, or environments where another package layer supplies the runtime dependencies.
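
If another layer is supposed to supply the runtimes, you can check what is already importable with the standard library alone. A minimal sketch; the module names (`polars`, `pyspark`, `deltalake`, `delta`) are the conventional import names of those dependencies, not DataCoolie API:

```python
import importlib.util

# Probe the optional runtimes without importing them.
# Import names assumed: polars, pyspark, deltalake (delta-rs), delta (delta-spark).
for module in ("polars", "pyspark", "deltalake", "delta"):
    status = "present" if importlib.util.find_spec(module) else "missing"
    print(f"{module}: {status}")
```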

Quick decision table

| What you want to do first | Install |
|---|---|
| Fastest first success on one machine | `pip install "datacoolie[polars,deltalake]"` |
| Spark/Fabric/Databricks-style local validation | `pip install "datacoolie[spark,delta-spark]"` |
| Try many engines, platforms, and metadata backends locally | `pip install "datacoolie[all]"` |
| Only inspect APIs or develop extensions | `pip install datacoolie` |

Pick your extras

```bash
# Minimal (no engine extras; rarely useful on its own)
pip install datacoolie

# Most common: one engine + one table format
pip install "datacoolie[polars,deltalake]"
pip install "datacoolie[spark,delta-spark]"

# Everything
pip install "datacoolie[all]"
```

Extras reference

| Extra | Installs | Use when |
|---|---|---|
| `spark` | `pyspark>=3.5` | You want the Spark engine. |
| `polars` | `polars>=1.0` | You want the Polars engine. |
| `delta-spark` | `delta-spark>=3.0` | Spark + Delta Lake. |
| `deltalake` | `deltalake>=0.15` | Polars + Delta Lake (delta-rs). |
| `iceberg` | `pyiceberg>=0.6` | Apache Iceberg tables (any engine). |
| `aws` | `boto3>=1.28` | AWSPlatform (S3, Secrets Manager, Glue). |
| `api` | `httpx>=0.24` | APIReader and the API metadata provider. |
| `db` | `sqlalchemy>=2.0` | DatabaseProvider for metadata stored in an RDBMS. |
| `excel` | `fastexcel`, `openpyxl` | Reading Excel files as sources or metadata. |
| `fabric` | `spark` + `delta-spark` + `polars` + `deltalake` | Microsoft Fabric notebooks / Spark pools. |
| `databricks` | `spark` + `delta-spark` + `polars` + `deltalake` | Databricks Runtime. |
| `all` | Everything above | Kitchen-sink local dev. |
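
To see which extras your installed copy actually declares, you can read the distribution metadata with the standard library. A quick sketch, assuming datacoolie is installed and publishes standard packaging metadata:

```python
from importlib.metadata import metadata, requires

# Extras declared by the installed distribution.
print(metadata("datacoolie").get_all("Provides-Extra"))

# Optional requirements carry an `extra == "..."` environment marker.
for req in requires("datacoolie") or []:
    if 'extra == "' in req:
        print(req)
```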

System requirements

| Component | Minimum | Notes |
|---|---|---|
| Python | 3.11 | Strict: the framework uses PEP 604 unions and TypeVar defaults. |
| Java | 17 (for Spark) | Only required when using SparkEngine. |
| RAM | 4 GB (Polars) · 8 GB (Spark) | Per executor; scale with your data. |
| Disk | No fixed minimum | Depends on your lakehouse layout. |
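
Since the Python floor is strict, it is worth checking the interpreter before installing; the check itself is plain stdlib:

```python
import sys

# DataCoolie requires Python 3.11+ (PEP 604 unions, TypeVar defaults).
assert sys.version_info >= (3, 11), (
    f"Python 3.11+ required, found {sys.version.split()[0]}"
)
```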

Windows timezones

On Windows, Python's `zoneinfo` needs `tzdata` to resolve IANA zones. `tzdata` is pulled in automatically via the `sys_platform == 'win32'` marker in `pyproject.toml`. If you vendor a custom wheel, install `tzdata` explicitly.
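
To confirm timezone resolution works, load any IANA zone; on a broken setup this raises `ZoneInfoNotFoundError`:

```python
from zoneinfo import ZoneInfo

# Raises ZoneInfoNotFoundError when no tz database is available,
# which on Windows usually means the tzdata package is missing.
print(ZoneInfo("Europe/Paris"))
```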

Verify

```python
import datacoolie

print(datacoolie.__version__)

# Listing registered plugins proves the install wired up entry points.
print(sorted(datacoolie.engine_registry.names()))
print(sorted(datacoolie.platform_registry.names()))
```

Expected output (with `[all]`):

```text
0.1.1
['polars', 'spark']
['aws', 'databricks', 'fabric', 'local']
```

If an engine is missing from the output, its extra is not installed; see the extras reference table above.

Common beginner trap

If `import datacoolie` works but your quickstart still cannot create an engine or read/write a table, you almost always installed the base package without the engine or table-format extra you need.
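
A quick self-diagnosis using only the registry API shown above; the engine-to-extra mapping below follows the extras reference table:

```python
import datacoolie

# Engines present in the registry reflect the extras that were installed.
installed = set(datacoolie.engine_registry.names())
for engine, extra in (("polars", "polars"), ("spark", "spark")):
    if engine not in installed:
        print(f"engine '{engine}' is missing -> pip install 'datacoolie[{extra}]'")
```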
