Getting started

For most new users, the smoothest path is:

  1. Installation — install the smallest useful engine + table-format extras.
  2. Quickstart · Polars — laptop-only, no Docker, no JVM. Fastest way to see a pipeline run.
  3. Use your own data after the quickstart — keep the same runner and swap the sample input for your own files.
  4. Your first dataflow — move from one stage to an ordered bronze→silver flow.

Choose Quickstart · Spark instead of Polars only if Spark is already your target runtime or you want early parity with Fabric, Databricks, or another Spark-first environment.

Choose your route

| Situation | Start here | Then go to |
| --- | --- | --- |
| I am new and want the first successful run | Installation | Quickstart · Polars |
| I already know my runtime will be Spark | Installation | Quickstart · Spark |
| The sample worked and now I need my own files | Use your own data after the quickstart | Metadata guide for new users |
| I want the deeper workflow model before building | Concepts | Metadata guide or the quickstarts |
| I need multi-stage orchestration | Your first dataflow | How-to guides |

Audience

This documentation is for both hands-on builders and readers who mainly want to understand the workflow. The getting-started pages are written for new data engineers, analytics engineers, and adjacent backend teams who want a concrete starting point before learning the full model.

If you do not need to run code yet, start with Concepts, especially Architecture, Metadata model, and Orchestration.

If you do want to run the examples, basic Python and command-line familiarity will help; unfamiliar framework terms are defined or linked on first use.