Getting started
This section takes you from zero to a working DataCoolie pipeline. You will install the framework, run a pre-built quickstart, and then move on to building your own dataflows with real data.
What you will learn:
- How to install DataCoolie with the engine and format extras you need
- How to run a complete ETL pipeline on your laptop in under 5 minutes
- How to swap sample data for your own files and iterate
- How the metadata model drives every step — so you write less code
No prior ETL framework experience is required. If you are comfortable installing Python packages and running scripts, you have everything you need.
For most new users, the smoothest path is:
- Installation — install the smallest useful set of engine and table-format extras.
- Quickstart · Polars — laptop-only, no Docker, no JVM. Fastest way to see a pipeline run.
- Use your own data after the quickstart — keep the same runner and swap the sample input for your own files.
- Your first dataflow — move from one stage to an ordered bronze→silver flow.
Choose Quickstart · Spark instead of Polars only if Spark is already your target runtime or you want early parity with Fabric, Databricks, or another Spark-first environment.
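Before picking a quickstart, it can help to check which engine packages are already importable in your environment. The sketch below is not part of DataCoolie: `available_engines` and `suggest_route` are hypothetical helpers, and it assumes the engines are distributed under the usual Python package names `polars` and `pyspark`. It only mirrors the guidance above, where Spark is suggested when it is your target runtime and Polars otherwise.

```python
from importlib.util import find_spec


def available_engines(candidates=("polars", "pyspark")):
    """Return the candidate packages that can be imported in this environment."""
    # find_spec locates a top-level package without importing it.
    return [name for name in candidates if find_spec(name) is not None]


def suggest_route(engines, spark_is_target=False):
    """Map installed engines to a getting-started page (hypothetical helper)."""
    # Spark is suggested only when it is the intended runtime and is installed.
    if spark_is_target and "pyspark" in engines:
        return "Quickstart · Spark"
    if not spark_is_target and "polars" in engines:
        return "Quickstart · Polars"
    # Nothing suitable is installed yet, so start with the install page.
    return "Installation"


if __name__ == "__main__":
    engines = available_engines()
    print("Importable engines:", engines or "none")
    print("Suggested page:", suggest_route(engines))
```

If the script suggests Installation, the engine you want is not importable yet. Install it first, then return to the matching quickstart.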
Choose your route
| Situation | Start here | Then go to |
|---|---|---|
| I am new and want the first successful run | Installation | Quickstart · Polars |
| I already know my runtime will be Spark | Installation | Quickstart · Spark |
| The sample worked and now I need my own files | Use your own data after the quickstart | Metadata guide for new users |
| I want the deeper workflow model before building | Concepts | Metadata guide or the quickstarts |
| I need multi-stage orchestration | Your first dataflow | How-to guides |
Audience
This documentation is for both hands-on builders and readers who mainly want to understand the workflow. The getting-started pages are written for new data engineers, analytics engineers, and adjacent backend teams who want a concrete starting point before learning the full model.
If you do not need to run code yet, start with Concepts, especially Architecture, Metadata model, and Orchestration.
If you do want to run the examples, basic Python and command-line familiarity will help. Unfamiliar framework terms are defined or linked on first use.