Tutorial

Python ETL Tutorial for Beginners — Build Your First Data Pipeline

If you work with data, you have probably heard the term "ETL" (extract, transform, load) but may never have had a clear explanation of what it means or how to build an ETL pipeline yourself. This tutorial starts from zero, with no prior ETL experience needed, and walks you through building a working data pipeline in Python.

By the end you will understand what ETL is, why naive approaches break down at scale, and how a metadata-driven framework like DataCoolie makes pipelines portable, repeatable, and easy to maintain.
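To make the three stages concrete, here is a minimal sketch using only the Python standard library. It is not DataCoolie code: the sample CSV, the `orders` table, and the `extract`, `transform`, and `load` function names are all assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read a file or an API.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,carol,7.25
"""


def extract(text):
    """Extract: parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, fix types, tidy names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(total)
```

A script like this works for one table. The trouble starts when every new source needs its own copy of these functions with slightly different paths, schemas, and cleaning rules, which is the problem a metadata-driven approach is meant to address.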

Implementing SCD Type 2 in Python with Delta Lake

Slowly Changing Dimension Type 2 (SCD2) is a data warehousing pattern that preserves the full history of dimension changes. When a customer changes their address, SCD2 keeps both the old and new records with effective date ranges — so you can join facts to the correct dimension state at any point in time.

Implementing SCD2 correctly is harder than it looks. This post shows how DataCoolie handles it declaratively with metadata instead of hand-coded merge logic.
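For contrast, here is roughly what the hand-coded version of the pattern looks like in plain Python. This is a simplified in-memory sketch, not DataCoolie's API and not a Delta Lake merge: the `CustomerVersion` class and the `apply_scd2` and `address_as_of` helpers are hypothetical names, and it assumes half-open validity ranges where `valid_to` is exclusive and `None` marks the current version.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: int
    address: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history, customer_id, new_address, change_date):
    """Return a new history with the address change recorded as SCD2."""
    result = []
    for row in history:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.address == new_address:
            return list(history)  # nothing changed, keep history as-is
        if is_current:
            # Close the old version instead of overwriting it.
            result.append(replace(row, valid_to=change_date))
        else:
            result.append(row)
    # Open a new current version starting at the change date.
    result.append(CustomerVersion(customer_id, new_address, change_date))
    return result


def address_as_of(history, customer_id, as_of):
    """Point-in-time lookup: which address was valid on a given date?"""
    for row in history:
        if (
            row.customer_id == customer_id
            and row.valid_from <= as_of
            and (row.valid_to is None or as_of < row.valid_to)
        ):
            return row.address
    return None


history = [CustomerVersion(1, "12 Oak St", date(2023, 1, 1))]
history = apply_scd2(history, 1, "99 Elm Ave", date(2024, 6, 1))
print(address_as_of(history, 1, date(2023, 12, 31)))  # the old address
print(address_as_of(history, 1, date(2024, 6, 1)))    # the new address
```

Even this toy version has to handle unchanged records and range boundaries explicitly. A production merge also has to deal with late-arriving data, deletes, and multiple changes in one batch, which is where hand-written logic tends to go wrong.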

How to Build Cloud-Agnostic Data Pipelines in Python

Moving a data pipeline from one cloud to another usually means rewriting file I/O, secrets management, and authentication code. Platform lock-in — where pipeline code is tightly coupled to a specific cloud's APIs and paths — isn't a theoretical problem. It's the reason data teams maintain parallel codebases for the same business logic.

This post shows how to build pipelines that run on local machines, AWS Glue, Microsoft Fabric, and Databricks without code changes.
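The general idea behind decoupling can be sketched in a few lines: business logic depends on a small storage interface, and each platform supplies its own implementation. The example below is an illustration only, not DataCoolie's API. The `Storage` protocol, the two backends, and `uppercase_names` are hypothetical names, and a real cloud backend (S3, ADLS, and so on) would be a third class with the same two methods.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """The only I/O surface the pipeline is allowed to touch."""

    def read_text(self, key: str) -> str: ...
    def write_text(self, key: str, data: str) -> None: ...


class LocalStorage:
    """Backend for a local machine: keys are paths under a root folder."""

    def __init__(self, root):
        self.root = Path(root)

    def read_text(self, key):
        return (self.root / key).read_text(encoding="utf-8")

    def write_text(self, key, data):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(data, encoding="utf-8")


class InMemoryStorage:
    """Stand-in for a second platform; also handy in unit tests."""

    def __init__(self):
        self.objects = {}

    def read_text(self, key):
        return self.objects[key]

    def write_text(self, key, data):
        self.objects[key] = data


def uppercase_names(storage: Storage, src: str, dst: str) -> None:
    """Business logic: no paths, credentials, or SDK calls in here."""
    lines = storage.read_text(src).splitlines()
    storage.write_text(dst, "\n".join(line.upper() for line in lines))


# The same function runs unchanged against both backends.
memory = InMemoryStorage()
memory.write_text("raw/names.txt", "alice\nbob")
uppercase_names(memory, "raw/names.txt", "clean/names.txt")

with tempfile.TemporaryDirectory() as tmp:
    local = LocalStorage(tmp)
    local.write_text("raw/names.txt", "alice\nbob")
    uppercase_names(local, "raw/names.txt", "clean/names.txt")
    local_result = local.read_text("clean/names.txt")

print(memory.read_text("clean/names.txt") == local_result)
```

Swapping platforms then means choosing a different backend at startup, while the transformation code stays the same.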