Skip to content

DataCoolie

Configure database metadata

datacoolie/datacoolie

Configure database metadata¶

Prerequisites · pip install "datacoolie[db]" · an RDBMS the team can share · DDL privileges to create the metadata schema. End state · DatabaseProvider reading dc_framework_connections, dc_framework_dataflows, dc_framework_watermarks, and dc_framework_schema_hints tables.

Supported dialects¶

Any dialect supported by SQLAlchemy 2.x. The usecase-sim testbed ships DDL for SQLite, PostgreSQL, MySQL, MSSQL, and Oracle — see usecase-sim/metadata/database/.

Schema¶

erDiagram
    dc_framework_connections ||--o{ dc_framework_dataflows : "source / destination connection"
    dc_framework_dataflows ||--o{ dc_framework_schema_hints : "many"
    dc_framework_dataflows ||--o| dc_framework_watermarks : "per dataflow"

All tables are workspace-scoped (workspace_id column) and honour soft-delete (deleted_at IS NULL).

Loading¶

from datacoolie.metadata.database_provider import DatabaseProvider

provider = DatabaseProvider(
    connection_string="postgresql+psycopg2://user:pwd@host:5432/metadata",
    workspace_id="your-workspace-id",
)

Seeding¶

Use usecase-sim/scripts/setup_metadata.py --targets db:postgresql as a reference implementation. It:

Creates the schema via dialect-specific DDL.
Parses every canonical JSON metadata file.
INSERTs connections, dataflows, and children under a single transaction.

Concurrency notes¶

DatabaseProvider opens one short-lived connection per operation. It does not pin a session across the driver's parallel execution. Safe with max_workers > 1.
Watermarks are written with SELECT ... FOR UPDATE on dialects that support it, otherwise an optimistic upsert.

Concepts · Metadata providers