Platforms

base

Abstract base class for platform (file system and secrets) operations.

Every concrete platform — Local, Fabric, Databricks, AWS — inherits from :class:BasePlatform and implements all abstract methods.

BasePlatform

BasePlatform(cache_ttl: int = 300, **kwargs: Any)

Bases: BaseSecretProvider

Abstract interface for file/directory operations and secret retrieval.

Platforms encapsulate the storage layer differences (local FS, ADLS, S3, DBFS) behind a uniform API so that the rest of the framework never touches os, shutil, or cloud SDKs directly.

Each platform also implements :meth:_fetch_secret so it can serve as its own :class:~datacoolie.core.secret_provider.BaseSecretProvider, using its existing SDK handle (env vars / notebookutils / dbutils / boto3).
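To illustrate how a platform can double as its own secret provider, here is a minimal sketch of the pattern: a base class owns a TTL cache and each backend implements only `_fetch_secret`. The class names, the public `get_secret` method, and the injectable `clock` are assumptions made for this example; only `_fetch_secret` and `cache_ttl` come from the documentation above.

```python
import os
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, Tuple


class SecretProviderSketch(ABC):
    """Hypothetical stand-in for BaseSecretProvider: a TTL cache plus one hook."""

    def __init__(self, cache_ttl: int = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cache_ttl = cache_ttl
        self._clock = clock  # injectable so tests can control time
        self._cache: Dict[str, Tuple[float, str]] = {}

    @abstractmethod
    def _fetch_secret(self, key: str) -> str:
        """Fetch one secret from the backing store, without caching."""

    def get_secret(self, key: str) -> str:  # hypothetical public name
        now = self._clock()
        if self._cache_ttl > 0 and key in self._cache:
            stored_at, value = self._cache[key]
            if now - stored_at < self._cache_ttl:
                return value  # still fresh, skip the backend call
        value = self._fetch_secret(key)
        if self._cache_ttl > 0:  # cache_ttl=0 disables caching entirely
            self._cache[key] = (now, value)
        return value


class EnvSecretProvider(SecretProviderSketch):
    """Backend that reads secrets from environment variables."""

    def _fetch_secret(self, key: str) -> str:
        try:
            return os.environ[key]
        except KeyError:
            raise KeyError(f"Secret {key!r} is not set in the environment") from None
```

A cloud platform would follow the same shape, swapping the body of `_fetch_secret` for a call through the SDK handle it already holds.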

18 abstract methods across five categories:

  • File I/O — read / write / append / delete text content, plus read_bytes / write_bytes for binary content
  • Directory Ops — create / delete / list files / list folders
  • Existence Checks — file_exists / folder_exists
  • File Management — upload / download / copy / move / get_file_info
  • Secrets — _fetch_secret (inherited requirement from BaseSecretProvider)
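The value of a uniform API is that calling code can be written once and tested against a trivial backend. The sketch below uses a trimmed-down, hypothetical `MiniPlatform` with three of the methods and a dict-backed `InMemoryPlatform`; neither class is part of the framework, and `PlatformError` is redefined locally so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError."""


class MiniPlatform(ABC):
    """Hypothetical subset of the BasePlatform contract."""

    @abstractmethod
    def read_file(self, path: str) -> str: ...

    @abstractmethod
    def write_file(self, path: str, content: str, *, overwrite: bool = False) -> None: ...

    @abstractmethod
    def file_exists(self, path: str) -> bool: ...


class InMemoryPlatform(MiniPlatform):
    """Dict-backed implementation, convenient for unit tests."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def read_file(self, path: str) -> str:
        if path not in self._files:
            raise PlatformError(f"File not found: {path}")
        return self._files[path]

    def write_file(self, path: str, content: str, *, overwrite: bool = False) -> None:
        if path in self._files and not overwrite:
            raise PlatformError(f"File exists and overwrite is False: {path}")
        self._files[path] = content

    def file_exists(self, path: str) -> bool:
        return path in self._files


def load_text(platform: MiniPlatform, path: str, default: str = "") -> str:
    """Framework-side code: depends only on the interface, never on os or an SDK."""
    return platform.read_file(path) if platform.file_exists(path) else default
```

`load_text` works unchanged with any object that honours the same three methods, which is the property the abstract base class is meant to guarantee.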

append_file abstractmethod

append_file(path: str, content: str) -> None

Append text content to an existing file.

If the file does not exist, create it.

Parameters:

Name Type Description Default
path str

Absolute path or URI for the target file.

required
content str

Text to append.

required

Raises:

Type Description
PlatformError

On I/O failure.

copy_file abstractmethod

copy_file(src: str, dest: str, *, overwrite: bool = False) -> None

Copy a file from src to dest.

Parameters:

Name Type Description Default
src str

Source file path.

required
dest str

Destination file path.

required
overwrite bool

Allow overwriting an existing file at dest.

False

Raises:

Type Description
PlatformError

If source does not exist or dest exists without overwrite.

create_folder abstractmethod

create_folder(path: str) -> None

Create a directory (including any missing parents).

No-op if the directory already exists.

Parameters:

Name Type Description Default
path str

Absolute path or URI.

required

Raises:

Type Description
PlatformError

On failure.

delete_file abstractmethod

delete_file(path: str) -> None

Delete a file.

No-op if the file does not exist (idempotent).

Parameters:

Name Type Description Default
path str

Absolute path or URI to delete.

required

Raises:

Type Description
PlatformError

On I/O failure.

delete_folder abstractmethod

delete_folder(path: str, *, recursive: bool = False) -> None

Delete a directory.

Parameters:

Name Type Description Default
path str

Absolute path or URI.

required
recursive bool

If True, delete contents recursively.

False

Raises:

Type Description
PlatformError

If recursive is False and the directory is non-empty, or on I/O failure.
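The `recursive` contract maps naturally onto two standard-library calls. The following is a sketch of how a local backend might honour it, not the library's actual code; `delete_folder_local` and the local `PlatformError` are names introduced for the example.

```python
import os
import shutil


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError."""


def delete_folder_local(path: str, *, recursive: bool = False) -> None:
    """Delete a directory; refuse non-empty directories unless recursive=True."""
    try:
        if recursive:
            shutil.rmtree(path)  # removes the directory and everything below it
        else:
            os.rmdir(path)  # raises OSError when the directory is not empty
    except OSError as exc:
        raise PlatformError(f"Could not delete folder {path}: {exc}") from exc
```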

download_file abstractmethod

download_file(src: str, dest: str) -> None

Download a file from this platform to the local filesystem.

Parameters:

Name Type Description Default
src str

Source path on this platform.

required
dest str

Absolute path on the local OS filesystem to write to.

required

Raises:

Type Description
PlatformError

If the source does not exist or the download fails.

file_exists abstractmethod

file_exists(path: str) -> bool

Return True if the path refers to an existing file.

folder_exists abstractmethod

folder_exists(path: str) -> bool

Return True if the path refers to an existing directory.

get_file_info abstractmethod

get_file_info(path: str) -> FileInfo

Return metadata about a single file or directory.

Returns:

Type Description
FileInfo

A :class:FileInfo instance with name, path, size, modification_time (UTC-aware datetime or None), and is_dir.

Raises:

Type Description
PlatformError

If the path does not exist.

list_files abstractmethod

list_files(path: str, *, recursive: bool = False, extension: str | None = None) -> list[FileInfo]

List files under path.

Each entry is a :class:FileInfo instance.

Parameters:

Name Type Description Default
path str

Directory path or URI.

required
recursive bool

Descend into sub-directories.

False
extension str | None

Filter by suffix (e.g. ".parquet").

None

Returns:

Type Description
list[FileInfo]

List of :class:FileInfo objects (directories excluded).

Raises:

Type Description
PlatformError

If the path does not exist or is not a directory.
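As a rough illustration of the `recursive` and `extension` parameters, here is how a local listing could be written with pathlib. `list_files_local` is a hypothetical helper, and `FileInfo` and `PlatformError` are redefined locally (mirroring the documented fields) so the sketch runs without the framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError."""


@dataclass(frozen=True)
class FileInfo:
    """Local stand-in with the documented FileInfo fields."""
    name: str
    path: str
    modification_time: Optional[datetime]
    size: int = 0
    is_dir: bool = False


def list_files_local(path: str, *, recursive: bool = False,
                     extension: Optional[str] = None) -> List[FileInfo]:
    root = Path(path)
    if not root.is_dir():
        raise PlatformError(f"Not a directory: {path}")
    candidates = root.rglob("*") if recursive else root.iterdir()
    result: List[FileInfo] = []
    for entry in sorted(candidates):
        if not entry.is_file():
            continue  # directories are excluded from the result
        if extension is not None and not entry.name.endswith(extension):
            continue  # suffix filter, e.g. ".parquet"
        stat = entry.stat()
        result.append(FileInfo(
            name=entry.name,
            path=str(entry),
            modification_time=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            size=stat.st_size,
        ))
    return result
```

Sorting the candidates is a choice made here for deterministic output; the documentation does not state an ordering guarantee.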

list_folders abstractmethod

list_folders(path: str, *, recursive: bool = False) -> list[str]

List immediate (or recursive) sub-directories.

Parameters:

Name Type Description Default
path str

Directory path or URI.

required
recursive bool

Descend into sub-directories.

False

Returns:

Type Description
list[str]

List of directory paths.

Raises:

Type Description
PlatformError

If path is not a directory.

move_file abstractmethod

move_file(src: str, dest: str, *, overwrite: bool = False) -> None

Move (rename) a file from src to dest.

Parameters:

Name Type Description Default
src str

Source file path.

required
dest str

Destination file path.

required
overwrite bool

Allow overwriting an existing file at dest.

False

Raises:

Type Description
PlatformError

If source does not exist or dest exists without overwrite.

read_bytes abstractmethod

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Implementations MUST NOT use bounded head()-style APIs that truncate large files. Prefer the platform's fastest native unbounded primitive (e.g. s3.get_object for AWS, direct open(path, "rb") for Unity Catalog Volumes, fs.cp to a temp file for other cloud paths).

Suitable for files that fit comfortably in memory.

Raises:

Type Description
PlatformError

If the file does not exist or cannot be read.

read_file abstractmethod

read_file(path: str) -> str

Read the entire content of a text file.

Parameters:

Name Type Description Default
path str

Absolute path or URI to the file.

required

Returns:

Type Description
str

File content as a string.

Raises:

Type Description
PlatformError

If the file does not exist or cannot be read.

upload_file abstractmethod

upload_file(local_path: str, dest: str, *, overwrite: bool = False) -> None

Upload a file from the local filesystem to the platform.

Parameters:

Name Type Description Default
local_path str

Absolute path on the local OS filesystem.

required
dest str

Destination path on this platform.

required
overwrite bool

Allow overwriting an existing file at dest.

False

Raises:

Type Description
PlatformError

If local_path does not exist or dest exists without overwrite.

write_bytes abstractmethod

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Symmetric counterpart of :meth:read_bytes. Prefer the platform's fastest native upload primitive. For very large payloads, write to a temp file and call :meth:upload_file directly.

Parameters:

Name Type Description Default
path str

Destination path or URI.

required
data bytes

Bytes to write.

required
overwrite bool

Allow overwriting an existing file at path.

False

Raises:

Type Description
PlatformError

If the file exists and overwrite is False, or on I/O failure.

write_file abstractmethod

write_file(path: str, content: str, *, overwrite: bool = False) -> None

Write text content to a file.

Parameters:

Name Type Description Default
path str

Absolute path or URI for the target file.

required
content str

Text to write.

required
overwrite bool

If True, overwrite an existing file; otherwise raise when the file already exists.

False

Raises:

Type Description
PlatformError

If the file exists and overwrite is False, or on I/O failure.

FileInfo dataclass

FileInfo(name: str, path: str, modification_time: Optional[datetime], size: int = 0, is_dir: bool = False)

Metadata for a single file-system entry.

Returned by :meth:BasePlatform.list_files and :meth:BasePlatform.get_file_info. frozen=True makes instances immutable and hashable; slots=True reduces per-instance memory.

local_platform

Local filesystem platform implementation.

Uses pathlib, os, and shutil for all operations. Suitable for local development, testing, and single-node environments.

LocalPlatform

LocalPlatform(base_path: str | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by the local filesystem.

Also implements :meth:_fetch_secret via os.environ, so it can serve as its own secret provider in local / test environments.

Parameters:

Name Type Description Default
base_path str | None

Optional root directory. When set, all relative paths are resolved against this root.

None
cache_ttl int

Secret cache time-to-live in seconds (default 300). Pass 0 to disable caching.

300
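The `base_path` rule ("relative paths are resolved against this root") can be sketched in a few lines. `resolve_path` is a hypothetical helper, and leaving absolute paths untouched is an assumption that follows from the word "relative" in the description above.

```python
from pathlib import Path
from typing import Optional


def resolve_path(path: str, base_path: Optional[str] = None) -> Path:
    """Resolve a relative path against base_path; leave absolute paths as they are."""
    candidate = Path(path)
    if base_path is not None and not candidate.is_absolute():
        return Path(base_path) / candidate
    return candidate
```

With `base_path="/tmp/root"`, the relative path `"data/x.csv"` becomes `/tmp/root/data/x.csv`, while `"/abs/x.csv"` is returned unchanged on a POSIX system.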

read_bytes

read_bytes(path: str) -> bytes

Zero-overhead native read — no temp file.

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Zero-overhead native write — no temp file.

aws_platform

AWS S3 platform implementation.

Uses boto3 for all file-system operations. The SDK is lazily imported so the class can be defined without boto3 installed (e.g. for registry registration).

AWSPlatform

AWSPlatform(bucket: str = '', region: str | None = None, profile: str | None = None, endpoint_url: str | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by AWS S3 via boto3.

Also implements :meth:_fetch_secret via AWS Secrets Manager, so it can serve as its own secret provider.

Parameters:

Name Type Description Default
bucket str

Default S3 bucket name.

''
region str | None

AWS region for both S3 and Secrets Manager operations (e.g. "us-east-1").

None
profile str | None

Named AWS profile from ~/.aws/credentials.

None
endpoint_url str | None

Custom endpoint (for MinIO, LocalStack, etc.).

None
cache_ttl int

Secret cache time-to-live in seconds (default 300). Pass 0 to disable caching.

300

s3 property

s3: Any

Return a cached boto3 S3 client.

boto3_client

boto3_client(service: str, **kwargs: Any) -> Any

Create a boto3 service client using the platform's credentials.

Useful for accessing any AWS service (SQS, SNS, Glue, etc.) with the same profile and region configuration as the platform; the platform-level endpoint_url applies to S3 only. The underlying :attr:_session is created once and reused.

Parameters:

Name Type Description Default
service str

AWS service name (e.g. "sqs", "glue", "sns").

required
**kwargs Any

Extra keyword arguments forwarded to boto3.client(). An endpoint_url passed here overrides the endpoint for that client, whatever the service. When it is omitted, S3 clients use the platform-level endpoint_url automatically, while clients for all other services ignore the platform-level value so that they always reach the real AWS endpoint.

{}

Returns:

Type Description
Any

A new boto3 service client.

Raises:

Type Description
PlatformError

If boto3 is unavailable.
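The endpoint rule is easy to get wrong, so here is a sketch of the decision as a pure function. `resolve_client_kwargs` and its `platform_endpoint_url` parameter are hypothetical names; the logic follows the description of `**kwargs` above rather than the library source.

```python
from typing import Any, Dict, Optional


def resolve_client_kwargs(service: str, platform_endpoint_url: Optional[str],
                          **kwargs: Any) -> Dict[str, Any]:
    """Decide which endpoint_url, if any, a new service client should receive."""
    resolved = dict(kwargs)
    if "endpoint_url" in resolved:
        return resolved  # an explicit per-call override always wins
    if service == "s3" and platform_endpoint_url:
        # Only S3 inherits the platform-level endpoint (MinIO, LocalStack, ...).
        resolved["endpoint_url"] = platform_endpoint_url
    return resolved
```

The resulting dictionary could then be passed on as `session.client(service, **resolved)`, where `session` is a `boto3.Session`.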

delete_glue_table

delete_glue_table(database: str, table_name: str) -> None

Delete a Glue catalog table entry, ignoring if it does not exist.

Athena DDL does not support DROP TABLE for native Delta tables, so the Glue API is used directly for all table types.

Parameters:

Name Type Description Default
database str

Glue database name.

required
table_name str

Table name to delete.

required

download_file

download_file(src: str, dest: str) -> None

Download a file from S3 to the local filesystem.

Parameters:

Name Type Description Default
src str

S3 URI (s3://bucket/key) or plain key using the default bucket.

required
dest str

Absolute local filesystem path to write to.

required

Raises:

Type Description
PlatformError

On download failure.
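Accepting either a full S3 URI or a plain key implies a small parsing step. The sketch below shows one way to do it; `split_s3_path` is a hypothetical helper, and raising `ValueError` when no bucket can be determined is an assumption.

```python
from typing import Tuple


def split_s3_path(path: str, default_bucket: str = "") -> Tuple[str, str]:
    """Return (bucket, key) for an s3:// URI or a plain key."""
    prefix = "s3://"
    if path.startswith(prefix):
        # partition keeps characters such as "?" or "#" inside the key intact
        bucket, _, key = path[len(prefix):].partition("/")
    else:
        bucket, key = default_bucket, path.lstrip("/")
    if not bucket:
        raise ValueError(f"No bucket in {path!r} and no default bucket configured")
    return bucket, key
```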

execute_athena_ddl

execute_athena_ddl(sql: str, *, database: str | None = None, output_location: str = '') -> str

Execute a DDL statement via Athena and wait for completion.

Parameters:

Name Type Description Default
sql str

The SQL/DDL statement to execute.

required
database str | None

Optional Athena database context.

None
output_location str

S3 location for query results (e.g. "s3://bucket/athena-results/").

''

Returns:

Type Description
str

The Athena query execution ID.

Raises:

Type Description
PlatformError

If the query fails, is cancelled, or times out.
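"Execute and wait for completion" usually means a submit-then-poll loop. The sketch below shows that loop against the boto3 Athena client calls `start_query_execution` and `get_query_execution`; the function name and the `timeout` and `poll_interval` parameters are assumptions, and the real method may wait differently.

```python
import time
from typing import Any, Dict, Optional


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError."""


def run_athena_ddl(athena: Any, sql: str, *, database: Optional[str] = None,
                   output_location: str = "", timeout: float = 60.0,
                   poll_interval: float = 1.0) -> str:
    """Submit a statement and poll until it reaches a terminal state."""
    request: Dict[str, Any] = {"QueryString": sql}
    if database:
        request["QueryExecutionContext"] = {"Database": database}
    if output_location:
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    deadline = time.monotonic() + timeout
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise PlatformError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise PlatformError(f"Athena query {query_id} timed out after {timeout}s")
        time.sleep(poll_interval)  # QUEUED or RUNNING: wait and ask again
```

Passing the client in as an argument keeps the loop testable with a stub that returns canned states.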

glue_table_exists

glue_table_exists(database: str, table_name: str) -> bool

Return True if a Glue catalog table entry exists.

Uses the Glue get_table API. Never raises; returns False on any error including a missing database, missing table, or network failure.

Parameters:

Name Type Description Default
database str

Glue database name.

required
table_name str

Table name to check.

required
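The "never raises" contract amounts to treating every failure as "not found". A minimal sketch, assuming a Glue client is passed in (the helper name is invented here; `get_table` with `DatabaseName` and `Name` is the boto3 Glue call):

```python
from typing import Any


def glue_table_exists_sketch(glue: Any, database: str, table_name: str) -> bool:
    """Return True only when get_table succeeds; swallow every error."""
    try:
        glue.get_table(DatabaseName=database, Name=table_name)
    except Exception:
        # Missing database, missing table, and network errors all map to False.
        return False
    return True
```

The trade-off of this design is that a transient network failure is indistinguishable from a missing table, so callers that must tell the two apart need a different check.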

read_bytes

read_bytes(path: str) -> bytes

In-memory read — single GET for small files, multipart for large.

Avoids the temp-file round-trip of the base implementation.

register_delta_table

register_delta_table(table_name: str, path: str, *, database: str, output_location: str = '', recreate: bool = False) -> None

Register a native Delta table in the Glue Catalog via Athena.

Creates an external table with TBLPROPERTIES ('table_type'='DELTA') so Athena v3 can query it natively.

When recreate is True, the existing Glue catalog entry is deleted first through the Glue API (Athena DDL does not support DROP TABLE for Delta tables) so that the registration reflects schema evolution. When recreate is False (the default), only the CREATE is issued; this is idempotent and avoids a redundant delete on every write.

Parameters:

Name Type Description Default
table_name str

Table name (without database prefix).

required
path str

S3 path to the Delta table root.

required
database str

Glue/Athena database name (required).

required
output_location str

S3 location for Athena query results.

''
recreate bool

When True, delete the existing Glue entry before creating so schema changes are reflected.

False

register_symlink_table

register_symlink_table(table_name: str, path: str, *, database: str, output_location: str = '', schema_ddl: str = '', partition_ddl: str = '', recreate: bool = False, run_msck: bool = True) -> None

Register a symlink-based table in the Glue Catalog.

Creates a table using SymlinkTextInputFormat pointing at the _symlink_format_manifest/ directory of a Delta table, for use with Redshift Spectrum.

When recreate is True the Glue catalog entry is deleted first so that schema or partition changes are picked up. When False (the default) only a CREATE is issued — idempotent on first write.

When run_msck is True (the default) and partition_ddl is provided, MSCK REPAIR TABLE is run after the CREATE so that newly written partition paths are visible to Athena and Redshift.

Parameters:

Name Type Description Default
table_name str

Table name.

required
path str

S3 path to the Delta table root (/_symlink_format_manifest/ is appended automatically).

required
database str

Glue/Athena database name (required).

required
output_location str

S3 location for Athena query results.

''
schema_ddl str

Column definitions, e.g. "id INT, name STRING, event_date STRING".

''
partition_ddl str

PARTITIONED BY clause, e.g. "PARTITIONED BY (year STRING, month STRING)".

''
recreate bool

When True, delete the Glue entry before creating.

False
run_msck bool

When True and table is partitioned, run MSCK REPAIR TABLE after the CREATE.

True

repair_table_partitions

repair_table_partitions(table_name: str, *, database: str, output_location: str = '') -> None

Run MSCK REPAIR TABLE to sync partition metadata in the Glue Catalog.

Should be called whenever new partition paths have been written to S3 so the Glue/Athena catalog discovers them without a full table recreate.

Parameters:

Name Type Description Default
table_name str

Table name to repair.

required
database str

Glue/Athena database name.

required
output_location str

S3 location for Athena query results.

''

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Native single-call upload — no temp file, no extra disk write.

Uses put_object for payloads at or below the inline threshold and upload_fileobj (managed multipart) for larger payloads.
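The size-based choice between the two S3 calls can be sketched as follows. The 8 MiB threshold, the function name, and the returned label are assumptions for illustration; the documentation does not state the actual inline threshold. `put_object` and `upload_fileobj` are the boto3 S3 client methods named above.

```python
import io
from typing import Any

INLINE_THRESHOLD = 8 * 1024 * 1024  # assumed value, not taken from the library


def upload_bytes(s3: Any, bucket: str, key: str, data: bytes, *,
                 inline_threshold: int = INLINE_THRESHOLD) -> str:
    """Upload in-memory bytes and report which S3 call was used."""
    if len(data) <= inline_threshold:
        s3.put_object(Bucket=bucket, Key=key, Body=data)  # one request
        return "put_object"
    # Larger payloads go through the managed transfer, which can split the
    # stream into multipart chunks; BytesIO avoids writing a temp file.
    s3.upload_fileobj(io.BytesIO(data), bucket, key)
    return "upload_fileobj"
```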

fabric_platform

Microsoft Fabric platform implementation.

Uses notebookutils for all operations. The module is lazily imported so the class can be defined outside a Fabric notebook without error (e.g. for registry registration). All notebookutils services are accessible via the :attr:notebookutils property (fs, credentials, mssparkutils, etc.).

FabricPlatform

FabricPlatform(cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by Microsoft Fabric notebookutils.

Also implements :meth:_fetch_secret via notebookutils.credentials, so it can serve as its own secret provider inside Fabric notebooks.

The full notebookutils module is available via the :attr:notebookutils property, giving access to all services: fs, credentials, mssparkutils, and more.

Requires execution in a Fabric notebook environment where notebookutils is pre-installed.

Parameters:

Name Type Description Default
cache_ttl int

Secret cache time-to-live in seconds (default 300). Pass 0 to disable caching.

300

fs property

fs: Any

Return notebookutils.fs, resolving lazily on first use.

notebookutils property

notebookutils: Any

Return the notebookutils module, importing lazily on first use.

Provides access to all notebookutils services: fs, credentials, mssparkutils, and more.

download_file

download_file(src: str, dest: str) -> None

Download a file from Fabric storage to the local filesystem.

Uses notebookutils.fs.cp to copy src to the local path dest.

Parameters:

Name Type Description Default
src str

Source path on Fabric storage (e.g. "abfss://container@account.dfs.core.windows.net/path").

required
dest str

Absolute local filesystem path to write to.

required

Raises:

Type Description
PlatformError

On download failure.

read_bytes

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Downloads path to a temp file via notebookutils.fs.cp (download_file) then reads it back — the only reliable unbounded path on Fabric (fs.head truncates large files).

read_file

read_file(path: str) -> str

Read a remote file by downloading it to a temp file and reading locally.

Using :meth:download_file (notebookutils.fs.cp) guarantees the full file content is read regardless of size, unlike fs.head which may truncate large files even with a high maxBytes limit.
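The download-then-read pattern looks roughly like this. The helper takes the download function as an argument so it can be exercised without Fabric; `read_text_via_temp`, the temp file name, and the UTF-8 default are assumptions made for the sketch.

```python
import os
import tempfile
from typing import Callable


def read_text_via_temp(download_file: Callable[[str, str], None], src: str,
                       encoding: str = "utf-8") -> str:
    """Copy src to a private temp directory, then read the whole file locally."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "payload")
        download_file(src, local_path)  # full copy, so nothing is truncated
        with open(local_path, "r", encoding=encoding) as handle:
            return handle.read()
    # The temp directory and its contents are removed when the block exits.
```

On a platform instance this would be called with the bound method, for example `read_text_via_temp(platform.download_file, path)`.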

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Writes data to a local temp file then pushes it via notebookutils.fs.cp (upload_file).

databricks_platform

Databricks platform implementation.

Primary backend: Unity Catalog Volumes (paths starting with /Volumes/). These are accessed via standard Python I/O (os, open, shutil), which is the Databricks-recommended approach for UC Volume paths — lower latency, standard semantics, no driver-side serialisation overhead.

Fallback backend: dbutils.fs for all other path schemes (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts, etc.).

Secrets always use dbutils.secrets (no alternative exists on Databricks).

DatabricksPlatform

DatabricksPlatform(dbutils: Any | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by Databricks.

Routes file operations based on path type:

  • Unity Catalog Volumes (/Volumes/...) — uses os, open, shutil. This is the Databricks-recommended approach for UC Volumes.
  • Other paths (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts) — falls back to dbutils.fs.

Secrets always use dbutils.secrets.

The full dbutils handle is available via the :attr:dbutils property, giving access to all services: secrets, widgets, notebook, fs, and more.

Parameters:

Name Type Description Default
dbutils Any | None

Pre-existing dbutils handle. If None, the platform resolves it lazily from the Spark session or IPython.

None
cache_ttl int

Secret cache time-to-live in seconds (default 300). Pass 0 to disable caching.

300

dbutils property

dbutils: Any

Return the full dbutils handle, resolving lazily on first use.

fs property

fs: Any

Return dbutils.fs, resolving lazily on first use.

download_file

download_file(src: str, dest: str) -> None

Download a file to the local filesystem.

Volume path → shutil.copy2 (direct OS copy). Other paths → dbutils.fs.cp with file: prefix on dest.

Parameters:

Name Type Description Default
src str

Source path (Unity Catalog Volume or DBFS/ADLS URI).

required
dest str

Absolute local OS filesystem path to write to.

required

Raises:

Type Description
PlatformError

On download failure.

read_bytes

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Volume path → open(path, "rb").read() (zero-copy, no temp file). Other paths → dbutils.fs.cp to a temp file then read back.
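A sketch of the two-branch routing, with the `fs` handle passed in so the fallback branch can be tried outside Databricks. `is_volume_path` and `read_bytes_routed` are hypothetical names; the `file:` prefix on the local destination follows the download_file description above.

```python
import os
import tempfile
from typing import Any


def is_volume_path(path: str) -> bool:
    """Unity Catalog Volume paths start with /Volumes/."""
    return path.startswith("/Volumes/")


def read_bytes_routed(path: str, fs: Any) -> bytes:
    """Read a whole file, choosing the backend from the path."""
    if is_volume_path(path):
        # Volumes are exposed through the OS filesystem, so plain I/O works.
        with open(path, "rb") as handle:
            return handle.read()
    # Everything else: copy to a local temp file via fs.cp, then read it back.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "payload.bin")
        fs.cp(path, "file:" + local_path)
        with open(local_path, "rb") as handle:
            return handle.read()
```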

read_file

read_file(path: str) -> str

Read a text file.

Volume path → open() directly. Other paths → dbutils.fs.cp to a temp file, then read locally. Using fs.cp (rather than fs.head) guarantees the full content is returned regardless of file size.

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Volume path → makedirs + open(path, "wb").write(data) (zero-copy). Other paths → write to a temp file then push via dbutils.fs.cp. Using fs.put is deliberately avoided: it only accepts strings and would corrupt arbitrary binary data.
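Why a string-only API is unsafe for binary data can be shown without Databricks at all: arbitrary bytes generally do not survive a trip through text. The helper below is purely illustrative and makes no claim about how `fs.put` encodes its input.

```python
def survives_text_round_trip(data: bytes) -> bool:
    """Check whether bytes come back unchanged after being forced through str."""
    text = data.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return text.encode("utf-8") == data
```

Plain ASCII such as `b"hello"` passes, but a PNG header like `b"\x89PNG\r\n\x1a\n"` does not, because `0x89` is not valid UTF-8 on its own and is replaced during decoding.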