Platforms

base

Abstract base class for platform (file system and secrets) operations.

Every concrete platform (Local, Fabric, Databricks, AWS) inherits from BasePlatform and implements all abstract methods.

BasePlatform

BasePlatform(cache_ttl: int = 300, **kwargs: Any)

Bases: BaseSecretProvider

Abstract interface for file/directory operations and secret retrieval.

Platforms encapsulate the storage layer differences (local FS, ADLS, S3, DBFS) behind a uniform API so that the rest of the framework never touches os, shutil, or cloud SDKs directly.

Each platform also implements _fetch_secret so it can serve as its own datacoolie.core.secret_provider.BaseSecretProvider, using its existing SDK handle (env vars / notebookutils / dbutils / boto3).

18 abstract methods across five categories:

  • File I/O: read / write / append / delete text content, plus read_bytes / write_bytes for raw bytes
  • Directory Ops: create / delete / list files / list folders
  • Existence Checks: file_exists / folder_exists
  • File Management: upload / download / copy / move / get_file_info
  • Secrets: _fetch_secret (inherited requirement from BaseSecretProvider)

append_file abstractmethod

append_file(path: str, content: str) -> None

Append text content to an existing file.

If the file does not exist, create it.

Parameters:

  • path (str, required): Absolute path or URI for the target file.
  • content (str, required): Text to append.

Raises:

  • PlatformError: On I/O failure.
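On a local filesystem, the append-or-create contract maps directly onto Python's "a" open mode, which creates a missing file and positions every write at the end. The sketch below is a hypothetical helper, not the LocalPlatform code; PlatformError is a local stand-in for the framework's exception, and UTF-8 encoding is an assumption.

```python
import os
import tempfile


class PlatformError(Exception):
    """Hypothetical stand-in for the framework's PlatformError."""


def append_text(path: str, content: str) -> None:
    """Append content to path, creating the file first if it is missing."""
    try:
        # Mode "a" creates a missing file and always writes at the end.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(content)
    except OSError as exc:
        # Surface low-level I/O errors as the platform-level exception.
        raise PlatformError(f"append failed for {path}: {exc}") from exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "run.log")
        append_text(log, "started\n")   # the file is created here
        append_text(log, "finished\n")  # content is appended
        with open(log, encoding="utf-8") as fh:
            print(fh.read(), end="")
```

Note that a missing parent directory is not created: the open call fails and surfaces as PlatformError.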

copy_file abstractmethod

copy_file(src: str, dest: str, *, overwrite: bool = False) -> None

Copy a file from src to dest.

Parameters:

  • src (str, required): Source file path.
  • dest (str, required): Destination file path.
  • overwrite (bool, default False): Allow overwriting an existing file at dest.

Raises:

  • PlatformError: If the source does not exist, or if dest exists and overwrite is False.
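For local paths, the copy contract (fail on a missing source, refuse to replace dest unless overwrite is set) can be sketched with shutil.copy2. This is a hypothetical helper with a stand-in PlatformError. The exists-then-copy check is not atomic, so a concurrent writer could still create dest between the check and the copy.

```python
import os
import shutil
import tempfile


class PlatformError(Exception):
    """Hypothetical stand-in for the framework's PlatformError."""


def copy_local_file(src: str, dest: str, *, overwrite: bool = False) -> None:
    """Copy src to dest; refuse to replace an existing dest unless overwrite."""
    if not os.path.isfile(src):
        raise PlatformError(f"source does not exist: {src}")
    if os.path.exists(dest) and not overwrite:
        raise PlatformError(f"destination exists and overwrite is False: {dest}")
    try:
        # copy2 copies the data plus metadata such as the modification time.
        shutil.copy2(src, dest)
    except OSError as exc:
        raise PlatformError(f"copy failed: {exc}") from exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.csv")
        with open(src, "w", encoding="utf-8") as fh:
            fh.write("id\n1\n")
        copy_local_file(src, os.path.join(tmp, "b.csv"))
        copy_local_file(src, os.path.join(tmp, "b.csv"), overwrite=True)
```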

create_folder abstractmethod

create_folder(path: str) -> None

Create a directory (including any missing parents).

No-op if the directory already exists.

Parameters:

  • path (str, required): Absolute path or URI.

Raises:

  • PlatformError: On failure.

delete_file abstractmethod

delete_file(path: str) -> None

Delete a file.

No-op if the file does not exist (idempotent).

Parameters:

  • path (str, required): Absolute path or URI to delete.

Raises:

  • PlatformError: On I/O failure.

delete_folder abstractmethod

delete_folder(path: str, *, recursive: bool = False) -> None

Delete a directory.

Parameters:

  • path (str, required): Absolute path or URI.
  • recursive (bool, default False): If True, delete contents recursively.

Raises:

  • PlatformError: If recursive is False and the directory is non-empty, or on I/O failure.

download_file abstractmethod

download_file(src: str, dest: str) -> None

Download a file from this platform to the local filesystem.

Parameters:

  • src (str, required): Source path on this platform.
  • dest (str, required): Absolute path on the local OS filesystem to write to.

Raises:

  • PlatformError: If the source does not exist or the download fails.

file_exists abstractmethod

file_exists(path: str) -> bool

Return True if the path refers to an existing file.

folder_exists abstractmethod

folder_exists(path: str) -> bool

Return True if the path refers to an existing directory.

get_file_info abstractmethod

get_file_info(path: str) -> FileInfo

Return metadata about a single file or directory.

Returns:

  • FileInfo: A FileInfo instance with name, path, size, modification_time (UTC-aware datetime or None), and is_dir.

Raises:

  • PlatformError: If the path does not exist.

list_files abstractmethod

list_files(path: str, *, recursive: bool = False, extension: str | None = None) -> list[FileInfo]

List files under path.

Each entry is a FileInfo instance.

Parameters:

  • path (str, required): Directory path or URI.
  • recursive (bool, default False): Descend into sub-directories.
  • extension (str | None, default None): Filter by suffix (e.g. ".parquet").

Returns:

  • list[FileInfo]: List of FileInfo objects (directories excluded).

Raises:

  • PlatformError: If the path does not exist or is not a directory.
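A local-filesystem sketch of the list_files filtering rules: top level only by default, every nested level when recursive is set, directories dropped, and an optional suffix filter. The hypothetical list_local_files helper returns sorted path strings rather than FileInfo objects to stay self-contained, and raises a stand-in PlatformError.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class PlatformError(Exception):
    """Hypothetical stand-in for the framework's PlatformError."""


def list_local_files(path: str, *, recursive: bool = False,
                     extension: str | None = None) -> list[str]:
    """Return sorted file paths under path, optionally recursive and filtered."""
    root = Path(path)
    if not root.is_dir():
        raise PlatformError(f"not an existing directory: {path}")
    # "**/*" matches entries at every depth below root; "*" only the top level.
    pattern = "**/*" if recursive else "*"
    return sorted(
        str(entry)
        for entry in root.glob(pattern)
        if entry.is_file()  # directories are excluded from the result
        and (extension is None or entry.name.endswith(extension))
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "year=2024"))
        for name in ("a.parquet", os.path.join("year=2024", "b.parquet"), "c.csv"):
            open(os.path.join(tmp, name), "w").close()
        print(list_local_files(tmp, recursive=True, extension=".parquet"))
```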

list_folders abstractmethod

list_folders(path: str, *, recursive: bool = False) -> list[str]

List immediate (or recursive) sub-directories.

Parameters:

  • path (str, required): Directory path or URI.
  • recursive (bool, default False): Descend into sub-directories.

Returns:

  • list[str]: List of directory paths.

Raises:

  • PlatformError: If path is not a directory.

move_file abstractmethod

move_file(src: str, dest: str, *, overwrite: bool = False) -> None

Move (rename) a file from src to dest.

Parameters:

  • src (str, required): Source file path.
  • dest (str, required): Destination file path.
  • overwrite (bool, default False): Allow overwriting an existing file at dest.

Raises:

  • PlatformError: If the source does not exist, or if dest exists and overwrite is False.

read_bytes abstractmethod

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Implementations MUST NOT use bounded head()-style APIs that truncate large files. Prefer the platform's fastest native unbounded primitive (e.g. s3.get_object for AWS, direct open(path, "rb") for Unity Catalog Volumes, fs.cp to a temp file for other cloud paths).

Suitable for files that fit comfortably in memory.

Raises:

  • PlatformError: If the file does not exist or cannot be read.

read_file abstractmethod

read_file(path: str) -> str

Read the entire content of a text file.

Parameters:

  • path (str, required): Absolute path or URI to the file.

Returns:

  • str: File content as a string.

Raises:

  • PlatformError: If the file does not exist or cannot be read.

upload_file abstractmethod

upload_file(local_path: str, dest: str, *, overwrite: bool = False) -> None

Upload a file from the local filesystem to the platform.

Parameters:

  • local_path (str, required): Absolute path on the local OS filesystem.
  • dest (str, required): Destination path on this platform.
  • overwrite (bool, default False): Allow overwriting an existing file at dest.

Raises:

  • PlatformError: If local_path does not exist, or if dest exists and overwrite is False.

write_bytes abstractmethod

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Symmetric counterpart of read_bytes. Prefer the platform's fastest native upload primitive. For very large payloads, write to a temp file and call upload_file directly.

Parameters:

  • path (str, required): Destination path or URI.
  • data (bytes, required): Bytes to write.
  • overwrite (bool, default False): Allow overwriting an existing file at path.

Raises:

  • PlatformError: If the file exists and overwrite is False, or on I/O failure.

write_file abstractmethod

write_file(path: str, content: str, *, overwrite: bool = False) -> None

Write text content to a file.

Parameters:

  • path (str, required): Absolute path or URI for the target file.
  • content (str, required): Text to write.
  • overwrite (bool, default False): If True, overwrite an existing file; otherwise raise when the file already exists.

Raises:

  • PlatformError: If the file exists and overwrite is False, or on I/O failure.
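For local files, the overwrite flag can be enforced without a separate existence check: open mode "x" fails with FileExistsError when the file is already there, while "w" creates or truncates it. A minimal sketch with a hypothetical write_text helper and a stand-in PlatformError:

```python
import os
import tempfile


class PlatformError(Exception):
    """Hypothetical stand-in for the framework's PlatformError."""


def write_text(path: str, content: str, *, overwrite: bool = False) -> None:
    """Write content to path; raise if it exists and overwrite is False."""
    # "x" raises FileExistsError if path already exists (no separate check,
    # so no race window); "w" creates or truncates the file.
    mode = "w" if overwrite else "x"
    try:
        with open(path, mode, encoding="utf-8") as fh:
            fh.write(content)
    except FileExistsError as exc:
        raise PlatformError(f"{path} exists and overwrite is False") from exc
    except OSError as exc:
        # Any other I/O failure, such as a missing parent directory.
        raise PlatformError(f"write failed for {path}: {exc}") from exc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "config.json")
        write_text(target, "{}")
        write_text(target, '{"v": 2}', overwrite=True)
```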

FileInfo dataclass

FileInfo(name: str, path: str, modification_time: Optional[datetime], size: int = 0, is_dir: bool = False)

Metadata for a single file-system entry.

Returned by BasePlatform.list_files and BasePlatform.get_file_info. frozen=True makes instances immutable and hashable; slots=True reduces per-instance memory.

local_platform

Local filesystem platform implementation.

Uses pathlib, os, and shutil for all operations. Suitable for local development, testing, and single-node environments.

LocalPlatform

LocalPlatform(base_path: str | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by the local filesystem.

Also implements _fetch_secret via os.environ, so it can serve as its own secret provider in local and test environments.

Parameters:

  • base_path (str | None, default None): Optional root directory. When set, all relative paths are resolved against this root.
  • cache_ttl (int, default 300): Secret cache time-to-live in seconds. Pass 0 to disable caching.
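A sketch of the base_path rule, assuming (this is not stated above) that absolute paths bypass the root unchanged. resolve_path is a hypothetical helper, not part of LocalPlatform's documented API, and it does not guard against ".." segments escaping the root.

```python
from __future__ import annotations

import os


def resolve_path(path: str, base_path: str | None = None) -> str:
    """Resolve a relative path against base_path (hypothetical helper)."""
    if base_path is None or os.path.isabs(path):
        # No configured root, or already absolute: keep the path as given.
        return os.path.normpath(path)
    # Relative path: anchor it under the configured root.
    return os.path.normpath(os.path.join(base_path, path))


if __name__ == "__main__":
    root = os.path.abspath("workspace")
    print(resolve_path(os.path.join("bronze", "orders.csv"), root))
```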

read_bytes

read_bytes(path: str) -> bytes

Zero-overhead native read — no temp file.

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Zero-overhead native write — no temp file.

aws_platform

AWS S3 platform implementation.

Uses boto3 for all file-system operations. The SDK is lazily imported so the class can be defined without boto3 installed (e.g. for registry registration).

AWSPlatform

AWSPlatform(bucket: str = '', region: str | None = None, profile: str | None = None, endpoint_url: str | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by AWS S3 via boto3.

Also implements _fetch_secret via AWS Secrets Manager, so it can serve as its own secret provider.

Parameters:

  • bucket (str, default ''): Default S3 bucket name.
  • region (str | None, default None): AWS region for both S3 and Secrets Manager operations (e.g. "us-east-1").
  • profile (str | None, default None): Named AWS profile from ~/.aws/credentials.
  • endpoint_url (str | None, default None): Custom endpoint (for MinIO, LocalStack, etc.).
  • cache_ttl (int, default 300): Secret cache time-to-live in seconds. Pass 0 to disable caching.

s3 property

s3: Any

Return a cached boto3 S3 client.

boto3_client

boto3_client(service: str, **kwargs: Any) -> Any

Create a boto3 service client using the platform's credentials.

Useful for accessing any AWS service (SQS, SNS, Glue, etc.) with the same profile and region as the platform; the platform-level endpoint_url is applied to S3 only. The underlying _session is created once and reused.

Parameters:

  • service (str, required): AWS service name (e.g. "sqs", "glue", "sns").
  • **kwargs (Any, optional): Extra keyword arguments forwarded to boto3.client(). Pass endpoint_url here to override the endpoint for any service. For S3, the platform-level endpoint_url is used automatically; for all other services it is ignored so that they always reach the real AWS endpoint.

Returns:

  • Any: A new boto3 service client.

Raises:

  • PlatformError: If boto3 is unavailable.
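The endpoint rule in boto3_client can be expressed as a small pure function. client_kwargs is hypothetical; its result is meant to be passed to session.client(service, **kwargs) on a boto3.session.Session built from the platform's profile and region. It only encodes the documented precedence: the platform endpoint applies to S3, and an explicit endpoint_url passed by the caller always wins.

```python
from __future__ import annotations

from typing import Any


def client_kwargs(service: str, platform_endpoint: str | None,
                  **overrides: Any) -> dict[str, Any]:
    """Keyword arguments for session.client(service, **kwargs) (hypothetical)."""
    kwargs: dict[str, Any] = {}
    if service == "s3" and platform_endpoint:
        # MinIO / LocalStack style endpoints only apply to S3 traffic.
        kwargs["endpoint_url"] = platform_endpoint
    # Caller-supplied arguments, including endpoint_url, take precedence.
    kwargs.update(overrides)
    return kwargs


if __name__ == "__main__":
    print(client_kwargs("s3", "http://localhost:9000"))
    print(client_kwargs("glue", "http://localhost:9000", region_name="eu-west-1"))
```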

download_file

download_file(src: str, dest: str) -> None

Download a file from S3 to the local filesystem.

Parameters:

  • src (str, required): S3 URI (s3://bucket/key) or plain key using the default bucket.
  • dest (str, required): Absolute local filesystem path to write to.

Raises:

  • PlatformError: On download failure.
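download_file accepts either a full s3://bucket/key URI or a plain key resolved against the default bucket. A hypothetical split_s3_path helper sketching that rule; it splits on the first "/" instead of using urllib.parse, because S3 keys may legally contain "#" and "?", which a URL parser would treat as fragment or query delimiters. Stripping a leading "/" from plain keys is an assumption.

```python
from __future__ import annotations


def split_s3_path(path: str, default_bucket: str = "") -> tuple[str, str]:
    """Split "s3://bucket/key" or a plain key into (bucket, key)."""
    if path.startswith("s3://"):
        # Everything up to the first "/" after the scheme is the bucket.
        bucket, _, key = path[len("s3://"):].partition("/")
        return bucket, key
    if not default_bucket:
        raise ValueError(f"plain key {path!r} needs a default bucket")
    return default_bucket, path.lstrip("/")


if __name__ == "__main__":
    print(split_s3_path("s3://lake/raw/orders.json"))
    print(split_s3_path("raw/orders.json", default_bucket="lake"))
```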

execute_athena_ddl

execute_athena_ddl(sql: str, *, database: str | None = None, output_location: str = '') -> str

Execute a DDL statement via Athena and wait for completion.

Parameters:

  • sql (str, required): The SQL/DDL statement to execute.
  • database (str | None, default None): Optional Athena database context.
  • output_location (str, default ''): S3 location for query results (e.g. "s3://bucket/athena-results/").

Returns:

  • str: The Athena query execution ID.

Raises:

  • PlatformError: If the query fails, is cancelled, or times out.
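"Execute and wait for completion" amounts to a start-then-poll loop. The sketch below runs against a boto3 Athena client (for example boto3.client("athena")) using the real start_query_execution and get_query_execution calls and Athena's terminal states (SUCCEEDED, FAILED, CANCELLED). The function name, the polling interval, and the timeout default are assumptions, and PlatformError is a local stand-in.

```python
from __future__ import annotations

import time
from typing import Any


class PlatformError(Exception):
    """Hypothetical stand-in for the framework's PlatformError."""


def run_athena_ddl(athena: Any, sql: str, *, database: str | None = None,
                   output_location: str = "", poll_seconds: float = 1.0,
                   timeout_seconds: float = 300.0) -> str:
    """Start a query on a boto3 Athena client and poll until it finishes."""
    request: dict[str, Any] = {"QueryString": sql}
    if database:
        request["QueryExecutionContext"] = {"Database": database}
    if output_location:
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]  # QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise PlatformError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise PlatformError(f"Athena query {query_id} timed out")
        time.sleep(poll_seconds)
```

Taking the client as a parameter keeps the loop testable with a fake object and lets the caller decide how the client is built.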

read_bytes

read_bytes(path: str) -> bytes

In-memory read — single GET for small files, multipart for large.

Avoids the temp-file round-trip of the base implementation.

register_delta_table

register_delta_table(table_name: str, path: str, *, database: str | None = None, output_location: str = '') -> None

Register a native Delta table in the Glue Catalog via Athena.

Creates an external table with TBLPROPERTIES ('table_type'='DELTA') so Athena v3 can query it natively.

Parameters:

  • table_name (str, required): Table name (without database prefix).
  • path (str, required): S3 path to the Delta table root.
  • database (str | None, default None): Athena/Glue database.
  • output_location (str, default ''): S3 location for Athena query results.
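The statement behind register_delta_table presumably resembles Athena's documented form for native Delta Lake tables: an external table with only a LOCATION and the table_type property, since Athena reads the schema from the Delta transaction log. delta_table_ddl is a hypothetical builder; backtick quoting follows Athena DDL conventions, and the inputs are not escaped, so only trusted names should be passed.

```python
from __future__ import annotations


def delta_table_ddl(table_name: str, path: str,
                    database: str | None = None) -> str:
    """Athena DDL registering path as a native Delta table (hypothetical)."""
    name = f"`{database}`.`{table_name}`" if database else f"`{table_name}`"
    # No column list: Athena derives the schema from the Delta log.
    return (
        f"CREATE EXTERNAL TABLE {name}\n"
        f"LOCATION '{path}'\n"
        "TBLPROPERTIES ('table_type'='DELTA')"
    )


if __name__ == "__main__":
    print(delta_table_ddl("orders", "s3://lake/silver/orders/", database="sales"))
```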

register_symlink_table

register_symlink_table(table_name: str, path: str, *, database: str | None = None, output_location: str = '', schema_ddl: str = '', partition_ddl: str = '') -> None

Register a symlink-based table in the Glue Catalog.

Creates a table using SymlinkTextInputFormat pointing at the _symlink_format_manifest/ directory of a Delta table, for use with Redshift Spectrum.

Parameters:

  • table_name (str, required): Table name.
  • path (str, required): S3 path to the Delta table root (/_symlink_format_manifest/ is appended automatically).
  • database (str | None, default None): Athena/Glue database.
  • output_location (str, default ''): S3 location for Athena query results.
  • schema_ddl (str, default ''): Column definitions, e.g. "id INT, name STRING, event_date STRING".
  • partition_ddl (str, default ''): PARTITIONED BY clause, e.g. "PARTITIONED BY (year STRING, month STRING)".
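For the symlink variant, the Hive-style DDL that Redshift Spectrum and Athena read over a Delta manifest pairs SymlinkTextInputFormat with the Parquet SerDe. A hypothetical builder, assuming the underlying data files are Parquet (as they are for Delta tables) and that schema_ddl and partition_ddl are trusted strings spliced in verbatim:

```python
from __future__ import annotations


def symlink_table_ddl(table_name: str, path: str, *, database: str | None = None,
                      schema_ddl: str = "", partition_ddl: str = "") -> str:
    """Hive-style DDL over a Delta table's symlink manifest (hypothetical)."""
    name = f"`{database}`.`{table_name}`" if database else f"`{table_name}`"
    manifest = path.rstrip("/") + "/_symlink_format_manifest/"
    lines = [f"CREATE EXTERNAL TABLE {name} ({schema_ddl})"]
    if partition_ddl:
        lines.append(partition_ddl)  # e.g. "PARTITIONED BY (year STRING)"
    lines += [
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'",
        # The manifest files list Parquet paths; this format follows them.
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'",
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'",
        f"LOCATION '{manifest}'",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(symlink_table_ddl("events", "s3://lake/gold/events",
                            schema_ddl="id INT, name STRING"))
```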

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Native single-call upload — no temp file, no extra disk write.

Uses put_object for payloads at or below the inline threshold and upload_fileobj (managed multipart) for larger payloads.
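The put_object / upload_fileobj split can be sketched as below. The 8 MiB cut-over is an assumed value (the actual inline threshold is not documented here), the overwrite check is omitted, and the helper returns the name of the call it used purely so the choice is observable. Both boto3 S3 client methods exist with these argument names.

```python
from __future__ import annotations

import io
from typing import Any

INLINE_THRESHOLD = 8 * 1024 * 1024  # assumed cut-over size (8 MiB)


def put_bytes(s3: Any, bucket: str, key: str, data: bytes,
              threshold: int = INLINE_THRESHOLD) -> str:
    """Upload with one PUT when small, managed multipart when large."""
    if len(data) <= threshold:
        # A single request; the whole payload travels in the body.
        s3.put_object(Bucket=bucket, Key=key, Body=data)
        return "put_object"
    # upload_fileobj reads from a file-like object and switches to
    # multipart upload automatically for large payloads.
    s3.upload_fileobj(io.BytesIO(data), bucket, key)
    return "upload_fileobj"
```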

fabric_platform

Microsoft Fabric platform implementation.

Uses notebookutils for all operations. The module is lazily imported so the class can be defined outside a Fabric notebook without error (e.g. for registry registration). All notebookutils services are accessible via the notebookutils property (fs, credentials, mssparkutils, etc.).

FabricPlatform

FabricPlatform(cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by Microsoft Fabric notebookutils.

Also implements _fetch_secret via notebookutils.credentials, so it can serve as its own secret provider inside Fabric notebooks.

The full notebookutils module is available via the notebookutils property, giving access to all services: fs, credentials, mssparkutils, and more.

Requires execution in a Fabric notebook environment where notebookutils is pre-installed.

Parameters:

  • cache_ttl (int, default 300): Secret cache time-to-live in seconds. Pass 0 to disable caching.

fs property

fs: Any

Return notebookutils.fs, resolving lazily on first use.

notebookutils property

notebookutils: Any

Return the notebookutils module, importing lazily on first use.

Provides access to all notebookutils services: fs, credentials, mssparkutils, and more.

download_file

download_file(src: str, dest: str) -> None

Download a file from Fabric storage to the local filesystem.

Uses notebookutils.fs.cp to copy src to the local path dest.

Parameters:

  • src (str, required): Source path on Fabric storage (e.g. "abfss://container@account.dfs.core.windows.net/path").
  • dest (str, required): Absolute local filesystem path to write to.

Raises:

  • PlatformError: On download failure.

read_bytes

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Downloads path to a temp file via notebookutils.fs.cp (download_file), then reads it back. This is the only reliable unbounded read path on Fabric, because fs.head truncates large files.
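The temp-file read pattern is generic enough to sketch with the copy step injected. Here copy stands in for a wrapper around notebookutils.fs.cp; whether that call needs a file: prefix on the local destination is left to the wrapper and is not asserted here. read_bytes_via_copy and local_copy are hypothetical names, and local_copy exists only so the helper can be tried outside Fabric.

```python
import os
import tempfile
from typing import Callable


def read_bytes_via_copy(path: str, copy: Callable[[str, str], object]) -> bytes:
    """Copy a remote file to a local temp file, then read all of it."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, "payload")
        copy(path, local_path)  # e.g. a wrapper around notebookutils.fs.cp
        with open(local_path, "rb") as fh:
            # An unbounded read: unlike fs.head, nothing is truncated.
            return fh.read()


def local_copy(src: str, dest: str) -> None:
    """Local stand-in for the remote copy call, for trying the helper out."""
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        fout.write(fin.read())
```

The temp directory is removed when the with block exits, so no downloaded copy is left behind even if reading fails.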

read_file

read_file(path: str) -> str

Read a remote file by downloading it to a temp file and reading locally.

Using download_file (notebookutils.fs.cp) guarantees the full file content is read regardless of size, unlike fs.head, which may truncate large files even with a high maxBytes limit.

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Writes data to a local temp file then pushes it via notebookutils.fs.cp (upload_file).

databricks_platform

Databricks platform implementation.

Primary backend: Unity Catalog Volumes (paths starting with /Volumes/). These are accessed via standard Python I/O (os, open, shutil), which is the Databricks-recommended approach for UC Volume paths — lower latency, standard semantics, no driver-side serialisation overhead.

Fallback backend: dbutils.fs for all other path schemes (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts, etc.).

Secrets always use dbutils.secrets (no alternative exists on Databricks).

DatabricksPlatform

DatabricksPlatform(dbutils: Any | None = None, cache_ttl: int = 300, **kwargs: Any)

Bases: BasePlatform

Platform backed by Databricks.

Routes file operations based on path type:

  • Unity Catalog Volumes (/Volumes/...) — uses os, open, shutil. This is the Databricks-recommended approach for UC Volumes.
  • Other paths (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts) — falls back to dbutils.fs.
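The routing rule can be sketched as a few pure functions. The /Volumes/ prefix test follows the description above; treating dbfs:/Volumes/... as a non-Volume path, and the shape of the download_plan tuple, are assumptions made for illustration. The file: prefix marks the local destination for dbutils.fs.cp, as described for download_file.

```python
from __future__ import annotations


def is_volume_path(path: str) -> bool:
    """True for Unity Catalog Volume paths, handled with plain Python I/O."""
    return path == "/Volumes" or path.startswith("/Volumes/")


def local_uri(path: str) -> str:
    """Prefix a local OS path with file: so dbutils.fs.cp treats it as local."""
    return path if path.startswith("file:") else f"file:{path}"


def download_plan(src: str, dest: str) -> tuple[str, str, str]:
    """Return (tool, src, dest) describing how a download would be routed."""
    if is_volume_path(src):
        return ("shutil.copy2", src, dest)
    return ("dbutils.fs.cp", src, local_uri(dest))


if __name__ == "__main__":
    print(download_plan("/Volumes/main/raw/landing/a.csv", "/tmp/a.csv"))
    print(download_plan("dbfs:/mnt/raw/a.csv", "/tmp/a.csv"))
```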

Secrets always use dbutils.secrets.

The full dbutils handle is available via the dbutils property, giving access to all services: secrets, widgets, notebook, fs, and more.

Parameters:

  • dbutils (Any | None, default None): Pre-existing dbutils handle. If None, the platform resolves it lazily from the Spark session or IPython.
  • cache_ttl (int, default 300): Secret cache time-to-live in seconds. Pass 0 to disable caching.

dbutils property

dbutils: Any

Return the full dbutils handle, resolving lazily on first use.

fs property

fs: Any

Return dbutils.fs, resolving lazily on first use.

download_file

download_file(src: str, dest: str) -> None

Download a file to the local filesystem.

Volume path → shutil.copy2 (direct OS copy). Other paths → dbutils.fs.cp with file: prefix on dest.

Parameters:

  • src (str, required): Source path (Unity Catalog Volume or DBFS/ADLS URI).
  • dest (str, required): Absolute local OS filesystem path to write to.

Raises:

  • PlatformError: On download failure.

read_bytes

read_bytes(path: str) -> bytes

Return the entire file contents as bytes.

Volume path → open(path, "rb").read() (direct read, no temp file). Other paths → dbutils.fs.cp to a temp file, then read back.

read_file

read_file(path: str) -> str

Read a text file.

Volume path → open() directly. Other paths → dbutils.fs.cp to a temp file, then read locally. Using fs.cp (rather than fs.head) guarantees the full content is returned regardless of file size.

write_bytes

write_bytes(path: str, data: bytes, *, overwrite: bool = False) -> None

Write raw bytes to path.

Volume path → makedirs + open(path, "wb").write(data) (direct write, no temp file). Other paths → write to a temp file, then push via dbutils.fs.cp. fs.put is deliberately avoided: it only accepts strings and would corrupt arbitrary binary data.