Platforms
base
Abstract base class for platform (file system and secrets) operations.
Every concrete platform (Local, Fabric, Databricks, AWS) inherits from
BasePlatform and implements all of its abstract methods.
BasePlatform
Bases: BaseSecretProvider
Abstract interface for file/directory operations and secret retrieval.
Platforms encapsulate the storage layer differences (local FS, ADLS,
S3, DBFS) behind a uniform API so that the rest of the framework
never touches os, shutil, or cloud SDKs directly.
Each platform also implements _fetch_secret so it can serve as
its own datacoolie.core.secret_provider.BaseSecretProvider,
using its existing SDK handle (env vars / notebookutils / dbutils / boto3).
16 abstract methods across five categories:
- File I/O: read / write / append / delete text content
- Directory Ops: create / delete / list files / list folders
- Existence Checks: file_exists / folder_exists
- File Management: upload / download / copy / move / get_file_info
- Secrets: _fetch_secret (inherited requirement from BaseSecretProvider)
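To illustrate the uniform-API idea described above, here is a minimal sketch. It does not reproduce the real BasePlatform: MiniPlatform, MiniLocalPlatform, save_report, and the PlatformError stand-in are hypothetical names, and the method signatures are assumptions based on the parameter tables in this page.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class PlatformError(Exception):
    """Stand-in for the framework's PlatformError (hypothetical definition)."""


class MiniPlatform(ABC):
    """Hypothetical three-method subset of the platform contract."""

    @abstractmethod
    def write_file(self, path: str, content: str, overwrite: bool = False) -> None: ...

    @abstractmethod
    def read_file(self, path: str) -> str: ...

    @abstractmethod
    def file_exists(self, path: str) -> bool: ...


class MiniLocalPlatform(MiniPlatform):
    """Local-filesystem implementation of the subset above."""

    def write_file(self, path, content, overwrite=False):
        target = Path(path)
        if target.exists() and not overwrite:
            raise PlatformError(f"{path} exists and overwrite is False")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")

    def read_file(self, path):
        try:
            return Path(path).read_text(encoding="utf-8")
        except OSError as exc:
            # Translate storage-specific errors into the shared error type.
            raise PlatformError(str(exc)) from exc

    def file_exists(self, path):
        return Path(path).is_file()


def save_report(platform: MiniPlatform, path: str, text: str) -> str:
    """Caller code depends only on the abstract interface, not on os or shutil."""
    platform.write_file(path, text, overwrite=True)
    return platform.read_file(path)
```

Because save_report only sees the abstract type, swapping in a cloud-backed implementation would not require changing it.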
append_file
abstractmethod
Append text content to an existing file.
If the file does not exist, create it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI for the target file. | required |
| content | str | Text to append. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On I/O failure. |
copy_file
abstractmethod
Copy a file from src to dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source file path. | required |
| dest | str | Destination file path. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If source does not exist or dest exists without overwrite. |
create_folder
abstractmethod
Create a directory (including any missing parents).
No-op if the directory already exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On failure. |
delete_file
abstractmethod
Delete a file.
No-op if the file does not exist (idempotent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI to delete. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On I/O failure. |
delete_folder
abstractmethod
Delete a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI. | required |
| recursive | bool | If | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If recursive is |
download_file
abstractmethod
Download a file from this platform to the local filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path on this platform. | required |
| dest | str | Absolute path on the local OS filesystem to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the source does not exist or the download fails. |
file_exists
abstractmethod
Return True if the path refers to an existing file.
folder_exists
abstractmethod
Return True if the path refers to an existing directory.
get_file_info
abstractmethod
get_file_info(path: str) -> FileInfo
Return metadata about a single file or directory.
Returns:
| Type | Description |
|---|---|
| FileInfo | A FileInfo with metadata for the entry at path. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the path does not exist. |
list_files
abstractmethod
list_files(path: str, *, recursive: bool = False, extension: str | None = None) -> list[FileInfo]
List files under path.
Each entry is a FileInfo instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory path or URI. | required |
| recursive | bool | Descend into sub-directories. | False |
| extension | str \| None | Filter by suffix (e.g. | None |
Returns:
| Type | Description |
|---|---|
| list[FileInfo] | List of FileInfo entries. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the path does not exist or is not a directory. |
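The recursive and extension parameters can be pictured with a small local sketch. This is only an illustration built on pathlib: list_local_files is a hypothetical helper, it returns plain path strings rather than FileInfo objects, and it raises NotADirectoryError where a real platform would raise PlatformError.

```python
from pathlib import Path
from typing import List, Optional


def list_local_files(path: str, *, recursive: bool = False,
                     extension: Optional[str] = None) -> List[str]:
    """Hypothetical local analogue of list_files, returning sorted path strings."""
    root = Path(path)
    if not root.is_dir():
        raise NotADirectoryError(f"{path} does not exist or is not a directory")
    # rglob descends into sub-directories; iterdir stays at the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        str(p)
        for p in candidates
        if p.is_file() and (extension is None or p.name.endswith(extension))
    )
```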
list_folders
abstractmethod
List immediate (or recursive) sub-directories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory path or URI. | required |
| recursive | bool | Descend into sub-directories. | False |
Returns:
| Type | Description |
|---|---|
| list[str] | List of directory paths. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If path is not a directory. |
move_file
abstractmethod
Move (rename) a file from src to dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source file path. | required |
| dest | str | Destination file path. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If source does not exist or dest exists without overwrite. |
read_bytes
abstractmethod
Return the entire file contents as bytes.
Implementations MUST NOT use bounded head()-style APIs that
truncate large files. Prefer the platform's fastest native unbounded
primitive (e.g. s3.get_object for AWS, direct open(path, "rb")
for Unity Catalog Volumes, fs.cp to a temp file for other
cloud paths).
Suitable for files that fit comfortably in memory.
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file does not exist or cannot be read. |
read_file
abstractmethod
Read the entire content of a text file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI to the file. | required |
Returns:
| Type | Description |
|---|---|
| str | File content as a string. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file does not exist or cannot be read. |
upload_file
abstractmethod
Upload a file from the local filesystem to the platform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_path | str | Absolute path on the local OS filesystem. | required |
| dest | str | Destination path on this platform. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If local_path does not exist or dest exists without overwrite. |
write_bytes
abstractmethod
Write raw bytes to path.
Symmetric counterpart of read_bytes. Prefer the platform's
fastest native upload primitive. For very large payloads, write to a
temp file and call upload_file directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Destination path or URI. | required |
| data | bytes | Bytes to write. | required |
| overwrite | bool | Allow overwriting an existing file at path. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file exists and overwrite is False. |
write_file
abstractmethod
Write text content to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI for the target file. | required |
| content | str | Text to write. | required |
| overwrite | bool | If | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file exists and overwrite is False. |
FileInfo
dataclass
FileInfo(name: str, path: str, modification_time: Optional[datetime], size: int = 0, is_dir: bool = False)
Metadata for a single file-system entry.
Returned by BasePlatform.list_files and
BasePlatform.get_file_info. frozen=True makes instances
immutable and hashable; slots=True reduces per-instance memory.
local_platform
Local filesystem platform implementation.
Uses pathlib, os, and shutil for all operations.
Suitable for local development, testing, and single-node environments.
LocalPlatform
Bases: BasePlatform
Platform backed by the local filesystem.
Also implements _fetch_secret via os.environ, so it can
serve as its own secret provider in local / test environments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_path | str \| None | Optional root directory. When set, all relative paths are resolved against this root. | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
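As a rough picture of how an os.environ-backed secret lookup with a time-to-live cache might behave, consider the sketch below. EnvSecretCache and its get_secret method are hypothetical; the real caching lives in BaseSecretProvider and may differ. The injectable clock is only there to make the expiry logic easy to exercise.

```python
import os
import time


class EnvSecretCache:
    """Hypothetical helper: os.environ lookups cached for cache_ttl seconds."""

    def __init__(self, cache_ttl: int = 300, clock=time.monotonic):
        self._ttl = cache_ttl
        self._clock = clock
        self._cache = {}  # key -> (time stored, value)

    def get_secret(self, key: str) -> str:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the lookup
        value = os.environ[key]  # raises KeyError if the variable is unset
        self._cache[key] = (now, value)
        return value
```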
aws_platform
AWS S3 platform implementation.
Uses boto3 for all file-system operations. The SDK is lazily
imported so the class can be defined without boto3 installed
(e.g. for registry registration).
AWSPlatform
AWSPlatform(bucket: str = '', region: str | None = None, profile: str | None = None, endpoint_url: str | None = None, cache_ttl: int = 300, **kwargs: Any)
Bases: BasePlatform
Platform backed by AWS S3 via boto3.
Also implements _fetch_secret via AWS Secrets Manager, so it
can serve as its own secret provider.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket | str | Default S3 bucket name. | '' |
| region | str \| None | AWS region for both S3 and Secrets Manager operations (e.g. | None |
| profile | str \| None | Named AWS profile from | None |
| endpoint_url | str \| None | Custom endpoint (for MinIO, LocalStack, etc.). | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
boto3_client
Create a boto3 service client using the platform's credentials.
Useful for accessing any AWS service (SQS, SNS, Glue, etc.) with
the same profile/region/endpoint configuration as the platform.
The underlying _session is created once and reused.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| service | str | AWS service name (e.g. | required |
| **kwargs | Any | Extra keyword arguments forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| Any | A new boto3 service client. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If boto3 is unavailable. |
delete_glue_table
Delete a Glue catalog table entry, ignoring if it does not exist.
Athena DDL does not support DROP TABLE for native Delta tables, so the Glue API is used directly for all table types.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Glue database name. | required |
| table_name | str | Table name to delete. | required |
download_file
Download a file from S3 to the local filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | S3 URI ( | required |
| dest | str | Absolute local filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
execute_athena_ddl
Execute a DDL statement via Athena and wait for completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sql | str | The SQL/DDL statement to execute. | required |
| database | str \| None | Optional Athena database context. | None |
| output_location | str | S3 location for query results (e.g. | '' |
Returns:
| Type | Description |
|---|---|
| str | The Athena query execution ID. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the query fails, is cancelled, or times out. |
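A submit-then-poll loop of the kind described above might look like the following sketch. It uses the boto3 Athena client calls start_query_execution and get_query_execution, but the function name run_athena_ddl, the AthenaError stand-in, and the timeout and poll_interval defaults are assumptions, not the library's actual implementation. The client is passed in so the logic can be tried with a fake.

```python
import time


class AthenaError(RuntimeError):
    """Stand-in for PlatformError (hypothetical)."""


def run_athena_ddl(client, sql, database=None, output_location="",
                   timeout=60.0, poll_interval=1.0, sleep=time.sleep):
    """Submit a statement and poll until it reaches a terminal state."""
    params = {"QueryString": sql}
    if database:
        params["QueryExecutionContext"] = {"Database": database}
    if output_location:
        params["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = client.start_query_execution(**params)["QueryExecutionId"]

    waited = 0.0
    while waited <= timeout:
        response = client.get_query_execution(QueryExecutionId=query_id)
        status = response["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise AthenaError(f"query {query_id} {state}: {reason}")
        sleep(poll_interval)  # QUEUED or RUNNING: wait and poll again
        waited += poll_interval
    raise AthenaError(f"query {query_id} timed out after {timeout}s")
```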
glue_table_exists
Return True if a Glue catalog table entry exists.
Uses the Glue get_table API. Never raises; returns False on any
error including a missing database, missing table, or network failure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| database | str | Glue database name. | required |
| table_name | str | Table name to check. | required |
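The "never raises" contract can be sketched as a thin wrapper around the Glue get_table call. This is an illustration only: the function below takes the Glue client as an argument (an assumption made so it can be run against a fake), whereas the real method presumably obtains the client from the platform.

```python
def glue_table_exists(glue_client, database: str, table_name: str) -> bool:
    """Return True if get_table succeeds, False on any error."""
    try:
        glue_client.get_table(DatabaseName=database, Name=table_name)
        return True
    except Exception:
        # Deliberately broad: a missing database, a missing table, and a
        # network failure all map to False under the documented contract.
        return False
```

One consequence of this design is that a False result does not distinguish "table absent" from "lookup failed", so callers that need that distinction would have to call the Glue API directly.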
read_bytes
In-memory read: a single GET for small files, multipart for large ones.
Avoids the temp-file round-trip of the base implementation.
register_delta_table
register_delta_table(table_name: str, path: str, *, database: str, output_location: str = '', recreate: bool = False) -> None
Register a native Delta table in the Glue Catalog via Athena.
Creates an external table with TBLPROPERTIES ('table_type'='DELTA')
so Athena v3 can query it natively.
When recreate is True the existing Glue catalog entry is deleted
first (via the Glue API, not Athena DDL which does not support DROP
TABLE for Delta) so that the registration reflects schema evolution.
When recreate is False (the default) the CREATE is issued
directly, which is idempotent and avoids a redundant delete on every write.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Table name (without database prefix). | required |
| path | str | S3 path to the Delta table root. | required |
| database | str | Glue/Athena database name (required). | required |
| output_location | str | S3 location for Athena query results. | '' |
| recreate | bool | When | False |
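The recreate flow described above can be outlined as follows. This is a sketch under assumptions: build_delta_ddl, and the run_ddl and delete_table callables, are hypothetical stand-ins for execute_athena_ddl and delete_glue_table, and the IF NOT EXISTS clause is a guess at how idempotency is achieved rather than something this page confirms.

```python
def build_delta_ddl(table_name: str, path: str, database: str) -> str:
    """Build the CREATE statement (IF NOT EXISTS is an assumption)."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table_name} "
        f"LOCATION '{path}' "
        "TBLPROPERTIES ('table_type'='DELTA')"
    )


def register_delta_table(table_name, path, *, database, run_ddl,
                         delete_table, recreate=False):
    """Optionally drop the catalog entry, then issue the CREATE."""
    if recreate:
        # Removed through the Glue API, since Athena DDL cannot drop
        # native Delta tables.
        delete_table(database, table_name)
    run_ddl(build_delta_ddl(table_name, path, database))
```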
register_symlink_table
register_symlink_table(table_name: str, path: str, *, database: str, output_location: str = '', schema_ddl: str = '', partition_ddl: str = '', recreate: bool = False, run_msck: bool = True) -> None
Register a symlink-based table in the Glue Catalog.
Creates a table using SymlinkTextInputFormat pointing at the
_symlink_format_manifest/ directory of a Delta table, for use
with Redshift Spectrum.
When recreate is True the Glue catalog entry is deleted first so
that schema or partition changes are picked up. When False (the
default) only a CREATE is issued, which is idempotent on first write.
When run_msck is True (the default) and partition_ddl is
provided, MSCK REPAIR TABLE is run after the CREATE so that newly
written partition paths are visible to Athena and Redshift.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Table name. | required |
| path | str | S3 path to the Delta table root ( | required |
| database | str | Glue/Athena database name (required). | required |
| output_location | str | S3 location for Athena query results. | '' |
| schema_ddl | str | Column definitions, e.g. | '' |
| partition_ddl | str | | '' |
| recreate | bool | When | False |
| run_msck | bool | When | True |
repair_table_partitions
Run MSCK REPAIR TABLE to sync partition metadata in the Glue Catalog.
Should be called whenever new partition paths have been written to S3 so the Glue/Athena catalog discovers them without a full table recreate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Table name to repair. | required |
| database | str | Glue/Athena database name. | required |
| output_location | str | S3 location for Athena query results. | '' |
write_bytes
Native single-call upload: no temp file, no extra disk write.
Uses put_object for payloads at or below the inline threshold and
upload_fileobj (managed multipart) for larger payloads.
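The size-based choice between put_object and upload_fileobj could be sketched like this. The boto3 S3 client methods are real, but s3_write_bytes, the injected client argument, and the 8 MiB INLINE_THRESHOLD are assumptions; the actual threshold is not stated in this page.

```python
import io

# Hypothetical cut-off; the real inline threshold is not documented here.
INLINE_THRESHOLD = 8 * 1024 * 1024


def s3_write_bytes(s3_client, bucket: str, key: str, data: bytes,
                   threshold: int = INLINE_THRESHOLD) -> str:
    """Upload data from memory and report which primitive was used."""
    if len(data) <= threshold:
        # Small payload: one PUT request with the bytes as the body.
        s3_client.put_object(Bucket=bucket, Key=key, Body=data)
        return "put_object"
    # Large payload: managed transfer that can split into multipart uploads.
    s3_client.upload_fileobj(io.BytesIO(data), bucket, key)
    return "upload_fileobj"
```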
fabric_platform
Microsoft Fabric platform implementation.
Uses notebookutils for all operations. The module is lazily imported
so the class can be defined outside a Fabric notebook without error
(e.g. for registry registration). All notebookutils services are
accessible via the notebookutils property (fs, credentials,
mssparkutils, etc.).
FabricPlatform
Bases: BasePlatform
Platform backed by Microsoft Fabric notebookutils.
Also implements _fetch_secret via notebookutils.credentials,
so it can serve as its own secret provider inside Fabric notebooks.
The full notebookutils module is available via the
notebookutils property, giving access to all services:
fs, credentials, mssparkutils, and more.
Requires execution in a Fabric notebook environment where
notebookutils is pre-installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
notebookutils
property
Return the notebookutils module, importing lazily on first use.
Provides access to all notebookutils services: fs,
credentials, mssparkutils, and more.
download_file
Download a file from Fabric storage to the local filesystem.
Uses notebookutils.fs.cp to copy src to the local path dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path on Fabric storage (e.g. | required |
| dest | str | Absolute local filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
read_bytes
Return the entire file contents as bytes.
Downloads path to a temp file via notebookutils.fs.cp
(download_file) then reads it back. This is the only reliable
unbounded path on Fabric (fs.head truncates large files).
read_file
Read a remote file by downloading it to a temp file and reading locally.
Using download_file (notebookutils.fs.cp) guarantees the
full file content is read regardless of size, unlike fs.head which
may truncate large files even with a high maxBytes limit.
write_bytes
Write raw bytes to path.
Writes data to a local temp file then pushes it via
notebookutils.fs.cp (upload_file).
databricks_platform
Databricks platform implementation.
Primary backend: Unity Catalog Volumes (paths starting with /Volumes/).
These are accessed via standard Python I/O (os, open, shutil),
which is the Databricks-recommended approach for UC Volume paths: lower
latency, standard semantics, no driver-side serialisation overhead.
Fallback backend: dbutils.fs for all other path schemes (dbfs:/,
abfss://, wasbs://, s3://, legacy DBFS mounts, etc.).
Secrets always use dbutils.secrets (no alternative exists on Databricks).
DatabricksPlatform
Bases: BasePlatform
Platform backed by Databricks.
Routes file operations based on path type:
- Unity Catalog Volumes (/Volumes/...): uses os, open, shutil. This is the Databricks-recommended approach for UC Volumes.
- Other paths (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts): falls back to dbutils.fs.
Secrets always use dbutils.secrets.
The full dbutils handle is available via the dbutils property,
giving access to all services: secrets, widgets, notebook,
fs, and more.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dbutils | Any \| None | Pre-existing | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
download_file
Download a file to the local filesystem.
Volume path → shutil.copy2 (direct OS copy).
Other paths → dbutils.fs.cp with file: prefix on dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path (Unity Catalog Volume or DBFS/ADLS URI). | required |
| dest | str | Absolute local OS filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
read_bytes
Return the entire file contents as bytes.
Volume path → open(path, "rb").read() (direct read, no temp file).
Other paths → dbutils.fs.cp to a temp file then read back.
read_file
Read a text file.
Volume path → open() directly.
Other paths → dbutils.fs.cp to a temp file, then read locally.
Using fs.cp (rather than fs.head) guarantees the full content
is returned regardless of file size.
write_bytes
Write raw bytes to path.
Volume path → makedirs + open(path, "wb").write(data) (direct write, no temp file).
Other paths → write to a temp file then push via dbutils.fs.cp.
Using fs.put is deliberately avoided: it only accepts strings
and would corrupt arbitrary binary data.
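The path-based routing used by the Databricks read methods can be summarised in a short sketch. The names is_volume_path and read_bytes_routed are hypothetical, and fs_cp stands in for dbutils.fs.cp so the fallback branch can be exercised outside a Databricks runtime; only the /Volumes/ prefix check and the file: prefix on the destination come from the text above.

```python
import os
import tempfile


def is_volume_path(path: str) -> bool:
    """Unity Catalog Volume paths start with /Volumes/."""
    return path.startswith("/Volumes/")


def read_bytes_routed(path: str, fs_cp, is_volume=is_volume_path) -> bytes:
    """Read a whole file, choosing the backend from the path type."""
    if is_volume(path):
        # Volume paths behave like ordinary local files.
        with open(path, "rb") as fh:
            return fh.read()
    # Other schemes: copy to a local temp file first, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        local = os.path.join(tmp, "payload.bin")
        fs_cp(path, "file:" + local)  # fs_cp stands in for dbutils.fs.cp
        with open(local, "rb") as fh:
            return fh.read()
```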