Platforms
base
Abstract base class for platform (file system and secrets) operations.
Every concrete platform — Local, Fabric, Databricks, AWS — inherits from
:class:BasePlatform and implements all abstract methods.
BasePlatform
Bases: BaseSecretProvider
Abstract interface for file/directory operations and secret retrieval.
Platforms encapsulate the storage layer differences (local FS, ADLS,
S3, DBFS) behind a uniform API so that the rest of the framework
never touches os, shutil, or cloud SDKs directly.
Each platform also implements :meth:_fetch_secret so it can serve as
its own :class:~datacoolie.core.secret_provider.BaseSecretProvider,
using its existing SDK handle (env vars / notebookutils / dbutils / boto3).
16 abstract methods across five categories:
- File I/O: read / write / append / delete text content
- Directory Ops: create / delete / list files / list folders
- Existence Checks: file_exists / folder_exists
- File Management: upload / download / copy / move / get_file_info
- Secrets: _fetch_secret (inherited requirement from BaseSecretProvider)
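To illustrate how a uniform interface keeps calling code away from os, shutil, and cloud SDKs, here is a minimal sketch. It assumes a hypothetical three-method subset of the interface (MiniPlatform), a dict-backed InMemoryPlatform, and a local stand-in for PlatformError. None of these names come from the framework itself.

```python
from abc import ABC, abstractmethod


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError (hypothetical here)."""


class MiniPlatform(ABC):
    """Hypothetical three-method subset of the BasePlatform interface."""

    @abstractmethod
    def read_file(self, path: str) -> str: ...

    @abstractmethod
    def write_file(self, path: str, content: str, overwrite: bool = False) -> None: ...

    @abstractmethod
    def file_exists(self, path: str) -> bool: ...


class InMemoryPlatform(MiniPlatform):
    """Dict-backed implementation, convenient for unit tests."""

    def __init__(self) -> None:
        self._files = {}

    def read_file(self, path: str) -> str:
        try:
            return self._files[path]
        except KeyError:
            raise PlatformError(f"File not found: {path}") from None

    def write_file(self, path: str, content: str, overwrite: bool = False) -> None:
        # Mirror the documented contract: refuse to clobber unless asked to.
        if path in self._files and not overwrite:
            raise PlatformError(f"File exists: {path}")
        self._files[path] = content

    def file_exists(self, path: str) -> bool:
        return path in self._files


def save_report(platform: MiniPlatform, path: str, text: str) -> None:
    """Calling code depends only on the interface, not on a storage SDK."""
    platform.write_file(path, text, overwrite=True)
```

Swapping InMemoryPlatform for a cloud-backed implementation would leave save_report unchanged, which is the point of the abstraction.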
append_file
abstractmethod
Append text content to an existing file.
If the file does not exist, create it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI for the target file. | required |
| content | str | Text to append. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On I/O failure. |
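A minimal local-filesystem sketch of the append-or-create behaviour, assuming pathlib as the backend. The helper name append_text is hypothetical, and creating missing parent directories is an assumption rather than documented behaviour.

```python
from pathlib import Path


def append_text(path: str, content: str) -> None:
    """Append text to a file, creating the file if it does not exist yet."""
    target = Path(path)
    # Assumption: missing parent directories are created as well.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file when it is missing and appends otherwise.
    with target.open("a", encoding="utf-8") as handle:
        handle.write(content)
```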
copy_file
abstractmethod
Copy a file from src to dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source file path. | required |
| dest | str | Destination file path. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If source does not exist or dest exists without overwrite. |
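The overwrite guard documented here could look roughly like the following on a local filesystem. This is a sketch assuming shutil as the backend; PlatformError is a local stand-in, and the function is not the framework's actual implementation.

```python
import shutil
from pathlib import Path


class PlatformError(Exception):
    """Local stand-in for the framework's PlatformError (hypothetical here)."""


def copy_file(src: str, dest: str, overwrite: bool = False) -> None:
    source, target = Path(src), Path(dest)
    if not source.is_file():
        raise PlatformError(f"Source does not exist: {src}")
    # Refuse to replace an existing destination unless explicitly allowed.
    if target.exists() and not overwrite:
        raise PlatformError(f"Destination exists: {dest}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
```

The same two checks (source present, destination absent unless overwrite is set) match the error conditions listed for move_file and upload_file.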
create_folder
abstractmethod
Create a directory (including any missing parents).
No-op if the directory already exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On failure. |
delete_file
abstractmethod
Delete a file.
No-op if the file does not exist (idempotent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI to delete. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On I/O failure. |
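As a sketch of what idempotent deletion means in practice, assuming a local filesystem and pathlib (the real platforms may implement it differently), calling the function twice on the same path succeeds both times:

```python
from pathlib import Path


def delete_file(path: str) -> None:
    """Delete a file; do nothing if it is already gone."""
    # missing_ok=True (Python 3.8+) turns "file not found" into a no-op.
    Path(path).unlink(missing_ok=True)
```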
delete_folder
abstractmethod
Delete a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI. | required |
| recursive | bool | If | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If recursive is |
download_file
abstractmethod
Download a file from this platform to the local filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path on this platform. | required |
| dest | str | Absolute path on the local OS filesystem to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the source does not exist or the download fails. |
file_exists
abstractmethod
Return True if the path refers to an existing file.
folder_exists
abstractmethod
Return True if the path refers to an existing directory.
get_file_info
abstractmethod
get_file_info(path: str) -> FileInfo
Return metadata about a single file or directory.
Returns:
| Type | Description |
|---|---|
| FileInfo | A :class:FileInfo instance. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the path does not exist. |
list_files
abstractmethod
list_files(path: str, *, recursive: bool = False, extension: str | None = None) -> list[FileInfo]
List files under path.
Each entry is a :class:FileInfo instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory path or URI. | required |
| recursive | bool | Descend into sub-directories. | False |
| extension | str \| None | Filter by suffix (e.g. | None |
Returns:
| Type | Description |
|---|---|
| list[FileInfo] | List of :class:FileInfo instances. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the path does not exist or is not a directory. |
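The listing semantics (optional recursion plus a suffix filter) can be sketched on a local filesystem as follows. The FileInfo class below is a local stand-in that mirrors the documented fields. Raising NotADirectoryError instead of PlatformError, sorting the results, and using UTC timestamps are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    """Local stand-in mirroring the FileInfo fields documented on this page."""
    name: str
    path: str
    modification_time: Optional[datetime]
    size: int = 0
    is_dir: bool = False


def list_files(path: str, *, recursive: bool = False,
               extension: str | None = None) -> list[FileInfo]:
    root = Path(path)
    if not root.is_dir():
        # The documented API raises PlatformError here.
        raise NotADirectoryError(path)
    pattern = "**/*" if recursive else "*"
    results = []
    for entry in sorted(root.glob(pattern)):
        if not entry.is_file():
            continue  # directories are reported by list_folders instead
        if extension is not None and not entry.name.endswith(extension):
            continue
        stat = entry.stat()
        results.append(FileInfo(
            name=entry.name,
            path=str(entry),
            modification_time=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            size=stat.st_size,
        ))
    return results
```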
list_folders
abstractmethod
List immediate (or recursive) sub-directories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Directory path or URI. | required |
| recursive | bool | Descend into sub-directories. | False |
Returns:
| Type | Description |
|---|---|
| list[str] | List of directory paths. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If path is not a directory. |
move_file
abstractmethod
Move (rename) a file from src to dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source file path. | required |
| dest | str | Destination file path. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If source does not exist or dest exists without overwrite. |
read_bytes
abstractmethod
Return the entire file contents as bytes.
Implementations MUST NOT use bounded head()-style APIs that
truncate large files. Prefer the platform's fastest native unbounded
primitive (e.g. s3.get_object for AWS, direct open(path, "rb")
for Unity Catalog Volumes, fs.cp to a temp file for other
cloud paths).
Suitable for files that fit comfortably in memory.
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file does not exist or cannot be read. |
read_file
abstractmethod
Read the entire content of a text file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI to the file. | required |
Returns:
| Type | Description |
|---|---|
| str | File content as a string. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file does not exist or cannot be read. |
upload_file
abstractmethod
Upload a file from the local filesystem to the platform.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_path | str | Absolute path on the local OS filesystem. | required |
| dest | str | Destination path on this platform. | required |
| overwrite | bool | Allow overwriting an existing file at dest. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If local_path does not exist or dest exists without overwrite. |
write_bytes
abstractmethod
Write raw bytes to path.
Symmetric counterpart of :meth:read_bytes. Prefer the platform's
fastest native upload primitive. For very large payloads, write to a
temp file and call :meth:upload_file directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Destination path or URI. | required |
| data | bytes | Bytes to write. | required |
| overwrite | bool | Allow overwriting an existing file at path. | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file exists and overwrite is |
write_file
abstractmethod
Write text content to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Absolute path or URI for the target file. | required |
| content | str | Text to write. | required |
| overwrite | bool | If | False |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the file exists and overwrite is |
FileInfo
dataclass
FileInfo(name: str, path: str, modification_time: Optional[datetime], size: int = 0, is_dir: bool = False)
Metadata for a single file-system entry.
Returned by :meth:BasePlatform.list_files and
:meth:BasePlatform.get_file_info. frozen=True makes instances
immutable and hashable; slots=True reduces per-instance memory.
local_platform
Local filesystem platform implementation.
Uses pathlib, os, and shutil for all operations.
Suitable for local development, testing, and single-node environments.
LocalPlatform
Bases: BasePlatform
Platform backed by the local filesystem.
Also implements :meth:_fetch_secret via os.environ, so it can
serve as its own secret provider in local / test environments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_path | str \| None | Optional root directory. When set, all relative paths are resolved against this root. | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
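To make the environment-variable secret lookup and cache_ttl more concrete, here is a hedged sketch of a TTL cache in front of os.environ. The class name EnvSecretProvider, the public get_secret method, the single-argument _fetch_secret signature, and the treatment of cache_ttl=0 as "always refetch" are all assumptions for illustration. The actual BaseSecretProvider contract is not shown on this page.

```python
import os
import time


class EnvSecretProvider:
    """Hypothetical sketch: secrets read from os.environ behind a TTL cache."""

    def __init__(self, cache_ttl: int = 300) -> None:
        self._cache_ttl = cache_ttl
        self._cache = {}  # key -> (time fetched, value)

    def _fetch_secret(self, key: str) -> str:
        try:
            return os.environ[key]
        except KeyError:
            raise KeyError(f"Secret not found in environment: {key}") from None

    def get_secret(self, key: str) -> str:
        now = time.monotonic()
        cached = self._cache.get(key)
        # Serve from cache while the entry is younger than the TTL.
        if cached is not None and now - cached[0] < self._cache_ttl:
            return cached[1]
        value = self._fetch_secret(key)
        self._cache[key] = (now, value)
        return value
```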
aws_platform
AWS S3 platform implementation.
Uses boto3 for all file-system operations. The SDK is lazily
imported so the class can be defined without boto3 installed
(e.g. for registry registration).
AWSPlatform
AWSPlatform(bucket: str = '', region: str | None = None, profile: str | None = None, endpoint_url: str | None = None, cache_ttl: int = 300, **kwargs: Any)
Bases: BasePlatform
Platform backed by AWS S3 via boto3.
Also implements :meth:_fetch_secret via AWS Secrets Manager, so it
can serve as its own secret provider.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bucket | str | Default S3 bucket name. | '' |
| region | str \| None | AWS region for both S3 and Secrets Manager operations (e.g. | None |
| profile | str \| None | Named AWS profile from | None |
| endpoint_url | str \| None | Custom endpoint (for MinIO, LocalStack, etc.). | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
boto3_client
Create a boto3 service client using the platform's credentials.
Useful for accessing any AWS service (SQS, SNS, Glue, etc.) with
the same profile/region/endpoint configuration as the platform.
The underlying :pyattr:_session is created once and reused.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| service | str | AWS service name (e.g. | required |
| **kwargs | Any | Extra keyword arguments forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| Any | A new boto3 service client. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If boto3 is unavailable. |
download_file
Download a file from S3 to the local filesystem.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | S3 URI ( | required |
| dest | str | Absolute local filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
execute_athena_ddl
Execute a DDL statement via Athena and wait for completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sql | str | The SQL/DDL statement to execute. | required |
| database | str \| None | Optional Athena database context. | None |
| output_location | str | S3 location for query results (e.g. | '' |
Returns:
| Type | Description |
|---|---|
| str | The Athena query execution ID. |
Raises:
| Type | Description |
|---|---|
| PlatformError | If the query fails, is cancelled, or times out. |
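"Execute and wait for completion" typically means a submit-then-poll loop. The sketch below assumes a boto3 Athena client (or any object exposing the same start_query_execution and get_query_execution calls) passed in as client. The function name run_athena_ddl, the timeout and poll_interval parameters, and the use of RuntimeError and TimeoutError instead of PlatformError are assumptions for the example.

```python
import time


def run_athena_ddl(client, sql, database=None, output_location="",
                   timeout=60.0, poll_interval=1.0):
    """Submit a statement to Athena and poll until it reaches a final state."""
    request = {"QueryString": sql}
    if database:
        request["QueryExecutionContext"] = {"Database": database}
    if output_location:
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = client.start_query_execution(**request)["QueryExecutionId"]

    deadline = time.monotonic() + timeout
    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        status = response["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state} after {timeout}s")
        time.sleep(poll_interval)
```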
read_bytes
In-memory read — single GET for small files, multipart for large.
Avoids the temp-file round-trip of the base implementation.
register_delta_table
register_delta_table(table_name: str, path: str, *, database: str | None = None, output_location: str = '') -> None
Register a native Delta table in the Glue Catalog via Athena.
Creates an external table with TBLPROPERTIES ('table_type'='DELTA')
so Athena v3 can query it natively.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Table name (without database prefix). | required |
| path | str | S3 path to the Delta table root. | required |
| database | str \| None | Athena/Glue database. | None |
| output_location | str | S3 location for Athena query results. | '' |
register_symlink_table
register_symlink_table(table_name: str, path: str, *, database: str | None = None, output_location: str = '', schema_ddl: str = '', partition_ddl: str = '') -> None
Register a symlink-based table in the Glue Catalog.
Creates a table using SymlinkTextInputFormat pointing at the
_symlink_format_manifest/ directory of a Delta table, for use
with Redshift Spectrum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Table name. | required |
| path | str | S3 path to the Delta table root ( | required |
| database | str \| None | Athena/Glue database. | None |
| output_location | str | S3 location for Athena query results. | '' |
| schema_ddl | str | Column definitions, e.g. | '' |
| partition_ddl | str | | '' |
write_bytes
Native single-call upload — no temp file, no extra disk write.
Uses put_object for payloads at or below the inline threshold and
upload_fileobj (managed multipart) for larger payloads.
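The size-based choice between put_object and upload_fileobj might be sketched as below. The function takes a boto3 S3 client (or a compatible object) as client. The 8 MiB INLINE_THRESHOLD is a made-up value, since the real threshold is not stated here, and returning the name of the call used is only for demonstration.

```python
import io

# Hypothetical cut-off; the platform's actual inline threshold is not documented here.
INLINE_THRESHOLD = 8 * 1024 * 1024


def write_bytes(client, bucket: str, key: str, data: bytes,
                threshold: int = INLINE_THRESHOLD) -> str:
    """Upload bytes to S3 without touching local disk."""
    if len(data) <= threshold:
        # Small payload: one PUT request carries the whole body.
        client.put_object(Bucket=bucket, Key=key, Body=data)
        return "put_object"
    # Large payload: the managed transfer splits it into multipart chunks.
    client.upload_fileobj(io.BytesIO(data), bucket, key)
    return "upload_fileobj"
```

Wrapping the payload in io.BytesIO gives upload_fileobj the file-like object it expects while keeping everything in memory.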
fabric_platform
Microsoft Fabric platform implementation.
Uses notebookutils for all operations. The module is lazily imported
so the class can be defined outside a Fabric notebook without error
(e.g. for registry registration). All notebookutils services are
accessible via the :attr:notebookutils property (fs, credentials,
mssparkutils, etc.).
FabricPlatform
Bases: BasePlatform
Platform backed by Microsoft Fabric notebookutils.
Also implements :meth:_fetch_secret via notebookutils.credentials,
so it can serve as its own secret provider inside Fabric notebooks.
The full notebookutils module is available via the
:attr:notebookutils property, giving access to all services:
fs, credentials, mssparkutils, and more.
Requires execution in a Fabric notebook environment where
notebookutils is pre-installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
notebookutils
property
Return the notebookutils module, importing lazily on first use.
Provides access to all notebookutils services: fs,
credentials, mssparkutils, and more.
download_file
Download a file from Fabric storage to the local filesystem.
Uses notebookutils.fs.cp to copy src to the local path dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path on Fabric storage (e.g. | required |
| dest | str | Absolute local filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
read_bytes
Return the entire file contents as bytes.
Downloads path to a temp file via notebookutils.fs.cp
(download_file) then reads it back — the only reliable
unbounded path on Fabric (fs.head truncates large files).
read_file
Read a remote file by downloading it to a temp file and reading locally.
Using :meth:download_file (notebookutils.fs.cp) guarantees the
full file content is read regardless of size, unlike fs.head which
may truncate large files even with a high maxBytes limit.
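The download-then-read pattern can be sketched independently of Fabric by injecting the copy step. In the example below, copy_fn stands in for the platform's download_file (on Fabric, a wrapper around notebookutils.fs.cp). The helper name read_text_via_temp and the UTF-8 default are assumptions.

```python
import os
import tempfile
from typing import Callable


def read_text_via_temp(copy_fn: Callable[[str, str], object], src: str,
                       encoding: str = "utf-8") -> str:
    """Copy a remote file to a temp location, then read the whole file locally."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "download.tmp")
        copy_fn(src, local_path)  # full, unbounded copy of the source
        with open(local_path, "r", encoding=encoding) as handle:
            return handle.read()
    # The temp directory and its contents are removed on exit.
```

Because the read happens against a complete local copy, the result does not depend on any byte limit of a head-style API.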
write_bytes
Write raw bytes to path.
Writes data to a local temp file then pushes it via
notebookutils.fs.cp (upload_file).
databricks_platform
Databricks platform implementation.
Primary backend: Unity Catalog Volumes (paths starting with /Volumes/).
These are accessed via standard Python I/O (os, open, shutil),
which is the Databricks-recommended approach for UC Volume paths — lower
latency, standard semantics, no driver-side serialisation overhead.
Fallback backend: dbutils.fs for all other path schemes (dbfs:/,
abfss://, wasbs://, s3://, legacy DBFS mounts, etc.).
Secrets always use dbutils.secrets (no alternative exists on Databricks).
DatabricksPlatform
Bases: BasePlatform
Platform backed by Databricks.
Routes file operations based on path type:
- Unity Catalog Volumes (/Volumes/...): uses os, open, shutil. This is the Databricks-recommended approach for UC Volumes.
- Other paths (dbfs:/, abfss://, wasbs://, s3://, legacy DBFS mounts): falls back to dbutils.fs.
Secrets always use dbutils.secrets.
The full dbutils handle is available via the :attr:dbutils property,
giving access to all services: secrets, widgets, notebook,
fs, and more.
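The routing rule reduces to a prefix check on the path. The sketch below uses hypothetical helper names (is_volume_path, choose_backend) and returns a label rather than performing I/O, so it only illustrates the dispatch decision.

```python
def is_volume_path(path: str) -> bool:
    """True for Unity Catalog Volume paths, which start with /Volumes/."""
    return path.startswith("/Volumes/")


def choose_backend(path: str) -> str:
    """Pick the I/O backend for a path: standard Python I/O or dbutils.fs."""
    return "python-io" if is_volume_path(path) else "dbutils.fs"
```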
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dbutils | Any \| None | Pre-existing | None |
| cache_ttl | int | Secret cache time-to-live in seconds (default 300). Pass | 300 |
download_file
Download a file to the local filesystem.
Volume path → shutil.copy2 (direct OS copy).
Other paths → dbutils.fs.cp with file: prefix on dest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | Source path (Unity Catalog Volume or DBFS/ADLS URI). | required |
| dest | str | Absolute local OS filesystem path to write to. | required |
Raises:
| Type | Description |
|---|---|
| PlatformError | On download failure. |
read_bytes
Return the entire file contents as bytes.
Volume path → open(path, "rb").read() (direct read, no temp file).
Other paths → dbutils.fs.cp to a temp file then read back.
read_file
Read a text file.
Volume path → open() directly.
Other paths → dbutils.fs.cp to a temp file, then read locally.
Using fs.cp (rather than fs.head) guarantees the full content
is returned regardless of file size.
write_bytes
Write raw bytes to path.
Volume path → makedirs + open(path, "wb").write(data) (direct write, no temp file).
Other paths → write to a temp file then push via dbutils.fs.cp.
Using fs.put is deliberately avoided: it only accepts strings
and would corrupt arbitrary binary data.
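Why a string-only API is unsafe for binary data can be shown without Databricks at all. The snippet below is plain Python, and the payload is an arbitrary example: it forces bytes through a text round trip, which is what any string-based write would require.

```python
# Arbitrary binary payload; 0x89 and 0xFF are not valid UTF-8 start bytes.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])

# Strict decoding fails outright ...
try:
    payload.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... and lenient decoding silently substitutes U+FFFD, so the bytes
# written back out no longer match the original.
round_tripped = payload.decode("utf-8", errors="replace").encode("utf-8")
corrupted = round_tripped != payload
```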