Logging
TL;DR: There are two orthogonal loggers. `SystemLogger` captures the framework's Python logs as plain-text `.log` files for operators. `ETLLogger` writes structured execution logs in two forms: debug JSONL (one file per run, appended as the run progresses) and analyst outputs (JSONL for job summaries, Parquet for per-dataflow detail).
SystemLogger
- Captures framework Python logs through `LogManager`.
- Two independent levels:
    - `log_level`: controls what is printed to the console (default `INFO`).
    - `file_level`: controls what is captured to the file (default `DEBUG`, so all framework messages are captured regardless of the console level).
- Writes plain-text `.log` files: one line per record, in the format `timestamp - LEVEL - logger [dataflow_id] - message`.
- Filename: `system_log_<job_id>.log` under the configured `output_path`, optionally date-partitioned.
- Periodic flush via a background timer: at each interval, new records are appended to the remote file using `platform.append_file()`. Any remaining records are appended on `close()`.
- Intended for operators reading runtime logs and troubleshooting failures.
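The two-level split can be illustrated with Python's standard `logging` module alone. This is a sketch under stated assumptions, not the framework's `LogManager` wiring (which this page does not show): the logger name `etl_framework`, the `DataflowIdFilter` helper, and the in-memory streams standing in for the console and the `.log` file are all hypothetical.

```python
import io
import logging

# Record layout from the text: timestamp - LEVEL - logger [dataflow_id] - message
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(name)s [%(dataflow_id)s] - %(message)s"


class DataflowIdFilter(logging.Filter):
    """Give records without a dataflow_id a placeholder so formatting never raises."""

    def filter(self, record: logging.LogRecord) -> bool:
        if not hasattr(record, "dataflow_id"):
            record.dataflow_id = "-"
        return True


def build_logger(console_stream, file_stream,
                 log_level: int = logging.INFO,
                 file_level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger("etl_framework")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # pass everything through; handlers do the filtering
    logger.handlers.clear()
    logger.propagate = False

    formatter = logging.Formatter(LOG_FORMAT)
    # One handler per destination, each with its own independent level.
    for stream, level in ((console_stream, log_level), (file_stream, file_level)):
        handler = logging.StreamHandler(stream)
        handler.setLevel(level)
        handler.setFormatter(formatter)
        handler.addFilter(DataflowIdFilter())
        logger.addHandler(handler)
    return logger


# In-memory streams stand in for the console and the .log file.
console_buf, file_buf = io.StringIO(), io.StringIO()
log = build_logger(console_buf, file_buf)
log.debug("resolved transformers", extra={"dataflow_id": "orders_bronze_to_silver"})
log.info("job started")
```

Setting the logger itself to `DEBUG` and filtering per handler is what keeps the two levels independent: changing `log_level` never reduces what reaches the file.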
ETLLogger
- Writes two outputs for the same run session:
    - Debug JSONL for full-fidelity troubleshooting.
    - Analyst outputs for query-friendly reporting and dashboards.
- Uses the `_type` values `"dataflow_run_log"` (one per dataflow execution) and `"job_run_log"` (one per job run summary).
- Stores files under `{output_path}/{purpose}/{log_type}/__run_date=yyyy-mm-dd/` by default.
Debug JSONL
- One JSONL file per session: `debug_json/job_run_log/__run_date=.../job_<stem>.jsonl`.
- Periodic flush: at each flush interval, new bytes are appended via `platform.append_file()`.
- Contents: one entry per dataflow, followed by a final `job_run_log` summary line that is appended on `close()`.
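The periodic-flush behavior can be sketched with a background thread that appends only the lines buffered since the previous flush. `DebugJsonlSession` and the local `append_file` helper are hypothetical: the helper stands in for `platform.append_file()`, whose real signature is not documented here, and the 0.05-second interval is only for the demo.

```python
import json
import tempfile
import threading
from pathlib import Path


def append_file(path: Path, data: bytes) -> None:
    """Local stand-in for platform.append_file(); the real signature may differ."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("ab") as fh:
        fh.write(data)


class DebugJsonlSession:
    """Hypothetical session writer: buffers JSONL lines, appends only new bytes per flush."""

    def __init__(self, path: Path, flush_interval_seconds: float = 30.0) -> None:
        self.path = path
        self._interval = flush_interval_seconds
        self._pending: list[str] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False on timeout (time to flush) and True once close() sets it.
        while not self._stop.wait(self._interval):
            self.flush()

    def record(self, entry: dict) -> None:
        with self._lock:
            self._pending.append(json.dumps(entry) + "\n")

    def flush(self) -> None:
        # Append only what was buffered since the previous flush, then clear the buffer.
        with self._lock:
            if self._pending:
                append_file(self.path, "".join(self._pending).encode("utf-8"))
                self._pending.clear()

    def close(self, summary: dict) -> None:
        # Stop the flush thread first so the summary is guaranteed to be the final line.
        self._stop.set()
        self._thread.join()
        self.record({"_type": "job_run_log", **summary})
        self.flush()


tmp = Path(tempfile.mkdtemp())
session = DebugJsonlSession(tmp / "job_demo.jsonl", flush_interval_seconds=0.05)
session.record({"_type": "dataflow_run_log", "dataflow_id": "orders_bronze_to_silver"})
session.record({"_type": "dataflow_run_log", "dataflow_id": "customers_bronze_to_silver"})
session.close({"job_id": "job-1", "total_dataflows": 2})
lines = [json.loads(line) for line in (tmp / "job_demo.jsonl").read_text().splitlines()]
```

Because every flush appends only the delta, the remote file never has to be rewritten, and a crash mid-run still leaves all previously flushed lines in place.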
Analyst Outputs
| Log type | Format | Path | Flush strategy |
|---|---|---|---|
| `job_run_log` | JSONL (one line per job run) | `analyst/job_run_log/__run_date=.../job_run_log.jsonl` | `append_file` on close |
| `dataflow_run_log` | Parquet (one row per dataflow) | `analyst/dataflow_run_log/__run_date=.../dataflow_<stem>.parquet` | `upload_file` on close |
The `job_run_log` file is a shared daily file: multiple job runs on the same day append their summaries to the same file, so recent job history can be queried without listing many small per-run files.
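A minimal sketch of the shared-daily-file pattern, assuming a local filesystem: the helpers `daily_job_log_path`, `append_job_summary`, and `job_history` are hypothetical, and the local `open(..., "a")` stands in for the framework's `append_file` on close.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def daily_job_log_path(root: Path, run_date: date) -> Path:
    # analyst/job_run_log/__run_date=YYYY-MM-DD/job_run_log.jsonl: one shared file per day
    return (root / "analyst" / "job_run_log"
            / f"__run_date={run_date.isoformat()}" / "job_run_log.jsonl")


def append_job_summary(root: Path, run_date: date, summary: dict) -> Path:
    """Append one JSON line per job run to the day's shared file."""
    path = daily_job_log_path(root, run_date)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"_type": "job_run_log", **summary}) + "\n")
    return path


def job_history(root: Path, run_date: date) -> list[dict]:
    """Read every job summary for one day from a single file; no directory listing needed."""
    path = daily_job_log_path(root, run_date)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


root = Path(tempfile.mkdtemp())
day = date(2026, 4, 20)
append_job_summary(root, day, {"job_id": "job-1", "total_failed": 0})
append_job_summary(root, day, {"job_id": "job-2", "total_failed": 1})
history = job_history(root, day)
print([row["job_id"] for row in history])  # ['job-1', 'job-2']
```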
Row shape (`dataflow_run_log` entry):

```json
{
  "_type": "dataflow_run_log",
  "job_id": "job-1",
  "dataflow_id": "orders_bronze_to_silver",
  "stage": "bronze2silver",
  "processing_mode": "batch",
  "operation_type": null,
  "status": "succeeded",
  "source_rows_read": 12345,
  "destination_rows_written": 12345,
  "transformers_applied": ["SchemaConverter", "Deduplicator", "SystemColumnAdder", "PartitionHandler"],
  "start_time": "2026-04-20T08:00:00+00:00",
  "end_time": "2026-04-20T08:00:09+00:00",
  "duration_seconds": 9.0,
  "destination_load_type": "merge",
  "destination_operation_type": null
}
```
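A sketch of producing this row shape from a Python dataclass, assuming `duration_seconds` is simply `end_time - start_time` (consistent with the example: 08:00:00 to 08:00:09 gives 9.0). The `DataflowRunLog` class and its `to_row()` method are hypothetical; only the field names come from the row above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataflowRunLog:
    job_id: str
    dataflow_id: str
    stage: str
    processing_mode: str
    status: str
    source_rows_read: int
    destination_rows_written: int
    start_time: datetime
    end_time: datetime
    transformers_applied: list[str] = field(default_factory=list)
    operation_type: Optional[str] = None
    destination_load_type: Optional[str] = None
    destination_operation_type: Optional[str] = None

    def to_row(self) -> dict:
        row = {"_type": "dataflow_run_log", **asdict(self)}
        # Serialize timestamps as ISO 8601 and derive the duration (assumed end - start).
        row["start_time"] = self.start_time.isoformat()
        row["end_time"] = self.end_time.isoformat()
        row["duration_seconds"] = (self.end_time - self.start_time).total_seconds()
        return row


row = DataflowRunLog(
    job_id="job-1", dataflow_id="orders_bronze_to_silver", stage="bronze2silver",
    processing_mode="batch", status="succeeded",
    source_rows_read=12345, destination_rows_written=12345,
    start_time=datetime(2026, 4, 20, 8, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2026, 4, 20, 8, 0, 9, tzinfo=timezone.utc),
    transformers_applied=["SchemaConverter", "Deduplicator", "SystemColumnAdder", "PartitionHandler"],
    destination_load_type="merge",
).to_row()
print(json.dumps(row, indent=2))
```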
`job_run_log` summary rows aggregate session totals such as `total_dataflows`, `total_succeeded`, `total_failed`, and `total_rows_written`.
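One way these totals could be derived from the per-dataflow rows is sketched below. The `summarize_job` function is hypothetical, and it assumes failed dataflows carry the status `"failed"` (only `"succeeded"` appears on this page) and that a missing row count counts as zero.

```python
from typing import Iterable


def summarize_job(job_id: str, dataflow_rows: Iterable[dict]) -> dict:
    rows = list(dataflow_rows)
    statuses = [row["status"] for row in rows]
    return {
        "_type": "job_run_log",
        "job_id": job_id,
        "total_dataflows": len(rows),
        "total_succeeded": statuses.count("succeeded"),
        # Assumption: failed dataflows carry status "failed".
        "total_failed": statuses.count("failed"),
        # Treat a missing or null row count as 0 so a failed dataflow does not break the sum.
        "total_rows_written": sum(row.get("destination_rows_written") or 0 for row in rows),
    }


rows = [
    {"status": "succeeded", "destination_rows_written": 12345},
    {"status": "succeeded", "destination_rows_written": 500},
    {"status": "failed", "destination_rows_written": None},
]
summary = summarize_job("job-1", rows)
print(summary)
```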
LogPurpose
`LogPurpose` is an enum that controls the output folder and the intended audience:

| Enum | `.value` | Meaning |
|---|---|---|
| `DEBUG` | `debug_json` | JSONL debug output for troubleshooting |
| `ANALYST` | `analyst` | Analyst outputs for dashboards and analysis |
`ETLLogger` uses `DEBUG`/`job_run_log` for the JSONL debug session file, `ANALYST`/`job_run_log` for the appended job-summary JSONL, and `ANALYST`/`dataflow_run_log` for the per-run Parquet file.
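The enum and the default `{output_path}/{purpose}/{log_type}/__run_date=yyyy-mm-dd/` layout described for `ETLLogger` can be combined in a short sketch. The `LogPurpose` class here is a reconstruction from the table (the real one may be a `str` enum or carry extra members), and `log_dir` and `ETL_OUTPUTS` are hypothetical names. Plain string joining is used instead of `pathlib` so URI-style roots such as `s3://bucket/logs` keep their double slash.

```python
from datetime import date
from enum import Enum


class LogPurpose(Enum):
    DEBUG = "debug_json"
    ANALYST = "analyst"


def log_dir(output_path: str, purpose: LogPurpose, log_type: str, run_date: date) -> str:
    # Default layout: {output_path}/{purpose}/{log_type}/__run_date=yyyy-mm-dd/
    parts = [output_path.rstrip("/"), purpose.value, log_type, f"__run_date={run_date.isoformat()}"]
    return "/".join(parts) + "/"


# The three (purpose, log_type) pairs ETLLogger writes.
ETL_OUTPUTS = [
    (LogPurpose.DEBUG, "job_run_log"),
    (LogPurpose.ANALYST, "job_run_log"),
    (LogPurpose.ANALYST, "dataflow_run_log"),
]

for purpose, log_type in ETL_OUTPUTS:
    print(log_dir("s3://bucket/logs", purpose, log_type, date(2026, 4, 20)))
```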
ExecutionType → operation_type
`operation_type` records the runtime operation:

- ETL runs typically leave `operation_type` as `etl`.
- Maintenance runs set `operation_type` to `maintenance`.
Partitioning
`LogConfig` fields that affect storage:

- `output_path`: root directory for all log outputs.
- `log_level`: console stream level (default `INFO`).
- `file_level`: capture/file level for `SystemLogger` (default `DEBUG`).
- `partition_by_date`: append a date partition folder to output paths.
- `partition_pattern`: override the partition folder layout (default `__run_date={year}-{month}-{day}`).
- `flush_interval_seconds`: how often, in seconds, pending buffers are flushed to storage.
- `storage_mode`: `memory` or `file`; how records are buffered temporarily before upload.
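A sketch of these fields as a Python dataclass, with a helper that resolves the partition folder. Only the `log_level`, `file_level`, and `partition_pattern` defaults come from this page; the defaults for `partition_by_date`, `flush_interval_seconds`, and `storage_mode` are placeholders, and zero-padding month and day (to match `yyyy-mm-dd`) is an assumption. The `partition_folder` method is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LogConfig:
    output_path: str
    log_level: str = "INFO"
    file_level: str = "DEBUG"
    partition_by_date: bool = True          # placeholder default
    partition_pattern: str = "__run_date={year}-{month}-{day}"
    flush_interval_seconds: float = 30.0    # placeholder default
    storage_mode: str = "memory"            # placeholder default

    def __post_init__(self) -> None:
        if self.storage_mode not in ("memory", "file"):
            raise ValueError(f"storage_mode must be 'memory' or 'file', got {self.storage_mode!r}")

    def partition_folder(self, run_date: date) -> Optional[str]:
        """Return the partition folder name, or None when date partitioning is off."""
        if not self.partition_by_date:
            return None
        # Zero-pad so the default pattern yields yyyy-mm-dd.
        return self.partition_pattern.format(
            year=f"{run_date.year:04d}",
            month=f"{run_date.month:02d}",
            day=f"{run_date.day:02d}",
        )


print(LogConfig("logs").partition_folder(date(2026, 4, 5)))  # __run_date=2026-04-05
hive_style = LogConfig("logs", partition_pattern="year={year}/month={month}/day={day}")
print(hive_style.partition_folder(date(2026, 4, 5)))  # year=2026/month=04/day=05
```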