Deploy to Microsoft Fabric
Prerequisites · Fabric workspace with an attached OneLake lakehouse · Spark notebook for SparkEngine or a native Python notebook for small PolarsEngine runs · pip install datacoolie in the notebook environment.
End state · DataCoolie pipeline running inside a Fabric notebook with FabricPlatform serving OneLake paths and Key Vault secrets.
1. Notebook bootstrap
Install extra packages individually only when your pipeline needs them. For example:
- pip install sqlalchemy for database metadata.
- pip install httpx for API metadata.
- pip install openpyxl for Excel metadata files.
- pip install pyiceberg for PyIceberg-based reads or writes.
Fabric already provides the Spark runtime, so you usually do not want a broad bundle that pulls in every Spark and Polars-related dependency.
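To decide which of these extras a notebook still needs, you can check what is already importable before installing anything. The sketch below uses only the standard library. The `OPTIONAL_EXTRAS` mapping and the `missing_extras` helper are hypothetical names for illustration and are not part of DataCoolie.

```python
from importlib.util import find_spec

# Optional extras named in this section, keyed by importable module name.
# This mapping is illustrative only; it is not a DataCoolie API.
OPTIONAL_EXTRAS = {
    "sqlalchemy": "database metadata",
    "httpx": "API metadata",
    "openpyxl": "Excel metadata files",
    "pyiceberg": "PyIceberg-based reads or writes",
}


def missing_extras(needed):
    """Return the module names in `needed` that cannot be found for import."""
    return [name for name in needed if find_spec(name) is None]


# Example: a pipeline that only uses database and Excel metadata.
for name in missing_extras(["sqlalchemy", "openpyxl"]):
    print(f"pip install {name}  # needed for {OPTIONAL_EXTRAS[name]}")
```

Running this in the first cell prints one install command per missing package and nothing when the environment already has them.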
2. Start with the built-in Spark session
from datacoolie.engines.spark_engine import SparkEngine
from datacoolie.platforms.fabric_platform import FabricPlatform
from datacoolie.metadata.file_provider import FileProvider
from datacoolie.orchestration.driver import DataCoolieDriver
platform = FabricPlatform() # uses notebookutils under the hood
engine = SparkEngine(spark, platform=platform) # `spark` is the built-in SparkSession
metadata = FileProvider(config_path="Files/metadata/orders.json", platform=platform)
with DataCoolieDriver(engine=engine, metadata_provider=metadata, base_log_path="Files/logs") as driver:
    driver.run(stage="ingest2bronze")
spark is the SparkSession that Fabric provides; do not create your own.
For small file and delta validation runs, you can skip Spark entirely and use a
native Python notebook with PolarsEngine. The checked-in example is
sample_fabric_polars.ipynb under the usecase-sim/platforms/fabric/ assets.
3. Paths
- Use ABFSS paths or Files/... relative to the lakehouse: abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/...
- FabricPlatform normalises both forms.
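To show how the two path forms relate, here is a minimal sketch that expands a lakehouse-relative path into the ABFSS form shown above. The helper `to_abfss` and its `workspace` and `lakehouse` parameters are hypothetical; this is not how FabricPlatform is implemented, only an illustration of the mapping.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def to_abfss(path, workspace, lakehouse):
    """Hypothetical helper: expand a lakehouse-relative path to an ABFSS URI.

    Paths that already start with abfss:// are returned unchanged.
    """
    if path.startswith("abfss://"):
        return path
    relative = path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative}"


print(to_abfss("Files/metadata/orders.json", "workspace", "lakehouse"))
```

Because FabricPlatform accepts both forms, pipeline metadata can keep the shorter Files/... form. A helper like this is only useful when some other tool needs the absolute URI.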
4. Secrets
FabricPlatform._fetch_secret reads from Key Vault via
notebookutils.credentials.getSecret(vault_url, secret_name). In
secrets_ref, the outer key is the vault URL, and each listed field must
already exist in configure with the Key Vault secret name as its current
value:
{
  "configure": {
    "host": "db.contoso.net",
    "password": "sql-password"
  },
  "secrets_ref": {
    "https://myvault.vault.azure.net/": ["password"]
  }
}
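The substitution this describes can be sketched in plain Python. This is an illustration under assumptions, not DataCoolie's actual resolution code: `resolve_secrets` is a hypothetical helper, and `fake_fetch` stands in for the Key Vault lookup so the example runs outside Fabric.

```python
import copy


def resolve_secrets(connection, fetch_secret):
    """Return a copy of `connection` with secret names replaced by secret values.

    `fetch_secret(vault_url, secret_name)` is injected by the caller; inside
    Fabric it would wrap the Key Vault call, here it is a stand-in.
    """
    resolved = copy.deepcopy(connection)
    configure = resolved["configure"]
    for vault_url, fields in resolved.get("secrets_ref", {}).items():
        for field in fields:
            if field not in configure:
                raise KeyError(f"{field!r} is listed in secrets_ref but missing from configure")
            secret_name = configure[field]  # current value is the secret name
            configure[field] = fetch_secret(vault_url, secret_name)
    return resolved


# Stand-in vault contents (made-up value) keyed by (vault URL, secret name).
fake_vault = {("https://myvault.vault.azure.net/", "sql-password"): "s3cr3t"}


def fake_fetch(vault_url, secret_name):
    return fake_vault[(vault_url, secret_name)]


connection = {
    "configure": {"host": "db.contoso.net", "password": "sql-password"},
    "secrets_ref": {"https://myvault.vault.azure.net/": ["password"]},
}
resolved = resolve_secrets(connection, fake_fetch)
print(resolved["configure"]["password"])
```

The sketch copies the input so that the metadata as written, which holds only secret names, is never overwritten with secret values.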
5. Spark session tuning
For Spark notebooks, keep the first run narrow and reuse the session Fabric
already gives you. If the workload is tiny, prefer the native Python notebook
sample (sample_fabric_polars.ipynb) instead of spinning up a larger Spark
session. When you do want Polars with a smaller pool, put this in the first
cell before native Python execution starts:
Reference assets
Use the Fabric platform guide in usecase-sim for the current sample notebooks, metadata file, and setup notes: