MLflow setup
Climatextract supports MLflow for experiment tracking. MLflow provides a database that stores the results of your experiments.
With Azure Databricks, all results can be stored and shared online.
Setup
Add the following to your .env file:
# Tracking URI: "databricks", "./mlruns", "sqlite:///mlflow.db", or server URL
# Use "databricks" if you are using Databricks on Azure.
MLFLOW_TRACKING_URI=sqlite:///mlflow.db
# If using Databricks on Azure, add the following:
DATABRICKS_HOST=https://<your-databricks-instance>.azuredatabricks.net/
DATABRICKS_TOKEN=personal-access-token
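Before starting a run, it can help to check that these settings are consistent. The following is a minimal sketch, not part of Climatextract: check_mlflow_env is a hypothetical helper, and it assumes the variables from the .env file have already been loaded into the process environment.

```python
import os


def check_mlflow_env(env):
    """Return a list of problems found in the MLflow-related settings.

    Hypothetical helper for illustration; `env` is any mapping of
    variable names to values, for example os.environ.
    """
    problems = []
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        problems.append("MLFLOW_TRACKING_URI is not set")
    elif uri == "databricks":
        # The Databricks backend additionally needs a host and a token.
        host = env.get("DATABRICKS_HOST", "").strip()
        token = env.get("DATABRICKS_TOKEN", "").strip()
        if not host.startswith("https://"):
            problems.append("DATABRICKS_HOST should be an https:// URL")
        if not token:
            problems.append("DATABRICKS_TOKEN is not set")
    return problems


if __name__ == "__main__":
    # Print one line per problem; print nothing if the settings look fine.
    for problem in check_mlflow_env(os.environ):
        print("config problem:", problem)
```

An empty list means the settings look plausible. It does not prove that the token is valid or already active.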
To set up access to a remote MLflow Tracking Server on Azure Databricks, you need to create a personal access token. Follow these steps:
- Set MLFLOW_TRACKING_URI=databricks.
- Log into Azure.
- Create a Databricks instance or find an existing instance you want to use.
- Copy the URL that contains azuredatabricks.net and save it in the .env file as the DATABRICKS_HOST variable.
- Launch the workspace and click on your initial in the upper right corner.
- Navigate to Settings > User > Developer > Access tokens and click on Manage. Generate a new access token and save it in the .env file as the DATABRICKS_TOKEN variable. Be aware that the token can take some time to become active, so you may get 401 authentication errors when you first run the code. These errors should stop once the token is active.
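Because a new token may be rejected for a while, one option is to probe the tracking server and retry before starting a long run. This is a minimal sketch under stated assumptions: retry_until_ready and list_one_experiment are hypothetical helpers, not part of Climatextract, and the retry count and delay are arbitrary. The probe uses mlflow.set_tracking_uri and mlflow.search_experiments and assumes an MLflow 2.x installation with DATABRICKS_HOST and DATABRICKS_TOKEN set in the environment.

```python
import time


def retry_until_ready(probe, attempts=5, delay_seconds=30.0, sleep=time.sleep):
    """Call probe() until it succeeds; re-raise the last error if attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except Exception as exc:  # broad on purpose: the exact error type is not assumed
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying in {delay_seconds}s")
            sleep(delay_seconds)


def list_one_experiment():
    """Hypothetical probe: a cheap authenticated call against the tracking server."""
    import mlflow  # imported lazily so retry_until_ready has no MLflow dependency

    mlflow.set_tracking_uri("databricks")
    return mlflow.search_experiments(max_results=1)


# Usage (requires MLflow and valid Databricks settings):
#     retry_until_ready(list_one_experiment)
```

The helper retries on any exception, so it also repeats calls that fail for reasons other than an inactive token. If the error persists after the last attempt, check the host URL and the token value.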
In your climatextract.toml configuration file you can specify the experiment name:
Next Steps
- Sharing Large Files – Share PDFs and embeddings with your team
- Architecture – Understand how the pipeline works