Installation

This guide walks you through installing climatextract.


Prerequisites

Before installing, ensure you have:

  • Python 3.11+ – Download Python if needed
  • Azure credentials – Access to Azure AI Foundry, if using the default adapter (see step 3). Not required if you're injecting your own provider handler — see Custom Providers.

Step 1: Install the Package

pip install climatextract

This installs climatextract and all required dependencies.


Step 2: Install System Dependencies

climatextract uses Docling for PDF processing, which requires Poppler:

macOS:
brew install poppler

Debian/Ubuntu:
sudo apt-get install poppler-utils

Windows:
Download from poppler releases and add the binaries to your PATH.
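
To confirm Poppler is set up correctly, you can run a quick check (our suggestion, not part of climatextract): Docling invokes Poppler's command-line tools, so they need to be discoverable on your PATH.

```python
import shutil

# Sanity check: Docling shells out to Poppler's tools (e.g. pdftotext),
# so they must be on PATH for PDF processing to work.
pdftotext_path = shutil.which("pdftotext")
if pdftotext_path:
    print(f"Poppler found: {pdftotext_path}")
else:
    print("Poppler NOT found; revisit the install step for your platform.")
```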


Step 3: Configure Access to Large Language Models

You will need to set up access to a large language model and an embedding model. You can either use the adapters we provide for Microsoft Azure or write your own. Since our adapters are based on LiteLLM, building your own should be straightforward; see Custom Model Providers.

By default, the package connects to Microsoft's Azure AI Foundry.

Using Azure AI Foundry

We commonly use Azure AI Foundry. Create a .env file in your working directory with the correct endpoint and the corresponding API key:

AZURE_AI_FOUNDRY_ENDPOINT=https://your-foundry-endpoint.openai.azure.com/
API_KEY=your-api-key # you can also use personalized authentication workflows, see Step 4
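
Python does not read .env files on its own; climatextract loads them at startup (typically via a library such as python-dotenv). As an illustration of what such a loader does, here is a minimal stdlib-only sketch, not the package's actual implementation:

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader sketch (illustration only; python-dotenv is more
    robust). Expects KEY=value lines; '#' starts an inline comment and blank
    lines are skipped."""
    with open(path) as fh:
        for raw in fh:
            line = raw.split("#", 1)[0].strip()   # strip inline comments
            if not line or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the .env file
            os.environ.setdefault(key.strip(), value.strip())
```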

In the configuration file, specify which models you want to use, e.g.:

llm_model = "gpt-5-chat"
emb_model = "text-embedding-ada-002"
max_parallel_llm_prompts_running = 20
max_parallel_embedding_calls = 30

Make sure that models with these names have been deployed in your Azure AI Foundry instance.

Rate-limit errors may occur depending on the quota assigned to each deployed model. Use the max_parallel_... parameters to cap the number of API requests sent in parallel so that it matches the quota available for each model.

The adapter for Azure AI Foundry is available at climatextract/adapters/azure_ai_foundry.py, which is useful for adaptation and debugging. A slightly different adapter or API endpoint may be needed if you wish to deploy models that are not from OpenAI.

Using Azure OpenAI

For older models and legacy deployments, the package also ships an Azure OpenAI Service adapter at climatextract/adapters/azure_openai.py. When using this service, the .env file should look as follows:

AZURE_ENDPOINT=https://your-openai-endpoint.openai.azure.com/
API_KEY=your-api-key # you can also use personalized authentication workflows, see Step 4
API_VERSION=2024-12-01-preview

Step 4 (optional): Configure Personalized Authentication to Azure

Instead of using an API_KEY (which is inconvenient because different endpoints require different keys), you can log in to Azure with your personal account.

Add to your .env file:

AZURE_USERNAME=your-username
AZURE_PASSWORD=your-password

# API_KEY=not-needed-anymore

This functionality is based on the azure_authentication package. Please refer to its documentation for alternative authentication workflows.


Verify Installation

Test that everything is working:

python -c "from climatextract import extract; print('Installation successful!')"

Next Steps

Ready to extract some data? Head to the Quickstart guide.

You may also want to set up experiment tracking with MLflow or start sharing large (PDF) files with team members via Azure Blob Storage.