# Custom Model Providers
climatextract ships with reference Azure AI Foundry and Azure OpenAI Service adapters. To use a different LLM or embedding provider — OpenAI direct, Anthropic, a local model, anything else — implement a handler and pass it to extract().
We purposefully built climatextract on LiteLLM's Python SDK to make switching between model providers as easy as possible.
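As a quick illustration of what that buys you, the underlying SDK call looks the same across providers; a sketch of raw LiteLLM usage (this is not climatextract code, and the model strings are just examples):

```python
import litellm

messages = [{"role": "user", "content": "Summarize the emissions table."}]

# LiteLLM routes by model-string prefix; credentials come from the usual
# provider environment variables (e.g. AZURE_API_KEY, ANTHROPIC_API_KEY).
azure_response = litellm.completion(model="azure/gpt-4o", messages=messages)
anthropic_response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620", messages=messages
)
```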
## When you need this
Reach for a custom handler when:
- You want to use a provider the bundled Azure adapters don't cover.
- You want to use Azure but need behavior the bundled adapters don't expose.
- You're running models locally and don't want to go through a cloud provider at all.
If you just want to tweak parameters on an Azure model (temperature, reasoning effort, concurrency), you can instantiate `AzureAIFoundryLlmHandler` or `AzureOpenAILlmHandler` directly without subclassing; see Running Extraction.
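A sketch of what that direct instantiation can look like; the constructor arguments below (`model`, `temperature`, `max_concurrent_calls`) are illustrative assumptions, so check Running Extraction for the adapter's actual signature:

```python
from climatextract import extract
from climatextract.adapters.azure_ai_foundry import AzureAIFoundryLlmHandler

# Hypothetical keyword arguments for illustration only; consult the
# adapter's __init__ for the parameters it really accepts.
llm = AzureAIFoundryLlmHandler(
    model="gpt-4o",
    temperature=0.0,
    max_concurrent_calls=4,
)
result_path = extract("./data/pdfs/", llm=llm)
```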
## The interfaces
To access models outside Azure, you will need to implement two handler subclasses, one per interface below. The abstract base classes in `climatextract.llm_embedding_api_bridge` define the contract.
### `LlmHandler(ABC)`
Subclasses must implement:
| Method | Purpose |
|---|---|
| `get_completion_and_cost(messages)` | Sync call. Takes an OpenAI-style messages list, returns `(response, cost_in_usd)`. |
| `aget_completion_and_cost(messages)` | Async version of the above. |
| `get_model_dict()` | Returns a dict describing the model; must include a `"model"` key, used for logging and repr. |
| `get_max_concurrent_calls()` | Returns the max number of parallel calls the wrapper's semaphore should allow for the async method. |
The returned response should be shaped like an OpenAI chat completion: specifically, `response.choices[0].message.content` (the output text), `response.choices[0].logprobs` (optional), and `response.usage.prompt_tokens` / `response.usage.completion_tokens`. LiteLLM's `ModelResponse` already matches this shape, as do OpenAI SDK responses.
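If your provider's native response looks nothing like this, a thin wrapper object exposing just those attributes is enough. A minimal sketch; none of these class names exist in climatextract:

```python
from dataclasses import dataclass


# Stand-in classes exposing exactly the attributes listed above.
@dataclass
class _Message:
    content: str


@dataclass
class _Choice:
    message: _Message
    logprobs: object = None  # optional; may stay None


@dataclass
class _Usage:
    prompt_tokens: int
    completion_tokens: int


@dataclass
class MinimalChatResponse:
    choices: list[_Choice]
    usage: _Usage


# Wrapping a hypothetical provider result into the expected shape:
response = MinimalChatResponse(
    choices=[_Choice(message=_Message(content="model output text"))],
    usage=_Usage(prompt_tokens=120, completion_tokens=35),
)
```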
### `EmbeddingModelHandler(ABC)`
Subclasses must implement:
| Method | Purpose |
|---|---|
| `get_embedding_and_cost(texts)` | Sync call. Takes `list[str]`, returns `(response, cost_in_usd)`. |
| `aget_embedding_and_cost(texts)` | Async version. |
| `get_model_dict()` | Returns a dict with a `"model"` key. |
| `get_max_concurrent_calls()` | Max parallel embedding calls allowed for the async method. |
The returned response should expose `response.data` as a sequence of items with an `embedding` field (attribute- or dict-style; both are handled), plus `response.usage.prompt_tokens`.
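The OpenAI embeddings response happens to match this shape already, so a direct embedding handler mirrors the LLM handler example below. A sketch, again returning `0.0` as the cost:

```python
import os

from openai import AsyncOpenAI, OpenAI

from climatextract.llm_embedding_api_bridge import EmbeddingModelHandler


class OpenAIEmbeddingHandler(EmbeddingModelHandler):
    def __init__(self, model: str = "text-embedding-3-small", max_concurrent_calls: int = 8):
        self.model = model
        self._max = max_concurrent_calls
        self._client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        self._aclient = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def get_model_dict(self) -> dict:
        return {"model": self.model}

    def get_max_concurrent_calls(self) -> int:
        return self._max

    def get_embedding_and_cost(self, texts: list[str]) -> tuple[object, float]:
        # The response exposes .data[i].embedding and .usage.prompt_tokens,
        # which is exactly the shape described above.
        response = self._client.embeddings.create(model=self.model, input=texts)
        return response, 0.0

    async def aget_embedding_and_cost(self, texts: list[str]) -> tuple[object, float]:
        response = await self._aclient.embeddings.create(model=self.model, input=texts)
        return response, 0.0
```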
## Minimal example: OpenAI direct
An LLM handler that uses the OpenAI SDK directly, no LiteLLM:
```python
import os
from typing import Tuple

from openai import AsyncOpenAI, OpenAI

from climatextract.llm_embedding_api_bridge import LlmHandler


class OpenAILlmHandler(LlmHandler):
    def __init__(self, model: str = "gpt-4o-mini", max_concurrent_calls: int = 8):
        self.model = model
        self._max = max_concurrent_calls
        self._client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        self._aclient = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def get_model_dict(self) -> dict:
        return {"model": self.model}

    def get_max_concurrent_calls(self) -> int:
        return self._max

    def get_completion_and_cost(self, messages: list[dict]) -> Tuple[object, float]:
        response = self._client.chat.completions.create(model=self.model, messages=messages)
        return response, 0.0  # compute your own cost if you need it

    async def aget_completion_and_cost(self, messages: list[dict]) -> Tuple[object, float]:
        response = await self._aclient.chat.completions.create(model=self.model, messages=messages)
        return response, 0.0
```
Pass it to `extract()`:

```python
from climatextract import extract

result_path = extract("./data/pdfs/", llm=OpenAILlmHandler(model="gpt-4o-mini"))
```
## Our implementation: using LiteLLM with Azure AI Foundry
For another example, explore our Azure AI Foundry adapters in `climatextract/adapters/azure_ai_foundry.py`: `AzureAIFoundryLlmHandler` implements `LlmHandler`, and `AzureAIFoundryEmbeddingHandler` implements `EmbeddingModelHandler`. Note that these adapters load their model configuration from the `climatextract.toml` configuration file.
## Notes
- **Don't manage concurrency inside your handler.** The `Llm`/`EmbeddingModel` wrapper already applies a semaphore sized by `get_max_concurrent_calls()`.
- **Don't count tokens or cost inside your handler beyond returning them.** The wrapper records them via `UsageCounter` and surfaces them in `logs.json` and MLflow.
- **Cost is optional.** Return `0.0` if your provider doesn't give you a price back; the pipeline will still run.
- **Reference implementations.** For more complete examples (routing, Azure AD token handling, per-model parameter quirks), see `climatextract/adapters/azure_ai_foundry.py` and `climatextract/adapters/azure_openai.py`.
- **Subclass instead of writing from scratch.** If you only need to tweak one or two things (e.g., an extra request param), subclassing `AzureAIFoundryLlmHandler` or `AzureOpenAILlmHandler` and overriding `__init__` is usually enough; see the sketch below.
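To illustrate that last bullet, here is a subclass sketch. The `**kwargs` pass-through assumes the parent constructor accepts keyword configuration, which you should verify against the adapter source before copying:

```python
from climatextract.adapters.azure_ai_foundry import AzureAIFoundryLlmHandler


class TaggedAzureLlmHandler(AzureAIFoundryLlmHandler):
    """Small tweak of the bundled adapter, for illustration only."""

    def __init__(self, deployment_tag: str = "eu-west", **kwargs):
        # Assumption: the parent __init__ takes keyword arguments; check
        # climatextract/adapters/azure_ai_foundry.py for the real signature.
        super().__init__(**kwargs)
        self.deployment_tag = deployment_tag

    def get_model_dict(self) -> dict:
        # get_model_dict() is part of the LlmHandler contract and is
        # guaranteed to carry a "model" key; extra keys are ours.
        model_dict = super().get_model_dict()
        model_dict["deployment_tag"] = self.deployment_tag
        return model_dict
```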