API Reference
This page documents the public API for climatextract. These are the only functions needed for typical usage.
extract
Extract CO₂ emissions data from PDF reports.
from climatextract import extract
result_path = extract(
    pdf_input="./data/pdfs/company_report.pdf",
    config_path="climatextract.toml",
    enable_mlflow=False
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| pdf_input | str \| List[str] \| None | None | A directory path (processes all PDFs), a single file path, or a list of file paths. If None, uses filename_list from config. |
| config_path | str | "climatextract.toml" | Path to configuration file. |
| enable_mlflow | bool | False | Whether to log results to MLflow. If True, uses MLflow settings from config. |
| verbose | bool | False | Show detailed per-PDF output. |
| llm | LlmHandler \| None | None | Custom LLM handler (keyword-only). If None, uses the default Azure AI Foundry adapter. See Custom Providers. |
| embedder | EmbeddingModelHandler \| None | None | Custom embedding handler (keyword-only). If None, uses the default Azure AI Foundry adapter. See Custom Providers. |
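Because pdf_input also accepts a list of paths, a caller can select files before extraction instead of passing a whole directory. The sketch below shows one way to build such a list with the standard library. The helper name collect_pdfs and its name_contains filter are hypothetical and not part of climatextract.

```python
from pathlib import Path
from typing import List


def collect_pdfs(directory: str, name_contains: str = "") -> List[str]:
    """Return sorted paths of PDFs in `directory` whose file name contains `name_contains`.

    Hypothetical helper, not part of climatextract: it only prepares a list
    that could be passed as `pdf_input`.
    """
    folder = Path(directory)
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file()
        and path.suffix.lower() == ".pdf"  # accept .pdf and .PDF
        and name_contains in path.name
    )
```

The resulting list could then be passed as extract(pdf_input=collect_pdfs("./data/pdfs/", "2023")), where the "2023" filter is only an illustration.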
Returns
| Type | Description |
|---|---|
| str \| None | Path to the results directory, or None if extraction failed. |
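Since the return value is None on failure, callers will usually want to check it before touching the results directory. A minimal sketch of that check follows. The helper name list_result_files is hypothetical, and the sketch makes no assumption about which files climatextract writes; it only lists whatever is there.

```python
from pathlib import Path
from typing import List, Optional


def list_result_files(result_path: Optional[str]) -> List[str]:
    """Return the sorted file names found in the results directory.

    Raises RuntimeError when `result_path` is None, which the Returns table
    describes as the signal that extraction failed.
    """
    if result_path is None:
        raise RuntimeError("extraction failed: no results directory was returned")
    return sorted(p.name for p in Path(result_path).iterdir() if p.is_file())
```

A typical call would be list_result_files(extract("./data/pdfs/")), which raises instead of silently continuing when extraction fails.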
Examples
Single PDF:
Directory of PDFs:
Multiple specific files:
With MLflow tracking:
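The four call forms named above follow directly from the parameters table and can be sketched as below. To keep the sketch runnable without Azure credentials, the function takes the extract callable as an argument; run_call_forms is a hypothetical name, and report_a.pdf and report_b.pdf are placeholder file names.

```python
from typing import Callable, Dict, Optional

ExtractFn = Callable[..., Optional[str]]


def run_call_forms(extract_fn: ExtractFn) -> Dict[str, Optional[str]]:
    """Call `extract_fn` once for each input form named above."""
    return {
        # Single PDF: one file path as a string.
        "single_pdf": extract_fn(pdf_input="./data/pdfs/company_report.pdf"),
        # Directory of PDFs: every PDF in the directory is processed.
        "directory": extract_fn(pdf_input="./data/pdfs/"),
        # Multiple specific files: a list of paths (file names are placeholders).
        "file_list": extract_fn(
            pdf_input=["./data/pdfs/report_a.pdf", "./data/pdfs/report_b.pdf"]
        ),
        # With MLflow tracking: MLflow settings are read from the config file.
        "mlflow": extract_fn(pdf_input="./data/pdfs/", enable_mlflow=True),
    }
```

With the package installed, this would be driven by from climatextract import extract followed by run_call_forms(extract). In practice each call would normally be written out on its own.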
With a custom provider handler:
from climatextract import extract
from climatextract.adapters.azure_ai_foundry import (
    AzureAIFoundryEmbeddingHandler,
    AzureAIFoundryLlmHandler,
)
llm = AzureAIFoundryLlmHandler()
embedder = AzureAIFoundryEmbeddingHandler()
result_path = extract("./data/pdfs/", llm=llm, embedder=embedder)
extract_and_evaluate
Extract CO₂ emissions data and evaluate against a gold standard dataset.
from climatextract import extract_and_evaluate
result_path = extract_and_evaluate(
    pdf_input="./data/pdfs/sample_reports/",
    gold_standard_path="./data/evaluation_dataset/gold_standard.csv",
    config_path="climatextract.toml",
    enable_mlflow=False
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| pdf_input | str \| List[str] \| None | None | A directory path, a single file path, or a list of file paths. If None, the input is taken from the config file. |
| gold_standard_path | str \| None | None | Path to gold standard CSV. If None, uses gold_standard from config. |
| config_path | str | "climatextract.toml" | Path to configuration file. |
| enable_mlflow | bool | False | Whether to log results and metrics to MLflow. |
| verbose | bool | False | Show detailed per-PDF output. |
| llm | LlmHandler \| None | None | Custom LLM handler (keyword-only). If None, uses the default Azure AI Foundry adapter. See Custom Providers. |
| embedder | EmbeddingModelHandler \| None | None | Custom embedding handler (keyword-only). If None, uses the default Azure AI Foundry adapter. See Custom Providers. |
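A missing or empty gold standard file is cheaper to catch before a long extraction run than after it. The sketch below is a standalone pre-flight check using only the standard library. The helper name read_csv_header is hypothetical, and because the expected columns are not documented on this page, it only reports the header it finds rather than validating it.

```python
import csv
from pathlib import Path
from typing import List


def read_csv_header(csv_path: str) -> List[str]:
    """Return the column names from the first row of a CSV file.

    Raises FileNotFoundError if the file is missing and ValueError if the
    file is empty or its first row is blank.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"gold standard CSV not found: {csv_path}")
    with path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), None)
    if not header:
        raise ValueError(f"gold standard CSV has no header row: {csv_path}")
    return header
```

Calling read_csv_header("./data/evaluation_dataset/gold_standard.csv") before extract_and_evaluate makes a wrong path fail immediately.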
Returns
| Type | Description |
|---|---|
| str \| None | Path to the results directory (includes evaluation files), or None if the run failed. |
Examples
Basic evaluation:
result = extract_and_evaluate(
    pdf_input="./data/pdfs/test_set/",
    gold_standard_path="./data/evaluation_dataset/gold_standard.csv"
)
Using config defaults:
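Relying on config defaults means leaving pdf_input and gold_standard_path as None, so the config file has to supply both. The sketch below checks a parsed config for the two key names mentioned in the parameter tables. It assumes filename_list is also the fallback key for extract_and_evaluate, and because the layout of climatextract.toml is not described on this page, it searches nested tables rather than assuming a section name. All three helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def find_key(config: Dict[str, Any], key: str) -> Iterator[Any]:
    """Yield every value stored under `key` in `config` or its nested tables."""
    for name, value in config.items():
        if name == key:
            yield value
        if isinstance(value, dict):
            yield from find_key(value, key)


def missing_defaults(config: Dict[str, Any]) -> List[str]:
    """Return which of the two fallback keys are absent from the parsed config."""
    wanted = ("filename_list", "gold_standard")  # key names taken from the tables
    return [key for key in wanted if not list(find_key(config, key))]


def load_config(config_path: str = "climatextract.toml") -> Dict[str, Any]:
    """Parse the TOML config file (tomllib requires Python 3.11+)."""
    import tomllib

    with open(config_path, "rb") as handle:
        return tomllib.load(handle)
```

If missing_defaults(load_config()) returns an empty list, a bare extract_and_evaluate() call should be able to take both inputs from the config, going by the defaults in the parameters table.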
With MLflow tracking:
result = extract_and_evaluate(
    pdf_input="./data/pdfs/",
    gold_standard_path="./data/evaluation/gold.csv",
    enable_mlflow=True
)
Import
Both functions are available directly from the climatextract package:
from climatextract import extract, extract_and_evaluate
Next Steps
- Configuration – Customize extraction behavior
- Background – Academic context and motivation