climatextract
Extract CO₂ emissions data from corporate sustainability reports using AI.
climatextract is an information extraction pipeline that surfaces Scope 1, 2, and 3 emissions data from PDF sustainability reports. Built by the LMU SODA Lab in collaboration with the Data Service Centre of Deutsche Bundesbank, it draws on research in ESG reporting and Intelligent Document Processing. Using semantic search and large language models, climatextract automates what was previously a tedious, manual annotation process.
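The semantic-search step can be pictured as ranking a report's pages by how similar their embeddings are to a query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. It is an assumption-laden illustration, not climatextract's implementation: the helper names `cosine_similarity` and `top_k_pages`, the toy three-dimensional vectors, and the page numbers are all hypothetical, and real embeddings would come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_pages(query_vec: list[float], page_vecs: dict[int, list[float]], k: int = 3) -> list[int]:
    """Return the k page numbers whose embeddings are most similar to the query (hypothetical helper)."""
    ranked = sorted(
        page_vecs,
        key=lambda page: cosine_similarity(query_vec, page_vecs[page]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" keyed by page number (hypothetical values for illustration only)
pages = {
    12: [0.9, 0.1, 0.0],
    34: [0.8, 0.5, 0.1],
    57: [0.0, 0.2, 0.9],
}
query = [1.0, 0.4, 0.0]
best_pages = top_k_pages(query, pages, k=2)
print(best_pages)  # [34, 12]
```

In a retrieval-augmented setup, only the top-ranked pages would then be passed to the LLM, which keeps prompts short and focused on the pages most likely to contain emissions tables.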
Key Features
- 📄 PDF Processing – Automatically extract and embed text from sustainability reports
- 🔍 Semantic Search – Find relevant pages using vector similarity
- 🤖 LLM Extraction – Use GPT models to extract structured emissions data
- 📊 Scope 1-3 Coverage – Extract direct and indirect emissions across all scopes
- ✅ Evaluation – Compare results against gold standard datasets
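Comparing extractions against a gold standard usually comes down to matching records and counting agreements. The sketch below shows one plausible scheme, assuming both datasets are keyed by `(report_id, year, indicator)` and that a value counts as correct when it agrees with the gold value within a relative tolerance. The `evaluate` function, the tolerance, and the toy values are hypothetical; climatextract's own evaluation metrics may differ.

```python
import math


def evaluate(extracted: dict, gold: dict, rel_tol: float = 0.001) -> dict:
    """Precision, recall, and F1 of extracted values against a gold standard.

    Both inputs map (report_id, year, indicator) -> value (hypothetical format).
    An extracted value is correct if the gold standard has the same key and the
    two values agree within rel_tol (relative tolerance).
    """
    correct = sum(
        1
        for key, value in extracted.items()
        if key in gold and math.isclose(value, gold[key], rel_tol=rel_tol)
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


report = "company_2023_report.pdf"
# Toy gold standard (values for illustration only)
gold = {
    (report, 2015, "scope 1"): 135.0,
    (report, 2015, "scope 2lb"): 41962.0,
    (report, 2015, "scope 2mb"): 37674.0,
    (report, 2015, "scope 3"): 1834.0,
}
# Toy extraction: one value has swapped digits, one gold record is missing
extracted = {
    (report, 2015, "scope 1"): 135.0,
    (report, 2015, "scope 2lb"): 41962.0,
    (report, 2015, "scope 3"): 1384.0,
}
scores = evaluate(extracted, gold)
print(scores)
```

Here two of three extracted values are correct (precision 2/3) and two of four gold records are recovered (recall 1/2).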
Quick Example
```python
from climatextract import extract

# Extract emissions from a PDF report
result_path = extract("./data/pdfs/company_2023_report.pdf")
print(f"Results saved to: {result_path}")
```
The main output is a well-structured table saved as a CSV file.
| report_id | year | indicator | value_std | ... | unit_std | ... | page |
|---|---|---|---|---|---|---|---|
| company_2023_report.pdf | 2015 | scope 1 | 135.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2lb | 41962.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2mb | 37674.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 3 | 1834.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2016 | scope 1 | 170.0 | ... | t CO2e | ... | 34 |
| ... | ... | ... | ... | ... | ... | ... | ... |
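Once the CSV exists, the standard library is enough to reshape it. The sketch below groups `value_std` by year and indicator, assuming the column names shown in the table above; any additional columns are simply ignored. The `emissions_by_year` helper is hypothetical, and the inline sample reuses the rows shown above so the snippet runs without a real report.

```python
import csv
import io
from collections import defaultdict


def emissions_by_year(csv_file) -> dict[int, dict[str, float]]:
    """Map year -> {indicator: value_std} from a climatextract-style CSV (hypothetical helper).

    Assumes columns named 'year', 'indicator', and 'value_std'; other columns are ignored.
    """
    table: dict[int, dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        table[int(row["year"])][row["indicator"]] = float(row["value_std"])
    return dict(table)


# With a real run, assuming result_path points to the CSV file:
#     with open(result_path, newline="") as f:
#         by_year = emissions_by_year(f)
sample = io.StringIO(
    "report_id,year,indicator,value_std,unit_std,page\n"
    "company_2023_report.pdf,2015,scope 1,135.0,t CO2e,34\n"
    "company_2023_report.pdf,2015,scope 2lb,41962.0,t CO2e,34\n"
    "company_2023_report.pdf,2015,scope 2mb,37674.0,t CO2e,34\n"
    "company_2023_report.pdf,2015,scope 3,1834.0,t CO2e,34\n"
    "company_2023_report.pdf,2016,scope 1,170.0,t CO2e,34\n"
)
by_year = emissions_by_year(sample)
print(by_year[2015]["scope 2mb"])  # 37674.0
```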
Getting Started
New to climatextract? Start here:
- Installation – Set up your environment
- Quickstart – Run your first extraction
Documentation Overview
| Section | Description |
|---|---|
| User Guide | Configuration, running extractions, understanding output |
| Concepts | Architecture, RAG pipeline, prompts, evaluation |
| API Reference | Public API functions |
| Research | Academic background, methodology, citation |