# Understanding Output

After running extraction, climatextract saves results to the `output/<run-id>/` directory. This guide explains the output files and their contents.
## Output Directory Structure

```
output/
└── abc123-uuid/
    ├── raw_results.csv                        # Page-level details (with duplicates)
    ├── raw_results_temp.csv                   # Intermediate extraction results
    ├── results_long_format.csv                # Main results (long format, with duplicates)
    ├── results_wide_format.csv                # Results pivoted by year
    ├── config.json                            # Parameters, metrics, run info (extract only)
    ├── config_and_metrics.json                # Same plus evaluation metrics (extract_and_evaluate only)
    ├── eval_results_vs_benchmark.csv          # (extract_and_evaluate only)
    └── eval_results_metrics_by_ReportName.csv # (extract_and_evaluate only)
```
## Main Results: `results_long_format.csv`

The primary output file, with one row per extracted value. Rows for which the LLM did not extract the desired value (i.e., `value_raw` is NA) are dropped:
| Column | Description |
|---|---|
| `report_id` | Filename of the PDF |
| `year` | Year of the emissions data |
| `indicator` | Scope type (`scope 1`, `scope 2lb`, `scope 2mb`, `scope 3`) |
| `value_std` | Standardized emissions value |
| `unit_std` | Standardized unit (always `t CO2e` for scope indicators) |
| `page` | Page counter where the value was found (may differ from printed page numbers) |
| `dupl_flag` | Duplicate flag; indicates the presence of duplicates, i.e., multiple, possibly conflicting values for a given report_id-year-indicator combination (see details below) |
| `select_flag` | Selection flag from the duplicate resolution mechanism; indicates the selected row containing the preferred value (see details below) |
| **Further details from the extraction process** | |
| `value_raw` | Original value extracted by the LLM |
| `value_score` | LLM confidence score, based on logprobs |
| `unit_raw` | Original unit extracted by the LLM |
| `unit_score` | LLM confidence score, based on logprobs |
| `unit_cat` | Categorized unit (e.g., `kg CO2e`, `Mt CO2e`, `Other`, `Unknown`). The unit conversion table maps `unit_raw` to `unit_cat` so that standardized values (in `t CO2e`) can be calculated. |
Example:
| report_id | year | indicator | value_std | ... | unit_std | ... | page |
|---|---|---|---|---|---|---|---|
| company_2023_report.pdf | 2015 | scope 1 | 135.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2lb | 41962.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2mb | 37674.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 3 | 1834.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2016 | scope 1 | 170.0 | ... | t CO2e | ... | 34 |
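A common first step when working with the long format is keeping only the rows chosen by the duplicate resolution mechanism. A minimal stdlib sketch; the inline sample data is hypothetical, and `select_flag = 1` meaning "selected" is an assumption about the flag's encoding:

```python
import csv
import io

# Hypothetical sample mirroring a subset of the results_long_format.csv
# columns; the second row is a conflicting duplicate that was not selected.
sample = """report_id,year,indicator,value_std,unit_std,page,dupl_flag,select_flag
company_2023_report.pdf,2015,scope 1,135.0,t CO2e,34,1,1
company_2023_report.pdf,2015,scope 1,135.0,t CO2e,57,1,0
"""

# Keep only the rows chosen by the duplicate resolution mechanism
# (assuming select_flag == 1 marks the preferred row).
selected = [row for row in csv.DictReader(io.StringIO(sample))
            if row["select_flag"] == "1"]
```

In practice you would read `results_long_format.csv` from the run directory instead of the inline string.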
## Wide Format: `results_wide_format.csv`

The same data pivoted for easier comparison across scopes, with a single row per `report_id` and `year`. Key columns:
| Column | Description |
|---|---|
| `report_id` | Filename of the PDF |
| `year` | Year of the emissions data |
| `scope_1_value_std` | Scope 1 standardized emissions value |
| `scope_2lb_value_std` | Scope 2 (location-based) standardized emissions value |
| `scope_2mb_value_std` | Scope 2 (market-based) standardized emissions value |
| `scope_3_value_std` | Scope 3 standardized emissions value |
Each scope also has additional detail columns following the pattern `scope_{type}_{field}`, where `{type}` is `1`, `2lb`, `2mb`, or `3`, and `{field}` is one of `value_raw`, `value_score`, `unit_std`, `unit_raw`, `unit_score`, `unit_cat`, `dupl_reason`, or `page`.

`dupl_reason` (e.g., `scope_1_dupl_reason`) contains the prioritization rule used during duplicate resolution and is related to `dupl_flag` and `select_flag` in the long format. The wide format contains only the selected/resolved values from the long format, with `dupl_reason` explaining how each duplicate was resolved.
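The naming pattern makes the full set of detail columns easy to enumerate programmatically, for example when selecting columns after loading the CSV. A small sketch based on the documented `scope_{type}_{field}` pattern:

```python
# Enumerate the wide-format detail columns from the documented
# scope_{type}_{field} naming pattern.
types = ["1", "2lb", "2mb", "3"]
fields = ["value_std", "value_raw", "value_score", "unit_std", "unit_raw",
          "unit_score", "unit_cat", "dupl_reason", "page"]
wide_detail_columns = [f"scope_{t}_{f}" for t in types for f in fields]
```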
## Handling Duplicates

For a given indicator, the pipeline may extract multiple identical or conflicting values from different pages of a PDF report. These duplicates are resolved using three prioritization rules, applied in order:
1. **Identical entries**: drop duplicate rows with the same value and unit on the same page; sets `dupl_reason = 1`
2. **Preferred unit**: keep entries with the preferred unit (`t CO2e`) when available; sets `dupl_reason = 2`
3. **Majority page**: keep entries from the page with the most matches; sets `dupl_reason = 3`

When only a single value was extracted for a report_id-year-indicator combination, no duplicate resolution is necessary and `dupl_reason = 0`.
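The three rules above can be sketched as follows. This is an illustrative reimplementation of the documented semantics, not climatextract's actual code; the row structure (`value`, `unit`, `page` keys) is an assumption:

```python
from collections import Counter


def resolve_duplicates(rows):
    """Sketch of the three prioritization rules (assumed semantics).

    Each row is a dict with 'value', 'unit', and 'page' keys.
    Returns (selected_rows, dupl_reason).
    """
    if len(rows) == 1:
        return rows, 0  # single value: no resolution needed

    # Rule 1: drop identical entries (same value, unit, and page).
    unique = list({(r["value"], r["unit"], r["page"]): r for r in rows}.values())
    if len(unique) == 1:
        return unique, 1

    # Rule 2: keep entries already reported in the preferred unit, t CO2e.
    preferred = [r for r in unique if r["unit"] == "t CO2e"]
    if preferred and len(preferred) < len(unique):
        return preferred, 2

    # Rule 3: keep entries from the page with the most matches.
    top_page, _ = Counter(r["page"] for r in unique).most_common(1)[0]
    return [r for r in unique if r["page"] == top_page], 3
```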
### Duplicate Investigation
Use results_wide_format.csv to investigate why duplicates occurred and which pages contained the data.
## Query Responses: `raw_results.csv`
Detailed page-level information about the extraction process.

Expect one row for each indicator-year-page combination. For example, if 5 pages from a single document are analyzed and the LLM is asked to extract 4 indicators for the last 10 years, you would expect 5 * 4 * 10 = 200 rows. In many rows the desired value will be `Not specified`, as returned by the LLM. The row count can deviate slightly from this calculation if the LLM extracts two or more values from a given page, or returns no information about the value at all (not even `Not specified`).
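The expected row count from the paragraph above is a simple product; a tiny helper makes the sanity check explicit (the function name is illustrative, not part of climatextract):

```python
def expected_raw_rows(n_pages: int, n_indicators: int, n_years: int) -> int:
    """Rough number of rows to expect in raw_results.csv.

    The actual count may deviate slightly, e.g. when the LLM extracts
    multiple values from one page or returns nothing for a combination.
    """
    return n_pages * n_indicators * n_years
```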
Key columns (column names are not final):
| Column | Same as | Description |
|---|---|---|
| `report_name_short` | `report_id` | Filename of the PDF |
| `page_number_used_by_llm` | `page` | Page number analyzed |
| `page_retrieval_scores` | | Semantic similarity scores for page retrieval |
| `page_texts_to_llm` | | Text from the page given to the LLM |
| `raw_llm_response` | | Raw LLM answer output |
| `extracted_scope_from_llm` | `indicator` | Scope extracted by the LLM |
| `extracted_year_from_llm` | `year` | Year extracted by the LLM |
| `extracted_value_from_llm_orig` | | Value extracted by the LLM |
| `extracted_value_from_llm` | `value_raw` | Value extracted by the LLM after minimal processing |
| `standardized_value` | `value_std` | Value after standardization, measured in t CO2e |
| `extracted_unit_from_llm` | `unit_raw` | Unit extracted by the LLM |
| `normalized_unit_from_dictionary` | `unit_cat` | Unit after dictionary normalization |
| `value_probability` | `value_score` | LLM confidence for the extracted value |
| `unit_probability` | `unit_score` | LLM confidence for the extracted unit |
| `duplicate_flag` | `dupl_flag` | Whether the row is a duplicate |
| `select_flag` | `select_flag` | Whether the row was selected after deduplication |
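When combining `raw_results.csv` with the final result files, the "Same as" column above translates into a rename mapping. A sketch of that mapping, taken directly from the table (and, like the raw column names themselves, not final):

```python
# Mapping from raw_results.csv column names to their long-format equivalents,
# as documented in the table above. Columns without a "Same as" entry
# (e.g. page_retrieval_scores) have no counterpart and are omitted.
RAW_TO_FINAL = {
    "report_name_short": "report_id",
    "page_number_used_by_llm": "page",
    "extracted_scope_from_llm": "indicator",
    "extracted_year_from_llm": "year",
    "extracted_value_from_llm": "value_raw",
    "standardized_value": "value_std",
    "extracted_unit_from_llm": "unit_raw",
    "normalized_unit_from_dictionary": "unit_cat",
    "value_probability": "value_score",
    "unit_probability": "unit_score",
    "duplicate_flag": "dupl_flag",
    "select_flag": "select_flag",
}
```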
## Logs

### `config.json` or `config_and_metrics.json`

Stores the configuration, incurred costs, and, if evaluation is configured, the evaluation metrics: a summary of what happened during the run.

When evaluation is enabled, the file is called `config_and_metrics.json`; otherwise, `config.json`.

If MLflow has been activated, the same information is also saved to MLflow for comparison across experiments.
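Because the summary file's name depends on whether evaluation ran, loading it takes a small conditional. A sketch assuming only the two file names documented above (`load_run_summary` is a hypothetical helper, and the JSON's internal keys are not assumed):

```python
import json
from pathlib import Path


def load_run_summary(run_dir: str) -> dict:
    """Load the run summary, preferring config_and_metrics.json when present."""
    for name in ("config_and_metrics.json", "config.json"):
        path = Path(run_dir) / name
        if path.exists():
            return json.loads(path.read_text())
    raise FileNotFoundError(f"no config summary found in {run_dir}")
```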
## Evaluation Output

When evaluation is enabled, additional files are created directly in the run directory:

- `eval_results_vs_benchmark.csv` – Row-by-row comparison with the gold standard.
- `eval_results_metrics_by_ReportName.csv` – Aggregate evaluation metrics grouped per report.
## Next Steps
- MLflow Setup – Configure experiment tracking
- Sharing Large Files – Share PDFs and embeddings with your team