# Understanding Output

After running extraction, climatextract saves results to the `output/<run-id>/` directory. This guide explains the output files and their contents.
## Output Directory Structure

```
output/
└── abc123-uuid/
    ├── raw_results.csv                        # Page-level details (with duplicates)
    ├── raw_results_temp.csv                   # Intermediate extraction results
    ├── results_long_format.csv                # Main results (long format, with duplicates)
    ├── results_wide_format.csv                # Results pivoted by year
    ├── config.json                            # Parameters, metrics, run info (extract only)
    ├── config_and_metrics.json                # Same plus evaluation metrics (extract_and_evaluate only)
    ├── eval_results_vs_benchmark.csv          # (extract_and_evaluate only)
    └── eval_results_metrics_by_ReportName.csv # (extract_and_evaluate only)
```
## Main Results: `results_long_format.csv`

The primary output file, with one row per extracted value. Rows for which the LLM did not extract the desired value (i.e., `value_raw` is NA) are dropped:
| Column | Description |
|---|---|
| `report_id` | Filename of the PDF |
| `year` | Year of the emissions data |
| `indicator` | Scope type (`scope 1`, `scope 2lb`, `scope 2mb`, `scope 3`) |
| `value_std` | Standardized emissions value |
| `unit_std` | Standardized unit (always `t CO2e` for scope indicators) |
| `page` | Page counter where the value was found (may differ from printed page numbers) |
| `dupl_flag` | Duplicate flag; indicates the presence of duplicates, i.e., multiple, possibly conflicting values for a given report_id-year-indicator combination (see details below) |
| `select_flag` | Selection flag from the duplicate resolution mechanism; indicates the selected row containing the preferred value (see details below) |
| **Further details from the extraction process** | |
| `value_raw` | Original value extracted by the LLM |
| `value_score` | LLM confidence score, based on logprobs |
| `unit_raw` | Original unit extracted by the LLM |
| `unit_score` | LLM confidence score, based on logprobs |
| `unit_cat` | Categorized unit (e.g., `kg CO2e`, `Mt CO2e`, `Other`, `Unknown`). The unit conversion table maps `unit_raw` to `unit_cat` so that standardized values (in `t CO2e`) can be calculated. |
Example:
| report_id | year | indicator | value_std | ... | unit_std | ... | page |
|---|---|---|---|---|---|---|---|
| company_2023_report.pdf | 2015 | scope 1 | 135.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2lb | 41962.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 2mb | 37674.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2015 | scope 3 | 1834.0 | ... | t CO2e | ... | 34 |
| company_2023_report.pdf | 2016 | scope 1 | 170.0 | ... | t CO2e | ... | 34 |
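A common first step when working with the long format is keeping only the rows chosen by the duplicate resolution mechanism. A minimal stdlib sketch; the inline sample data is hypothetical, and `select_flag = 1` meaning "selected" is an assumption about the flag's encoding:

```python
import csv
import io

# Hypothetical sample mirroring a subset of the results_long_format.csv
# columns; the second row is a conflicting duplicate that was not selected.
sample = """report_id,year,indicator,value_std,unit_std,page,dupl_flag,select_flag
company_2023_report.pdf,2015,scope 1,135.0,t CO2e,34,1,1
company_2023_report.pdf,2015,scope 1,135.0,t CO2e,57,1,0
"""

# Keep only the rows chosen by the duplicate resolution mechanism
# (assuming select_flag == 1 marks the preferred row).
selected = [row for row in csv.DictReader(io.StringIO(sample))
            if row["select_flag"] == "1"]
```

In practice you would read `results_long_format.csv` from the run directory instead of the inline string.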
## Wide Format: `results_wide_format.csv`

The same data pivoted for easier comparison across scopes, with a single row per `report_id` and `year`. Key columns:
| Column | Description |
|---|---|
| `report_id` | Filename of the PDF |
| `year` | Year of the emissions data |
| `scope_1_value_std` | Scope 1 standardized emissions value |
| `scope_2lb_value_std` | Scope 2 (location-based) standardized emissions value |
| `scope_2mb_value_std` | Scope 2 (market-based) standardized emissions value |
| `scope_3_value_std` | Scope 3 standardized emissions value |
Each scope also has additional detail columns following the pattern `scope_{type}_{field}`, where `{type}` is `1`, `2lb`, `2mb`, or `3`, and `{field}` is one of `value_raw`, `value_score`, `unit_std`, `unit_raw`, `unit_score`, `unit_cat`, `dupl_reason`, or `page`.

`dupl_reason` (e.g., `scope_1_dupl_reason`) contains the prioritization rule used during duplicate resolution and is related to `dupl_flag` and `select_flag` in the long format. The wide format contains only the selected/resolved values from the long format, with `dupl_reason` explaining how each duplicate was resolved.
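The naming pattern makes the full set of detail columns easy to enumerate programmatically, for example when selecting columns after loading the CSV. A small sketch based on the documented `scope_{type}_{field}` pattern:

```python
# Enumerate the wide-format detail columns from the documented
# scope_{type}_{field} naming pattern.
types = ["1", "2lb", "2mb", "3"]
fields = ["value_std", "value_raw", "value_score", "unit_std", "unit_raw",
          "unit_score", "unit_cat", "dupl_reason", "page"]
wide_detail_columns = [f"scope_{t}_{f}" for t in types for f in fields]
```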
## Handling Duplicates

For a given indicator, the pipeline may extract multiple identical or conflicting values from different pages of a PDF report. These duplicates are resolved using three prioritization rules, applied in order:
1. **Identical entries**: drop duplicate rows with the same value and unit on the same page; sets `dupl_reason = 1`
2. **Preferred unit**: keep entries with the preferred unit (`t CO2e`) when available; sets `dupl_reason = 2`
3. **Majority page**: keep entries from the page with the most matches; sets `dupl_reason = 3`

When only a single value was extracted for a report_id-year-indicator combination, no duplicate resolution is necessary and `dupl_reason = 0`.
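The three rules above can be sketched as follows. This is an illustrative reimplementation of the documented semantics, not climatextract's actual code; the row structure (`value`, `unit`, `page` keys) is an assumption:

```python
from collections import Counter


def resolve_duplicates(rows):
    """Sketch of the three prioritization rules (assumed semantics).

    Each row is a dict with 'value', 'unit', and 'page' keys.
    Returns (selected_rows, dupl_reason).
    """
    if len(rows) == 1:
        return rows, 0  # single value: no resolution needed

    # Rule 1: drop identical entries (same value, unit, and page).
    unique = list({(r["value"], r["unit"], r["page"]): r for r in rows}.values())
    if len(unique) == 1:
        return unique, 1

    # Rule 2: keep entries already reported in the preferred unit, t CO2e.
    preferred = [r for r in unique if r["unit"] == "t CO2e"]
    if preferred and len(preferred) < len(unique):
        return preferred, 2

    # Rule 3: keep entries from the page with the most matches.
    top_page, _ = Counter(r["page"] for r in unique).most_common(1)[0]
    return [r for r in unique if r["page"] == top_page], 3
```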
### Duplicate Investigation
Use results_wide_format.csv to investigate why duplicates occurred and which pages contained the data.
## Query Responses: `raw_results.csv`
Detailed page-level information about the extraction process.

Expect one row for each indicator-year-page combination. For example, if 5 pages from a single document are analyzed and the LLM is asked to extract 4 indicators for the last 10 years, you would expect 5 * 4 * 10 = 200 rows. In many rows the desired value will be `Not specified`, as returned by the LLM. The row count can deviate slightly from this calculation if the LLM extracts two or more values from a given page, or returns no information about the value at all (not even `Not specified`).
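The expected row count from the paragraph above is a simple product; a tiny helper makes the sanity check explicit (the function name is illustrative, not part of climatextract):

```python
def expected_raw_rows(n_pages: int, n_indicators: int, n_years: int) -> int:
    """Rough number of rows to expect in raw_results.csv.

    The actual count may deviate slightly, e.g. when the LLM extracts
    multiple values from one page or returns nothing for a combination.
    """
    return n_pages * n_indicators * n_years
```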
Key columns (column names are not final):
| Column | Same as | Description |
|---|---|---|
| `report_name_short` | `report_id` | Filename of the PDF |
| `page_number_used_by_llm` | `page` | Page number analyzed |
| `page_retrieval_scores` | | Semantic similarity scores for page retrieval |
| `page_texts_to_llm` | | Text from the page given to the LLM |
| `raw_llm_response` | | Raw LLM answer output |
| `extracted_scope_from_llm` | `indicator` | Scope extracted by the LLM |
| `extracted_year_from_llm` | `year` | Year extracted by the LLM |
| `extracted_value_from_llm_orig` | | Value extracted by the LLM |
| `extracted_value_from_llm` | `value_raw` | Value extracted by the LLM after minimal processing |
| `standardized_value` | `value_std` | Value after standardization, measured in t CO2e |
| `extracted_unit_from_llm` | `unit_raw` | Unit extracted by the LLM |
| `normalized_unit_from_dictionary` | `unit_cat` | Unit after dictionary normalization |
| `value_probability` | `value_score` | LLM confidence for the extracted value |
| `unit_probability` | `unit_score` | LLM confidence for the extracted unit |
| `duplicate_flag` | `dupl_flag` | Whether the row is a duplicate |
| `select_flag` | `select_flag` | Whether the row was selected after deduplication |
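When combining `raw_results.csv` with the final result files, the "Same as" column above translates into a rename mapping. A sketch of that mapping, taken directly from the table (and, like the raw column names themselves, not final):

```python
# Mapping from raw_results.csv column names to their long-format equivalents,
# as documented in the table above. Columns without a "Same as" entry
# (e.g. page_retrieval_scores) have no counterpart and are omitted.
RAW_TO_FINAL = {
    "report_name_short": "report_id",
    "page_number_used_by_llm": "page",
    "extracted_scope_from_llm": "indicator",
    "extracted_year_from_llm": "year",
    "extracted_value_from_llm": "value_raw",
    "standardized_value": "value_std",
    "extracted_unit_from_llm": "unit_raw",
    "normalized_unit_from_dictionary": "unit_cat",
    "value_probability": "value_score",
    "unit_probability": "unit_score",
    "duplicate_flag": "dupl_flag",
    "select_flag": "select_flag",
}
```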
## Logs

### `config.json` or `config_and_metrics.json`

Stores the configuration, incurred costs, and, if evaluation is configured, the evaluation metrics: a summary of what happened during the run.

When evaluation is enabled, the file is called `config_and_metrics.json`; otherwise, `config.json`.

If MLflow has been activated, the same information is also saved to MLflow for comparison across experiments.
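Because the summary file's name depends on whether evaluation ran, loading it takes a small conditional. A sketch assuming only the two file names documented above (`load_run_summary` is a hypothetical helper, and the JSON's internal keys are not assumed):

```python
import json
from pathlib import Path


def load_run_summary(run_dir: str) -> dict:
    """Load the run summary, preferring config_and_metrics.json when present."""
    for name in ("config_and_metrics.json", "config.json"):
        path = Path(run_dir) / name
        if path.exists():
            return json.loads(path.read_text())
    raise FileNotFoundError(f"no config summary found in {run_dir}")
```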
## Evaluation Output

When evaluation is enabled, additional files are created directly in the run directory:

- `eval_results_vs_benchmark.csv` – Row-by-row comparison with the gold standard.
- `eval_results_metrics_by_ReportName.csv` – Aggregate evaluation metrics grouped per report.
## Next Steps
- MLflow Setup – Configure experiment tracking
- Sharing Large Files – Share PDFs and embeddings with your team