The FinOps Code for AI Cost Control

As organizations increasingly adopt Large Language Models (LLMs), AI agents, and multi-agent systems, managing operational costs becomes significantly more challenging. Traditional FinOps practices were designed around infrastructure resources such as virtual machines, containers, storage, and network traffic. However, AI workloads introduce a new cost driver: tokens. This repository accompanies the presentation "The FinOps Code for AI Cost Control", which explores how organizations can move beyond infrastructure-level cost allocation and gain visibility into AI consumption at the application level using OpenTelemetry. The project demonstrates practical techniques for tracing, monitoring, and allocating AI costs by tracking token usage across applications, services, workflows, and autonomous agents.

From Resource Tagging to Token Tracing:

Shifting from Infrastructure Tagging to Application-Level Telemetry

FinOps Evolution - The 7 Phases

The public cloud has existed for more than two decades, since AWS launched its first services in 2002.

FinOps = the discipline and practice of managing cloud spend

Phase 1: Observational FinOps

The Infancy

Focus: Accessing and collecting cost and usage data.

Goal: Gaining a clear picture of consumption and cost before receiving the cloud bill.

Challenges: Early cloud providers did not expose cost data effectively, and reporting formats varied by vendor.

Improvements: FOCUS (FinOps Open Cost and Usage Specification) provides uniform cost and usage datasets.

Status: Observational FinOps is a necessary, foundational component.

Phase 2: Analytical FinOps

The Childhood

Focus: Analyzing collected data to understand the underlying drivers of cost.

Challenge: It is often hard to see where the money goes, because managed services bundle compute, network, and storage costs, and idle resources still accrue cost.

Outcome:

- Extracting meaning from data is vital for actual optimization.

- Leads to identifying potential waste, detecting anomalies, and defining automated guardrails.

Phase 3: Attributional FinOps

The Adolescence

Focus: Attributing the undifferentiated cost of resources to specific services to manage infrastructure costs.

Process: Starts with foundational practices like resource tagging.

Complexity: Attribution becomes complicated with shared resources such as load balancers and Kubernetes clusters.

Impact: Closes the FinOps feedback loop by providing financial data back to engineers, allowing them to evaluate how their components impact the overall system cost.
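One common way to attribute a shared resource is to split its cost in proportion to a usage metric. The sketch below is illustrative only: the function name, the $300 load-balancer cost, and the use of request counts as the weighting metric are hypothetical assumptions, and real allocation policies may weight by CPU, memory, or bytes transferred instead.

```python
"""Split a shared resource's cost across services in proportion to usage."""


def allocate_shared_cost(total_cost: float, usage_by_service: dict[str, float]) -> dict[str, float]:
    """Return each service's share of total_cost, weighted by its recorded usage."""
    total_usage = sum(usage_by_service.values())
    if total_usage == 0:
        raise ValueError("cannot allocate: no recorded usage")
    return {
        service: total_cost * usage / total_usage
        for service, usage in usage_by_service.items()
    }


if __name__ == "__main__":
    # Hypothetical example: a $300/month load balancer, weighted by requests served.
    shares = allocate_shared_cost(300.0, {"checkout": 600, "search": 300, "profile": 100})
    for service, cost in shares.items():
        print(f"{service}: ${cost:.2f}")
```

Proportional splitting keeps the allocated shares summing to the original bill, which is what lets engineers compare their component's cost against the whole system.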

Phase 4: Applied FinOps

The Early Adulthood

Focus: Applying changes based on analysis to achieve financial goals.

Core Practices:

- Smart use of Committed Use Discounts (CUDs)

- Spot/Preemptible instance utilization

- Right-sizing

- Data tiering

- Waste identification and elimination

- Outsourcing or insourcing (based on cost-effectiveness)

Challenge: Changes are often applied reactively, as an afterthought, rather than being integrated into system design.

Phase 5: Architectural FinOps

The Adulthood

Focus: Returning to the design board to build systems with cost, alongside reliability and performance, as a key consideration.

Process: Relies on feedback from all preceding FinOps practices to identify bottlenecks and costly system parts.

Examples:

- Rewriting resource-intensive code in compiled languages such as C or Rust.

- Smart use of queueing and caching.

- Re-evaluating autoscaling strategies.

Note: Autoscaling and microservices can become a source of waste if not correctly designed.

Phase 6: Automated FinOps

The Maturity

The Necessity: The FinOps Feedback Loop is time-consuming and requires unwavering discipline, especially with growing system complexity. The solution is automation.

Definition: Codifying the analysis and application of FinOps knowledge to occur continuously throughout the software delivery lifecycle.

Essence: Continuous evaluation and automated balancing of the conflicting concerns of performance, reliability, and cost.

Conclusion: Automated FinOps is the only way to manage costs in 2026; without it, manual calculation leads either to burnout or to rigid guardrails that hurt innovation.

Phase 7: Integrated FinOps

The Future

Near-term: Automation will continue to evolve, with AI/ML augmenting existing FinOps observability and analysis capabilities.

Major Shift: Integrating all FinOps practices—from observation to automation—into the platforms used to run software.

Benefit: This integration makes FinOps accessible and proactive, enabling continuous infrastructure optimization aligned with business goals.

Next: Practical Implementation with AI Cost Control.

The New FinOps Problem: Runaway Tokens

Old Cloud FinOps Challenge: The untagged resource (costs that build up over weeks).

New AI FinOps Challenge: Runaway tokens (budget-busting cost spikes within hours).

Problem: Unoptimized prompts sent to expensive LLMs can cause costs to explode rapidly.

Solution Shift: Move from infrastructure tagging to application-level telemetry to track every token in real-time.

Key Focus: Autonomous AI agents that do work on behalf of the whole team.
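Tracking tokens in real time makes it possible to catch a runaway loop within minutes rather than on the next bill. The following is a minimal sketch, assuming a hypothetical `TokenBudgetGuard` class that sums tokens over a sliding time window; the limits are placeholders, and timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
"""Flag runaway token consumption with a sliding time window."""
from collections import deque


class TokenBudgetGuard:
    """Tracks tokens spent within a time window and reports when a limit is exceeded."""

    def __init__(self, max_tokens: int, window_seconds: float) -> None:
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)
        self._total = 0  # tokens currently inside the window

    def record(self, tokens: int, now: float) -> bool:
        """Record one call's token usage; return True if the window is over budget."""
        self._events.append((now, tokens))
        self._total += tokens
        # Evict events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_tokens = self._events.popleft()
            self._total -= old_tokens
        return self._total > self.max_tokens


if __name__ == "__main__":
    guard = TokenBudgetGuard(max_tokens=10_000, window_seconds=3600)
    for t, tokens in [(0.0, 4000), (600.0, 4000), (1200.0, 4000)]:
        print(t, guard.record(tokens, now=t))
```

In practice a `True` result would feed an alert or a circuit breaker; which action to take is a policy decision the sketch leaves open.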

FinOps 1.0 vs. FinOps for AI

Integrated Solution: OpenTelemetry (OTel)

✅ What is OTel?
An open-source framework for collecting observability data (traces, metrics, logs).

✅ How it Helps:
Use tracing capabilities to wrap LLM calls and inject critical FinOps context.

✅ Goal:
Embed FinOps intelligence directly into the application layer to report on every token instantly.
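The idea of wrapping LLM calls can be illustrated without any tracing library. This sketch uses a hypothetical decorator, `traced_llm_call`, and a fake client that returns its own token counts, while a plain list stands in for a span exporter. It is not the OpenTelemetry API, only a model of what such a wrapper records, using the attribute names from this document.

```python
"""Wrap an LLM call so every invocation emits a span-like record with token usage."""
import functools
import time
from typing import Any, Callable

RECORDED_SPANS: list[dict[str, Any]] = []  # stand-in for a span exporter


def traced_llm_call(project_code: str) -> Callable:
    """Decorator factory: tag each call with a FinOps project code and its token usage."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            RECORDED_SPANS.append({
                "name": fn.__name__,
                "finops.project_code": project_code,
                "llm.model_used": result["model"],
                "llm.usage.input_tokens": result["input_tokens"],
                "llm.usage.output_tokens": result["output_tokens"],
                "duration_s": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator


@traced_llm_call(project_code="BLOG-FINOPS-001")
def fake_llm(prompt: str) -> dict[str, Any]:
    """Hypothetical LLM client that reports token usage alongside its answer."""
    text = f"echo: {prompt}"
    return {
        "text": text,
        "model": "gemini-2.5-flash-latest",
        "input_tokens": len(prompt.split()),  # crude stand-in for a tokenizer
        "output_tokens": len(text.split()),
    }


if __name__ == "__main__":
    fake_llm("summarize the quarterly cloud bill")
    print(RECORDED_SPANS)
```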

Code Example:
The OpenTelemetry framework is used with Google's Agent Development Kit (ADK) for cost allocation tracking.

How FinOps Tags Get In (Code Breakdown)

Mechanism: Wrap the agent's activity in a custom OpenTelemetry Span that carries budget details.

1. Starting the FinOps Span:

Declares a parent span. Subsequent instrumented code (like ADK's internal LLM calls) creates child spans.

2. Adding FinOps Metadata (Tags):

Injects cost-center details as attributes on the parent span. Child spans (which carry the token counts) share its trace context, so they are automatically linked to it for allocation.
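The two steps above can be modeled with a tiny stand-in for a tracer so that the parent/child relationship is visible. This sketch uses only the standard library; with the OpenTelemetry Python API you would instead call `tracer.start_as_current_span(...)` and `span.set_attribute(...)`, which likewise track the current span through `contextvars`. The function names and the hard-coded token counts are hypothetical.

```python
"""Toy tracer: a parent FinOps span whose context is picked up by child spans."""
from __future__ import annotations

import contextvars
import itertools
import uuid
from contextlib import contextmanager
from typing import Any, Iterator, Optional

# The span that is "current" in this execution context (None when no span is open).
_current_span: contextvars.ContextVar[Optional[dict[str, Any]]] = contextvars.ContextVar(
    "current_span", default=None
)
_span_ids = itertools.count(1)
FINISHED: list[dict[str, Any]] = []  # stand-in for a span exporter


@contextmanager
def start_span(name: str) -> Iterator[dict[str, Any]]:
    """Open a span as a child of the current span (if any) and make it current."""
    parent = _current_span.get()
    span = {
        "name": name,
        "span_id": next(_span_ids),
        "parent_id": parent["span_id"] if parent else None,
        # A root span starts a new trace; children reuse the parent's trace ID.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "attributes": {},
    }
    token = _current_span.set(span)
    try:
        yield span
    finally:
        _current_span.reset(token)  # restore the previous current span
        FINISHED.append(span)


def call_llm(prompt: str) -> str:
    """Hypothetical instrumented LLM call: it opens its own child span."""
    with start_span("llm_call") as span:
        span["attributes"]["llm.usage.input_tokens"] = len(prompt.split())
        span["attributes"]["llm.usage.output_tokens"] = 3  # hard-coded for the demo
        return "ok"


def run_agent() -> None:
    """Steps 1 and 2: start the FinOps span, tag it, then do instrumented work."""
    with start_span("finops_agent_run") as root:
        root["attributes"]["finops.project_code"] = "BLOG-FINOPS-001"
        call_llm("draft a blog post about FinOps")


if __name__ == "__main__":
    run_agent()
    for finished in FINISHED:
        print(finished)
```

Note that `finops.project_code` lives only on the parent span: children are tied to it through `trace_id` and `parent_id`, so a cost report has to join on those IDs rather than expect the tag to be copied onto every child.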

The Critical FinOps Metrics

These attributes, recorded on the FinOps parent span and its nested LLM call spans, are essential for a billable cost report:

finops.project_code

The Cost Center for Allocation (e.g., BLOG-FINOPS-001).

Purpose: Allocation

llm.usage.input_tokens

Cost Metric 1: Tokens sent to the model (one part of the bill).

llm.usage.output_tokens

Cost Metric 2: Tokens received from the model (the other part of the bill).

llm.model_used

The Pricing Tier for calculation (e.g., gemini-2.5-flash-latest).

Purpose: Calculation
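As a worked example of the calculation step, the sketch below prices one LLM span from its `llm.model_used`, `llm.usage.input_tokens`, and `llm.usage.output_tokens` attributes. The prices are placeholders, not actual Gemini pricing, and quoting them per million tokens is an assumption; `Decimal` is used to avoid floating-point rounding in money values.

```python
"""Turn the critical FinOps attributes of one LLM span into a cost figure."""
from decimal import Decimal

# Placeholder prices in USD per 1M tokens; replace with your provider's price list.
PRICE_PER_MILLION = {
    "gemini-2.5-flash-latest": {"input": Decimal("0.30"), "output": Decimal("2.50")},
}


def span_cost(attributes: dict) -> Decimal:
    """Cost of one LLM call: tokens times the per-million-token price, by direction."""
    prices = PRICE_PER_MILLION[attributes["llm.model_used"]]
    input_cost = attributes["llm.usage.input_tokens"] * prices["input"]
    output_cost = attributes["llm.usage.output_tokens"] * prices["output"]
    return (input_cost + output_cost) / Decimal(1_000_000)


if __name__ == "__main__":
    print(span_cost({
        "llm.model_used": "gemini-2.5-flash-latest",
        "llm.usage.input_tokens": 12_000,
        "llm.usage.output_tokens": 3_000,
    }))
```

For 12,000 input tokens and 3,000 output tokens at these placeholder prices, the span costs (12,000 × 0.30 + 3,000 × 2.50) / 1,000,000 = $0.0111.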

Advanced FinOps: Multi-Agent Flows

Scenario:

Complex workflows where one agent delegates work to another.

OTel Power:

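The trace context of the parent span is automatically carried down the call chain within a process, and can be propagated across service boundaries through context propagation headers.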

Result:

The entire multi-step agent choreography executes within the initial FinOps Span's context.

Benefit:

One unified, auditable cost report for the whole complex workflow, covered by a single, top-level FinOps tag.
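When agents delegate work concurrently, propagation relies on the runtime carrying context into the new tasks. In Python, `asyncio` tasks start with a copy of the caller's `contextvars` context, so a trace opened by an orchestrator is visible to every sub-agent it fans out to. This minimal sketch tracks a bare trace ID rather than full spans, and the agent names are hypothetical.

```python
"""Show trace context flowing into concurrently delegated sub-agents (asyncio)."""
from __future__ import annotations

import asyncio
import contextvars
import uuid
from typing import Optional

# Trace ID of the current flow; asyncio tasks inherit a copy of this context.
current_trace: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_trace", default=None
)
seen: list[tuple[str, Optional[str]]] = []  # (agent name, trace ID it observed)


async def sub_agent(name: str) -> None:
    """Hypothetical delegated agent that records which trace it ran under."""
    await asyncio.sleep(0)  # stands in for an awaited LLM or tool call
    seen.append((name, current_trace.get()))


async def orchestrator() -> str:
    """Top-level agent: opens the trace, then fans work out to sub-agents."""
    trace_id = uuid.uuid4().hex
    current_trace.set(trace_id)
    # gather() wraps each coroutine in a Task, and each Task copies the current
    # context at creation time, so every sub-agent sees trace_id.
    await asyncio.gather(
        sub_agent("researcher"), sub_agent("writer"), sub_agent("reviewer")
    )
    return trace_id


if __name__ == "__main__":
    root_trace = asyncio.run(orchestrator())
    print(root_trace, seen)
```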

The Tutorial

IMPORTANT: The tutorial uses `ConsoleSpanExporter`, which prints spans to the terminal.
DO NOT use it in production.

Production Setup

Replace it with a dedicated OTLP exporter that sends data to a robust backend service:

Google Cloud Trace

Observability backends, managed or self-hosted (Jaeger, Datadog, New Relic)

Backend Value:

Enables querying and aggregation for FinOps reports.
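Once spans reach a backend, a FinOps report is essentially a join and a sum. The sketch below assumes spans exported as plain dictionaries with a `trace_id` and an `attributes` map, one `finops.project_code` per trace (set on the top-level FinOps span), and placeholder prices; LLM spans in traces without a tag are grouped under a hypothetical `UNALLOCATED` bucket.

```python
"""Roll exported spans up into a cost-per-project report."""
from collections import defaultdict
from decimal import Decimal

# Placeholder USD prices per 1M tokens (input, output); not real pricing.
PRICES = {"gemini-2.5-flash-latest": (Decimal("0.30"), Decimal("2.50"))}


def cost_by_project(spans: list[dict]) -> dict[str, Decimal]:
    """Join each LLM span's tokens to the finops.project_code found in the same trace."""
    project_of_trace = {
        s["trace_id"]: s["attributes"]["finops.project_code"]
        for s in spans
        if "finops.project_code" in s["attributes"]
    }
    totals: dict[str, Decimal] = defaultdict(Decimal)  # missing keys start at 0
    for s in spans:
        attrs = s["attributes"]
        if "llm.model_used" not in attrs:
            continue  # not an LLM call span
        in_price, out_price = PRICES[attrs["llm.model_used"]]
        cost = (attrs["llm.usage.input_tokens"] * in_price
                + attrs["llm.usage.output_tokens"] * out_price) / Decimal(1_000_000)
        project = project_of_trace.get(s["trace_id"], "UNALLOCATED")
        totals[project] += cost
    return dict(totals)
```

In a real backend the same logic would typically be expressed as a query that groups by trace and project; a non-empty `UNALLOCATED` bucket signals agent activity running outside a FinOps span.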

Resources & Links

ADK Observability

Confirms ADK's native, built-in support for OpenTelemetry instrumentation.

https://google.github.io/adk-docs/observability/

Getting Started with OpenTelemetry

Explains Spans, Context, and Context Propagation.

https://opentelemetry.io/docs/languages/go/getting-started/

ADK Agents

Details agent types that benefit from this tracing.

https://google.github.io/adk-docs/agents/

Runnable Code

Complete implementation examples and tutorials.

https://github.com/antweiss/finops-ai-otel

Images for presentations on various IT strategy management platforms.
