Why completions swing costs
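A verbose assistant can double spend even when the question is tiny, because completion tokens are billed on top of the prompt and many providers price them higher than input tokens. Setting a maximum completion length, adopting structured outputs, and trimming reasoning traces in production are levers that teams combine with routing and caching.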
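To make the effect of completion length concrete, here is a minimal sketch of the arithmetic. The prices and the names `request_cost` and `capped_cost` are illustrative assumptions, not any provider's actual rates or API; substitute your own price list.

```python
# Hypothetical prices for illustration only; replace with your provider's rates.
INPUT_PRICE_PER_M = 1.00   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens (assumed)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the assumed prices."""
    return (
        input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def capped_cost(input_tokens: int, output_tokens: int, max_output_tokens: int) -> float:
    """Cost when the completion is cut off at max_output_tokens."""
    return request_cost(input_tokens, min(output_tokens, max_output_tokens))


# Same short question, three completion lengths.
terse = request_cost(input_tokens=200, output_tokens=100)
verbose = request_cost(input_tokens=200, output_tokens=800)
capped = capped_cost(input_tokens=200, output_tokens=800, max_output_tokens=300)

print(f"terse:   ${terse:.4f}")
print(f"verbose: ${verbose:.4f}")
print(f"capped:  ${capped:.4f}")
```

Under these assumed prices the prompt is identical in all three cases, so the whole difference in cost comes from the completion.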
Tool calls and hidden bytes
Function schemas and intermediate tool results usually bill as input tokens on the next turn, or inline if your SDK bundles them. Surface that in customer-facing docs so no one is surprised at month-end.
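The gap between what a user types and what gets billed can be sketched as follows. This is a simplified model, assuming the tool schemas and the full conversation history are resent with every request; prompt caching or server-side conversation state would change the numbers. The `Turn` type, `billed_input_tokens`, and all token counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    new_input_tokens: int    # new user message or tool result sent this turn
    completion_tokens: int   # tokens the model generated this turn


def billed_input_tokens(turns: list[Turn], schema_tokens: int) -> int:
    """Total input tokens billed across all turns.

    Assumes every request carries the tool schemas plus the full history
    so far (an assumption; check how your SDK and provider actually bill).
    """
    total = 0
    history = 0
    for turn in turns:
        history += turn.new_input_tokens
        total += schema_tokens + history     # this request's billed input
        history += turn.completion_tokens    # model output joins the history
    return total


conversation = [
    Turn(new_input_tokens=50, completion_tokens=30),    # user asks; model emits a tool call
    Turn(new_input_tokens=400, completion_tokens=120),  # tool result returned; model answers
]

user_typed = 50
billed = billed_input_tokens(conversation, schema_tokens=250)
print(f"user typed {user_typed} tokens, billed input was {billed} tokens")
```

In this sketch the user typed 50 tokens, yet the billed input is many times larger because the schemas are sent twice and the tool result plus the earlier turn are replayed as input.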
Hybrid and blended meters
If your product blends premium-path and economy models under one price list, explain which lane receives which traffic—see hybrid tokens and the disclosure pattern.
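One way to show which lane receives which traffic is to publish the traffic split next to the blended rate it produces. The sketch below is illustrative only: the lane names, shares, prices, and the `blended_price` helper are all assumptions, not a prescribed disclosure format.

```python
def blended_price(lanes: dict[str, tuple[float, float]]) -> float:
    """Return the traffic-weighted price per 1M tokens.

    `lanes` maps a lane name to (traffic share, price per 1M tokens).
    Shares must sum to 1.
    """
    total_share = sum(share for share, _ in lanes.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price for share, price in lanes.values())


# Hypothetical split: 20% of traffic on a premium model, 80% on an economy one.
lanes = {
    "premium": (0.2, 10.0),
    "economy": (0.8, 1.0),
}

rate = blended_price(lanes)
for name, (share, price) in lanes.items():
    print(f"{name}: {share:.0%} of traffic at ${price:.2f} per 1M tokens")
print(f"blended: ${rate:.2f} per 1M tokens")
```

Publishing the per-lane rows, and not only the blended figure, lets customers see how a shift in routing would move their bill.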