# Token Tracking

This guide covers the token usage tracking and cost analysis system in PersonaGym.

## Overview

PersonaGym automatically tracks all API token usage:

- **Per-module tracking**: Persona generation, query generation, interaction, distractor
- **Per-model tracking**: Usage by each LLM model
- **Cost analysis**: Estimate costs based on token usage
- **Export**: JSON format for downstream analysis

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    TokenTracker (Singleton)                     │
├─────────────────────────────────────────────────────────────────┤
│  record(module, operation, model, provider,                     │
│         input_tokens, output_tokens, metadata=None)             │
│                                ↓                                │
│    ┌───────────────────────────────────────────────────────┐    │
│    │                  TokenUsage Records                   │    │
│    │  - module: str                                        │    │
│    │  - operation: str                                     │    │
│    │  - model: str                                         │    │
│    │  - provider: str                                      │    │
│    │  - input_tokens: int                                  │    │
│    │  - output_tokens: int                                 │    │
│    │  - timestamp: str                                     │    │
│    │  - metadata: dict                                     │    │
│    └───────────────────────────────────────────────────────┘    │
│                                ↓                                │
│  get_statistics() / export_to_file()                            │
└─────────────────────────────────────────────────────────────────┘
```

## Basic Usage

### Automatic Tracking

Token tracking is automatic when using the pipeline:

```python
from src.enhanced_pipeline import EnhancedPersonaGenerationPipeline

pipeline = EnhancedPersonaGenerationPipeline("config.yaml")
result = pipeline.run(num_personas=10)

# Token statistics printed automatically at end
# Also exported to output/training_data/token_usage_*.json
```

### Manual Tracking

```python
from src.token_tracker import get_tracker, record_tokens

# Get singleton tracker
tracker = get_tracker()

# Record token usage
record_tokens(
    module='custom_module',
    operation='generate_text',
    model='gpt-4o-mini',
    provider='openai',
    input_tokens=150,
    output_tokens=75
)

# Get statistics
stats = tracker.get_statistics()
print(f"Total tokens: {stats['summary']['total_tokens']}")
```

## TokenUsage Structure

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenUsage:
    module: str         # Module name (persona_formulation, interaction_generation, etc.)
    operation: str      # Operation type (formulate_prompt, assistant_response, etc.)
    model: str          # Model name (gpt-4o-mini, claude-3.5-haiku, etc.)
    provider: str       # Provider (openai, anthropic, openrouter)
    input_tokens: int   # Input/prompt tokens
    output_tokens: int  # Output/completion tokens
    timestamp: str      # ISO timestamp
    metadata: Dict      # Additional info (turn number, etc.)
```

## Statistics Output

### Summary

```python
stats = tracker.get_statistics()

print(stats['summary'])
# {
#     'total_calls': 500,
#     'total_input_tokens': 250000,
#     'total_output_tokens': 150000,
#     'total_tokens': 400000
# }
```

### By Module

```python
print(stats['by_module'])
# {
#     'persona_formulation': {
#         'input_tokens': 50000,
#         'output_tokens': 25000,
#         'total_tokens': 75000,
#         'call_count': 100
#     },
#     'query_generation': {
#         'input_tokens': 30000,
#         'output_tokens': 15000,
#         'total_tokens': 45000,
#         'call_count': 100
#     },
#     'interaction_generation': {
#         'input_tokens': 150000,
#         'output_tokens': 100000,
#         'total_tokens': 250000,
#         'call_count': 250
#     },
#     'distractor': {
#         'input_tokens': 20000,
#         'output_tokens': 10000,
#         'total_tokens': 30000,
#         'call_count': 50
#     }
# }
```

### By Model

Model keys combine the provider and the model name, separated by `/`:

```python
print(stats['by_model'])
# {
#     'openai/gpt-4o-mini': {
#         'input_tokens': 100000,
#         'output_tokens': 60000,
#         'total_tokens': 160000,
#         'call_count': 200
#     },
#     'openrouter/anthropic/claude-3.5-haiku': {
#         'input_tokens': 80000,
#         'output_tokens': 50000,
#         'total_tokens': 130000,
#         'call_count': 150
#     }
# }
```

## Export Format

### JSON Output

```json
{
  "summary": {
    "total_calls": 500,
    "total_input_tokens": 250000,
    "total_output_tokens": 150000,
    "total_tokens": 400000
  },
  "by_module": {
    "persona_formulation": {
      "input_tokens": 50000,
      "output_tokens": 25000,
      "total_tokens": 75000,
      "call_count": 100
    },
    ...
  },
  "by_model": {
    "openai/gpt-4o-mini": {...},
    ...
  },
  "records": [
    {
      "module": "persona_formulation",
      "operation": "formulate_prompt",
      "model": "gpt-4o-mini",
      "provider": "openai",
      "input_tokens": 150,
      "output_tokens": 75,
      "total_tokens": 225,
      "timestamp": "2026-02-06T10:30:00",
      "metadata": {}
    },
    ...
  ]
}
```

### Export Methods

```python
# Export to file
tracker.export_to_file(
    "output/token_usage.json",
    include_records=True  # Include individual records
)

# Get as dictionary
data = tracker.to_dict(include_records=True)
```

## Cost Analysis

### Estimate Costs

```python
# Approximate pricing (as of 2026), in USD per 1K tokens.
# Check each provider's current price list before relying on these values.
PRICING = {
    'gpt-4o-mini': {'input': 0.00015, 'output': 0.0006},
    'gpt-4o': {'input': 0.005, 'output': 0.015},
    'claude-3.5-haiku': {'input': 0.00025, 'output': 0.00125},
}

def estimate_cost(stats):
    """Sum the estimated cost over all models that have a PRICING entry."""
    total_cost = 0.0
    for model, usage in stats['by_model'].items():
        # Keys look like 'provider/model'; keep only the model name
        model_name = model.split('/')[-1]
        if model_name in PRICING:  # Models without a price are skipped
            pricing = PRICING[model_name]
            input_cost = usage['input_tokens'] / 1000 * pricing['input']
            output_cost = usage['output_tokens'] / 1000 * pricing['output']
            total_cost += input_cost + output_cost
    return total_cost

cost = estimate_cost(tracker.get_statistics())
print(f"Estimated cost: ${cost:.2f}")
```

### Cost per Sample

```python
# `result` is the return value of pipeline.run() and `cost` comes from
# estimate_cost() in the previous snippet.
stats = tracker.get_statistics()
num_samples = result['training_data']['total_samples']

tokens_per_sample = stats['summary']['total_tokens'] / num_samples
print(f"Average tokens per sample: {tokens_per_sample:.0f}")

cost_per_sample = cost / num_samples
print(f"Cost per sample: ${cost_per_sample:.4f}")
```

## Module Breakdown

### Tracked Modules

| Module | Operations | Description |
|--------|------------|-------------|
| `persona_formulation` | `formulate_prompt` | System prompt generation |
| `query_generation` | `style_transfer` | Query style adaptation |
| `interaction_generation` | `assistant_response`, `user_feedback` | Conversation simulation |
| `distractor` | `extract_semantics`, `generate_noise` | Noise injection |

### Cost Distribution

Typical distribution (from analysis):

| Module | % of Total Tokens |
|--------|------------------|
| Interaction Generation | ~78% |
| Query Generation | ~10% |
| Distractor | ~6% |
| Persona Formulation | ~5% |

## Analysis Scripts

### analyze_token_usage.py

```bash
python analysis/analyze_token_usage.py \
    --input output/training_data/token_usage_*.json \
    --output analysis_report.md
```

### analyze_for_paper.py

```bash
python analysis/analyze_for_paper.py
```

Output:

```
=== Token Usage Analysis for Paper ===

Total Samples: 500
Total Token Usage: 8,831,400 tokens
Average Token Cost Per Sample: 17,662.8 tokens

Breakdown by Module:
  persona_formulation: 5.3%
  query_generation: 10.3%
  interaction_generation: 78.6%
  distractor: 5.8%
```

## API Reference

### TokenTracker

```python
from typing import Any, Dict, Optional


class TokenTracker:
    """Singleton token usage tracker (thread-safe)."""

    @classmethod
    def get_instance(cls) -> 'TokenTracker':
        """Get singleton instance."""

    def record(
        self,
        module: str,
        operation: str,
        model: str,
        provider: str,
        input_tokens: int,
        output_tokens: int,
        metadata: Optional[Dict] = None
    ) -> None:
        """Record token usage."""

    def get_statistics(self) -> Dict[str, Any]:
        """Get aggregated statistics."""

    def export_to_file(
        self,
        filepath: str,
        include_records: bool = True
    ) -> None:
        """Export statistics to JSON file."""

    def print_summary(self) -> None:
        """Print formatted summary to console."""

    def reset(self) -> None:
        """Clear all records."""
```

### Convenience Functions

```python
def get_tracker() -> TokenTracker:
    """Get singleton TokenTracker instance."""

def record_tokens(
    module: str,
    operation: str,
    model: str,
    provider: str,
    input_tokens: int,
    output_tokens: int,
    metadata: Optional[Dict] = None
) -> None:
    """Record token usage (convenience function)."""
```

## Thread Safety

The TokenTracker is thread-safe for concurrent recording:

```python
import threading

from src.token_tracker import get_tracker

tracker = get_tracker()

def worker(tracker, module):
    # Each worker records 100 calls under its own module name
    for i in range(100):
        tracker.record(
            module=module,
            operation='test',
            model='gpt-4o-mini',
            provider='openai',
            input_tokens=100,
            output_tokens=50
        )

# Safe for concurrent use
threads = [
    threading.Thread(target=worker, args=(tracker, f'module_{i}'))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

## Best Practices

### 1. Enable Tracking Early

```python
# Tracker is automatically initialized
from src.token_tracker import get_tracker
tracker = get_tracker()
```

### 2. Export Regularly

```python
from datetime import datetime

# Export once enough calls have accumulated, e.g. after each major operation
if tracker.get_statistics()['summary']['total_calls'] > 100:
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    tracker.export_to_file(f"token_usage_{timestamp}.json")
```

### 3. Monitor Cost During Development

```python
# Quick cost check
stats = tracker.get_statistics()
print(f"Tokens so far: {stats['summary']['total_tokens']:,}")
```

### 4. Analyze Before Scale-Up

```python
# Start from a clean tracker so only the test run is counted
tracker.reset()

# Run small test
num_test_personas = 5
result = pipeline.run(num_personas=num_test_personas)

# Check cost using statistics fetched after the run
stats = tracker.get_statistics()
tokens_per_persona = stats['summary']['total_tokens'] / num_test_personas
estimated_total = tokens_per_persona * 1000  # For 1000 personas
print(f"Estimated for 1000 personas: {estimated_total:,.0f} tokens")
```

## See Also

- [Configuration](configuration.md) - Pipeline configuration
- [Training Data](training_data.md) - Output alongside token stats
- [Utils API](../api/utils.md) - TokenTracker API details
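
## Worked Example: Per-Model Cost Breakdown

To make the arithmetic in the Cost Analysis section concrete, the following is a minimal, self-contained sketch that applies the same per-1K-token formula to the sample `by_model` figures from the Statistics Output section. It is not part of PersonaGym: `cost_by_model` and the `unpriced-model` entry are hypothetical names introduced for illustration, and the prices are the approximate values quoted in this guide rather than authoritative rates. Unlike `estimate_cost`, which silently skips models missing from `PRICING`, this variant also returns the skipped keys so that gaps in the price table are visible.

```python
from typing import Dict, List, Tuple

# Approximate USD prices per 1K tokens, copied from the Cost Analysis section.
# Treat them as placeholders and verify against current provider pricing.
PRICING: Dict[str, Dict[str, float]] = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "claude-3.5-haiku": {"input": 0.00025, "output": 0.00125},
}


def cost_by_model(
    stats: dict, pricing: Dict[str, Dict[str, float]]
) -> Tuple[Dict[str, float], List[str]]:
    """Hypothetical helper: return (cost per model key, keys with no price)."""
    costs: Dict[str, float] = {}
    unpriced: List[str] = []
    for model_key, usage in stats["by_model"].items():
        # Keys look like 'provider/model'; the last segment is the model name.
        price = pricing.get(model_key.split("/")[-1])
        if price is None:
            unpriced.append(model_key)
            continue
        costs[model_key] = (
            usage["input_tokens"] / 1000 * price["input"]
            + usage["output_tokens"] / 1000 * price["output"]
        )
    return costs, unpriced


# Sample figures mirroring the 'By Model' output, plus one made-up model
# that has no entry in PRICING.
stats = {
    "by_model": {
        "openai/gpt-4o-mini": {"input_tokens": 100000, "output_tokens": 60000},
        "openrouter/anthropic/claude-3.5-haiku": {
            "input_tokens": 80000,
            "output_tokens": 50000,
        },
        "openrouter/example/unpriced-model": {  # hypothetical entry
            "input_tokens": 1000,
            "output_tokens": 500,
        },
    }
}

costs, unpriced = cost_by_model(stats, PRICING)
for model_key, cost in costs.items():
    print(f"{model_key}: ${cost:.4f}")
print(f"Total: ${sum(costs.values()):.4f}")
print(f"No price available for: {unpriced}")
```

With these sample figures the estimate comes to about $0.051 for `gpt-4o-mini` (100 × 0.00015 + 60 × 0.0006) and $0.0825 for `claude-3.5-haiku` (80 × 0.00025 + 50 × 0.00125), for a total of roughly $0.13. Against a real run, the same helper could be fed the dictionary returned by `tracker.get_statistics()`.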