PersonaGym

Getting Started

  • Installation
    • Requirements
      • Python Dependencies
    • Install from Source
      • Step 1: Clone the Repository
      • Step 2: Create Virtual Environment
      • Step 3: Install Dependencies
    • API Keys Configuration
      • Option 1: Environment Variables (Recommended)
      • Option 2: Configuration File
    • Verify Installation
      • Test Basic Pipeline
    • Troubleshooting
      • Import Errors
      • API Authentication Errors
      • Rate Limiting
      • Encoding Issues (Windows)
    • Development Installation
    • Directory Structure After Installation
    • Next Steps
  • Quick Start
    • 5-Minute Introduction
    • Core Workflows
      • 1. Basic Pipeline (Persona Generation Only)
      • 2. Enhanced Pipeline (Full Data Generation)
      • 3. Stage-by-Stage Execution
      • 4. Skip Specific Features
    • Configuration Overview
    • Understanding Output
      • Output Directory Structure
      • Sample Output Formats
    • Common Patterns
      • Pattern 1: End-to-End Pipeline
      • Pattern 2: Incremental Generation
      • Pattern 3: Custom Persona Features
      • Pattern 4: Multi-Provider Model Pool
    • Next Steps
    • Troubleshooting
      • Empty Responses
      • Slow Generation
      • Memory Issues
  • Tutorials
    • Tutorial Overview
    • Prerequisites
    • What You’ll Learn
    • Example Notebooks
    • Getting Help
    • Let’s Get Started!

User Guide

  • Configuration
    • Configuration File Structure
    • API Configuration
      • Single Provider
      • Environment Variable Reference
      • OpenRouter Configuration
    • Paths Configuration
    • Persona Generation Configuration
      • Sampling Strategies
    • Formulation Configuration
    • Query Generation Configuration
    • Interaction Generation Configuration
    • Distractor Configuration
      • Semantic Distractor Layers
    • Training Data Configuration
    • Experiment Configuration
    • Configuration Validation
      • Validation Checks
    • Environment Variables
    • Complete Example
    • See Also
  • Persona System
    • Overview
    • Persona Dimensions
      • Basic Info (Constraint Dimensions)
      • Communication Style
      • Query Preferences
    • Persona Sampling
      • Configuration
      • Sampling Strategies
      • Diversity Enforcement
    • System Prompt Generation
      • Template
      • LLM Formulator
    • PersonaSpec Data Structure
      • JSON Output Format
    • Persona Storage
      • Saving Personas
      • Incremental Mode
    • Customizing Persona Dimensions
      • Adding New Dimensions
      • Modifying Existing Dimensions
      • Dimension Constraints
    • API Reference
      • PersonaBank
      • PersonaSampler
      • PersonaSpec
    • Best Practices
      • 1. Balance Diversity and Realism
      • 2. Include Essential Dimensions
      • 3. Validate System Prompts
      • 4. Use Incremental Mode for Large Runs
    • See Also
  • Query Generation
    • Overview
    • Query Dataset
      • Input Format
      • Dataset Statistics
    • Query Assignment
      • Configuration
      • Selection Logic
    • Style Transfer
      • Purpose
      • Configuration
      • Template
      • Programmatic Usage
    • Query Storage
      • Persistence
      • Output Format
    • Domain and Scenario Inference
      • Automatic Inference
      • Impact on Persona
    • Batch Generation
      • For Multiple Personas
      • Tracking Used Queries
    • Advanced Usage
      • Custom Query Selection
      • Partial Style Transfer
    • API Reference
      • UserQueryGenerator
      • QueryDataset
      • QueryStorage
    • Best Practices
      • 1. Diverse Query Dataset
      • 2. Balanced Style Transfer
      • 3. Query Deduplication
      • 4. Monitor Query Usage
    • See Also
  • Interaction Generation
    • Overview
    • Architecture
    • Configuration
    • Conversation Flow
      • 1. Initial Query
      • 2. Assistant Response
      • 3. User Feedback
    • Model Pool
      • Weighted Model Selection
      • Model Locking
    • Interaction Data Structure
      • Output Format
    • Programmatic Usage
      • Single Interaction
      • Batch Generation
    • Interaction Storage
      • Incremental Saving
      • Index File
    • Distractor Integration
      • Real-time Noise Application
      • Metadata Tracking
    • Error Handling
      • Retry Logic
      • Supplement Rounds
    • Concurrent Execution
      • Thread Safety
      • Worker Configuration
    • API Reference
      • InteractionGenerator
      • AssistantModel
      • UserFeedbackModel
    • Best Practices
      • 1. Configure Appropriate Turn Limits
      • 2. Use Model Pool for Diversity
      • 3. Enable Incremental Storage
      • 4. Monitor Success Rate
    • See Also
  • Distractor System
    • Overview
    • Architecture
    • Three-Layer Semantic Distractor
      • Layer Overview
      • Layer 1: Surface Noise
      • Layer 2: Incomplete Information
      • Layer 3: Semantic Ambiguity
    • Configuration
      • Enable Semantic Distractor
      • Strategy Configuration
    • Programmatic Usage
      • Create Distractor
      • Apply Noise
      • Batch Processing
    • NoiseResult Data Structure
      • Output Example
    • Intent/Slot Extraction
      • ExtractedSemantics
      • Preservation by Layer
    • Legacy Rule-Based Distractor
      • Configuration
      • Available Strategies
      • Usage
    • Integration with Interactions
      • Real-time Application
      • Metadata Structure
    • API Reference
      • SemanticDistractorModel
      • DistractorModel
      • Factory Function
    • Best Practices
      • 1. Start with Low Activation Probability
      • 2. Balance Layer Weights
      • 3. Use Mandatory Strategies Sparingly
      • 4. Monitor Noise Quality
    • See Also
  • Training Data
    • Overview
    • TrainingSample Structure
      • Output Format
    • Configuration
    • Collection Process
      • From Interactions
      • From Storage
    • Export Process
      • Basic Export
      • With Train/Val/Test Split
      • Export Statistics
    • Output Files
      • Directory Structure
      • Statistics File
    • Programmatic Usage
      • Complete Workflow
      • Custom Transformation
    • Data Validation
      • Sample Validation
      • Batch Validation
    • Format Options
      • Include/Exclude Fields
      • Minimal Format
    • HuggingFace Integration
      • Prepare for Upload
      • Upload to Hub
    • API Reference
      • TrainingSample
      • TrainingDataCollector
      • TrainingDataExporter
    • Best Practices
      • 1. Validate Before Export
      • 2. Use Timestamps for Versioning
      • 3. Monitor Statistics
      • 4. Incremental Export
    • See Also
  • Token Tracking
    • Overview
    • Architecture
    • Basic Usage
      • Automatic Tracking
      • Manual Tracking
    • TokenUsage Structure
    • Statistics Output
      • Summary
      • By Module
      • By Model
    • Export Format
      • JSON Output
      • Export Methods
    • Cost Analysis
      • Estimate Costs
      • Cost per Sample
    • Module Breakdown
      • Tracked Modules
      • Cost Distribution
    • Analysis Scripts
      • analyze_token_usage.py
      • analyze_for_paper.py
    • API Reference
      • TokenTracker
      • Convenience Functions
    • Thread Safety
    • Best Practices
      • 1. Enable Tracking Early
      • 2. Export Regularly
      • 3. Monitor Cost During Development
      • 4. Analyze Before Scale-Up
    • See Also

Examples

  • Basic Pipeline Example
    • Overview
    • Complete Example
    • Expected Output
    • Command Line Equivalent
    • Configuration
    • Key Takeaways
    • Next Steps
  • Enhanced Pipeline Example
    • Overview
    • Complete Example
    • Expected Output
    • Command Line Equivalent
    • Stage Control
    • Key Takeaways
    • Next Steps
  • Custom Persona Example
    • Overview
    • Custom Dimensions
      • Edit persona.yaml
    • Custom Sampling
      • Edit sampling_config.yaml
    • Programmatic Example
    • Key Takeaways
    • Next Steps
  • Multi-Provider LLM Example
    • Overview
    • Configuration
    • Programmatic Usage
    • Model Selection
    • Token Tracking
    • Key Takeaways
    • See Also

API Reference

  • API Reference
    • Overview
    • Quick Links
      • Pipeline
      • LLM Client
      • Persona
      • Query
      • Interaction
      • Distractor
      • Training Data
      • Utils
    • Quick Module Reference
      • Core Classes
      • Data Classes
      • Factory Functions
    • Module Dependencies
    • Type Hints
    • Error Handling
    • Search
  • Pipeline API
    • EnhancedPersonaGenerationPipeline
      • Class Definition
      • Constructor
      • Methods
        • run
      • Attributes
      • Usage Example
    • PersonaGenerationPipeline
      • Class Definition
      • Constructor
      • Methods
        • run
        • reset
      • Usage Example
    • Command Line Interface
      • Usage
      • Options
      • Stages
      • Examples
    • See Also
  • LLM Client API
    • Overview
    • Factory Function
      • create_llm_client
    • LLMClient (Abstract Base)
    • OpenAIClient
      • Constructor
      • Methods
        • generate
        • generate_with_tokens
      • Attributes
    • LLMFormulator
      • Constructor
      • Methods
        • formulate
    • Usage Examples
      • Basic Usage
      • OpenRouter Usage
      • System Prompt Generation
    • Error Handling
    • See Also
  • Persona API
    • PersonaBank
      • Class Definition
      • Constructor
      • Methods
    • PersonaSampler
      • Class Definition
      • Constructor
      • Methods
    • PersonaSpec
      • Class Definition
      • Methods
      • Utility Functions
    • PersonaSpecStorage
      • Class Definition
      • Constructor
      • Methods
    • Usage Examples
      • Complete Workflow
    • See Also
  • Query API
    • UserQueryGenerator
      • Class Definition
      • Constructor
      • Methods
    • QueryDataset
      • Class Definition
      • Constructor
      • Methods
    • QueryStorage
      • Class Definition
      • Constructor
      • Methods
    • Usage Examples
      • Generate Queries
      • Batch Generation
      • Persist Queries
    • See Also
  • Interaction API
    • InteractionGenerator
      • Class Definition
      • Constructor
      • Methods
    • Interaction
      • Class Definition
      • Methods
    • Message
      • Class Definition
    • InteractionStorage
      • Class Definition
      • Constructor
      • Methods
    • AssistantModel
      • Class Definition
      • Methods
    • UserFeedbackModel
      • Class Definition
      • Methods
    • Usage Examples
      • Single Interaction
      • Batch Generation
    • See Also
  • Distractor API
    • Factory Function
    • SemanticDistractorModel
      • Class Definition
      • Constructor
      • Methods
      • Attributes
    • DistractorModel
      • Class Definition
      • Methods
    • NoiseResult
      • Class Definition
      • Methods
    • NoisyVersion
      • Class Definition
    • IntentSlotExtractor
      • Class Definition
      • Methods
    • LLMNoiseGenerator
      • Class Definition
      • Methods
    • Usage Examples
      • Semantic Distractor
      • Legacy Distractor
      • Intent Extraction
    • See Also
  • Training Data API
    • TrainingSample
      • Class Definition
      • Methods
    • TrainingDataCollector
      • Class Definition
      • Constructor
      • Methods
    • TrainingDataExporter
      • Class Definition
      • Constructor
      • Methods
    • Usage Examples
      • Collect from Interactions
      • Export to Files
      • Compute Statistics
      • Complete Pipeline Integration
    • Output Format
      • Sample JSON
      • Statistics JSON
    • See Also
  • Utils API
    • TokenTracker
      • Class Definition
      • Class Methods
      • Instance Methods
      • Convenience Functions
    • TokenUsage
      • Class Definition
    • ColoredLogger
      • Functions
      • Color Types
    • Config Validation
      • Functions
      • ValidationCheck
      • ValidationIssue
    • Usage Examples
      • Token Tracking
      • Colored Output
      • Config Validation
    • Thread Safety
    • See Also

Additional Information

  • License
    • MIT License
    • Third-Party Licenses
    • Commercial Use
    • Attribution
    • Questions

© Copyright 2026, PersonaGym Team.