API Reference

This section provides detailed API documentation for all PersonaGym modules.

Overview

PersonaGym is organized into the following modules:

| Module | Description |
|---|---|
| Pipeline | Main pipeline classes for orchestrating data generation |
| LLM Client | Multi-provider LLM client interface |
| Persona | Persona generation and management |
| Query | Query generation and style transfer |
| Interaction | Multi-turn dialogue generation |
| Distractor | Noise injection for robustness |
| Training Data | Training sample collection and export |
| Utils | Utility classes (TokenTracker, Logger, etc.) |

Quick Module Reference

Core Classes

| Class | Module | Description |
|---|---|---|
| EnhancedPersonaGenerationPipeline | enhanced_pipeline | Full 6-stage pipeline |
| PersonaGenerationPipeline | pipeline | Basic persona generation |
| LLMClient | llm_client | Abstract LLM client |
| OpenAIClient | llm_client | OpenAI implementation |
| PersonaBank | persona_bank | Persona dimension manager |
| PersonaSpec | persona_spec | Persona specification |
| PersonaSampler | sampling | Persona feature sampler |
| UserQueryGenerator | query_generator | Query generation |
| InteractionGenerator | interaction_generator | Dialogue simulation |
| DistractorModel | distractor | Rule-based noise |
| SemanticDistractorModel | distractor | Semantic noise |
| TokenTracker | token_tracker | Token usage tracking |
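
The relationship between the abstract LLMClient, a concrete client such as OpenAIClient, and a config-driven factory can be sketched roughly as follows. This is an illustration only: the generate() method, the "provider" config key, and the EchoClient class are hypothetical stand-ins chosen so the example runs without network access, and they are not taken from the PersonaGym source.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class LLMClient(ABC):
    """Hypothetical abstract base; the real interface may differ."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""


class EchoClient(LLMClient):
    """Offline stand-in for a provider-specific client such as OpenAIClient."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Registry mapping an assumed "provider" config value to a client class.
_PROVIDERS: Dict[str, Type[LLMClient]] = {"echo": EchoClient}


def create_llm_client(config: Dict[str, Any]) -> LLMClient:
    """Pick a client class from the config (the key name is an assumption)."""
    provider = config.get("provider")
    if provider not in _PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return _PROVIDERS[provider]()


client = create_llm_client({"provider": "echo"})
print(client.generate("hello"))
```

Code that depends only on the abstract type can swap providers by changing the configuration rather than the call sites.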

Data Classes

| Class | Module | Description |
|---|---|---|
| PersonaSpec | persona_spec | Persona specification |
| Interaction | interaction_generator | Conversation record |
| Message | interaction_generator | Single message |
| TrainingSample | training_data | Training data sample |
| NoiseResult | noise_generator | Noise application result |
| NoisyVersion | distractor | Legacy noise result |
| TokenUsage | token_tracker | Token usage record |

Factory Functions

| Function | Module | Description |
|---|---|---|
| create_llm_client() | llm_client | Create LLM client from config |
| create_distractor_model() | distractor | Create distractor from config |
| generate_persona_id() | persona_spec | Generate persona ID |
| get_tracker() | token_tracker | Get singleton TokenTracker |
| record_tokens() | token_tracker | Record token usage |
| validate_config() | config_validation | Validate configuration |
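
As a rough illustration of the singleton pattern behind get_tracker() and record_tokens(), the sketch below uses stand-in definitions. The actual signatures are not shown on this page, so the parameter names (model, prompt_tokens, completion_tokens) and the TokenUsage and TokenTracker internals are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TokenUsage:
    """Hypothetical usage record; field names are assumptions."""
    model: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@dataclass
class TokenTracker:
    """Hypothetical stand-in for token_tracker.TokenTracker."""
    records: List[TokenUsage] = field(default_factory=list)

    def record(self, usage: TokenUsage) -> None:
        self.records.append(usage)

    def total(self) -> int:
        return sum(r.total_tokens for r in self.records)


_tracker: Optional[TokenTracker] = None


def get_tracker() -> TokenTracker:
    """Return the shared tracker, creating it on first use."""
    global _tracker
    if _tracker is None:
        _tracker = TokenTracker()
    return _tracker


def record_tokens(model: str, prompt_tokens: int, completion_tokens: int) -> None:
    """Convenience wrapper that records usage on the shared tracker."""
    get_tracker().record(TokenUsage(model, prompt_tokens, completion_tokens))


record_tokens("example-model", prompt_tokens=120, completion_tokens=30)
record_tokens("example-model", prompt_tokens=80, completion_tokens=20)
print(get_tracker().total())  # 250
```

Because every caller goes through get_tracker(), usage recorded in different pipeline stages accumulates in one place.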

Module Dependencies

enhanced_pipeline
├── persona_bank
├── sampling
├── persona_spec
├── llm_client
├── query_generator
│   └── query_storage
├── interaction_generator
│   ├── llm_client
│   └── distractor
│       ├── intent_extractor
│       └── noise_generator
├── training_data
└── token_tracker
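
The tree above can also be written as a mapping and sorted so that every module appears after the modules it depends on. The sketch below uses the standard-library graphlib module (Python 3.9+); the DEPENDENCIES dictionary is transcribed from the tree, and the ordering it produces is for illustration, not a statement about how PersonaGym itself loads modules.

```python
from graphlib import TopologicalSorter

# Each module maps to the modules listed directly beneath it in the tree.
DEPENDENCIES = {
    "enhanced_pipeline": {
        "persona_bank", "sampling", "persona_spec", "llm_client",
        "query_generator", "interaction_generator", "training_data",
        "token_tracker",
    },
    "query_generator": {"query_storage"},
    "interaction_generator": {"llm_client", "distractor"},
    "distractor": {"intent_extractor", "noise_generator"},
}

# static_order() yields dependencies before the modules that use them.
order = list(TopologicalSorter(DEPENDENCIES).static_order())

for name in order:
    print(name)
```

The top-level enhanced_pipeline module always comes last, since everything else is a direct or transitive dependency of it.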

Type Hints

All modules use Python type hints, which improve IDE completion and allow static type checking. For example:

from typing import Dict, List, Optional, Any

def generate_interaction(
    persona_id: str,
    persona_features: Dict[str, str],
    initial_query: str,
    system_prompt: Optional[str] = None
) -> Optional[Interaction]:
    ...
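
Because the return type is Optional[Interaction], callers should check for None before using the result. The sketch below shows one way this might look. The Message and Interaction fields and the function body are placeholders invented for illustration (no LLM is called), and the real classes in interaction_generator may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Hypothetical single message; field names are assumptions."""
    role: str
    content: str


@dataclass
class Interaction:
    """Hypothetical conversation record; field names are assumptions."""
    persona_id: str
    messages: List[Message] = field(default_factory=list)


def generate_interaction(
    persona_id: str,
    persona_features: Dict[str, str],
    initial_query: str,
    system_prompt: Optional[str] = None,
) -> Optional[Interaction]:
    """Placeholder body: returns None for an empty query, else a stub record."""
    if not initial_query.strip():
        return None
    messages: List[Message] = []
    if system_prompt is not None:
        messages.append(Message("system", system_prompt))
    messages.append(Message("user", initial_query))
    return Interaction(persona_id, messages)


result = generate_interaction("p-001", {"tone": "formal"}, "How do I reset my password?")
if result is None:
    print("no interaction generated")
else:
    print(len(result.messages))  # 1
```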

Error Handling

Common exceptions:

| Exception | Module | Description |
|---|---|---|
| FileNotFoundError | Various | Missing input files |
| ValueError | Various | Invalid configuration |
| APIError | llm_client | LLM API errors |
| RateLimitError | llm_client | API rate limiting |
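
A common way to handle rate limiting is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: the two exception classes are local stand-ins (whether RateLimitError actually subclasses APIError in llm_client is not documented here), and call_with_retry is a hypothetical helper, not part of PersonaGym.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class APIError(Exception):
    """Stand-in for the llm_client APIError."""


class RateLimitError(APIError):
    """Stand-in for the llm_client RateLimitError (hierarchy is assumed)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo: a call that is rate limited twice, then succeeds.
calls = {"count": 0}


def flaky_call() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


delays: List[float] = []  # record delays instead of actually sleeping
result = call_with_retry(flaky_call, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Other APIError instances are deliberately not retried here, on the assumption that they indicate problems a retry would not fix.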