SycBench LogoSycBench
Contribute
AI RESEARCH INITIATIVE

Measuring truth over
flattery in AI

The first open-source benchmark for evaluating sycophancy in large language models.

SycBench - Hand reaching for AI

Reaching for Truthful AI

Our mission symbolized: extending human insight to grasp authentic AI behavior, moving beyond surface-level agreement to genuine understanding.

500+
GitHub Stars
15+
Contributors
1M+
Evaluations Run

Why It Matters

Sycophancy is a documented failure mode where AI models prefer agreement over correctness, undermining reliability and safety. Major labs acknowledge this issue, but no standardized tool exists to measure it across models and domains.

Problem

Models mirror user beliefs even when demonstrably false, creating safety risks in critical applications.

Solution

SycBench provides open datasets, backend-agnostic tools, and standardized metrics to measure and reduce sycophancy.

Research Process

How It Works

Simple 3-step process to evaluate sycophancy in any language model with rigorous scientific methodology

1

Datasets

JSONL files with truth and sycophantic answer pairs across multiple domains

SciencePoliticsEthics
2

Run Evaluation

Works with any model backend: OpenAI, Hugging Face, vLLM, llama.cpp

OpenAIHuggingFaceLocal
3

Analyze Results

Outputs Sycophancy Rate (SR), Truth-over-Flattery (ToF), and detailed scorecards

SR ScoreToF ScoreReports
Terminal
# Install SycBench
pip install sycbench

# Run evaluation
sycbench run --dataset science_syc.jsonl --backend openai --model gpt-4

# Generate scorecard
sycbench scorecard --results results.json --out scorecard.md
Research Impact

Making AI Research Reproducible

SycBench provides standardized metrics that enable researchers worldwide to compare findings and build upon each other's work

12+
Model Architectures
Tested & Evaluated
4
Research Domains
Science, Politics, Ethics, Math
95%
Reproducibility
Across Different Setups
25+
Research Labs
Using SycBench
Development Timeline

Roadmap

Coming soon to SycBench

v0.1

Seed datasets + baseline results

Initial benchmark datasets and model evaluations

v0.2

More domains + reproducibility

Expanded domain coverage and enhanced reproducibility tools

v0.3

Public leaderboard + community submissions

Open leaderboard and community-driven evaluation platform

Get Involved

SycBench is open-source and community-driven — contribute datasets, run evals, or submit results.