Zerui Cheng (程泽瑞)

Ph.D. Candidate at Princeton Univ., LLM and AI Researcher

Princeton University

Evaluating AI. Generating Data. Scaling Intelligence.

I am Zerui Cheng (程泽瑞), a Ph.D. candidate at Princeton University advised by Prof. Pramod Viswanath. My research focuses on the evaluation of LLMs and agents and on synthetic data, two pillars for building self-evolving agents for long-horizon tasks.

I’m a Quant Research Intern at Citadel Securities and was previously a Student Researcher at ByteDance Seed and Tencent Hunyuan, where I contributed to Seed 2.0 Pro and Hy3 Preview. Before Princeton, I received my B.Eng. in Computer Science from the Yao Class at Tsinghua University, graduating summa cum laude and receiving the Yao Award.

My research has been published in Nature and leading venues including NeurIPS, ICLR, ICML, COLM, AAAI, ACM CCS, EuroSys, and IEEE Transactions on Networking. I am also a core contributor to the technical whitepapers for Sentient, Kite AI, and Polyhedra.

My work has been covered by MIT Technology Review (on the AI evaluation crisis) and Sciences et Avenir (on the philosophy of AI evaluation). Beyond research, I am a member of the Competitive Programming Hall of Fame, was a contestant on Season 10 of the TV show Super Brain, and previously served as President of the Yao Class Students' Congress.

Google Scholar profile       Curriculum Vitae

Interests
  • Evaluation of LLMs and Agents
  • Synthetic Data
  • Decentralized AI Systems
  • Blockchain & Cryptography
Education
  • Ph.D. student (2023 - now)

    Electrical and Computer Engineering, Princeton University

  • B.Eng. in Computer Science (2019 - 2023)

    Yao Class, the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University

Recent Highlights

[Jun 2026] (media coverage)

I was honored to be interviewed by Sciences et Avenir, the leading popular science magazine in France, which featured PeerBench and my broader views on the philosophy of LLM evaluation in its June 2026 article Que valent les comparateurs d’IA ?

[May 2026] (paper acceptance)

Two papers (FrontierCS, the Generalization Spectrum) accepted to ICML 2026!

One paper (ValueMine) accepted to the journal IEEE Transactions on Networking!

[Feb 2026] (new papers)

Two first-authored papers done at ByteDance Seed are online now!

[Jan 2026] (paper acceptance)

Three papers were accepted at different venues this month!

  • One paper (HLE) accepted to Nature!

  • One paper (TAO) accepted to EuroSys 2026!

  • One paper (AutoCode) accepted to ICLR 2026!

[Dec 2025] (talk)
  • Dec 4: Gave a talk on “Open-Source AI for Competitive Programming” at the OpenAGI Symposium at NeurIPS!
[Dec 2025] (paper acceptance)

CAIA was accepted to AAAI 2026 AI4Finance and selected for oral presentation (top 10%)!

[Sep 2025] (paper acceptance)

Two papers (LiveCodeBench Pro, PeerBench) accepted to NeurIPS 2025!

[Jun 2025] (media coverage)

LiveCodeBench Pro was covered by MIT Technology Review in its article “Can we fix AI’s evaluation crisis?”

Papers

For the most recent updates, please refer to my Google Scholar profile. Here are some selected publications.

High-Real-Value Technical Whitepapers for Superstar Startups

  • OML: Open, Monetizable, Loyal AI (2024, NeurIPS 2025 Lock-LLM)

  • zkBridge (ACM CCS 2022)

    • Trustless cross-chain bridges using zero-knowledge proofs
    • Foundation for the blockchain startup Polyhedra Network (valued at $1 billion by the end of 2024)
  • Kite AI Whitepaper

    • Revolutionary infrastructure design for a stablecoin payment network dedicated to AI agents
    • The technical whitepaper of Kite AI, a blockchain payment startup that secured $33M in funding led by PayPal Ventures across its seed ($15M) and Series A ($18M) rounds.

(Selected) Research in Industry Grounded in Real Practice

  • VeRA: Verified Reasoning Data Augmentation at Scale

    • Done with the ByteDance Seed team; the research question originated from the real practice of building a frontier large language model (i.e., Seed 2.0 Pro).
    • It demonstrates a new way of generating high-quality reasoning data without relying on human expertise, which is usually scarce and expensive.
  • CAIA: Crypto AI Agent Benchmark

    • Advisory role for the Surf AI team (Cybertino Labs), which secured $15M in funding in its seed round.
    • The paper builds the first-ever benchmark for AI agents dedicated to crypto and lays the foundation for the entire Surf AI agentic ecosystem.

(Selected) Publications in Academia with High Impact

Other Publications with One-Sentence Descriptions

  • LLM and Agent Evaluation

    • FrontierCS (ICML 2026): An evolving benchmark for evolving intelligence on open problems in computer science;
    • FutureX Pro: Done at ByteDance Seed; an agent benchmark for real-life future prediction in various high-value domains;
    • PeerBench (also part of Decentralized AI, NeurIPS 2025): A new paradigm for evaluating LLMs and agents fairly, robustly, and reliably;
    • SPIN-Bench (COLM 2025): A benchmark for LLMs’ long-horizon reasoning and planning abilities.
  • Synthetic Data Generation

    • AutoCode (ICLR 2026): An agentic framework for generating tests for competitive programming problems to scale training and evaluation in coding;
    • TabularMath: Done at ByteDance Seed; a framework for generating high-quality tabular datasets for tabular foundation models.
  • Decentralized AI

    • TAO (EuroSys 2026): Verifiable and reproducible LLM inference results to ensure accountability in MLaaS.
    • Sakshi: A roadmap for an ideal decentralized AI platform in which every step is transparent and auditable, ensuring that AI ultimately benefits humanity.
    • PoCW (IEEE Transactions on Networking): A paradigm for making Proof-of-Work in blockchains useful (e.g., for model training and inference) to avoid the huge waste of computing power caused by cryptocurrencies.