Hi! I’m Jiale Zhao, a computer science graduate (B.Eng., 2025) of Chongqing University of Posts and Telecommunications. I currently work on rubric-based reinforcement learning with verifiable rewards (RLVR) for large language models as an intern at Li Auto.
I plan to begin a PhD in Fall 2026. My interests center on large language models and applied NLP, including:
- Human–AI interaction (HCI)
- LLM-based agents and multi-step reasoning
- Rubric-based RLVR
- Self-evolving systems
- Interpretability and controllability
- End-to-end multimodal interactive systems (e.g., GPT-4o)
Background
- Fall 2026 (planned): PhD studies (applications in progress)
- Sep 2021 – Jun 2025: B.Eng., Computer Science, Chongqing University of Posts and Telecommunications
Publications
- When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification — first author. Under review (ACL). arXiv: https://arxiv.org/abs/2602.11199.
- RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation — second author. Under review (ACL). arXiv: https://arxiv.org/abs/2601.08430.
- ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization — co-first author. Findings of EACL 2026. arXiv: https://arxiv.org/abs/2510.12063.
Ongoing Work
- ProcessRubrics — first-author work at Li Auto on process-level rubric learning for improving structured reasoning quality.
- Learning Persona as Behavior — second-author collaboration with Prof. Lu Cheng (UIC) on behavior-oriented persona learning with stronger cross-domain stability.
Selected Projects
All three are production deliverables I shipped during my Li Auto internship.
- Data Flywheel for Code LLM — an evaluation-centric loop (SFT → evaluate → data build → filtering → back to SFT) that continually raises coding capability.
  - Evaluation-first: current code evaluation is noisy (difficulty too low, specs ambiguous), so I standardized harnesses and rubrics to surface harder tasks and measure real capability.
  - Linked loops: evaluation feedback drives construction of harder data; the same tooling filters low-quality samples; filtered data re-enters the generation stack for repair and resurfacing.
- Multi-step Reasoning + Tool Invocation Agent — a code-LLM agent that plans, writes code, and executes tool calls to produce precise answers.
  - Multi-step reasoning: breaks complex or code-debugging tasks into structured plans so that context can be stitched into a single executable query.
  - Tool grounding: integrates function calls and code execution for real-time data, external APIs, and environment actions when model priors or knowledge bases fall short.
- MindGPTo (GPT‑4o-style multimodal app) — an end-to-end audio + vision application with paralinguistic control, built from scratch with a modular front-end/back-end split.
  - Mode coverage: ships a cascaded audio→ASR→LLM→TTS pipeline, production audio2text→TTS pipelines, end-to-end audio2audio, and multimodal audio+image+video→text→TTS workflows.
  - Paralinguistic SFT: large-scale audio data pipelines improve colloquial speech and nuanced cues beyond laughter and pauses, such as age, gender, compound emotions, emotional actions, and ambient sounds.
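To show the shape of the flywheel loop described above, here is a minimal, self-contained sketch of the SFT → evaluate → data build → filtering cycle. All names (`evaluate`, `build_harder_data`, the capability/difficulty scores) are hypothetical stand-ins, not the production pipeline.

```python
# Hypothetical sketch of an evaluation-centric data flywheel.
# Scalar "capability" and "difficulty" stand in for real model evals.

def evaluate(model, tasks):
    """Score the model on each task: pass iff capability >= difficulty."""
    return [model["capability"] >= t["difficulty"] for t in tasks]

def build_harder_data(tasks, results):
    """Raise difficulty on tasks the model already passes."""
    return [
        {**t, "difficulty": t["difficulty"] + 1} if passed else t
        for t, passed in zip(tasks, results)
    ]

def filter_low_quality(tasks):
    """Drop samples flagged as ambiguous (stand-in for rubric filtering)."""
    return [t for t in tasks if not t.get("ambiguous", False)]

def sft(model, tasks):
    """Stand-in for fine-tuning: capability grows with the training data."""
    return {**model, "capability": model["capability"] + 0.1 * len(tasks)}

def flywheel(model, tasks, rounds=3):
    """One pass = evaluate -> build harder data -> filter -> SFT."""
    for _ in range(rounds):
        results = evaluate(model, tasks)
        tasks = build_harder_data(tasks, results)
        tasks = filter_low_quality(tasks)
        model = sft(model, tasks)
    return model, tasks

model = {"capability": 1.0}
tasks = [{"difficulty": 1}, {"difficulty": 2}, {"difficulty": 3, "ambiguous": True}]
model, tasks = flywheel(model, tasks)
```

The point of the sketch is only the wiring: evaluation results decide which data gets harder, filtering runs on the same task pool, and the surviving data feeds the next SFT round.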
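The agent’s plan-then-execute pattern can likewise be sketched in a few lines. The planner and tool registry here are toy stand-ins (a real agent would prompt the LLM for the plan), but the control flow — decompose into ordered tool calls, stitch each result into the next — is the one described above.

```python
# Hypothetical plan-then-execute tool-calling loop (not the production agent).

TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def plan(question):
    """Stand-in planner: decompose a request into ordered tool calls.
    A real agent would ask the LLM to produce this plan."""
    return [("mul", 3, 4), ("add", "<prev>", 5)]

def execute(steps):
    """Run each step, stitching the previous result into later calls."""
    prev = None
    for name, *args in steps:
        args = [prev if a == "<prev>" else a for a in args]
        prev = TOOLS[name](*args)
    return prev

answer = execute(plan("What is 3 * 4 + 5?"))
```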
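Finally, a rough sketch of how MindGPTo’s mode coverage can be routed. Each stage function below is a hypothetical placeholder for a real model (ASR, LLM, TTS, or an end-to-end audio2audio model); only the dispatch structure is meant to be illustrative.

```python
# Hypothetical routing between the pipeline modes listed above.
# Each stage is a string-tagging stand-in for a real model.

def asr(audio): return f"text({audio})"
def llm(text): return f"reply({text})"
def tts(text): return f"speech({text})"
def audio2audio(audio): return f"speech(reply({audio}))"  # end-to-end path

def respond(audio, mode="cascaded"):
    """Dispatch an audio query to one of the supported pipelines."""
    if mode == "cascaded":      # audio -> ASR -> LLM -> TTS
        return tts(llm(asr(audio)))
    if mode == "end_to_end":    # single audio2audio model
        return audio2audio(audio)
    raise ValueError(f"unknown mode: {mode}")
```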
Manuscripts for Resubmission
- Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment — third author; awaiting resubmission.
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning — sixth author; awaiting resubmission.