Hi! I’m Jiale Zhao, a computer science graduate (B.Eng., 2025) of Chongqing University of Posts and Telecommunications. I currently work on rubric-based reinforcement learning with verifiable rewards (RLVR) for large language models as an intern at Li Auto.
I plan to begin a PhD in Fall 2026. My interests center on large language models and applied NLP, including:
- Human–AI interaction (HCI)
- LLM-based agents and multi-step reasoning
- Rubric-based RLVR
- Self-evolving systems
- Interpretability and controllability
- End-to-end multimodal interactive systems (e.g., GPT-4o)
Background
- Fall 2026 (planned): PhD studies (applications in progress)
- Sep 2021 – Jun 2025: B.Eng., Computer Science, Chongqing University of Posts and Telecommunications
Publications
- When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification — first author. Under review (ACL). arXiv: https://arxiv.org/abs/2602.11199.
- RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation — second author. Under review (ACL). arXiv: https://arxiv.org/abs/2601.08430.
- ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization — co-first author. Findings of EACL 2026. arXiv: https://arxiv.org/abs/2510.12063.
Ongoing Work
- ProcessRubrics — first-author work at Li Auto on process-level rubric learning for improving structured reasoning quality.
- Learning Persona as Behavior — second-author collaboration with Prof. Lu Cheng (UIC) on behavior-oriented persona learning with stronger cross-domain stability.
Selected Projects
All three are production deliverables I shipped during my Li Auto internship.
- Data Flywheel for Code LLM — an evaluation-centric loop (SFT → evaluate → data build → filtering → back to SFT) that continually raises coding capability.
  - Evaluation-first: current code evaluation is noisy (difficulty too low, specs ambiguous), so I standardized harnesses and rubrics to surface harder tasks and measure real capability.
  - Linked loops: evaluation feedback drives construction of harder data; the same tooling filters low-quality samples; filtered data re-enters the generation stack for repair and resurfacing.
- Multi-step Reasoning + Tool Invocation Agent — a code-LLM agent that plans, writes code, and executes tool calls to produce precise answers.
  - Multi-step reasoning: breaks complex or code-debugging tasks into structured plans so that context can be stitched into a single executable query.
  - Tool grounding: integrates function calls and code execution for real-time data, external APIs, and environment actions when model priors or knowledge bases fall short.
- MindGPTo (GPT‑4o-style multimodal app) — an end-to-end audio + vision application with paralinguistic control, built from scratch with a modular front-end/back-end split.
  - Mode coverage: ships a cascaded audio→ASR→LLM→TTS pipeline, production audio2text→TTS pipelines, end-to-end audio2audio, and multimodal audio+image+video→text→TTS workflows.
  - Paralinguistic SFT: large-scale audio data pipelines improve colloquial speech and nuanced cues beyond laughter and pauses, such as age, gender, compound emotions, emotional actions, and ambient sounds.
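To show the shape of the flywheel loop described above, here is a minimal, self-contained sketch of the SFT → evaluate → data build → filtering cycle. All names (`evaluate`, `build_harder_data`, the capability/difficulty scores) are hypothetical stand-ins, not the production pipeline.

```python
# Hypothetical sketch of an evaluation-centric data flywheel.
# Scalar "capability" and "difficulty" stand in for real model evals.

def evaluate(model, tasks):
    """Score the model on each task: pass iff capability >= difficulty."""
    return [model["capability"] >= t["difficulty"] for t in tasks]

def build_harder_data(tasks, results):
    """Raise difficulty on tasks the model already passes."""
    return [
        {**t, "difficulty": t["difficulty"] + 1} if passed else t
        for t, passed in zip(tasks, results)
    ]

def filter_low_quality(tasks):
    """Drop samples flagged as ambiguous (stand-in for rubric filtering)."""
    return [t for t in tasks if not t.get("ambiguous", False)]

def sft(model, tasks):
    """Stand-in for fine-tuning: capability grows with the training data."""
    return {**model, "capability": model["capability"] + 0.1 * len(tasks)}

def flywheel(model, tasks, rounds=3):
    """One pass = evaluate -> build harder data -> filter -> SFT."""
    for _ in range(rounds):
        results = evaluate(model, tasks)
        tasks = build_harder_data(tasks, results)
        tasks = filter_low_quality(tasks)
        model = sft(model, tasks)
    return model, tasks

model = {"capability": 1.0}
tasks = [{"difficulty": 1}, {"difficulty": 2}, {"difficulty": 3, "ambiguous": True}]
model, tasks = flywheel(model, tasks)
```

The point of the sketch is only the wiring: evaluation results decide which data gets harder, filtering runs on the same task pool, and the surviving data feeds the next SFT round.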
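The agent’s plan-then-execute pattern can likewise be sketched in a few lines. The planner and tool registry here are toy stand-ins (a real agent would prompt the LLM for the plan), but the control flow — decompose into ordered tool calls, stitch each result into the next — is the one described above.

```python
# Hypothetical plan-then-execute tool-calling loop (not the production agent).

TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def plan(question):
    """Stand-in planner: decompose a request into ordered tool calls.
    A real agent would ask the LLM to produce this plan."""
    return [("mul", 3, 4), ("add", "<prev>", 5)]

def execute(steps):
    """Run each step, stitching the previous result into later calls."""
    prev = None
    for name, *args in steps:
        args = [prev if a == "<prev>" else a for a in args]
        prev = TOOLS[name](*args)
    return prev

answer = execute(plan("What is 3 * 4 + 5?"))
```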
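Finally, a rough sketch of how MindGPTo’s mode coverage can be routed. Each stage function below is a hypothetical placeholder for a real model (ASR, LLM, TTS, or an end-to-end audio2audio model); only the dispatch structure is meant to be illustrative.

```python
# Hypothetical routing between the pipeline modes listed above.
# Each stage is a string-tagging stand-in for a real model.

def asr(audio): return f"text({audio})"
def llm(text): return f"reply({text})"
def tts(text): return f"speech({text})"
def audio2audio(audio): return f"speech(reply({audio}))"  # end-to-end path

def respond(audio, mode="cascaded"):
    """Dispatch an audio query to one of the supported pipelines."""
    if mode == "cascaded":      # audio -> ASR -> LLM -> TTS
        return tts(llm(asr(audio)))
    if mode == "end_to_end":    # single audio2audio model
        return audio2audio(audio)
    raise ValueError(f"unknown mode: {mode}")
```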
Manuscripts for Resubmission
- Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment — third author; awaiting resubmission.
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning — sixth author; awaiting resubmission.