Publications
Accepted
- ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization — co-first author (equal contribution; not first-listed). Findings of EACL 2026. Contribution highlights: designed the experiments and framework, integrated explainability into the iterative algorithms, wrote the appendix, and contributed to the main text.
Under Review
- When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification — first author. Under review (ACL). arXiv: https://arxiv.org/abs/2602.11199. We introduce AskBench, an interactive benchmark that converts standard QA pairs into multi-turn interactions with explicit checkpoints and a unified judge loop (grading final answers and simulating user responses only when explicitly asked). AskBench includes AskMind (intent-deficient queries) and AskOverconfidence (queries with injected false premises). We further propose rubric-guided RLVR with verifier-based rewards to improve both answer correctness and targeted clarification.
- RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation — second author. Under review (ACL). arXiv: https://arxiv.org/abs/2601.08430. We propose an automated Coarse-to-Fine Rubric Generation framework (principle-guided and response-grounded synthesis, multi-model aggregation, and difficulty evolution) to produce comprehensive, highly discriminative rubrics for open-ended generation. Using this framework, we build RubricHub (∼110k, multi-domain) and validate it with a two-stage post-training pipeline: rubric-based rejection sampling fine-tuning (RuFT) followed by rubric-based reinforcement learning (RuRL).
- Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment — third author. Under review (ICASSP). Released the accompanying open dataset, ExpressiveSpeech, on Hugging Face. Project page: https://freedomintelligence.github.io/ExpressiveSpeech/.
- Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning — sixth author. Under review (ICLR).