In-Context Learning Boosts Speech Recognition
Human-like adaptation to speakers and language varieties via interleaved text-audio prompts. State-of-the-art ASR.
Computational Neurolinguist and AI Speech Researcher @ Stanford
I'm a Ph.D. student and EDGE fellow at Stanford University researching the intersection of computation, cognition, and speech. My work spans mechanistic interpretability, neural architecture design, and reinforcement learning for speech systems — grounded in how humans process and produce language. I am affiliated with the Department of Linguistics and the Stanford NLP Group, with collaborations across the Wu Tsai Neurosciences Institute, the Department of Computer Science, the Department of Psychology, and the Department of Surgery.
Feel free to reach out if you're interested in collaborating.
Advancing how machines understand spoken language — building robust, fair, and efficient speech recognition systems across languages and accents.
How automatic speech recognition systems exhibit confirmation bias at scale. SLaTE 2025.
Performance analysis across diverse accents and speaker traits. JASA Express Letters, 94 citations.
Integrated approach to prosodic boundary detection via lexico-syntactic knowledge transfer. CoNLL 2023.
91% of convolutional kernels converge to temporal Gabor filters. Gabormer achieves lower WER while converging 32% faster.
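A minimal sketch of the kernel analysis behind this result, assuming the front end learns 1-D temporal kernels: fit a Gabor function to a kernel with nonlinear least squares and use the fit R² to judge whether it counts as Gabor-like. The synthetic kernel, initial guesses, and the R² criterion are illustrative, not the paper's exact pipeline.

```python
# Hedged sketch: test whether a learned 1-D conv kernel is well described by a
# temporal Gabor filter (sinusoid under a Gaussian envelope). Synthetic data only.
import numpy as np
from scipy.optimize import curve_fit

def gabor(t, amp, freq, phase, sigma, center):
    envelope = np.exp(-((t - center) ** 2) / (2 * sigma ** 2))
    return amp * envelope * np.cos(2 * np.pi * freq * t + phase)

def gabor_fit_r2(kernel, t):
    p0 = [kernel.max(), 5.0, 0.0, 0.1, 0.0]          # rough initial guess
    params, _ = curve_fit(gabor, t, kernel, p0=p0, maxfev=20000)
    resid = kernel - gabor(t, *params)
    return 1.0 - resid.var() / kernel.var(), params

# A synthetic "learned" kernel that happens to be Gabor-like plus a little noise.
np.random.seed(0)
t = np.linspace(-0.5, 0.5, 65)
kernel = gabor(t, 1.0, 5.0, 0.3, 0.12, 0.05) + 0.02 * np.random.randn(t.size)
r2, _ = gabor_fit_r2(kernel, t)
print(f"Gabor fit R^2 = {r2:.3f}")                   # high R^2 -> counted as Gabor-like
```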
Conformers “Categorize Early” while Transformers “Integrate Late.” Architecture predicts representational profiles (AUC=0.88).
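A hedged layer-wise probing sketch of the kind of analysis behind the "categorize early" vs. "integrate late" contrast: train a linear probe on each layer's frame representations and track where class information emerges. The random features below are placeholders for real encoder activations extracted upstream.

```python
# Hedged sketch: per-layer linear probes over (placeholder) frame representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_layers, n_frames, dim = 12, 2000, 64
labels = rng.integers(0, 2, size=n_frames)              # e.g. vowel vs. consonant frames
# Placeholder "activations": class signal grows with depth, standing in for real layers.
layer_reps = [rng.normal(size=(n_frames, dim)) + 0.05 * l * labels[:, None]
              for l in range(n_layers)]

aucs = []
for reps in layer_reps:
    Xtr, Xte, ytr, yte = train_test_split(reps, labels, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    aucs.append(roc_auc_score(yte, probe.predict_proba(Xte)[:, 1]))
print(np.round(aucs, 3))                                # where the AUC rises = where categories form
```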
Multilingual Whisper shows strong universality and weak in-group advantage. Emotion encoded across all layers.
Conformers, Transformers, and SSMs carve distinct representational manifolds. Mamba encoders distribute phonemic information uniformly; Conformers concentrate 78% in layers 4–8.
Hybrid SSM-attention encoder initialized from Gabor-parameterized state matrices. Achieves WER parity with Conformer-XXL using 41% fewer FLOPs.
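One plausible reading of "Gabor-parameterized state matrices", sketched under assumptions: make the diagonal state matrix a bank of damped oscillators whose center frequencies follow a log-spaced, auditory-filterbank-style grid, then discretize. The constants, the log spacing, and the waveform-rate discretization are all illustrative choices, not the paper's recipe.

```python
# Hedged sketch: a diagonal SSM state matrix whose eigenvalues oscillate at
# log-spaced "filterbank" center frequencies with mild damping (illustrative only).
import numpy as np

def gabor_init_state_matrix(d_state=64, sample_rate=16000.0,
                            f_min=60.0, f_max=7600.0, damping=0.02):
    freqs = np.geomspace(f_min, f_max, d_state)          # log-spaced center frequencies
    lam = -damping * freqs + 1j * 2 * np.pi * freqs      # continuous-time damped oscillators
    A_bar = np.exp(lam / sample_rate)                    # zero-order-hold discretization
    return np.diag(A_bar)

A = gabor_init_state_matrix()
print(A.shape, np.abs(np.diag(A)).max())                 # magnitudes < 1 -> stable recurrence
```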
GRPO with perceptual reward shaping reduces hallucinations by 34% and WER by 12.7% relative on accented speech. Outperforms SFT with 5× less labeled data.
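A minimal sketch of the GRPO-style update signal, under assumptions: sample a group of hypotheses per utterance, score each with a shaped reward (negative WER plus a hypothetical perceptual term), and normalize advantages within the group. The reward weight, the perceptual scores, and the helper names are illustrative, not the paper's exact recipe.

```python
# Hedged sketch: group-relative advantages from a WER + perceptual shaped reward.
import numpy as np

def wer(ref, hyp):
    # Word error rate via Levenshtein distance over words.
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(r) + 1), np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (r[i - 1] != h[j - 1]))
    return d[len(r), len(h)] / max(len(r), 1)

def shaped_reward(ref, hyp, perceptual_score, alpha=0.2):
    # perceptual_score stands in for e.g. an intelligibility model's output.
    return -wer(ref, hyp) + alpha * perceptual_score

def group_advantages(rewards, eps=1e-8):
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)              # GRPO: normalize within the group

ref = "the quick brown fox jumps over the lazy dog"
hyps = ["the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over a lazy dog",
        "quick brown fox and also the lazy dog again"]
perceptual = [0.9, 0.7, 0.3]                             # hypothetical perceptual scores
rewards = [shaped_reward(ref, h, p) for h, p in zip(hyps, perceptual)]
print(group_advantages(rewards))                         # per-hypothesis policy-gradient weights
```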
Exploring the mechanics of intelligent systems — from language model internals and quantum circuits to novel architectures and training paradigms.
Adapting activation patching to parameterized quantum circuits via mid-circuit tomography. Gate-level causal attribution recovers 94% of task-relevant unitary structure on 8-qubit classifiers.
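A hedged, classically simulated sketch of the activation-patching idea on a tiny parameterized circuit: cache the mid-circuit state from a clean run and splice it into a corrupted run to see how much of the output it restores. The 2-qubit circuit, gates, and parameters are illustrative stand-ins; the actual method reconstructs mid-circuit states via tomography rather than direct state access.

```python
# Hedged sketch: "patch" a cached mid-circuit statevector from a clean run into a
# corrupted run of a small parameterized circuit (pure NumPy simulation, 2 qubits).
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

def block(state, t0, t1):
    # One parameterized block: RY on each qubit, then an entangling CNOT.
    return CNOT @ (np.kron(ry(t0), ry(t1)) @ state)

def run(params, patch_state=None, patch_after=None):
    state = np.zeros(4); state[0] = 1.0                  # start in |00>
    for i, (t0, t1) in enumerate(params):
        state = block(state, t0, t1)
        if patch_after == i and patch_state is not None:
            state = patch_state.copy()                   # splice in the cached "activation"
    return state

def z0(state):
    p = np.abs(state) ** 2
    return p[0] + p[1] - p[2] - p[3]                     # <Z> expectation on the first qubit

clean   = [(0.3, 1.1), (0.7, -0.4)]
corrupt = [(2.5, 1.1), (0.7, -0.4)]                      # perturb only the first block
clean_mid = run(clean[:1])                               # state after the clean first block
patched = run(corrupt, patch_state=clean_mid, patch_after=0)
print(z0(run(clean)), z0(run(corrupt)), z0(patched))     # patching restores the clean value
```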
Quantum feature encodings exhibit representational polysemanticity analogous to classical networks. Sparse quantum autoencoders disentangle 3.2× overcomplete feature bases in 12-qubit latent spaces.
Alignment shifts κ from −0.03 (base) to +0.12 (instruct). Manifold remains locally flat for practical steering.
Transplanting Conformer's early-categorization bias into Transformers via auxiliary probing losses recovers 61% of the WER gap without architectural changes.
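A hedged PyTorch sketch of the auxiliary-probing-loss idea: attach a small linear probe to an early Transformer layer and add its frame-level phone-classification loss to the main ASR objective, nudging early layers toward categorical representations. The layer index, loss weight, and the stand-in main loss are illustrative assumptions.

```python
# Hedged sketch: a Transformer encoder with an auxiliary phone probe on an early layer.
import torch
import torch.nn as nn

class EncoderWithEarlyProbe(nn.Module):
    def __init__(self, dim=256, n_layers=12, n_phones=40, probe_layer=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(n_layers)])
        self.probe = nn.Linear(dim, n_phones)            # frame-level phone probe
        self.probe_layer = probe_layer

    def forward(self, x):
        probe_logits = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == self.probe_layer:
                probe_logits = self.probe(x)
        return x, probe_logits

model = EncoderWithEarlyProbe()
feats = torch.randn(2, 50, 256)                          # (batch, frames, dim) dummy features
phones = torch.randint(0, 40, (2, 50))                   # dummy frame-level phone targets
out, probe_logits = model(feats)
main_loss = out.pow(2).mean()                            # stand-in for the CTC/attention ASR loss
aux_loss = nn.functional.cross_entropy(probe_logits.reshape(-1, 40), phones.reshape(-1))
(main_loss + 0.3 * aux_loss).backward()                  # 0.3 is an illustrative weight
```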
Automating knowledge extraction from multilingual language models with dynamic prompt generation.
Progressive fine-tuning for multilingual detection of propaganda techniques. SemEval 2024.
Using symbol-level language models for table representation. NeurIPS TRL Workshop 2024.
Bridging artificial and biological intelligence — modeling language disorders, probing neural representations, and connecting machine learning to cognitive science.
LMs as “animal models” of the human language system. Attention ≈ perception/retrieval, FFN ≈ syntactic production.
A clinically-grounded benchmark for aphasia-like deficits in language models with validated automated evaluation.
Mechanistic decomposition of quantum probability models of the conjunction fallacy and order effects. Identified 3 circuit motifs that map onto dual-process cognitive architectures.
Embeddings reduce false positives by 86% vs bag-of-words. F1 improves from 0.37 to 0.63.
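A minimal sketch of the kind of comparison behind this result: a bag-of-words classifier versus a dense-representation classifier on the same binary screening task, scored with F1. The toy texts, labels, and the TF-IDF + SVD stand-in for real neural embeddings are placeholders.

```python
# Hedged sketch: bag-of-words vs. a dense-vector pipeline on a toy screening task.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["word finding trouble during conversation", "fluent but empty speech",
         "normal everyday conversation", "talks about the weather",
         "substitutes wrong words often", "clear and well formed sentences"] * 20
labels = [1, 1, 0, 0, 1, 0] * 20

Xtr, Xte, ytr, yte = train_test_split(texts, labels, test_size=0.3, random_state=0)

bow = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)).fit(Xtr, ytr)
dense = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=10, random_state=0),
                      LogisticRegression(max_iter=1000)).fit(Xtr, ytr)

print("bag-of-words F1:", round(f1_score(yte, bow.predict(Xte)), 3))
print("dense-vector F1:", round(f1_score(yte, dense.predict(Xte)), 3))
```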
Modeling documentation materials in an endangered Siberian language. Field Matters Workshop 2023.