Tianyi Xu

txu223@wisc.edu

CDIS, University of Wisconsin–Madison

Madison, WI 53706

I’m Tianyi Xu, a graduate of the University of Wisconsin–Madison (B.S. in Computer Science, Data Science, and Mathematics). I am advised by Prof. Junjie Hu at UW–Madison. I also work closely with Shaobo Wang and Prof. Linfeng Zhang at Shanghai Jiao Tong University, and with Prof. Claudia Solís-Lemus at UW–Madison.

I’m interested in building modern AI systems that are efficient, generalizable, and capable of understanding and acting across multiple modalities, especially under limited supervision and real-world constraints. Concretely, my work focuses on:

  1. Data-centric & label-efficient learning — Designing methods for data selection, mixing, and self/weak supervision so that large-scale models can learn from noisy, heterogeneous data instead of only clean benchmarks.

  2. Foundation models & agents — Building and steering pretrained models (LLMs, VLMs, etc.) for specific tasks, with an emphasis on efficiency, reliability, and agentic behaviors such as planning, reasoning, and safe decision-making.

  3. Multimodal learning — Representation learning and building systems that perceive, act, and reason across different modalities.

  4. AI for science and society — Applying these ideas to real-world problems where data is scarce or noisy: biodiversity monitoring from soundscapes, tone-aware speech modeling for accessibility, and multimodal toolkits/benchmarks for biomedical and clinical AI.

I’m seeking Ph.D. opportunities for Fall 2026.

selected publications

  1. ICML
    OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    Shaobo Wang*, Xuan Ouyang*, Tianyi Xu*, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, Kexin Yang, Xingzhang Ren, Dayiheng Liu, and Linfeng Zhang
    In ICML, 2026
    Under review
  2. ACL
    SITA: Learning Speaker-Invariant and Tone-Aware Speech Representations for Low-Resource Tonal Languages
    Tianyi Xu*, Xuan Ouyang*, Binwei Yao, Shoua Xiong, Sara Misurelli, Maichou Lor, and Junjie Hu
    In ACL, 2026
    Under review
  3. WACV
    Self-Supervised Sound Detection with AudioMAE for Robust, Label-Efficient Biodiversity Monitoring
    Tianyi Xu, Claudia Solís-Lemus, Daniel Pimentel-Alarcon, and Zuzana Burivalova
    In CV4EO Workshop, WACV, 2026
    Under review
  4. Soil Use Manag.
    Combined effects of methyl bromide and soil amendments on soil bacterial and fungal communities in turfgrass
    Tianyi Xu*, Salma Mukhtar*, Evan Gorstein, Claudia Solís-Lemus, Ming Yi Chou, and Paul Koch
    Soil Use and Management, Oct 2025
    Under review
  5. Rhizosphere
    Stem rot affects the structure of rhizosphere microbiome in berseem clover (Trifolium alexandrinum)
    Salma Mukhtar, Zain Ahmad, Noor Khan, Tianyi Xu, and Dalaq Aiysha
    Rhizosphere, Apr 2025
  6. BiomedBank: A Large-scale, Multimodal Data Ecosystem for Advancing Biomedical AI
    TBD
    Apr 2025
    In preparation