About me
Hello! I am Yujia Xiao, a third-year PhD student in the DSP & Speech Technology Laboratory (DSP-STL) at The Chinese University of Hong Kong (CUHK), under the supervision of Prof. Tan Lee. Prior to this, I worked as an applied scientist at Microsoft from 2018 to 2022. I earned both my M.S. and B.S. degrees from South China University of Technology. My current research focuses on long-form audio and speech generation as well as multimodal agents. If you are interested in my work, feel free to contact me!
News
- May 16, 2025: PodAgent was accepted to ACL 2025 Findings.
- Mar 4, 2025: PodAgent is released. Given a topic to discuss, PodAgent simulates human behavior to create podcast-like audio presented as a talk show, featuring one host and several guests. The show includes diverse and insightful viewpoints, delivered in appropriate voices, along with structured sound effects and background music to enrich the listening experience.
Experience
- 2023.07 - 2024.03: Research Intern at Microsoft (TTS Algorithm Team)
- 2018.05 - 2022.07: Applied Scientist at Microsoft (TTS Algorithm Team)
- 2016.08 - 2018.04: Research Intern at Microsoft Research Asia (Speech Group & IEG)
- 2014.07 - 2015.08: Research Intern at Microsoft Research Asia (Speech Group & IEG)
Selected Publications
- PodAgent: A Comprehensive Framework for Podcast Generation. Yujia Xiao, Lei He, Haohan Guo, Fenglong Xie, Tan Lee. ACL 2025 Findings.
- Contrastive context-speech pretraining for expressive text-to-speech synthesis. Yujia Xiao, Xi Wang, Xu Tan, Lei He, Xinfa Zhu, Sheng Zhao, Tan Lee. ACM Multimedia, 2024.
- ContextSpeech: Expressive and efficient text-to-speech for paragraph reading. Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee. INTERSPEECH 2023.
- Improving FastSpeech TTS with efficient self-attention and compact feed-forward network. Yujia Xiao, Xi Wang, Lei He, Frank K. Soong. ICASSP 2022.
- Improving prosody with linguistic and BERT derived features in multi-speaker based Mandarin Chinese neural TTS. Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong. ICASSP 2020.
- Paired phone-posteriors approach to ESL pronunciation quality assessment. Yujia Xiao, Frank K. Soong, Wenping Hu. INTERSPEECH 2018.
- Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. Yujia Xiao, Frank K. Soong. INTERSPEECH 2017.
- ZSVC: Zero-shot style voice conversion with disentangled latent diffusion models and adversarial training. Xinfa Zhu, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie. ICASSP 2025.
- Audio-FLAN: An Instruction-Following Dataset for Unified Understanding and Generation of Speech, Music, and Sound. Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Sitong Cheng, Yinghao Ma, Ruibin Yuan, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, ZHANG Xinshen, Tianchi Liu, Zeyue Tian, Ziyang Ma, Haohe Liu, Ge Zhang, Xu Tan, Emmanouil Benetos, Wenhao Huang, Yike Guo, Wei Xue. Submitted to NeurIPS 2025.
- UniStyle: Unified style modeling for speaking style captioning and stylistic speech synthesis. Xinfa Zhu, Wenjie Tian, Xinsheng Wang, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie. ACM Multimedia, 2024.
- QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning. Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng. IEEE Transactions on Audio, Speech and Language Processing.
Awards
- 2021.12 [Microsoft Hackathon] Executive Challenge - Hack for Consumer Business Growth - 2nd Place
- 2020.09 [Microsoft Hackathon] Honorable Mention
- 2019.09 [Microsoft Hackathon] Hackathon Challenge - Hack for Big Ideas - 2nd Place
- 2016 National Scholarship for Postgraduates
- 2013 National Scholarship
- 2012 National Scholarship
Teaching & Services
- Teaching Assistant for UGEB1408-ENGG1920 Artificial Intelligence in Action at CUHK
- Invited Reviewer for ICASSP 2025