📝 Publications
🎙 Speech Synthesis

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen*, Ji-Hoon Kim*, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung+ Demo page
- Proposes a method to accelerate codec-based speech synthesis by predicting multiple tokens per step and optimizing token selection with a Viterbi-based speculative decoding technique.
- Achieves a 4-5x reduction in synthesis time with minimal degradation in, or even an improvement of, speech quality.

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen*, Ji-Hoon Kim*, Youngjoon Jang, Jaehun Kim, Joon Son Chung+ Demo page, Official Code (Oral Presentation)
- We employ a discrete wavelet transform, which lets FreGrad operate on a simple and concise feature space.
- We design a frequency-aware dilated convolution and introduce a bag of tricks that boosts the generation quality of the proposed model.

Calib-StyleSpeech: A Zero-shot Approach In Voice Cloning Of High Adaptive Text To Speech System With Imbalanced Dataset (Oral Presentation)
Nguyen Tan Dat, Lam Quang Tuong, Nguyen Duc Dung
- We propose using CLUB (Contrastive Log-ratio Upper Bound) to minimize the mutual information between the content embedding and the style embedding.
- Our method performs well in the zero-shot scenario even when using a skewed ASR dataset.

A Linguistic-based Transfer Learning Approach for Low-resource Bahnar Text-to-Speech (Oral Presentation)
Tan Dat Nguyen, Quang Tuong Lam, Duc Hao Do, Huu Thuc Cai, Hoang Suong Nguyen, Thanh Hung Vo, Duc Dung Nguyen
- We apply a phonetic-based transfer learning approach to build a TTS model for Bahnar-Kriem, a very low-resource language.