📝 Publications

🎙 Speech Synthesis

ICASSP 2025

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

Tan Dat Nguyen*, Ji-Hoon Kim*, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung+ | Demo page

  • Proposes a method to accelerate codec-based speech synthesis by predicting multiple tokens per step and optimizing token selection with a Viterbi-based speculative decoding technique.
  • Achieves a 4-5x reduction in synthesis time with minimal degradation in, or even an improvement of, speech quality.
ICASSP 2024

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

Tan Dat Nguyen*, Ji-Hoon Kim*, Youngjoon Jang, Jaehun Kim, Joon Son Chung+ | Demo page | Official Code (Oral Presentation)

  • We employ the discrete wavelet transform (DWT), which lets FreGrad operate on a simple and concise feature space.
  • We design a frequency-aware dilated convolution and introduce a bag of tricks that boosts the generation quality of the proposed model.
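To illustrate what operating on a wavelet feature space means, here is a minimal sketch of a single-level Haar DWT, one common wavelet choice. It assumes the Haar wavelet purely for illustration and is not FreGrad's implementation; the function names `haar_dwt` and `haar_idwt` are hypothetical. It splits a waveform into a low-frequency and a high-frequency subband, each half the original length, and reconstructs the input exactly.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT (assumed wavelet): split an even-length signal
    into a low-frequency (approx) and a high-frequency (detail) subband."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    # Pairwise sums capture the smooth part, pairwise differences the fine detail.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild each original sample pair from (a, d)."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


if __name__ == "__main__":
    x = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
    a, d = haar_dwt(x)
    print(len(a), len(d))  # each subband holds half of the samples
    # Maximum reconstruction error (floating-point round-off only).
    print(max(abs(u - v) for u, v in zip(x, haar_idwt(a, d))))
```

Because the transform is orthogonal and invertible, a model can work on the two shorter subbands and recover the waveform without loss.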
ICABDE 2021

Calib-StyleSpeech: A Zero-shot Approach In Voice Cloning Of High Adaptive Text To Speech System With Imbalanced Dataset (Oral Presentation)

Nguyen Tan Dat, Lam Quang Tuong, Nguyen Duc Dung

Demo page.

  • We propose using CLUB (Contrastive Log-ratio Upper Bound) to minimize the mutual information between the content embedding and the style embedding.
  • Our model performs well in the zero-shot scenario, even when trained on a skewed ASR dataset.
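CLUB upper-bounds mutual information as the gap between the average log-likelihood of matched pairs and of all mismatched pairs under a variational conditional q(y|x). The sketch below computes that sample-based estimate. It assumes, purely for illustration, a fixed Gaussian q(y|x) with mean x and scalar embeddings; in practice q would be a learned network over embedding vectors, and this is not the Calib-StyleSpeech code. The names `log_q` and `club_estimate` are hypothetical.

```python
import random


def log_q(y, x, sigma=1.0):
    # Assumed variational conditional q(y|x) = N(y; x, sigma^2).
    # The normalizing constant is dropped because it cancels in the CLUB difference.
    return -((y - x) ** 2) / (2 * sigma ** 2)


def club_estimate(xs, ys, sigma=1.0):
    """Sample-based CLUB estimate of I(X; Y):
    mean over matched pairs of log q(y_i|x_i)
    minus mean over all pairs (i, j) of log q(y_j|x_i)."""
    n = len(xs)
    positive = sum(log_q(y, x, sigma) for x, y in zip(xs, ys)) / n
    negative = sum(log_q(y, x, sigma) for x in xs for y in ys) / (n * n)
    return positive - negative


if __name__ == "__main__":
    rng = random.Random(0)
    content = [rng.gauss(0, 1) for _ in range(300)]
    # Style that leaks content information vs. style independent of content.
    style_leaky = [c + rng.gauss(0, 0.3) for c in content]
    style_indep = [rng.gauss(0, 1) for _ in range(300)]
    print(club_estimate(content, style_leaky))
    print(club_estimate(content, style_indep))
```

A style embedding that leaks content yields a clearly positive estimate, while an independent one stays near zero; minimizing such an estimate with respect to the encoders is what pushes the two embeddings toward independence.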
NAFOSTED 2021

A Linguistic-based Transfer Learning Approach for Low-resource Bahnar Text-to-Speech (Oral Presentation)

Tan Dat Nguyen, Quang Tuong Lam, Duc Hao Do, Huu Thuc Cai, Hoang Suong Nguyen, Thanh Hung Vo, Duc Dung Nguyen.

Demo page.

  • We apply a phonetic-based transfer learning approach to build a TTS model for Bahnar-Kriem, a very low-resource language.