ailia_voice  1.1.0.0
About feature

Features of ailia AI Voice

This page describes the features provided by both the C and the C# APIs.

Text-to-speech conversion

ailia AI Voice supports speech synthesis with the Tacotron2 and GPT-SoVITS algorithms.

Japanese speech synthesis

Synthesizing Japanese speech requires converting the Japanese text into phonemes. ailia AI Voice uses OpenJtalk for this conversion, and OpenJtalk is built into the ailia AI Voice library, so no separate installation is needed.

Voice synthesis in any voice timbre

With GPT-SoVITS, speech can be synthesized in an arbitrary voice timbre by providing a reference audio file of about 10 seconds.
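
Before passing a clip to the library, it can help to confirm that it is roughly the right length. The following is a minimal sketch that uses only the Python standard library and does not call ailia AI Voice. The helper names and the 5-second tolerance are assumptions made for illustration, and the clip is assumed to be an uncompressed PCM WAV file.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path, target=10.0, tolerance=5.0):
    """Hypothetical check: True if the clip lasts target +/- tolerance seconds.

    The default tolerance is an illustrative assumption, not a limit
    documented by ailia AI Voice.
    """
    return abs(wav_duration_seconds(path) - target) <= tolerance
```

Calling `is_suitable_reference("reference.wav")` on your own file (the file name here is a placeholder) returns `True` for clips between 5 and 15 seconds with the default settings.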

GPU usage

On Windows and Linux, inference can run on the GPU through cuDNN. To use cuDNN, install the CUDA Toolkit and cuDNN from the NVIDIA website.

Install the CUDA Toolkit by following the installer instructions. For cuDNN, download and extract the archive, then add its location to the PATH environment variable. You need to register as an NVIDIA developer to download these libraries.
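
To check that the cuDNN location is actually visible through the environment, a small script can scan the listed directories for the library files. This is a sketch using only the Python standard library. The function name `find_libraries` and the file name patterns (`cudnn*.dll` on Windows, `libcudnn*.so*` on Linux) are assumptions that may need adjusting for your cuDNN version.

```python
import glob
import os


def find_libraries(pattern, env_var="PATH"):
    """Return files matching `pattern` in the directories listed in `env_var`."""
    matches = []
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if directory:
            # Escape the directory so characters like '[' are taken literally.
            matches.extend(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed file name patterns; adjust them to match your installation.
    pattern = "cudnn*.dll" if os.name == "nt" else "libcudnn*.so*"
    found = find_libraries(pattern)
    print("\n".join(found) if found else "no cuDNN library found on PATH")
```

The `env_var` parameter lets you inspect a different variable as well; on Linux, for example, the dynamic loader consults `LD_LIBRARY_PATH`, so `find_libraries("libcudnn*.so*", "LD_LIBRARY_PATH")` may be the more relevant check there.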