ailia_voice
1.3.0.0
On this page, we present the features provided by both the C and the C# APIs.
With ailia AI Voice, it is possible to use the Tacotron2 and GPT-SoVITS algorithms for speech synthesis.
To synthesize Japanese speech, it is necessary to convert the Japanese text into phonemes; OpenJtalk, which is integrated into the ailia AI Voice library, is used for this conversion.
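To see what this conversion produces, the pyopenjtalk package (a Python wrapper around the same OpenJtalk engine) can be used; note that ailia AI Voice performs this step internally, so the snippet below is only an illustration of the concept:

```python
# Illustration of Japanese grapheme-to-phoneme conversion using pyopenjtalk,
# a Python wrapper around the OpenJTalk engine (pip install pyopenjtalk).
# ailia AI Voice performs the equivalent conversion internally.
import pyopenjtalk

phonemes = pyopenjtalk.g2p("こんにちは")
print(phonemes)  # -> "k o N n i ch i w a"
```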
When using GPT-SoVITS, it is possible to synthesize speech in an arbitrary voice by providing a reference audio file of about 10 seconds.
By defining a user dictionary, it is possible to correct the pronunciation of Japanese words.
On Windows and Linux, it is possible to perform inference on the GPU with cuDNN. To use cuDNN, install the CUDA Toolkit and cuDNN from the NVIDIA website.
Install the CUDA Toolkit by following the installer instructions. For cuDNN, after downloading and uncompressing it, add its location to the PATH environment variable. You need to register as an NVIDIA developer in order to download these libraries.
To create a user dictionary, prepare a userdic.csv file like the one shown below. The 0/5 field near the end of the entry indicates that the word has 5 morae and that the accent is at position 0.
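The entry below is an illustrative sketch in the MeCab dictionary format used by OpenJtalk; the word あいうえお, its reading アイウエオ (5 morae, accent position 0), and the cost and part-of-speech values are placeholders chosen to show the layout:

```
あいうえお,,,1,名詞,一般,*,*,*,*,あいうえお,アイウエオ,アイウエオ,0/5,*
```

The columns are the surface form, left/right context IDs (left empty here), cost, part-of-speech fields, base form, reading, pronunciation, the accent/mora field, and an accent chain flag.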
The user dictionary is converted from a CSV file to a dic file using pyopenjtalk.
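A minimal sketch of this conversion using pyopenjtalk's mecab_dict_index function, assuming the placeholder file names userdic.csv and userdic.dic:

```python
# Compile the MeCab-format CSV into the binary dictionary format.
# File names are placeholders; pip install pyopenjtalk first.
import pyopenjtalk

pyopenjtalk.mecab_dict_index("userdic.csv", "userdic.dic")

# Optional sanity check: load the compiled dictionary into pyopenjtalk's own
# OpenJTalk instance and confirm that the phonemes change as intended.
pyopenjtalk.update_global_jtalk_with_user_dic("userdic.dic")
print(pyopenjtalk.g2p("あいうえお"))
```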
The converted .dic file can then be loaded by calling the ailiaVoiceSetUserDictionary API.
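As a sketch only: the exact C prototype is defined in ailia_voice.h, and everything below other than the function name ailiaVoiceSetUserDictionary (the signature, the dictionary-type value, and the library/handle setup) is an assumption, shown here via Python ctypes:

```python
# Hypothetical sketch of loading the compiled .dic file through the C API via
# ctypes. The prototype below is an ASSUMPTION; check ailia_voice.h for the
# real signature and the dictionary-type constant.
import ctypes

lib = ctypes.CDLL("libailia_voice.so")  # ailia_voice.dll on Windows

# Assumed prototype:
#   int ailiaVoiceSetUserDictionary(struct AILIAVoice *net,
#                                   const char *path, int dictionary_type)
lib.ailiaVoiceSetUserDictionary.restype = ctypes.c_int
lib.ailiaVoiceSetUserDictionary.argtypes = [ctypes.c_void_p,
                                            ctypes.c_char_p,
                                            ctypes.c_int]

def set_user_dictionary(voice_handle, dic_path):
    # voice_handle stands for an AILIAVoice* obtained from the library's
    # create/open calls, which are omitted from this sketch.
    status = lib.ailiaVoiceSetUserDictionary(
        voice_handle, dic_path.encode("utf-8"), 0)  # 0: assumed dictionary type
    if status != 0:
        raise RuntimeError("ailiaVoiceSetUserDictionary failed: %d" % status)
```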