Basic Usage

The following C# example performs speech synthesis. To obtain an AudioClip, create an AiliaVoiceModel, load the AI model with OpenModel, convert the text to phonemes with G2P, and call Inference. For GPT-SoVITS, also pass a reference AudioClip and its phoneme-converted text to SetReference before calling Inference.

// Assumes the enclosing class provides these members:
// voice (AiliaVoiceModel), model (selected model ID), ref_clip (AudioClip), audioSource (AudioSource).
void Initialize(){
    // Create the voice instance.
    bool status = voice.Create(Ailia.AILIA_ENVIRONMENT_ID_AUTO, AiliaVoice.AILIA_VOICE_FLAG_NONE);

    string asset_path = Application.streamingAssetsPath;
    string path = asset_path + "/AiliaVoice/";

    // Open the Open JTalk dictionary.
    status = voice.OpenDictionary(path + "open_jtalk_dic_utf_8-1.11", AiliaVoice.AILIA_VOICE_DICTIONARY_TYPE_OPEN_JTALK);

    // Load the ONNX files for the selected model.
    switch(model){
    case MODEL_TACOTRON2_ENGLISH:
        status = voice.OpenModel(path + "onnx/nvidia/encoder.onnx", path + "onnx/nvidia/decoder_iter.onnx", path + "onnx/nvidia/postnet.onnx", path + "onnx/nvidia/waveglow.onnx", null, AiliaVoice.AILIA_VOICE_MODEL_TYPE_TACOTRON2, AiliaVoice.AILIA_VOICE_CLEANER_TYPE_BASIC);
        break;
    case MODEL_GPT_SOVITS_JAPANESE:
        status = voice.OpenModel(path + "onnx/gpt-sovits/t2s_encoder.onnx", path + "onnx/gpt-sovits/t2s_fsdec.onnx", path + "onnx/gpt-sovits/t2s_sdec.opt3.onnx", path + "onnx/gpt-sovits/vits.onnx", path + "onnx/gpt-sovits/cnhubert.onnx", AiliaVoice.AILIA_VOICE_MODEL_TYPE_GPT_SOVITS, AiliaVoice.AILIA_VOICE_CLEANER_TYPE_BASIC);
        break;
    }
}

void Infer(string text){
    if (model == MODEL_GPT_SOVITS_JAPANESE){
        // GPT-SoVITS takes phonemes, so convert both the input text and the reference text.
        text = voice.G2P(text, AiliaVoice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA);
        string ref_text = voice.G2P("水をマレーシアから買わなくてはならない。", AiliaVoice.AILIA_VOICE_G2P_TYPE_GPT_SOVITS_JA);
        voice.SetReference(ref_clip, ref_text);
    }

    // Run synthesis, then play the result.
    bool status = voice.Inference(text);

    audioSource.clip = voice.GetAudioClip();
    audioSource.Play();
}

void Uninitialize(){
    // Release the voice instance.
    voice.Close();
}
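
The sample above stores each call's bool result in status but never acts on it. A minimal sketch of one way to surface failures early is shown below. VoiceStatus and Require are hypothetical names introduced for illustration and are not part of the ailia API; the only assumption about the API is that Create, OpenDictionary, OpenModel, and Inference return bool, as the sample shows.

```csharp
using System;

// Hypothetical helper, not part of the ailia API.
public static class VoiceStatus
{
    // Throws if a step reported failure, naming the step that failed.
    public static void Require(bool status, string step)
    {
        if (!status)
        {
            throw new InvalidOperationException("ailia AI Voice step failed: " + step);
        }
    }
}
```

With this helper, a call such as VoiceStatus.Require(voice.Inference(text), "Inference") would stop before GetAudioClip is reached when inference fails.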

Using the User Dictionary

A userdic.dic file created with pyopenjtalk can be loaded by calling SetUserDictionary before OpenDictionary.

voice.SetUserDictionary(path + "/userdic.dic", AiliaVoice.AILIA_VOICE_DICTIONARY_TYPE_OPEN_JTALK);

voice.OpenDictionary(path, AiliaVoice.AILIA_VOICE_DICTIONARY_TYPE_OPEN_JTALK);

Using a GPU

To use a GPU, pass the GPU's env_id as the env_id argument of Create (the first argument of voice.Create in the Basic Usage example). If you specify AILIA_ENVIRONMENT_ID_AUTO, inference runs on the CPU. For how to obtain the GPU's env_id, see GetEnvId() in AiliaSpeechSample.cs. The example below enumerates the available environments with the ailia API and, when env_type is 1, returns the env_id of a GPU backend (MPS, CUDA, or Vulkan).

// Requires: using System; using System.Runtime.InteropServices;
// Assumes env_name is a string field of the enclosing class.
private int GetEnvId(int env_type){
    int env_id = Ailia.AILIA_ENVIRONMENT_ID_AUTO;
    if (env_type == 1) { // GPU
        // Enumerate every environment ailia reports.
        int count = 0;
        Ailia.ailiaGetEnvironmentCount(ref count);
        for (int i = 0; i < count; i++){
            IntPtr env_ptr = IntPtr.Zero;
            Ailia.ailiaGetEnvironment(ref env_ptr, (uint)i, Ailia.AILIA_ENVIRONMENT_VERSION);
            Ailia.AILIAEnvironment env = (Ailia.AILIAEnvironment)Marshal.PtrToStructure(env_ptr, typeof(Ailia.AILIAEnvironment));

            // Keep the GPU backend; if several match, the last one enumerated wins.
            if (env.backend == Ailia.AILIA_ENVIRONMENT_BACKEND_MPS || env.backend == Ailia.AILIA_ENVIRONMENT_BACKEND_CUDA || env.backend == Ailia.AILIA_ENVIRONMENT_BACKEND_VULKAN){
                env_id = env.id;
                env_name = Marshal.PtrToStringAnsi(env.name);
            }
        }
    } else {
        env_name = "cpu";
    }
    // Remains AILIA_ENVIRONMENT_ID_AUTO when no GPU backend was found.
    return env_id;
}
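
As a rough sketch of how the returned value might be used, the fragment below passes it to Create in place of AILIA_ENVIRONMENT_ID_AUTO. InitializeWithGpu is a hypothetical method name; the fragment assumes it lives in the same Unity class as GetEnvId and the voice instance from the Basic Usage example, so it is not compilable on its own.

```csharp
// Hypothetical method; assumes `voice` (AiliaVoiceModel) and GetEnvId are members of the same class.
void InitializeWithGpu(){
    // env_type 1 requests a GPU. GetEnvId returns AILIA_ENVIRONMENT_ID_AUTO if no GPU backend is found.
    int env_id = GetEnvId(1);

    bool status = voice.Create(env_id, AiliaVoice.AILIA_VOICE_FLAG_NONE);
    if (!status){
        Debug.LogError("voice.Create failed for env_id " + env_id);
    }
}
```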

Notes by Platform

iOS

When running on iOS, add the Increased Memory Limit capability to the app.

Android

On Android, files in StreamingAssets cannot be accessed directly through the file system, so the model files are copied to TemporaryCachePath at startup.
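
One common way to perform such a copy in Unity is to read the file with UnityWebRequest and write it to Application.temporaryCachePath. The sketch below illustrates that general pattern and is not necessarily how the package's sample does it. StreamingAssetCopier and CopyToCache are hypothetical names, and the req.result check assumes Unity 2020.2 or later.

```csharp
using System;
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical helper, not part of the ailia package.
public class StreamingAssetCopier : MonoBehaviour
{
    // Copies one file from StreamingAssets into temporaryCachePath.
    // Calls onDone with the local path, or with null if the read failed.
    public IEnumerator CopyToCache(string relativePath, Action<string> onDone)
    {
        // On Android this is a jar: URL, which UnityWebRequest can read.
        string src = Application.streamingAssetsPath + "/" + relativePath;
        string dst = Path.Combine(Application.temporaryCachePath, relativePath);
        Directory.CreateDirectory(Path.GetDirectoryName(dst));

        using (UnityWebRequest req = UnityWebRequest.Get(src))
        {
            yield return req.SendWebRequest();

            if (req.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Failed to read " + src + ": " + req.error);
                onDone(null);
                yield break;
            }

            File.WriteAllBytes(dst, req.downloadHandler.data);
        }

        onDone(dst);
    }
}
```

It would be started with StartCoroutine, once per model file, and the path handed to onDone would then be the one passed to OpenModel.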