ailia_speech
1.3.2.0
With ailia Speech, you can create an instance with ailiaSpeechCreate, open a model with ailiaSpeechOpenModelFile, feed PCM data with ailiaSpeechPushInputData, check that enough PCM has been buffered with ailiaSpeechBuffered, transcribe it into text with ailiaSpeechTranscribe, and then retrieve the resulting text with ailiaSpeechGetText.
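As a minimal sketch of this flow in C++ (the model file names and the one second of silent 16 kHz mono PCM are placeholders, error handling is omitted, and the callback helper ailiaSpeechUtilGetCallback is assumed to come from ailia_speech_util.h as in the official samples):

    #include <cstdio>
    #include <vector>

    #include "ailia_speech.h"
    #include "ailia_speech_util.h"

    int main(void){
        // Create an instance (AILIA_ENVIRONMENT_ID_AUTO runs the inference on the CPU by default)
        struct AILIASpeech* net = NULL;
        AILIASpeechApiCallback callback = ailiaSpeechUtilGetCallback();
        ailiaSpeechCreate(&net, AILIA_ENVIRONMENT_ID_AUTO, AILIA_MULTITHREAD_AUTO,
            AILIA_MEMORY_REDUCE_CONSTANT, AILIA_SPEECH_TASK_TRANSCRIBE,
            AILIA_SPEECH_FLAG_NONE, callback, AILIA_SPEECH_API_CALLBACK_VERSION);

        // Open a Whisper model (placeholder file names)
        ailiaSpeechOpenModelFileA(net, "encoder_small.onnx", "decoder_small.onnx",
            AILIA_SPEECH_MODEL_TYPE_WHISPER_MULTILINGUAL_SMALL);

        // Feed PCM (placeholder: one second of 16 kHz mono silence) and mark the end of input
        std::vector<float> pcm(16000, 0.0f);
        ailiaSpeechPushInputData(net, pcm.data(), 1, (unsigned int)pcm.size(), 16000);
        ailiaSpeechFinalizeInputData(net);

        // Transcribe while enough PCM is buffered, then fetch the recognized text
        unsigned int buffered = 0;
        ailiaSpeechBuffered(net, &buffered);
        while (buffered){
            ailiaSpeechTranscribe(net);
            unsigned int count = 0;
            ailiaSpeechGetTextCount(net, &count);
            for (unsigned int i = 0; i < count; i++){
                AILIASpeechText text;
                ailiaSpeechGetText(net, &text, AILIA_SPEECH_TEXT_VERSION, i);
                printf("%s\n", text.text);
            }
            ailiaSpeechBuffered(net, &buffered);
        }

        ailiaSpeechDestroy(net);
        return 0;
    }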
With ailiaSpeechPushInputData it is not necessary to supply the whole audio stream at once: the data can be fed little by little, so the API can be driven in real time by input from a microphone.
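For example, a microphone loop might look like the following sketch, where get_mic_chunk is a hypothetical capture helper returning about 100 ms of 16 kHz mono float PCM:

    #include <vector>

    #include "ailia_speech.h"

    // Hypothetical capture helper: returns roughly 100 ms of 16 kHz mono float PCM.
    std::vector<float> get_mic_chunk();

    void stream_from_mic(struct AILIASpeech* net, volatile bool& recording){
        while (recording){
            std::vector<float> chunk = get_mic_chunk();
            ailiaSpeechPushInputData(net, chunk.data(), 1, (unsigned int)chunk.size(), 16000);

            unsigned int buffered = 0;
            ailiaSpeechBuffered(net, &buffered); // non-zero once enough PCM for one transcription is queued
            if (buffered){
                ailiaSpeechTranscribe(net);      // process the buffered audio, then keep feeding
            }
        }
        ailiaSpeechFinalizeInputData(net);       // flush whatever remains at the end of the stream
    }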
To enable live transcription, pass the flag AILIA_SPEECH_FLAG_LIVE to ailiaSpeechCreate.
The in-progress transcription preview is passed as an argument to the intermediate callback registered with ailiaSpeechSetIntermediateCallback.
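A sketch of registering such a preview callback, assuming the callback signature declared in ailia_speech.h (the user handle first, then the in-progress text):

    #include <cstdio>

    #include "ailia_speech.h"

    // Called repeatedly with the in-progress text while live mode is active.
    static int intermediate_callback(void* handle, const char* text){
        printf("\r%s", text); // overwrite the preview line in place
        return 0;             // return 0 to continue; non-zero requests an abort
    }

    // After creating the instance with AILIA_SPEECH_FLAG_LIVE:
    //   ailiaSpeechSetIntermediateCallback(net, intermediate_callback, NULL);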
When using voice activity detection, call the ailiaSpeechOpenVadFile API after the ailiaSpeechCreate API.
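For example (the Silero VAD model file name is a placeholder; AILIA_SPEECH_VAD_TYPE_SILERO is the VAD type constant from ailia_speech.h):

    // Enable voice activity detection before pushing input data.
    ailiaSpeechOpenVadFileA(net, "silero_vad.onnx", AILIA_SPEECH_VAD_TYPE_SILERO);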
If you want to apply post-processing such as speech recognition error correction or translation to the speech recognition result, call the ailiaSpeechOpenPostProcessFile API after the ailiaSpeechCreate API, and call the ailiaSpeechPostProcess API after ailiaSpeechTranscribe. The models to open depend on the post-processing task (see the sketch after this list):
When using speech recognition error correction:
When using translation:
    English to Japanese:
    Japanese to English:
    Common:
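The call order then looks like the following sketch; the arguments of ailiaSpeechOpenPostProcessFile depend on the chosen post-processing model, so consult ailia_speech.h for the exact list:

    // ailiaSpeechOpenPostProcessFile was called after ailiaSpeechCreate
    // (its arguments depend on the error-correction or translation model in use).
    ailiaSpeechTranscribe(net);   // speech recognition first
    ailiaSpeechPostProcess(net);  // then apply error correction / translation to the result

    unsigned int count = 0;
    ailiaSpeechGetTextCount(net, &count);
    for (unsigned int i = 0; i < count; i++){
        AILIASpeechText text;
        ailiaSpeechGetText(net, &text, AILIA_SPEECH_TEXT_VERSION, i);
        printf("%s\n", text.text); // post-processed text
    }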
In order to use the GPU, pass the env_id corresponding to the GPU as the env_id argument of ailiaSpeechCreate. By default, the value AILIA_ENVIRONMENT_ID_AUTO is used, which runs the inference on the CPU. See ailia_speech_sample.cpp for an example of how to determine the GPU env_id to pass as the env_id argument.
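A sketch of that selection using the ailia core environment API (error checks omitted):

    #include "ailia.h"

    // Pick the last GPU environment, falling back to the default (CPU) if none exists.
    int select_gpu_env_id(void){
        int env_id = AILIA_ENVIRONMENT_ID_AUTO;
        unsigned int env_count = 0;
        ailiaGetEnvironmentCount(&env_count);
        for (unsigned int i = 0; i < env_count; i++){
            AILIAEnvironment* env = NULL;
            ailiaGetEnvironment(&env, i, AILIA_ENVIRONMENT_VERSION);
            if (env->type == AILIA_ENVIRONMENT_TYPE_GPU){
                env_id = env->id; // pass this to ailiaSpeechCreate
            }
        }
        return env_id;
    }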
The relationship between these APIs is shown in the following diagram.