ailia_speech  1.3.0.0
Setup and Build

Compiler setup

Windows

Requirements:

  • Visual Studio 2019 or later

macOS

Requirements:

  • Xcode 14.2 or later

Linux

Requirements:

  • clang

License file placement

A license file is required to use the evaluation version. Please place the license file in the folder as described below.

Windows

Place the license file ailia.lic in the same folder as ailia.dll (or in the cpp folder if you are running the sample).

macOS

Place the license file ailia.lic in ~/Library/SHALO/.

Linux

Place the license file ailia.lic in ~/.shalo/.
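
The macOS and Linux placement can be scripted. The following is a minimal sketch, assuming ailia.lic sits in the current folder; license_dir is a hypothetical helper name and is not part of the SDK. It only maps the OS name reported by uname -s to the folders listed above.

```shell
#!/bin/sh
# license_dir is a hypothetical helper (not part of the SDK).
# It prints the license folder for the given OS name, defaulting to
# the name reported by `uname -s`, using the paths from this page.
license_dir() {
    os="${1:-$(uname -s)}"
    case "$os" in
        Darwin) echo "$HOME/Library/SHALO" ;;
        Linux)  echo "$HOME/.shalo" ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Copy ailia.lic from the current folder, if it is present,
# creating the target folder when it does not exist yet.
if [ -f ailia.lic ] && dir="$(license_dir)"; then
    mkdir -p "$dir"
    cp ailia.lic "$dir/"
fi
```

On Windows the file is simply copied next to ailia.dll, so no script is sketched for that case.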

Building the sample

Move to the cpp folder, then run the command corresponding to your OS as described below.

Windows

cl ailia_speech_sample.cpp wave_reader.cpp ailia.lib ailia_audio.lib ailia_tokenizer.lib ailia_speech.lib

macOS

clang++ -o ailia_speech_sample ailia_speech_sample.cpp wave_reader.cpp libailia.dylib libailia_audio.dylib libailia_speech.dylib libailia_tokenizer.dylib -Wl,-rpath,./ -std=c++17

Linux

export LD_LIBRARY_PATH=./
g++ -o ailia_speech_sample ailia_speech_sample.cpp wave_reader.cpp libailia.so libailia_audio.so libailia_tokenizer.so libailia_speech.so

Running the sample

Run the sample by executing the command below.

./ailia_speech_sample

Example of output:

Usage ./ailia_speech_sample input.wav [base/tiny/small/medium] [auto/ja] [transcribe/translate/live] [vad_enable/vad_disable] [none/silent_threshold/prompt/constraint_char/constraint_word/dictionary] [auto/cpu/blas/gpu]
Input path:./demo.wav
Model type:small
Language type:auto
Task:transcribe
Vad:vad_enable
Option:none
Env:auto
Environment ID:0 TYPE:0 NAME:CPU
Environment ID:1 TYPE:1 NAME:CPU-AppleAccelerate
Environment ID:2 TYPE:2 NAME:MPSDNN-Apple M2
Selected Environment:auto
Input wave sec 10.512000
[00:00.000 --> 00:05.640] [0.9310] He hoped there would be stew for dinner, turnips and carrots and bruised potatoes and fat
[00:05.640 --> 00:20.840] [0.7744] mutton pieces to be ladled out in thick peppered flour fat and sauce.

Available options when executing the sample

The sample executable can take the following options.

./ailia_speech_sample input.wav [base/tiny/small/medium] [auto/ja] [transcribe/translate/live] [vad_enable/vad_disable] [none/silent_threshold/prompt/constraint_char/constraint_word/dictionary] [auto/cpu/blas/gpu]

See below for details about each argument:

  • Input file name: The audio input file. Only WAV files are accepted.
  • Model: The size category of the model to use: tiny, base, small, or medium, listed here in order of increasing accuracy.
  • Language: The language to use. auto detects the language automatically; ja selects Japanese.
  • Mode: The runtime mode: transcribe, translate, or live. The first two are self-explanatory; live performs real-time transcription.
  • VAD: Whether voice activity detection (VAD) is used: vad_enable or vad_disable.
  • Option: The option to use. silent_threshold activates the detection of silent intervals. prompt allows passing a context. constraint_char restricts recognition to the provided vocabulary.
  • Execution environment: auto selects automatically between cpu and blas. gpu enables the use of GPU hardware for inference.
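
Because every argument is positional, the order matters. The following is a minimal sketch of a wrapper that names each position; run_sample is a hypothetical helper, not part of the SDK, and its defaults are an assumption taken from the values printed in the example output on this page.

```shell
#!/bin/sh
# run_sample is a hypothetical wrapper (not part of the SDK).
# Arguments follow the order of the usage string; the defaults mirror
# the values shown in the example output on this page.
run_sample() {
    input="${1:-./demo.wav}"      # WAV file to process
    model="${2:-small}"           # base, tiny, small or medium
    language="${3:-auto}"         # auto or ja
    task="${4:-transcribe}"       # transcribe, translate or live
    vad="${5:-vad_enable}"        # vad_enable or vad_disable
    option="${6:-none}"           # e.g. none, silent_threshold, prompt
    backend="${7:-auto}"          # auto, cpu, blas or gpu

    # Print the command line so the argument order is visible.
    echo "./ailia_speech_sample $input $model $language $task $vad $option $backend"

    # Run the sample only if it has been built in the current folder.
    if [ -x ./ailia_speech_sample ]; then
        ./ailia_speech_sample "$input" "$model" "$language" "$task" "$vad" "$option" "$backend"
    fi
}

# Translate demo.wav with the Small model on the GPU.
run_sample ./demo.wav small auto translate vad_enable none gpu
```

Trailing arguments can be omitted in this sketch, so run_sample input.wav uses the default for every other position.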

Downloading additional models

The SDK contains a Small model. The Medium and Large models, which are more accurate than the Small model, can be downloaded from the URLs below.

Medium

Large

Large V3

Post-process model

The SDK does not include a post-process model. If you want to use post-processing, you can download a model from the following URLs.

T5

FuguMT EN JA

FuguMT JA EN

Platform-specific remarks

Removing the 'downloaded' attribute on macOS

Binaries downloaded with a browser on macOS may carry a 'downloaded' attribute that prevents them from being executed. In that case, you can remove the attribute by right-clicking the .dylib file and opening it. You can also remove the attribute by running the following command from the command line:

xattr -d com.apple.quarantine libailia_speech.dylib
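
The macOS build command on this page links several .dylib files, and each of them may carry the attribute. The following is a minimal sketch that applies the same xattr command to every .dylib in a folder; clear_quarantine is a hypothetical helper name, not part of the SDK.

```shell
#!/bin/sh
# clear_quarantine is a hypothetical helper (not part of the SDK).
# It runs `xattr -d com.apple.quarantine` on every .dylib in the given
# folder (default: current folder) and prints how many files it visited.
clear_quarantine() {
    dir="${1:-.}"
    count=0
    for lib in "$dir"/*.dylib; do
        [ -e "$lib" ] || continue   # the pattern matched no files
        # xattr -d reports an error when the attribute is absent,
        # so failures are ignored here.
        xattr -d com.apple.quarantine "$lib" 2>/dev/null || true
        count=$((count + 1))
    done
    echo "$count"
}

clear_quarantine .
```

The printed number counts the files visited, not the files that actually had the attribute.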