ailia_tokenizer 1.3.0.0
Setup and Build

Compiler setup

Windows

Requirements:

  • Visual Studio 2019 or later

macOS

Requirements:

  • Xcode 14.2 or later

Linux

Requirements:

  • clang or GCC (g++)

Building the sample

Change to the cpp folder, then run the command for your OS as shown below.

Windows

cl ailia_tokenizer_sample.cpp ailia_tokenizer.lib

macOS

clang++ -o ailia_tokenizer_sample ailia_tokenizer_sample.cpp libailia_tokenizer.dylib -Wl,-rpath,./ -std=c++17

Linux

export LD_LIBRARY_PATH=./
g++ -o ailia_tokenizer_sample ailia_tokenizer_sample.cpp libailia_tokenizer.so
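The macOS command embeds ./ as a run-time search path (-Wl,-rpath,./) and the Linux instructions set LD_LIBRARY_PATH=./, so the sample generally expects to be run from the folder that contains the shared library. If it fails to start, a small diagnostic like the one below can show the loader's own error message. This is a hypothetical helper, not part of the sample or of ailia_tokenizer; it uses the POSIX dlopen/dlerror API, so it applies to Linux and macOS only.

```cpp
#include <dlfcn.h>
#include <iostream>
#include <string>

// Hypothetical diagnostic helper (POSIX only; not part of the sample).
// Tries to load a shared library at run time and prints the dynamic
// loader's error message if loading fails.
bool can_load_library(const std::string& path) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* reason = dlerror();  // describes the most recent dlopen failure
        std::cerr << "failed to load " << path << ": "
                  << (reason != nullptr ? reason : "unknown error") << '\n';
        return false;
    }
    dlclose(handle);  // we only wanted to know whether it loads
    return true;
}
```

On Linux, calling can_load_library("libailia_tokenizer.so") with a bare file name resolves it through the loader's search path, which includes LD_LIBRARY_PATH, while "./libailia_tokenizer.so" loads that exact file and skips the search. With glibc older than 2.34, link the helper with -ldl.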

Running the sample

Run the sample with the following command:

./ailia_tokenizer_sample

Example output:

Tokenizer type 0
Input Text : ハードウェア ソフトウェア
Tokens : 15927 44165 20745 28571 12817 220 42668 17320 7588 20745 28571 12817
Output Text : ハードウェア ソフトウェア
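In this output the text is encoded into token IDs and then decoded back, and the matching Input Text and Output Text lines show a lossless round trip. The sketch below expresses that check generically. The Tokens, EncodeFn and DecodeFn types and both functions are hypothetical illustrations, not the ailia_tokenizer API; plug in whatever encode/decode calls your code actually uses.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the ailia_tokenizer API.
using Tokens = std::vector<int>;
using EncodeFn = std::function<Tokens(const std::string&)>;
using DecodeFn = std::function<std::string(const Tokens&)>;

// Formats token IDs as a space-separated list, like the "Tokens :" line.
std::string format_tokens(const Tokens& tokens) {
    std::ostringstream out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0) out << ' ';
        out << tokens[i];
    }
    return out.str();
}

// True when decoding the encoded text reproduces the input exactly.
bool round_trips(const std::string& text, const EncodeFn& encode, const DecodeFn& decode) {
    return decode(encode(text)) == text;
}
```

A tokenizer can legitimately fail this check if it normalizes its input; the original CLIP tokenizer, for instance, lowercases text before encoding.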

Command-line options

The sample executable accepts the following optional argument.

./ailia_tokenizer_sample [tokenizer_type]

See below for details about each argument:

Argument        Description
tokenizer_type  The tokenizer type, specified as an integer. Types other than AILIA_TOKENIZER_TYPE_WHISPER and AILIA_TOKENIZER_TYPE_CLIP require a model file, which you must download separately (for example, from Hugging Face).
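The sample's source is not reproduced here, but the following hypothetical sketch shows one way a program could read an optional numeric [tokenizer_type] argument with validation. The function name and the default value are assumptions; the example output above, produced without an argument, only suggests that type 0 is used by default.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Hypothetical helper (not taken from the sample's source): reads the optional
// [tokenizer_type] argument from argv[1]. Returns default_type when the argument
// is absent, and std::nullopt when it is not a non-negative decimal integer,
// so the caller can print a usage message instead.
std::optional<int> parse_tokenizer_type(int argc, char** argv, int default_type) {
    if (argc < 2) {
        return default_type;  // no argument given: use the default
    }
    const char* arg = argv[1];
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(arg, &end, 10);
    // Reject empty input, trailing characters, overflow and negative values.
    if (end == arg || *end != '\0' || errno == ERANGE || value < 0 || value > INT_MAX) {
        return std::nullopt;
    }
    return static_cast<int>(value);
}
```

In main, a caller would typically write auto type = parse_tokenizer_type(argc, argv, 0); and exit with a usage message when type is empty. Rejecting trailing characters means an input such as "1x" is reported as an error instead of being silently read as 1.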

Platform-specific remarks

Removing the 'downloaded' attribute on macOS

On macOS, binaries downloaded through a web browser may carry a 'downloaded' (quarantine) attribute that prevents them from being executed or loaded. In that case, you can remove the attribute by right-clicking the .dylib file and opening it, or by running the following command from the command line:

xattr -d com.apple.quarantine libailia_tokenizer.dylib