Getting Started with ELYZA-japanese-Llama-2-7b


To get started quickly, you can run ELYZA-japanese-Llama-2-7b on your own device with a single command. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference.

The ELYZA-japanese-Llama-2-7b model, developed by ELYZA, is a Japanese language model with 7 billion parameters that is available for commercial use. It is based on the Llama 2 architecture and is licensed under the LLAMA 2 Community License. The model has been fine-tuned for Japanese language capabilities and has shown impressive performance in various benchmarks. It is accessible to the public, and its features, performance, and inference code are available for review.


We will cover:

  • Run ELYZA-japanese-Llama-2-7b on your own device
  • Create an OpenAI-compatible API service for ELYZA-japanese-Llama-2-7b

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run ELYZA-japanese-Llama-2-7b on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the ELYZA-japanese-Llama-2-7b-Instruct-v0.1 GGUF file. This may take a while, since the model is 4.78 GB.

curl -LO
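
Because the download is large, an interrupted transfer can leave a truncated file behind. The sketch below is one way to sanity-check the file before moving on. The function name check_model_file and the minimum-size threshold are illustrative assumptions, not part of WasmEdge or LlamaEdge; the file name is the one used in the run command in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of WasmEdge or LlamaEdge):
# succeeds only if the file exists and is at least min_bytes bytes long.
check_model_file() {
  file="$1"
  min_bytes="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # wc -c reports the size in bytes; tr strips the padding some systems add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "too small: $file is $size bytes, expected at least $min_bytes" >&2
    return 1
  fi
  echo "ok: $file ($size bytes)"
}

# Example: the post gives the size as 4.78 GB, so 4000000000 bytes is an
# assumed, deliberately loose lower bound.
# check_model_file ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf 4000000000
```

If the check fails, re-running the download is usually the simplest fix.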

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf llama-chat.wasm -p llama-2-chat
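
The command has three moving parts: the --nn-preload entry that names the GGUF file, the Wasm app to run, and the -p prompt template passed to that app. As a rough sketch, a small wrapper can make these easy to swap. The function name chat_cmd is hypothetical; the flags and values are copied from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the wasmedge command for a given model file,
# Wasm app, and prompt template. The flags are the ones shown in this post.
chat_cmd() {
  model="$1"
  app="$2"
  template="$3"
  printf 'wasmedge --dir .:. --nn-preload default:GGML:AUTO:%s %s -p %s\n' \
    "$model" "$app" "$template"
}

# Print the command for review before running it by hand.
chat_cmd ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf llama-chat.wasm llama-2-chat
```

Printing the command rather than executing it keeps the sketch safe to run on a machine where WasmEdge is not installed yet.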

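The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

For example, you can ask the model a question in Japanese:

日本でスキーをするなら北海道と長野のどちらに行くべきですか? 両者の長所と短所を教えてください。

(In English: "If I go skiing in Japan, should I go to Hokkaido or Nagano? Please tell me the pros and cons of each.")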




Create an OpenAI-compatible API service for ELYZA-japanese-Llama-2-7b

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices. The Rust source code for the app is here.

curl -LO

Then, download the chatbot web UI so you can interact with the model in your browser.

curl -LO
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Then, open http://localhost:8080 in your browser to start the chat!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf llama-api-server.wasm -p llama-2-chat

You can also interact with the API server using curl from another terminal.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are an AI programming assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"ELYZA-japanese-Llama-2-7b"}'
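
Hand-writing JSON inside single quotes gets error-prone once the prompt itself contains quotes or backslashes. The sketch below builds a request body of the same shape from a system prompt, a user prompt, and a model name. The helper names json_escape and chat_payload are hypothetical; only backslashes and double quotes are escaped, so prompts with newlines or other control characters are outside what this sketch handles.

```shell
#!/bin/sh
# Hypothetical helpers for building the request body shown above.

# Escape backslashes and double quotes so the text is safe inside a JSON string.
# Assumption: the input has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a chat completion request body with one system and one user message.
chat_payload() {
  system=$(json_escape "$1")
  user=$(json_escape "$2")
  model=$(json_escape "$3")
  printf '{"messages":[{"role":"system", "content": "%s"}, {"role":"user", "content": "%s"}], "model":"%s"}\n' \
    "$system" "$user" "$model"
}

chat_payload "You are an AI programming assistant." \
  'Say "hello" in Japanese.' \
  "ELYZA-japanese-Llama-2-7b"
```

The printed body can then be handed to curl in place of the literal string, for example with -d "$(chat_payload ...)".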

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge Discord to ask questions and share insights. Having trouble getting this model running? Please go to second-state/LlamaEdge to raise an issue, or book a demo with us to enjoy your own LLMs across devices!
