Getting Started with ELYZA-japanese-Llama-2-7b

Jan 10, 2024 • 2 minutes to read

To get started quickly, you can run ELYZA-japanese-Llama-2-7b on your own device with a single command. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference.

The ELYZA-japanese-Llama-2-7b model, developed by ELYZA, is a commercially available Japanese language model with 7 billion parameters. It is based on the Llama 2 architecture and is licensed under the LLAMA 2 Community License. The model has been fine-tuned for Japanese language capabilities and has shown impressive performance in various benchmarks. It is accessible to the public, and its features, performance, and inference code are available for review.

Credit: https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b

We will cover:

  • Run ELYZA-japanese-Llama-2-7b on your own device
  • Create an OpenAI-compatible API service for ELYZA-japanese-Llama-2-7b

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Run ELYZA-japanese-Llama-2-7b on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml
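
After the installer finishes, it can help to confirm that the wasmedge binary is reachable before moving on. The sketch below is one way to check. Note that have_cmd is a hypothetical helper name, and the hint about opening a new terminal assumes the installer changed your shell profile, which may not match your setup.

```shell
# have_cmd is a hypothetical helper: it succeeds if the command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wasmedge; then
  # Print the installed runtime version.
  wasmedge --version
else
  echo "wasmedge not found on PATH; try opening a new terminal and check again"
fi
```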

Step 2: Download the ELYZA-japanese-Llama-2-7b-Instruct-v0.1 GGUF file. The download may take a while because the model file is 4.78 GB.

curl -LO https://huggingface.co/second-state/ELYZA-japanese-Llama-2-7b-instruct-GGUF/resolve/main/ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf
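
Because the download is large, a quick sanity check on the result can save a confusing failure later. GGUF files begin with the four ASCII bytes GGUF, so the sketch below reads them back. is_gguf is a hypothetical helper, not part of WasmEdge or LlamaEdge. It only catches a wrong file (for example a saved error page); it does not detect a truncated download, so comparing the file size against 4.78 GB is still worthwhile.

```shell
# is_gguf is a hypothetical helper: it succeeds when the file exists and
# starts with the 4-byte GGUF magic. It does NOT verify the file is complete.
is_gguf() {
  [ -f "$1" ] || return 1
  magic=$(head -c 4 "$1" 2>/dev/null)
  [ "$magic" = "GGUF" ]
}

MODEL=ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf
if is_gguf "$MODEL"; then
  echo "$MODEL looks like a GGUF file"
else
  echo "$MODEL is missing or is not a GGUF file; try downloading it again"
fi
```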

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf llama-chat.wasm -p llama-2-chat

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

[You]:
日本でスキーをするなら北海道と長野のどちらに行くべきですか? 両者の長所と短所を教えてください。

[Bot]:
日本でスキーをするならば、北海道と長野のどちらが適しているかについてお答えします。

北海道は、広大な土地と多くの山々が特徴です。そのため、ゲレンデも非常に広く、他のスキー場へのアクセスも便利であることが挙げられます。一方で、観光地化されていない分、食事や宿泊等の設備は他の地域に比べると劣るというデメリットが挙げられます。

長野県は、観光地化されており、食事や宿泊等の設備も整っている分、利便性に富んでいると言えます。一方で、ゲレンデが狭く、スキー客で賑わうなどの理由から、コブが少ない等の欠点もあることに注意してください。
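
The -p llama-2-chat option in the chat command above names the prompt template that the app wraps around each message. As a rough illustration, the function below builds a single-turn prompt in the publicly documented Llama 2 chat layout. build_llama2_prompt is a hypothetical name, and the exact whitespace and multi-turn handling inside LlamaEdge may differ from this sketch.

```shell
# build_llama2_prompt is a hypothetical helper for illustration only.
# $1 = system prompt (may be empty), $2 = user message.
build_llama2_prompt() {
  system=$1
  user=$2
  if [ -n "$system" ]; then
    # With a system prompt: wrap it in <<SYS>> markers inside the [INST] block.
    printf '<s>[INST] <<SYS>>\n%s\n<</SYS>>\n\n%s [/INST]' "$system" "$user"
  else
    # Without a system prompt: only the user message goes inside [INST].
    printf '<s>[INST] %s [/INST]' "$user"
  fi
}

# Example: print the prompt for one question, followed by a newline.
build_llama2_prompt "You are a helpful assistant." "日本の首都はどこですか?"
echo
```

The model's reply is generated as a continuation of the text after [/INST], which is why choosing the template that matches the model matters.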

Create an OpenAI-compatible API service for ELYZA-japanese-Llama-2-7b

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so you can interact with the model in a browser.


curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command line to start an API server for the model. Then open http://localhost:8080 in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf llama-api-server.wasm -p llama-2-chat
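
The --nn-preload value in the command above packs four colon-separated fields. As far as we understand the WasmEdge WASI-NN plugin, they are a model alias (default), the backend (GGML), the execution target (AUTO), and the model file path. The sketch below assembles the value from those parts and prints the resulting command instead of running it. nn_preload is a hypothetical helper, not a WasmEdge tool.

```shell
# nn_preload is a hypothetical helper: joins alias, backend, target, and
# model path into the colon-separated form that --nn-preload expects.
nn_preload() {
  printf '%s:%s:%s:%s' "$1" "$2" "$3" "$4"
}

MODEL=ELYZA-japanese-Llama-2-7b-instruct-q5_K_M.gguf
PRELOAD=$(nn_preload default GGML AUTO "$MODEL")

# Print the command for inspection; remove "echo" to actually start the server.
echo wasmedge --dir .:. --nn-preload "$PRELOAD" llama-api-server.wasm -p llama-2-chat
```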

You can also interact with the API server using curl from another terminal.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are an AI programming assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"ELYZA-japanese-Llama-2-7b"}'
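
If you plan to send more than one request, it may be convenient to wrap the curl call above in small shell functions. The sketch below is one way to do that. json_escape, chat_payload, and ask are hypothetical helper names, API_URL is an assumed environment variable for overriding the server address, and the escaping only handles backslashes and double quotes (no newlines or control characters).

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumption: the text contains no newlines or control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body. $1 = system prompt, $2 = user message.
chat_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"model":"ELYZA-japanese-Llama-2-7b"}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request to the chat completions endpoint used earlier.
# API_URL is a hypothetical override; it defaults to the local server.
ask() {
  curl -sS -X POST "${API_URL:-http://localhost:8080}/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(chat_payload "$1" "$2")"
}
```

With the server running, a call such as ask "You are a helpful assistant." "日本で一番高い山は何ですか?" should print a JSON response. Since the API follows the OpenAI chat completions shape, the reply text is expected under choices[0].message.content.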

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge Discord to ask questions and share insights. Have questions about getting this model running? Go to second-state/LlamaEdge to raise an issue, or book a demo with us to run your own LLMs across devices!
