Getting Started with CALM2-7B-Chat


To get started quickly, you can run CALM2-7B-Chat on your own device with a single command. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference.

CALM2-7B-Chat is an advanced language model fine-tuned for dialogue use cases from CyberAgentLM2, a decoder-only language model pre-trained on 1.3T tokens of publicly available Japanese and English datasets. It was trained by CyberAgent, a Japanese digital advertising company.

In this article, we will cover

  • How to run CALM2-7B-Chat on your own device
  • How to create an OpenAI-compatible API service for CALM2-7B-Chat

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Run the CALM2-7B-Chat model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the CALM2-7B-Chat model GGUF file. Since the model is several gigabytes, the download may take a while.

curl -LO https://huggingface.co/second-state/Calm2-7B-Chat-GGUF/resolve/main/calm2-7b-chat.Q4_K_M.gguf

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:calm2-7b-chat.Q4_K_M.gguf llama-chat.wasm -p vicuna-1.1-chat

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) I have on the device.

On my Mac with an M1 chip and 32GB of RAM, it clocks in at about 30 tokens per second.

[You]: 
二日間の京都旅行計画

 
京都は、日本の文化と歴史を体験できる素晴らしい場所です。二日間の京都旅行を計画してみましょう。

一日目:
1. 京都に到着したら、まずは京都タワーに向かいましょう。京都タワーの展望台からは、京都の街並みを一望できます。
2. その後、京都御苑を散策しましょう。京都御苑は、京都の中心地にあり、美しい庭園や自然を楽しむことができます。
3. 昼食は、京都の伝統的な料理である京料理を楽しみましょう。おすすめは、京都の有名な料亭「菊乃井」です。
4. その後、京都国立博物館を訪れ、日本の伝統芸能である能や歌舞伎の展示を楽しみましょう。
5. 夕食は、祇園の街で、京都の伝統的な料理である精進料理を楽しみましょう。


二日目:
1. 京都の朝を散策しましょう。京都の朝はとても静かで、美しい庭園や神社仏閣を楽しむことができます。
2. 朝食は、京都の伝統的な料理であるおばんざいを楽しみましょう。おすすめは、京都の有名な朝食レストラン「朝食喜心」です。
3. その後、京都の有名な観光地、清水寺を訪れましょう。清水寺は、京都の街を一望できる絶景スポットです。
4. 昼食は、京都の有名なラーメン店「新福菜館」でラーメンを食べましょう。
5. 最後に、京都駅に戻り、新幹線や電車で帰宅しましょう。

Create an OpenAI-compatible API service for the CALM2-7B-Chat model

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain, and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so you can interact with the model from your browser.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Once it is running, open your browser to http://localhost:8080 to start the chat!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:calm2-7b-chat.Q4_K_M.gguf llama-api-server.wasm -p vicuna-1.1-chat

You can also interact with the API server using curl from another terminal.

  curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:application/json' \
    -H 'Content-Type: application/json' \
    -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "二日間の京都旅行計画"}], "model":"CALM2-7B-Chat"}'
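The same request can be issued from any OpenAI-compatible client. Below is a minimal Python sketch using only the standard library; it assumes the API server from the previous step is running on its default port 8080, and the model name matches the one passed in the curl example above.

```python
import json
import urllib.request

# Default endpoint of the local llama-api-server started above.
API_URL = "http://localhost:8080/v1/chat/completions"

def chat(prompt: str, model: str = "CALM2-7B-Chat") -> str:
    """Send one chat turn to the local OpenAI-compatible endpoint
    and return the assistant's reply text."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The response follows the OpenAI chat-completion schema.
    return body["choices"][0]["message"]["content"]

# Example (requires the API server to be running):
# print(chat("二日間の京都旅行計画"))
```

Because the server speaks the OpenAI wire protocol, the official `openai` SDK or any other compatible client library can be pointed at the same base URL instead.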

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Fun Fact

WasmEdge kicked off Cloud Native Days Tokyo on Dec 11 with a talk on Wasm becoming the runtime for LLMs. The talk addressed the limitations of Python in this context, discussed how popular LLM frameworks are shifting toward zero Python dependency, and explored how Wasm could be the container for AGI.

Later, CyberAgent presented their newly trained language model, CALM2. The model's performance is really impressive; it handles questions in Japanese with ease. WasmEdge then demoed the Japan-produced LLM, CALM2-7B-Chat, running on a Mac, which was warmly received by developers.

Talk to us!

Join the WasmEdge discord to ask questions and share insights. Any questions about getting this model running? Please go to second-state/LlamaEdge to raise an issue, or book a demo with us to enjoy your own LLMs across devices!
