Getting Started with Dolphin-2.2-yi-34b

Nov 13, 2023 • 3 minutes to read

The dolphin-2.2-yi-34b model is based on Yi, the 34B LLM released by the 01.AI team. Charles Goddard converted Yi to the llama2 format, and Eric Hartford then fine-tuned it further.

In this article, we will cover:

  • How to run dolphin-2.2-yi-34b on your own device
  • How to create an OpenAI-compatible API service for dolphin-2.2-yi-34b

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run the model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml

Step 2: Download the model GGUF file. The download may take a long time because the model file is many gigabytes in size.

curl -LO
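To get a feel for how large the download should be, and to confirm that the file that arrived is really a GGUF file rather than a truncated download or an error page, a small check like the one below can help. It is a rough sketch under stated assumptions, not part of the WasmEdge tooling: it assumes roughly 34 billion parameters (as the model name suggests), assumes every tensor uses the q4_0 layout of 18 bytes per block of 32 weights (real files mix in other tensor types and metadata, so the actual size will differ somewhat), and relies on GGUF files starting with the 4-byte magic GGUF followed by a little-endian 32-bit version number. The function names are ours, chosen for illustration.

```python
import struct

# q4_0 stores weights in blocks of 32: sixteen bytes of 4-bit values
# plus a 2-byte fp16 scale, i.e. 18 bytes per block (4.5 bits per weight).
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 18


def estimate_q4_0_bytes(n_params: float) -> float:
    """Rough file size if every weight were stored as q4_0 (an assumption)."""
    return n_params / Q4_0_BLOCK_WEIGHTS * Q4_0_BLOCK_BYTES


def gguf_version_from_header(header: bytes):
    """Return the GGUF version from the first 8 bytes, or None if not GGUF."""
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version


def read_gguf_version(path: str):
    """Read only the first 8 bytes of the file and check them."""
    with open(path, "rb") as f:
        return gguf_version_from_header(f.read(8))


# Back-of-the-envelope size for a model with about 34e9 parameters.
print(f"expected size: roughly {estimate_q4_0_bytes(34e9) / 1e9:.0f} GB")
```

After the download finishes, calling read_gguf_version on dolphin-2.2-yi-34b-ggml-model-q4_0.gguf should return a small integer. A result of None means the file does not start with the GGUF magic and is worth downloading again.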

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO

That's it. You can chat with the model in the terminal by typing the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2.2-yi-34b-ggml-model-q4_0.gguf llama-chat.wasm -p chatml -r '<|im_end|>' -s 'You are a helpful AI assistant'
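The flags on that command are easier to remember once you see the text they produce. Reading -p as the prompt template, -r as the reverse prompt (the string at which generation stops), and -s as the system prompt, the sketch below shows how a ChatML prompt is typically assembled and how a reply is cut at the reverse prompt. This is an illustration in Python with function names of our own choosing. It is not the source code of llama-chat.wasm, and the app may differ in details such as whitespace handling.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts in ChatML form.

    Each turn is wrapped in <|im_start|>role ... <|im_end|>, and the prompt
    ends with an open assistant turn for the model to complete.
    """
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def cut_at_reverse_prompt(generated, reverse_prompt=IM_END):
    """Keep only the text before the first occurrence of the reverse prompt."""
    head, _sep, _tail = generated.partition(reverse_prompt)
    return head.strip()


messages = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "What's the capital of France?"},
]
print(build_chatml_prompt(messages))
```

For a follow-up question, the earlier user and assistant turns are appended to the same list before rendering, which is how the model can resolve a question that refers back to a previous answer.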

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

On my Jetson Orin IoT device, it clocks in at about 7 tokens per second.

What's the capital of France?

The capital of France is Paris.

How many planets are in the solar system?

There are eight planets in the solar system: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, and Pluto (now considered a dwarf planet). However, traditionally, the answer would be eight.

which one is the biggest?

The largest planet in the solar system is Jupiter.


Create an OpenAI-compatible API service

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks, such as LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO

Then, use the following command line to start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2.2-yi-34b-ggml-model-q4_0.gguf llama-api-server.wasm -p chatml -r '<|im_end|>'

From another terminal, you can interact with the API server using curl.

curl -X POST -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Dolphin-2.2-Yi-34B"}'
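The same request can be sent from a program. The sketch below uses only the Python standard library, so nothing extra needs to be installed. It mirrors the JSON body and headers of the curl command and reads the reply from the choices[0].message.content field that OpenAI-style chat completion responses use. The endpoint in API_URL is a placeholder assumption of ours (a local server on port 8080 with the usual /v1/chat/completions path); replace it with the address your server actually listens on. The function names are hypothetical as well.

```python
import json
import urllib.request

# ASSUMPTION: placeholder endpoint, not taken from the article.
# Point this at wherever your llama-api-server.wasm instance is listening.
API_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(url, model, system_prompt, user_prompt):
    """Build a POST request with the same body shape as the curl example."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "model": model,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_body["choices"][0]["message"]["content"]


def chat(url, model, system_prompt, user_prompt, timeout=600):
    """Send one question and return the reply text (needs a running server)."""
    request = build_chat_request(url, model, system_prompt, user_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server from the previous step running, calling chat(API_URL, "Dolphin-2.2-Yi-34B", "You are a helpful AI assistant", "What is the capital of France?") should return the answer as a string. The generous timeout is deliberate, since a 34B model can take a while to respond on small devices.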

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Join the WasmEdge discord to ask questions or share insights.

No time to DIY? Book a Demo with us to enjoy your own LLMs across devices!
