Getting Started with Dolphin-2.6-Phi-2

To get started quickly, you can run Dolphin-2.6-Phi-2 on your own device with a single command. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference.

Dolphin 2.6 Phi-2, developed by Eric Hartford and Fernando Fernandes, is a language model based on the Phi-2 architecture. Sponsored by Convai, the 2.6 release fixes a training configuration issue and reintroduces Samantha-based empathy data. It also replaces the Synthia and Pure-Dove training datasets with Capybara. Notably, Dolphin 2.6 Phi-2 is uncensored, with alignment and bias removed for increased compliance, so users are advised to implement their own alignment layer for ethical use.

We will cover:

  • Run Dolphin-2.6-Phi-2 on your own device
  • Create an OpenAI-compatible API service for Dolphin-2.6-Phi-2

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run Dolphin-2.6-Phi-2 on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the Dolphin-2.6-Phi-2 GGUF file. It may take a long time, since the model is 2.07 GB.

curl -LO
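
Because the file is large, an interrupted or redirected download can leave something other than a model on disk. As a minimal sketch, the check below tests whether a file starts with the four ASCII bytes GGUF, which is the magic number of the GGUF format. The helper name is_gguf is hypothetical and not part of WasmEdge or LlamaEdge; the file name in the usage comment is the one used by the run command in this article.

```shell
# is_gguf FILE
# Hypothetical helper (not part of WasmEdge or LlamaEdge).
# Succeeds only if FILE exists and begins with the 4-byte GGUF magic "GGUF".
is_gguf() {
  [ -f "$1" ] || return 1
  # Read the first 4 bytes; drop NUL bytes so the shell can compare the text.
  magic=$(dd if="$1" bs=4 count=1 2>/dev/null | tr -d '\000')
  [ "$magic" = "GGUF" ]
}

# Example usage:
#   is_gguf dolphin-2_6-phi-2.Q5_K_M.gguf && echo "looks like a GGUF file"
```

This catches an HTML error page saved under the model's name, but not a truncated download. Comparing the file size against the expected 2.07 GB is a reasonable complementary check.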

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2_6-phi-2.Q5_K_M.gguf llama-chat.wasm -p chatml
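
To make the pieces of this command explicit, here is a small sketch that assembles it from its parts. The --dir .:. option maps the current host directory into the Wasm sandbox, and -p chatml selects the ChatML prompt template. As we understand the --nn-preload option, its four colon-separated fields are the model alias, the inference backend, the execution target, and the path to the model file. The function name llamaedge_cmd is hypothetical; it only prints the command after checking that both files exist, and it does not run WasmEdge itself.

```shell
# llamaedge_cmd MODEL_FILE WASM_APP [PROMPT_TEMPLATE]
# Hypothetical helper: prints the wasmedge command line; it does not run it.
llamaedge_cmd() {
  # Refuse to build a command that points at files that are not there.
  for f in "$1" "$2"; do
    [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
  done
  # Fields are alias:backend:target:path; the template defaults to chatml.
  printf 'wasmedge --dir .:. --nn-preload default:GGML:AUTO:%s %s -p %s\n' \
    "$1" "$2" "${3:-chatml}"
}

# Example usage:
#   llamaedge_cmd dolphin-2_6-phi-2.Q5_K_M.gguf llama-chat.wasm
```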

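The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device. Below is an example exchange: a question typed at the prompt, followed by the model's answer.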

What is CES?

CES, or Consumer Electronics Show, is an annual event held in the United States by the Consumer Technology Association (CTA). It features a showcase of new consumer electronic products, innovations, and trends in technology. The event typically takes place in Las Vegas during January. CES serves as an exhibition venue where companies can reveal cutting-edge devices, services, and technologies to consumers, industry professionals, investors, and media personnel. It has grown into one of the world's largest technology conferences and events, attracting over 200,000 attendees each year. The event often influences the direction of technological advancements in various industries, such as telecommunications, consumer electronics, computing, home entertainment, and automotive

Create an OpenAI-compatible API service for Dolphin-2.6-Phi-2

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks, such as LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices. The Rust source code for the app is here.

curl -LO

Then, download the chatbot web UI so that you can interact with the model in your browser.

curl -LO
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Then, open http://localhost:8080 in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2_6-phi-2.Q5_K_M.gguf llama-api-server.wasm -p chatml

You can also interact with the API server using curl from another terminal.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are an AI assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"dolphin-2.6-phi-2"}'
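
When the prompts come from variables rather than a fixed string, the JSON body has to be escaped before it is passed to curl. The sketch below builds the same request body as the command above. The function names json_escape and build_chat_payload are hypothetical, and the escaping covers only backslashes and double quotes, so it assumes single-line prompts without control characters.

```shell
# json_escape TEXT
# Escapes backslashes and double quotes for use inside a JSON string.
# Assumption: TEXT is a single line with no control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_chat_payload SYSTEM_PROMPT USER_PROMPT [MODEL]
# Prints a chat completion request body with one system and one user message.
build_chat_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"model":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "${3:-dolphin-2.6-phi-2}")"
}

# Example usage (requires the API server started above to be running):
#   curl -X POST http://localhost:8080/v1/chat/completions \
#     -H 'Content-Type: application/json' \
#     -d "$(build_chat_payload 'You are an AI assistant.' 'What is the capital of France?')"
```

For prompts that span several lines or contain arbitrary characters, a dedicated JSON tool is a safer choice than hand-written escaping.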

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try! Join the WasmEdge Discord to ask questions or share insights.
