Getting Started with Dolphin-2.2.1-Mistral-7B

Nov 23, 2023 • 3 minutes to read

Dolphin-2.2.1-Mistral-7B, developed by Eric Hartford, is an iteration of the Dolphin family of models that builds on the previous Dolphin 2.1 Mistral version. Based on Mistral-7B-v0.1, it is distinguished by its enhanced conversation and empathy skills and is designed to provide highly engaging, personal chat interactions.

In this article, we will cover:

  • How to run Dolphin-2.2.1-Mistral-7B on your own device
  • How to create an OpenAI-compatible API service for Dolphin-2.2.1-Mistral-7B

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose the Rust + Wasm tech stack.

Run the model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml

Step 2: Download the model GGUF file. The download may take a while, since the model file is several GB in size.

curl -LO https://huggingface.co/second-state/Dolphin-2.2.1-Mistral-7B/resolve/main/dolphin-2.2.1-mistral-7b-ggml-model-q4_0.gguf

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2.2.1-mistral-7b-ggml-model-q4_0.gguf llama-chat.wasm -p chatml -r '<|im_end|>'
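The `-p chatml` option in the command above selects the ChatML prompt template, and `-r '<|im_end|>'` tells the app to stop generating once the model emits the ChatML end-of-turn marker. To show what that template looks like, here is a minimal Python sketch of the general ChatML layout. The function name `render_chatml` is hypothetical, and the exact whitespace that the chat app emits internally is an assumption, not something taken from its source code.

```python
def render_chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Illustrative only: this follows the general ChatML layout and is not
    the chat app's own implementation.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave an open assistant turn for the model to complete. The model is
    # expected to close it with <|im_end|>, which is why that string is a
    # sensible stop marker.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```

You do not need to build this string yourself when using the chat app; the sketch only illustrates why the prompt template and the stop marker have to match the format the model was trained on.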

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device. The first response may be delayed while the application loads the large model file into memory; subsequent responses are much faster.

On my M1 Mac with 32 GB of memory, it runs at about 16.81 tokens per second.

[USER]: 
My right knee hurts when going downstairs. Could be from walking too much. Or my right leg muscle is weaker. How to get better?

[ASSISTANT]:
1. Rest: Give your knee time to recover by avoiding activities that cause pain.
2. Ice: Apply ice to the affected area for 15-20 minutes, several times a day, to reduce inflammation and pain.
3. Compression: Use a compression bandage or wrap to support your knee and reduce swelling.
4. Elevation: Keep your knee elevated when possible to minimize swelling.
5. Over-the-counter pain relievers: Take anti-inflammatory medications like ibuprofen or acetaminophen to manage pain and inflammation.
6. Physical therapy: Consult a physical therapist for exercises and stretches to strengthen your leg muscles and improve knee stability.
7. Consult a doctor: If the pain persists or worsens, consult a doctor to rule out any underlying conditions.

Remember, it's essential to address the root cause of the pain to prevent further complications. If you're unsure about the cause or severity of your pain, consult a healthcare professional.

[USER]:

Create an OpenAI-compatible API service

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, use the following command line to start an API server for the model. It loads the GGUF file downloaded in Step 2.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:dolphin-2.2.1-mistral-7b-ggml-model-q4_0.gguf llama-api-server.wasm -p chatml

From another terminal, you can interact with the API server using curl.

curl -X POST http://0.0.0.0:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Dolphin-2.2.1-Mistral-7B"}'
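The same request can be sent from a program. Below is a minimal Python sketch that uses only the standard library, so there is nothing extra to install. It assumes the server is reachable at `localhost:8080` (matching the port in the curl command above) and that the response follows the OpenAI chat completions shape, with the reply text under `choices[0].message.content`. The helper names `build_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: the API server from the previous step is running locally on
# the same port used in the curl example.
API_URL = "http://localhost:8080/v1/chat/completions"
MODEL_NAME = "Dolphin-2.2.1-Mistral-7B"


def build_request(user_message,
                  system_message="You are a helpful AI assistant",
                  url=API_URL,
                  model=MODEL_NAME):
    """Build the same POST request that the curl example sends."""
    payload = {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "model": model,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response dict.

    Assumes the usual layout: choices[0].message.content.
    """
    return response_body["choices"][0]["message"]["content"]


def chat(user_message, timeout=120):
    """Send one user message to the running server and return the reply."""
    request = build_request(user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running, calling `chat("What is the capital of France?")` should return the model's answer as a string. The generous timeout reflects the fact that the first request may wait for the model to load.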

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Join the WasmEdge Discord to ask questions and share insights.

No time to DIY? Book a Demo with us to enjoy your own LLMs across devices!
