Getting Started with Samantha-1.2-Mistral-7b

Samantha-1.2-Mistral-7b is a retrained version of the Samantha Mistral-7b model that uses the ChatML prompt format instead of Vicuna-1.1. Built on Mistral-7b as the base model, it was trained for 4 hours on 4x A100 80GB GPUs, across 6 epochs, using the Samantha-1.1 dataset. The model focuses on philosophy, psychology, and personal relationships, positioning itself as not just an assistant but also a friend and companion.

In this article, we will cover:

  • How to run Samantha-1.2-Mistral-7b on your own device
  • How to create an OpenAI-compatible API service for Samantha-1.2-Mistral-7b

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! Check out why we choose this tech stack.

Run the model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the model GGUF file. The download may take a while, since the file is several GB in size.

curl -LO https://huggingface.co/second-state/Samantha-1.2-Mistral-7B/resolve/main/samantha-1.2-mistral-7b-ggml-model-q4_0.gguf
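Since the download is large, it can be worth checking that the file is really a GGUF model and not a truncated download or an HTML error page. The sketch below relies on one fact about the format: a GGUF file begins with the four ASCII bytes GGUF. The helper name is_gguf is our own invention for illustration and is not part of WasmEdge or LlamaEdge.

```shell
# is_gguf FILE
# Succeeds (exit status 0) if FILE exists and starts with the 4-byte
# GGUF magic "GGUF". Hypothetical helper, not part of any toolchain.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example usage after the download finishes:
#   if is_gguf samantha-1.2-mistral-7b-ggml-model-q4_0.gguf; then
#     echo "model file looks like GGUF"
#   else
#     echo "not a GGUF file, try downloading again"
#   fi
```

This only inspects the file header. It does not prove the download is complete, so comparing the file size with the one listed on the model page is still a good idea.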

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:samantha-1.2-mistral-7b-ggml-model-q4_0.gguf llama-chat.wasm -p chatml -r '<|im_end|>'
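To see why the command passes -p chatml and -r '<|im_end|>', it helps to look at what a ChatML prompt looks like. The sketch below prints a single-turn ChatML prompt; the function name chatml_prompt is hypothetical, and the exact text that llama-chat.wasm assembles internally is an assumption on our part. The point is that every turn ends with <|im_end|>, which is why that token works as the reverse prompt that stops generation.

```shell
# chatml_prompt SYSTEM_MESSAGE USER_MESSAGE
# Prints a single-turn prompt in the ChatML layout.
# Hypothetical helper for illustration; llama-chat.wasm builds its own
# prompt when run with "-p chatml".
chatml_prompt() {
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' "$1" "$2"
}

# Example usage:
#   chatml_prompt 'You are a caring companion.' 'How was your day?'
```

The prompt ends with an open assistant turn, so the model continues from there and closes its answer with <|im_end|>.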

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

On my M1 Mac with 32GB of memory, it runs at about 17.09 tokens per second. The first response may be slow, because the application needs time to load the large model file into memory. Subsequent responses are much faster.

[USER]: 
My right knee hurts when going downstairs. Could be from walking too much. Or my right leg muscle is weaker. How to get better?

[ASSISTANT]:
It's important to address knee pain promptly to prevent further complications. Here are some steps you can take to alleviate the pain and strengthen your knee:

1. Rest: Give your knee enough time to heal by reducing activities that aggravate the pain, especially going downstairs.

2. Ice: Apply an ice pack to your knee for 15-20 minutes at a time, several times a day, to reduce inflammation and pain.

3. Compression and elevation: Wear a knee brace for support and compression, and elevate your leg when possible to minimize swelling.

4. Stretching and strengthening exercises: Gradually introduce exercises that target the muscles surrounding your knee, such as quadriceps, hamstrings, and glutes, to improve stability and balance.

5. Consult a healthcare professional: If the pain persists or worsens, consider consulting a doctor or physical therapist who can provide a more tailored treatment plan based on the underlying cause.

Remember to listen to your body and avoid overexertion during the healing process.

[USER]: 

Create an OpenAI-compatible API service

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, use the following command line to start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:samantha-1.2-mistral-7b-ggml-model-q4_0.gguf llama-api-server.wasm -p chatml

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Samantha-1.2-Mistral-7b"}'
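If you want to ask different questions without editing the JSON by hand, the request can be wrapped in a couple of small shell functions. This is a minimal sketch, not an official client: json_escape, chat_payload, ask, and the API_URL variable are hypothetical names of our own, and the escaping only handles backslashes and double quotes (not newlines or other control characters).

```shell
# json_escape TEXT
# Escapes backslashes and double quotes so TEXT can sit inside a JSON string.
# Simplification: newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload SYSTEM_MESSAGE USER_MESSAGE MODEL_NAME
# Prints the JSON request body for /v1/chat/completions.
chat_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"model":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# ask QUESTION
# Sends QUESTION to the API server started earlier and prints the raw JSON reply.
# API_URL is a hypothetical override; the default assumes the server is
# running on this machine on port 8080.
ask() {
  curl -s -X POST "${API_URL:-http://localhost:8080/v1/chat/completions}" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(chat_payload 'You are a helpful AI assistant' "$1" 'Samantha-1.2-Mistral-7b')"
}

# Example usage (requires the API server to be running):
#   ask 'What is the capital of France?'
# If jq happens to be installed, the reply text can be pulled out with:
#   ask 'What is the capital of France?' | jq -r '.choices[0].message.content'
```

Because the response follows the OpenAI chat completions shape, the assistant's reply sits under choices[0].message.content, which is what the jq filter in the comment reads.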

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Join the WasmEdge discord to ask questions and share insights.

No time to DIY? Book a Demo with us to enjoy your own LLMs across devices!
