Getting Started with SOLAR-10.7B-Instruct-v1.0

To get started quickly, you can run SOLAR-10.7B-Instruct-v1.0 with a single command on your own device. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference.

SOLAR-10.7B-Instruct-v1.0 is a language model with 10.7 billion parameters, built with a depth up-scaling methodology that combines architectural changes with additional pre-training. Specifically, it incorporates Mistral 7B weights into its upscaled layers, followed by continued pre-training across the entire model. According to its developers, SOLAR-10.7B outperforms models with up to 30 billion parameters, including the recent Mixtral 8X7B model. It is also well-suited for fine-tuning: the fine-tuning process is simple, yet yields significant performance improvements.

In this article, we will cover:

  • How to run SOLAR-10.7B-Instruct-v1.0 on your own device
  • How to create an OpenAI-compatible API service for SOLAR-10.7B-Instruct-v1.0

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run the SOLAR-10.7B-Instruct-v1.0 model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml
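
Before moving on, it can help to confirm that the runtime is reachable from your shell. The snippet below is a minimal sketch, not part of the official instructions: it assumes the installer used its default location and wrote an environment file at $HOME/.wasmedge/env, and the have_cmd helper is a hypothetical name introduced here for illustration.

```shell
# Assumption: the installer wrote its environment file to the default
# location. Override WASMEDGE_ENV if you installed somewhere else.
WASMEDGE_ENV="${WASMEDGE_ENV:-$HOME/.wasmedge/env}"

# Load the environment file into the current shell if it exists.
if [ -f "$WASMEDGE_ENV" ]; then
  . "$WASMEDGE_ENV"
fi

# Hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wasmedge; then
  wasmedge --version
else
  echo "wasmedge not found on PATH; open a new terminal or source $WASMEDGE_ENV"
fi
```

If the second message appears, opening a new terminal is usually enough for the updated PATH to take effect.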

Step 2: Download the SOLAR-10.7B-Instruct-v1.0 model GGUF file. The download may take a while, since the file is several gigabytes in size.

curl -LO https://huggingface.co/second-state/SOLAR-10.7B-Instruct-v1.0-GGUF/resolve/main/solar-10.7b-instruct-v1.0.Q5_K_M.gguf
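
A failed or redirected download can leave an error page saved under the model's file name. As a quick sanity check, the sketch below looks at the first four bytes of the file, which for a GGUF file are the ASCII characters GGUF. The is_gguf helper is a hypothetical name used only for this example.

```shell
# Hypothetical helper: succeeds if the file starts with the 4-byte
# GGUF magic ("GGUF"), fails if the file is missing or starts differently.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

MODEL_FILE="solar-10.7b-instruct-v1.0.Q5_K_M.gguf"

if is_gguf "$MODEL_FILE"; then
  echo "$MODEL_FILE starts with the GGUF magic"
else
  echo "$MODEL_FILE is missing or is not a GGUF file; re-run the download"
fi
```

This only confirms the file header. It does not detect a truncated download, so comparing the file size against the one listed on the model page is still worthwhile.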

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:solar-10.7b-instruct-v1.0.Q5_K_M.gguf llama-chat.wasm -p solar-instruct
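
The command above packs several settings into a few flags. As we understand them, --dir .:. maps the current directory into the Wasm sandbox, -p selects the prompt template, and --nn-preload takes four colon-separated fields: an alias for the model (default), the backend (GGML), the execution target (AUTO), and the path to the model file. The sketch below assembles that value from a file name and prints the resulting command rather than running it; nn_preload_arg is a hypothetical helper, not part of WasmEdge.

```shell
# Model file downloaded in Step 2.
MODEL_FILE="solar-10.7b-instruct-v1.0.Q5_K_M.gguf"

# Hypothetical helper: build the --nn-preload value as
# alias:backend:target:path, using the same fixed fields as the article.
nn_preload_arg() {
  printf '%s:%s:%s:%s' "default" "GGML" "AUTO" "$1"
}

# Print the full command instead of running it, so it can be inspected first.
echo wasmedge --dir .:. --nn-preload "$(nn_preload_arg "$MODEL_FILE")" llama-chat.wasm -p solar-instruct
```

To try a different quantization of the model, only MODEL_FILE needs to change.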

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g. GPUs) available on the device. A sample exchange looks like this:

[You]:
what is New Year's resolution?

[Bot]:
Answer: A New Year's resolution is a promise or commitment that an individual makes to themselves, often at the beginning of a new year, to change a habit or behavior, improve their lifestyle, achieve a specific goal or personal development. These resolutions can range from losing weight and quitting smoking, to saving more money, volunteering more, or practicing gratitude regularly. The idea behind New Year's resolutions is to start fresh with a positive intention and create a sense of direction for the upcoming year.

Create an OpenAI-compatible API service for the SOLAR-10.7B-Instruct-v1.0 model

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so that you can interact with the model from a browser.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Once the server is running, open your browser to http://localhost:8080 to start the chat!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:solar-10.7b-instruct-v1.0.Q5_K_M.gguf llama-api-server.wasm -p solar-instruct

You can also interact with the API server using curl from another terminal.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are an AI programming assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"SOLAR-10.7B-Instruct-v1.0"}'
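
If you send requests often, the call above can be wrapped in a small shell function. The following is a minimal sketch under a few assumptions: the server is listening on http://localhost:8080 as in this article, and the names json_escape, build_payload, and ask are hypothetical helpers introduced here. The escaping handles only backslashes and double quotes, so prompts containing newlines or control characters need a proper JSON tool instead.

```shell
# Assumptions: server address and model name as used in this article.
API_URL="${API_URL:-http://localhost:8080}"
MODEL_NAME="SOLAR-10.7B-Instruct-v1.0"
SYSTEM_PROMPT="You are an AI programming assistant."

# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Limitation: newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body from a system prompt ($1) and a user message ($2).
build_payload() {
  printf '{"messages":[{"role":"system","content":"%s"},{"role":"user","content":"%s"}],"model":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$MODEL_NAME"
}

# Send one user message to the chat completions endpoint and print the raw JSON reply.
ask() {
  curl -s -X POST "$API_URL/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$SYSTEM_PROMPT" "$1")"
}

# Usage (requires the API server to be running):
#   ask "What is the capital of France?"
```

Because the API is OpenAI-compatible, the reply text should be found under choices[0].message.content in the returned JSON. If jq happens to be installed, piping the output through jq -r '.choices[0].message.content' prints just that field.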

That's all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge Discord to ask questions and share insights. Have questions about getting this model running? Please go to second-state/LlamaEdge to raise an issue, or book a demo with us to enjoy your own LLMs across devices!
