Getting Started with Starling-LM-7B-alpha

Starling-LM-7B-alpha is a large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). In other words, it is trained on synthetic conversations generated by GPT-4. It was developed by the Berkeley-Nest team. According to commonly accepted benchmarks, the model excels in education, STEM, humanities, writing, and role play.

In this article, we will cover:

How to run Starling-LM-7B-alpha on your own device
How to create an OpenAI-compatible API service for Starling-LM-7B-alpha

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run the Starling-LM-7B-alpha model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml
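Before moving on, it can help to confirm that the wasmedge command is visible in the current shell. The snippet below is a minimal sketch: check_wasmedge is a hypothetical helper name, and the hint about $HOME/.wasmedge/env assumes the installer used its default location.

```shell
# Report whether the wasmedge CLI is reachable from the current shell.
# check_wasmedge is a hypothetical helper, not part of WasmEdge itself.
check_wasmedge() {
  if command -v wasmedge >/dev/null 2>&1; then
    wasmedge --version
  else
    echo "wasmedge not found on PATH"
    return 1
  fi
}

# Assumption: the installer placed its environment file in $HOME/.wasmedge/env.
check_wasmedge || echo 'If you just installed it, try: source "$HOME/.wasmedge/env"'
```

If the command is not found right after installation, opening a new terminal usually has the same effect as sourcing the environment file.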

Step 2: Download the Starling-LM-7B-alpha model GGUF file. It may take a long time, since the model is several GB in size.

curl -LO https://huggingface.co/second-state/Starling-LM-7B-alpha-GGUF/resolve/main/starling-lm-7b-alpha.Q5_K_M.gguf
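Because the download is large, an interrupted transfer or an error page saved under the model's file name can be easy to miss. GGUF files begin with the four ASCII bytes GGUF, so a quick sanity check is possible. This is only a sketch: is_gguf is a hypothetical helper, and it checks the file header, not the integrity of the whole file.

```shell
# Return success if the file exists and starts with the 4-byte GGUF magic.
# is_gguf is a hypothetical helper; it does not validate the rest of the file.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage after the download finishes:
if is_gguf starling-lm-7b-alpha.Q5_K_M.gguf; then
  echo "model file looks like GGUF"
else
  echo "model file is missing or not GGUF; try downloading again"
fi
```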

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:starling-lm-7b-alpha.Q5_K_M.gguf llama-chat.wasm -p openchat -r '<|end_of_turn|>'

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

On my M1 Mac with 32 GB of memory, it clocks in at about 20.37 tokens per second.

[USER]: Tell me how the humanities help us imagine the future. List three reasons. 

[ASSISTANT]: The humanities help us imagine the future in several ways, as they provide valuable perspectives and insights that can guide our thinking and decision-making processes. Here are three reasons how the humanities contribute to this endeavor: 1. Cultural and historical context: The humanities, including disciplines like history, literature, and philosophy, offer a deep understanding of human culture and the evolution of societies. By examining past events, ideas, and practices, we can identify patterns, learn from successes and failures, and anticipate potential challenges and opportunities in the future. This knowledge helps us to create more informed and nuanced visions of the future, taking into account the complexities of human nature and social dynamics.<0x0A><0x0A>2. Ethical and moral considerations: The humanities also explore the ethical and moral dimensions of human existence. Through disciplines like philosophy, religion, and the arts, we can reflect on the values and principles that guide our actions and decisions. By engaging with these questions, we can develop a more compassionate, equitable, and sustainable vision of the future, one that takes into account the well-being of all people and the planet.<0x0A><0x0A>3. Creative problem-solving and critical thinking: The humanities foster creativity, critical thinking, and empathy, which are essential skills for imagining and navigating the future. By engaging with diverse perspectives, ideas, and narratives, we can develop a more holistic understanding of the world and the challenges we face. This enables us to think more creatively and critically about potential solutions, and to develop more inclusive and innovative visions of the future that account for the diverse needs and aspirations of humanity.<0x0A><0x0A>In summary, the humanities help us imagine the future by providing valuable insights into human culture, history, ethics, and critical thinking. These perspectives enable us to create more informed, compassionate, and innovative visions of the future, which can guide our actions and decisions as we strive to build a better world.

[USER]:
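The <0x0A> sequences in the reply above appear to be byte tokens for the line feed character (0x0A in ASCII) that the chat app printed literally. If you save a reply to a file or pipe it through another tool, a small filter can turn them back into real line breaks. This is a sketch only: decode_newlines is a hypothetical helper, and the sample text is made up for illustration, not real model output.

```shell
# Replace every literal <0x0A> byte token with a real line feed.
# decode_newlines is a hypothetical helper that reads from standard input.
decode_newlines() {
  awk '{ gsub(/<0x0A>/, "\n"); print }'
}

# Example with a short made-up reply:
printf '%s\n' 'First point.<0x0A><0x0A>Second point.' | decode_newlines
```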

As we run out of human words to train the next-gen LLMs, many believe that the future is for LLMs to train on synthetic text (think AlphaGo vs AlphaZero). The #Starling7B model from @Berkeley_EECS is great attempt on this. It matches GPT4 performance in subjects such as… pic.twitter.com/YH7EJ6pMty
— wasmedge (@realwasmedge) November 28, 2023

Create an OpenAI-compatible API service for the Starling-LM-7B-alpha model

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, use the following command line to start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:starling-lm-7b-alpha.Q5_K_M.gguf llama-api-server.wasm -p openchat

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Starling-LM-7B-alpha"}'
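If you plan to send more than one question, the request can be wrapped in a pair of small shell functions. The sketch below makes several assumptions: build_payload and ask are hypothetical helper names, API_BASE is a made-up environment variable that defaults to the local server on port 8080, and the question must not contain double quotes or backslashes because no JSON escaping is performed.

```shell
# Build a chat-completions request body for a single user question.
# Assumption: the question has no double quotes or backslashes (no escaping is done).
build_payload() {
  printf '{"messages":[{"role":"system","content":"You are a helpful AI assistant"},{"role":"user","content":"%s"}],"model":"Starling-LM-7B-alpha"}' "$1"
}

# Send the question to the API server started in the previous step.
# API_BASE is a hypothetical override for the server address.
ask() {
  curl -s -X POST "${API_BASE:-http://localhost:8080}/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")"
}

# Usage, with the API server running:
#   ask "What is the capital of France?"
```

Since the service is OpenAI-compatible, the reply text should sit under choices[0].message.content in the JSON response. If jq is installed, piping the output of ask through jq -r '.choices[0].message.content' prints just that field.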

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Join the WasmEdge Discord to ask questions and share insights.

Any questions getting this model running? Please go to second-state/LlamaEdge to raise an issue or book a demo with us to enjoy your own LLMs across devices!