Or you can run this new model with a single command on your Mac or on other devices.
The Mistral-7B-Instruct-v0.2 model is a new model released by the Mistral AI team. It builds on its predecessor, the Mistral-7B-v0.1. The new model is better at understanding and following complex instructions, which makes it useful for a wide range of applications. That combination of capability and ease of use has made Mistral-7B-Instruct-v0.2 a popular model in AI and natural language processing.
In this article, we will cover:
- How to run Mistral-7B-Instruct-v0.2 on your own Mac or on other devices
- How to create an OpenAI-compatible API service for Mistral-7B-Instruct-v0.2
We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.
Run the model on your own device
Step 1: Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml
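Before moving on, it can help to confirm that the wasmedge binary is actually reachable from your shell. The snippet below is a minimal sketch: have_cmd is a helper name made up for this article, and the hint about $HOME/.wasmedge/env assumes the installer used its usual default location, which may differ on your machine.

```shell
# Return success if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wasmedge; then
  # Print the installed version as a quick sanity check.
  wasmedge --version
else
  # Assumption: the installer wrote its env file to the default location.
  echo 'wasmedge not found; try: . "$HOME/.wasmedge/env" or open a new terminal' >&2
fi
```

If the command is still not found after sourcing the env file, rerunning the installer is usually the quickest fix.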
Step 2: Download the model GGUF file. This may take a long time, since the model is several GB in size.
curl -LO https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_0.gguf
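A large download can be interrupted, or an error page can end up saved under the model's file name. GGUF files begin with the four ASCII bytes GGUF, so a quick look at the start of the file catches the most obvious failures. This is only a sketch (is_gguf is a made-up helper name), and it does not prove that the download is complete.

```shell
# Succeed if the file starts with the 4-byte GGUF magic string.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

model="mistral-7b-instruct-v0.2.Q4_0.gguf"
if is_gguf "$model"; then
  echo "$model looks like a GGUF file"
else
  echo "$model is missing or is not a GGUF file; try downloading it again" >&2
fi
```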
Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
That's it. You can chat with the model in the terminal by entering the following command.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.2.Q4_0.gguf llama-chat.wasm -p mistral-instruct
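The command packs several things into one line: --dir .:. gives the Wasm app access to the current directory, --nn-preload registers the model file under the alias default with the GGML backend and automatic device selection, and -p picks the prompt template. If you plan to try other GGUF files, a small wrapper can save some typing. The sketch below uses made-up function names (nn_preload, chat) and assumes llama-chat.wasm sits in the current directory.

```shell
# Build the value for --nn-preload: alias:backend:target:model-file.
nn_preload() {
  printf 'default:GGML:AUTO:%s' "$1"
}

# Start a chat session with the given GGUF file and prompt template.
chat() {
  model="$1"
  template="${2:-mistral-instruct}"
  if [ ! -f "$model" ]; then
    echo "model file not found: $model" >&2
    return 1
  fi
  wasmedge --dir .:. --nn-preload "$(nn_preload "$model")" llama-chat.wasm -p "$template"
}

# Example: chat mistral-7b-instruct-v0.2.Q4_0.gguf
```

The wrapper refuses to start when the model file is missing, which gives a clearer error than letting the runtime fail later.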
The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.
On my M2 Mac with 16 GB of memory, it runs at about 7 tokens per second.
[You]: 
What is Mistral AI?
 Mistral AI is a cutting-edge company based in Paris, France, developing large language models. I am very excited about the progress they have made and the potential of their models to understand and generate human-like text. However, please note that as a text-based AI, I don't have the ability to directly interact with or use Mistral AI's models. I can only share information I have been programmed with or generate text based on that information. For more details about Mistral AI
[You]: 
Is it a good company to join in?
 Based on the information that is publicly available, Mistral AI is considered to be a promising and innovative company in the field of artificial intelligence, specifically in the area of large language models. They have received significant attention and investment, and their team includes experienced researchers and engineers in the field. Joining Mistral AI could provide opportunities to work on advanced AI projects and learn from experts in the field. However, it's important to consider that joining any company is a significant decision that depends on many factors, including your personal career goals, skills, and preferences. I would recommend researching the company thoroughly, including their mission, culture, and job opportunities, before making a decision. Additionally, keep in mind that the AI industry is rapidly evolving, so staying informed about the latest developments and trends in the field can help you make an informed decision.
Create an OpenAI-compatible API service
An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.
Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
Then, use the following command to start an API server for the model.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.2.Q4_0.gguf llama-api-server.wasm -p mistral-instruct
From another terminal, you can interact with the API server using curl.
curl -X POST http://0.0.0.0:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"What is the capital of France?"}], "model":"Mistral-7B-Instruct-v0.2"}'
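To ask your own questions without hand-editing the JSON each time, the request body can be built from shell variables. The following is a rough sketch rather than a complete client: json_escape, chat_payload and ask are names invented here, the escaping covers only backslashes and double quotes (not newlines or other control characters), and the URL and model name are copied from the curl command above.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Note: newlines and other control characters are not handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a chat completion request body from a system prompt and a user question.
chat_payload() {
  printf '{"messages":[{"role":"system", "content":"%s"}, {"role":"user", "content":"%s"}], "model":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "Mistral-7B-Instruct-v0.2"
}

# Send a question to the local API server and print the raw JSON response.
ask() {
  curl -s -X POST http://0.0.0.0:8080/v1/chat/completions \
    -H 'accept:application/json' -H 'Content-Type: application/json' \
    -d "$(chat_payload "You are a helpful AI assistant" "$1")"
}

# Example: ask 'What is the capital of France?'
```

Since the server follows the OpenAI response format, the reply text should be under choices[0].message.content. If you have jq installed, piping the output through jq -r '.choices[0].message.content' prints just the answer.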
That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!
Join the WasmEdge Discord to ask questions or share insights.
No time to DIY? Book a Demo with us to enjoy your own LLMs across devices!
