The Qwen 2.5 series includes models ranging from 0.5B to 72B parameters, optimized for diverse tasks like coding, logical reasoning, and natural language understanding. The series spans smaller models (0.5B, 1.5B, 3B, 7B, 14B) for edge devices and larger ones (32B, 72B) for enterprise use, with significant improvements in instruction following, logic, and support for over 29 languages. The models offer long-context support (up to 128K input tokens and over 8K generated tokens) and can produce structured outputs such as JSON. The Qwen 2.5 series demonstrates strong performance on tasks like coding, mathematics, and structured data understanding.
Qwen2.5-14B-Instruct is an instruction-tuned large language model with 14.7 billion parameters, a full 131,072-token context window, and up to 8,192 tokens of generation. It uses a modern transformer architecture and is optimized for long-text processing, making it well suited for chatbot and long-text generation tasks.
In this tutorial, you’ll learn how to:
- Run the Qwen2.5-14B-instruct model locally
- Use it as a drop-in replacement for OpenAI in your apps or agents
We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.
Run the Qwen2.5-14B-instruct model locally
Step 1: Install WasmEdge via the following command line.

```shell
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1
```
Step 2: Download the Qwen2.5-14B-Instruct GGUF file. Since the model is 10.5 GB, the download could take a while.

```shell
curl -LO https://huggingface.co/second-state/Qwen2.5-14B-Instruct-GGUF/resolve/main/Qwen2.5-14B-Instruct-Q5_K_M.gguf
```
Step 3: Download the LlamaEdge API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

```shell
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
```
Step 4: Download the chatbot UI for interacting with the Qwen2.5-14B-Instruct model in the browser.

```shell
curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
```
Next, use the following command line to start a LlamaEdge API server for the model.

```shell
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2.5-14B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template chatml \
  --ctx-size 128000
```
Then, open your browser to http://localhost:8080 to start the chat! You can also send an API request to the model.
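The API server is OpenAI-compatible, so you can hit the `chat/completions` endpoint directly. Below is a minimal sketch using only the Python standard library; the `build_chat_request` and `send` helpers are illustrative (not part of LlamaEdge), and the `model` value is an assumption — match it to whatever name your server reports.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080/v1"  # the LlamaEdge API server started above

def build_chat_request(messages, model="Qwen2.5-14B-Instruct"):
    """Build an OpenAI-compatible chat/completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request; requires the server above to be running."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)

req = build_chat_request([{"role": "user", "content": "What is the capital of France?"}])
# With the server running, uncomment to get the reply:
#   print(send(req)["choices"][0]["message"]["content"])
```

With the server running, uncomment the last line to print the model's answer.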
A drop-in replacement for OpenAI
LlamaEdge is lightweight and does not require a daemon or sudo to run. It can be easily embedded into your own apps! With support for both chat and embedding models, LlamaEdge can serve as an OpenAI API replacement right inside your app on your local computer!
Next, we will show you how to start a full API server for the Qwen2.5-14B-Instruct model along with an embedding model. The API server will have `chat/completions` and `embeddings` endpoints. In addition to the steps in the previous section, we will also need to:
Step 5: Download an embedding model.

```shell
curl -LO https://huggingface.co/second-state/Nomic-embed-text-v1.5-Embedding-GGUF/resolve/main/nomic-embed-text-v1.5.f16.gguf
```
Then, we can use the following command line to start the LlamaEdge API server with both chat and embedding models. For a more detailed explanation, check out the doc on starting a LlamaEdge API service.
```shell
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Qwen2.5-14B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  llama-api-server.wasm \
  --model-alias default,embedding \
  --model-name Qwen2.5-14B-instruct,nomic-embed \
  --prompt-template chatml,embedding \
  --batch-size 128,8192 \
  --ctx-size 4096,8192
```
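Once this server is running, the `embeddings` endpoint accepts OpenAI-style requests as well. Here is a minimal standard-library sketch, reusing the `nomic-embed` model name from the command above; the `build_embedding_request` helper is illustrative, not part of LlamaEdge.

```python
import json
import urllib.request

API_BASE = "http://localhost:8080/v1"

def build_embedding_request(texts, model="nomic-embed"):
    """Build an OpenAI-compatible embeddings request for a list of strings."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request(["LlamaEdge is lightweight", "It runs Wasm locally"])
# With the server running:
#   with urllib.request.urlopen(req) as resp:
#       vectors = [d["embedding"] for d in json.load(resp)["data"]]
```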
Finally, you can follow these tutorials to integrate the LlamaEdge API server as a drop-in replacement for OpenAI with other agent frameworks. Specifically, use the following values in your app or agent configuration to replace the OpenAI API.
| Config option | Value |
|---|---|
| Base API URL | http://localhost:8080/v1 |
| Model Name (for LLM) | Qwen2.5-14B-instruct |
| Model Name (for Text embedding) | nomic-embed |
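Many OpenAI-compatible apps and SDKs read these settings from environment variables instead of config files. Assuming your tool follows the OpenAI SDK's conventions (`OPENAI_BASE_URL` / `OPENAI_API_KEY` — check your framework's docs), the equivalent configuration is:

```shell
# Point OpenAI-compatible tools at the local LlamaEdge server
export OPENAI_BASE_URL="http://localhost:8080/v1"
# LlamaEdge does not validate API keys, but most SDKs require a non-empty value
export OPENAI_API_KEY="EMPTY"
```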
That’s it! Access the LlamaEdge repo and build your first agent today! If you have fun building and exploring, be sure to star the repo HERE.
Learn more from the LlamaEdge docs. Join the WasmEdge discord to ask questions and share insights.