Getting Started with DeepSeek-Coder-6.7B


DeepSeek-Coder-6.7B is part of the DeepSeek Coder series of large code language models, pre-trained on 2 trillion tokens made up of 87% code and 13% natural language text. DeepSeek Coder models are trained with a 16,000-token window and an additional fill-in-the-blank task, which enables project-level code completion and infilling. Among open-source code models, DeepSeek Coder achieves state-of-the-art performance on a range of code generation benchmarks.

In this article, we will cover:

  • How to run DeepSeek-Coder-6.7B on your own device
  • How to create an OpenAI-compatible API service for DeepSeek-Coder-6.7B

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run the DeepSeek-Coder-6.7B model on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the DeepSeek-Coder-6.7B model GGUF file. The download may take a while, since the model file is several GB in size.

curl -LO

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:deepseek-coder-6.7b-instruct.Q5_K_M.gguf llama-chat.wasm -p deepseek-coder
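
The `--nn-preload` value in the command above packs four colon-separated fields into one argument. As a minimal sketch, assuming those fields are (in order) an alias, the backend, the execution target, and the model file, the helper below assembles the same argument list in Rust. The function name `chat_args` is hypothetical and only serves to illustrate the shape of the command.

```rust
// Hypothetical helper: builds the argument list for the `wasmedge` command
// shown above. The field meanings (alias, backend, target, model file) are
// an assumption based on the shape of the `--nn-preload` value.
fn chat_args(model_file: &str, wasm_app: &str, prompt_template: &str) -> Vec<String> {
    // alias:backend:target:model-file, joined with colons
    let preload = ["default", "GGML", "AUTO", model_file].join(":");
    vec![
        "--dir".to_string(),
        ".:.".to_string(),
        "--nn-preload".to_string(),
        preload,
        wasm_app.to_string(),
        "-p".to_string(),
        prompt_template.to_string(),
    ]
}

fn main() {
    let args = chat_args(
        "deepseek-coder-6.7b-instruct.Q5_K_M.gguf",
        "llama-chat.wasm",
        "deepseek-coder",
    );
    // Print the full command line. To launch it instead, pass `args` to
    // std::process::Command::new("wasmedge").args(&args).
    println!("wasmedge {}", args.join(" "));
}
```

Swapping in a different GGUF file or Wasm app only changes the arguments passed to `chat_args`; the rest of the command stays the same.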

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.

On my M2 Mac with 16 GB of memory, it runs at about 5 tokens per second.

Write a rust function for a simple to-do app

Sure, here is a simple Rust function for a to-do app:

use std::collections::HashMap;

struct TodoApp {
    tasks: HashMap<String, String>,
}

impl TodoApp {
    fn new() -> TodoApp {
        TodoApp {
            tasks: HashMap::new(),
        }
    }

    fn add_task(&mut self, task_name: String, task_description: String) {
        self.tasks.insert(task_name, task_description);
    }

    fn remove_task(&mut self, task_name: String) {
        self.tasks.remove(&task_name);
    }

    fn get_task(&self, task_name: String) -> Option<&String> {
        self.tasks.get(&task_name)
    }
}

fn main() {
    let mut app = TodoApp::new();
    app.add_task("Task 1".to_string(), "This is task 1".to_string());
    app.add_task("Task 2".to_string(), "This is task 2".to_string());
}

This Rust function creates a simple to-do app with three methods: `add_task`, `remove_task`, and `get_task`. The `add_task` method adds a task to the app, the `remove_task` method removes a task, and the `get_task` method retrieves a task. The tasks are stored in a `HashMap` where the task name is the key and the task description is the value.


Create an OpenAI-compatible API service for the DeepSeek-Coder-6.7B model

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks, such as LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO

Then, use the following command to start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:deepseek-coder-6.7b-instruct.Q5_K_M.gguf llama-api-server.wasm -p deepseek-coder

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "Who is Robert Oppenheimer?"}], "model":"Deepseek-Coder-6.7B-Instruct"}'
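
The same request can also be sent from a program. The following is a minimal sketch in Rust using only the standard library, assuming the API server is listening on localhost:8080 as in the curl command above. It writes a raw HTTP/1.1 request over a `TcpStream` as a stand-in for a proper HTTP client crate, and it prints the raw response (headers included) without parsing the JSON. The helper names `json_escape`, `chat_body`, `post_request`, and `send` are hypothetical.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Build a JSON body with the same fields as the curl example:
// a system message, a user message, and the model name.
fn chat_body(model: &str, system: &str, user: &str) -> String {
    format!(
        r#"{{"messages":[{{"role":"system","content":"{}"}},{{"role":"user","content":"{}"}}],"model":"{}"}}"#,
        json_escape(system),
        json_escape(user),
        json_escape(model)
    )
}

// Wrap the body in a plain HTTP/1.1 POST request.
// `Connection: close` asks the server to close the socket after replying.
fn post_request(host: &str, port: u16, path: &str, body: &str) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}:{}\r\nAccept: application/json\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        path,
        host,
        port,
        body.len(), // length in bytes, as Content-Length requires
        body
    )
}

// Send the request to localhost:8080 (assumed address) and read the reply.
fn send(request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(("localhost", 8080))?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let body = chat_body(
        "Deepseek-Coder-6.7B-Instruct",
        "You are a helpful assistant.",
        "Who is Robert Oppenheimer?",
    );
    let request = post_request("localhost", 8080, "/v1/chat/completions", &body);
    match send(&request) {
        Ok(response) => println!("{}", response),
        Err(e) => eprintln!("request failed (is the API server running?): {}", e),
    }
}
```

In a real application, an HTTP client library and a JSON parser would replace the hand-written request and escaping; the sketch only shows what goes over the wire.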

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Join the WasmEdge Discord to ask questions and share insights. Having trouble getting this model running? Raise an issue at second-state/LlamaEdge, or book a demo with us to run your own LLMs across devices!
