Effortless JSON Generation with Osmosis-Structure-0.6B

Osmosis recently open-sourced Osmosis-Structure-0.6B, a specialized small language model optimized for generating structured output. Structured output, such as JSON, is essential for use cases like agents and coding. With just 0.6 billion parameters, the model is light enough to self-host on your own device.

Why does this matter? Interestingly, prompting a regular LLM to directly produce structured output like JSON often reduces its performance on complex tasks. A better approach is to let the main LLM generate responses in natural language first, then pass that output to Osmosis-Structure-0.6B to convert it into structured JSON. This two-step process improves the overall performance and reliability of agent workflows.
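The two-step process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: two_step, main_llm, and structurer are hypothetical names, and the two callables stand in for whatever clients you use to reach the main LLM and Osmosis-Structure-0.6B.

```python
from typing import Callable


def two_step(
    question: str,
    main_llm: Callable[[str], str],
    structurer: Callable[[str], str],
) -> str:
    """Answer in natural language first, then convert that answer to JSON text."""
    # Step 1: the main LLM answers freely, with no output-format constraints.
    draft = main_llm(question)
    # Step 2: the small model only reformats the draft into structured output.
    return structurer(draft)


# Stub callables so the sketch runs without any model (hypothetical values).
fake_llm = lambda q: "Alice is 30 years old."
fake_structurer = lambda text: '{"name": "Alice", "age": 30}'
print(two_step("Who is Alice?", fake_llm, fake_structurer))
```

Keeping the two models behind plain callables means either one can be swapped or mocked without touching the pipeline itself.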

In this article, I will show you how to run Osmosis-Structure-0.6B on your own device and how to integrate the model into your LLM workflows.

We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Step 1: Install WasmEdge

Run the following command to install WasmEdge 0.14.1.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1

Step 2: Download the quantized Osmosis-Structure-0.6B model

The Osmosis-Structure-0.6B-Q5_K_M GGUF model is about 0.4 GB in size, so it downloads quickly. All the available quantizations of the Osmosis-Structure-0.6B model are listed on Second State’s Hugging Face page.

curl -LO https://huggingface.co/second-state/Osmosis-Structure-0.6B-GGUF/resolve/main/Osmosis-Structure-0.6B-Q5_K_M.gguf
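A quick sanity check after the download can catch a truncated file or an HTML error page saved under the model's file name. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes GGUF. The helper name looks_like_gguf is hypothetical, and the check does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four bytes


def looks_like_gguf(path: str) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


# Prints False if the file is missing or is not a GGUF file.
print(looks_like_gguf("Osmosis-Structure-0.6B-Q5_K_M.gguf"))
```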

Step 3: Download the LlamaEdge API server

The LlamaEdge API server is a cross-platform LLM inference and API server that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Step 4: Run the model

Next, use the following command to start a LlamaEdge API server for the model. LlamaEdge provides an OpenAI-compatible API, so you can connect any chatbot client or agent to it!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Osmosis-Structure-0.6B-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template chatml \
  --model-name Osmosis-Structure-0.6B \
  --ctx-size 128000

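Once the server is running, you can send it a chat completion request. The system message describes the fields to extract, and the user message carries the natural-language text.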

curl -X POST http://localhost:8080/v1/chat/completions \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "model": "Osmosis-Structure-0.6B",
  "messages": [
    {
      "role": "system",
      "content": "According to the user input, return the following information as a JSON object, where the keys are inside a `properties` field:\n\n- name (string)\n- age (integer)\n- hobbies (array of strings)."
    },
    {
      "role": "user",
      "content": "Alice is 30 years old and she loves hiking, swimming, and cooking."
    }
  ]
}'
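The same request can be issued from a script, which is how the model would usually be wired into a larger workflow. Below is a minimal sketch using only the Python standard library. It assumes the server from Step 4 is listening on localhost:8080, as in the curl command; build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Same endpoint as the curl command (assumes the default local server).
API_URL = "http://localhost:8080/v1/chat/completions"

SYSTEM_PROMPT = (
    "According to the user input, return the following information as a JSON "
    "object, where the keys are inside a `properties` field:\n\n"
    "- name (string)\n- age (integer)\n- hobbies (array of strings)."
)


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request; nothing is sent until the request is opened."""
    payload = {
        "model": "Osmosis-Structure-0.6B",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def send(text: str) -> dict:
    """Send the request to the running server and return the decoded response."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.load(resp)
```

With the server running, send("Alice is 30 years old and she loves hiking, swimming, and cooking.") returns the decoded response body. build_request can be inspected offline, which is handy for checking the payload before any model is involved.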

The output should look like the following. As you can see, Osmosis-Structure-0.6B extracts the name, age, and hobbies from the sentence into structured fields.

{"id":"chatcmpl-14ddb742-5402-4cea-ae8d-8766b60a8e87","object":"chat.completion","created":1748604669,"model":"Osmosis-Structure-0.6B","choices":[{"index":0,"message":{"content":"{\"properties\": {\"name\": \"Alice\", \"age\": 30, \"hobbies\": [\"hiking\", \"swimming\", \"cooking\"], \"title\": \"NsklrxrjModel\", \"type\": \"object\"}","role":"assistant"},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":72,"completion_tokens":56,"total_tokens":128}}
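To use the result in a program, the JSON text has to be pulled out of the content field of the first choice and parsed a second time, because the API returns it as an escaped string. The sketch below does that defensively and returns None when the content is not valid JSON. Small models can emit incomplete JSON: the content string in the sample above, as printed, opens two braces and closes only one, so a strict parser would reject it. parse_structured is a hypothetical helper, and the demo response is a hand-written, well-formed stand-in rather than captured server output.

```python
import json
from typing import Optional


def parse_structured(response: dict) -> Optional[dict]:
    """Return the parsed `properties` object, or None if it cannot be recovered."""
    try:
        content = response["choices"][0]["message"]["content"]
        data = json.loads(content)
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    # The prompt in this article asks for the keys inside a `properties` field.
    props = data.get("properties")
    return props if isinstance(props, dict) else None


# Hand-written stand-in for a server response (not captured output).
demo = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": '{"properties": {"name": "Alice", "age": 30, '
                '"hobbies": ["hiking", "swimming", "cooking"]}}',
            }
        }
    ]
}
print(parse_structured(demo))
```

Returning None instead of raising lets the caller decide whether to retry the request or fall back to the natural-language answer.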