AI inference on the edge

Fast, lightweight, portable, Rust-powered, and OpenAI-compatible

Powered by WasmEdge.


LLM inference
Rust+Wasm is the tech stack for LLM applications everywhere.
Lightweight. Total runtime size is 30MB, as opposed to 4GB for Python and 350MB for Ollama.
Fast. Full native speed on GPUs.
Portable. A single cross-platform binary runs across different CPUs, GPUs, and OSes.
Secure. Sandboxed and isolated execution on untrusted devices.
Modern languages for inference apps.
Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
OpenAI compatible. Seamlessly integrate into the OpenAI tooling ecosystem.
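Because the API is OpenAI-compatible, any client that can POST the standard chat completions JSON can talk to a local server. A minimal sketch of building that request body in Rust, using only the standard library; the model name and the endpoint URL in the comment are placeholders, not fixed values:

```rust
// Build an OpenAI-style chat completions request body.
// The JSON shape follows the OpenAI chat completions schema;
// "llama-2-7b-chat" and the localhost URL below are illustrative.
fn chat_request(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        model, prompt
    )
}

fn main() {
    // POST this body to e.g. http://localhost:8080/v1/chat/completions
    // (port and path depend on how the local server is started).
    println!("{}", chat_request("llama-2-7b-chat", "What is WasmEdge?"));
}
```

In practice you would point an existing OpenAI SDK at the local base URL instead of hand-building JSON; the sketch only shows that the wire format is the familiar one.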
Learn more | Give it a try
LLM Agent is a serverless platform for building complex data flow applications. Examples include
SaaS workflow automation apps
Streaming data analytics
Real-time AI processing
Quantitative and automated trading apps
R&D process automation
DevRel and community management
Cloud-native microservices
We work with cloud providers, especially edge cloud / CDN compute providers, to support microservices for web apps. Use cases include AI inference, database access, CRM, e-commerce, workflow management, and server-side rendering.
Data analytics
We work with streaming frameworks and databases to support embedded serverless functions for data filtering and analytics. The serverless functions could be database UDFs. They could also be embedded in data ingest or query result streams.
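The kind of embedded filter function described above can be written as plain Rust and compiled to Wasm for use as a database UDF or stream operator. A sketch under assumed conventions: the record format (CSV lines of `sensor_id,reading`) and the threshold semantics are illustrative, not a real framework's API:

```rust
// Filter a batch of CSV records, keeping rows whose numeric reading
// exceeds a threshold. A host database or streaming engine would call
// a function like this over each incoming batch.
fn filter_high_readings(rows: &str, threshold: f64) -> Vec<&str> {
    rows.lines()
        .filter(|line| {
            line.split(',')
                .nth(1)                                  // the reading column
                .and_then(|v| v.trim().parse::<f64>().ok())
                .map_or(false, |v| v > threshold)        // drop unparsable rows
        })
        .collect()
}

fn main() {
    let batch = "s1,98.6\ns2,101.2\ns3,99.9";
    // Keeps only the row whose reading exceeds 100.0.
    println!("{:?}", filter_high_readings(batch, 100.0));
}
```

The same function body works unchanged whether it is embedded in an ingest pipeline or invoked per query result; only the host-side registration differs.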
Blockchain smart contracts are decentralized serverless functions. Filtering / normalization / visualization functions for blockchain transaction streams are proven highly valuable. We provide serverless function runtimes to the largest web3 projects.

Try it out

> wasmedge --dir .:. --nn-preload default:GGML:AUTO:model_name.gguf llama-chat.wasm
Run Llama 2 inference on your own device

Zero Python dependency! Take full advantage of your GPUs. Write once, run anywhere. Get started with the Llama 2 series of models on your own device in 5 minutes.

Build a RAG-based LLM agent

Retrieval-augmented generation (RAG) is a popular approach to building AI agents with external knowledge bases. Create your own in
Rust | Example: Learn Rust

Edge AI service

Create an HTTP microservice for image classification. It runs YOLO and MediaPipe models at native GPU speed.
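The request/response shape of such a microservice can be sketched in a few lines of standard-library Rust. The fixed "cat" label, the JSON fields, and the fake JPEG bytes are placeholders; a real service would run the YOLO or MediaPipe model (e.g. through WASI-NN) inside `classify`:

```rust
// Placeholder classifier: a real handler would run model inference here
// and return the predicted label and confidence.
fn classify(_image_bytes: &[u8]) -> (String, f32) {
    (String::from("cat"), 0.97)
}

// Turn raw image bytes into a complete HTTP/1.1 response with a JSON body.
fn handle(image_bytes: &[u8]) -> String {
    let (label, confidence) = classify(image_bytes);
    let body = format!("{{\"label\":\"{}\",\"confidence\":{}}}", label, confidence);
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    let fake_jpeg = [0xFFu8, 0xD8, 0xFF]; // JPEG magic bytes stand in for a real image
    println!("{}", handle(&fake_jpeg));
}
```

Wiring `handle` to a socket listener (or to WasmEdge's HTTP support) is the only part that changes between running natively and running inside the sandbox.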

Get in touch

Open Source Repositories


A cloud-native and edge-native WebAssembly Runtime



A WebAssembly runtime for Dapr microservices.



A Node.js-compatible JavaScript runtime for WasmEdge



Building Rust functions with WebAssembly



The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge.

©2024 Second State Inc., DBA super node LLC