Zero Python dependency! Take full advantage of your GPUs. Write once, run anywhere. Get started with
the Llama 2 series of models on your own device in five minutes.
Create an HTTP microservice for image classification. It runs YOLO and MediaPipe models at native GPU speed.
A cloud-native and edge-native WebAssembly runtime
A WebAssembly runtime for Dapr microservices.
Building Rust functions with WebAssembly
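The "write once, run anywhere" promise above rests on compiling ordinary Rust to a portable WebAssembly target (for example `cargo build --target wasm32-wasi`) and running the resulting module in a Wasm runtime such as WasmEdge. A minimal sketch, with an illustrative function (`classify`, its threshold, and the example input are assumptions, not from the source):

```rust
// A minimal sketch of a Rust function suitable for WebAssembly.
// Because it uses no platform-specific APIs, the same source compiles
// natively and to a Wasm target (e.g. wasm32-wasi), where it can be
// executed by a WebAssembly runtime like WasmEdge.
// The function name, threshold, and input value are illustrative.

/// Map a confidence score to a label.
fn classify(score: f32) -> &'static str {
    if score >= 0.5 { "positive" } else { "negative" }
}

fn main() {
    // In a real WASI program the input would arrive via stdin or an
    // HTTP request body; here it is hard-coded for the sketch.
    println!("{}", classify(0.72));
}
```

The same binary logic runs unchanged on a server, at the edge, or in a browser host, which is the portability the taglines refer to.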
The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge.