Articles and tutorials

Save $900/Day with WasmEdge: Live Demos on Self-Hosted AI & GenAI Stacks at KubeCon China 2025

The cloud-native and open-source community is buzzing with anticipation for the upcoming KubeCon + CloudNativeCon China 2025, scheduled to take place in Hong Kong from June 10-11, 2025. This once-in-a-year premier event promises to be a remarkable gathering of open-source luminaries, industry leaders, and developers, offering unparalleled opportunities for face-to-face interactions and insights into the future of open source and cloud-native computing. A standout presence at this year's conference will undoubtedly be Second State/ WasmEdge team.…
LLM Osmosis-Structur AI inference Rust WebAssembly
Effortless JSON Generation with Osmosis‑Structure‑0.6B

Osmosis recently open-sourced a specialized small language model called Osmosis-Structure-0.6B. It is optimized for generating structured output. Structured output—such as JSON—is essential for use cases like agents and coding. With just 0.6 billion parameters, this lightweight model is ideal for self-hosting on your own device. Why does this matter? Interestingly, prompting a regular LLM to directly produce structured output like JSON often reduces its performance on complex tasks. A better approach is to let the main LLM generate responses in natural language first, then pass that output to Osmosis-Structure-0.…
LLM Osmosis-Structure Edge AI AI inference Rust WebAssembly structured data JSON generation
LFX Mentorship 2025: Supercharge Your Summer with WasmEdge Projects

Applications are now open for the LFX Mentorship Program – Summer 2025 Term (June to August)! If you're passionate about WebAssembly, AI agents, LLMs, or edge computing, this is your chance to learn from experienced mentors, contribute to CNCF-hosted projects like WasmEdge, and earn a stipend while doing so. 📅 Application Period: May 15 – May 27, 2025 🔗 Apply on the LFX Mentorship Portal Why Should You Join? WasmEdge is a lightweight and fast WebAssembly runtime optimized for cloud-native, edge, and AI workloads.…
LLM Gemma AI inference Rust WebAssembly
Getting Started with Qwen3

Get ready for Qwen3! This is Alibaba's latest and most advanced large language model series! These models range in scale from 0.6 billion to 235 billion parameters, and are designed to excel in a wide range of tasks. Qwen3 is the world's first open-source hybrid reasoning model – integrating both ‘reasoning’ and ‘non-reasoning’ modes within the same model, allowing it to choose between ‘fast thinking’ and ‘slow thinking’ like humans, depending on the question.…
LLM AI inference Rust WebAssembly Alibaba Qwen
Getting Started with Llama 4

Meta AI has once again pushed the boundaries of open-source large language models with the unveiling of Llama 4. This latest iteration builds upon the successes of its predecessors, introducing a new era of natively multimodal AI innovation. Llama 4 arrives with a suite of models, with Llama 4 Scout and Llama 4 Maverick firstly launched and 2 more coming, each engineered for leading intelligence and unparalleled efficiency. This series boasts native multimodality, mixture-of-experts architectures, and remarkably long context windows of 10 million tokens, promising significant leaps in performance and broader accessibility for developers and enterprises alike.…
LLM AI inference Rust WebAssembly DeepSeek
Open Source Adventure: Apply to Google Summer of Code 2025 with WasmEdge!

Have you ever dreamed of contributing to real-world tech projects, collaborating with seasoned developers, and getting paid to write code that matters—all while building your resume? Google Summer of Code (GSoC) 2025 is your golden ticket, and WasmEdge wants YOU to join the journey! What’s Google Summer of Code? Google Summer of Code (GSoC) is a global, online program that pays you to work on open source projects during your summer break.…
LLM AI inference Rust WebAssembly DeepSeek
Getting Started with Gemma 3

Gemma-3 is a lightweight, efficient language model developed by Google, part of the Gemma family of models optimized for instruction-following tasks. Designed for resource-constrained environments, Gemma-3 retains strong performance in reasoning and instruction-based applications while maintaining computational efficiency. Its compact size makes it ideal for edge deployment and scenarios requiring rapid inference. This model achieves competitive results across benchmarks, particularly excelling in tasks requiring logical reasoning and structured responses. We have quantized Gemma-3 in GGUF format for broader compatibility with edge AI stacks.…
LLM AI inference Rust WebAssembly DeepSeek
Getting Started with QwQ-32B

Qwen/QwQ-32B is the latest version of the Qwen seriesl. It is the medium-sized reasoning model, designed to excel at complex tasks with deep thinking and advanced problem-solving abilities. Unlike traditional instruction-tuned models, QwQ harnesses both extensive pretraining and a reinforcement learning stage during post-training to deliver significantly enhanced performance, especially on challenging problems with 32.5 billion total parameters. In this article, we will cover how to run and interact with QwQ-32B-GGUF on your own edge device.…
LLM AI inference Rust WebAssembly DeepSeek
Getting Started with DeepSeek-R1-Distill-Qwen-1.5B

DeepSeek-R1-Distill-Qwen is a series of distilled large language models derived from Qwen 2.5, utilizing outputs from the larger DeepSeek-R1 model. These models are designed to be more efficient and compact while retaining strong performance, especially in reasoning tasks. The distillation process allows them to inherit the knowledge and capabilities of the larger model, making them suitable for resource-constrained environments and easier deployment. These distilled models have shown impressive results across various benchmarks, often outperforming other models of similar size.…
LLM AI inference Rust WebAssembly DeepSeek
Getting Started with Mistral Small

Mistral Small 3 is a groundbreaking 24-billion parameter model designed to deliver high-performance AI with low latency. Released under the Apache 2.0 license, it stands out in the AI landscape for its ability to compete with much larger models like Llama 3.3 70B and Qwen 2.5 32B, while being more than three times faster on the same hardware. The model is particularly tailored for agentic tasks — those requiring robust language understanding, tool use, and instruction-following capabilities.…
LLM AI inference Rust WebAssembly Mistral

1
2
3
4
5