Rust

WasmEdge at OSSummit Korea and KubeCon NA 2025: AI at the Edge and Rust Firmware for Voice Agents

WasmEdge, the lightweight, high-performance WebAssembly runtime under the Cloud Native Computing Foundation (CNCF), just wrapped up a whirlwind tour that took us from the vibrant Open Source Summit Korea in Seoul to the massive gathering at KubeCon + CloudNativeCon North America 2025 in Atlanta. Across both continents, one theme was undeniable: the future of AI is open, portable, and running at the edge. Here is a look. WasmEdge at OSSummit Korea 2025: Real-Time AI Voice Agent at the Edge with Rust Miley Fu, WasmEdge founding member and CNCF Ambassador from Second State, took stage at Open Source Summit Korea on November 4-5, presenting “Orchestrating Real-Time Multimodal AI Agents with Rust”…
LLM AI inference Rust WebAssembly Hugging Face
Getting started with OpenAI’s gpt-oss

OpenAI just got a lot more open. OpenAI announced two state-of-the-art open-weight language models: gpt-oss-120b and gpt-oss-20b. Both models provide full chain-of-thought (CoT) and support Structured outputs, tool use, and function calling. According to OpenAI, The gpt-oss-120b model matches the core reasoning performance of OpenAI’s o4-mini while running efficiently on a single 80 GB GPU. Meanwhile, the gpt-oss-20b model delivers results comparable to OpenAI’s o3-mini on standard benchmarks and can run on edge devices with just 16 GB of memory—making it well-suited for on-device applications, local inference, and fast iteration without the need for expensive infrastructure.…
LLM AI inference Rust WebAssembly Hugging Face
Getting Started with SmolLM3‑3B‑GGUF for Long‑Context Multilingual Reasoning

SmolLM3 is a compact 3 billion‑parameter transformer that delivers state‑of‑the‑art performance at the 3B–4B scale, supporting six major languages and extended contexts up to 128 000 tokens. This powerful yet compact model offers capabilities comparable to 4B models, making it lightweight and suitable for edge devices. It excels in long-context reasoning, able to handle up to 128,000 tokens from documents, transcripts, or logs. Furthermore, its multilingual instruction-tuning for English, French, Spanish, German, Italian, and Portuguese makes it ideal for global applications.…
LLM AI inference Rust WebAssembly Hugging Face
Gemma-3n-E2B-it for on‑device LLM applications

Gemma 3n was built hand‑in‑glove with some of the biggest mobile‑chip makers out there. It shares the same clever architecture that’ll power the next‑gen Gemini Nano—so you get rock‑solid, on‑device smarts without ever pinging the cloud. Gemma‑3n‑E2B‑it is Google DeepMind’s newest edge‑first transformer model: a 4.46 B‑parameter MatFormer that behaves like a 2 B model in RAM, runs wholly offline on as little as 2 GB VRAM thanks to Per‑Layer Embeddings (PLE), and still delivers 32 000‑token context and multimodal I/O: ext + image + audio + video.…
LLM Gemma AI inference Rust WebAssembly multilingual long-context
Save $900/Day with WasmEdge: Live Demos on Self-Hosted AI & GenAI Stacks at KubeCon China 2025

The cloud-native and open-source community is buzzing with anticipation for the upcoming KubeCon + CloudNativeCon China 2025, scheduled to take place in Hong Kong from June 10-11, 2025. This once-in-a-year premier event promises to be a remarkable gathering of open-source luminaries, industry leaders, and developers, offering unparalleled opportunities for face-to-face interactions and insights into the future of open source and cloud-native computing. A standout presence at this year's conference will undoubtedly be Second State/ WasmEdge team.…
LLM Osmosis-Structur AI inference Rust WebAssembly
Effortless JSON Generation with Osmosis‑Structure‑0.6B

Osmosis recently open-sourced a specialized small language model called Osmosis-Structure-0.6B. It is optimized for generating structured output. Structured output—such as JSON—is essential for use cases like agents and coding. With just 0.6 billion parameters, this lightweight model is ideal for self-hosting on your own device. Why does this matter? Interestingly, prompting a regular LLM to directly produce structured output like JSON often reduces its performance on complex tasks. A better approach is to let the main LLM generate responses in natural language first, then pass that output to Osmosis-Structure-0.…
LLM Osmosis-Structure Edge AI AI inference Rust WebAssembly structured data JSON generation
LFX Mentorship 2025: Supercharge Your Summer with WasmEdge Projects

Applications are now open for the LFX Mentorship Program – Summer 2025 Term (June to August)! If you're passionate about WebAssembly, AI agents, LLMs, or edge computing, this is your chance to learn from experienced mentors, contribute to CNCF-hosted projects like WasmEdge, and earn a stipend while doing so. 📅 Application Period: May 15 – May 27, 2025 🔗 Apply on the LFX Mentorship Portal Why Should You Join? WasmEdge is a lightweight and fast WebAssembly runtime optimized for cloud-native, edge, and AI workloads.…
LLM Gemma AI inference Rust WebAssembly
Getting Started with Qwen3

Get ready for Qwen3! This is Alibaba's latest and most advanced large language model series! These models range in scale from 0.6 billion to 235 billion parameters, and are designed to excel in a wide range of tasks. Qwen3 is the world's first open-source hybrid reasoning model – integrating both ‘reasoning’ and ‘non-reasoning’ modes within the same model, allowing it to choose between ‘fast thinking’ and ‘slow thinking’ like humans, depending on the question.…
LLM AI inference Rust WebAssembly Alibaba Qwen
Getting Started with Llama 4

Meta AI has once again pushed the boundaries of open-source large language models with the unveiling of Llama 4. This latest iteration builds upon the successes of its predecessors, introducing a new era of natively multimodal AI innovation. Llama 4 arrives with a suite of models, with Llama 4 Scout and Llama 4 Maverick firstly launched and 2 more coming, each engineered for leading intelligence and unparalleled efficiency. This series boasts native multimodality, mixture-of-experts architectures, and remarkably long context windows of 10 million tokens, promising significant leaps in performance and broader accessibility for developers and enterprises alike.…
LLM AI inference Rust WebAssembly DeepSeek
Open Source Adventure: Apply to Google Summer of Code 2025 with WasmEdge!

Have you ever dreamed of contributing to real-world tech projects, collaborating with seasoned developers, and getting paid to write code that matters—all while building your resume? Google Summer of Code (GSoC) 2025 is your golden ticket, and WasmEdge wants YOU to join the journey! What’s Google Summer of Code? Google Summer of Code (GSoC) is a global, online program that pays you to work on open source projects during your summer break.…
LLM AI inference Rust WebAssembly DeepSeek

1
2
3
4
5