Getting Started with CodeGemma-7b-it

Apr 12, 2024 • 4 minutes to read

CodeGemma-7b-it is a small yet powerful “coding assistant” model in the Gemma family. It is designed for the following tasks.

  • Code Completion: Imagine you're writing code and get stuck. CodeGemma 7B can analyze the existing code and suggest likely completions, saving you time and effort.
  • Code Generation: Need a whole new block of code for a specific function? CodeGemma 7B can analyze the surrounding code and generate code snippets based on that context.

Here's what makes CodeGemma 7B stand out:

  • Powerhouse for Code: Trained on a massive dataset of code, it understands the intricacies of programming languages.
  • Pretrained for Efficiency: Comes pre-trained, so you can use it directly without additional training on your specific codebase.
  • Fine-tuned for Conversations: The model can carry on a conversation with users, which allows it to explain code, make further changes, or debug interactively.

In this article, we will cover:

  • How to run CodeGemma-7b-it-GGUF on your own device
  • How to create an OpenAI-compatible API service for CodeGemma-7b-it

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.

Run CodeGemma-7b-it-GGUF on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the CodeGemma-7b-it GGUF model file. The model is 6.14 GB, so it could take a while to download.

curl -LO
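With a download this large, it can be worth confirming that the file on disk is really a GGUF model and not an error page or a partial transfer. The sketch below is not part of LlamaEdge; the helper names `read_gguf_version` and `describe_model_file` are hypothetical. It relies only on the fact that a GGUF file begins with the four bytes `GGUF` followed by a little-endian 32-bit version number.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file is not GGUF."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    # The magic is followed by a little-endian unsigned 32-bit version number.
    (version,) = struct.unpack("<I", header[4:8])
    return version


def describe_model_file(path):
    """Summarize a downloaded model file: size in GB and GGUF version."""
    size_gb = Path(path).stat().st_size / 1e9
    return {"size_gb": round(size_gb, 2), "gguf_version": read_gguf_version(path)}
```

Calling `describe_model_file("codegemma-7b-it-Q5_K_M.gguf")` after the download finishes reports both values. A truncated download still passes the header check, so the reported size should also be compared with the expected 6.14 GB.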

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:codegemma-7b-it-Q5_K_M.gguf llama-chat.wasm -p gemma-instruct -c 3072

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.
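The `-p gemma-instruct` option in the command above selects the prompt template used to wrap each chat turn before it reaches the model. As a rough illustration, here is a minimal sketch of the turn format documented for Gemma instruction-tuned models. The function `format_gemma_prompt` is hypothetical and is not LlamaEdge's implementation, which may differ in details such as whitespace or the beginning-of-sequence token.

```python
def format_gemma_prompt(messages):
    """Render OpenAI-style chat messages in Gemma's turn-based text format."""
    parts = []
    for message in messages:
        # Gemma calls the assistant role "model". In this sketch every other
        # role (including "system") is treated as a user turn.
        role = "model" if message["role"] == "assistant" else "user"
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    # Finish with an open model turn so the model writes the next reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


history = [{"role": "user", "content": "write a hello world program in Python"}]
print(format_gemma_prompt(history))
```

For a multi-turn chat, the earlier replies are appended to `history` with the role `assistant`, and the whole conversation is rendered again for each new question.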

write a hello world progam in Python

Let's think step by steps.
1) **Import Statements:** Import the `sys` module for system-related operations and define a function called "hello_world".

2).**Function Definition**: Define an empty list of strings to store user input in case it is needed later on (not used here but included as part 0f code structure): ```python def helloWorld(): ...```.
3) **Print Statement:** Use the `print()` function from `<iostream>` with a string argument "Hello, World!".

4).**Call Function**: Call or invoke your defined functions in this case:  `hello_world()`.

Here is an example of how to write hello world program using Python.

def main():    # Define the function called 'main'   print(" Hello ,World!")     return (0)

if __name__ == "__builtin__":      sys._execute_( sys .argv[1] )

hello_world() # Call or invoke your defined functions in this case

Create an OpenAI-compatible API service for CodeGemma-7b-it

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks, such as LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO

Then, download the chatbot web UI so you can interact with the model in the browser.

curl -LO
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Then, open your browser to http://localhost:8080 to start the chat!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:codegemma-7b-it-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template gemma-instruct \
  --ctx-size 3072 \
  --model-name codegemma-7b
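If you want to launch the server from a script, the command above can be assembled as an argument list. This is only a sketch: the helper `build_server_command` is hypothetical, and the flag values are copied from the command above. The comments reflect our reading of the options, namely that `--dir .:.` maps the current directory into the Wasm sandbox and that `--nn-preload` takes an alias, a backend, a target device, and the model file.

```python
import shlex


def build_server_command(model_file, prompt_template="gemma-instruct",
                         ctx_size=3072, model_name="codegemma-7b"):
    """Return the wasmedge argument list for the API server command above."""
    return [
        "wasmedge",
        "--dir", ".:.",  # map the current host directory into the sandbox
        # alias:backend:target:file
        "--nn-preload", f"default:GGML:AUTO:{model_file}",
        "llama-api-server.wasm",
        "--prompt-template", prompt_template,
        "--ctx-size", str(ctx_size),
        "--model-name", model_name,
    ]


command = build_server_command("codegemma-7b-it-Q5_K_M.gguf")
print(shlex.join(command))  # prints the command as a single shell line
```

The list can be passed directly to `subprocess.Popen(command)`, which avoids shell quoting problems if the model file name ever contains spaces.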

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user", "content": "write a hello world in Rust"}], "model":"codegemma-7b"}'
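The same request can be sent from a program. The sketch below uses only the Python standard library and mirrors the curl call above. The helper names are hypothetical, and `extract_reply` assumes the server returns the usual OpenAI chat completion shape (`choices[0].message.content`), which is what an OpenAI-compatible endpoint is expected to provide.

```python
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(messages, model="codegemma-7b", url=API_URL):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"messages": messages, "model": model}).encode("utf-8")
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def chat(messages, **kwargs):
    """Send the conversation to the running server and return the reply text."""
    request = build_chat_request(messages, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))
```

With the API server running, `chat([{"role": "user", "content": "write a hello world in Rust"}])` returns the model's answer as a string. Appending that answer to the list with the role `assistant` before the next question keeps the conversation history.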

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge discord to ask questions and share insights.

Any questions getting this model running? Please go to second-state/LlamaEdge to raise an issue or book a demo with us to enjoy your own LLMs across devices!
