AI as a Service: Image Classification Using MobileNet

MobileNet is a family of convolutional neural network (CNN) models for computer vision applications. The most common application for MobileNet models is image classification. You can train (or retrain) a MobileNet model to recognize objects relevant to your application (e.g., to classify birds in a bird-watching application).

Second State FaaS provides a native command API to run TensorFlow-based MobileNet models. In this article, we will use a MobileNet model trained on the ImageNet dataset as an example. We will create a FaaS application that uses this popular model to classify common objects in an input image.

The source code for the image classification example in this article is available on GitHub.

Prerequisites

Follow these simple instructions to install Rust and ssvmup.

The inference function

The infer() function is written in Rust and compiled to WebAssembly. It does the heavy lifting of data preparation and model preparation, both of which depend heavily on the function's use case. It then calls a native command to execute the TensorFlow model, and analyzes and formats the model's return values for the function caller.

The native command is safe and portable here because it is reviewed and approved by the FaaS operator before it is made available as part of the FaaS API. It is not user-submitted code. All user-submitted code is in WebAssembly and runs in the SSVM sandbox.

Below is an annotated and abridged version of the function source code. The comments explain the seven steps the function performs. In steps #2 and #3, we load a generic MobileNet model trained on the ImageNet dataset. You can load your own retrained (or fine-tuned) MobileNet model file and its corresponding classification labels file.

#[wasm_bindgen]
pub fn infer(image_data: &[u8]) -> String {
    // 1. Load the input image and resize it to 224x224, the input dimension
    //    required by this particular MobileNet model.
    let img = image::load_from_memory(image_data).unwrap().to_rgb();
    let resized = image::imageops::resize(&img, 224, 224, image::imageops::FilterType::Triangle);

    // 2. Load the frozen TensorFlow model into a byte array. The model is trained to recognize objects in the input image.
    let model_data: &[u8] = include_bytes!("mobilenet_v2_1.4_224_frozen.pb");
    // 3. Load the classification labels file that corresponds to the model.
    //    The model output is a series of numbers. The labels file maps those numbers (i.e., line numbers) to text descriptions of the object classes.
    let labels = include_str!("imagenet_slim_labels.txt");

    // 4. Execute the TensorFlow model via a native command.
    // ... see next section ...
    // The model's return value is in out.stdout.
    
    // 5. The command's return value is a JSON array of floating-point numbers.
    let stdout_json: Value = serde_json::from_str(str::from_utf8(&out.stdout).unwrap_or("[]")).unwrap();
    let stdout_vec = stdout_json.as_array().unwrap();

    // 6. Each number in the array is the probability that the image contains the object defined on the corresponding line of the labels file.
    // 6.1 Find the highest probability ...
    // 6.2 Translate the probability into a confidence text level ...
    // 6.3 Look up the corresponding label text ...
    
    // 7. The text label and confidence level are returned to the FaaS caller.
    //    (label_lines and confidence are computed in the elided steps above.)
    let ret: (String, String) = (label_lines.next().unwrap().to_string(), confidence.to_string());
    serde_json::to_string(&ret).unwrap()
}

Next, let's look into how the native command API (step #4) is called to execute the TensorFlow model.

The mobilenet_v2 command

The command is provided as an API in the Second State FaaS. Its sole purpose is to take a MobileNet-style TensorFlow model and execute it against an input image as fast as possible. You can review its source code here. The code segment that calls the mobilenet_v2 command is as follows.

pub fn infer(image_data: &[u8]) -> String {
    ... ...
    // Execute the TensorFlow model via a command
    let mut cmd = Command::new("mobilenet_v2");
    cmd.arg(model_data.len().to_string())
        .arg("input")
        .arg("MobilenetV2/Predictions/Softmax")
        .arg("224")
        .arg("224");
    cmd.stdin_u8vec(model_data);
    for rgb in resized.pixels() {
        cmd.stdin_u8(rgb[0] as u8)
            .stdin_u8(rgb[1] as u8)
            .stdin_u8(rgb[2] as u8);
    }
    
    let out = cmd.output(); // Execute the command.
    if out.status != 0 {
        println!("{}", str::from_utf8(&out.stderr).unwrap());
        return out.status.to_string();
    }
    // The model return value is in out.stdout
    ...
}

The command takes five arguments via the chained arg() function calls.

  • The first argument is the size of the model file, in bytes.
  • The second argument is the tensor name for the input image. This name is dependent on the model. You can find it in the model’s documentation.
  • The third argument is the tensor name for the output probability array. This name is dependent on the model. You can find it in the model’s documentation.
  • The fourth and fifth arguments are the width and height of the input image. For most MobileNet models, the required input is 224x224 pixels (see the size check sketched below).
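
Those width and height arguments fix the amount of image data the command expects on stdin: 224 × 224 pixels × 3 color channels = 150,528 bytes, sent one u8 per channel after the model bytes. The sanity check below is an illustrative sketch, not part of the original source code.

// Illustrative sketch: the stdin payload is model_data.len() model bytes,
// followed by one u8 per RGB channel of the resized 224x224 image.
let image_bytes = (resized.width() * resized.height() * 3) as usize;
assert_eq!(image_bytes, 224 * 224 * 3); // 150,528 bytes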

The model data and input image data are passed to the command via the stdin_u8() function calls. The model data is passed first, followed by the image data. The cmd.output() function call executes the TensorFlow model, and encapsulates the returned tensor value (i.e., the probability array) in out.

  • out.stdout is the byte array of the returned tensor value.
  • out.stderr is the byte array of any error message the command emits.

The infer() function processes the returned probability array, and converts the result to a text label and a confidence level.
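
For reference, the elided steps #6 and #7 could look like the sketch below. It assumes the stdout_vec and labels values from the abridged function above; the confidence thresholds are illustrative assumptions, and the actual values are in the source code on GitHub.

// Sketch of steps #6 and #7. The thresholds are illustrative assumptions.
// 6.1 Find the index of the highest probability in the array.
let mut max_prob = 0.0f64;
let mut max_index = 0;
for (i, v) in stdout_vec.iter().enumerate() {
    let prob = v.as_f64().unwrap();
    if prob > max_prob {
        max_prob = prob;
        max_index = i;
    }
}
// 6.2 Translate the probability into a confidence text level (hypothetical thresholds).
let confidence = if max_prob > 0.75 { "high" } else if max_prob > 0.5 { "medium" } else { "low" };
// 6.3 Look up the label text: the class index is the line number in the labels file.
let mut label_lines = labels.lines().skip(max_index);
// 7. Return the label and confidence as a JSON array.
let ret: (String, String) = (label_lines.next().unwrap().to_string(), confidence.to_string());
serde_json::to_string(&ret).unwrap()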

Classify an image

First, build the function via the ssvmup tool. The compiled WebAssembly file is generated in the pkg/ directory.

$ ssvmup build

Upload the WebAssembly file to the FaaS and get a wasm_id to call later.

$ curl --location --request POST 'https://rpc.ssvm.secondstate.io:8081/api/executables' \
--header 'Content-Type: application/octet-stream' \
--header 'SSVM-Description: mobilenet' \
--data-binary '@pkg/mobilenet_service_lib_bg.wasm'
{"wasm_id":146,"wasm_sha256":"0x469c28daae7aba392076b4bc5ee3b43ec6d667083d8ae63207bf74b1da03fc26","SSVM_Usage_Key":"00000000-0000-0000-0000-000000000000","SSVM_Admin_Key":"7dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx0c41"}

Now you can call the function to classify an image. Here is an example image showing computer science pioneer Dr. Grace Hopper.

We pass the image to the FaaS function via an HTTP request, and it returns the classification label and confidence in the HTTP response.

$ curl --location --request POST 'https://rpc.ssvm.secondstate.io:8081/api/run/146/infer' \
--header 'Content-Type: application/octet-stream' \
--data-binary '@test/grace_hopper.jpg'
["military uniform","medium"]

Web UI

On a static web page, you can use JavaScript to make an AJAX call to this FaaS function. The AJAX function posts an uploaded image file, and the AJAX response is the classification label and confidence that the MobileNet TensorFlow model inferred from the image.

Source code | Live demo

  $.ajax({
      url: "https://rpc.ssvm.secondstate.io:8081/api/run/146/infer",
      type: "post",
      data: $('#input')[0].files[0],            // the uploaded image file
      contentType: "application/octet-stream",  // send raw bytes, as in the curl example
      processData: false,                       // do not form-encode the file data
      success: function (data) {
        $('#result').html(data);                // show the returned label and confidence
      }
  });

What’s next

Now it is your turn to create functions that use your own MobileNet models to provide AI vision services on the web.
