AI as a Service: Face Detection Using MTCNN


MTCNN is a class of Multi-task Cascaded Convolutional Network models. These models are very good at detecting faces and facial features. You can train (or retrain) an MTCNN model with your own face dataset so that it accurately detects faces for your application.

Second State FaaS provides a native command API to run TensorFlow-based MTCNN models. In this article, we will use the original MTCNN model trained on the FaceNet dataset as an example. We will create a FaaS application that uses this popular model to detect faces in an input image, and returns a new image with all detected faces outlined in boxes.

The source code for the face detection example in this article is available on GitHub.

Prerequisites

Follow these simple instructions to install Rust and ssvmup.
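If you need them, the commands below are one typical way to install both. The rustup command is standard; the ssvmup installer URL is our assumption based on Second State’s documentation, so check the linked instructions for the current version.

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ curl https://raw.githubusercontent.com/second-state/ssvmup/master/installer/init.sh -sSf | sh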

The inference function

The infer() function is written in Rust and compiled to WebAssembly. It does the heavy lifting of data preparation and model preparation; both tasks are highly dependent on the function’s actual use case. It then calls a native command to execute the TensorFlow model, analyzes the model’s return values, and creates a new image with box overlays computed from those return values.

The native command is safe and portable here because it is reviewed and approved by the FaaS operator before it is made available as part of the FaaS API. It is not user-submitted code. All user-submitted code is in WebAssembly and runs in the SSVM sandbox.

Below is an annotated and abridged version of the function source code. The comments explain the 7 steps the function performs. In steps #2 and #3, we load the MTCNN model trained on the FaceNet dataset, along with its input parameters. You can load your own retrained (or fine-tuned) MTCNN model file and adjust the parameters to match.

#[wasm_bindgen]
pub fn infer(image_data: &[u8]) -> Vec<u8> {
    // 1. Load the input image
    let mut img = image::load_from_memory(image_data).unwrap();

    // 2. Load the frozen saved tensorflow model into a byte array. The model is trained to detect faces in the input image.
    let model_data: &[u8] = include_bytes!("mtcnn.pb");
    // 3. Load parameters for the model. E.g., the min_size is the smallest face the model will detect
    let model_params: &str = "{\"min_size\":[40],\"thresholds\":[0.6,0.7,0.7],\"factor\":[0.709]}";    

    // 4. Execute the tensorflow model via a command
    // ... see next section ...
    // The model return value is in out.stdout
    
    // 5. Parse the command’s return value, which is a JSON array of floating point numbers.
    let stdout_json: Value = from_str(str::from_utf8(&out.stdout).expect("[]")).unwrap();
    let stdout_vec = stdout_json.as_array().unwrap();

    // 6. Create arrays of boxes for detected faces
    let mut box_vec: Vec<[f32; 4]> = Vec::new();
    // ... ...
    
    // 7. Create and return a new image with the boxes overlaid on the detected faces
    let mut buf = Vec::new();
    img.write_to(&mut buf, image::ImageOutputFormat::Png).expect("Unable to write");
    return buf;
}
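The abridged listing omits the use statements at the top of the file. A minimal set of imports that matches the code shown above would look like the sketch below; check the full source on GitHub for the exact list.

use wasm_bindgen::prelude::*;      // provides the #[wasm_bindgen] attribute
use serde_json::{from_str, Value}; // parses the command’s JSON output in step #5
use std::str;                      // converts the out.stdout bytes to &str
// Step #4 also needs the Command type from Second State’s process interface crate (elided here).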

Next, let’s look at how the native command API (step #4) is called to execute the TensorFlow model.

The mtcnn command

The command is provided as an API in the Second State FaaS. Its sole purpose is to take an MTCNN TensorFlow model and execute it against an image as fast as possible. You can review its source code here. The code segment that calls the mtcnn command is as follows.

pub fn infer(image_data: &[u8]) -> Vec<u8> {
    ... ...
    let model_params: &str = "{\"min_size\":[40],\"thresholds\":[0.6,0.7,0.7],\"factor\":[0.709]}";    
    ... ...
    // Execute the tensorflow model via a command
    let mut cmd = Command::new("mtcnn");
    cmd.arg(model_data.len().to_string()) // model data length
        .arg("input") // Input tensor name
        .arg("box") // Output tensor name
        .arg(model_params) // Parameter tensor names and values
        .arg(img.width().to_string()) // Image width
        .arg(img.height().to_string()); // Image height
    cmd.stdin_u8vec(model_data);
    for (_x, _y, rgb) in img.pixels() {
        cmd.stdin_u8(rgb[2] as u8)
            .stdin_u8(rgb[1] as u8)
            .stdin_u8(rgb[0] as u8);
    }
    let out = cmd.output(); // Call command.
    // The model return value is in out.stdout
    ...
}

The command takes six arguments via the chained arg() function calls.

  • The first argument is the size of the model file. It is measured in bytes.
  • The second argument is the tensor name for the input image. This name is dependent on the model. You can find it in the model’s documentation.
  • The third argument is the tensor name for the output (e.g., the detected face coordinates). This name is dependent on the model. You can find it in the model’s documentation.
  • The fourth argument is a JSON dictionary for the tensor names and values of model parameters.
  • The fifth and sixth arguments are the width and height of the input image.

The model data and input image data are passed to the command via the stdin_u8() function calls. The model data is passed first, followed by the image data. The cmd.output() function call executes the TensorFlow model, and encapsulates the returned tensor value (i.e., the detected face coordinates) in out.

  • The out.stdout is the byte array of the returned tensor value.
  • The out.stderr is the byte array of any error message the command emits.

The infer() function processes the returned face coordinates, and draws a new image with boxes around the detected faces.
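Below is a minimal sketch of that post-processing (steps #6 and #7 in the first listing). It assumes the model returns one [y1, x1, y2, x2] array per detected face and that the imageproc crate is available for drawing; the full source on GitHub is the authoritative version.

// Sketch only: parse the face boxes from the JSON output ...
let mut box_vec: Vec<[f32; 4]> = Vec::new();
for face in stdout_vec {
    let coords = face.as_array().unwrap();
    box_vec.push([
        coords[0].as_f64().unwrap() as f32, // y1
        coords[1].as_f64().unwrap() as f32, // x1
        coords[2].as_f64().unwrap() as f32, // y2
        coords[3].as_f64().unwrap() as f32, // x2
    ]);
}
// ... and draw a hollow green rectangle around each detected face
let green = image::Rgba([0u8, 255u8, 0u8, 255u8]);
for b in &box_vec {
    let rect = imageproc::rect::Rect::at(b[1] as i32, b[0] as i32)
        .of_size((b[3] - b[1]) as u32, (b[2] - b[0]) as u32);
    imageproc::drawing::draw_hollow_rect_mut(&mut img, rect, green);
}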

Detect faces in an image

First, build the function via the ssvmup tool.

$ ssvmup build

Upload the WebAssembly file to the FaaS and get a wasm_id to call later.

$ curl --location --request POST 'https://rpc.ssvm.secondstate.io:8081/api/executables' \
--header 'Content-Type: application/octet-stream' \
--header 'SSVM-Description: MTCNN' \
--data-binary '@pkg/mtcnn_service_lib_bg.wasm'
{"wasm_id":147,"wasm_sha256":"0x469c28daae7aba392076b4bc5ee3b43ec6d667083d8ae63207bf74b1da03fc26","SSVM_Usage_Key":"00000000-0000-0000-0000-000000000000","SSVM_Admin_Key":"7dxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx0c41"}

Now you can call the function to detect faces in an image. Here is an example image of the attendees of the 1927 Solvay Conference. Of the 29 scientists in this photo, 17 received Nobel Prizes in their careers.

(Image: attendees of the 1927 Solvay Conference)

We pass the image to the FaaS function via an HTTP request, and it returns the result image, with the detected faces outlined, in the HTTP response.

$ curl --location --request POST 'https://rpc.ssvm.secondstate.io:8081/api/run/147/infer' \
--header 'Content-Type: application/octet-stream' \
--data-binary '@test/solvay.jpg' \
--output tmp.png

The result image, with green boxes around the detected faces, is shown below.

(Image: the result photo with green boxes around the detected faces)

Web UI

On a static web page, you can use JavaScript to make an AJAX call to this FaaS function. The AJAX call posts an uploaded image file, and receives binary data for a new image with the detected faces outlined. The JavaScript then displays the response image on the page.

Source code | Live demo

  $.ajax({
      url: "https://rpc.ssvm.secondstate.io:8081/api/run/147/infer/bytes",
      type: "post",
      data : $('#input')[0].files[0],
      contentType: "application/octet-stream",
      processData: false,
      xhrFields:{
        responseType: 'blob'
      },
      success: function (data) {
        const img_url = URL.createObjectURL(data);
        $('#wm_img').prop('src', img_url);
      }
  });
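For reference, a minimal page skeleton that the script above assumes might look like the following. Only the #input and #wm_img element IDs come from the script; the detectFaces() wrapper name and everything else is illustrative.

  <!-- Illustrative markup only; the IDs match the AJAX snippet above -->
  <input type="file" id="input" accept="image/*">
  <button onclick="detectFaces()">Detect faces</button> <!-- detectFaces() is a hypothetical wrapper around the $.ajax call -->
  <img id="wm_img" alt="Detection result">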

What’s next

Now it is your turn to create functions that use your own MTCNN models, or that read a different output tensor of the standard model (e.g., to get facial features instead of just the boxes).
