Check out our repo for resources on hacking the Coral EdgeTPU dev board!

Google Coral dev board

Earlier this year, Google released their Coral edgeTPU dev board! We were excited to get our hands on one and explore an embedded platform specifically designed for deep learning applications.

The Setup

Getting the dev board up and running wasn’t too complicated, though it was somewhat different from setting up a Pi with a Raspbian image on a microSD card. We appreciated that the Google Coral team provides several methods for setting it up.

The edgeTPU dev board requires a Linux distribution, specifically Mendel Linux, to be flashed directly onto the board. The board has a microUSB port for serial communication with any Linux/OSX host machine and a USB-C port for transferring data and doing the actual flashing. Their getting started instructions are simple to follow. Briefly, here are some findings from setting up the board:

  • Make sure the USB-C cable you connect to the data (OTG) port is one that can also carry data. This seems like a no-brainer, but it is easy to overlook and leaves you stuck wondering why nothing shows up on your screen.
  • Some people report issues flashing from a regular computer running Linux or OSX, and instead recommend using a Raspberry Pi to do the flashing.
  • Our first attempt at flashing Mendel got stuck while rebooting, so we can confirm that the instructions they provide for reflashing a bricked device are really reliable! That process also somewhat resembles flashing a Raspbian image onto an SD card for a Pi.
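As a concrete example of the serial connection mentioned above: once the USB-to-UART drivers are in place, the getting started guide has you open a serial console from the host. A typical invocation looks like the following, though the device path on your machine may differ:

$ screen /dev/ttyUSB0 115200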

The edgeTPU comes with a special API for running inference on the device. If you have played with TFLite, this will feel very familiar, since the API consumes TFLite files for inference. Check out all the methods supported here!

The Google Coral team also provides an edgeTPU compiler, which optimizes a quantized tflite model for running inference on the dev board. For now, it only supports certain models for compilation. There is also an online version of the compiler if you don’t have a 64-bit Debian-based Linux computer. Their docs provide excellent explanations of how the compiler works and the options available.

Testing Inference Speed

For one of our projects, YogAI, we used a Raspberry Pi 3 to run pose estimation as a feature extractor feeding a simple classifier for pose classification. We found that a non-quantized tflite model based on Convolutional Pose Machines got us around 2.5 fps. This was just enough to classify stable poses like yoga and some simple motion like squats vs deadlifts vs standing, but for better resolution on movement and overall performance, we wanted to see whether another platform could speed up inference significantly. We used this model as a reference for comparison with a quantized tflite model compiled using the edgeTPU compiler.
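To make the feature-extractor-plus-classifier setup concrete, here is a minimal sketch of the kind of pipeline we mean; the keypoint layout and the KNN classifier below are illustrative assumptions rather than the exact YogAI code:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def keypoints_to_features(keypoints):
    # Flatten (num_keypoints, 2) pose coordinates into one feature vector.
    return np.asarray(keypoints, dtype=np.float32).flatten()

# Pretend we collected labeled keypoint sets from the pose estimator.
X_train = [keypoints_to_features(np.random.rand(14, 2)) for _ in range(20)]
y_train = ['warrior', 'chair', 'squat', 'standing'] * 5

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Each new frame's keypoints get classified the same way.
print(clf.predict([keypoints_to_features(np.random.rand(14, 2))]))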

Ildoonet’s tf-pose-estimation repo provides several pose estimation frozen graph models that we can quantize and convert to tflite. We chose a mobilenet-based model since these are smaller and should perform well on small devices. To convert the frozen graph pb file found here, use the TensorFlow 1.13.1 tflite methods from contrib like so:

import tensorflow as tf  # tested with TensorFlow 1.13.1

frozen_graphdef_path = 'mobilenet_v2_small/graph_opt.pb'
input_shapes = {'image': [1, 432, 368, 3]}

# Build a converter from the frozen graph, naming the input and output tensors.
converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    frozen_graphdef_path, ['image'], ['Openpose/concat_stage7'], input_shapes)

# Quantize to uint8 so the model can later be compiled for the edgeTPU.
converter.inference_type = tf.contrib.lite.constants.QUANTIZED_UINT8
converter.quantized_input_stats = {'image': (0., 1.)}
converter.allow_custom_ops = True
converter.default_ranges_stats = (0, 255)

tflite_model = converter.convert()
open("pose_estimation.tflite", "wb").write(tflite_model)

This will produce a quantized tflite version of the model that we can then run through the edgeTPU compiler. Compiling the file for the edgeTPU dev board is very simple. After installing the compiler (or using the online version), simply run:

$ edgetpu_compiler /path/to/pose_estimation.tflite

From here, we’ll want to transfer the file over to our dev board! You can transfer data to the device by physically connecting to it over serial via the OTG port, but it is much nicer to use ssh and scp. The Google Coral team provides a nifty command-line tool, mdt, that lets you open a shell, push files around, and more. Follow the docs to install it on your host machine, connect to your dev board using the OTG port, and run:

$ mdt shell
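If you don’t already have mdt on your host machine, the Coral docs (at the time of writing) install it with pip:

$ pip3 install --user mendel-development-tool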

From your host machine, push the compiled tflite model and a sample image to your dev board:

$ mdt push /path/to/compiled_openpose_edgetpu.tflite
$ mdt push /path/to/example.jpg

To run inference using the compiled tflite model, run this snippet:

from edgetpu.basic.basic_engine import BasicEngine
import numpy as np
from PIL import Image
import time
import argparse

if __name__ == "__main__":
	default_model = 'compiled_openpose_edgetpu.tflite'
	default_image = "example.jpg"
	parser = argparse.ArgumentParser()
	parser.add_argument('--model', help='.tflite model path',
						default=default_model)
	parser.add_argument('--image', help='image file path',
						default=default_image)

	args = parser.parse_args()

	'''load the image'''
	target_size=(432, 368)
	image = Image.open(args.image)
	image = image.resize(target_size, Image.ANTIALIAS)
	image = np.array(image).flatten()
	'''load the model'''
	engine = BasicEngine(args.model)
	results = engine.RunInference(image)
	print(engine.required_input_array_size())
	print(results)
	print('processing time is', results[0])
	heat_map = results[1].reshape([54,46,57])
	print('heatmap shape is',heat_map.shape)
	print(heat_map)

	print()
	print()

	print(np.sum(heat_map), heat_map[1])

	np.save('/home/mendel/heat_map.npy', heat_map)
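Assuming you saved the snippet on the board as, say, pose_inference.py (the filename here is just an example), you can run it from an mdt shell like so:

$ python3 pose_inference.py --model compiled_openpose_edgetpu.tflite --image example.jpg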

When we run it on an example image, inference takes ~13 milliseconds, which is roughly 77 fps! It’s an incredible speed-up over simply running a tflite model on a Pi. The board conveniently comes with HDMI and USB ports, so we can attach a screen and a USB camera.
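Once you have the saved heat maps, one common way to get keypoints out of them is to take the argmax of each part channel and scale the grid coordinates back up by the network stride. Here is a minimal sketch of that idea; the number of part channels and the stride-8 scaling are assumptions based on the [54, 46, 57] output shape above, and which axis maps to x versus y depends on the reshape order:

import numpy as np

# Load the heat maps saved by the inference script above.
heat_map = np.load('heat_map.npy')   # shape (54, 46, 57)

num_parts = 19   # assumed OpenPose-style part channels (the rest are PAFs)
stride = 8       # assumed: 432 / 54 = 368 / 46 = 8

keypoints = []
for part in range(num_parts):
    channel = heat_map[:, :, part]
    # Location of the strongest response for this body part.
    idx = np.unravel_index(np.argmax(channel), channel.shape)
    confidence = channel[idx]
    # Scale grid coordinates back toward the 432x368 input resolution.
    keypoints.append((idx[0] * stride, idx[1] * stride, confidence))

for i, (a, b, confidence) in enumerate(keypoints):
    print('part {}: ({}, {}) confidence {:.3f}'.format(i, a, b, confidence))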

All in all, the edgeTPU dev board shows great promise for fast inference on the edge and can be used for all kinds of robotics applications. However, those applications require installing various libraries that aren’t officially supported yet, since Mendel OS is not a mainstream Linux distro. Installing tools like OpenCV, ROS, and proprietary camera SDKs took some tinkering to get right, so we created a repo with a few guides on how to install common libraries.

Finding Similarities

We started by finding which linux distros were most similar to Mendel OS.

One way to do this is by looking at the libc and linux kernel:

# Looking at libc:
$ /lib/aarch64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
	crypt add-on version 2.1 by Michael Glad and others
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
libc ABIs: UNIQUE
For bug reporting instructions, please see:
.
# Looking at kernel
$ uname -a
Linux undefined-calf 4.9.51-imx #1 SMP PREEMPT Tue May 14 20:34:37 UTC 2019 aarch64 GNU/Linux

It is apparent that Mendel OS is a Debian-based distro, and the kernel version is similar to what Ubuntu 16.04 shipped. The architecture is aarch64, which is the same as arm64. So whenever packages are available for Ubuntu 16.04 on arm64, we can probably install them on Mendel OS too. Looking at the Debian version gives us:

# Looking at Debian version
$ cat /etc/debian_version
9.7

This version translates to Debian stretch, which means that packages built for Raspbian stretch could also be compatible with Mendel OS.
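As a quick sanity check before trying a prebuilt package, you can confirm which architecture dpkg expects on the board and then install a stretch-era arm64 .deb; the package name below is just a placeholder:

$ dpkg --print-architecture
$ sudo dpkg -i some-library_1.0_arm64.deb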

The Google Coral Dev board makes running inference on device a lot faster, at times faster than running on a full server. Although it depends on a specialized flavor of Debian maintained by the Coral team, it is familiar enough to work with.