Will Machines See with Movidius?
Intel is planning to “give the power of sight to machines” through the Movidius Neural Compute Stick. But does this USB stick have what it takes to truly bring the power of visual machine learning to any device?
Machine learning via a USB port
Image recognition is a resource-intensive process built on neural networks. Training and running a neural network usually relies on GPU accelerators. In power- and space-constrained environments, FPGAs or specialized chips such as Intel’s Nervana can be more efficient than a GPU. Intel proposes the Movidius Neural Compute Stick, based on the Myriad 2 family of vision processing units (VPUs), as a transitional solution for low-power embedded systems and for systems without a GPU. It can be plugged into any device with a USB port to roll out machine learning features.
The Movidius stick has multiple areas of application, including drone development, robotics, security, and virtual and augmented reality.
The chip itself contains 12 VLIW SHAVE (Streaming Hybrid Architecture Vector Engine) processors and has a USB interface for communication with the host. As one would expect, Movidius performance, measured in images processed per second, depends on the number of SHAVE processors in use.
The best SHAVE for your interface
Our research shows that 12 SHAVEs deliver 2-4x the performance of a single SHAVE. On the other hand, a single SHAVE uses 1.6-2.3x less power than the full 12-SHAVE load. Configurations with four, eight, or 12 SHAVEs provide similar image processing results and are the most cost-effective.
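To make the trade-off concrete, the two quoted ranges can be combined into a rough bound on performance per watt for 12 SHAVEs relative to one. This is only an illustrative sketch: the function name is hypothetical, and it assumes the extremes of the speedup and power ranges can pair arbitrarily, which the measurements do not confirm.

```python
def relative_efficiency_bounds(speedup_range, power_range):
    """Bound the ratio of throughput-per-watt at 12 SHAVEs to that at 1 SHAVE.

    speedup_range: (low, high) throughput gain of 12 SHAVEs over 1 SHAVE.
    power_range:   (low, high) power increase of 12 SHAVEs over 1 SHAVE.
    Assumption: any speedup in the range may coincide with any power ratio.
    """
    speedup_low, speedup_high = speedup_range
    power_low, power_high = power_range
    # Worst case: smallest speedup at the largest power cost, and vice versa.
    return speedup_low / power_high, speedup_high / power_low


low, high = relative_efficiency_bounds((2.0, 4.0), (1.6, 2.3))
print(f"{low:.2f}x to {high:.2f}x")  # prints "0.87x to 2.50x"
```

A lower bound below 1 means that, under the least favorable pairing, the full 12-SHAVE load would be slightly less efficient per watt than a single SHAVE. The per-network measurements are needed to see where each model actually falls.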
To measure chip performance, we used Caffe implementations of AlexNet, GoogLeNet, and SqueezeNet. Each network was compiled to use one, two, four, eight, or 12 Movidius SHAVE processors and loaded onto the stick. Test images were then loaded onto the stick to measure image loading and processing time. Twenty test images were used for this experiment, with a thousand cycles run for data collection.
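A timing loop of the kind described above might look roughly like the following. This is a minimal sketch rather than the code used in the experiment: `run_inference` is a hypothetical stand-in for whatever call sends one image to the stick and waits for the result. It also assumes that each of the thousand cycles passes over all twenty images, which is one reading of the setup.

```python
import time
from statistics import mean


def benchmark(run_inference, images, cycles=1000):
    """Return average images per second over `cycles` passes through `images`.

    run_inference: hypothetical callable that processes one image and
                   blocks until the result is available.
    """
    per_image_seconds = []
    for _ in range(cycles):
        for image in images:
            start = time.perf_counter()
            run_inference(image)
            per_image_seconds.append(time.perf_counter() - start)
    return 1.0 / mean(per_image_seconds)


def fake_inference(image):
    """Stub standing in for the device call; it only sleeps briefly."""
    time.sleep(0.001)


test_images = [f"image_{i}" for i in range(20)]  # placeholders, not real data
fps = benchmark(fake_inference, test_images, cycles=2)
print(f"{fps:.1f} images per second (stub)")
```

Running the same loop once per compiled network (one, two, four, eight, and 12 SHAVEs) would give the points needed for a throughput-versus-SHAVE-count comparison.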
Processed images per second vs number of SHAVE processors used
Images per dollar vs number of SHAVE processors
Comparing the calculated performance per watt with the results of an FPGA/GPU comparison, we can see that Movidius offers considerably greater performance for low-power devices such as sensors, drones, and cameras.